# Pandas - Python Data Analysis Library

### What is Pandas?
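Pandas is a Python library for working with tabular data. Think of it as a programmable version of Excel: there is a learning curve, but once you get the hang of it, it is much faster.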

### Why should we learn Pandas?
Pandas makes it straightforward to work through large data sets and spot patterns in them.

The real power of Pandas lies in data cleanup and data visualization.

Pandas reads from and writes to a variety of data sources; the most common is CSV.
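As a minimal sketch of loading CSV data, the snippet below feeds a small inline CSV string to `pd.read_csv` through `io.StringIO`, so it runs without a file on disk. The column names and values are made up for illustration; with a real file you would pass a path (for example a hypothetical `'data.csv'`) instead.

```python
import io

import pandas as pd

# Hypothetical CSV contents; in practice this would usually come from a file on disk
csv_text = """color,count
Red,1
Green,2
Yellow,3
"""

# read_csv accepts either a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)  # 'count' is parsed as an integer column
```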

### What do I need to know?
- Series vs. DataFrame
- Accessing data
- Data manipulation and cleanup
- Data visualization with Matplotlib, Chart Studio, Seaborn, Cufflinks, and Plotly

### Getting Started with Pandas

Pandas has two main data structures: Series and DataFrame. Install Pandas with pip, then import it under the conventional alias `pd`:

```
pip install pandas
```

```python
import pandas as pd
```

###### Series
A Series is a single column with multiple rows.

It is a one-dimensional labeled array, e.g. a list converted to a column.

`pd.Series(data, index)`

###### Data Frame
A DataFrame is made up of multiple columns and multiple rows.

It is a two-dimensional labeled table, e.g. a dictionary or nested list converted to a table.

`pd.DataFrame(data, index, columns)`
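A short sketch of the two construction routes mentioned above, using made-up values: a dictionary (keys become column labels) and a nested list (each inner list is one row, so the column labels are passed separately). Both calls should produce the same table.

```python
import pandas as pd

# From a dictionary: keys become column labels, values become column data
df_from_dict = pd.DataFrame({'a': [10, 20, 30], 'b': [1, 2, 3]},
                            index=['x', 'y', 'z'])

# From a nested list: each inner list is one row, so columns are named explicitly
df_from_rows = pd.DataFrame([[10, 1], [20, 2], [30, 3]],
                            index=['x', 'y', 'z'],
                            columns=['a', 'b'])

print(df_from_dict)
print(df_from_dict.equals(df_from_rows))  # same labels and values -> True
```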


### Series

###### Create a series
Create a Pandas Series (a single column).

With lists:

```python
data = [10, 20, 30]
index = ['a', 'b', 'c']
pd.Series(data=data, index=index)
```

With a dictionary (the keys automatically become the index labels):

```python
data = {'a': 10, 'b': 20, 'c': 30}
pd.Series(data=data)
```

Both produce the same output:

```
a    10
b    20
c    30
dtype: int64
```
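When a dictionary is passed together with an explicit `index`, the index decides which keys are used and in what order; a label with no matching key becomes NaN. A small sketch (the label `'d'` is made up to show the missing case):

```python
import pandas as pd

data = {'a': 10, 'b': 20, 'c': 30}

# Only the listed labels are kept, in that order; 'd' has no key, so it becomes NaN
ser = pd.Series(data=data, index=['c', 'a', 'd'])
print(ser)  # c -> 30.0, a -> 10.0, d -> NaN; NaN forces a float64 dtype
```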

###### Indexing

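Most Pandas operations align data by index label, so it is worth understanding how indexing works. Use `.loc` to look up by label and `.iloc` to look up by position. Plain `ser1['Red']` also works for labels, but positional access such as `ser1[0]` on a label index is deprecated in recent Pandas versions.

```python
ser1 = pd.Series([1, 2, 3, 4], ['Red', 'Green', 'Yellow', 'Blue'])
ser1['Red']       # by label -> 1
ser1.loc['Red']   # by label, explicit -> 1
ser1.iloc[0]      # by position -> 1
```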
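The difference between label and position matters most when the index itself is made of integers: then plain `[]` looks up labels, not positions. A small sketch with a deliberately reversed integer index (the values are made up):

```python
import pandas as pd

# Integer labels in reverse order, so labels and positions disagree
ser = pd.Series(['x', 'y', 'z'], index=[2, 1, 0])

print(ser.loc[0])   # label 0    -> 'z'
print(ser.iloc[0])  # position 0 -> 'x'
print(ser[0])       # integer index, so [] uses the label -> 'z'
```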

###### Combining Series
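Series support the arithmetic operators `+`, `-`, `*`, and `/`. Values are matched by index label, not by position, and the result index is the union of both indexes.

For labels that exist in only one of the Series, Pandas fills the result with `NaN` (NumPy's `np.nan`), which also turns the result into a float Series.

```python
ser1 = pd.Series([1, 2, 3, 4], ['Red', 'Green', 'Yellow', 'Blue'])
ser2 = pd.Series([1, 2, 3, 4], ['Red', 'Green', 'Yellow', 'Purple'])
print(ser1 + ser2)
```

Output:

```
Blue      NaN
Green     4.0
Purple    NaN
Red       2.0
Yellow    6.0
dtype: float64
```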
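If a missing label should count as 0 rather than produce NaN, the method form `Series.add` accepts a `fill_value` argument (as do `sub`, `mul`, and `div`). A minimal sketch reusing the two Series above:

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], ['Red', 'Green', 'Yellow', 'Blue'])
ser2 = pd.Series([1, 2, 3, 4], ['Red', 'Green', 'Yellow', 'Purple'])

# A label missing from one side is treated as 0 instead of producing NaN
total = ser1.add(ser2, fill_value=0)
print(total)  # Blue 4.0, Green 4.0, Purple 4.0, Red 2.0, Yellow 6.0
```

Note that a label missing from both Series would still come out as NaN; `fill_value` only fills one-sided gaps.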


```python
import pandas as pd
import numpy as np
```

```python
# Creating a Series with a list (no index given, so labels default to 0, 1, 2)
data = [10, 20, 30]
ser = pd.Series(data=data)
ser
```

```
0    10
1    20
2    30
dtype: int64
```

```python
ser[0]
```

```
10
```

```python
# Creating a Series with a dictionary
data = {'a': 10, 'b': 20, 'c': 30}
ser_dictionary = pd.Series(data=data)
ser_dictionary['c']
```

```
30
```

```python
# Setting the index, dtype, and name explicitly (the float data is cast to int)
data = [10.0, 20.0, 30.0]
index = ['a', 'b', 'c']
ser = pd.Series(data=data, index=index, dtype=int, name='First Series')
ser
```

```
a    10
b    20
c    30
Name: First Series, dtype: int64
```
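The pieces passed to the constructor can be read back as attributes. A quick sketch using the same arguments as above; note that `dtype=int` maps to the platform's default integer type, which is usually (but not on every platform) `int64`.

```python
import pandas as pd

ser = pd.Series(data=[10.0, 20.0, 30.0], index=['a', 'b', 'c'],
                dtype=int, name='First Series')

print(ser.index.tolist())  # ['a', 'b', 'c']
print(ser.to_numpy())      # the values as a NumPy array: [10 20 30]
print(ser.dtype)           # an integer dtype (int64 on most platforms)
print(ser.name)            # First Series
```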

```python
# Indexing
ser1 = pd.Series([1, 2, 3, 4], ['Blue', 'Red', 'Orange', 'Green'])
ser2 = pd.Series([10, 20, 30, 40], ['Blue', 'Red', 'Orange', 'Purple'])
ser1.iloc[0]   # first element by position
```

```
1
```

```python
ser2.iloc[1:3]   # positions 1 and 2; the stop position is excluded
```

```
Red       20
Orange    30
dtype: int64
```
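Slicing by label with `.loc` behaves differently from slicing by position: the stop label is included. A small sketch with the same `ser2`:

```python
import pandas as pd

ser2 = pd.Series([10, 20, 30, 40], ['Blue', 'Red', 'Orange', 'Purple'])

by_position = ser2.iloc[1:3]         # positions 1 and 2; stop excluded
by_label = ser2.loc['Red':'Orange']  # labels 'Red' through 'Orange'; stop included

print(by_position.equals(by_label))       # True
print(ser2.loc['Red':'Purple'].tolist())  # [20, 30, 40]
```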

```python
# Combining with math operations: 'Green' is missing from ser2, so its result is NaN
type((ser1 / ser2)['Green'])
```

```
numpy.float64
```

```python
ser1 + ser2
```

```
Blue      11.0
Green      NaN
Orange    33.0
Purple     NaN
Red       22.0
dtype: float64
```

```python
print(type(np.nan))
```

```
<class 'float'>
```
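Because `np.nan` is a float, an integer Series that gains a missing value is upcast to `float64`, which is why the sums above print as `11.0`, `22.0`, and so on. A sketch contrasting this with Pandas' nullable integer dtype `"Int64"` (capital I), which keeps integers and marks gaps as `<NA>`:

```python
import numpy as np
import pandas as pd

# NaN is a float, so this Series is stored as float64
with_nan = pd.Series([1, 2, np.nan])
print(with_nan.dtype)  # float64

# The nullable "Int64" dtype keeps integer values and stores the gap as <NA>
nullable = pd.Series([1, 2, None], dtype='Int64')
print(nullable.dtype)  # Int64

# isna() detects both kinds of missing value
print(with_nan.isna().tolist())  # [False, False, True]
print(nullable.isna().tolist())  # [False, False, True]
```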


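```python
# Without an explicit index, labels default to 0, 1, 2, ...
ser_no_index_1 = pd.Series([1, 2, 3, 4, 5])
ser_no_index_2 = pd.Series([10, 20, 30, 40])
ser_no_index_1
```

```
0    1
1    2
2    3
3    4
4    5
dtype: int64
```

```python
ser_no_index_2
```

```
0    10
1    20
2    30
3    40
dtype: int64
```

```python
# Label 4 exists only in the first Series, so its sum is NaN
ser_no_index_1 + ser_no_index_2
```

```
0    11.0
1    22.0
2    33.0
3    44.0
4     NaN
dtype: float64
```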
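Even with default integer labels, addition pairs values by label, not by position. A sketch with a deliberately shuffled index shows the difference, and how `reset_index(drop=True)` discards the labels when positional pairing is what you want:

```python
import pandas as pd

a = pd.Series([1, 2], index=[1, 0])  # labels out of order on purpose
b = pd.Series([10, 20])              # default labels 0, 1

# Paired by label: 0 -> 2 + 10, 1 -> 1 + 20
print((a + b).to_dict())  # {0: 12, 1: 21}

# Replace a's labels with 0, 1 so values are paired by position instead
print((a.reset_index(drop=True) + b).tolist())  # [11, 22]
```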

```python
# Place the two Series side by side as columns of a DataFrame
concat_series = pd.concat([ser1, ser2], axis=1)
concat_series
```

```
          0     1
Blue    1.0  10.0
Red     2.0  20.0
Orange  3.0  30.0
Green   4.0   NaN
Purple  NaN  40.0
```

```python
type(concat_series)
```

```
pandas.core.frame.DataFrame
```
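The default column names `0` and `1` come from the unnamed Series. A sketch of two `pd.concat` options that often help here: `keys=` names the resulting columns, and `join='inner'` keeps only labels present in every Series, so no NaN is introduced (the key names below are arbitrary):

```python
import pandas as pd

ser1 = pd.Series([1, 2, 3, 4], ['Blue', 'Red', 'Orange', 'Green'])
ser2 = pd.Series([10, 20, 30, 40], ['Blue', 'Red', 'Orange', 'Purple'])

# keys= supplies column names instead of the default 0 and 1
named = pd.concat([ser1, ser2], axis=1, keys=['ser1', 'ser2'])

# join='inner' keeps only the labels shared by both Series
inner = pd.concat([ser1, ser2], axis=1, keys=['ser1', 'ser2'], join='inner')

print(named.columns.tolist())  # ['ser1', 'ser2']
print(inner)                   # rows Blue, Red, Orange only
```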