* Pandas - a high-performance, open-source library for data analysis
* Processes datasets in different formats - time series, tabular data, matrix data
* Imports data from CSV, JSON, and databases
* Provides extensive operations such as slicing, subsetting, merging, groupby, and reshaping
* Handles missing data
* Supports statistical analysis
* Pandas objects are consumed by scikit-learn and TensorFlow
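
As a small illustration of the missing-data and statistics points above, here is a minimal sketch. The `readings` Series and its values are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical readings with one missing value (NaN).
readings = pd.Series([10.0, np.nan, 30.0], index=['x', 'y', 'z'])

missing_count = int(readings.isna().sum())  # number of missing entries
filled = readings.fillna(0.0)               # replace NaN with a default value
mean_skipping_nan = readings.mean()         # NaN is skipped by default

print(missing_count)
print(filled.tolist())
print(mean_skipping_nan)
```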

In [2]:
import pandas as pd

#### Two Data Structures
* Series - 1D labeled array, backed by a NumPy array, with an index
* DataFrame - Tabular data with heterogeneous columns

#### creation of series

In [3]:
ser1=pd.Series(data=[1,2,3,4,5], index=['A','B','C','D','E'])

In [4]:
ser1

A    1
B    2
C    3
D    4
E    5
dtype: int64

In [6]:
ser1.index

Index(['A', 'B', 'C', 'D', 'E'], dtype='object')

In [7]:
ser1.values

array([1, 2, 3, 4, 5], dtype=int64)

#### accessing elements from series object

In [15]:
ser = pd.Series(data=[5,6,6,7], index=['a','b','c','d'])

#### Access by index values
* The stop label is inclusive

In [16]:
ser[:'c']

a    5
b    6
c    6
dtype: int64

#### series object with default index

In [10]:
ser2 = pd.Series([1,2,3])

In [11]:
ser2

0    1
1    2
2    3
dtype: int64

#### convert dictionary to series

In [12]:
db = {'abc':'hello','def':'yello','jkl':'good'}

In [13]:
pd.Series(db)

abc    hello
def    yello
jkl     good
dtype: object

#### Convert scalar value to series

In [14]:
pd.Series(0, index=['a','b','c'])

a    0
b    0
c    0
dtype: int64

#### Accessing series

In [15]:
ser1 = pd.Series(data=[5,6,6,7], index=['a','b','c','d'])

In [16]:
#Access by index values
#Value is inclusive
ser1[:'c']

a    5
b    6
c    6
dtype: int64

In [17]:
#Access by position
#The stop position is exclusive
ser1[:2]

a    5
b    6
dtype: int64

In [18]:
ser1['b':'d']

b    6
c    6
d    7
dtype: int64
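
The difference between label-based and position-based slicing can be made explicit with `.loc` and `.iloc`. This is a minimal sketch that uses the same values as the cells above.

```python
import pandas as pd

ser = pd.Series([5, 6, 6, 7], index=['a', 'b', 'c', 'd'])

by_label = ser.loc[:'c']     # label slice: the stop label 'c' is included
by_position = ser.iloc[:2]   # positional slice: the stop position 2 is excluded

print(by_label.index.tolist())
print(by_position.index.tolist())
```

Using `.loc` and `.iloc` avoids ambiguity about whether a plain `[]` slice is interpreted by label or by position.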

In [19]:
#Combine two series (Series.append was removed in pandas 2.0; pd.concat gives the same result)
pd.concat([ser1, ser2])

a    5
b    6
c    6
d    7
0    1
1    2
2    3
dtype: int64
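
The combined Series keeps the labels of both inputs, so string and integer labels end up mixed in one index. The sketch below uses `pd.concat`, which also works on pandas versions where `Series.append` is unavailable, and shows `ignore_index=True` as one way to get a fresh 0..n-1 index. The names `left` and `right` are chosen for the example.

```python
import pandas as pd

left = pd.Series([5, 6, 6, 7], index=['a', 'b', 'c', 'd'])
right = pd.Series([1, 2, 3])

combined = pd.concat([left, right])                        # keeps original labels
renumbered = pd.concat([left, right], ignore_index=True)   # new integer index

print(combined.index.tolist())
print(renumbered.index.tolist())
```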

#### converting series to dictionary

In [20]:
ser1.to_dict()

{'a': 5, 'b': 6, 'c': 6, 'd': 7}

#### DataFrames
* Analogous to a spreadsheet
* Collection of Series that share one index
* Mutable - contents can be changed
* Heterogeneous - different columns can hold different types of data
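
The properties listed above can be checked on a small example. This is a minimal sketch; the `people` table and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical data: each column holds a different type of value.
people = pd.DataFrame({
    'name': ['Ann', 'Bob'],
    'age': [31, 45],
    'score': [88.5, 92.0],
})

# Each column is itself a Series.
print(type(people['age']).__name__)

# Mutable: a single cell can be changed in place.
people.loc[0, 'age'] = 32
print(people['age'].tolist())

# Heterogeneous: every column carries its own dtype.
print(people.dtypes)
```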

#### Create dataframe from multiple series

In [21]:
ser1 = pd.Series([100,200,300,400], index=['a','b','c','d'])

In [22]:
ser2 = pd.Series([222,333,444,555,666], index=['a','c','d','b','e'])

In [23]:
df = pd.DataFrame({
    's1':ser1,
    's2':ser2
})

In [24]:
df

      s1   s2
a  100.0  222
b  200.0  555
c  300.0  333
d  400.0  444
e    NaN  666
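
The output above reflects two behaviours: rows are aligned by index label rather than by position, and a label missing from one Series produces `NaN`, which turns that column into floats. A minimal sketch that checks both, reusing the same two Series:

```python
import pandas as pd

ser1 = pd.Series([100, 200, 300, 400], index=['a', 'b', 'c', 'd'])
ser2 = pd.Series([222, 333, 444, 555, 666], index=['a', 'c', 'd', 'b', 'e'])

df = pd.DataFrame({'s1': ser1, 's2': ser2})

# Rows are matched by label: 'b' pairs 200 (from ser1) with 555 (from ser2).
print(df.loc['b', 's1'], df.loc['b', 's2'])

# 'e' is absent from ser1, so s1 is NaN there and the column becomes float64.
print(df['s1'].isna().tolist())
print(df.dtypes)
```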


In [25]:
ser3 = pd.Series(data=[1,2,3,4,5], index=['a','b','b','c','d'])

#### Access column

In [26]:
df['s1']

a    100.0
b    200.0
c    300.0
d    400.0
e      NaN
Name: s1, dtype: float64

#### Accessing with double bracket returns a dataframe

In [28]:
df[['s1','s2']]

      s1   s2
a  100.0  222
b  200.0  555
c  300.0  333
d  400.0  444
e    NaN  666
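
The return types of single and double brackets can be compared directly. A minimal sketch on a shortened, two-row version of the frame:

```python
import pandas as pd

df = pd.DataFrame({'s1': [100.0, 200.0], 's2': [222, 555]}, index=['a', 'b'])

col = df['s1']     # single brackets: a Series
sub = df[['s1']]   # double brackets: a DataFrame, even with one column

print(type(col).__name__, type(sub).__name__)
print(sub.shape)
```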


#### Delete a column

In [29]:
del df['s1']

In [30]:
df

    s2
a  222
b  555
c  333
d  444
e  666
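
`del` modifies the DataFrame in place. Two alternatives, sketched here on a shortened frame, are `drop`, which returns a new DataFrame and leaves the original untouched, and `pop`, which removes a column in place and returns it:

```python
import pandas as pd

df = pd.DataFrame({'s1': [100, 200], 's2': [222, 555]}, index=['a', 'b'])

trimmed = df.drop(columns=['s1'])  # new DataFrame without s1; df is unchanged
removed = df.pop('s2')             # removes s2 from df and returns it as a Series

print(trimmed.columns.tolist())
print(df.columns.tolist())
print(removed.tolist())
```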


#### Add a new column

In [35]:
df['s3'] = df.s2 + 100

In [36]:
df

    s2   s3  (s3, s4)
a  222  322       322
b  555  655       655
c  333  433       433
d  444  544       544
e  666  766       766


In [37]:
s4 = pd.Series('hello', index=df.index)

In [38]:
s4

a    hello
b    hello
c    hello
d    hello
e    hello
dtype: object

In [39]:
df['s4'] = s4

In [44]:
#Remove the stray column labelled with the tuple ('s3', 's4') shown in the output of In [36]
del df[('s3','s4')]

In [45]:
df

    s2   s3     s4
a  222  322  hello
b  555  655  hello
c  333  433  hello
d  444  544  hello
e  666  766  hello
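
Assigning with `df['name'] = ...` changes the DataFrame in place. As an alternative, `assign` returns a new DataFrame with the extra columns and leaves the original as it was. A minimal sketch on a shortened frame:

```python
import pandas as pd

df = pd.DataFrame({'s2': [222, 555]}, index=['a', 'b'])

# A derived column and a scalar broadcast to every row, added in one call.
extended = df.assign(s3=df['s2'] + 100, s4='hello')

print(extended.columns.tolist())
print(df.columns.tolist())
```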


#### Create dataframe from numpy

In [46]:
import numpy as np

In [None]:
pd.DataFrame(np.)
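
One possible way to build a DataFrame from a NumPy array is sketched below. It assumes a 2D array; the array contents and the row and column labels are hypothetical and are not taken from the cell above.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x2 array holding the values 0..5.
arr = np.arange(6).reshape(3, 2)

# Rows of the array become rows of the DataFrame; labels are optional.
df_np = pd.DataFrame(arr, columns=['x', 'y'], index=['r1', 'r2', 'r3'])

print(df_np)
```

If `columns` and `index` are omitted, pandas falls back to default integer labels.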