# Getting Started with Pandas

Pandas is a powerful library for working with tabular data and is used extensively in data analysis. <br>
Let's <code>import</code> the Pandas library.

In [42]:
import pandas as pd 
from pandas import Series , DataFrame

### Series

A Series is a one-dimensional, array-like object. We create one with the syntax <code>pd.Series(<*object*>)</code>. A Series stores its values together with their indexes; by default the indexes are 0, 1, 2, 3 and so on. <br>
Let's see an example.

In [43]:
List=["ram","shyam","kanha"]
obj=pd.Series(List)
obj

0      ram
1    shyam
2    kanha
dtype: object

Let's access the indexes and the values separately. <br>
To get the indexes we use <code>obj.index</code>, and to get the values we use <code>obj.values</code>.

In [44]:
print(obj.values)
print(obj.index)

['ram' 'shyam' 'kanha']
RangeIndex(start=0, stop=3, step=1)


We can assign indexes of our choice by passing the labels to the <code>index</code> argument.

In [45]:
obj2=pd.Series(["ram","shyam","kanha"],index=["first friend","second friend","third friend"])
print(obj2)

first friend       ram
second friend    shyam
third friend     kanha
dtype: object


We can access a value by its index label, and assigning to a label that does not exist yet adds a new entry.

In [47]:
#Accessing the value 
obj2["third friend"]
obj2["forth friend"]="kanhaiya"
obj2

first friend          ram
second friend       shyam
third friend        kanha
forth friend     kanhaiya
dtype: object

Let us define a series **Num** and look at **boolean indexing**.

In [48]:
Num=pd.Series([2,4,6,-4,-2,-8,8,5],index=["a","b","c","d","e","f","g","h"])
Num

a    2
b    4
c    6
d   -4
e   -2
f   -8
g    8
h    5
dtype: int64

In [49]:
Num[Num>0]

a    2
b    4
c    6
g    8
h    5
dtype: int64

Let's replace the values that are less than 0 with 0.

In [50]:
Num[Num<0]=0
Num

a    2
b    4
c    6
d    0
e    0
f    0
g    8
h    5
dtype: int64

Let's make it more interesting by applying arithmetic operations to the series, both with scalars and with the series itself.

In [51]:
Art=pd.Series([23,45,53,67,23,41])
print(Art+5)
print(Art*Art)
print(Art**2)
print(1/Art)

0    28
1    50
2    58
3    72
4    28
5    46
dtype: int64
0     529
1    2025
2    2809
3    4489
4     529
5    1681
dtype: int64
0     529
1    2025
2    2809
3    4489
4     529
5    1681
dtype: int64
0    0.043478
1    0.022222
2    0.018868
3    0.014925
4    0.043478
5    0.024390
dtype: float64


A Series can also be thought of as a fixed-length, ordered dictionary that maps **indexes** to **values**.

NumPy mathematical and boolean functions and operations can be applied to a Series.
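As a minimal sketch of the point above, NumPy functions such as `np.abs` and `np.sqrt` can be applied directly to a Series, and the index labels are preserved. The Series `nums` and its values are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical Series, used only for this illustration
nums = pd.Series([4, 9, -16, 25], index=["a", "b", "c", "d"])

# NumPy functions work element-wise and keep the index labels
abs_nums = np.abs(nums)
roots = np.sqrt(abs_nums)

# Boolean operations also return a Series with the same index
is_positive = nums > 0

print(roots)
print(is_positive)
```

Because the results are themselves Series, they can be used for boolean indexing, for example `nums[is_positive]`.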

## Dictionary to a series

As we saw earlier, a series can be thought of as a fixed-length, ordered dictionary that maps indexes to values. <br> Let's create a series from a dictionary.

In [52]:
data={"first friend":"Ram","second friend":"shyam","third friend":"rakesh"}
sdata=pd.Series(data)
sdata

first friend        Ram
second friend     shyam
third friend     rakesh
dtype: object

We can pass the index labels in the order we want. A label that has no entry in the dictionary gets the value <code>NaN</code>.

In [53]:
sdata=pd.Series(data,index=["second friend","first friend", "third friend"])
sdata

second friend     shyam
first friend        Ram
third friend     rakesh
dtype: object

In [54]:
sdata=pd.Series(data,index=["first friend","second friend","third friend","forth friend"])
sdata

first friend        Ram
second friend     shyam
third friend     rakesh
forth friend        NaN
dtype: object

In [55]:
sdata["forth friend"]="Rajesh"

In [56]:
sdata

first friend        Ram
second friend     shyam
third friend     rakesh
forth friend     Rajesh
dtype: object

As we saw, some values can be <code>NaN</code>. During data analysis it is often important to know which values are missing. To find them we can use <code>pd.isnull(<*series*>)</code> or <code>pd.notnull(<*series*>)</code>.
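A short sketch of these two functions follows. The dictionary `friends` and the label `"third friend"` are hypothetical and chosen only so that one entry is missing.

```python
import pandas as pd

# Hypothetical data: "third friend" has no entry in the dictionary
friends = {"first friend": "Ram", "second friend": "shyam"}
s = pd.Series(friends, index=["first friend", "second friend", "third friend"])

missing = pd.isnull(s)    # True where the value is NaN
present = pd.notnull(s)   # True where a value exists

print(missing)
print(s[present])         # keep only the entries that have a value
```

The same checks are also available as methods, `s.isnull()` and `s.notnull()`.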

### Name Attribute

In [57]:
sdata.name="Friendship"
sdata.index.name="Serial"

In [58]:
sdata

Serial
first friend        Ram
second friend     shyam
third friend     rakesh
forth friend     Rajesh
Name: Friendship, dtype: object

## DataFrame

A DataFrame is a rectangular table of data that has indexes for both its rows and its columns. <br>
Let's see what a DataFrame looks like. It can be viewed as a dictionary in which each key maps to a value that is itself a Series.

In [59]:
data_dict={"lovinish":[121,"soni","delhi"],"asmit":[123,"tripathi","lucknow"],"subhrajyoti":[134,"adhikari","bengal"]}
df=pd.DataFrame(data_dict)
df

  lovinish     asmit subhrajyoti
0      121       123         134
1     soni  tripathi    adhikari
2    delhi   lucknow      bengal


<code><*DataFrame*>.head()</code> - This displays the first 5 rows of the DataFrame. <br>
 We can specify the **column** names ourselves using the syntax <code>pd.DataFrame(data,columns=<*list of column names*>)</code>. <br>
 We can also specify the **index** with the syntax <code>pd.DataFrame(data,columns=<*list of column names*>,index=<*list of index names*>)</code>.

* <code><*dataframe*>.columns</code> - This gives the names of the columns of the DataFrame.
* <code><*dataframe*>[<*column name*>]</code> - Assigning to a column name that does not exist yet creates a new column in the DataFrame.
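The calls listed above can be sketched together as follows. The dictionary `data`, its values and the row labels are hypothetical. When the data is a dictionary, `columns=` selects and orders its keys.

```python
import pandas as pd

# Hypothetical records, used only for this illustration
data = {
    "city": ["delhi", "lucknow", "kolkata", "pune", "jaipur", "kochi"],
    "rno": [121, 123, 134, 140, 151, 162],
}

# columns= selects and orders the dictionary keys; index= supplies row labels
frame = pd.DataFrame(data, columns=["rno", "city"], index=list("abcdef"))

print(frame.head())      # first 5 rows only
print(frame.columns)

# Assigning to a new column label creates that column
frame["year"] = 2023
print(frame.head(2))     # head() also accepts the number of rows to show
```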


In [60]:
print(df.columns)
print(df.index)

Index(['lovinish', 'asmit', 'subhrajyoti'], dtype='object')
RangeIndex(start=0, stop=3, step=1)


In [71]:
students=pd.DataFrame(data_dict, index=["RNo","Surname","Location"])
print(students)
print(students.columns)
print(students.index)

         lovinish     asmit subhrajyoti
RNo           121       123         134
Surname      soni  tripathi    adhikari
Location    delhi   lucknow      bengal
Index(['lovinish', 'asmit', 'subhrajyoti'], dtype='object')
Index(['RNo', 'Surname', 'Location'], dtype='object')


In [72]:
#Adding new column
students["Sagar"]=["122","Bisht","Uttarakhand"]
students

         lovinish     asmit subhrajyoti        Sagar
RNo           121       123         134          122
Surname      soni  tripathi    adhikari        Bisht
Location    delhi   lucknow      bengal  Uttarakhand


In [73]:
#Accessing a column
students.Sagar

RNo                 122
Surname           Bisht
Location    Uttarakhand
Name: Sagar, dtype: object

In [74]:
#Accessing a row by .loc
students.loc["Location"]

lovinish             delhi
asmit              lucknow
subhrajyoti         bengal
Sagar          Uttarakhand
Name: Location, dtype: object

In [75]:
#Assigning a scalar value to a row
students.loc["Year of admission"]=2023
students

                  lovinish     asmit subhrajyoti        Sagar
RNo                    121       123         134          122
Surname               soni  tripathi    adhikari        Bisht
Location             delhi   lucknow      bengal  Uttarakhand
Year of admission     2023      2023        2023         2023


In [76]:
#Assigning an array to a new column
import numpy as np
students["entry"]=np.arange(4)
#Unlike a Series, a DataFrame has no name attribute; this only sets a plain Python attribute
students.name="Student"
students.index.name="Details"
students

                  lovinish     asmit subhrajyoti        Sagar entry
Details
RNo                    121       123         134          122     0
Surname               soni  tripathi    adhikari        Bisht     1
Location             delhi   lucknow      bengal  Uttarakhand     2
Year of admission     2023      2023        2023         2023     3


In [80]:
students=students.T
students

Details      RNo   Surname     Location Year of admission
lovinish     121      soni        delhi              2023
asmit        123  tripathi      lucknow              2023
subhrajyoti  134  adhikari       bengal              2023
Sagar        122     Bisht  Uttarakhand              2023
entry          0         1            2                 3


In [82]:
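#Using boolean inputs
#int(students.RNo) raises "TypeError: cannot convert the series to <class 'int'>" because int() needs a single value
#.astype(int) converts element-wise; the result is only displayed, so students is unchanged
students["RNo"].astype(int)>122

lovinish       False
asmit           True
subhrajyoti     True
Sagar          False
entry          False
Name: RNo, dtype: bool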

If we don't want to change the original data, we first need to copy the DataFrame with <code><*DataFrame*>.copy()</code>.
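A minimal sketch of working on a copy is shown below. The frame `original` and the column name `"late roll no"` are illustrative only.

```python
import pandas as pd

# Hypothetical frame, used only for this illustration
original = pd.DataFrame(
    {"RNo": [121, 123, 134]},
    index=["lovinish", "asmit", "subhrajyoti"],
)

# copy() returns an independent DataFrame (a deep copy by default)
working = original.copy()
working["late roll no"] = working["RNo"] > 122

print(working)
print(original.columns)   # the original still has only the RNo column
```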

### Another form of DataFrame - Nested dictionary of dictionaries

In [84]:
#DataFrame of nested dictionary of dictionaries
dStu={"sagar":{"Rno":33,"year":"first","age":18},"lovinish":{"Rno":35,"year":"second","age":19},"asmit":{"Rno":36,"year":"third","age":20}}
Stu=pd.DataFrame(dStu)
Stu

      sagar lovinish  asmit
Rno      33       35     36
year  first   second  third
age      18       19     20


In [88]:
#Specifying an explicit index
Stud=DataFrame(dStu,index=["year","age","Rno","surname"])
Stud

         sagar lovinish  asmit
year     first   second  third
age         18       19     20
Rno         33       35     36
surname    NaN      NaN    NaN


In [90]:
Stude=Stud.T
Stude

            year age Rno surname
sagar      first  18  33     NaN
lovinish  second  19  35     NaN
asmit      third  20  36     NaN


### Dictionary of Series

In [92]:
#Stude is the transposed frame, so its columns are year, age, Rno and surname
pdata={"year":Stude["year"],"age":Stude["age"],"Rno":Stude["Rno"]}
Stude_df=pd.DataFrame(pdata)

## Essential Functionality

### Reindexing 

We can reindex a Series or DataFrame with the syntax <code><*series2*>=<*series*>.reindex([_,_,_,_])</code>.

In [98]:
obj=pd.Series(["ram","shyam","rohit"],index=["a","b","c"])
print(obj)
obj2=obj.reindex(["b","a","c"])
obj2

a      ram
b    shyam
c    rohit
dtype: object


b    shyam
a      ram
c    rohit
dtype: object
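Reindexing can also introduce labels that were not present before. The sketch below reuses the same values as the cell above; the extra label `"d"` and the fill value `"unknown"` are hypothetical.

```python
import pandas as pd

obj = pd.Series(["ram", "shyam", "rohit"], index=["a", "b", "c"])

# "d" is not in the original index, so its value becomes NaN
with_gap = obj.reindex(["a", "b", "c", "d"])

# fill_value supplies a default for labels that were missing
filled = obj.reindex(["a", "b", "c", "d"], fill_value="unknown")

print(with_gap)
print(filled)
```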

![img](p1.png)

## Dropping entry from an axis

In [104]:
students

Details      RNo   Surname     Location Year of admission
lovinish     121      soni        delhi              2023
asmit        123  tripathi      lucknow              2023
subhrajyoti  134  adhikari       bengal              2023
Sagar        122     Bisht  Uttarakhand              2023
entry          0         1            2                 3


In [106]:
#Dropping a row
students.drop("entry")

Details      RNo   Surname     Location Year of admission
lovinish     121      soni        delhi              2023
asmit        123  tripathi      lucknow              2023
subhrajyoti  134  adhikari       bengal              2023
Sagar        122     Bisht  Uttarakhand              2023


In [108]:
#Dropping a column 
students.drop("Year of admission",axis=1)

Details      RNo   Surname     Location
lovinish     121      soni        delhi
asmit        123  tripathi      lucknow
subhrajyoti  134  adhikari       bengal
Sagar        122     Bisht  Uttarakhand
entry          0         1            2


<code>drop</code> returns a new object and leaves the original unchanged. If you want to modify the original itself, pass the keyword argument <code>inplace=True</code>.
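The difference between the two modes can be sketched as follows. The small frame is hypothetical and only mirrors the column names used above.

```python
import pandas as pd

# Hypothetical frame, used only for this illustration
frame = pd.DataFrame(
    {"RNo": [121, 123], "Year of admission": [2023, 2023]},
    index=["lovinish", "asmit"],
)

# Without inplace, drop() returns a new DataFrame and leaves frame untouched
trimmed = frame.drop("Year of admission", axis=1)
print(frame.columns)

# With inplace=True, frame itself is modified and drop() returns None
result = frame.drop("asmit", inplace=True)
print(frame)
```

Reassigning the result, as in `frame = frame.drop(...)`, is a common alternative to `inplace=True`.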

## Indexing, Selection and Filtering

Indexing in pandas works much like indexing in NumPy, except that we can also use index labels instead of only integer positions.

### .loc and .iloc

In [114]:
#Accessing the value by .loc
students.loc["asmit","Location"]

'lucknow'

In [115]:
#Accessing the value by .iloc
students.iloc[1,2]

'lucknow'

### Indexing functions with slices

In [118]:
students.loc[:'subhrajyoti',:"Location"]

Details      RNo   Surname Location
lovinish     121      soni    delhi
asmit        123  tripathi  lucknow
subhrajyoti  134  adhikari   bengal
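One detail worth noting in the slice above is that label slices with `.loc` include the end label, whereas position slices with `.iloc` exclude the end position, as in NumPy. A small sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical frame, used only for this illustration
frame = pd.DataFrame(
    {"RNo": [121, 123, 134], "Surname": ["soni", "tripathi", "adhikari"]},
    index=["lovinish", "asmit", "subhrajyoti"],
)

# Label slice: the end label "asmit" is included
by_label = frame.loc[:"asmit", :"RNo"]

# Position slice: the end position 1 is excluded
by_position = frame.iloc[:1, :1]

print(by_label)
print(by_position)
```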


## Arithmetic and data alignment

In [120]:
df1=pd.DataFrame(np.arange(9).reshape((3,3)),columns=list("bcd"),index=["p","q","r"])
df2=pd.DataFrame(np.arange(9).reshape((3,3)),columns=list("ecd"),index=["s","q","r"])
print(df1)
print(df2)
print(df1+df2)

   b  c  d
p  0  1  2
q  3  4  5
r  6  7  8
   e  c  d
s  0  1  2
q  3  4  5
r  6  7  8
    b     c     d   e
p NaN   NaN   NaN NaN
q NaN   8.0  10.0 NaN
r NaN  14.0  16.0 NaN
s NaN   NaN   NaN NaN


In [121]:
df1.add(df2,fill_value=0)

     b     c     d    e
p  0.0   1.0   2.0  NaN
q  3.0   8.0  10.0  3.0
r  6.0  14.0  16.0  6.0
s  NaN   1.0   2.0  0.0
