# Pandas Library in Python for Data Science

Pandas is a Python library for working with data sets. It provides functions for analyzing, cleaning, exploring, and manipulating data.

In [11]:
# Import pandas under the conventional alias pd,
# so that we do not have to type "pandas" in full every time.
import pandas as pd

### To create a DataFrame from a dictionary:
1. Create a dictionary whose keys are the column names and whose values are lists holding the row entries of each column.
2. Pass it to _pd.DataFrame(dictionary name)_

In [187]:
# Note: the ages are written in quotes, so they are stored as strings, not numbers
data = {'name': ['nick', 'david', 'joe', 'ross'], 'age': ['5', '10', '5', '6']}
data = pd.DataFrame(data)
data

Unnamed: 0,name,age
0,nick,5
1,david,10
2,joe,5
3,ross,6


* To extract any column to work with:
**Dataframe name[["column name"]]**

In [59]:
x=data[["age"]]
x

Unnamed: 0,age
0,5
1,10
2,5
3,6
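The number of brackets matters here. As a small sketch that rebuilds the same `data` frame (the variable names `as_frame` and `as_series` are only illustrative), double brackets return a one-column DataFrame, while single brackets return a Series:

```python
import pandas as pd

data = pd.DataFrame({"name": ["nick", "david", "joe", "ross"],
                     "age": ["5", "10", "5", "6"]})

as_frame = data[["age"]]   # list of column names -> DataFrame with one column
as_series = data["age"]    # single column name   -> Series

print(type(as_frame).__name__, as_frame.shape)    # DataFrame (4, 1)
print(type(as_series).__name__, as_series.shape)  # Series (4,)
```

The DataFrame form keeps the column heading in the output, which is why the table above still shows `age` as a header.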


* To extract multiple columns: **Dataframe name[["C1","C2","C3",.......,"Cn"]]**

In [61]:
y=data[["age","name"]]
y


Unnamed: 0,age,name
0,5,nick
1,10,david
2,5,joe
3,6,ross


* To extract element using its row and column index : **Dataframe name.iloc[row index,column index]**

In [63]:
data.iloc[0,1]

'5'

In [65]:
data.iloc[2]

name    joe
age       5
Name: 2, dtype: object

In [67]:
data.iloc[1,0]

'david'

In [69]:
# Slice the DataFrame by position: rows 0 to 2 and columns 0 to 1 (the end of each range is excluded).
data.iloc[0:3,0:2]

Unnamed: 0,name,age
0,nick,5
1,david,10
2,joe,5


* To change a DataFrame's index: **Dataframe name.index=[list of new index labels]**

In [81]:
data_new = data.copy()  # copy, so that re-labelling data_new does not also change data
data_new.index=['a','b','c','d']
data_new

Unnamed: 0,name,age
a,nick,5
b,david,10
c,joe,5
d,ross,6
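One thing to keep in mind when creating a "new" DataFrame from an existing one: plain assignment only gives the same object a second name, whereas `.copy()` creates an independent DataFrame. A minimal sketch (the names `alias` and `independent` are made up for illustration):

```python
import pandas as pd

data = pd.DataFrame({"name": ["nick", "david", "joe", "ross"],
                     "age": ["5", "10", "5", "6"]})

alias = data                         # second name for the SAME DataFrame
alias.index = ["a", "b", "c", "d"]
print(list(data.index))              # ['a', 'b', 'c', 'd'] -> data changed as well

independent = data.copy()            # separate DataFrame
independent.index = [9, 10, 12, 13]
print(list(data.index))              # still ['a', 'b', 'c', 'd']
```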


In [83]:
data.index=[9,10,12,13]
data

Unnamed: 0,name,age
9,nick,5
10,david,10
12,joe,5
13,ross,6


* To make any column your index: **Dataframe name.set_index("column name")**

In [156]:
data.set_index('age')

age,name
5,nick
10,david
5,joe
6,ross


**Q: How can we make this change apply to our original data set?**

**->By passing the inplace=True argument, the original DataFrame is modified directly instead of a new one being returned.**

In [166]:
data.set_index("age",inplace=True)
data

age,name
5,nick
10,david
5,joe
6,ross


* To undo the index change and go back to the default integer index: **Dataframe name.reset_index(inplace=True)**

In [169]:
data.reset_index(inplace=True)
data
# 'age' is a regular column again (now placed first) and the default 0, 1, 2, ... index is back

Unnamed: 0,age,name
0,5,nick
1,10,david
2,5,joe
3,6,ross
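As an alternative to `inplace=True`, the result of `set_index` or `reset_index` can simply be assigned to a variable. The sketch below rebuilds the same small frame and walks through the round trip; `indexed` and `restored` are illustrative names, not part of the notebook above.

```python
import pandas as pd

data = pd.DataFrame({"name": ["nick", "david", "joe", "ross"],
                     "age": ["5", "10", "5", "6"]})

indexed = data.set_index("age")     # returns a new DataFrame
print(list(data.columns))           # ['name', 'age'] -> original is unchanged
print(list(indexed.columns))        # ['name']
print(indexed.index.name)           # age

restored = indexed.reset_index()    # 'age' becomes a column again, placed first
print(list(restored.columns))       # ['age', 'name']
print(list(restored.index))         # [0, 1, 2, 3]
```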


* To extract unique entries from a column : **Dataframe name['Column name'].unique()**

In [85]:
u=data['age'].unique()
u

array(['5', '10', '6'], dtype=object)

* To check a condition on a column: **Dataframe name["column name"] comparison operator value**

In [93]:
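# age holds strings here, so convert it to integers before comparing with a number
t1 = data["age"].astype(int) >= 5
t1
# the result is a boolean Series: True/False for each row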

9     True
10    True
12    True
13    True
Name: age, dtype: bool

* To find the maximum value in a column: **Dataframe name.column name.max()**

In [58]:
# age holds strings, so the comparison is alphabetical: '6' sorts after '5' and '10'
data.age.max()

'6'

* To find the minimum value in a column: **Dataframe name.column name.min()**

In [61]:
# alphabetical comparison again: '10' sorts first because it starts with '1'
data.age.min()

'10'
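The results `'6'` and `'10'` look odd because the ages are strings, which are compared character by character. If numeric answers are wanted, one option is to convert the column first with `pd.to_numeric`. A minimal sketch, using a stand-alone Series with the same values (the names `ages_text` and `ages_num` are illustrative):

```python
import pandas as pd

# The same age values as above, stored as text
ages_text = pd.Series(["5", "10", "5", "6"], dtype=object)
print(ages_text.max(), ages_text.min())   # 6 10  (alphabetical order)

# Convert the text to numbers, then ask again
ages_num = pd.to_numeric(ages_text)
print(ages_num.max(), ages_num.min())     # 10 5  (numeric order)
```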

* To find the mean of a column: **Dataframe name.column name.mean()**

In [90]:
# Let's try to find the mean and standard deviation of a data set

In [92]:
dict1 = {'Driver': ['Hamilton', 'Vettel', 'Raikkonen',
                    'Verstappen', 'Bottas', 'Ricciardo'],
        'Points': [408, 320, 251, 249, 247, 170]}
dict1=pd.DataFrame(dict1)
dict1


Unnamed: 0,Driver,Points
0,Hamilton,408
1,Vettel,320
2,Raikkonen,251
3,Verstappen,249
4,Bottas,247
5,Ricciardo,170


* To find the mean of a column: **Dataframe name.col name.mean()**

In [97]:
dict1.Points.mean()

274.1666666666667

* To find the standard deviation of a column: **Dataframe name.col name.std()**

In [103]:
dict1.Points.std()

80.95780794133859
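To see where this number comes from: by default, pandas' `std()` computes the sample standard deviation, dividing by n - 1 rather than n (`ddof=1`). The sketch below redoes the arithmetic by hand on the same points; the intermediate variable names are only for illustration.

```python
import math
import pandas as pd

points = pd.Series([408, 320, 251, 249, 247, 170])

mean = points.sum() / len(points)              # 1645 / 6
squared_dev = ((points - mean) ** 2).sum()     # sum of squared deviations from the mean

sample_std = math.sqrt(squared_dev / (len(points) - 1))   # divide by n - 1
population_std = math.sqrt(squared_dev / len(points))     # divide by n

print(sample_std)           # same value as points.std()
print(points.std(ddof=0))   # same value as population_std
```

Pass `ddof=0` to `std()` when the data is the whole population rather than a sample.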

* To get the whole row(s) where a certain condition is met: **Dataframe name[condition]**

In [116]:
dict1[dict1.Points==dict1.Points.max()]

Unnamed: 0,Driver,Points
0,Hamilton,408


* To get specific column(s) for the rows where a condition is met: **DataFrame name[['column you want to display']][condition]**

In [122]:
dict1[["Driver"]][dict1.Points==dict1.Points.max()]
# remember the double brackets: [["column name"]]

Unnamed: 0,Driver
0,Hamilton


### Using CSV and Excel Files to Work with data:

To read a CSV file and use its data:
**pd.read_csv("File Path")**

To read an Excel file and use its data:
**pd.read_excel("File Path")**

In [191]:
df=pd.read_excel(r"C:\Users\Rishabh\Downloads\Note.xlsx")
df

Unnamed: 0,Date,Temp,Wind,Rem
0,2025-01-01,20 F,10,Sun
1,2025-02-01,23,9,Rain
2,2025-03-01,-90,,
3,2025-04-01,30,5,Sun
4,2025-05-01,,4 kmph,
5,2025-06-01,45,,Rain


* To skip the first n rows while reading a csv/excel file: **pd.read_excel("file path",skiprows=n)**


In [193]:
df_skip=pd.read_excel(r"C:\Users\Rishabh\Downloads\Note.xlsx",skiprows=2)
df_skip
# the first n lines of the sheet are skipped (including the header row), so the next row is used as the header

Unnamed: 0,2025-02-01 00:00:00,23,9,Rain
0,2025-03-01,-90.0,,
1,2025-04-01,30.0,5,Sun
2,2025-05-01,,4 kmph,
3,2025-06-01,45.0,,Rain


* To read only the first n rows of a data set: **pd.read_excel("file path",nrows=n)**

In [196]:
df_n=pd.read_excel(r"C:\Users\Rishabh\Downloads\Note.xlsx",nrows=2)
df_n
# the first n rows appear in the output

Unnamed: 0,Date,Temp,Wind,Rem
0,2025-01-01,20 F,10,Sun
1,2025-02-01,23,9,Rain
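`pd.read_csv` accepts the same `skiprows` and `nrows` arguments. The sketch below reads from an in-memory string instead of a file path so that it runs anywhere; `csv_text` is a made-up sample that mirrors a few rows of the sheet above. It also shows that passing a list of line numbers to `skiprows` skips data rows while keeping the header.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file
csv_text = """Date,Temp,Wind,Rem
2025-01-01,20 F,10,Sun
2025-02-01,23,9,Rain
2025-03-01,-90,,
2025-04-01,30,5,Sun
"""

full = pd.read_csv(io.StringIO(csv_text))

# skiprows=2 drops the first two lines (header + first data row),
# so the next line is used as the header
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=2)

# skiprows=[1, 2] drops only lines 1 and 2, keeping line 0 as the header
kept_header = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])

# nrows=2 reads only the first two data rows
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)

print(list(skipped.columns))      # ['2025-02-01', '23', '9', 'Rain']
print(list(kept_header.columns))  # ['Date', 'Temp', 'Wind', 'Rem']
print(full.shape, first_two.shape)
```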


* To find the number of rows and columns of a data set: **Dataframe name.shape**

In [120]:
df.shape
# output: (rows, columns)

(6, 4)

* To get summary statistics of the data: **DataFrame name.describe()**

In [240]:
df.describe()

Unnamed: 0,Date
count,6
mean,2025-03-17 04:00:00
min,2025-01-01 00:00:00
25%,2025-02-08 00:00:00
50%,2025-03-16 12:00:00
75%,2025-04-23 12:00:00
max,2025-06-01 00:00:00
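Only `Date` is summarised above, presumably because `Temp` and `Wind` contain a mix of numbers and text and are therefore not treated as numeric columns. By default `describe()` leaves text columns out whenever the frame has numeric ones; passing `include="all"` asks for every column. A small sketch on a made-up frame:

```python
import pandas as pd

df_small = pd.DataFrame({"Driver": ["Hamilton", "Vettel", "Raikkonen"],
                         "Points": [408, 320, 251]})

numeric_only = df_small.describe()              # numeric columns only
everything = df_small.describe(include="all")   # text columns as well

print(list(numeric_only.columns))   # ['Points']
print(list(everything.columns))     # ['Driver', 'Points']
```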


* To view the first few rows (5 by default) of a DataFrame: **Dataframe name.head()**

In [124]:
df.head()

Unnamed: 0,Date,Temp,Wind,Rem
0,2025-01-01,20 F,10,Sun
1,2025-02-01,23,9,Rain
2,2025-03-01,-90,,
3,2025-04-01,30,5,Sun
4,2025-05-01,,4 kmph,


* To view the first n rows of a DataFrame: **Dataframe name.head(n)**

In [130]:
df.head(2)
# indexing starts from 0, so the first 2 rows are the rows with index 0 and 1

Unnamed: 0,Date,Temp,Wind,Rem
0,2025-01-01,20 F,10,Sun
1,2025-02-01,23,9,Rain


* To view the last few rows (5 by default) of a DataFrame: **Dataframe name.tail()**

In [136]:
df.tail()

Unnamed: 0,Date,Temp,Wind,Rem
1,2025-02-01,23.0,9,Rain
2,2025-03-01,-90.0,,
3,2025-04-01,30.0,5,Sun
4,2025-05-01,,4 kmph,
5,2025-06-01,45.0,,Rain


* To view the last n rows of a DataFrame: **Dataframe name.tail(n)**

In [140]:
df.tail(3)

Unnamed: 0,Date,Temp,Wind,Rem
3,2025-04-01,30.0,5,Sun
4,2025-05-01,,4 kmph,
5,2025-06-01,45.0,,Rain


* To extract rows by position: **Dataframe name[starting index:ending index]** (the row at the ending index is not included)

In [143]:
df[0:6]

Unnamed: 0,Date,Temp,Wind,Rem
0,2025-01-01,20 F,10,Sun
1,2025-02-01,23,9,Rain
2,2025-03-01,-90,,
3,2025-04-01,30,5,Sun
4,2025-05-01,,4 kmph,
5,2025-06-01,45,,Rain


In [145]:
df[2:5]

Unnamed: 0,Date,Temp,Wind,Rem
2,2025-03-01,-90.0,,
3,2025-04-01,30.0,5,Sun
4,2025-05-01,,4 kmph,


* To extract the entire DataFrame: **Dataframe name[:] / Dataframe name**

In [148]:
df[:]

Unnamed: 0,Date,Temp,Wind,Rem
0,2025-01-01,20 F,10,Sun
1,2025-02-01,23,9,Rain
2,2025-03-01,-90,,
3,2025-04-01,30,5,Sun
4,2025-05-01,,4 kmph,
5,2025-06-01,45,,Rain


In [152]:
df

Unnamed: 0,Date,Temp,Wind,Rem
0,2025-01-01,20 F,10,Sun
1,2025-02-01,23,9,Rain
2,2025-03-01,-90,,
3,2025-04-01,30,5,Sun
4,2025-05-01,,4 kmph,
5,2025-06-01,45,,Rain


* To list the column headings of a data set: **Dataframe name.columns**

In [155]:
df.columns
# the attribute is "columns", not "column"

Index(['Date', 'Temp', 'Wind', 'Rem'], dtype='object')

* To remove units of measurement from a data set: **Dataframe name.replace({'col name1':'[A-Za-z]','col name2':'[A-Za-z]',...},' ',regex=True)**


In [17]:
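new_df = df.replace({'Temp': '[A-Za-z]', 'Wind': '[A-Za-z]'}, ' ', regex=True)
new_df
# Note: the replacement here is a single space ' ', so each letter becomes a space;
# passing '' instead would delete the letters without leaving spaces behind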

Unnamed: 0,Date,Temp,Wind,Rem
0,2025-01-01,20.0,10.0,Sun
1,2025-02-01,23.0,9.0,Rain
2,2025-03-01,-90.0,,
3,2025-04-01,30.0,5.0,Sun
4,2025-05-01,,4.0,
5,2025-06-01,45.0,,Rain
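After the letters are gone, the cleaned cells are still text, so a further conversion step is needed before doing arithmetic on them. The following is a sketch under assumptions, not the notebook's own method: it rebuilds only the `Temp` and `Wind` columns by hand, uses a slightly different pattern that also removes the space before the unit, and converts with `pd.to_numeric`.

```python
import pandas as pd

# Hand-built copy of the two mixed columns from the sheet above
df_units = pd.DataFrame({"Temp": ["20 F", 23, -90, 30, None, 45],
                         "Wind": [10, 9, None, 5, "4 kmph", None]})

# Optional whitespace followed by letters, replaced with nothing
pattern = r"\s*[A-Za-z]+"
cleaned = df_units.replace({"Temp": pattern, "Wind": pattern}, "", regex=True)

# Turn the remaining text into numbers; anything unparseable becomes NaN
for col in ["Temp", "Wind"]:
    cleaned[col] = pd.to_numeric(cleaned[col], errors="coerce")

print(cleaned)
print(cleaned["Temp"].mean())   # arithmetic now works on the column
```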


**Q: What happens when we don't specify column name?**

In [212]:
# Without column names, the letters are removed from every column, so the text in Rem is wiped out too.
Q= df.replace('[A-Za-z]',' ',regex=True)
Q

Unnamed: 0,Date,Temp,Wind,Rem
0,2025-01-01,20.0,10.0,
1,2025-02-01,23.0,9.0,
2,2025-03-01,-90.0,,
3,2025-04-01,30.0,5.0,
4,2025-05-01,,4.0,
5,2025-06-01,45.0,,


* To replace a list of values with another list of values: **Dataframe name.replace([Value1,Value2,...],[new value1,new value2,....])**

In [223]:
rep=df.replace([10,9,45],[1,90,54])
rep

Unnamed: 0,Date,Temp,Wind,Rem
0,2025-01-01,20 F,1,Sun
1,2025-02-01,23,90,Rain
2,2025-03-01,-90,,
3,2025-04-01,30,5,Sun
4,2025-05-01,,4 kmph,
5,2025-06-01,54,,Rain


* To replace a single value (for example a negative one) with another value: **Dataframe name.replace(value to be replaced, new value)**

In [15]:
df1=df.replace(-90,90)
df1

Unnamed: 0,Date,Temp,Wind,Rem
0,2025-01-01,20 F,10,Sun
1,2025-02-01,23,9,Rain
2,2025-03-01,90,,
3,2025-04-01,30,5,Sun
4,2025-05-01,,4 kmph,
5,2025-06-01,45,,Rain


* To save a data set as a CSV/Excel file: **Dataframe name.to_csv("new file name")** / **Dataframe name.to_excel("new file name")**

* To leave the DataFrame's row index out of the saved file: **Dataframe name.to_excel/to_csv("new file name",index=False)**

* To save only specific columns: **Dataframe name.to_excel/to_csv("new file name",columns=['C1','C2',...,'Cn'])**

* To leave out the header row when saving: **Dataframe name.to_excel/to_csv("new file name",header=False)**
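
The effect of these options is easiest to see on a tiny frame. In the sketch below, `to_csv` is called without a file name, in which case it returns the CSV text as a string instead of writing a file; the two-row frame and the variable names are made up for illustration.

```python
import pandas as pd

small = pd.DataFrame({"name": ["nick", "david"], "age": [5, 10]})

with_index = small.to_csv()                             # index written as first column
no_index = small.to_csv(index=False)                    # index left out
only_name = small.to_csv(index=False, columns=["name"]) # only the chosen column
no_header = small.to_csv(index=False, header=False)     # no header row

print(with_index)
print(no_index)
print(only_name)
print(no_header)
```

Passing a file name as the first argument writes the same content to disk, and `to_excel` accepts the same `index`, `columns` and `header` arguments.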