### **Pandas Introduction**

**What is Pandas?**

- Pandas is a Python library used for working with data sets.

- It has functions for analyzing, cleaning, exploring, and manipulating data.

**Why Use Pandas?**

- Pandas lets us analyze large data sets, as long as they fit in the computer's memory.

- Pandas can clean messy data sets and make them readable and relevant.

- Relevant data is very important in data science.

**What Can Pandas Do?**

Pandas can answer questions about the data, such as:

- Is there a correlation between two or more columns?

- What is the average value?

- What is the maximum value?

- What is the minimum value?

Pandas can also:

- Delete rows that are irrelevant or contain wrong values, such as empty or NULL values. This is called cleaning the data.

- Provide the data structures Series (1D) and DataFrame (2D) for handling tabular data.

It is widely used for data cleaning, transformation, and analysis.
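A minimal sketch of how those questions translate into pandas calls, using a small made-up data set (the column names `hours_studied` and `score` are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Small made-up data set, purely for illustration
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5],
    "score": [52, 60, 71, 79, 88],
})

# Correlation between the two columns (Pearson by default)
corr = df["hours_studied"].corr(df["score"])

# Average, maximum and minimum of every column
averages = df.mean()
maximums = df.max()
minimums = df.min()

print(round(corr, 3))
print(averages)
print(maximums)
print(minimums)
```

Each of these calls returns one result per column, so the same line works whether the frame has two columns or two hundred.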

### **Installation**

In [3]:
!pip install pandas  # "!" runs a shell command from a notebook cell; in a terminal use: pip install pandas



In [8]:
import pandas as pd

In [10]:
s = pd.Series([1, 3, 5, 7, 9])

print(s)

0    1
1    3
2    5
3    7
4    9
dtype: int64
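A Series is a one-dimensional array of values paired with index labels; above, pandas assigned the default labels 0 to 4. The sketch below uses hypothetical subject names as custom labels to show label lookup and vectorized arithmetic:

```python
import pandas as pd

# Default labels are 0, 1, 2, ...
s = pd.Series([1, 3, 5, 7, 9])

# Custom labels can be supplied with the `index` argument (names are illustrative)
scores = pd.Series([85, 92, 78], index=["math", "physics", "chemistry"])

print(scores["physics"])      # label-based lookup
print(scores.values)          # the underlying NumPy array
print(scores.index.tolist())  # the labels as a plain list

# Arithmetic applies to every element at once, no loop needed
doubled = s * 2
print(doubled)
```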


In [13]:
data = {'Name': ['Abhay', 'Parthav', 'Kartik'], 'Age': [20, 30, 40]}

df = pd.DataFrame(data)

df

|   | Name | Age |
|---|---|---|
| 0 | Abhay | 20 |
| 1 | Parthav | 30 |
| 2 | Kartik | 40 |
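Each dictionary key became a column and each list became that column's values. A short sketch of inspecting the frame and deriving a new column (`Age_next_year` is a hypothetical column name used only for illustration):

```python
import pandas as pd

data = {'Name': ['Abhay', 'Parthav', 'Kartik'], 'Age': [20, 30, 40]}
df = pd.DataFrame(data)

print(df.shape)             # (rows, columns)
print(df.columns.tolist())  # column names
print(df.dtypes)            # data type of each column

# Selecting a single column returns a Series
ages = df['Age']

# New columns can be computed from existing ones, element by element
df['Age_next_year'] = df['Age'] + 1
print(df)
```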


In [14]:
df.loc[1]

|   | 1 |
|---|---|
| Name | Parthav |
| Age | 30 |


In [15]:
df.loc[0, 'Name']

'Abhay'
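For reading or writing one cell, pandas also offers `.at` (labels) and `.iat` (integer positions), which accept exactly one row and one column. A minimal sketch on the same small frame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Abhay', 'Parthav', 'Kartik'], 'Age': [20, 30, 40]})

# .loc with one row label and one column label returns a single value
name_loc = df.loc[0, 'Name']

# .at does the same for exactly one label pair; .iat is its position-based twin
name_at = df.at[0, 'Name']
age_iat = df.iat[2, 1]  # third row, second column

# .at can also set a single value
df.at[1, 'Age'] = 31
print(df)
```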

In [18]:
df.loc[[0, 1], ['Name', 'Age']]

|   | Name | Age |
|---|---|---|
| 0 | Abhay | 20 |
| 1 | Parthav | 30 |


In [20]:
df.loc[df['Age'] > 21]

|   | Name | Age |
|---|---|---|
| 1 | Parthav | 30 |
| 2 | Kartik | 40 |
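Conditions can be combined. A sketch, assuming the same `Name`/`Age` frame, of combining boolean masks with `&`, selecting a column in the same `.loc` call, and writing the equivalent filter as a `.query()` string:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Abhay', 'Parthav', 'Kartik'], 'Age': [20, 30, 40]})

# Each comparison yields a boolean Series; wrap each one in parentheses and
# combine with & (and), | (or), ~ (not) instead of Python's and/or/not
mask = (df['Age'] > 21) & (df['Age'] < 35)
middle = df.loc[mask]

# Filter rows and pick a column in one .loc call
names_over_21 = df.loc[df['Age'] > 21, 'Name']

# .query() expresses the same filter as a string
same_rows = df.query('Age > 21 and Age < 35')

print(middle)
print(names_over_21)
```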


In [25]:
df.iloc[:, :]

|   | Name | Age |
|---|---|---|
| 0 | Abhay | 20 |
| 1 | Parthav | 30 |
| 2 | Kartik | 40 |
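With the default 0, 1, 2 index, `.loc` and `.iloc` look interchangeable. The sketch below gives the frame hypothetical labels 10, 20, 30 to show where they differ: `.loc` works with labels and includes the end of a slice, while `.iloc` works with positions and excludes it.

```python
import pandas as pd

# Hypothetical non-default labels make the difference visible
df = pd.DataFrame(
    {'Name': ['Abhay', 'Parthav', 'Kartik'], 'Age': [20, 30, 40]},
    index=[10, 20, 30],
)

by_label = df.loc[10:20]    # labels 10 through 20, end label INCLUDED
by_position = df.iloc[0:2]  # positions 0 and 1, end position EXCLUDED

# First row, last column, by position
last_col_first_row = df.iloc[0, -1]

# Label 0 does not exist here, so .loc raises KeyError
try:
    df.loc[0]
except KeyError:
    print("no row labelled 0")
```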


**Using the Iris Dataset**

In [26]:
import seaborn as sns

df = sns.load_dataset('iris')  # downloads the data set on first use, so an internet connection is needed

In [27]:
df.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |


In [28]:
df.tail()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
| 149 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |


In [29]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


In [30]:
df.describe()

|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.057333 | 3.758 | 1.199333 |
| std | 0.828066 | 0.435866 | 1.765298 | 0.762238 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |


In [32]:
df.isnull().sum()

|   | 0 |
|---|---|
| sepal_length | 0 |
| sepal_width | 0 |
| petal_length | 0 |
| petal_width | 0 |
| species | 0 |
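On the full Iris data every count is zero. A sketch on a small hypothetical frame with gaps, showing that `np.nan` and `None` both count as missing, plus two related summaries:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with a few missing values
df = pd.DataFrame({
    'a': [1.0, np.nan, 3.0, 4.0],
    'b': [None, 'x', 'y', None],
})

missing_per_column = df.isnull().sum()     # isna() is an alias of isnull()
rows_with_missing = df.isna().any(axis=1)  # True for rows with at least one missing value
missing_share = df.isna().mean()           # fraction of missing values per column

print(missing_per_column)
print(rows_with_missing)
print(missing_share)
```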


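In [39]:
df.loc[[2, 3], 'sepal_length'] = None  # rows 2 and 3 become NaN, to practise cleaning

In [34]:
df.loc[2, 'sepal_length']

nan

In [35]:
df

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | NaN | 3.2 | 1.3 | 0.2 | setosa |
| 3 | NaN | 3.1 | 1.5 | 0.2 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| ... | ... | ... | ... | ... | ... |
| 145 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
| 146 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
| 147 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
| 148 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |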


In [40]:
df.loc[[2, 3], 'petal_width'] = None  # rows 2 and 3 become NaN in this column too

In [37]:
df.loc[2, 'petal_width']

nan

In [41]:
df.isnull().sum()

|   | 0 |
|---|---|
| sepal_length | 2 |
| sepal_width | 0 |
| petal_length | 0 |
| petal_width | 2 |
| species | 0 |


In [42]:
df_cleaned = df.dropna()

In [43]:
df_cleaned.info()

<class 'pandas.core.frame.DataFrame'>
Index: 148 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  148 non-null    float64
 1   sepal_width   148 non-null    float64
 2   petal_length  148 non-null    float64
 3   petal_width   148 non-null    float64
 4   species       148 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.9+ KB


In [44]:
df_cleaned.isnull().sum()

|   | 0 |
|---|---|
| sepal_length | 0 |
| sepal_width | 0 |
| petal_length | 0 |
| petal_width | 0 |
| species | 0 |
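`dropna()` drops every row containing at least one NaN, which is why the cleaned index above has gaps (148 entries labelled 0 to 149). A sketch of its main options on a small hypothetical frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: row 0 complete, rows 1 and 3 partly missing, row 2 fully missing
df = pd.DataFrame({
    'sepal_length': [5.1, np.nan, np.nan, 4.6],
    'petal_width': [0.2, 0.2, np.nan, np.nan],
})

complete_rows = df.dropna()                           # drop rows with ANY missing value
not_all_missing = df.dropna(how='all')                # drop rows where EVERY value is missing
petal_known = df.dropna(subset=['petal_width'])       # only check the listed columns
renumbered = not_all_missing.reset_index(drop=True)   # relabel rows 0..n-1 afterwards

# dropna() returns a new frame; df itself is unchanged
print(len(df))
```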


In [46]:
df.isnull().sum()

|   | 0 |
|---|---|
| sepal_length | 2 |
| sepal_width | 0 |
| petal_length | 0 |
| petal_width | 2 |
| species | 0 |


In [47]:
df_filled = df.drop(columns='species')  # keep only the numeric columns so that .mean() works

In [48]:
df_filled.head()

|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 |
| 2 | NaN | 3.2 | 1.3 | NaN |
| 3 | NaN | 3.1 | 1.5 | NaN |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 |


In [49]:
df_filled.isnull().sum()

|   | 0 |
|---|---|
| sepal_length | 2 |
| sepal_width | 0 |
| petal_length | 0 |
| petal_width | 2 |


In [52]:
df_filled.fillna(df_filled.mean(), inplace=True)  # replace each NaN with the mean of its column

In [53]:
df_filled

|   | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| 0 | 5.100000 | 3.5 | 1.4 | 0.200000 |
| 1 | 4.900000 | 3.0 | 1.4 | 0.200000 |
| 2 | 5.859459 | 3.2 | 1.3 | 1.212838 |
| 3 | 5.859459 | 3.1 | 1.5 | 1.212838 |
| 4 | 5.000000 | 3.6 | 1.4 | 0.200000 |
| ... | ... | ... | ... | ... |
| 145 | 6.700000 | 3.0 | 5.2 | 2.300000 |
| 146 | 6.300000 | 2.5 | 5.0 | 1.900000 |
| 147 | 6.500000 | 3.0 | 5.2 | 2.000000 |
| 148 | 6.200000 | 3.4 | 5.4 | 2.300000 |


In [54]:
df_filled.isnull().sum()

|   | 0 |
|---|---|
| sepal_length | 0 |
| sepal_width | 0 |
| petal_length | 0 |
| petal_width | 0 |
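The `species` column is removed before filling because, in pandas 2.x, `.mean()` raises an error on text columns. A sketch of an alternative that selects the numeric columns by dtype rather than by name, and keeps the text column in the result, using a small made-up frame:

```python
import numpy as np
import pandas as pd

# Made-up frame with gaps in the numeric columns and a text column
df = pd.DataFrame({
    'sepal_length': [5.0, np.nan, 6.0],
    'petal_width': [0.2, 0.4, np.nan],
    'species': ['setosa', 'setosa', 'virginica'],
})

# Keep only numeric columns, whatever their names are
numeric = df.select_dtypes(include='number')

# .mean() skips NaN by default; fillna() with a Series fills each column with its own mean
filled = numeric.fillna(numeric.mean())

# Assigning the result back (instead of inplace=True) keeps the 'species' column in df
df[numeric.columns] = filled
print(df)
```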


In [56]:
setosa_df = df[df['species'] == 'setosa']

setosa_df

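|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 2 | NaN | 3.2 | 1.3 | NaN | setosa |
| 3 | NaN | 3.1 | 1.5 | NaN | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 5 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
| 6 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
| 7 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 9 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |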
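Filtering on a text column works the same way as numeric filtering. A sketch on a hypothetical four-row frame, adding `isin()` for matching several categories and `value_counts()` for counting rows per category:

```python
import pandas as pd

# Hypothetical four-row frame
df = pd.DataFrame({
    'species': ['setosa', 'versicolor', 'virginica', 'setosa'],
    'sepal_length': [5.1, 7.0, 6.3, 4.9],
})

setosa_df = df[df['species'] == 'setosa']                      # one category
two_species = df[df['species'].isin(['setosa', 'virginica'])]  # several categories
counts = df['species'].value_counts()                          # rows per category

print(setosa_df)
print(counts)
```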


In [57]:
df.groupby('species').sum()

| species | sepal_length | sepal_width | petal_length | petal_width |
|---|---|---|---|---|
| setosa | 241.0 | 171.4 | 73.1 | 11.9 |
| versicolor | 296.8 | 138.5 | 213.0 | 66.3 |
| virginica | 329.4 | 148.7 | 277.6 | 101.3 |


In [58]:
versicolor = df[df['species'] == 'versicolor']

versicolor.head()

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 50 | 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
| 51 | 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
| 52 | 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
| 53 | 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
| 54 | 6.5 | 2.8 | 4.6 | 1.5 | versicolor |


In [None]:
df.groupby('species')['sepal_length'].mean()
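The cell above computes one statistic per species. A sketch on a small made-up frame of computing several statistics at once with `.agg()`, including named aggregation (the output names `avg_sepal` and `max_petal` are hypothetical):

```python
import pandas as pd

# Made-up frame with two species
df = pd.DataFrame({
    'species': ['setosa', 'setosa', 'versicolor', 'versicolor'],
    'sepal_length': [5.0, 5.2, 6.0, 6.4],
    'petal_width': [0.2, 0.4, 1.3, 1.5],
})

# One statistic for one column (same pattern as the cell above)
mean_sepal = df.groupby('species')['sepal_length'].mean()

# Several statistics at once: one output column per function name
summary = df.groupby('species')['sepal_length'].agg(['mean', 'min', 'max', 'count'])

# Named aggregation: new_column=(source_column, function)
named = df.groupby('species').agg(
    avg_sepal=('sepal_length', 'mean'),
    max_petal=('petal_width', 'max'),
)

print(summary)
print(named)
```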

In [59]:
df.sort_values(by='sepal_length', ascending=True)

|   | sepal_length | sepal_width | petal_length | petal_width | species |
|---|---|---|---|---|---|
| 13 | 4.3 | 3.0 | 1.1 | 0.1 | setosa |
| 42 | 4.4 | 3.2 | 1.3 | 0.2 | setosa |
| 38 | 4.4 | 3.0 | 1.3 | 0.2 | setosa |
| 8 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
| 41 | 4.5 | 2.3 | 1.3 | 0.3 | setosa |
| ... | ... | ... | ... | ... | ... |
| 122 | 7.7 | 2.8 | 6.7 | 2.0 | virginica |
| 135 | 7.7 | 3.0 | 6.1 | 2.3 | virginica |
| 131 | 7.9 | 3.8 | 6.4 | 2.0 | virginica |
| 2 | NaN | 3.2 | 1.3 | NaN | setosa |
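Row 2 appears at the end because its `sepal_length` is NaN, and `sort_values` places missing values last by default (`na_position='last'`). A sketch on a small made-up frame of the other common options:

```python
import numpy as np
import pandas as pd

# Made-up frame with one missing value
df = pd.DataFrame({
    'species': ['setosa', 'virginica', 'setosa', 'virginica'],
    'sepal_length': [5.1, 7.7, np.nan, 6.3],
})

# Default: smallest first, NaN last
ascending = df.sort_values(by='sepal_length')

# Largest first, with missing values moved to the top instead
descending = df.sort_values(by='sepal_length', ascending=False, na_position='first')

# Several keys: species A to Z, then sepal_length largest first within each species
multi = df.sort_values(by=['species', 'sepal_length'], ascending=[True, False])

# Shortcut for the n rows with the largest values (NaN is ignored)
top2 = df.nlargest(2, 'sepal_length')

print(descending)
print(top2)
```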
