# ***Day-10***
![PKC_2023](day_10.jpg)

## **Subject: Pandas in 10 minutes on Iris Dataset**
 **Author: Kaleem Ullah**\
 **Date:** 2023-01-10\
 **Email:** kaleemrao417@outlook.com

## **Contents**
[01- What is iris](#q-what-is-iris)\
[02- What is iris dataset](#q-what-is-iris-dataset)\
[03- My aim in this Notebook](#q-my-aim-in-this-notebook)\
[04- Importing libraries and dataset](#step-1-import-libraries-and-dataset)\
[05- Basic information](#step-2-see-the-basic-section)\
[06- Columns](#step-3-columns-in-data)\
[07- Converting DataFrame to arrays](#step-4-convert-dataframe-into-numpy-arrays)\
[08- Type of data](#step-5-type-of-dataframe)\
[09- Shape of data](#step-6-shape-of-dataframe)\
[10- Info of data](#step-7-info-of-dataframe)\
[11- Stats of data](#step-8-stats-in-dataframe)\
[12- Transpose the data](#step-9-transpose-the-dataframe)\
[13- Sort data by Axis](#step-10-sort-by-axis)\
[14- Sort data by Value](#step-11-sort-by-values)\
[15- Selecting Single Column](#step-12-selecting-a-single-column)\
[16- Selecting Multiple Columns](#step-13-selecting-multiple-columns)\
[17- Slicing Rows](#step-14-slicing-rows)\
[18- Slicing at Index](#step-15-slicing-at-specific-indexposition)\
[19- Slicing at Index Range](#step-16-slicing-multiple-rows-and-column)\
[20- Boolean Indexing](#step-17-boolian-indexing)\
[21- Filtering](#step-18-filtering-by-isin)\
[22- Setting](#step-19-setting)\
[23- Creating NaN Values](#step-20-creating-missing-data)\
[24- Dropping NaN Values](#step-21-dropping-missing-data)\
[25- Reset Index](#step-22-reset-index-after-dropping-nan)\
[26- Filling NaN Values](#step-23-filling-missing-data)

### **Q. What is Iris**
**Ans.**\
Iris (genus *Iris*) is a genus of about 300 species of plants in the family Iridaceae, including some of the world’s most popular and varied garden flowers. The diversity of the genus is centred in the north temperate zone, though some of its most handsome species are native to the Mediterranean and central Asian areas.
![iris](iris.jpg)

### **Q. What is Iris Dataset**
**Ans.**
* The Iris flower data set, or Fisher’s Iris data set, is a multivariate data set introduced by the British statistician and biologist **Ronald Fisher** in his **1936** paper *The use of multiple measurements in taxonomic problems* as an example of linear discriminant analysis.
* The data set consists of 50 samples from each of three species of Iris **(Iris setosa, Iris virginica and Iris versicolor)**. Four features were measured from each sample: the length and the width of the sepals and petals, in centimetres. Based on the combination of these four features, Fisher developed a linear discriminant model to distinguish the species from each other.

### **Q. My aim in this notebook**
**Ans.**\
I am going to use Pandas to analyze the Iris dataset, following the official [10 minutes to pandas](https://pandas.pydata.org/docs/user_guide/10min.html) tutorial.

### **Step-1: Import Libraries and Dataset**

In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
df = sns.load_dataset('iris')

### **Step-2: See the Basic section**

In [15]:
df.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa


In [16]:
df.tail()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
145,6.7,3.0,5.2,2.3,virginica
146,6.3,2.5,5.0,1.9,virginica
147,6.5,3.0,5.2,2.0,virginica
148,6.2,3.4,5.4,2.3,virginica
149,5.9,3.0,5.1,1.8,virginica


### **Step-3: Columns in data**

In [17]:
df.columns

Index(['sepal_length', 'sepal_width', 'petal_length', 'petal_width',
       'species'],
      dtype='object')

### **Step-4: Convert Dataframe into Numpy Arrays**

In [18]:
df.to_numpy()

array([[5.1, 3.5, 1.4, 0.2, 'setosa'],
       [4.9, 3.0, 1.4, 0.2, 'setosa'],
       [4.7, 3.2, 1.3, 0.2, 'setosa'],
       [4.6, 3.1, 1.5, 0.2, 'setosa'],
       [5.0, 3.6, 1.4, 0.2, 'setosa'],
       [5.4, 3.9, 1.7, 0.4, 'setosa'],
       [4.6, 3.4, 1.4, 0.3, 'setosa'],
       [5.0, 3.4, 1.5, 0.2, 'setosa'],
       [4.4, 2.9, 1.4, 0.2, 'setosa'],
       [4.9, 3.1, 1.5, 0.1, 'setosa'],
       [5.4, 3.7, 1.5, 0.2, 'setosa'],
       [4.8, 3.4, 1.6, 0.2, 'setosa'],
       [4.8, 3.0, 1.4, 0.1, 'setosa'],
       [4.3, 3.0, 1.1, 0.1, 'setosa'],
       [5.8, 4.0, 1.2, 0.2, 'setosa'],
       [5.7, 4.4, 1.5, 0.4, 'setosa'],
       [5.4, 3.9, 1.3, 0.4, 'setosa'],
       [5.1, 3.5, 1.4, 0.3, 'setosa'],
       [5.7, 3.8, 1.7, 0.3, 'setosa'],
       [5.1, 3.8, 1.5, 0.3, 'setosa'],
       [5.4, 3.4, 1.7, 0.2, 'setosa'],
       [5.1, 3.7, 1.5, 0.4, 'setosa'],
       [4.6, 3.6, 1.0, 0.2, 'setosa'],
       [5.1, 3.3, 1.7, 0.5, 'setosa'],
       [4.8, 3.4, 1.9, 0.2, 'setosa'],
       [5.0, 3.0, 1.6, 0.2, 'setosa'],
       ...,
       [5.9, 3.0, 5.1, 1.8, 'virginica']], dtype=object)
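
Because the frame mixes floats with the text column `species`, `to_numpy()` has to fall back to a generic `object` array. A minimal sketch, assuming a hypothetical three-row frame named `sample` built from the first rows shown above, compares that with converting only the numeric columns:

```python
import pandas as pd

# Hypothetical three-row sample copied from the first rows of the dataset
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "sepal_width": [3.5, 3.0, 3.2],
    "species": ["setosa", "setosa", "setosa"],
})

# Mixed numeric and text columns: NumPy needs a generic object array
mixed = sample.to_numpy()

# Numeric columns only: NumPy can keep a float64 array
numeric = sample[["sepal_length", "sepal_width"]].to_numpy()

print(mixed.dtype)
print(numeric.dtype)
```

Selecting the numeric columns first is usually preferable when the array is going to be used for arithmetic.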

### **Step-5: Type of dataframe**

In [19]:
df.dtypes

sepal_length    float64
sepal_width     float64
petal_length    float64
petal_width     float64
species          object
dtype: object

### **Step-6: Shape of dataframe**

In [20]:
df.shape

(150, 5)

### **Step-7: Info of dataframe**

In [14]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


### **Step-8: Stats in dataframe**

In [21]:
df.describe()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
count,150.0,150.0,150.0,150.0
mean,5.843333,3.057333,3.758,1.199333
std,0.828066,0.435866,1.765298,0.762238
min,4.3,2.0,1.0,0.1
25%,5.1,2.8,1.6,0.3
50%,5.8,3.0,4.35,1.3
75%,6.4,3.3,5.1,1.8
max,7.9,4.4,6.9,2.5
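
Each row of `describe()` corresponds to an ordinary aggregation that can also be called directly. A small sketch, assuming a hypothetical five-row frame `sample` built from the `sepal_length` values shown by `df.head()`, checks a few of them:

```python
import pandas as pd

# Hypothetical sample: sepal_length values of the first five rows
sample = pd.DataFrame({"sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0]})

stats = sample.describe()

# The same numbers obtained from the individual aggregation methods
mean_value = sample["sepal_length"].mean()
median_value = sample["sepal_length"].median()   # the "50%" row
std_value = sample["sepal_length"].std()         # sample standard deviation (ddof=1)

print(stats)
```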


### **Step-9: Transpose the dataframe**

In [22]:
df.T

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,140,141,142,143,144,145,146,147,148,149
sepal_length,5.1,4.9,4.7,4.6,5.0,5.4,4.6,5.0,4.4,4.9,...,6.7,6.9,5.8,6.8,6.7,6.7,6.3,6.5,6.2,5.9
sepal_width,3.5,3.0,3.2,3.1,3.6,3.9,3.4,3.4,2.9,3.1,...,3.1,3.1,2.7,3.2,3.3,3.0,2.5,3.0,3.4,3.0
petal_length,1.4,1.4,1.3,1.5,1.4,1.7,1.4,1.5,1.4,1.5,...,5.6,5.1,5.1,5.9,5.7,5.2,5.0,5.2,5.4,5.1
petal_width,0.2,0.2,0.2,0.2,0.2,0.4,0.3,0.2,0.2,0.1,...,2.4,2.3,1.9,2.3,2.5,2.3,1.9,2.0,2.3,1.8
species,setosa,setosa,setosa,setosa,setosa,setosa,setosa,setosa,setosa,setosa,...,virginica,virginica,virginica,virginica,virginica,virginica,virginica,virginica,virginica,virginica


### **Step-10: Sort by axis**

* **axis=1** sorts the columns by their labels (here in descending alphabetical order)

In [23]:
df.sort_index(axis=1, ascending=False)

Unnamed: 0,species,sepal_width,sepal_length,petal_width,petal_length
0,setosa,3.5,5.1,0.2,1.4
1,setosa,3.0,4.9,0.2,1.4
2,setosa,3.2,4.7,0.2,1.3
3,setosa,3.1,4.6,0.2,1.5
4,setosa,3.6,5.0,0.2,1.4
...,...,...,...,...,...
145,virginica,3.0,6.7,2.3,5.2
146,virginica,2.5,6.3,1.9,5.0
147,virginica,3.0,6.5,2.0,5.2
148,virginica,3.4,6.2,2.3,5.4


* **axis=0** sorts the rows by their index labels

In [24]:
df.sort_index(axis=0, ascending=False)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
149,5.9,3.0,5.1,1.8,virginica
148,6.2,3.4,5.4,2.3,virginica
147,6.5,3.0,5.2,2.0,virginica
146,6.3,2.5,5.0,1.9,virginica
145,6.7,3.0,5.2,2.3,virginica
...,...,...,...,...,...
4,5.0,3.6,1.4,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
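
The two calls above reorder labels, not values. A minimal sketch, assuming a hypothetical five-row frame `sample` copied from the `df.head()` output, shows which labels each axis affects:

```python
import pandas as pd

# Hypothetical sample: the five rows shown by df.head()
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
    "petal_width": [0.2, 0.2, 0.2, 0.2, 0.2],
    "species": ["setosa"] * 5,
})

# axis=1 reorders the column labels; row order is untouched
by_columns = sample.sort_index(axis=1, ascending=False)

# axis=0 reorders the row index labels; column order is untouched
by_rows = sample.sort_index(axis=0, ascending=False)

print(list(by_columns.columns))
print(list(by_rows.index))
```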


### **Step-11: Sort by values**

In [26]:
df.sort_values(by='sepal_length', ascending=False)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
131,7.9,3.8,6.4,2.0,virginica
135,7.7,3.0,6.1,2.3,virginica
122,7.7,2.8,6.7,2.0,virginica
117,7.7,3.8,6.7,2.2,virginica
118,7.7,2.6,6.9,2.3,virginica
...,...,...,...,...,...
41,4.5,2.3,1.3,0.3,setosa
42,4.4,3.2,1.3,0.2,setosa
38,4.4,3.0,1.3,0.2,setosa
8,4.4,2.9,1.4,0.2,setosa
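
When several rows share the same value, as the 7.7 rows do above, a second column can act as a tie-breaker. The sketch below assumes a hypothetical five-row frame `sample` taken from the `df.head()` output and passes lists to `by` and `ascending`:

```python
import pandas as pd

# Hypothetical sample: two columns from the five rows shown by df.head()
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
})

# Sort by petal_length descending, then by sepal_length ascending within ties
ordered = sample.sort_values(
    by=["petal_length", "sepal_length"],
    ascending=[False, True],
)

print(ordered)
```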


### **Step-12: Selecting a single column**

In [27]:
df['sepal_length']

0      5.1
1      4.9
2      4.7
3      4.6
4      5.0
      ... 
145    6.7
146    6.3
147    6.5
148    6.2
149    5.9
Name: sepal_length, Length: 150, dtype: float64

### **Step-13: Selecting Multiple Columns**

In [32]:
df[['sepal_length', 'sepal_width']]

Unnamed: 0,sepal_length,sepal_width
0,5.1,3.5
1,4.9,3.0
2,4.7,3.2
3,4.6,3.1
4,5.0,3.6
...,...,...
145,6.7,3.0
146,6.3,2.5
147,6.5,3.0
148,6.2,3.4


### **Step-14: Slicing rows**

In [29]:
df[22:29]

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
22,4.6,3.6,1.0,0.2,setosa
23,5.1,3.3,1.7,0.5,setosa
24,4.8,3.4,1.9,0.2,setosa
25,5.0,3.0,1.6,0.2,setosa
26,5.0,3.4,1.6,0.4,setosa
27,5.2,3.5,1.5,0.2,setosa
28,5.2,3.4,1.4,0.2,setosa


### **Step-15: Slicing at Specific index(Position)**

In [37]:
df.iloc[27]

sepal_length       5.2
sepal_width        3.5
petal_length       1.5
petal_width        0.2
species         setosa
Name: 27, dtype: object

### **Step-16: Slicing multiple rows and column**

In [35]:
df.iloc[26:32, 0:3]

Unnamed: 0,sepal_length,sepal_width,petal_length
26,5.0,3.4,1.6
27,5.2,3.5,1.5
28,5.2,3.4,1.4
29,4.7,3.2,1.6
30,4.8,3.1,1.6
31,5.4,3.4,1.5
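
Plain slicing and `iloc` work by position and exclude the end of the range, while `loc` works by label and includes it. A short sketch on a hypothetical five-row frame `sample` (values from `df.head()`) makes the difference visible:

```python
import pandas as pd

# Hypothetical sample: two columns from the five rows shown by df.head()
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
})

rows = sample[1:3]                              # positions 1 and 2
by_position = sample.iloc[1:3, 0:2]             # end position excluded
by_label = sample.loc[1:3, ["sepal_length"]]    # end label included

print(len(rows), by_position.shape, len(by_label))
```

With the default `RangeIndex` the labels and positions coincide, which is why the off-by-one between `iloc` and `loc` is easy to miss.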


### **Step-17: Boolian indexing**

In [38]:
df[df['sepal_length'] > 7]

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
102,7.1,3.0,5.9,2.1,virginica
105,7.6,3.0,6.6,2.1,virginica
107,7.3,2.9,6.3,1.8,virginica
109,7.2,3.6,6.1,2.5,virginica
117,7.7,3.8,6.7,2.2,virginica
118,7.7,2.6,6.9,2.3,virginica
122,7.7,2.8,6.7,2.0,virginica
125,7.2,3.2,6.0,1.8,virginica
129,7.2,3.0,5.8,1.6,virginica
130,7.4,2.8,6.1,1.9,virginica


In [44]:
df.iloc[:, 0:4] > 3

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width
0,True,True,False,False
1,True,False,False,False
2,True,True,False,False
3,True,True,False,False
4,True,True,False,False
...,...,...,...,...
145,True,False,True,False
146,True,False,True,False
147,True,False,True,False
148,True,True,True,False
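
Boolean masks can be combined with `&` (and), `|` (or) and `~` (not); each condition needs its own parentheses because these operators bind more tightly than comparisons. A minimal sketch, assuming a hypothetical five-row frame `sample` and illustrative thresholds:

```python
import pandas as pd

# Hypothetical sample: two columns from the five rows shown by df.head()
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
})

# Keep rows where both conditions hold (thresholds chosen only for illustration)
mask = (sample["sepal_length"] > 4.8) & (sample["sepal_width"] < 3.6)
selected = sample[mask]

print(selected)
```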


### **Step-18: Filtering by isin( )**

In [45]:
df[df['species'].isin(['setosa', 'virginica'])]

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species
0,5.1,3.5,1.4,0.2,setosa
1,4.9,3.0,1.4,0.2,setosa
2,4.7,3.2,1.3,0.2,setosa
3,4.6,3.1,1.5,0.2,setosa
4,5.0,3.6,1.4,0.2,setosa
...,...,...,...,...,...
145,6.7,3.0,5.2,2.3,virginica
146,6.3,2.5,5.0,1.9,virginica
147,6.5,3.0,5.2,2.0,virginica
148,6.2,3.4,5.4,2.3,virginica
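
`isin()` returns a boolean mask, so it can be inverted with `~` to keep everything except the listed values. The sketch below uses a hypothetical four-element Series named `species` rather than the full dataset:

```python
import pandas as pd

# Hypothetical Series holding the three species names from the dataset
species = pd.Series(["setosa", "versicolor", "virginica", "setosa"], name="species")

keep = species.isin(["setosa", "virginica"])
kept = species[keep]         # rows whose value is in the list
excluded = species[~keep]    # rows whose value is not in the list

print(list(kept), list(excluded))
```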


### **Step-19: Setting**

* **Setting a new column from a Series aligns the data by index; a NumPy array, as used here, is assigned by position and must match the DataFrame's length**

In [122]:
df['new_column'] = np.arange(150)
df.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species,new_column
0,5.1,3.5,1.4,0.2,setosa,0
1,4.9,3.0,1.4,0.2,setosa,1
2,4.7,3.2,1.3,0.2,setosa,2
3,4.6,3.1,1.5,0.2,setosa,3
4,5.0,3.6,1.4,0.2,setosa,4
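
Index alignment only comes into play when the assigned object carries its own index. A small sketch, assuming a hypothetical three-row frame `sample` and made-up column names `from_array` and `from_series`, contrasts the two cases:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: sepal_length of the first three rows
sample = pd.DataFrame({"sepal_length": [5.1, 4.9, 4.7]})

# A NumPy array has no index: values are placed in order, row by row
sample["from_array"] = np.arange(3)

# A Series is matched on its index labels, not on its order
sample["from_series"] = pd.Series([10, 20, 30], index=[2, 1, 0])

print(sample)
```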


* Setting values by position

In [123]:
df.iat[3, 5] = 100                          # iat accesses a single scalar by integer position
df.head()                                   # iloc also accepts slices and lists of positions

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species,new_column
0,5.1,3.5,1.4,0.2,setosa,0
1,4.9,3.0,1.4,0.2,setosa,1
2,4.7,3.2,1.3,0.2,setosa,2
3,4.6,3.1,1.5,0.2,setosa,100
4,5.0,3.6,1.4,0.2,setosa,4
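
`iat` has a label-based counterpart, `at`, which takes a row label and a column name instead of two integers. A minimal sketch on a hypothetical three-row frame `sample`, with placeholder values 9.9 and 8.8:

```python
import pandas as pd

# Hypothetical sample: first three rows, two columns
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7],
    "species": ["setosa", "setosa", "setosa"],
})

sample.iat[1, 0] = 9.9                  # by position: row 1, column 0
sample.at[2, "sepal_length"] = 8.8      # by label: row label 2, named column

print(sample)
```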


### **Step-20: Creating Missing Data**

In [124]:
df.loc[0:3, 'sepal_length'] = np.nan        # .loc slices by label and includes the end label
df.loc[4:7, 'sepal_width'] = np.nan         # so each statement sets four rows to NaN
df.head(8)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species,new_column
0,,3.5,1.4,0.2,setosa,0
1,,3.0,1.4,0.2,setosa,1
2,,3.2,1.3,0.2,setosa,2
3,,3.1,1.5,0.2,setosa,100
4,5.0,,1.4,0.2,setosa,4
5,5.4,,1.7,0.4,setosa,5
6,4.6,,1.4,0.3,setosa,6
7,5.0,,1.5,0.2,setosa,7


* **Checking Missing Values**

In [125]:
df.isnull().sum()

sepal_length    4
sepal_width     4
petal_length    0
petal_width     0
species         0
new_column      0
dtype: int64
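
Each column reports four missing values because `.loc` slices include the end label. A short sketch, assuming a hypothetical five-row frame `sample`, reproduces the counting on a smaller scale:

```python
import numpy as np
import pandas as pd

# Hypothetical sample: two columns from the five rows shown by df.head()
sample = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
    "sepal_width": [3.5, 3.0, 3.2, 3.1, 3.6],
})

# Labels 0 and 1 are both included, so two values become NaN
sample.loc[0:1, "sepal_length"] = np.nan

missing = sample.isnull().sum()
print(missing)
```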

### **Step-21: Dropping Missing Data**

In [126]:
df.dropna(axis=0, inplace=True)
df.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species,new_column
8,4.4,2.9,1.4,0.2,setosa,8
9,4.9,3.1,1.5,0.1,setosa,9
10,5.4,3.7,1.5,0.2,setosa,10
11,4.8,3.4,1.6,0.2,setosa,11
12,4.8,3.0,1.4,0.1,setosa,12
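
Without `inplace=True`, `dropna()` returns a new frame and leaves the original untouched, and `subset` limits the check to chosen columns. A minimal sketch on a hypothetical three-row frame `sample`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one NaN in each column
sample = pd.DataFrame({
    "sepal_length": [np.nan, 4.9, 4.7],
    "sepal_width": [3.5, np.nan, 3.2],
})

cleaned = sample.dropna()                                  # drop rows with any NaN
length_only = sample.dropna(subset=["sepal_length"])       # check one column only

print(len(sample), len(cleaned), len(length_only))
```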


### **Step-22: Reset-Index after dropping NaN**

In [127]:
df.reset_index(drop=True, inplace=True)
df.head()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species,new_column
0,4.4,2.9,1.4,0.2,setosa,8
1,4.9,3.1,1.5,0.1,setosa,9
2,5.4,3.7,1.5,0.2,setosa,10
3,4.8,3.4,1.6,0.2,setosa,11
4,4.8,3.0,1.4,0.1,setosa,12


In [128]:
df.tail()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species,new_column
137,6.7,3.0,5.2,2.3,virginica,145
138,6.3,2.5,5.0,1.9,virginica,146
139,6.5,3.0,5.2,2.0,virginica,147
140,6.2,3.4,5.4,2.3,virginica,148
141,5.9,3.0,5.1,1.8,virginica,149
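
The `drop=True` argument discards the old index; leaving it out keeps the old labels as a new column named `index`. A small sketch, assuming a hypothetical three-row frame `frame` that mimics the state after `dropna`:

```python
import pandas as pd

# Hypothetical frame whose index starts at 8, as after dropping the first rows
frame = pd.DataFrame({"sepal_length": [4.4, 4.9, 5.4]}, index=[8, 9, 10])

discarded = frame.reset_index(drop=True)   # old labels are thrown away
preserved = frame.reset_index()            # old labels kept in an "index" column

print(list(discarded.columns), list(preserved.columns))
```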


### **Step-23: Filling Missing Data**

* **Create Missing Values**

In [129]:
df.loc[0:3, 'sepal_length'] = np.nan        # .loc slices by label and includes the end label
df.loc[4:7, 'sepal_width'] = np.nan         # so each statement sets four rows to NaN
df.head(8)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species,new_column
0,,2.9,1.4,0.2,setosa,8
1,,3.1,1.5,0.1,setosa,9
2,,3.7,1.5,0.2,setosa,10
3,,3.4,1.6,0.2,setosa,11
4,4.8,,1.4,0.1,setosa,12
5,4.3,,1.1,0.1,setosa,13
6,5.8,,1.2,0.2,setosa,14
7,5.7,,1.5,0.4,setosa,15


* **Filling Missing Data with Mean**

In [130]:
df.describe()

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,new_column
count,138.0,138.0,142.0,142.0,142.0
mean,5.925362,3.022464,3.888028,1.253521,78.5
std,0.808356,0.418377,1.724273,0.747161,41.135953
min,4.3,2.0,1.0,0.1,8.0
25%,5.225,2.8,1.6,0.4,43.25
50%,5.9,3.0,4.45,1.4,78.5
75%,6.475,3.3,5.1,1.8,113.75
max,7.9,4.2,6.9,2.5,149.0


In [131]:
df.fillna(df.mean(numeric_only=True), inplace=True)   # average numeric columns only; 'species' is text
df.head(8)

Unnamed: 0,sepal_length,sepal_width,petal_length,petal_width,species,new_column
0,5.925362,2.9,1.4,0.2,setosa,8
1,5.925362,3.1,1.5,0.1,setosa,9
2,5.925362,3.7,1.5,0.2,setosa,10
3,5.925362,3.4,1.6,0.2,setosa,11
4,4.8,3.022464,1.4,0.1,setosa,12
5,4.3,3.022464,1.1,0.1,setosa,13
6,5.8,3.022464,1.2,0.2,setosa,14
7,5.7,3.022464,1.5,0.4,setosa,15
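
Mean imputation fills each column with that column's own average. The sketch below assumes a hypothetical three-row frame `sample`; `numeric_only=True` restricts the mean to numeric columns so the text column `species` is skipped:

```python
import numpy as np
import pandas as pd

# Hypothetical sample with one missing sepal_length
sample = pd.DataFrame({
    "sepal_length": [np.nan, 4.9, 4.7],
    "species": ["setosa", "setosa", "setosa"],
})

# Column means over numeric columns only; NaN values are ignored by mean()
column_means = sample.mean(numeric_only=True)

# fillna with a Series fills each column using the matching label
filled = sample.fillna(column_means)

print(filled)
```

Returning a new frame instead of using `inplace=True` keeps the original available for comparison.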
