## About the dataset

This is perhaps the best-known dataset in the pattern recognition literature. It contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other two; the latter two are not linearly separable from each other.

Attribute Information:
   1. sepal length in cm
   2. sepal width in cm
   3. petal length in cm
   4. petal width in cm
   5. class: 
      -- Iris Setosa
      -- Iris Versicolour
      -- Iris Virginica
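
The separability claim can be illustrated on a handful of rows. The following is a minimal sketch, not a proof: it uses only a few petal-length values copied from the outputs later in this notebook, and the cut-off `THRESHOLD_CM` is a hypothetical value picked for illustration.

```python
import pandas as pd

# A few rows copied from the outputs in this notebook (not the full dataset).
sample = pd.DataFrame(
    {
        "petal_length": [1.4, 1.4, 1.3, 1.5, 5.6, 5.1, 5.1, 5.9],
        "species": ["Iris-setosa"] * 4 + ["Iris-virginica"] * 4,
    }
)

# Hypothetical cut-off: any value between the two groups works on this sample.
THRESHOLD_CM = 2.5

predicted_setosa = sample["petal_length"] < THRESHOLD_CM
actual_setosa = sample["species"] == "Iris-setosa"

# True when the single threshold splits the two groups without errors.
separable = bool((predicted_setosa == actual_setosa).all())
print(separable)
```

A single threshold on one feature is the simplest linear boundary; the same check on the full dataset would be needed to confirm the claim for all 150 rows.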

### Read the dataset and store it in the dataframe named Iris

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import os
# The following assumes Jupyter Notebook is run from the directory that contains 'Iris.csv'.
# Adjust the path if your current working directory is different.
Iris = pd.read_csv(os.path.join('', 'Iris.csv'))
display(Iris)

,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
5,5.4,3.9,1.7,0.4,Iris-setosa
6,4.6,3.4,1.4,0.3,Iris-setosa
7,5.0,3.4,1.5,0.2,Iris-setosa
8,4.4,2.9,1.4,0.2,Iris-setosa
9,4.9,3.1,1.5,0.1,Iris-setosa
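
The layout of `Iris.csv` itself is not shown in this notebook. As a minimal sketch, assuming the file has a header row with the column names seen in the output above, the same `pd.read_csv` call can be tried on an in-memory string when the file is not at hand:

```python
import io

import pandas as pd

# Assumed file layout: a header row followed by comma-separated values.
csv_text = """Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
"""

# read_csv accepts any file-like object, not only a path.
sample = pd.read_csv(io.StringIO(csv_text))
print(sample.shape)
```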


### Find the data type of each column

In [4]:
Iris.dtypes

Sepal Length (in cm)    float64
Sepal Width in (cm)     float64
Petal length (in cm)    float64
Petal width (in cm)     float64
Class                    object
dtype: object
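
Once the dtypes are known, columns can be picked by type. A minimal sketch on a small hand-built frame (the two rows are taken from the output above; the variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sepal Length (in cm)": [5.1, 4.9],
        "Class": ["Iris-setosa", "Iris-setosa"],
    }
)

# "number" matches every numeric dtype (int, float, ...).
numeric_cols = df.select_dtypes(include="number").columns.tolist()
other_cols = df.select_dtypes(exclude="number").columns.tolist()
print(numeric_cols, other_cols)
```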

### Print top 10 & bottom 10 samples from the dataframe

In [3]:
# first 10 rows
display(Iris.head(10))
# last 10 rows
display(Iris.tail(10))

,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
5,5.4,3.9,1.7,0.4,Iris-setosa
6,4.6,3.4,1.4,0.3,Iris-setosa
7,5.0,3.4,1.5,0.2,Iris-setosa
8,4.4,2.9,1.4,0.2,Iris-setosa
9,4.9,3.1,1.5,0.1,Iris-setosa


,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
140,6.7,3.1,5.6,2.4,Iris-virginica
141,6.9,3.1,5.1,2.3,Iris-virginica
142,5.8,2.7,5.1,1.9,Iris-virginica
143,6.8,3.2,5.9,2.3,Iris-virginica
144,6.7,3.3,5.7,2.5,Iris-virginica
145,6.7,3.0,5.2,2.3,Iris-virginica
146,6.3,2.5,5.0,1.9,Iris-virginica
147,6.5,3.0,5.2,2.0,Iris-virginica
148,6.2,3.4,5.4,2.3,Iris-virginica
149,5.9,3.0,5.1,1.8,Iris-virginica


### Find the shape of the dataset

In [16]:
Iris.shape

(150, 5)

### Set the index of the dataframe to be the first column

In [25]:
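# Assumes no 'index' column exists yet: reset_index() creates it from the default RangeIndex
Iris = Iris.reset_index().set_index('index')
Iris.head()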

index,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
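
`set_index` needs a column with the given name to exist. A minimal sketch, assuming a frame that still has its default `RangeIndex`, of how `reset_index` and `set_index` relate:

```python
import pandas as pd

df = pd.DataFrame({"Sepal Length (in cm)": [5.1, 4.9, 4.7]})

# reset_index() moves the default RangeIndex into a column named "index".
with_column = df.reset_index()

# set_index() returns a new frame; the original is untouched unless
# inplace=True is passed.
indexed = with_column.set_index("index")
print(indexed.index.name)
```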


### Use the iloc indexer to print all the rows of the 3rd column

In [34]:
# 3rd column when the index is counted as the 1st: 'Sepal Width in (cm)', which iloc sees at position 1
Iris.iloc[:,1:2]

index,Sepal Width in (cm)
0,3.5
1,3.0
2,3.2
3,3.1
4,3.6
5,3.9
6,3.4
7,3.4
8,2.9
9,3.1
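
`iloc` counts data columns only, starting at 0; the index is not one of its positions. A minimal sketch on two rows from the output above, which also shows that an integer position returns a Series while a slice returns a DataFrame:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sepal Length (in cm)": [5.1, 4.9],
        "Sepal Width in (cm)": [3.5, 3.0],
        "Petal length (in cm)": [1.4, 1.4],
    }
)
df.index.name = "index"

as_series = df.iloc[:, 1]       # single position -> Series
as_frame = df.iloc[:, 1:2]      # slice -> one-column DataFrame
third_data_col = df.iloc[:, 2]  # 3rd data column, index not counted
print(as_series.name, third_data_col.name)
```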


### Slicing
Print only the sepal width and sepal length for the first 10 rows

In [35]:
Iris.loc[:,['Sepal Width in (cm)','Sepal Length (in cm)']].head(10)

index,Sepal Width in (cm),Sepal Length (in cm)
0,3.5,5.1
1,3.0,4.9
2,3.2,4.7
3,3.1,4.6
4,3.6,5.0
5,3.9,5.4
6,3.4,4.6
7,3.4,5.0
8,2.9,4.4
9,3.1,4.9
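
Rows can also be limited inside the indexer instead of with `head`. A minimal sketch on five rows from the output above; note that a label slice with `loc` includes its end point, while a positional slice with `iloc` does not:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sepal Length (in cm)": [5.1, 4.9, 4.7, 4.6, 5.0],
        "Sepal Width in (cm)": [3.5, 3.0, 3.2, 3.1, 3.6],
    }
)
cols = ["Sepal Width in (cm)", "Sepal Length (in cm)"]

by_label = df.loc[0:2, cols]   # labels 0, 1 and 2 (end label included)
by_position = df.iloc[0:2]     # positions 0 and 1 (end position excluded)
print(len(by_label), len(by_position))
```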


### Using Logical statements for indexing
Print all the columns of the rows whose class name is "Iris-setosa"

In [39]:
Iris.loc[Iris['Class']=='Iris-setosa']

index,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
5,5.4,3.9,1.7,0.4,Iris-setosa
6,4.6,3.4,1.4,0.3,Iris-setosa
7,5.0,3.4,1.5,0.2,Iris-setosa
8,4.4,2.9,1.4,0.2,Iris-setosa
9,4.9,3.1,1.5,0.1,Iris-setosa
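
Several conditions can be combined into one mask. A minimal sketch on rows copied from the outputs in this notebook; each condition needs its own parentheses because `&` binds more tightly than `==` and `>=`:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sepal Length (in cm)": [5.1, 4.9, 4.7, 4.6, 5.0, 6.7],
        "Class": ["Iris-setosa"] * 5 + ["Iris-virginica"],
    },
    index=[0, 1, 2, 3, 4, 140],
)

# Element-wise AND of two boolean Series.
mask = (df["Class"] == "Iris-setosa") & (df["Sepal Length (in cm)"] >= 5.0)
selected = df.loc[mask]
print(selected)
```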


### Multiply sepal length and width and store the result under the column name "SepalExtra (in cm)" in the same Iris dataframe

In [40]:
# The product of two lengths is an area (cm²), although the column name says "(in cm)"
Iris['SepalExtra (in cm)'] = Iris['Sepal Length (in cm)'] * Iris['Sepal Width in (cm)']

In [41]:
display(Iris)

index,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class,SepalExtra (in cm)
0,5.1,3.5,1.4,0.2,Iris-setosa,17.85
1,4.9,3.0,1.4,0.2,Iris-setosa,14.70
2,4.7,3.2,1.3,0.2,Iris-setosa,15.04
3,4.6,3.1,1.5,0.2,Iris-setosa,14.26
4,5.0,3.6,1.4,0.2,Iris-setosa,18.00
5,5.4,3.9,1.7,0.4,Iris-setosa,21.06
6,4.6,3.4,1.4,0.3,Iris-setosa,15.64
7,5.0,3.4,1.5,0.2,Iris-setosa,17.00
8,4.4,2.9,1.4,0.2,Iris-setosa,12.76
9,4.9,3.1,1.5,0.1,Iris-setosa,15.19
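
A derived column can also be added without modifying the original frame. A minimal sketch using `DataFrame.assign`; the column name `sepal_area_cm2` is a hypothetical choice for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sepal Length (in cm)": [5.1, 4.9],
        "Sepal Width in (cm)": [3.5, 3.0],
    }
)

# assign() returns a new frame and leaves df unchanged.
with_area = df.assign(
    sepal_area_cm2=df["Sepal Length (in cm)"] * df["Sepal Width in (cm)"]
)
print(with_area["sepal_area_cm2"].round(2).tolist())
```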


### Find the mean and variance of each column except the Class column, grouped by class

In [42]:
Iris.groupby('Class').agg(["mean","var"])

,Sepal Length (in cm),Sepal Length (in cm),Sepal Width in (cm),Sepal Width in (cm),Petal length (in cm),Petal length (in cm),Petal width (in cm),Petal width (in cm),SepalExtra (in cm),SepalExtra (in cm)
Class,mean,var,mean,var,mean,var,mean,var,mean,var
Iris-setosa,5.006,0.124249,3.418,0.14518,1.464,0.030106,0.244,0.011494,17.2088,8.688864
Iris-versicolor,5.936,0.266433,2.77,0.098469,4.26,0.220816,1.326,0.039106,16.5262,8.219012
Iris-virginica,6.588,0.404343,2.974,0.104004,5.552,0.304588,2.026,0.075433,19.6846,11.96318
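
By default pandas computes the sample variance, which divides by n - 1 rather than n. A minimal sketch that checks this by hand on five sepal lengths from the output earlier in this notebook:

```python
import pandas as pd

values = pd.Series([5.1, 4.9, 4.7, 4.6, 5.0])

n = len(values)
mean = values.sum() / n

# Sample variance: squared deviations divided by (n - 1).
sample_var = ((values - mean) ** 2).sum() / (n - 1)

# Population variance: pass ddof=0 to divide by n instead.
population_var = values.var(ddof=0)

print(sample_var, values.var(), population_var)
```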


### Write a function that accepts two numbers as input and prints them, then pass it the sepal length and sepal width of the 5th row

In [49]:
# Takes two numbers (a length and a width) and prints them
def sepal_size(length: float, width: float) -> None:
    print("The length of the sepal is {0} \nThe width of the sepal is {1}".format(length, width))

# The 5th row has index label 4
sepal_size(Iris.loc[4, 'Sepal Length (in cm)'], Iris.loc[4, 'Sepal Width in (cm)'])

The length of the sepal is 5.0 
The width of the sepal is 3.6
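
Type hints alone are not enforced at run time. If the function really should reject non-numeric input, one possible sketch is an explicit check; the name `describe_sepal` and the choice to return the text instead of printing it are assumptions made for illustration:

```python
from numbers import Real


def describe_sepal(length: Real, width: Real) -> str:
    """Return a two-line description, rejecting non-numeric input."""
    for name, value in (("length", length), ("width", width)):
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise TypeError(f"{name} must be a number, got {type(value).__name__}")
    return f"The length of the sepal is {length}\nThe width of the sepal is {width}"


print(describe_sepal(5.0, 3.6))
```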


### Find the range of all the columns in the dataset

*Range = Max value - Min value (in the column)*

In [61]:
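# Numeric columns: range = max - min (named col_range to avoid shadowing the built-in range)
numeric = Iris.select_dtypes(include=[np.number])
col_range = numeric.max() - numeric.min()
# Object (string) columns: show "max - min" in lexicographic order
text = Iris.select_dtypes(include=[object])
# Series.append was removed in pandas 2.0, so combine the results with pd.concat
pd.concat([col_range, text.max() + ' - ' + text.min()])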

Sepal Length (in cm)                             3.6
Sepal Width in (cm)                              2.4
Petal length (in cm)                             5.9
Petal width (in cm)                              2.4
SepalExtra (in cm)                             20.02
Class                   Iris-virginica - Iris-setosa
dtype: object
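
The numeric ranges can also be read off the summary table that `describe()` produces. A minimal sketch on six rows from the output earlier in this notebook; by default `describe()` on a mixed frame reports only the numeric columns:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sepal Length (in cm)": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4],
        "Petal width (in cm)": [0.2, 0.2, 0.2, 0.2, 0.2, 0.4],
        "Class": ["Iris-setosa"] * 6,
    }
)

summary = df.describe()  # rows: count, mean, std, min, 25%, 50%, 75%, max
ranges = summary.loc["max"] - summary.loc["min"]
print(ranges)
```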

### Sort the entire dataset according to the column Petal width

In [62]:
Iris.sort_values(by='Petal width (in cm)')

index,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class,SepalExtra (in cm)
32,5.2,4.1,1.5,0.1,Iris-setosa,21.32
13,4.3,3.0,1.1,0.1,Iris-setosa,12.90
37,4.9,3.1,1.5,0.1,Iris-setosa,15.19
9,4.9,3.1,1.5,0.1,Iris-setosa,15.19
12,4.8,3.0,1.4,0.1,Iris-setosa,14.40
34,4.9,3.1,1.5,0.1,Iris-setosa,15.19
0,5.1,3.5,1.4,0.2,Iris-setosa,17.85
27,5.2,3.5,1.5,0.2,Iris-setosa,18.20
28,5.2,3.4,1.4,0.2,Iris-setosa,17.68
29,4.7,3.2,1.6,0.2,Iris-setosa,15.04
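
Rows with equal petal width are not guaranteed to keep their original order under the default sort algorithm. A minimal sketch, on four rows copied from the outputs in this notebook, of two ways to make the order of ties predictable:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sepal Length (in cm)": [5.1, 4.9, 4.7, 5.2],
        "Petal width (in cm)": [0.2, 0.1, 0.2, 0.1],
    },
    index=[0, 9, 2, 32],
)

# A stable sort keeps tied rows in their original relative order.
stable = df.sort_values(by="Petal width (in cm)", kind="stable")

# A second key breaks ties explicitly (here: longest sepal first).
two_keys = df.sort_values(
    by=["Petal width (in cm)", "Sepal Length (in cm)"],
    ascending=[True, False],
)
print(list(stable.index), list(two_keys.index))
```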


### Remove the new column "SepalExtra" from the dataframe

In [63]:
Iris.drop(['SepalExtra (in cm)'],axis=1,inplace=True)
Iris.head()

index,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
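
With `inplace=True`, running the drop cell a second time raises a `KeyError` because the column is already gone. A minimal sketch of a variant that can be rerun safely, using the `columns=` keyword and `errors="ignore"`:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sepal Length (in cm)": [5.1],
        "SepalExtra (in cm)": [17.85],
    }
)

# Without inplace=True, drop() returns a new frame and keeps df intact.
trimmed = df.drop(columns=["SepalExtra (in cm)"])

# errors="ignore" makes a repeated drop a no-op instead of an error.
again = trimmed.drop(columns=["SepalExtra (in cm)"], errors="ignore")
print(list(trimmed.columns))
```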


### Print only the rows whose class is "Iris-setosa"

In [64]:
Iris.loc[Iris['Class']=='Iris-setosa']

index,Sepal Length (in cm),Sepal Width in (cm),Petal length (in cm),Petal width (in cm),Class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
5,5.4,3.9,1.7,0.4,Iris-setosa
6,4.6,3.4,1.4,0.3,Iris-setosa
7,5.0,3.4,1.5,0.2,Iris-setosa
8,4.4,2.9,1.4,0.2,Iris-setosa
9,4.9,3.1,1.5,0.1,Iris-setosa


### Take only the top 10 rows of the dataset with only the first 3 columns and store it in a dataframe named "IrisSubset"

In [65]:
# Counting the index as the 1st column, the first 3 columns are the index plus 2 data columns
IrisSubset = Iris.iloc[0:10,0:2]

In [66]:
display(IrisSubset)

index,Sepal Length (in cm),Sepal Width in (cm)
0,5.1,3.5
1,4.9,3.0
2,4.7,3.2
3,4.6,3.1
4,5.0,3.6
5,5.4,3.9
6,4.6,3.4
7,5.0,3.4
8,4.4,2.9
9,4.9,3.1
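
If the subset will be modified later, taking an explicit copy keeps it independent of the source frame. A minimal sketch on the first ten rows shown in this notebook, this time selecting three data columns (the index not counted):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Sepal Length (in cm)": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9],
        "Sepal Width in (cm)": [3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1],
        "Petal length (in cm)": [1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5],
        "Class": ["Iris-setosa"] * 10,
    }
)

# .copy() makes the subset independent of df.
subset = df.iloc[0:10, 0:3].copy()

# Changing the copy does not touch the original frame.
subset.iloc[0, 0] = 0.0
print(subset.shape, df.iloc[0, 0])
```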
