# IRIS Dataset Data Frame

### Other sources of information in Jupyter

**Data Example**
- <u><i>Dataset name</i></u>: Iris Plants Database
- <u><i>Description</i></u>: This is perhaps the best-known database in the pattern recognition literature. R.A. Fisher created the data set in 1936, and Michael Marshall donated it in 1988; Fisher's paper is a classic in the field and is still referenced frequently. The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. There are 5 attributes in total: four are measurements of the sepals and petals of each observation, and the fifth is the class or species of iris (Setosa, Versicolor, or Virginica) that the observation belongs to.

- <u><i>Predictors (X)</i></u>: sepal length in cm, sepal width in cm, petal length in cm, petal width in cm

- <u><i>Response (Y)</i></u>: class of iris plant (Setosa, Versicolor, Virginica)
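
The counts in the description can be checked directly against the copy of the data that ships with scikit-learn. The following is a minimal sketch, assuming scikit-learn is installed; the variable names `X`, `y`, and `class_counts` are chosen here for illustration only. Note that scikit-learn stores the species names in lowercase.

```python
from collections import Counter

from sklearn.datasets import load_iris

iris = load_iris()

# X holds the four predictors, y holds the integer class codes (0, 1, 2)
X, y = iris.data, iris.target
n_samples, n_features = X.shape

# Translate each class code into its species name and count the occurrences
class_counts = Counter(str(iris.target_names[code]) for code in y)

print(n_samples, n_features)
print(dict(class_counts))
```

If the description holds, this reports 150 observations, 4 predictors, and 50 observations per species.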

<u><i>Further Resources</i></u>

- [Fisher's Original Paper](https://onlinelibrary.wiley.com/doi/10.1111/j.1469-1809.1936.tb02137.x) 

- [Data Set Repository](https://archive.ics.uci.edu/ml/datasets/Iris)  
 
- [More Information About Iris](https://www.angela1c.com/projects/iris_project/the-iris-dataset/)  

In [39]:
# Required packages
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

# Import the dataset from sklearn and save it as iris
iris = load_iris()

# Create the data frame (df): the four measurement columns plus the numeric target
df = pd.DataFrame(data=np.c_[iris['data'], iris['target']],
                  columns=iris['feature_names'] + ['target'])
# Peek at the first five rows
df.head()

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0.0 |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0.0 |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0.0 |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0.0 |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0.0 |
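
As an alternative to assembling the frame with `np.c_`, recent versions of scikit-learn can return the data as a pandas DataFrame directly. This is a sketch under the assumption that scikit-learn 0.23 or later is installed; `bunch` and `df_alt` are illustrative names, not part of the notebook above. One difference worth noting is that the `target` column stays an integer here, whereas stacking with `np.c_` converts it to float.

```python
from sklearn.datasets import load_iris

# as_frame=True asks scikit-learn to build pandas objects instead of NumPy arrays
bunch = load_iris(as_frame=True)

# .frame holds the four measurement columns followed by the 'target' column
df_alt = bunch.frame

print(df_alt.head())
print(df_alt.dtypes)
```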


In [40]:
# Change 0.0, 1.0, and 2.0 to the species names in one pass
# (map replaces the whole column, so no strings are written into a float column)
df['target'] = df['target'].map({0.0: 'Setosa', 1.0: 'Versicolor', 2.0: 'Virginica'})

df.head()

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Setosa |
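
Another way to attach species labels is to build them from the integer codes that scikit-learn provides, rather than relabelling a float column. The sketch below is one possible approach, not the notebook's own method; `df_cat` and `species` are hypothetical names, and the capitalised labels are derived from `iris.target_names` to match the spelling used above.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Start from the four measurement columns only
df_cat = pd.DataFrame(iris.data, columns=iris.feature_names)

# target_names is lowercase ('setosa', ...); capitalise to match the notebook
species = [str(name).capitalize() for name in iris.target_names]

# from_codes maps code 0 -> species[0], 1 -> species[1], 2 -> species[2]
df_cat["target"] = pd.Categorical.from_codes(iris.target, categories=species)

print(df_cat["target"].value_counts())
```

A categorical column stores each label once and keeps the code-to-name mapping explicit, which can be convenient for grouping and plotting.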


In [41]:
# Shift the index to start at 1 (aesthetic), so the species span rows 1-50, 51-100, and 101-150
df.index += 1 

df.head()

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | Setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | Setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | Setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | Setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | Setosa |


In [42]:
# Show 5 samples from each species by index label
# (rows 1-5 are Setosa, 51-55 Versicolor, and 146-150 Virginica)
df.loc[np.r_[1:6, 51:56, 146:151], :]

| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target |
|---|---|---|---|---|---|
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | Setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | Setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | Setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | Setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | Setosa |
| 51 | 7.0 | 3.2 | 4.7 | 1.4 | Versicolor |
| 52 | 6.4 | 3.2 | 4.5 | 1.5 | Versicolor |
| 53 | 6.9 | 3.1 | 4.9 | 1.5 | Versicolor |
| 54 | 5.5 | 2.3 | 4.0 | 1.3 | Versicolor |
| 55 | 6.5 | 2.8 | 4.6 | 1.5 | Versicolor |
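
Selecting rows by hard-coded index ranges depends on the rows being sorted by species. A grouping-based selection avoids that assumption. The following is a self-contained sketch rather than a continuation of the cells above; `df_demo` and `first_five` are illustrative names. Unlike the `np.r_` selection, which takes the last five Virginica rows (146-150), this takes the first five rows of every species.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Rebuild a frame like the one above: measurements, species names, 1-based index
df_demo = pd.DataFrame(iris.data, columns=iris.feature_names)
df_demo["target"] = np.array(["Setosa", "Versicolor", "Virginica"])[iris.target]
df_demo.index += 1

# head(5) on a groupby keeps the first five rows of each group,
# preserving the original row order and index labels
first_five = df_demo.groupby("target").head(5)

print(first_five)
```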
