### Iris Dataset

#### Description
The Iris dataset consists of 150 instances divided into 3 classes (Iris setosa, Iris versicolor and Iris virginica), with 50 instances in each class.


For each flower, the dataset records the following four attributes:

- sepal length in cm
- sepal width in cm
- petal length in cm
- petal width in cm
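The counts stated above (150 flowers, four measurements each, 50 flowers per class) can be checked directly. This is a minimal sketch assuming scikit-learn and NumPy are installed; `np.bincount` counts how many samples carry each integer class label.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# 150 flowers (rows) by 4 measurements (columns)
print(X.shape)          # (150, 4)
# Number of samples carrying each class label 0, 1, 2
print(np.bincount(y))   # [50 50 50]
# Column order of the four measurements
print(iris.feature_names)
```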

To keep the experiment simple, the classes are represented by numbers:

    "0": setosa
    "1": versicolor
    "2": virginica
    
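In scikit-learn's copy of the dataset the classes are already stored as these integers, and the matching species names are kept in `target_names`, so a numeric label (for example, a model's prediction) can be turned back into a name by indexing. A small sketch, assuming scikit-learn is installed; the predicted labels below are made up for illustration:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# target_names[i] is the species name for integer label i
for label, name in enumerate(iris.target_names):
    print(label, name)

# Hypothetical predicted labels, translated back into species names
predicted = [0, 2, 1]
print(iris.target_names[predicted])
```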

### Domain Information

Iris plants are flowering plants with showy flowers. They are popular with film directors because they make an excellent backdrop.

They are found predominantly in dry, semi-desert, or colder rocky mountainous areas of Europe and Asia. They have long, erect flowering stems and can produce white, yellow, orange, pink, purple, lavender, blue or brown flowers. There are 260 to 300 species of iris.

![Parts of an iris flower: sepals and petals](https://cdn-images-1.medium.com/max/1275/1*7bnLKsChXq94QjtAiRn40w.png)

As the image shows, iris flowers have three sepals and three petals. The sepals usually spread outward or droop downward, while the petals stand upright, partly behind the sepal bases. The length and width of the sepals and petals vary from one species to another, which is why these four measurements can be used to tell the classes apart.
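Because sepal and petal dimensions differ between species, even a single measurement separates the classes to some degree. The sketch below, assuming scikit-learn and NumPy are installed, averages each measurement per class by selecting rows with a boolean mask on the integer labels; petal length, for instance, is much smaller on average for setosa than for the other two species.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Mean of each of the four measurements, computed separately for each class
class_means = np.array([X[y == label].mean(axis=0) for label in range(3)])

for name, means in zip(iris.target_names, class_means):
    # Pair each feature name with its class-specific mean, rounded for display
    summary = {feat: round(float(m), 2) for feat, m in zip(iris.feature_names, means)}
    print(name, summary)
```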


#### Importing the required packages

```python
# Import the datasets package to load the Iris dataset, and model_selection to split it
from sklearn import datasets
from sklearn import model_selection
```

```python
# Import the linear model
from sklearn.linear_model import LogisticRegression
# Import neighbors to use KNeighborsClassifier later in the experiment
from sklearn.neighbors import KNeighborsClassifier
```

```python
# Load the Iris dataset from the sklearn datasets package using the load_iris() function
iris = datasets.load_iris()
```

```python
# Get the feature matrix and the target vector from the dataset object 'iris' defined above
X = iris.data
y = iris.target
```
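If only the arrays are needed, `load_iris` can return them directly through its `return_X_y` parameter (added in scikit-learn 0.18), skipping the dataset object. A brief sketch, assuming scikit-learn is installed:

```python
from sklearn import datasets

# return_X_y=True returns the (data, target) pair instead of the dataset object
X, y = datasets.load_iris(return_X_y=True)

# Same arrays as iris.data and iris.target taken from the full object
iris = datasets.load_iris()
print((X == iris.data).all(), (y == iris.target).all())  # True True
```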

```python
# Check the types of the feature matrix X and the target vector y
print(type(X), type(y))
```

```
<class 'numpy.ndarray'> <class 'numpy.ndarray'>
```


```python
# Display the feature names stored in the dataset object 'iris'
iris.feature_names
```

```
['sepal length (cm)',
 'sepal width (cm)',
 'petal length (cm)',
 'petal width (cm)']
```

```python
# List the fields available in the dataset object
iris.keys()
```

```
dict_keys(['data', 'target', 'target_names', 'DESCR', 'feature_names', 'filename'])
```
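The object returned by `load_iris` is a scikit-learn `Bunch`, a dictionary that also exposes its keys as attributes, so `iris['data']` and `iris.data` refer to the same array. The exact set of keys depends on the scikit-learn version (newer releases add entries such as `frame`), so the key list above may differ on another installation. A minimal check, assuming scikit-learn is installed:

```python
from sklearn.datasets import load_iris

iris = load_iris()

# Bunch supports both dictionary-style and attribute-style access to the same object
print(iris["data"] is iris.data)  # True

# Keys available in the installed scikit-learn version
print(sorted(iris.keys()))
```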

```python
# Display the class labels (0 = setosa, 1 = versicolor, 2 = virginica)
iris.target
```

```
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
```
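The labels are stored in class order: the first 50 are setosa, the next 50 versicolor and the last 50 virginica. Splitting the arrays by position without shuffling would therefore leave whole classes out of one side of the split, which is why `train_test_split` shuffles by default. The NumPy sketch below uses an array built to mirror the layout shown above and a naive first-100/last-50 split:

```python
import numpy as np

# Labels laid out like iris.target: 50 zeros, then 50 ones, then 50 twos
y = np.repeat([0, 1, 2], 50)

# Naive positional split: first 100 rows for training, last 50 for testing
y_train_naive, y_test_naive = y[:100], y[100:]

print(np.unique(y_train_naive))  # [0 1] -> class 2 is never seen in training
print(np.unique(y_test_naive))   # [2]   -> the test set contains only class 2
```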

#### Split the data into train and test sets

```python
# Use train_test_split from model_selection to split the data into training and test sets;
# random_state fixes the shuffle so that the split is reproducible
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.33, random_state=124
)
```
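With plain random shuffling, the three classes are usually, but not necessarily, equally represented in the test set. `train_test_split` also accepts a `stratify` argument that preserves the class proportions of `y` in both parts. A sketch assuming the same `test_size` and `random_state` as above; with stratification, each class contributes 16 or 17 of the 50 test samples:

```python
import numpy as np
from sklearn import datasets, model_selection

X, y = datasets.load_iris(return_X_y=True)

# stratify=y keeps the class proportions of y in both the training and test sets
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.33, random_state=124, stratify=y
)

# Samples per class in each part
print("train:", np.bincount(y_train))
print("test: ", np.bincount(y_test))
```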

#### Look at the shape of the training and testing sets

```python
# shape gives the dimensions of an array: (rows, columns) for 2-D, (length,) for 1-D
print(X_train.shape, y_train.shape, X_test.shape, y_test.shape)
```

```
(100, 4) (100,) (50, 4) (50,)
```
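The sizes follow from `test_size=0.33`: 0.33 × 150 = 49.5 samples, which is rounded up to 50 for the test set, leaving the remaining 100 for training, consistent with the shapes printed above. The arithmetic as a plain-Python sketch; the round-up rule is an assumption about scikit-learn's internals that matches this output:

```python
import math

n_samples = 150
test_size = 0.33

# A fractional test_size is a proportion of the dataset; the count is rounded up
n_test = math.ceil(test_size * n_samples)
# Everything not placed in the test set goes to the training set
n_train = n_samples - n_test

print(n_train, n_test)  # 100 50
```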


#### Apply a linear classifier to the Iris dataset

```python
clf = LogisticRegression(random_state=0)  # Create a logistic regression (linear) classifier
clf.fit(X_train, y_train)  # Fit the classifier to the training data
clf.score(X_test, y_test)  # Display the mean accuracy on the test data and labels
```

```
0.92
```
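`score` reports accuracy, the fraction of test samples whose predicted label equals the true label; on a 50-sample test set, 0.92 corresponds to 46 correct predictions. The sketch below recomputes it from `predict` to make that explicit. It repeats the loading and splitting steps from above so that it runs on its own, and assumes scikit-learn and NumPy are installed.

```python
import numpy as np
from sklearn import datasets, model_selection
from sklearn.linear_model import LogisticRegression

X, y = datasets.load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    X, y, test_size=0.33, random_state=124
)

clf = LogisticRegression(random_state=0)
clf.fit(X_train, y_train)

# Accuracy = (number of correct predictions) / (number of test samples)
y_pred = clf.predict(X_test)
n_correct = int(np.sum(y_pred == y_test))
manual_accuracy = n_correct / len(y_test)

print(n_correct, len(y_test), manual_accuracy)
print(np.isclose(manual_accuracy, clf.score(X_test, y_test)))  # True
```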

#### Apply k-nearest neighbors (KNN) to the Iris dataset

```python
neigh = KNeighborsClassifier(n_neighbors=3)  # Create a k-nearest neighbors classifier with k = 3
neigh.fit(X_train, y_train)  # Fit the KNN classifier to the training data
neigh.score(X_test, y_test)  # Display the mean accuracy on the test data and labels
```

```
0.94
```
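KNN has no real training phase: `fit` stores the training data (scikit-learn may also build a search structure such as a k-d tree to speed up lookups), and prediction finds the `k` training points closest to each query, by Euclidean distance under the default settings, and takes a majority vote of their labels. A from-scratch NumPy sketch of that idea follows; `knn_predict` is a hypothetical helper, not part of scikit-learn, and it breaks vote ties toward the smaller label, which may not match scikit-learn in every edge case.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=3):
    """Hypothetical helper: label each row of X_query by majority vote of its k nearest training points."""
    predictions = []
    for x in X_query:
        # Euclidean distance from x to every training point
        distances = np.sqrt(((X_train - x) ** 2).sum(axis=1))
        # Indices of the k closest training points
        nearest = np.argsort(distances)[:k]
        # Majority vote; bincount + argmax breaks ties toward the smaller label
        votes = np.bincount(y_train[nearest])
        predictions.append(int(np.argmax(votes)))
    return np.array(predictions)


# Toy example: two well-separated groups of 2-D points
X_toy = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                  [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_toy, y_toy, np.array([[0.3, 0.3], [4.8, 5.0]]), k=3))  # [0 1]
```

Run on the Iris split as `knn_predict(X_train, y_train, X_test, k=3)`, this sketch would be expected to agree with `neigh.predict(X_test)` except possibly where distances or votes tie.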