## IRIS Dataset: Multiple Machine Learning Models
## Logistic Regression, Perceptron, K-NN, Support Vector Machines, Decision Tree, Random Forest and Gradient Boosting Classifiers.

Author: Rajpal Virk
<br>
www.rajpal-virk.com

### About dataset

This is perhaps the best known database to be found in the pattern recognition literature. Fisher's paper is a classic in the field and is referenced frequently to this day. (See Duda & Hart, for example.) The data set contains 3 classes of 50 instances each, where each class refers to a type of iris plant. One class is linearly separable from the other 2; the latter are NOT linearly separable from each other. 
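The first claim is easy to check on the copy of the data bundled with scikit-learn. This sketch is only an illustration. Using petal length alone as the separating feature is our choice, not part of the original description, and a single-feature threshold is simply the most basic kind of linear boundary. The sketch does not test the second claim about Versicolour and Virginica.

```python
# Sketch: check that Iris Setosa is linearly separable from the other two classes
# using a single threshold on petal length (a linear decision boundary).
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
petal_length = X[:, 2]  # column 2 is petal length in cm

setosa_max = petal_length[y == 0].max()  # longest Setosa petal
others_min = petal_length[y != 0].min()  # shortest Versicolour/Virginica petal

print("Largest Setosa petal length:", setosa_max)
print("Smallest non-Setosa petal length:", others_min)

# Any threshold strictly between the two values separates Setosa from the rest.
threshold = (setosa_max + others_min) / 2
predicted_setosa = petal_length < threshold
print("Threshold:", threshold, "| perfectly separated:", np.array_equal(predicted_setosa, y == 0))
```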

Predicted attribute: class of iris plant. 

This is an exceedingly simple domain. 

This data differs from the data presented in Fisher's article (identified by Steve Chadwick, spchadwick '@' espeedaz.net). The 35th sample should be: 4.9,3.1,1.5,0.2,"Iris-setosa", where the error is in the fourth feature. The 38th sample should be: 4.9,3.6,1.4,0.1,"Iris-setosa", where the errors are in the second and third features.
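This notebook loads the data through scikit-learn rather than from the UCI files. scikit-learn's `load_iris` documentation states that, since version 0.20, its bundled copy contains the two corrected samples, so it follows Fisher's article rather than the UCI file. The following sketch confirms this on your installed version, assuming the 1-based row numbers in the note map to 0-based NumPy indices 34 and 37:

```python
# Sketch: inspect the two samples mentioned in the note (1-based rows 35 and 38).
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Corrected values quoted in the note, keyed by 1-based row number
corrected = {35: [4.9, 3.1, 1.5, 0.2], 38: [4.9, 3.6, 1.4, 0.1]}

for row, expected in corrected.items():
    idx = row - 1  # NumPy indexing is 0-based
    print(f"Sample {row}: {X[idx]} class={y[idx]} "
          f"matches corrected values: {np.allclose(X[idx], expected)}")
```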


Attribute Information:

1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class:
    - Iris Setosa
    - Iris Versicolour
    - Iris Virginica
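In scikit-learn, these attribute and class names are stored on the dataset object returned by `load_iris`. A short sketch that prints them (the variable name `iris` is just for illustration):

```python
# Sketch: the attribute names as exposed by scikit-learn's load_iris "Bunch" object.
from sklearn.datasets import load_iris

iris = load_iris()
print("Feature names:", iris.feature_names)     # the four measurements, in cm
print("Class names:", list(iris.target_names))  # index i corresponds to label i
print("Data shape:", iris.data.shape)           # (samples, features)
```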

The dataset can be downloaded from:
https://archive.ics.uci.edu/ml/datasets/iris

### Import required libraries

In [1]:
# Import required libraries

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import Perceptron
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier


from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

import warnings
warnings.filterwarnings("ignore")

### Load, review and preprocess dataset

In [2]:
# Load and review dataset

X, y = load_iris(return_X_y=True)
print("Shape of input features: ", X.shape)
print("Shape of target labels: ", y.shape)

Shape of input features:  (150, 4)
Shape of target labels:  (150,)


**_We have 4 input features and 1 target label per sample. Let's print the first few of these values._**

In [3]:
# Print first 5 values of both input features and target labels

print("First 5 input features: \n", X[:5])
print()
print("First 5 target labels: \n", y[:5])

First 5 input features: 
 [[5.1 3.5 1.4 0.2]
 [4.9 3.  1.4 0.2]
 [4.7 3.2 1.3 0.2]
 [4.6 3.1 1.5 0.2]
 [5.  3.6 1.4 0.2]]

First 5 target labels: 
 [0 0 0 0 0]


In [4]:
# Print the unique values of target labels or classes

print("Unique values of target labels or classes: ", np.unique(y))

Unique values of target labels or classes:  [0 1 2]


**_We have 3 unique classes: 0, 1 and 2. Class '0' is 'Iris Setosa', Class '1' is 'Iris Versicolour', and Class '2' is 'Iris Virginica'._**
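The dataset description says there are 3 classes of 50 instances each. You can confirm this by counting labels. A minimal sketch, using `np.bincount` and the `target_names` array that `load_iris` provides:

```python
# Sketch: count how many samples carry each label and map labels to species names.
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
counts = np.bincount(iris.target)  # counts[i] = number of samples with label i
for label, (name, count) in enumerate(zip(iris.target_names, counts)):
    print(f"Class {label} = {name}: {count} samples")
```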

In [5]:
# Split the dataset into training (80%) and test (20%) sets

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
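A plain random split does not represent the classes evenly in the test set. The support column in the classification reports below shows 11, 13 and 6 test samples for classes 0, 1 and 2. One possible alternative, not used in the rest of this notebook, is to pass `stratify=y` so that each class keeps its one-third share. The `_s`-suffixed variable names are hypothetical and were chosen so they do not overwrite the split above:

```python
# Sketch: the same 80/20 split, but stratified on the labels so every class
# keeps its 1/3 share in both the training and the test set.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

print("Test-set class counts (stratified):", np.bincount(y_test_s))
print("Train-set class counts (stratified):", np.bincount(y_train_s))
```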

### Model Building and Evaluation

In [6]:
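# Build machine learning models
# random_state=0 makes the randomised tree-based models reproducible between runs;
# Perceptron's alpha only takes effect when a penalty is set (the default is penalty=None)

models = {'Logistic Regression': LogisticRegression(), 'Perceptron': Perceptron(alpha=0.001),
          'K Neighbours': KNeighborsClassifier(n_neighbors=1), 'SVM': SVC(),
          'Decision Tree': DecisionTreeClassifier(random_state=0), 'Random Forest': RandomForestClassifier(random_state=0),
          'Gradient Boosting': GradientBoostingClassifier(random_state=0)}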
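The evaluation loop relies on scikit-learn's metric functions, which take the true labels first and the predictions second. `confusion_matrix(y_true, y_pred)` puts true classes on the rows and predicted classes on the columns. A tiny sketch with made-up labels shows what happens when the arguments are swapped:

```python
# Sketch: scikit-learn metrics expect (y_true, y_pred); swapping them transposes
# the confusion matrix and exchanges precision with recall.
from sklearn.metrics import confusion_matrix, precision_score, recall_score

y_true = [0, 1, 1]  # hypothetical ground-truth labels
y_pred = [0, 0, 1]  # hypothetical predictions

cm = confusion_matrix(y_true, y_pred)          # rows = true class, columns = predicted class
cm_swapped = confusion_matrix(y_pred, y_true)  # arguments in the wrong order
print(cm)
print(cm_swapped)

# Binary scores for label 1: swapping the arguments exchanges the two values
print("precision:", precision_score(y_true, y_pred), "recall:", recall_score(y_true, y_pred))
print("swapped precision:", precision_score(y_pred, y_true), "swapped recall:", recall_score(y_pred, y_true))
```

When every prediction is correct, as in the reports below, both orders print the same diagonal matrix, so a swapped call is easy to miss.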

In [7]:
for model_name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    print("#########################################################")
    print()
    print("Model:", model_name)
    print("Training score: {:.2f}".format(model.score(X_train, y_train)))
    # scikit-learn metrics take the true labels first, then the predictions
    print("Testing score: {:.2f}".format(accuracy_score(y_test, y_pred)))
    print("Confusion Matrix: \n", confusion_matrix(y_test, y_pred))
    print()
    print("Classification Report:\n", classification_report(y_test, y_pred))
    print()

#########################################################

Model: Logistic Regression
Training score: 0.97
Testing score: 1.00
Confusion Matrix: 
 [[11  0  0]
 [ 0 13  0]
 [ 0  0  6]]

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        11
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00         6

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30


#########################################################

Model: Perceptron
Training score: 0.95
Testing score: 1.00
Confusion Matrix: 
 [[11  0  0]
 [ 0 13  0]
 [ 0  0  6]]

Classification Report:
               precision    recall  f1-score   support

           0       1.00      1.00      1.00        11
           1       1.00      1.00      1.00        13
           2       1.00      1.00      1.00         6

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

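**_The reports above show that the classifiers reach 100% accuracy, precision and recall on the held-out test set of the Iris dataset. This test set has only 30 samples from a single random split, so a perfect score there is a noisy estimate of how well each model generalises to unseen data._**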
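Because the test set holds only 30 samples from one random split, a cross-validated score gives a steadier comparison. The following minimal sketch uses `cross_val_score` with 5 folds on a subset of the models above. The settings `max_iter=1000` for logistic regression (to avoid convergence warnings) and the choice of 5 folds are assumptions, not settings used earlier in this notebook:

```python
# Sketch: 5-fold cross-validation for a few of the models, as a less noisy
# estimate than a single 30-sample test split.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# A subset of the models above; random_state only fixes the tree's tie-breaking
cv_models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "K Neighbours": KNeighborsClassifier(n_neighbors=1),
    "SVM": SVC(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

cv_results = {}
for name, model in cv_models.items():
    # For classifiers, an integer cv uses StratifiedKFold (no shuffling)
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    cv_results[name] = scores
    print(f"{name}: mean accuracy {scores.mean():.3f} (std {scores.std():.3f})")
```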