# Iris Flowers Classification

Predict the species of an iris flower from the length and width of its petals and sepals.

For this problem we will use the Iris flower data set. It consists of 50 samples from each of three species of Iris (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepals and petals, in centimeters. Based on the combination of these four features, Fisher developed a linear discriminant model to distinguish the species from each other.

To train our machine learning model on the Iris flower data, we will use scikit-learn’s built-in iris dataset.

## Importing Libraries

In [47]:
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

## Loading Iris Flower Data

In [48]:
iris_data = load_iris()
print('Classes to predict: ', iris_data.target_names)

Classes to predict:  ['setosa' 'versicolor' 'virginica']
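
To look at the data before modelling, the feature matrix can be wrapped in a pandas DataFrame. The sketch below is only one way to do this; the names `iris_df` and `species` are illustrative and do not appear elsewhere in this notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()

# One column per measured feature, named after the dataset's own feature_names
iris_df = pd.DataFrame(iris_data.data, columns=iris_data.feature_names)

# Translate the integer labels (0, 1, 2) into species names
iris_df["species"] = iris_data.target_names[iris_data.target]

print(iris_df.head())
print(iris_df["species"].value_counts())
```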


## Splitting Features

In [49]:
# Splitting the target variable from the independent variables (features)
X = iris_data.data
y = iris_data.target

In [50]:
print('Number of examples in the data:', X.shape[0])

Number of examples in the data: 150


In [51]:
X[:5]

array([[5.1, 3.5, 1.4, 0.2],
       [4.9, 3. , 1.4, 0.2],
       [4.7, 3.2, 1.3, 0.2],
       [4.6, 3.1, 1.5, 0.2],
       [5. , 3.6, 1.4, 0.2]])

In [52]:
# Take every 50th label: one from the start of each species block
y[:150:50]

array([0, 1, 2])
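
The slice above works because the labels are stored in three consecutive blocks of 50. A small check of the class balance, sketched here with `np.bincount`, makes that explicit:

```python
import numpy as np
from sklearn.datasets import load_iris

y = load_iris().target

# Count how many samples carry each integer label (0, 1, 2)
class_counts = np.bincount(y)
print(class_counts)

# Labels are stored in sorted blocks, so every 50th label starts a new class
print(y[::50])
```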

## Splitting Data for Training and Testing 

In [53]:
# Hold out 25% of the samples for testing; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=47, test_size=0.25)
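
A plain random split does not guarantee equal class proportions in the test set (the classification report further down shows supports of 15, 8 and 15). As a possible alternative, not used in this notebook, `train_test_split` accepts a `stratify` argument. The variable names with the `_s` suffix are illustrative only.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# stratify=y keeps the class proportions roughly equal in both splits
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(
    X, y, random_state=47, test_size=0.25, stratify=y
)

print(X_train_s.shape, X_test_s.shape)
test_counts = Counter(y_test_s.tolist())
print(test_counts)
```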


## Create a Decision Tree Model

In [54]:
# DecisionTreeClassifier was already imported in the first cell

# Create a Decision Tree model that chooses splits by information gain (entropy)
dt = DecisionTreeClassifier(criterion='entropy')

# Train the model on the training set
dt.fit(X_train, y_train)

DecisionTreeClassifier(criterion='entropy')
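
To see which thresholds a fitted tree learned, scikit-learn's `export_text` prints its rules as indented text. The sketch below refits a separate tree, here called `dt_demo`, on the same split. The `random_state=0` on the tree is an added assumption so that reruns print the same rules; the notebook's own model does not set it.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

iris_data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_data.data, iris_data.target, random_state=47, test_size=0.25
)

# random_state=0 is an assumption added for repeatable output
dt_demo = DecisionTreeClassifier(criterion="entropy", random_state=0)
dt_demo.fit(X_train, y_train)

# Render the learned splits with readable feature names
rules = export_text(dt_demo, feature_names=list(iris_data.feature_names))
print(rules)
```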

## Model Testing

In [55]:
# Predict the class index for every sample in the test set
y_pred = dt.predict(X_test)

## Accuracy of the Model

In [56]:
from sklearn.metrics import accuracy_score

# Retrain with min_samples_split=20, so a node is split only if it holds at least
# 20 samples. This limits tree growth and replaces the model fitted above.
dt = DecisionTreeClassifier(criterion='entropy', min_samples_split=20)
dt.fit(X_train, y_train)
print('Accuracy Score on train data: ', accuracy_score(y_true=y_train, y_pred=dt.predict(X_train)))
print('Accuracy Score on the test data: ', accuracy_score(y_true=y_test, y_pred=dt.predict(X_test)))

Accuracy Score on train data:  0.9732142857142857
Accuracy Score on the test data:  0.9473684210526315
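
With only 38 test samples, a single misclassification shifts the test accuracy by about 2.6 percentage points, so one split gives a noisy estimate. Cross-validation is one way to get a steadier figure. The sketch below is an addition, not part of the original workflow, and the choices `cv=5` and `random_state=0` are assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Same hyperparameters as the model above; random_state is added for repeatability
model = DecisionTreeClassifier(criterion="entropy", min_samples_split=20, random_state=0)

# Accuracy on each of 5 folds (stratified by default for classifiers)
scores = cross_val_score(model, X, y, cv=5)
print("Fold accuracies:", scores)
print("Mean accuracy:", scores.mean())
```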


## Model Evaluation

In [57]:
from sklearn.metrics import classification_report, confusion_matrix

# Note: y_pred was computed in the Model Testing step, before dt was retrained with
# min_samples_split=20, so this report describes the first tree.
result = confusion_matrix(y_test, y_pred)
print("Confusion Matrix:")
print(result)
result1 = classification_report(y_test, y_pred)
print("Classification Report:")
print(result1)

Confusion Matrix:
[[15  0  0]
 [ 0  8  0]
 [ 0  1 14]]
Classification Report:
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        15
           1       0.89      1.00      0.94         8
           2       1.00      0.93      0.97        15

    accuracy                           0.97        38
   macro avg       0.96      0.98      0.97        38
weighted avg       0.98      0.97      0.97        38
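
The figures in the report follow directly from the confusion matrix, where rows are true classes and columns are predicted classes. As a worked check, the sketch below recomputes accuracy, precision and recall from the matrix printed above:

```python
import numpy as np

# Confusion matrix copied from the output above
# (rows: true class, columns: predicted class)
cm = np.array([[15, 0, 0],
               [0, 8, 0],
               [0, 1, 14]])

accuracy = np.trace(cm) / cm.sum()        # correct predictions / all predictions
precision = np.diag(cm) / cm.sum(axis=0)  # column sums = samples predicted per class
recall = np.diag(cm) / cm.sum(axis=1)     # row sums = true samples per class (support)

print(round(accuracy, 2))
print(np.round(precision, 2))
print(np.round(recall, 2))
```

The single off-diagonal entry is one class 2 sample predicted as class 1, which is why class 1 loses precision (8/9) and class 2 loses recall (14/15).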



## Testing the Model on New Values

In [61]:
# Three new samples: sepal length, sepal width, petal length, petal width (cm)
X_new = np.array([[3, 2, 1, 0.2], [4.9, 2.2, 3.8, 1.1], [5, 2.5, 2.9, 12.3]])
# Predict the species index for each input vector
prediction = dt.predict(X_new)
print("Prediction of Species: {}".format(prediction))

Prediction of Species: [0 1 1]
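
The model returns class indices rather than names. Indexing `target_names` with the predicted indices turns them into species names. The sketch below reuses the indices printed above instead of refitting the model.

```python
import numpy as np
from sklearn.datasets import load_iris

target_names = load_iris().target_names

# Predicted class indices copied from the output above
prediction = np.array([0, 1, 1])

# NumPy fancy indexing maps each index to its species name
predicted_species = target_names[prediction]
print(predicted_species)
```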
