# End-to-end Multi-class Iris Flower Species Classification
This notebook builds an end-to-end multi-class Iris flower species classifier using scikit-learn.

## 1. Problem
Identifying the species of an Iris flower given its sepal length, sepal width, petal length and petal width.

In other words, I want to know exactly which species an Iris flower belongs to, based on those four measurements.

## 2. Data
The data I'm using is the Iris.csv file from Kaggle, uploaded by SAURABH SINGH.

https://www.kaggle.com/datasets/saurabh00007/iriscsv

## 3. Evaluation
The evaluation is a file with prediction probabilities for each species of Iris flower.

## Get the workspace ready

* Import sklearn
* Import matplotlib
* Import NumPy
* Import pandas

In [None]:
# Import necessary libraries
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np

from google.colab import drive
drive.mount('/content/drive')


Mounted at /content/drive


## Getting a dataset ready

In [None]:
path = "drive/MyDrive/Iris_Classifier/Iris.csv"
iris_df = pd.read_csv(path)
print(iris_df)
print(iris_df.describe())

      Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm  \
0      1            5.1           3.5            1.4           0.2   
1      2            4.9           3.0            1.4           0.2   
2      3            4.7           3.2            1.3           0.2   
3      4            4.6           3.1            1.5           0.2   
4      5            5.0           3.6            1.4           0.2   
..   ...            ...           ...            ...           ...   
145  146            6.7           3.0            5.2           2.3   
146  147            6.3           2.5            5.0           1.9   
147  148            6.5           3.0            5.2           2.0   
148  149            6.2           3.4            5.4           2.3   
149  150            5.9           3.0            5.1           1.8   

            Species  
0       Iris-setosa  
1       Iris-setosa  
2       Iris-setosa  
3       Iris-setosa  
4       Iris-setosa  
..              ...  
145  

In [None]:
# Display the first few rows of the dataset
print(iris_df.head())

   Id  SepalLengthCm  SepalWidthCm  PetalLengthCm  PetalWidthCm      Species
0   1            5.1           3.5            1.4           0.2  Iris-setosa
1   2            4.9           3.0            1.4           0.2  Iris-setosa
2   3            4.7           3.2            1.3           0.2  Iris-setosa
3   4            4.6           3.1            1.5           0.2  Iris-setosa
4   5            5.0           3.6            1.4           0.2  Iris-setosa


## Data Preprocessing
### What do we have to do right now?
Our goal here is to build a machine learning model that predicts Species from the feature columns.

In essence, the Species column is our target variable (also called y or labels) and the feature columns are our independent variables (also called data or X).

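We then split the data into training and testing sets using 'train_test_split'.

The features are standardized using 'StandardScaler', which rescales each one to zero mean and unit variance. A random forest does not strictly need this step, but it keeps the pipeline ready for scale-sensitive models.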

Knowing this, let's create X and y by splitting our dataframe up.

In [None]:
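# Drop the target and the Id column: Id is only a row number, not a measurement,
# so it must not be used as a feature
X = iris_df.drop(['Id', 'Species'], axis=1)
y = iris_df['Species']

# Store the column names for use in prediction
feature_names = X.columns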

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# View the shapes of the training and test datasets
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)

# Standardize the features
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
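
To make the standardization step above more concrete, here is a minimal hand-written sketch of the same idea using only the Python standard library. It is not how 'StandardScaler' is implemented internally; the helper names `fit_standardizer` and `standardize` are hypothetical, and the toy columns reuse a few SepalLengthCm values from the printed dataset. The point it illustrates is that the mean and standard deviation are learned from the training data only and then reused on the test data.

```python
from statistics import mean, pstdev


def fit_standardizer(column):
    """Learn the mean and population standard deviation of a training column."""
    return mean(column), pstdev(column)


def standardize(column, mu, sigma):
    """Apply z = (x - mu) / sigma using statistics learned elsewhere."""
    return [(x - mu) / sigma for x in column]


# Toy training column: first five SepalLengthCm values shown in the dataset
train = [5.1, 4.9, 4.7, 4.6, 5.0]
# Toy test column: two SepalLengthCm values from the tail of the dataset
test = [6.7, 6.3]

# Fit on the training data only, then transform both sets with the same statistics
mu, sigma = fit_standardizer(train)
train_scaled = standardize(train, mu, sigma)
test_scaled = standardize(test, mu, sigma)

print("train:", [round(z, 3) for z in train_scaled])
print("test: ", [round(z, 3) for z in test_scaled])
```

Fitting on the training set alone is what the `fit_transform` / `transform` pair in the cell above does; refitting on the test set would let test-set statistics influence preprocessing.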

## Preparing a machine learning model

In [None]:
# Initialize the model
model = RandomForestClassifier(random_state=42)

## Fitting a model and making predictions

In [None]:
# Train the model
model.fit(X_train, y_train)

Use the fitted model to make predictions on the test data and save the predictions to a variable called y_pred.

In [None]:
y_pred = model.predict(X_test)
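
The Evaluation section mentions prediction probabilities for each species, while `predict` returns only the most likely label. A minimal sketch of how per-species probabilities could be obtained with `predict_proba` follows. To stay self-contained it assumes scikit-learn's bundled copy of the Iris data (`load_iris`) in place of the Kaggle CSV, skips scaling for brevity, and the file name in the final comment is hypothetical.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Assumption: scikit-learn's bundled Iris data stands in for Iris.csv
iris = load_iris()
X = iris.data
y = iris.target_names[iris.target]  # species names instead of 0/1/2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)

# One row per test sample, one column per species (ordered as clf.classes_)
proba = clf.predict_proba(X_test)
proba_df = pd.DataFrame(proba, columns=clf.classes_)
print(proba_df.head())

# Hypothetical file name; uncomment to write the probabilities to disk
# proba_df.to_csv("iris_probabilities.csv", index=False)
```

Each row of `proba_df` sums to 1, and the column with the largest value is the label that `predict` would return for that sample.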

## Evaluating a model's predictions
Evaluating predictions is as important as making them. Let's check how our model did by calling the score() method on it, first with the training data (X_train, y_train) and then with the testing data (X_test, y_test).

We also evaluate the model using accuracy, a classification report, and a confusion matrix to understand its performance.

In [None]:
# Evaluate the fitted model on the training set using the score() function
model.score(X_train, y_train)

1.0

In [None]:
# Evaluate the fitted model on the test set using the score() function
model.score(X_test, y_test)

1.0

In [None]:
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print(f'Accuracy: {accuracy}')
print('Classification Report:')
print(report)
print('Confusion Matrix:')
print(conf_matrix)

Accuracy: 1.0
Classification Report:
                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30

Confusion Matrix:
[[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]
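
The numbers in the classification report can be recovered from the confusion matrix by hand. The sketch below is plain Python for illustration, not scikit-learn's implementation. It takes the matrix printed above (rows are true classes, columns are predicted classes) and derives accuracy, precision, recall, and support for each species.

```python
# Confusion matrix copied from the output above:
# rows are true classes, columns are predicted classes
labels = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]
conf_matrix = [
    [10, 0, 0],
    [0, 9, 0],
    [0, 0, 11],
]

# Accuracy is the diagonal (correct predictions) divided by all predictions
total = sum(sum(row) for row in conf_matrix)
correct = sum(conf_matrix[i][i] for i in range(len(labels)))
accuracy = correct / total

for i, label in enumerate(labels):
    predicted_as_i = sum(row[i] for row in conf_matrix)  # column sum
    actually_i = sum(conf_matrix[i])                     # row sum (support)
    precision = conf_matrix[i][i] / predicted_as_i
    recall = conf_matrix[i][i] / actually_i
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} support={actually_i}")

print(f"accuracy={accuracy:.2f}")
```

Because every off-diagonal entry is zero, all three metrics come out as 1.00 for each class, matching the report. With only 30 test samples, a single misclassification would move accuracy by about 3 percentage points.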


## Save the trained model and scaler

In [None]:
import joblib
joblib.dump(model, 'iris_model.pkl')
joblib.dump(scaler, 'scaler.pkl')

['scaler.pkl']
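
The saved files are only useful if they are loaded and applied in the same order as during training: scaler first, then model. The round trip below is a rough sketch under several assumptions. It uses scikit-learn's bundled Iris data instead of the Kaggle CSV, writes to a temporary directory rather than the notebook's working directory, and the `new_flower` measurements are an illustrative input.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Assumption: scikit-learn's bundled Iris data stands in for Iris.csv
iris = load_iris()
scaler = StandardScaler()
X_scaled = scaler.fit_transform(iris.data)
model = RandomForestClassifier(random_state=42).fit(X_scaled, iris.target)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "iris_model.pkl")
    scaler_path = os.path.join(tmp, "scaler.pkl")

    # Save both objects, then load them back as a separate script would
    joblib.dump(model, model_path)
    joblib.dump(scaler, scaler_path)
    loaded_model = joblib.load(model_path)
    loaded_scaler = joblib.load(scaler_path)

# Illustrative input: sepal length, sepal width, petal length, petal width (cm)
new_flower = [[5.1, 3.5, 1.4, 0.2]]

# Apply the loaded scaler before the loaded model, in the training feature order
prediction = loaded_model.predict(loaded_scaler.transform(new_flower))
print(iris.target_names[prediction[0]])
```

Files written by joblib are pickles, so they should only be loaded from trusted sources, ideally with the same scikit-learn version that created them.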