# Exploring the IRIS Dataset: A Machine Learning Approach

## Introduction

In this Jupyter Notebook, we explore the classic Iris dataset, a standard benchmark in machine learning and pattern recognition. The dataset contains 150 samples, each with four features (sepal length, sepal width, petal length, and petal width), drawn from three species of iris flowers: setosa, versicolor, and virginica.

Our primary objective is to classify iris flowers by species from these four measurements. As a preprocessing step, we apply Min-Max scaling, which maps each feature to the [0, 1] range based on the training data.

The machine learning algorithm of choice for this exploration is the Gaussian Naive Bayes classifier, known for its simplicity and effectiveness, especially in scenarios with relatively small datasets like IRIS. We will assess the model's performance using various evaluation metrics, including the confusion matrix, accuracy, precision, recall, and F1 score.

Through this analysis, we aim to see how effective the chosen classifier is and how well it generalizes to unseen data. Let's dive in.


## Loading Libraries

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler
from sklearn.naive_bayes import GaussianNB
import sklearn.metrics as skm

## Loading Data

In [4]:
# The CSV has no header row, so column names are supplied explicitly
df = pd.read_csv("iris.csv", names=['sepal length', 'sepal width', 'petal length', 'petal width', 'class'])
df.head()

Unnamed: 0,sepal length,sepal width,petal length,petal width,class
0,5.1,3.5,1.4,0.2,Iris-setosa
1,4.9,3.0,1.4,0.2,Iris-setosa
2,4.7,3.2,1.3,0.2,Iris-setosa
3,4.6,3.1,1.5,0.2,Iris-setosa
4,5.0,3.6,1.4,0.2,Iris-setosa
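
If `iris.csv` is not at hand, a similar frame can be built from the copy of Iris bundled with scikit-learn. This is a minimal sketch: the name `df_alt` and the `'Iris-'` label prefix are assumptions chosen to mirror the frame above, and the bundled copy may differ from a given CSV in a few values.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load the copy of Iris that ships with scikit-learn (no CSV file needed).
iris = load_iris()

# Column names chosen to mirror the ones used in this notebook.
feature_names = ['sepal length', 'sepal width', 'petal length', 'petal width']
df_alt = pd.DataFrame(iris.data, columns=feature_names)

# Map integer targets (0, 1, 2) to species names, prefixed like the CSV labels.
df_alt['class'] = ['Iris-' + str(iris.target_names[t]) for t in iris.target]

print(df_alt.shape)
print(df_alt['class'].value_counts())  # quick check of class balance
```

The `value_counts()` call doubles as a sanity check that the three species are equally represented before any splitting.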


## Exploratory Data Analysis

In [7]:
df.shape

(150, 5)

In [8]:
Y=df.iloc[:, -1:]
Y.head()

Unnamed: 0,class
0,Iris-setosa
1,Iris-setosa
2,Iris-setosa
3,Iris-setosa
4,Iris-setosa


In [9]:
X=df.iloc[:, 0:-1]
X.head()

Unnamed: 0,sepal length,sepal width,petal length,petal width
0,5.1,3.5,1.4,0.2
1,4.9,3.0,1.4,0.2
2,4.7,3.2,1.3,0.2
3,4.6,3.1,1.5,0.2
4,5.0,3.6,1.4,0.2


## Dataset Splitting

In [10]:
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=42, test_size=0.33)
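
The split above is purely random, so class proportions in the test set can drift from the overall 1/3 each (the confusion matrix later in this notebook has row totals of 19, 15, and 16). One possible variation, not used in this notebook, is to pass `stratify`. The sketch below uses scikit-learn's bundled Iris copy and hypothetical variable names (`X_demo`, `y_demo`, `X_tr`, and so on).

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: the bundled Iris copy stands in for iris.csv.
X_demo, y_demo = load_iris(return_X_y=True)

# stratify=y_demo asks for (approximately) equal class proportions
# in the training and test partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=42, stratify=y_demo
)

test_counts = Counter(y_te.tolist())
print(len(y_te), test_counts)
```

With a stratified split, per-class test counts should differ by at most one sample.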

## Data Scaling

In [14]:
scaler=MinMaxScaler()
X_train_scale=scaler.fit_transform(X_train)
X_test_scale=scaler.transform(X_test)
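
To make the scaling step concrete, the sketch below reproduces `MinMaxScaler` by hand with the formula x' = (x - min) / (max - min), where min and max come from the training data only. The arrays are toy values for illustration, not taken from this notebook's split.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy values for illustration only.
train = np.array([[5.0, 1.0], [6.0, 3.0], [7.0, 5.0]])
test = np.array([[8.0, 0.0]])

# Fit on the training data only, as in the cell above.
scaler_demo = MinMaxScaler().fit(train)

# Manual equivalent using the training min and max per column.
col_min, col_max = train.min(axis=0), train.max(axis=0)
manual = (test - col_min) / (col_max - col_min)

print(scaler_demo.transform(test))
print(manual)
```

Because the test row lies outside the training range, its scaled values fall outside [0, 1]. This is expected when the scaler is fitted on the training split alone, which is what keeps test information out of preprocessing.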

## Model Development

In [19]:
model = GaussianNB()
# Y_train is a one-column DataFrame; flatten it to 1-D so scikit-learn
# does not emit a DataConversionWarning
model.fit(X_train_scale, Y_train.values.ravel())


In [22]:
Y_pred=model.predict(X_test_scale)
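
A fitted `GaussianNB` keeps per-class priors and per-class feature means, and can return posterior probabilities rather than only hard labels. The sketch below shows how these might be inspected. It assumes scikit-learn's bundled Iris copy in place of `iris.csv`, and the names `nb`, `scaler_demo`, and `proba` are illustrative, so the numbers need not match this notebook's run exactly.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import MinMaxScaler

# Assumption: the bundled Iris copy stands in for iris.csv.
X_demo, y_demo = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=42
)

scaler_demo = MinMaxScaler().fit(X_tr)
nb = GaussianNB().fit(scaler_demo.transform(X_tr), y_tr)

print(nb.classes_)      # class labels seen during fit
print(nb.class_prior_)  # class frequencies in the training set
print(nb.theta_.shape)  # per-class feature means: (n_classes, n_features)

# Posterior probabilities for the first three test samples; each row sums to 1.
proba = nb.predict_proba(scaler_demo.transform(X_te[:3]))
print(proba.round(3))
```

Looking at `predict_proba` output can help spot borderline samples, for example versicolor and virginica flowers whose posteriors are close.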

## Model Evaluation

In [35]:
skm.confusion_matrix(Y_test, Y_pred)

array([[19,  0,  0],
       [ 0, 14,  1],
       [ 0,  1, 15]], dtype=int64)
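
The raw array is easier to read with labels attached. In scikit-learn's `confusion_matrix`, rows are true classes and columns are predicted classes, ordered by sorted label by default. The sketch below wraps the matrix reported above in a DataFrame; the label strings assume the usual `Iris-` prefixed class names, and `cm_df` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Confusion matrix copied from the output above.
cm = np.array([[19, 0, 0],
               [0, 14, 1],
               [0, 1, 15]])

# Assumption: classes in sorted order, as scikit-learn orders them by default.
labels = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica']

cm_df = pd.DataFrame(
    cm,
    index=[f'true {name}' for name in labels],
    columns=[f'pred {name}' for name in labels],
)
print(cm_df)
```

Read this way, setosa is classified without error, while one versicolor sample is predicted as virginica and one virginica sample as versicolor.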

In [25]:
skm.accuracy_score(Y_test, Y_pred)

0.96

In [29]:
skm.precision_score(Y_test, Y_pred, average='micro')

0.96

In [30]:
skm.recall_score(Y_test, Y_pred, average='micro')

0.96

In [31]:
skm.f1_score(Y_test, Y_pred, average='micro')

0.96
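
All four scores above are 0.96 because, for single-label multiclass problems, micro-averaged precision, recall, and F1 are all equal to accuracy. Per-class and macro-averaged values carry more information. The sketch below derives them with NumPy from the confusion matrix reported earlier; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix copied from the Model Evaluation output
# (rows = true class, columns = predicted class).
cm = np.array([[19, 0, 0],
               [0, 14, 1],
               [0, 1, 15]])

tp = np.diag(cm)                 # correct predictions per class
precision = tp / cm.sum(axis=0)  # column sums = predicted count per class
recall = tp / cm.sum(axis=1)     # row sums = true count per class
f1 = 2 * precision * recall / (precision + recall)

accuracy = tp.sum() / cm.sum()
# Micro averaging pools all classes: total TP over total predictions.
micro_precision = tp.sum() / cm.sum(axis=0).sum()

print('per-class precision:', precision.round(4))
print('per-class recall:   ', recall.round(4))
print('macro F1:', f1.mean().round(4))
print('accuracy:', accuracy, 'micro precision:', micro_precision)
```

In scikit-learn, the same per-class figures can be obtained by passing `average=None` (or `average='macro'` for their unweighted mean) to `precision_score`, `recall_score`, and `f1_score`.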