## Iris Flower Classification through Logistic Regression

Iris flower classification is a well-known problem in machine learning: distinguishing between species of iris flowers based on their measurements. The dataset, introduced by Ronald Fisher in 1936, contains sepal and petal lengths and widths for three species: setosa, versicolor, and virginica. Fisher used it to illustrate linear discriminant analysis, and it has since become a common benchmark for comparing and evaluating classification algorithms.

This notebook tackles the problem with logistic regression, a popular classification algorithm that models the relationship between the input features and the probability of each class. Here the model is trained to predict the species of an iris flower from its four measurements. Its parameters are fitted by maximum likelihood estimation (scikit-learn adds L2 regularization by default). The simplicity and interpretability of logistic regression make it a suitable choice for this problem, and it still achieves good accuracy.
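To make "modelling the probability of each class" concrete, here is a minimal sketch of how a multinomial logistic regression turns linear scores into probabilities with the softmax function. The helper `softmax`, the weight matrix `W`, and the bias `b` are illustrative assumptions with made-up values; they are not parameters learned from the Iris data.

```python
import numpy as np

def softmax(scores):
    """Convert a vector of raw class scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

# Hypothetical parameters: 3 classes x 4 features (made up for illustration)
W = np.array([
    [0.4, 0.9, -2.3, -1.0],
    [0.5, -0.3, -0.2, -0.8],
    [-0.9, -0.6, 2.5, 1.8],
])
b = np.array([9.0, 2.0, -11.0])

# One flower: sepal length, sepal width, petal length, petal width (cm)
features = np.array([5.1, 3.5, 1.4, 0.2])

scores = W @ features + b            # one linear score per class
probabilities = softmax(scores)      # one probability per class
predicted_class = int(np.argmax(probabilities))  # index of the most likely class
print(probabilities, predicted_class)
```

Training amounts to choosing `W` and `b` so that the probabilities assigned to the true species of the training flowers are as high as possible.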

Because the dataset is small, clean, and widely available, it is a popular starting point for experimenting with classification algorithms, which is how it is used in this notebook.

### Importing required libraries

In [1]:
# Importing required libraries for data manipulation
import numpy as np
import pandas as pd

# Importing required libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns

import warnings
warnings.filterwarnings(action = "ignore")

### Reading the CSV File

In [2]:
data = pd.read_csv("Iris.csv")
data

| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
| ... | ... | ... | ... | ... | ... | ... |
| 145 | 146 | 6.7 | 3.0 | 5.2 | 2.3 | Iris-virginica |
| 146 | 147 | 6.3 | 2.5 | 5.0 | 1.9 | Iris-virginica |
| 147 | 148 | 6.5 | 3.0 | 5.2 | 2.0 | Iris-virginica |
| 148 | 149 | 6.2 | 3.4 | 5.4 | 2.3 | Iris-virginica |


### Exploratory Data Analysis

In [3]:
# Getting rid of the unwanted column 

data.drop("Id", axis = 1, inplace = True)
data.head()

| | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |


In [4]:
data.shape

(150, 5)

In [5]:
data.describe()

| | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm |
|---|---|---|---|---|
| count | 150.0 | 150.0 | 150.0 | 150.0 |
| mean | 5.843333 | 3.054 | 3.758667 | 1.198667 |
| std | 0.828066 | 0.433594 | 1.76442 | 0.763161 |
| min | 4.3 | 2.0 | 1.0 | 0.1 |
| 25% | 5.1 | 2.8 | 1.6 | 0.3 |
| 50% | 5.8 | 3.0 | 4.35 | 1.3 |
| 75% | 6.4 | 3.3 | 5.1 | 1.8 |
| max | 7.9 | 4.4 | 6.9 | 2.5 |


In [6]:
data.isnull().sum()
# Here we can see that there are no null values in the data so we can move further with processing

SepalLengthCm    0
SepalWidthCm     0
PetalLengthCm    0
PetalWidthCm     0
Species          0
dtype: int64

In [7]:
# Counting how many samples belong to each species
data['Species'].value_counts()

Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: Species, dtype: int64
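As a small extension of the exploration above, per-species feature means give a quick feel for how well the measurements separate the classes. The sketch below is only illustrative: the helper name `species_means` is hypothetical, and the tiny `sample` frame reuses a handful of rows from the table printed earlier rather than the full dataset.

```python
import pandas as pd

def species_means(df, label_col="Species"):
    """Return the mean of every numeric feature, grouped by species."""
    return df.groupby(label_col).mean(numeric_only=True)

# A few rows copied from the table printed earlier in this notebook
sample = pd.DataFrame(
    {
        "SepalLengthCm": [5.1, 4.9, 4.7, 6.7, 6.3, 6.5],
        "SepalWidthCm": [3.5, 3.0, 3.2, 3.0, 2.5, 3.0],
        "PetalLengthCm": [1.4, 1.4, 1.3, 5.2, 5.0, 5.2],
        "PetalWidthCm": [0.2, 0.2, 0.2, 2.3, 1.9, 2.0],
        "Species": ["Iris-setosa"] * 3 + ["Iris-virginica"] * 3,
    }
)

summary = species_means(sample)
print(summary)
```

On the full data, the same call would be `species_means(data)`; large gaps between the petal means of different species suggest that those features carry most of the signal.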

### Creating the independent and dependent variables

In [8]:
x = data.drop("Species", axis =1)
y = data["Species"]

In [9]:
# Splitting the data into training (80%) and test (20%) sets for modelling

from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size = 0.2, random_state = 42)

In [10]:
x_train.shape

(120, 4)

In [11]:
y_train.shape

(120,)

In [12]:
print(x_test.shape)
print(y_test.shape)

(30, 4)
(30,)
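The split above is purely random, so the three species need not be equally represented in the 30 test rows. If balanced subsets are wanted, `train_test_split` accepts a `stratify` argument. The sketch below is an alternative, not the split used in this notebook, and it assumes scikit-learn's bundled copy of the dataset (`load_iris`) in place of `Iris.csv`; the variable names are illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: the bundled dataset stands in for Iris.csv
iris = load_iris(as_frame=True)
features = iris.data
labels = iris.target  # integer codes 0, 1, 2 for the three species

# stratify=labels keeps the class proportions the same in both subsets
x_tr, x_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.2, random_state=42, stratify=labels
)

# Number of test samples per class, ordered by class code
test_counts = y_te.value_counts().sort_index().tolist()
print(x_tr.shape, x_te.shape, test_counts)
```

With 50 flowers per species and a 20% test size, stratification places the same number of flowers from each species in the test set.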


### Model Building

In [13]:
# Importing the required algorithm for data modelling

from sklearn.linear_model import LogisticRegression
regressor = LogisticRegression()                            # creating an instance of the classifier (despite the variable name, this is not a regressor)

In [14]:
# Now we fit the model on the training data

regressor.fit(x_train,y_train)

LogisticRegression()
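Once fitted, the model exposes what it has learned. The sketch below shows one way to inspect the learned weights and the per-class probabilities. It assumes scikit-learn's bundled dataset (`load_iris`) instead of `Iris.csv`, fits on all 150 rows for brevity, and raises `max_iter` above the default as a precaution; none of these choices come from the notebook itself.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Assumption: the bundled dataset stands in for Iris.csv
X, y = load_iris(return_X_y=True)

# max_iter is raised above the default as a precaution against convergence warnings
clf = LogisticRegression(max_iter=1000)
clf.fit(X, y)

print(clf.classes_)          # class labels, in the order used by coef_ and predict_proba
print(clf.coef_.shape)       # one row of weights per class, one column per feature
print(clf.intercept_.shape)  # one bias term per class

# predict_proba returns one probability per class for each sample
proba = clf.predict_proba(X[:5])
print(np.round(proba, 3))
```

In the notebook, the same attributes are available on `regressor` after the `fit` call, with the species names as class labels.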

In [15]:
# Predicting the species for the test set
y_predict = regressor.predict(x_test)

In [16]:
y_predict

array(['Iris-versicolor', 'Iris-setosa', 'Iris-virginica',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-setosa',
       'Iris-versicolor', 'Iris-virginica', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-virginica', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-virginica',
       'Iris-versicolor', 'Iris-versicolor', 'Iris-virginica',
       'Iris-setosa', 'Iris-virginica', 'Iris-setosa', 'Iris-virginica',
       'Iris-virginica', 'Iris-virginica', 'Iris-virginica',
       'Iris-virginica', 'Iris-setosa', 'Iris-setosa'], dtype=object)

In [17]:
# Predicting on the training set as well, so training accuracy can be compared with test accuracy
y_pred = regressor.predict(x_train)
y_pred

array(['Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-setosa',
       'Iris-setosa', 'Iris-virginica', 'Iris-versicolor', 'Iris-setosa',
       'Iris-setosa', 'Iris-setosa', 'Iris-virginica', 'Iris-versicolor',
       'Iris-versicolor', 'Iris-setosa', 'Iris-setosa', 'Iris-versicolor',
       'Iris-virginica', 'Iris-virginica', 'Iris-versicolor',
       'Iris-virginica', 'Iris-versicolor', 'Iris-virginica',
       'Iris-versicolor', 'Iris-setosa', 'Iris-virginica',
       'Iris-versicolor', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',
       'Iris-versicolor', 'Iris-virginica', 'Iris-setosa', 'Iris-setosa',
       'Iris-setosa', 'Iris-versicolor', 'Iris-setosa', 'Iris-versicolor',
       'Iris-virginica', 'Iris-setosa', 'Iris-versicolor',
       'Iris-virginica', 'Iris-setosa', 'Iris-virginica',
       'Iris-virginica', 'Iris-versicolor', 'Iris-versicolor',
       'Iris-virginica', 'Iris-versicolor', 'Iris-setosa',
       'Iris-versicolor', 'Iris-virginica', 'Iris-setosa', 'Iris

### Testing Accuracy of the Model

In [18]:
# Checking the accuracy of the model

from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score


print("Training Accuracy of the model is", accuracy_score(y_train,y_pred))
print()
print("Testing Accuracy of the model is", accuracy_score(y_test,y_predict))
print()
print("The confusion matrix for the model is",confusion_matrix(y_test, y_predict))
print()
print(classification_report(y_test, y_predict))

Training Accuracy of the model is 0.975

Testing Accuracy of the model is 1.0

The confusion matrix for the model is [[10  0  0]
 [ 0  9  0]
 [ 0  0 11]]

                 precision    recall  f1-score   support

    Iris-setosa       1.00      1.00      1.00        10
Iris-versicolor       1.00      1.00      1.00         9
 Iris-virginica       1.00      1.00      1.00        11

       accuracy                           1.00        30
      macro avg       1.00      1.00      1.00        30
   weighted avg       1.00      1.00      1.00        30
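The figures in the classification report can be recomputed by hand from the confusion matrix, which makes it clearer what each column means. The sketch below copies the matrix printed above and uses plain NumPy; the variable names are illustrative.

```python
import numpy as np

# Confusion matrix from the output above: rows are true classes, columns are predictions
cm = np.array([
    [10, 0, 0],
    [0, 9, 0],
    [0, 0, 11],
])

true_positives = np.diag(cm)
precision = true_positives / cm.sum(axis=0)  # correct / everything predicted as the class
recall = true_positives / cm.sum(axis=1)     # correct / everything truly in the class
f1 = 2 * precision * recall / (precision + recall)
accuracy = true_positives.sum() / cm.sum()
support = cm.sum(axis=1)                     # number of true samples per class

print(precision, recall, f1, accuracy, support)
```

Because every off-diagonal entry is zero, all 30 test flowers were classified correctly. With a test set this small, a perfect score is still only a rough estimate of how the model would perform on new flowers.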



In [19]:
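# Predicting the species for a new, unseen flower
# (sepal length, sepal width, petal length, petal width, all in cm)
d = pd.DataFrame([[4.9, 3.5, 1.1, 0.4]], columns=x.columns)  # keep the feature names used in training
pred = regressor.predict(d)
pred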

array(['Iris-setosa'], dtype=object)