<h1 style="
  font-size: 2.5em; 
  color: #fdf6e3; /* Solarized base2 (light text) */
  background-color: #002b36; /* Solarized base03 (dark background) */
  text-align: center; 
  padding: 20px; 
  border-radius: 15px; 
  box-shadow: 0 6px 12px rgba(0, 0, 0, 0.3); 
  font-family: 'Verdana', sans-serif; 
  margin: 0; 
">
  Iris Flower Classification
</h1>

![](https://media.licdn.com/dms/image/v2/D563DAQHsoI-UE2dm9w/image-scale_191_1128/image-scale_191_1128/0/1693071350174/cognoriseinfotech_cover?e=2147483647&v=beta&t=u92lJoSjZnd56KvUWiJmsf7WVZA_-C2uxR-6a6-Q8eE)
<p style="
  font-size: 1.2em; 
  color: black; /* dark text */
  font-family: 'Verdana', sans-serif; 
  text-align: center; 
  margin: 20px 0;
">
  The Iris flower dataset covers three species: <strong>Setosa</strong>, <strong>Versicolor</strong>, and <strong>Virginica</strong>. Each sample records four measurements (sepal length, sepal width, petal length, and petal width) that together distinguish the species.
</p>

<ul style="
  font-size: 1.1em; 
  color: black; /* dark text */
  font-family: 'Verdana', sans-serif; 
  text-align: center; 
  margin: 20px 0; 
  list-style-type: disc; 
  padding: 0 20px;
">
  <li><strong>Objective:</strong> Train a machine learning model that classifies Iris flowers into their species from these measurements.</li>
  <li><strong>Approach:</strong> Explore the Iris dataset, fit a random forest classifier on the sepal and petal measurements, and evaluate it on a held-out test set.</li>
  <li><strong>Importance:</strong> The dataset is small, clean, and widely used for introductory classification tasks, which makes it well suited to learning and experimentation.</li>
</ul>



<h2 style="
  font-size: 1.5em; 
  color: black; /* Dark text color */
  margin-left: 0; 
  margin-top: 20px; 
  font-family: 'Verdana', sans-serif; 
  border-bottom: 5px solid #002b36; /* Dark background color for underline effect */
  padding-bottom: 5px; 
">
  <b>Importing Necessary Libraries</b>
</h2>


In [4]:
# pip install pandas plotly scikit-learn
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
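# Explicit imports instead of a wildcard; roc_curve and roc_auc_score are only
# needed by the optional (commented-out) ROC section in the evaluation cell
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    roc_auc_score,
    roc_curve,
)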


<h2 style="
  font-size: 1.5em; 
  color: black; /* Dark text color */
  margin-left: 0; 
  margin-top: 20px; 
  font-family: 'Verdana', sans-serif; 
  border-bottom: 5px solid #002b36; /* Dark background color for underline effect */
  padding-bottom: 5px; 
">
  <b>Dataset Loading</b>
</h2>


In [5]:
# Load the dataset from a local CSV file

dataset_path_drive = "IRIS.csv"
df = pd.read_csv(dataset_path_drive)
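
If `IRIS.csv` is not at hand, a frame with the same layout can be built from the copy of the dataset bundled with scikit-learn. This is a minimal sketch, not part of the original notebook: `load_iris_frame` and `df_fallback` are hypothetical names, the column names and the `Iris-` label prefix are assumed from the outputs shown in this notebook, and a few values in the bundled copy may differ slightly from the CSV.

```python
import pandas as pd
from sklearn.datasets import load_iris


def load_iris_frame() -> pd.DataFrame:
    """Build a DataFrame shaped like IRIS.csv from scikit-learn's bundled data."""
    iris = load_iris()
    # Column names assumed to match the CSV used in this notebook.
    columns = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    frame = pd.DataFrame(iris.data, columns=columns)
    # Map integer targets to names and add the "Iris-" prefix seen in the CSV.
    frame["species"] = ["Iris-" + iris.target_names[i] for i in iris.target]
    return frame


df_fallback = load_iris_frame()
print(df_fallback.shape)
print(df_fallback["species"].unique())
```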


<h2 style="
  font-size: 1.5em; 
  color: black; /* Dark text color */
  margin-left: 0; 
  margin-top: 20px; 
  font-family: 'Verdana', sans-serif; 
  border-bottom: 5px solid #002b36; /* Dark background color for underline effect */
  padding-bottom: 5px; 
">
  <b>Exploratory Data Analysis</b>
</h2>


In [6]:
# Preview the first five rows
df.head()

   sepal_length  sepal_width  petal_length  petal_width      species
0           5.1          3.5           1.4          0.2  Iris-setosa
1           4.9          3.0           1.4          0.2  Iris-setosa
2           4.7          3.2           1.3          0.2  Iris-setosa
3           4.6          3.1           1.5          0.2  Iris-setosa
4           5.0          3.6           1.4          0.2  Iris-setosa


In [7]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   sepal_length  150 non-null    float64
 1   sepal_width   150 non-null    float64
 2   petal_length  150 non-null    float64
 3   petal_width   150 non-null    float64
 4   species       150 non-null    object 
dtypes: float64(4), object(1)
memory usage: 6.0+ KB


In [8]:
df.describe().T

              count      mean       std  min  25%   50%  75%  max
sepal_length  150.0  5.843333  0.828066  4.3  5.1  5.80  6.4  7.9
sepal_width   150.0  3.054000  0.433594  2.0  2.8  3.00  3.3  4.4
petal_length  150.0  3.758667  1.764420  1.0  1.6  4.35  5.1  6.9
petal_width   150.0  1.198667  0.763161  0.1  0.3  1.30  1.8  2.5


In [9]:
df.isnull().sum()/df.shape[0]*100

sepal_length    0.0
sepal_width     0.0
petal_length    0.0
petal_width     0.0
species         0.0
dtype: float64

In [10]:
df.shape

(150, 5)

In [11]:
df.columns.to_list()

['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']

In [12]:
df.dtypes

sepal_length    float64
sepal_width     float64
petal_length    float64
petal_width     float64
species          object
dtype: object

In [13]:
df.nunique()

sepal_length    35
sepal_width     23
petal_length    43
petal_width     22
species          3
dtype: int64

In [14]:
df.duplicated().sum()

3

In [15]:
#drop duplicates
df.drop_duplicates(inplace=True)

In [16]:
df.duplicated().sum()

0
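
Before dropping duplicates it can help to look at the rows involved. The sketch below uses a small made-up frame (`toy`, with illustrative values that are not taken from the Iris data) to show how `duplicated(keep=False)` lists every copy in a duplicate group, and how `reset_index(drop=True)` restores a contiguous index after `drop_duplicates`, which the cell above leaves with gaps.

```python
import pandas as pd

# Toy frame standing in for the Iris data (values are illustrative only).
toy = pd.DataFrame(
    {
        "sepal_length": [4.9, 4.9, 5.8, 5.8, 5.1],
        "petal_length": [1.5, 1.5, 5.1, 5.1, 1.4],
        "species": [
            "Iris-setosa",
            "Iris-setosa",
            "Iris-virginica",
            "Iris-virginica",
            "Iris-setosa",
        ],
    }
)

# keep=False marks every member of a duplicate group, not just the later copies.
all_copies = toy[toy.duplicated(keep=False)]
print(all_copies)

# drop_duplicates keeps the first copy of each group;
# reset_index(drop=True) renumbers the remaining rows from 0.
deduped = toy.drop_duplicates().reset_index(drop=True)
print(len(toy), "->", len(deduped))
```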

<h2 style="
  font-size: 1.5em; 
  color: black; /* Dark text color */
  margin-left: 0; 
  margin-top: 20px; 
  font-family: 'Verdana', sans-serif; 
  border-bottom: 5px solid #002b36; /* Dark background color for underline effect */
  padding-bottom: 5px; 
">
  <b>Plotting</b>
</h2>


In [23]:

# Define dimensions for scatter plot matrix
dimensions = [
    dict(label='Sepal Length', values=df['sepal_length']),
    dict(label='Sepal Width', values=df['sepal_width']),
    dict(label='Petal Length', values=df['petal_length']),
    dict(label='Petal Width', values=df['petal_width'])
]

# Create scatter plot matrix
fig = go.Figure(data=go.Splom(
    dimensions=dimensions,
    text=df['species'],  # Text labels for each point
    marker=dict(
        color=df['species'].astype('category').cat.codes,  # Use category codes for coloring
        colorscale='Viridis',  # Change colorscale if desired
        showscale=True
    )
))

# Update layout
fig.update_layout(
    title="Scatter Matrix of Iris Features",
    dragmode='select',
    autosize=True,
    xaxis=dict(
        showline=False,
        showgrid=False,
        zeroline=False
    ),
    yaxis=dict(
        showline=False,
        showgrid=False,
        zeroline=False
    )
)

# Show the figure
fig.show()


In [24]:


# Create subplots
fig = make_subplots(rows=1, cols=2, specs=[[{'type':'domain'}, {'type':'xy'}]])

# Pie chart
fig.add_trace(go.Pie(labels=df['species'].value_counts().index,
                     values=df['species'].value_counts().values,
                     textinfo='label+percent',
                     insidetextorientation='radial'),
              row=1, col=1)

# Histogram
fig.add_trace(go.Histogram(x=df['species'],
                           marker_color=['#1f77b4', '#ff7f0e', '#2ca02c']),
              row=1, col=2)

# Update layout for better presentation
fig.update_layout(title_text="Species Distribution",
                  xaxis_title="Species",
                  yaxis_title="Count",
                  showlegend=False)

fig.show()


<h2 style="
  font-size: 1.5em; 
  color: black; /* Dark text color */
  margin-left: 0; 
  margin-top: 20px; 
  font-family: 'Verdana', sans-serif; 
  border-bottom: 5px solid #002b36; /* Dark background color for underline effect */
  padding-bottom: 5px; 
">
  <b>Modelling: Random Forest</b>
</h2>


In [25]:
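# Separate the four measurement columns from the target label
x = df.drop('species', axis=1)
y = df['species']

# Hold out 20% of the rows for testing; random_state fixes the split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

# Fit a random forest with default hyperparameters and score it on the held-out rows
model = RandomForestClassifier()
model.fit(x_train, y_train)
y_pred = model.predict(x_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)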

Accuracy: 1.0
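
A perfect score on a test set of about 30 rows is a coarse estimate: each misclassified flower would move accuracy by roughly 3.3 percentage points. One way to get a steadier figure is stratified k-fold cross-validation. The sketch below is an assumption-laden illustration rather than the notebook's method: it uses scikit-learn's bundled copy of the data instead of the de-duplicated `df`, fixes `random_state=42` for repeatability, and the names `X_all`, `y_all`, `cv`, `clf`, and `scores` are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Bundled copy of the dataset (assumption: stands in for the notebook's CSV).
X_all, y_all = load_iris(return_X_y=True)

# Five stratified folds keep the class proportions the same in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
clf = RandomForestClassifier(random_state=42)

# One accuracy value per fold; the spread hints at how stable the estimate is.
scores = cross_val_score(clf, X_all, y_all, cv=cv, scoring="accuracy")
print("Fold accuracies:", scores)
print("Mean:", scores.mean(), "Std:", scores.std())
```

The mean across folds is usually a more cautious summary than a single split, since every row is used for testing exactly once.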


<h2 style="
  font-size: 1.5em; 
  color: black; /* Dark text color */
  margin-left: 0; 
  margin-top: 20px; 
  font-family: 'Verdana', sans-serif; 
  border-bottom: 5px solid #002b36; /* Dark background color for underline effect */
  padding-bottom: 5px; 
">
  <b>Evaluation</b>
</h2>


In [26]:
# Per-class precision, recall, and F1-score on the test set
print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      1.00      1.00         9
   virginica       1.00      1.00      1.00        11

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
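
The columns of the report follow directly from a confusion matrix. As a worked illustration, the sketch below starts from a hypothetical matrix `cm` whose row sums match the supports above (10, 9, 11) but whose off-diagonal errors are invented for the example; they are not results from this notebook, where every prediction was correct.

```python
import numpy as np

# Hypothetical confusion matrix: rows are true classes, columns are predictions.
# The off-diagonal counts are made up to show the arithmetic.
cm = np.array([
    [10, 0, 0],
    [0, 8, 1],
    [0, 2, 9],
])

true_positives = np.diag(cm)

# Precision: correct predictions of a class / all predictions of that class.
precision = true_positives / cm.sum(axis=0)
# Recall: correct predictions of a class / all true members of that class.
recall = true_positives / cm.sum(axis=1)
# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)
# Accuracy: all correct predictions / all samples.
overall_accuracy = true_positives.sum() / cm.sum()

print("precision:", precision.round(2))
print("recall:   ", recall.round(2))
print("f1:       ", f1.round(2))
print("accuracy: ", overall_accuracy)
```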



In [28]:

# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
fig_cm = ff.create_annotated_heatmap(conf_matrix, x=['Setosa', 'Versicolor', 'Virginica'],
                                     y=['Setosa', 'Versicolor', 'Virginica'], colorscale='Viridis')
fig_cm.update_layout(title_text='Confusion Matrix', xaxis_title='Predicted', yaxis_title='True')
fig_cm.show()

# # ROC Curve (One vs Rest for multiclass)
# from sklearn.preprocessing import label_binarize
# y_test_bin = label_binarize(y_test, classes=['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'])
# y_pred_proba = model.predict_proba(x_test)

# fig_roc = go.Figure()
# for i in range(3):
#     fpr, tpr, _ = roc_curve(y_test_bin[:, i], y_pred_proba[:, i])
#     auc_score = roc_auc_score(y_test_bin[:, i], y_pred_proba[:, i])
#     fig_roc.add_trace(go.Scatter(x=fpr, y=tpr, mode='lines',
#                                  name=f'Class {i+1} (AUC = {auc_score:.2f})'))

# fig_roc.update_layout(title_text='ROC Curve (One vs Rest)', xaxis_title='False Positive Rate', yaxis_title='True Positive Rate')
# fig_roc.show()

# Feature Importance Plot
feature_importances = model.feature_importances_
features = x.columns
fig_fi = px.bar(x=features, y=feature_importances, labels={'x':'Feature', 'y':'Importance'})
fig_fi.update_layout(title_text='Feature Importance')
fig_fi.show()
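
The `feature_importances_` values plotted above are impurity-based and computed on the training data. Permutation importance on held-out rows is a common cross-check. The following is a minimal sketch under stated assumptions, not the notebook's own analysis: it refits a forest on scikit-learn's bundled copy of the data with a stratified split, and the names `X_tr`, `X_te`, `y_tr`, `y_te`, `forest`, and `result` are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

iris = load_iris()
X_tr, X_te, y_tr, y_te = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42, stratify=iris.target
)
forest = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)

# Shuffle one feature at a time on the test rows and record the drop in accuracy.
result = permutation_importance(forest, X_te, y_te, n_repeats=10, random_state=42)

for name, mean, std in zip(
    iris.feature_names, result.importances_mean, result.importances_std
):
    print(f"{name:<20} {mean:.3f} +/- {std:.3f}")
```

A feature whose shuffling barely changes accuracy contributes little on its own, although correlated features (petal length and petal width here) can mask each other in this measure.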

