# Iris Flower Classification

This notebook demonstrates how to classify iris flowers into three species (setosa, versicolor, virginica) using their sepal and petal measurements. The workflow includes:
- Data loading and exploration
- Data visualization
- Data preprocessing (label encoding and train/test split)
- Model training (Random Forest)
- Model evaluation (accuracy, classification report, confusion matrix)

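The Iris dataset is a classic dataset for introductory machine learning and data visualization tasks. In its standard form it holds 150 samples, 50 per species, each described by four measurements: sepal length, sepal width, petal length, and petal width.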

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

In [None]:
# Load the dataset
iris = pd.read_csv('IRIS.csv')

# Data overview
print("First 5 rows of the dataset:")
display(iris.head())
print("\nDataset info:")
iris.info()  # info() prints its own summary and returns None, so wrapping it in display() would also show "None"
print("\nClass distribution:")
display(iris['species'].value_counts())
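
If `IRIS.csv` is not at hand, a similar frame can probably be built from the copy of the dataset that ships with scikit-learn. The sketch below is an assumption about what the CSV looks like: the snake_case column names are taken from the later cells of this notebook, the name `iris_alt` is hypothetical, and the species labels in your CSV may be spelled differently (for example with an `Iris-` prefix).

```python
import pandas as pd
from sklearn.datasets import load_iris

# Load scikit-learn's bundled copy of the Iris data as a DataFrame.
bunch = load_iris(as_frame=True)
iris_alt = bunch.frame.copy()  # hypothetical name; does not overwrite `iris`

# Rename columns to the snake_case names the rest of this notebook assumes.
iris_alt.columns = ['sepal_length', 'sepal_width', 'petal_length',
                    'petal_width', 'target']

# Replace the integer target with readable species names.
code_to_name = {code: str(name) for code, name in enumerate(bunch.target_names)}
iris_alt['species'] = iris_alt['target'].map(code_to_name)
iris_alt = iris_alt.drop(columns='target')

print(iris_alt.shape)
print(iris_alt['species'].value_counts())
```

Keeping the result in a separate variable makes it easy to compare against the CSV before swapping one for the other.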

In [None]:
# --- Visualizations ---
plt.figure(figsize=(8, 6))
sns.countplot(x='species', data=iris)
plt.title('Count of Each Iris Species')
plt.xlabel('Species')
plt.ylabel('Count')
plt.show()

# Pairplot for all features colored by species
sns.pairplot(iris, hue='species', diag_kind='hist')
plt.suptitle('Pairplot of Features by Species', y=1.02)
plt.show()

# Boxplot for each feature by species
features = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width']
plt.figure(figsize=(12, 8))
for i, feature in enumerate(features, 1):
    plt.subplot(2, 2, i)
    sns.boxplot(x='species', y=feature, data=iris)
    # Turn 'sepal_length' into 'Sepal Length' for a readable subplot title
    plt.title(f"{feature.replace('_', ' ').title()} by Species")
plt.tight_layout()
plt.show()

# Correlation heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(iris[features].corr(), annot=True, cmap='coolwarm', fmt='.2f')
plt.title('Feature Correlation Heatmap')
plt.show()

In [None]:
# --- Preprocess the Data ---
X = iris.drop('species', axis=1)
y = iris['species']

le = LabelEncoder()
y_encoded = le.fit_transform(y)
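
To see what the encoding step does, here is a minimal standalone sketch. The `labels` list is made-up illustration data standing in for the `species` column, and `encoder`, `codes`, and `mapping` are hypothetical names. `LabelEncoder` sorts the distinct labels and assigns each one its index in that sorted order.

```python
from sklearn.preprocessing import LabelEncoder

# Made-up labels standing in for the 'species' column.
labels = ['virginica', 'setosa', 'versicolor', 'setosa']

encoder = LabelEncoder()
codes = encoder.fit_transform(labels)

# classes_ is sorted, so each integer code is an index into that array.
mapping = {str(name): code for code, name in enumerate(encoder.classes_)}
print(mapping)         # {'setosa': 0, 'versicolor': 1, 'virginica': 2}
print(codes.tolist())  # [2, 0, 1, 0]

# inverse_transform recovers the original string labels from the codes.
print(encoder.inverse_transform(codes).tolist())
```

The same `inverse_transform` call can be applied to model predictions to turn integer outputs back into species names.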

In [None]:
# --- Split the Dataset ---
X_train, X_test, y_train, y_test = train_test_split(X, y_encoded, test_size=0.2, random_state=42)
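
The split above is purely random, so the test set is not guaranteed to contain the three species in equal numbers. One option, shown here as a sketch rather than a change to this notebook's method, is the `stratify` argument of `train_test_split`. The example uses scikit-learn's bundled Iris data and hypothetical variable names so that it runs on its own.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Bundled Iris data: 150 rows, 50 per class (an assumption about IRIS.csv too).
X_demo, y_demo = load_iris(return_X_y=True)

# stratify=y_demo keeps the class proportions the same in both subsets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

# Count how many samples of each class ended up in each subset.
print("train:", np.bincount(y_tr))
print("test: ", np.bincount(y_te))
```

With balanced classes, stratification mainly makes the per-class metrics in the evaluation step easier to compare, since every class has the same support.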

In [None]:
# --- Train Random Forest Classifier ---
clf = RandomForestClassifier(random_state=42)
clf.fit(X_train, y_train)
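
A fitted random forest exposes `feature_importances_`, which gives a rough indication of how much each measurement contributed to the trees' splits. The sketch below fits a separate model on scikit-learn's bundled Iris data so it is self-contained; `model` and `importances` are hypothetical names, and the values will differ from those of `clf` trained on `IRIS.csv`.

```python
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

data = load_iris(as_frame=True)

# Same estimator settings as in this notebook, fitted on the bundled data.
model = RandomForestClassifier(random_state=42)
model.fit(data.data, data.target)

# Impurity-based importances; they are normalized to sum to 1.
importances = pd.Series(model.feature_importances_, index=data.data.columns)
print(importances.sort_values(ascending=False))
```

For the notebook's own model, the equivalent would be `pd.Series(clf.feature_importances_, index=X.columns)`.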

In [None]:
# --- Evaluate the Model ---
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred, target_names=le.classes_)
print(f"Accuracy: {accuracy:.2f}")
print("\nClassification Report:\n", report)
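
A single 80/20 split leaves only a small test set (about 30 rows if `IRIS.csv` holds the usual 150), so the reported accuracy can shift noticeably with a different `random_state`. Cross-validation is one common way to get a steadier estimate. The following is a sketch on scikit-learn's bundled data with hypothetical names, not part of the original workflow.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X_cv, y_cv = load_iris(return_X_y=True)

# Five stratified folds: each fold serves once as the held-out set.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(RandomForestClassifier(random_state=42), X_cv, y_cv, cv=cv)

# One accuracy value per fold, plus their mean and spread.
print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```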

In [None]:
# --- Visualize Confusion Matrix ---
cm = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(6, 5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=le.classes_, yticklabels=le.classes_)
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

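## Model Results
The accuracy and classification report printed above summarize performance on the held-out test set (20% of the rows). In the confusion matrix, each row is an actual species and each column a predicted species: diagonal cells count correct predictions, and off-diagonal cells show which species were confused with one another.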
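
As a small worked example of how accuracy and the per-class figures in the classification report follow from a confusion matrix, the sketch below uses made-up labels (`y_true` and `y_hat` are hypothetical, not results from this notebook).

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up actual and predicted class codes for eight samples.
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_hat = [0, 0, 1, 2, 1, 2, 2, 1]

# Rows are actual classes, columns are predicted classes.
cm_demo = confusion_matrix(y_true, y_hat)
print(cm_demo)
# [[2 0 0]
#  [0 2 1]
#  [0 1 2]]

# Accuracy: correct predictions (the diagonal) over all samples.
accuracy_demo = np.trace(cm_demo) / cm_demo.sum()

# Recall: diagonal over row sums. Precision: diagonal over column sums.
recall_demo = np.diag(cm_demo) / cm_demo.sum(axis=1)
precision_demo = np.diag(cm_demo) / cm_demo.sum(axis=0)

print(accuracy_demo)  # 0.75
print(recall_demo)
print(precision_demo)
```

Here class 0 is never confused with the others, while classes 1 and 2 are each mistaken for the other once, which lowers both their recall and precision to 2/3.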