
# 🌸 Iris Classification – GridSearchCV & RandomizedSearchCV

This notebook demonstrates **end-to-end classification on the Iris dataset** using:

- Multiple ML algorithms
- **GridSearchCV**
- **RandomizedSearchCV**
- Proper **train / validation / test handling**
- Clear explanations at every step





## 1️⃣ Import Required Libraries
We import:
- Dataset utilities
- Model selection tools
- Evaluation metrics
- Classification algorithms


In [1]:

import numpy as np
import pandas as pd

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV, RandomizedSearchCV
from sklearn.metrics import accuracy_score

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB



## 2️⃣ Load the Iris Dataset
The Iris dataset contains:
- 150 samples
- 4 features
- 3 classes


In [2]:

iris = load_iris()
X = iris.data
y = iris.target

print("Classes:", iris.target_names)
print("Shape:", X.shape)


Classes: ['setosa' 'versicolor' 'virginica']
Shape: (150, 4)
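
To sanity-check the counts listed above, the dataset can also be viewed as a labelled table. This is a minimal sketch; the names `df` and `class_counts` are introduced here for illustration only and are not used elsewhere in the notebook.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Put the four measurements into a labelled table (illustrative name: df)
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target_names[iris.target]

# Count samples per class to check that the dataset is balanced
class_counts = df["species"].value_counts().sort_index()
print(df.shape)
print(class_counts)
```

A balanced dataset is what makes plain accuracy a reasonable metric for the rest of the notebook.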



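## 3️⃣ Train–Test Split (Very Important)
- **The test set is locked away**: 20% of the data is held out and used only for the final evaluation
- Validation happens inside the training set, via 5-fold cross-validation during the search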


In [3]:

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.2,
    random_state=42,
    stratify=y
)
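
The effect of `stratify=y` can be checked directly. With 150 balanced samples and `test_size=0.2`, each class should contribute equally to both splits. A minimal self-contained sketch (the names `train_counts` and `test_counts` are illustrative):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Per-class sample counts in each split
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print("train:", train_counts)
print("test: ", test_counts)
```

Note that the test set holds only 30 samples, so a single misclassification moves test accuracy by 1/30, or about 0.033.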



## 4️⃣ Define Models & Hyperparameters

For **each algorithm**, we define:
- A small grid for **GridSearchCV**
- A wider range for **RandomizedSearchCV**


In [4]:

# name -> (estimator, grid for GridSearchCV, search space for RandomizedSearchCV)
models = {
    "Logistic Regression": (
        LogisticRegression(max_iter=200),
        {"C": [0.1, 1, 10]},
        {"C": np.logspace(-2, 2, 20)}
    ),

    "KNN": (
        KNeighborsClassifier(),
        {"n_neighbors": [3, 5, 7, 9]},
        {"n_neighbors": np.arange(3, 15)}
    ),

    "Decision Tree": (
        DecisionTreeClassifier(random_state=42),
        {"max_depth": [2, 3, 4, 5]},
        {"max_depth": np.arange(2, 10),
         "min_samples_leaf": np.arange(1, 5)}
    ),

    "Random Forest": (
        RandomForestClassifier(random_state=42),
        {"n_estimators": [50, 100],
         "max_depth": [None, 3, 5]},
        {"n_estimators": np.arange(50, 200),
         "max_depth": [None, 3, 5, 7]}
    ),

    "SVM": (
        SVC(),
        {"C": [0.1, 1, 10],
         "gamma": [0.01, 0.1, 1],
         "kernel": ["rbf"]},
        {"C": np.logspace(-2, 2, 20),
         "gamma": np.logspace(-3, 1, 20),
         "kernel": ["rbf"]}
    ),

    "Naive Bayes": (
        GaussianNB(),
        {},
        {}
    )
}
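
The cost of each search can be estimated before running it. The sketch below uses `ParameterGrid` from `sklearn.model_selection` to count the combinations in the two SVM spaces defined above. The variable names are illustrative, and the fit counts exclude the final refit on the full training set.

```python
import numpy as np
from sklearn.model_selection import ParameterGrid

svm_grid = {"C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1], "kernel": ["rbf"]}
svm_random = {
    "C": np.logspace(-2, 2, 20),
    "gamma": np.logspace(-3, 1, 20),
    "kernel": ["rbf"],
}

n_grid = len(ParameterGrid(svm_grid))     # every combination is evaluated
n_space = len(ParameterGrid(svm_random))  # size of the space that is sampled
cv, n_iter = 5, 10                        # values used later in this notebook

print("GridSearchCV fits:      ", n_grid * cv)
print("RandomizedSearchCV fits:", n_iter * cv, "from a space of", n_space)
```

The grid search cost grows with the product of the list lengths, while the randomized search cost depends only on `n_iter` and `cv`.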



## 5️⃣ Apply GridSearchCV and RandomizedSearchCV

For every algorithm:
- GridSearchCV → Exhaustive: evaluates every combination in the small grid
- RandomizedSearchCV → Efficient: samples `n_iter=10` candidates from the wider space


In [5]:

results = []

for name, (model, grid_params, random_params) in models.items():

    # ----- Grid Search -----
    if grid_params:
        grid = GridSearchCV(model, grid_params, cv=5, scoring="accuracy")
        grid.fit(X_train, y_train)
        best_grid_model = grid.best_estimator_
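    else:
        # No hyperparameters to tune (GaussianNB here): fit the model directly
        model.fit(X_train, y_train)
        best_grid_model = model

    # Score the selected model once on the held-out test set
    grid_acc = accuracy_score(y_test, best_grid_model.predict(X_test))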
    results.append((name, "GridSearchCV", grid_acc))

    # ----- Randomized Search -----
    if random_params:
        random_search = RandomizedSearchCV(
            model,
            random_params,
            n_iter=10,
            cv=5,
            scoring="accuracy",
            random_state=42
        )
        random_search.fit(X_train, y_train)
        best_random_model = random_search.best_estimator_
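    else:
        # Nothing to sample: plain fit again, so both rows report the same model
        model.fit(X_train, y_train)
        best_random_model = model

    # Score the selected model once on the held-out test set
    random_acc = accuracy_score(y_test, best_random_model.predict(X_test))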
    results.append((name, "RandomizedSearchCV", random_acc))
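
The loop above keeps only the test accuracy, but a fitted search object also exposes the cross-validated scores that drove the selection. The sketch below repeats the KNN grid search on its own to show them. It is an illustration, not part of the original loop, and `cv_means` is an illustrative name.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

grid = GridSearchCV(
    KNeighborsClassifier(),
    {"n_neighbors": [3, 5, 7, 9]},
    cv=5,
    scoring="accuracy",
)
grid.fit(X_train, y_train)

# Validation results: computed on held-out folds of the training set only
print("best params:", grid.best_params_)
print("mean CV accuracy:", round(grid.best_score_, 3))

# One mean validation score per candidate
cv_means = grid.cv_results_["mean_test_score"]
for params, mean in zip(grid.cv_results_["params"], cv_means):
    print(params, round(mean, 3))
```

Despite its name, `mean_test_score` in `cv_results_` refers to the validation folds inside `X_train`, not to `X_test`.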



## 6️⃣ Final Results Comparison


In [6]:

results_df = pd.DataFrame(
    results,
    columns=["Algorithm", "Search Method", "Test Accuracy"]
)

results_df.sort_values(by="Test Accuracy", ascending=False)


| | Algorithm | Search Method | Test Accuracy |
|---|---|---|---|
| 3 | KNN | RandomizedSearchCV | 1.0 |
| 2 | KNN | GridSearchCV | 1.0 |
| 1 | Logistic Regression | RandomizedSearchCV | 0.966667 |
| 0 | Logistic Regression | GridSearchCV | 0.966667 |
| 6 | Random Forest | GridSearchCV | 0.966667 |
| 11 | Naive Bayes | RandomizedSearchCV | 0.966667 |
| 10 | Naive Bayes | GridSearchCV | 0.966667 |
| 4 | Decision Tree | GridSearchCV | 0.933333 |
| 7 | Random Forest | RandomizedSearchCV | 0.933333 |
| 5 | Decision Tree | RandomizedSearchCV | 0.933333 |
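
Because the test set holds 30 samples (20% of 150), every accuracy in the table above is a multiple of 1/30. The short sketch below converts the three distinct values back into misclassification counts. The name `errors_by_acc` is illustrative.

```python
n_test = 30  # 20% of the 150 Iris samples

# Distinct accuracy values copied from the results table
accuracies = (1.0, 0.966667, 0.933333)

# Number of misclassified test samples behind each accuracy value
errors_by_acc = {acc: round((1 - acc) * n_test) for acc in accuracies}

for acc, errors in errors_by_acc.items():
    print(f"accuracy {acc:.6f} -> {errors} misclassified of {n_test}")
```

The gap between the best and worst rows is therefore two test samples, which is worth keeping in mind before ranking the algorithms.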



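## ✅ Final Takeaways

- The test set was **never used** during tuning: every search was fitted on `X_train` only, with validation done by 5-fold cross-validation
- GridSearchCV suits **small parameter spaces**, because it evaluates every combination
- RandomizedSearchCV scales better to **large spaces**, because its cost is set by `n_iter` rather than by the size of the space
- Holding out a test set, tuning with cross-validation on the training data, and evaluating once is a structure that holds up in **interviews and production code**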
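
The scalability point can be taken one step further: `RandomizedSearchCV` also accepts continuous distributions in place of fixed lists. The sketch below is an assumption-laden variant of the SVM search, not the notebook's own configuration. It swaps the `np.logspace` arrays for `scipy.stats.loguniform` over the same bounds.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Continuous log-uniform ranges with the same bounds as the SVM space above
param_distributions = {
    "C": loguniform(1e-2, 1e2),
    "gamma": loguniform(1e-3, 1e1),
    "kernel": ["rbf"],
}

search = RandomizedSearchCV(
    SVC(),
    param_distributions,
    n_iter=10,          # cost stays at 10 candidates x 5 folds
    cv=5,
    scoring="accuracy",
    random_state=42,
)
search.fit(X_train, y_train)
print(search.best_params_)
```

Widening the ranges here does not change the number of fits, only where the 10 candidates are drawn from.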
