
# Machine Learning Process Example – Iris Dataset

This notebook demonstrates the **7 steps of a Machine Learning workflow** using the famous Iris dataset.

Steps covered:
1. Data Collection
2. Data Preparation
3. Choose the Model
4. Train the Model
5. Evaluate the Model
6. Parameter Tuning
7. Prediction


## 1. Data Collection
We load the Iris dataset using `sklearn.datasets.load_iris()`.

In [None]:
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target
print("Features shape:", X.shape)
print("Target shape:", y.shape)
print("Target names:", iris.target_names)

Features shape: (150, 4)
Target shape: (150,)
Target names: ['setosa' 'versicolor' 'virginica']
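
Before splitting, it can help to look at what the four feature columns are and how many samples each class has. The snippet below is a small sketch of that check. The variable name `class_counts` is introduced here for illustration and is not part of the original notebook.

```python
import numpy as np
from sklearn.datasets import load_iris

iris = load_iris()
X, y = iris.data, iris.target

# Feature names, listed in the same order as the columns of X
print("Feature names:", iris.feature_names)

# Count samples per class; np.bincount works because the labels are 0, 1, 2
class_counts = np.bincount(y)
for name, count in zip(iris.target_names, class_counts):
    print(f"{name}: {count} samples")
```

The classes are balanced, which is why plain accuracy is a reasonable metric later on.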


## 2. Data Preparation
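Split the dataset into training (80%) and test (20%) sets. The split is stratified by class, so both sets keep the same species proportions as the full dataset.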

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print("Training size:", X_train.shape[0])
print("Test size:", X_test.shape[0])

Training size: 120
Test size: 30
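
The effect of `stratify=y` can be checked by counting the classes on each side of the split. This is a minimal sketch that repeats the split from the cell above. `train_counts` and `test_counts` are names chosen for this example.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# With stratification, each class contributes the same share to both sets
train_counts = np.bincount(y_train)
test_counts = np.bincount(y_test)
print("Training samples per class:", train_counts)
print("Test samples per class:", test_counts)
```

Without `stratify`, a random split of a small dataset can leave one class under-represented in the test set.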


## 3. Choose the Model
We use a **Support Vector Machine (SVM)** classifier wrapped in a pipeline that standardizes the features before they reach the model.

In [None]:
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

pipe = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('svc', SVC(random_state=42))
])
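
To make the scaling step concrete, the sketch below reproduces what `StandardScaler` does inside the pipeline: it learns the per-feature mean and standard deviation from the training data, then applies `z = (x - mean) / std` to whatever data it transforms. The split parameters are copied from the earlier cell. The `*_manual` names are illustrative only.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Learn per-feature mean and standard deviation from the training set only
scaler = StandardScaler().fit(X_train)
X_test_scaled = scaler.transform(X_test)

# Same computation by hand, using the population standard deviation (ddof=0)
train_mean = X_train.mean(axis=0)
train_std = X_train.std(axis=0)
X_test_manual = (X_test - train_mean) / train_std

matches = np.allclose(X_test_scaled, X_test_manual)
print("Manual scaling matches StandardScaler:", matches)
```

Putting the scaler in the pipeline means the test set never influences the learned mean and standard deviation.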

## 4 & 5. Train and Evaluate the Model
Fit the pipeline on the training set and measure its performance on the held-out test set.

In [None]:
from sklearn.metrics import accuracy_score, classification_report

# Fit the scaler and the SVM on the training data only
pipe.fit(X_train, y_train)
# Predict labels for the held-out test set
y_pred = pipe.predict(X_test)

print("Accuracy:", accuracy_score(y_test, y_pred))
print("\nClassification Report:\n", classification_report(y_test, y_pred, target_names=iris.target_names))

Accuracy: 0.9666666666666667

Classification Report:
               precision    recall  f1-score   support

      setosa       1.00      1.00      1.00        10
  versicolor       1.00      0.90      0.95        10
   virginica       0.91      1.00      0.95        10

    accuracy                           0.97        30
   macro avg       0.97      0.97      0.97        30
weighted avg       0.97      0.97      0.97        30



## 6. Parameter Tuning
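Use `GridSearchCV` with 5-fold cross-validation on the training set to search for the best hyperparameters. The `svc__` prefix routes each parameter to the pipeline step named `svc`. Note that `gamma` has no effect with the linear kernel, so some combinations in the grid are redundant.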

In [None]:
from sklearn.model_selection import GridSearchCV

param_grid = {
    'svc__kernel': ['rbf', 'linear'],
    'svc__C': [0.1, 1, 10, 100],
    'svc__gamma': ['scale', 0.1, 0.01, 0.001]
}

grid = GridSearchCV(pipe, param_grid, cv=5, n_jobs=-1)
grid.fit(X_train, y_train)

print("Best parameters:", grid.best_params_)
print("Best CV accuracy:", grid.best_score_)
print("Test accuracy with tuned model:", grid.score(X_test, y_test))

Best parameters: {'svc__C': 0.1, 'svc__gamma': 'scale', 'svc__kernel': 'linear'}
Best CV accuracy: 0.975
Test accuracy with tuned model: 0.9333333333333333
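
Because `gamma` is ignored by the linear kernel, the grid above fits several equivalent linear models. One possible alternative, sketched here, is to pass a list of dictionaries so that `gamma` is searched only for the RBF kernel. `GridSearchCV` accepts such a list as `param_grid`. The names `full_grid` and `split_grid` are introduced for this example, and `ParameterGrid` is used only to count candidates without running the search.

```python
from sklearn.model_selection import ParameterGrid

# The grid from the cell above: every kernel is combined with every gamma
full_grid = {
    'svc__kernel': ['rbf', 'linear'],
    'svc__C': [0.1, 1, 10, 100],
    'svc__gamma': ['scale', 0.1, 0.01, 0.001]
}

# Alternative: one sub-grid per kernel, with gamma searched only for 'rbf'
split_grid = [
    {'svc__kernel': ['linear'], 'svc__C': [0.1, 1, 10, 100]},
    {'svc__kernel': ['rbf'], 'svc__C': [0.1, 1, 10, 100],
     'svc__gamma': ['scale', 0.1, 0.01, 0.001]},
]

n_full = len(ParameterGrid(full_grid))
n_split = len(ParameterGrid(split_grid))
print("Candidates in full grid:", n_full)
print("Candidates in split grid:", n_split)
```

With `cv=5`, each candidate is fitted five times, so removing redundant candidates cuts the number of fits proportionally.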


## 7. Prediction
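Predict the species of a new flower sample. The four values are sepal length, sepal width, petal length, and petal width in centimetres, matching the column order of `iris.feature_names`.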

In [None]:
import numpy as np

new_sample = np.array([[5.1, 3.5, 1.4, 0.2]])
pred_idx = grid.predict(new_sample)[0]
print("Predicted species:", iris.target_names[pred_idx])

Predicted species: setosa
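
`predict` also accepts several samples at once, one row per flower. The sketch below is self-contained, so it fits an untuned pipeline on all 150 samples rather than reusing `grid`. That is an assumption made to keep the example short. The second and third rows are hypothetical measurements made up for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

iris = load_iris()

# Stand-in model for this example only (the notebook uses the tuned `grid`)
model = Pipeline(steps=[
    ('scaler', StandardScaler()),
    ('svc', SVC(random_state=42))
])
model.fit(iris.data, iris.target)

new_samples = np.array([
    [5.1, 3.5, 1.4, 0.2],  # the sample used in the cell above
    [6.0, 2.9, 4.5, 1.5],  # hypothetical measurements
    [6.9, 3.1, 5.4, 2.1],  # hypothetical measurements
])

# predict returns one class index per row; index into target_names for labels
pred_idx = model.predict(new_samples)
predicted_names = iris.target_names[pred_idx]
for sample, name in zip(new_samples, predicted_names):
    print(sample, "->", name)
```

Indexing `iris.target_names` with the whole array of predicted indices converts all labels in one step.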
