# 🌸 Iris Classification - ML Pipeline Builder Example

This notebook was **auto-generated** from the ML Pipeline Builder visual interface.

## Overview
- **Dataset**: Iris flower dataset (classic ML example)
- **Task**: Multi-class classification
- **Model**: Random Forest Classifier
- **Pipeline**: Data Loading → Train/Test Split → Scaling → Training → Evaluation

Generated on: November 12, 2025

## Step 1: Import Required Libraries

First, we import all necessary libraries for data processing, machine learning, and visualization.

In [None]:
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Set random seed for reproducibility
np.random.seed(42)

print("✅ All libraries imported successfully!")

## Step 2: Load Data

Load the Iris dataset from sklearn's built-in datasets.

In [None]:
from sklearn.datasets import load_iris

# Load Iris dataset
iris = load_iris()
data = pd.DataFrame(iris.data, columns=iris.feature_names)
data['target'] = iris.target

print(f"📊 Data loaded: {data.shape}")
print(f"\nFeatures: {iris.feature_names}")
print(f"Target classes: {iris.target_names}")
print("\nFirst 5 rows:")
data.head()

## Step 3: Train/Test Split

Split the data into training (80%) and testing (20%) sets.

In [None]:
X = data.drop('target', axis=1)
y = data['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

print(f"✂️ Train size: {len(X_train)}")
print(f"✂️ Test size: {len(X_test)}")
print(f"\nTraining data shape: {X_train.shape}")
print(f"Testing data shape: {X_test.shape}")
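
The split above is purely random, so the three classes may not be equally represented in the 30-sample test set. As a minimal sketch (not part of the generated pipeline), the cell below compares it with a stratified split using `train_test_split`'s `stratify` argument. The `*_demo`, `*_plain`, and `*_strat` names are illustrative and chosen so they do not overwrite the notebook's variables.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X_demo, y_demo = load_iris(return_X_y=True)

# Plain random split, same settings as the cell above
_, _, _, y_test_plain = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42
)

# Stratified split: class proportions are preserved in both subsets
_, _, _, y_test_strat = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)

plain_counts = Counter(y_test_plain.tolist())
strat_counts = Counter(y_test_strat.tolist())
print("Test-set class counts (plain):     ", dict(sorted(plain_counts.items())))
print("Test-set class counts (stratified):", dict(sorted(strat_counts.items())))
```

Because Iris has 50 samples per class, the stratified test set holds 10 samples of each class, which tends to make per-class metrics on such a small test set easier to compare.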

## Step 4: Scale Features

Standardize features by removing the mean and scaling to unit variance.

In [None]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print("⚖️ Features scaled using StandardScaler")
print(f"\nOriginal mean: {X_train.mean().values}")
print(f"Scaled mean: {X_train_scaled.mean(axis=0)}")
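# ddof=0 gives the population std, which is what StandardScaler uses (pandas defaults to ddof=1)
print(f"\nOriginal std: {X_train.std(ddof=0).values}")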
print(f"Scaled std: {X_train_scaled.std(axis=0)}")
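
To make the transformation concrete, here is a small sketch that reproduces the scaler's output by hand: each column has its mean subtracted and is divided by its population standard deviation (`ddof=0`). For brevity it fits on the full dataset rather than on the training split; the names `X_demo`, `manual`, and `scaled` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X_demo = load_iris().data

# z = (x - column mean) / column std, with the population std (ddof=0)
manual = (X_demo - X_demo.mean(axis=0)) / X_demo.std(axis=0, ddof=0)
scaled = StandardScaler().fit_transform(X_demo)

print("Manual result matches StandardScaler:", np.allclose(manual, scaled))
```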

## Step 5: Train Classifier

Train a Random Forest Classifier with 100 estimators.

In [None]:
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train_scaled, y_train)

print("🌲 Model trained: RandomForest")
print(f"Number of trees: {model.n_estimators}")
print(f"Number of features: {model.n_features_in_}")
print("\nFeature importances:")
for name, importance in zip(iris.feature_names, model.feature_importances_):
    print(f"  {name}: {importance:.4f}")
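
As a rough illustration of what the forest does at prediction time, the sketch below averages the class probabilities of the individual trees and checks that this matches the forest's own `predict_proba`. It trains a smaller, separate forest (25 trees, an arbitrary choice for speed) on the full dataset; `forest_demo`, `per_tree`, and `averaged` are illustrative names.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X_demo, y_demo = load_iris(return_X_y=True)
forest_demo = RandomForestClassifier(n_estimators=25, random_state=42)
forest_demo.fit(X_demo, y_demo)

# Shape: (n_trees, n_samples, n_classes)
per_tree = np.stack(
    [tree.predict_proba(X_demo) for tree in forest_demo.estimators_]
)

# The forest's probability estimate is the mean over its trees
averaged = per_tree.mean(axis=0)
print("Averaged tree probabilities match the forest:",
      np.allclose(averaged, forest_demo.predict_proba(X_demo)))

# Feature importances are normalized, so they sum to 1
importance_total = forest_demo.feature_importances_.sum()
print(f"Sum of feature importances: {importance_total:.4f}")
```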

## Step 6: Evaluate Model

Evaluate the model performance on the test set.

In [None]:
y_pred = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)

print(f"🎯 Accuracy: {accuracy:.4f}")
print(f"   That's {accuracy*100:.2f}%!")
print("\n📊 Classification Report:")
print(classification_report(y_test, y_pred, target_names=iris.target_names))

print("\n🔢 Confusion Matrix:")
cm = confusion_matrix(y_test, y_pred)
print(cm)
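
The numbers in the classification report can be read directly off the confusion matrix. The sketch below shows the relationship on a small set of hypothetical labels (made up for illustration, not results from this notebook): rows are true classes, columns are predicted classes, and the diagonal holds the correct predictions.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical labels, for illustration only
y_true_demo = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 2])
y_pred_demo = np.array([0, 0, 1, 1, 2, 2, 2, 2, 1, 2])

cm_demo = confusion_matrix(y_true_demo, y_pred_demo)
correct = np.diag(cm_demo)

# Accuracy: correct predictions over all predictions
accuracy_demo = correct.sum() / cm_demo.sum()
# Recall per class: correct over row totals (all samples truly in that class)
recall_demo = correct / cm_demo.sum(axis=1)
# Precision per class: correct over column totals (all samples predicted as that class)
precision_demo = correct / cm_demo.sum(axis=0)

print(cm_demo)
print("Accuracy: ", accuracy_demo)
print("Recall:   ", recall_demo)
print("Precision:", precision_demo)
```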

## Step 7: Visualize Results

Create visualizations of the confusion matrix and feature importances.

In [None]:
fig, axes = plt.subplots(1, 2, figsize=(14, 5))

# Confusion Matrix Heatmap
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=iris.target_names, 
            yticklabels=iris.target_names, 
            ax=axes[0])
axes[0].set_title('Confusion Matrix')
axes[0].set_ylabel('True Label')
axes[0].set_xlabel('Predicted Label')

# Feature Importance Bar Plot
feature_importance = pd.DataFrame({
    'Feature': iris.feature_names,
    'Importance': model.feature_importances_
}).sort_values('Importance', ascending=False)

axes[1].barh(feature_importance['Feature'], feature_importance['Importance'], color='skyblue')
axes[1].set_title('Feature Importances')
axes[1].set_xlabel('Importance Score')

plt.tight_layout()
plt.show()

print("✅ Pipeline Complete!")

## 🎓 What You Learned

This notebook demonstrated a complete ML pipeline:

1. **Data Loading**: Using sklearn's built-in datasets
2. **Data Splitting**: 80/20 train/test split
3. **Feature Scaling**: StandardScaler for standardization (zero mean, unit variance)
4. **Model Training**: Random Forest Classifier
5. **Evaluation**: Accuracy, Classification Report, Confusion Matrix
6. **Visualization**: Heatmaps and bar plots

### 🚀 Next Steps

- Try different algorithms (Logistic Regression, SVM)
- Adjust hyperparameters (n_estimators, max_depth)
- Use your own CSV data
- Add feature selection
- Implement cross-validation
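
As a starting point for the cross-validation item, here is one possible sketch. It wraps the scaler and the classifier in a scikit-learn pipeline so that the scaler is refit on the training portion of every fold, then scores it with 5-fold stratified cross-validation. The fold count and the `*_demo` names are assumptions made for illustration, not output of the ML Pipeline Builder.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_demo, y_demo = load_iris(return_X_y=True)

# Scaling lives inside the pipeline, so each fold fits its own scaler
pipeline_demo = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=42),
)

cv_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores_demo = cross_val_score(
    pipeline_demo, X_demo, y_demo, cv=cv_demo, scoring="accuracy"
)

print("Fold accuracies:", scores_demo.round(4))
print(f"Mean accuracy: {scores_demo.mean():.4f} (std {scores_demo.std():.4f})")
```

Averaging over five folds typically gives a steadier estimate than a single 30-sample test set, at the cost of training the model five times.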

---

**Generated by ML Pipeline Builder** 🧠  
Create your own visual ML pipelines at: https://github.com/enderpawar/2025_oss_term_project-22101203_-