# 🎯 Automated Machine Learning for Classification - Google Colab

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/hasanmisaii/Automated-Machine-Learning-Auto-ML-/blob/main/classification_automl_colab.ipynb)

This notebook demonstrates **Automated Machine Learning (AutoML)** for **classification tasks** using open-source libraries that work seamlessly in Google Colab.

## 🎯 What You'll Learn:
- **Classification fundamentals** and when to use them
- **AutoML concepts** and benefits for classification
- **Hands-on implementation** using auto-sklearn and TPOT
- **Model evaluation** with classification metrics
- **Real-world applications** and best practices

## 📊 What is Classification?
Classification is a machine learning task that **predicts categorical labels**. Typical applications include:
- 🏥 **Medical diagnosis** (our example today)
- 📧 **Email spam detection**
- 🌟 **Customer sentiment analysis**
- 🔍 **Image recognition**
- 💳 **Fraud detection**

## 🤖 What is AutoML?
AutoML tools automate some or all of the following, depending on the library:
- **Selects the best algorithms**
- **Optimizes hyperparameters**
- **Engineers features**
- **Handles preprocessing**
- **Provides model explanations**
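
To make the first two bullets concrete, here is a minimal sketch of that search done by hand with scikit-learn's `GridSearchCV`. The toy dataset, the two candidate models, and the parameter grid are illustrative assumptions; auto-sklearn and TPOT explore far larger search spaces with their own strategies.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Small toy dataset (illustrative only, not the notebook's medical data)
X_toy, y_toy = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)

# One pipeline whose final step is swapped out by the grid
pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])

# Each dict is one "algorithm + hyperparameters" region of the search space
param_grid = [
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1.0, 10.0]},
    {"clf": [RandomForestClassifier(random_state=0)], "clf__n_estimators": [50, 100]},
]

search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X_toy, y_toy)

print("Best cross-validated accuracy:", round(search.best_score_, 3))
print("Best configuration:", search.best_params_)
```

An AutoML library replaces the hand-written `param_grid` with a built-in search space and a time budget.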

---

**🚀 Let's get started!**

## 📦 Step 1: Install Required Libraries

We'll use **auto-sklearn** and **TPOT** - two popular open-source AutoML libraries.

In [None]:
# Install AutoML libraries
print("🔧 Installing AutoML libraries...")
!pip install auto-sklearn==0.15.0 -q
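# This notebook uses the classic TPOT API (generations, verbosity, export),
# so stay below the 1.0 release line, which changed that interface
!pip install "tpot<1.0" -q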
!pip install shap -q

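# Standard data science libraries
# Note: auto-sklearn declares its own scikit-learn requirement; if pip reports a
# conflict with the pin below, prefer the version auto-sklearn asks for
!pip install scikit-learn==1.1.3 -q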
!pip install pandas numpy matplotlib seaborn plotly -q

print("✅ Installation complete!")

In [None]:
# Import all required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
from datetime import datetime

# Machine Learning libraries
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
    confusion_matrix, classification_report, roc_auc_score,
    roc_curve, precision_recall_curve
)
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# AutoML libraries
import autosklearn.classification
import autosklearn.metrics  # used in Step 5 for autosklearn.metrics.accuracy
from tpot import TPOTClassifier

# Visualization
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Configuration
warnings.filterwarnings('ignore')
plt.style.use('default')
sns.set_palette("husl")
np.random.seed(42)

print("📚 All libraries imported successfully!")
print(f"🕐 Notebook started at: {datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

## 🏗️ Step 2: Create Realistic Medical Diagnosis Dataset

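We'll create a synthetic dataset for medical diagnosis. The feature names and value ranges are chosen to look like clinical measurements, but the values come from `make_classification`, so the relationships between features and diagnosis are statistical, not clinical.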

In [None]:
# Create realistic medical diagnosis dataset
print("🏥 Creating realistic medical diagnosis dataset...")

# Generate base features using make_classification
X_base, y_base = make_classification(
    n_samples=3000,
    n_features=20,
    n_informative=15,
    n_redundant=3,
    n_clusters_per_class=2,
    n_classes=3,  # 3 diagnosis categories
    class_sep=0.8,
    random_state=42
)

# Create meaningful feature names for medical diagnosis
feature_names = [
    'age', 'bmi', 'blood_pressure_systolic', 'blood_pressure_diastolic',
    'heart_rate', 'cholesterol_total', 'cholesterol_hdl', 'cholesterol_ldl',
    'glucose_fasting', 'hemoglobin_a1c', 'white_blood_cells', 'red_blood_cells',
    'platelets', 'creatinine', 'protein_levels', 'sodium', 'potassium',
    'exercise_hours_week', 'sleep_hours_night', 'stress_level'
]

# Convert to DataFrame
df = pd.DataFrame(X_base, columns=feature_names)

# Transform features to realistic medical ranges
def normalize_to_range(series, min_val, max_val, decimals=1):
    normalized = ((series - series.min()) / (series.max() - series.min()) * (max_val - min_val) + min_val)
    return normalized.round(decimals)

# Apply realistic transformations for medical values
df['age'] = normalize_to_range(df['age'], 18, 85, 0)
df['bmi'] = normalize_to_range(df['bmi'], 16, 45, 1)
df['blood_pressure_systolic'] = normalize_to_range(df['blood_pressure_systolic'], 90, 180, 0)
df['blood_pressure_diastolic'] = normalize_to_range(df['blood_pressure_diastolic'], 60, 120, 0)
df['heart_rate'] = normalize_to_range(df['heart_rate'], 50, 120, 0)
df['cholesterol_total'] = normalize_to_range(df['cholesterol_total'], 120, 300, 0)
df['cholesterol_hdl'] = normalize_to_range(df['cholesterol_hdl'], 20, 80, 0)
df['cholesterol_ldl'] = normalize_to_range(df['cholesterol_ldl'], 50, 200, 0)
df['glucose_fasting'] = normalize_to_range(df['glucose_fasting'], 70, 200, 0)
df['hemoglobin_a1c'] = normalize_to_range(df['hemoglobin_a1c'], 4.0, 12.0, 1)
df['white_blood_cells'] = normalize_to_range(df['white_blood_cells'], 3000, 12000, 0)
df['red_blood_cells'] = normalize_to_range(df['red_blood_cells'], 3.5, 6.0, 1)
df['platelets'] = normalize_to_range(df['platelets'], 150000, 450000, 0)
df['creatinine'] = normalize_to_range(df['creatinine'], 0.5, 3.0, 2)
df['protein_levels'] = normalize_to_range(df['protein_levels'], 6.0, 8.5, 1)
df['sodium'] = normalize_to_range(df['sodium'], 135, 145, 0)
df['potassium'] = normalize_to_range(df['potassium'], 3.5, 5.5, 1)
df['exercise_hours_week'] = normalize_to_range(df['exercise_hours_week'], 0, 15, 1)
df['sleep_hours_night'] = normalize_to_range(df['sleep_hours_night'], 4, 12, 1)
df['stress_level'] = normalize_to_range(df['stress_level'], 1, 10, 0)

# Create meaningful diagnosis labels
diagnosis_labels = {0: 'Healthy', 1: 'At Risk', 2: 'Disease'}
df['diagnosis'] = y_base
df['diagnosis_label'] = df['diagnosis'].map(diagnosis_labels)

print(f"📊 Dataset created with {df.shape[0]} patients and {df.shape[1]-2} features")
print(f"🏥 Diagnosis distribution:")
diagnosis_counts = df['diagnosis_label'].value_counts()
for label, count in diagnosis_counts.items():
    percentage = (count / len(df)) * 100
    print(f"   {label}: {count} patients ({percentage:.1f}%)")

# Display sample data
df.head()
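
The rescaling used in the cell above is plain min-max scaling: each value is first mapped to [0, 1] and then stretched onto the target range. A small sketch on a hand-made series shows the arithmetic; the function name `rescale_demo` and the input values are illustrative assumptions.

```python
import pandas as pd

def rescale_demo(series, min_val, max_val, decimals=1):
    """Linearly map a series onto [min_val, max_val] (min-max scaling)."""
    unit = (series - series.min()) / (series.max() - series.min())  # now in [0, 1]
    return (unit * (max_val - min_val) + min_val).round(decimals)

raw = pd.Series([0.0, 1.0, 2.0, 5.0])
ages = rescale_demo(raw, 18, 85, 0)

# The smallest input lands on 18 and the largest on 85
print(ages.tolist())  # [18.0, 31.0, 45.0, 85.0]
```

Because the mapping is linear, the ordering of patients within each feature is preserved; only the units change.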

## 📊 Step 3: Exploratory Data Analysis (EDA)

Let's explore our dataset to understand the relationships between medical features and diagnosis.

In [None]:
# Basic dataset statistics
print("📈 DATASET OVERVIEW")
print("=" * 50)
print(f"Number of patients: {len(df):,}")
print(f"Number of features: {len(df.columns)-2}")
print(f"Missing values: {df.isnull().sum().sum()}")
print(f"Duplicated rows: {df.duplicated().sum()}")

print("\n🏥 DIAGNOSIS STATISTICS")
print("=" * 50)
for label, count in diagnosis_counts.items():
    percentage = (count / len(df)) * 100
    print(f"{label}: {count:,} patients ({percentage:.1f}%)")

print("\n📊 KEY MEDICAL INDICATORS")
print("=" * 50)
key_features = ['age', 'bmi', 'blood_pressure_systolic', 'glucose_fasting', 'cholesterol_total']
print(df[key_features].describe().round(1))

In [None]:
# Interactive diagnosis distribution analysis
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        "Diagnosis Distribution", 
        "Age vs BMI by Diagnosis",
        "Blood Pressure by Diagnosis", 
        "Glucose vs Cholesterol by Diagnosis"
    ),
    specs=[[{"type": "bar"}, {"type": "scatter"}],
           [{"type": "box"}, {"type": "scatter"}]]
)

# Diagnosis distribution bar chart
# Map each label to a fixed color so the bars match the scatter and box plots
colors = {'Healthy': '#2E8B57', 'At Risk': '#FF8C00', 'Disease': '#DC143C'}
fig.add_trace(
    go.Bar(
        x=diagnosis_counts.index, 
        y=diagnosis_counts.values,
        name="Diagnosis Count",
        marker_color=[colors[label] for label in diagnosis_counts.index]
    ),
    row=1, col=1
)

# Age vs BMI scatter plot
for diagnosis in df['diagnosis_label'].unique():
    subset = df[df['diagnosis_label'] == diagnosis]
    fig.add_trace(
        go.Scatter(
            x=subset['age'], 
            y=subset['bmi'],
            mode='markers',
            name=f"{diagnosis}",
            marker_color=colors[diagnosis],
            opacity=0.6
        ),
        row=1, col=2
    )

# Blood pressure box plots
for diagnosis in df['diagnosis_label'].unique():
    subset = df[df['diagnosis_label'] == diagnosis]
    fig.add_trace(
        go.Box(
            y=subset['blood_pressure_systolic'], 
            name=f"{diagnosis}",
            marker_color=colors[diagnosis]
        ),
        row=2, col=1
    )

# Glucose vs Cholesterol
for diagnosis in df['diagnosis_label'].unique():
    subset = df[df['diagnosis_label'] == diagnosis]
    fig.add_trace(
        go.Scatter(
            x=subset['glucose_fasting'], 
            y=subset['cholesterol_total'],
            mode='markers',
            name=f"{diagnosis}",
            marker_color=colors[diagnosis],
            opacity=0.6
        ),
        row=2, col=2
    )

fig.update_layout(
    height=800, 
    title_text="🏥 Medical Diagnosis Analysis Dashboard",
    title_x=0.5
)
fig.show()

In [None]:
# Feature summary by diagnosis (group means and correlations, not model-based importance)
print("🎯 FEATURE ANALYSIS BY DIAGNOSIS")
print("=" * 60)

# Calculate mean values for each diagnosis
feature_cols = [col for col in df.columns if col not in ['diagnosis', 'diagnosis_label']]
diagnosis_stats = df.groupby('diagnosis_label')[feature_cols].mean()

# Show key differences
print("Average values by diagnosis group:")
print(diagnosis_stats[['age', 'bmi', 'blood_pressure_systolic', 'glucose_fasting', 'cholesterol_total']].round(1))

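# Calculate feature correlations with diagnosis
# (Pearson correlation with the integer class ID; the IDs are arbitrary labels,
# so treat this as a rough screen rather than a ranking of importance)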
feature_corr = df[feature_cols + ['diagnosis']].corr()['diagnosis'].drop('diagnosis').sort_values(key=abs, ascending=False)

print(f"\n🔍 TOP FEATURES CORRELATED WITH DIAGNOSIS:")
print("=" * 60)
for feature, corr in feature_corr.head(10).items():
    direction = "📈" if corr > 0 else "📉"
    print(f"{direction} {feature}: {corr:.3f}")
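
Since the class IDs 0, 1, 2 carry no natural order, a test designed for categorical targets can complement the correlation screen. The sketch below uses scikit-learn's `f_classif` (a one-way ANOVA F-test per feature) on a toy dataset; the dataset and the `feature_i` names are assumptions for illustration.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.feature_selection import f_classif

# Toy 3-class dataset (illustrative only)
X_toy, y_toy = make_classification(
    n_samples=600, n_features=6, n_informative=4, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0,
)
toy_features = [f"feature_{i}" for i in range(X_toy.shape[1])]

# F-statistic: how strongly the class means differ relative to within-class spread
f_scores, p_values = f_classif(X_toy, y_toy)

ranking = (
    pd.DataFrame({"feature": toy_features, "f_score": f_scores, "p_value": p_values})
    .sort_values("f_score", ascending=False)
    .reset_index(drop=True)
)
print(ranking.round(3))
```

On the notebook's data the same call would be `f_classif(df[feature_cols], df['diagnosis'])`.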

## 🔧 Step 4: Data Preparation for AutoML

Let's prepare our data for AutoML by splitting it into training and testing sets.

In [None]:
# Prepare features and target
X = df[feature_cols]
y = df['diagnosis']

# Split the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, 
    test_size=0.2, 
    random_state=42,
    stratify=y  # Maintain class distribution
)

# Further split training data for AutoML validation
X_train_automl, X_val_automl, y_train_automl, y_val_automl = train_test_split(
    X_train, y_train, 
    test_size=0.25, 
    random_state=42,
    stratify=y_train
)

print("📊 DATA SPLIT SUMMARY")
print("=" * 50)
print(f"🎯 Total dataset: {len(df):,} patients")
print(f"🏋️ Training set: {len(X_train_automl):,} patients ({len(X_train_automl)/len(df)*100:.1f}%)")
print(f"✅ Validation set: {len(X_val_automl):,} patients ({len(X_val_automl)/len(df)*100:.1f}%)")
print(f"🧪 Test set: {len(X_test):,} patients ({len(X_test)/len(df)*100:.1f}%)")

print(f"\n📈 FEATURE INFORMATION")
print("=" * 50)
print(f"Number of features: {X_train.shape[1]}")
print(f"Feature types: Medical measurements and lifestyle factors")

print(f"\n🏥 CLASS DISTRIBUTION")
print("=" * 50)
print("Training set distribution:")
train_dist = pd.Series(y_train_automl).value_counts().sort_index()
for class_id, count in train_dist.items():
    label = diagnosis_labels[class_id]
    percentage = (count / len(y_train_automl)) * 100
    print(f"   {label} (Class {class_id}): {count} patients ({percentage:.1f}%)")
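
The two-stage split above yields 60% train, 20% validation and 20% test, because the second split takes 25% of the remaining 80%. A small sketch with hand-made labels (an assumption for illustration) shows that `stratify` keeps the class shares the same in every part.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 1,000 toy labels with a 50/30/20 class mix
y_demo = np.repeat([0, 1, 2], [500, 300, 200])
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# Same two-stage scheme as above: 20% test, then 25% of the rest for validation
X_tmp, X_te, y_tmp, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo
)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_tmp, y_tmp, test_size=0.25, random_state=42, stratify=y_tmp
)

for name, part in [("train", y_tr), ("validation", y_va), ("test", y_te)]:
    shares = np.bincount(part) / len(part)
    print(f"{name:10} n={len(part):4d} class shares={np.round(shares, 2)}")
```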

## 🤖 Step 5: AutoML with auto-sklearn

**auto-sklearn** automatically finds the best classification algorithm and hyperparameters for your dataset.

In [None]:
# Configure and train auto-sklearn classifier
print("🚀 Starting auto-sklearn training...")
print("⏰ This may take 5-10 minutes in Colab")

# Create auto-sklearn classifier
automl_sklearn = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=300,  # 5 minutes total
    per_run_time_limit=30,        # 30 seconds per model
    n_jobs=1,                     # Use single core in Colab
    memory_limit=3072,            # 3GB memory limit
    seed=42,
    metric=autosklearn.metrics.accuracy,
    resampling_strategy='cv',     # Cross-validation
    resampling_strategy_arguments={'folds': 3}
)

# Train the model
start_time = datetime.now()
automl_sklearn.fit(X_train_automl, y_train_automl)
training_time = datetime.now() - start_time

print(f"✅ auto-sklearn training completed in {training_time}")
print(f"🎯 Models in final ensemble: {len(automl_sklearn.leaderboard())}")

In [None]:
# Evaluate auto-sklearn performance
print("📊 AUTO-SKLEARN RESULTS")
print("=" * 50)

# Make predictions
y_pred_sklearn_val = automl_sklearn.predict(X_val_automl)
y_pred_sklearn_test = automl_sklearn.predict(X_test)
y_prob_sklearn_test = automl_sklearn.predict_proba(X_test)

# Calculate metrics for validation set
val_accuracy = accuracy_score(y_val_automl, y_pred_sklearn_val)
val_precision = precision_score(y_val_automl, y_pred_sklearn_val, average='weighted')
val_recall = recall_score(y_val_automl, y_pred_sklearn_val, average='weighted')
val_f1 = f1_score(y_val_automl, y_pred_sklearn_val, average='weighted')

# Calculate metrics for test set
test_accuracy = accuracy_score(y_test, y_pred_sklearn_test)
test_precision = precision_score(y_test, y_pred_sklearn_test, average='weighted')
test_recall = recall_score(y_test, y_pred_sklearn_test, average='weighted')
test_f1 = f1_score(y_test, y_pred_sklearn_test, average='weighted')

print(f"📈 Validation Performance:")
print(f"   Accuracy:  {val_accuracy:.3f}")
print(f"   Precision: {val_precision:.3f}")
print(f"   Recall:    {val_recall:.3f}")
print(f"   F1-Score:  {val_f1:.3f}")

print(f"\n🧪 Test Performance:")
print(f"   Accuracy:  {test_accuracy:.3f}")
print(f"   Precision: {test_precision:.3f}")
print(f"   Recall:    {test_recall:.3f}")
print(f"   F1-Score:  {test_f1:.3f}")

# Show model statistics
print(f"\n🏆 MODEL LEADERBOARD")
print("=" * 50)
leaderboard = automl_sklearn.leaderboard()
print(leaderboard.head())

# Show run statistics (number of runs, successes, failures, best validation score)
print(f"\n🥇 RUN STATISTICS")
print("=" * 50)
print(automl_sklearn.sprint_statistics())

## 🧬 Step 6: AutoML with TPOT

**TPOT** uses genetic programming to automatically design and optimize classification pipelines.

In [None]:
# Configure and train TPOT classifier
print("🧬 Starting TPOT training...")
print("⏰ This may take 3-5 minutes in Colab")

# Create TPOT classifier
automl_tpot = TPOTClassifier(
    generations=5,           # Number of iterations
    population_size=20,      # Number of individuals per generation
    cv=3,                    # Cross-validation folds
    scoring='accuracy',
    max_time_mins=3,         # Maximum time in minutes
    max_eval_time_mins=0.5,  # Maximum time per pipeline
    random_state=42,
    n_jobs=1,                # Single core for Colab
    verbosity=2
)

# Train the model
start_time = datetime.now()
automl_tpot.fit(X_train_automl, y_train_automl)
training_time = datetime.now() - start_time

print(f"✅ TPOT training completed in {training_time}")
print(f"🏆 Best pipeline validation accuracy: {automl_tpot.score(X_val_automl, y_val_automl):.3f}")

In [None]:
# Evaluate TPOT performance
print("📊 TPOT RESULTS")
print("=" * 50)

# Make predictions
y_pred_tpot_val = automl_tpot.predict(X_val_automl)
y_pred_tpot_test = automl_tpot.predict(X_test)
y_prob_tpot_test = automl_tpot.predict_proba(X_test)

# Calculate metrics for validation set
val_accuracy_tpot = accuracy_score(y_val_automl, y_pred_tpot_val)
val_precision_tpot = precision_score(y_val_automl, y_pred_tpot_val, average='weighted')
val_recall_tpot = recall_score(y_val_automl, y_pred_tpot_val, average='weighted')
val_f1_tpot = f1_score(y_val_automl, y_pred_tpot_val, average='weighted')

# Calculate metrics for test set
test_accuracy_tpot = accuracy_score(y_test, y_pred_tpot_test)
test_precision_tpot = precision_score(y_test, y_pred_tpot_test, average='weighted')
test_recall_tpot = recall_score(y_test, y_pred_tpot_test, average='weighted')
test_f1_tpot = f1_score(y_test, y_pred_tpot_test, average='weighted')

print(f"📈 Validation Performance:")
print(f"   Accuracy:  {val_accuracy_tpot:.3f}")
print(f"   Precision: {val_precision_tpot:.3f}")
print(f"   Recall:    {val_recall_tpot:.3f}")
print(f"   F1-Score:  {val_f1_tpot:.3f}")

print(f"\n🧪 Test Performance:")
print(f"   Accuracy:  {test_accuracy_tpot:.3f}")
print(f"   Precision: {test_precision_tpot:.3f}")
print(f"   Recall:    {test_recall_tpot:.3f}")
print(f"   F1-Score:  {test_f1_tpot:.3f}")

# Show the best pipeline
print(f"\n🏆 BEST PIPELINE DISCOVERED")
print("=" * 50)
print(automl_tpot.fitted_pipeline_)

# Export the pipeline code
print(f"\n💾 Exporting optimized pipeline code...")
automl_tpot.export('tpot_classification_pipeline.py')
print("✅ Pipeline exported as 'tpot_classification_pipeline.py'")

## 📊 Step 7: Baseline Comparison

Let's compare our AutoML results with traditional classification models.

In [None]:
# Train baseline models for comparison
print("🏁 Training baseline models for comparison...")

# Scale features for logistic regression
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train_automl)
X_val_scaled = scaler.transform(X_val_automl)
X_test_scaled = scaler.transform(X_test)

# Define baseline models
baseline_models = {
    'Logistic Regression': LogisticRegression(random_state=42, max_iter=1000),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42)
}

baseline_results = {}

for name, model in baseline_models.items():
    print(f"Training {name}...")
    
    # Use scaled data for logistic regression, original for tree-based
    if name == 'Logistic Regression':
        model.fit(X_train_scaled, y_train_automl)
        y_pred_val = model.predict(X_val_scaled)
        y_pred_test = model.predict(X_test_scaled)
        y_prob_test = model.predict_proba(X_test_scaled)
    else:
        model.fit(X_train_automl, y_train_automl)
        y_pred_val = model.predict(X_val_automl)
        y_pred_test = model.predict(X_test)
        y_prob_test = model.predict_proba(X_test)
    
    # Calculate metrics
    baseline_results[name] = {
        'val_accuracy': accuracy_score(y_val_automl, y_pred_val),
        'val_precision': precision_score(y_val_automl, y_pred_val, average='weighted'),
        'val_recall': recall_score(y_val_automl, y_pred_val, average='weighted'),
        'val_f1': f1_score(y_val_automl, y_pred_val, average='weighted'),
        'test_accuracy': accuracy_score(y_test, y_pred_test),
        'test_precision': precision_score(y_test, y_pred_test, average='weighted'),
        'test_recall': recall_score(y_test, y_pred_test, average='weighted'),
        'test_f1': f1_score(y_test, y_pred_test, average='weighted'),
        'predictions_test': y_pred_test,
        'probabilities_test': y_prob_test
    }

print("✅ Baseline models trained successfully!")
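
The baseline cell scales the data by hand and branches on the model name. An alternative is to bundle the scaler and the model in one scikit-learn pipeline, so the same `fit`/`predict` calls work for every model. This is a sketch on toy data, not a replacement for the cell above; the dataset and variable names are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy 3-class dataset (illustrative only)
X_toy, y_toy = make_classification(
    n_samples=400, n_features=10, n_informative=6, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=0, stratify=y_toy
)

# The scaler is fitted on the training data only, then reused at predict time
logreg_pipeline = make_pipeline(
    StandardScaler(), LogisticRegression(max_iter=1000, random_state=0)
)
logreg_pipeline.fit(X_tr, y_tr)

demo_accuracy = logreg_pipeline.score(X_te, y_te)
print(f"Test accuracy: {demo_accuracy:.3f}")
```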

In [None]:
# Comprehensive model comparison
print("🏆 COMPREHENSIVE MODEL COMPARISON")
print("=" * 80)

# Compile all results
all_results = {
    'auto-sklearn': {
        'val_accuracy': val_accuracy, 'val_precision': val_precision, 
        'val_recall': val_recall, 'val_f1': val_f1,
        'test_accuracy': test_accuracy, 'test_precision': test_precision, 
        'test_recall': test_recall, 'test_f1': test_f1,
        'predictions_test': y_pred_sklearn_test,
        'probabilities_test': y_prob_sklearn_test
    },
    'TPOT': {
        'val_accuracy': val_accuracy_tpot, 'val_precision': val_precision_tpot,
        'val_recall': val_recall_tpot, 'val_f1': val_f1_tpot,
        'test_accuracy': test_accuracy_tpot, 'test_precision': test_precision_tpot,
        'test_recall': test_recall_tpot, 'test_f1': test_f1_tpot,
        'predictions_test': y_pred_tpot_test,
        'probabilities_test': y_prob_tpot_test
    }
}
all_results.update(baseline_results)

# Create comparison DataFrame
comparison_df = pd.DataFrame({
    'Model': list(all_results.keys()),
    'Test_Accuracy': [all_results[model]['test_accuracy'] for model in all_results.keys()],
    'Test_Precision': [all_results[model]['test_precision'] for model in all_results.keys()],
    'Test_Recall': [all_results[model]['test_recall'] for model in all_results.keys()],
    'Test_F1': [all_results[model]['test_f1'] for model in all_results.keys()],
    'Val_Accuracy': [all_results[model]['val_accuracy'] for model in all_results.keys()],
    'Val_F1': [all_results[model]['val_f1'] for model in all_results.keys()]
})

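# Sort by test accuracy (for illustration; in practice pick the model on its
# validation score and report the test score only for the chosen model)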
comparison_df = comparison_df.sort_values('Test_Accuracy', ascending=False)

print("📊 PERFORMANCE RANKINGS (by Test Accuracy)")
print("=" * 80)
for idx, row in comparison_df.iterrows():
    rank = comparison_df.index.get_loc(idx) + 1
    print(f"{rank}. {row['Model']:18} | Acc: {row['Test_Accuracy']:.3f} | Prec: {row['Test_Precision']:.3f} | Rec: {row['Test_Recall']:.3f} | F1: {row['Test_F1']:.3f}")

# Find best model
best_model = comparison_df.iloc[0]['Model']
print(f"\n🥇 WINNER: {best_model}")
print(f"🎯 Best Test Accuracy: {comparison_df.iloc[0]['Test_Accuracy']:.3f}")
print(f"🎯 Best Test F1-Score: {comparison_df.iloc[0]['Test_F1']:.3f}")

# Display the comparison table
comparison_df
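
Ranking by test accuracy makes the winner's test score slightly optimistic, because the test set has then been used for model selection. A common alternative, sketched here on toy data with two stand-in models (both assumptions for illustration), is to choose on the validation split and touch the test split once.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data with the same 60/20/20 split scheme as the notebook
X_toy, y_toy = make_classification(
    n_samples=600, n_features=10, n_informative=6, n_classes=3, random_state=0
)
X_rest, X_te, y_rest, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0, stratify=y_toy
)
X_tr, X_va, y_tr, y_va = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0, stratify=y_rest
)

candidates = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
}

# Step 1: compare candidates on the validation split only
val_scores = {
    name: model.fit(X_tr, y_tr).score(X_va, y_va)
    for name, model in candidates.items()
}
chosen = max(val_scores, key=val_scores.get)

# Step 2: report the test score for the single chosen model
chosen_test_score = candidates[chosen].score(X_te, y_te)
print(f"Chosen on validation: {chosen} ({val_scores[chosen]:.3f})")
print(f"Test accuracy of chosen model: {chosen_test_score:.3f}")
```

In the notebook, the equivalent change would be sorting `comparison_df` by `Val_Accuracy` instead of `Test_Accuracy`.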

## 📊 Step 8: Advanced Visualizations

Let's create comprehensive visualizations to understand classification performance.

In [None]:
# Interactive model performance comparison
fig = make_subplots(
    rows=2, cols=2,
    subplot_titles=(
        "Model Performance Comparison (Accuracy)",
        "F1-Score Comparison",
        "Confusion Matrix (Best Model)",
        "Classification Report"
    ),
    specs=[[{"type": "bar"}, {"type": "bar"}],
           [{"type": "heatmap"}, {"type": "table"}]]
)

# Performance bar charts
colors = ['#FF6B6B', '#4ECDC4', '#45B7D1', '#96CEB4']

# Accuracy comparison
fig.add_trace(
    go.Bar(
        x=comparison_df['Model'], 
        y=comparison_df['Test_Accuracy'],
        name="Test Accuracy",
        marker_color=colors
    ),
    row=1, col=1
)

# F1-Score comparison
fig.add_trace(
    go.Bar(
        x=comparison_df['Model'], 
        y=comparison_df['Test_F1'],
        name="Test F1-Score",
        marker_color=colors
    ),
    row=1, col=2
)

# Confusion Matrix for best model
best_predictions = all_results[best_model]['predictions_test']
cm = confusion_matrix(y_test, best_predictions)
cm_normalized = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]

fig.add_trace(
    go.Heatmap(
        z=cm_normalized,
        x=[diagnosis_labels[i] for i in range(3)],
        y=[diagnosis_labels[i] for i in range(3)],
        colorscale='Blues',
        text=cm,
        texttemplate="%{text}",
        textfont={"size":10},
        name="Confusion Matrix"
    ),
    row=2, col=1
)

# Classification report table
class_report = classification_report(y_test, best_predictions, 
                                   target_names=[diagnosis_labels[i] for i in range(3)],
                                   output_dict=True)
report_df = pd.DataFrame(class_report).transpose().round(3)

fig.add_trace(
    go.Table(
        header=dict(values=['Class'] + list(report_df.columns),
                   fill_color='lightblue'),
        cells=dict(values=[report_df.index] + [report_df[col] for col in report_df.columns],
                  fill_color='lavender')
    ),
    row=2, col=2
)

fig.update_layout(
    height=800,
    title_text=f"🎯 Classification Performance Dashboard - Winner: {best_model}",
    title_x=0.5,
    showlegend=False
)

fig.show()

In [None]:
# Detailed classification analysis
print(f"🔍 DETAILED ANALYSIS: {best_model}")
print("=" * 60)

# Detailed classification report
print("📊 DETAILED CLASSIFICATION REPORT")
print("=" * 60)
class_report = classification_report(y_test, best_predictions, 
                                   target_names=[diagnosis_labels[i] for i in range(3)])
print(class_report)

# Per-class analysis
print(f"\n🎯 PER-CLASS PERFORMANCE ANALYSIS")
print("=" * 60)
y_test_array = np.asarray(y_test)  # positional indexing, independent of the pandas index
for i in range(3):
    class_mask = (y_test_array == i)
    class_predictions = best_predictions[class_mask]
    
    total_class_samples = int(class_mask.sum())
    correct_predictions = int((class_predictions == i).sum())
    # Accuracy restricted to one true class is that class's recall
    class_recall = correct_predictions / total_class_samples
    
    print(f"{diagnosis_labels[i]} (Class {i}):")
    print(f"   Total samples: {total_class_samples}")
    print(f"   Correct predictions: {correct_predictions}")
    print(f"   Class recall: {class_recall:.3f}")
    print()

# Overall model insights
print(f"💼 CLINICAL INSIGHTS")
print("=" * 60)
overall_accuracy = accuracy_score(y_test, best_predictions)
print(f"Overall diagnostic accuracy: {overall_accuracy:.1%}")
print(f"This means {overall_accuracy:.1%} of test-set patients were assigned the correct label")

# Misclassification analysis
misclassified = y_test != best_predictions
misclassification_rate = misclassified.mean()
print(f"Misclassification rate: {misclassification_rate:.1%}")
print(f"Number of misclassified patients: {misclassified.sum()} out of {len(y_test)}")
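
The per-class figures in the cell above can also be read straight off the confusion matrix: each diagonal entry divided by its row sum is the recall of that class. A sketch with ten hand-made toy labels (an assumption for illustration) checks this against `recall_score`.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, recall_score

# Hand-made toy labels: 4 samples of class 0, 3 of class 1, 3 of class 2
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
y_pred = np.array([0, 0, 1, 0, 1, 1, 2, 2, 2, 0])

cm = confusion_matrix(y_true, y_pred)  # rows = true class, columns = predicted class

# Diagonal = correct predictions per class; row sum = true samples per class
per_class_recall = cm.diagonal() / cm.sum(axis=1)

print(cm)
print("From confusion matrix:", np.round(per_class_recall, 3))
print("From recall_score:    ", np.round(recall_score(y_true, y_pred, average=None), 3))
```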

## 🎓 Step 9: Key Insights and Learning

Let's summarize what we've learned about AutoML for classification tasks.

In [None]:
# Generate comprehensive insights
print("🎓 KEY LEARNING INSIGHTS")
print("=" * 80)

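print("🤖 AUTOML BENEFITS DEMONSTRATED:")
print(f"   • {best_model} achieved the best performance with {comparison_df.iloc[0]['Test_Accuracy']:.1%} accuracy")
# Report who actually ranked first instead of assuming AutoML won
winner_kind = "an AutoML model" if best_model in ('auto-sklearn', 'TPOT') else "a baseline model"
print(f"   • The top-ranked model on the test set is {winner_kind}")
print(f"   • Automated preprocessing, algorithm selection and hyperparameter tuning")
print(f"   • No manual algorithm selection required")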

print(f"\n📊 CLASSIFICATION METRICS EXPLAINED:")
print(f"   • Accuracy: {comparison_df.iloc[0]['Test_Accuracy']:.3f}")
print(f"     → Percentage of correct predictions overall")
print(f"   • Precision: {comparison_df.iloc[0]['Test_Precision']:.3f}")
print(f"     → Of the predictions made for a class, the share that were correct (weighted average over classes)")
print(f"   • Recall: {comparison_df.iloc[0]['Test_Recall']:.3f}")
print(f"     → Of the true members of a class, the share that were identified (weighted average over classes)")
print(f"   • F1-Score: {comparison_df.iloc[0]['Test_F1']:.3f}")
print(f"     → Harmonic mean of precision and recall")

print(f"\n🏆 MODEL COMPARISON INSIGHTS:")
automl_models = ['auto-sklearn', 'TPOT']
baseline_models_list = ['Logistic Regression', 'Random Forest']

best_automl = comparison_df[comparison_df['Model'].isin(automl_models)].iloc[0]
best_baseline = comparison_df[comparison_df['Model'].isin(baseline_models_list)].iloc[0]

improvement = ((best_automl['Test_Accuracy'] - best_baseline['Test_Accuracy']) / best_baseline['Test_Accuracy']) * 100
print(f"   • Best AutoML: {best_automl['Model']} (Accuracy = {best_automl['Test_Accuracy']:.3f})")
print(f"   • Best Baseline: {best_baseline['Model']} (Accuracy = {best_baseline['Test_Accuracy']:.3f})")
print(f"   • AutoML vs. best baseline: {improvement:+.1f}% relative accuracy difference")

print(f"\n🎯 PRACTICAL APPLICATIONS:")
print(f"   • Healthcare: Automated medical diagnosis")
print(f"   • Finance: Credit risk assessment and fraud detection")
print(f"   • Marketing: Customer segmentation and churn prediction")
print(f"   • Manufacturing: Quality control and defect classification")
print(f"   • Technology: Spam detection and image recognition")

print(f"\n💡 NEXT STEPS FOR PRODUCTION:")
print(f"   • Feature engineering: Create domain-specific medical features")
print(f"   • Class balancing: Handle imbalanced datasets")
print(f"   • Cross-validation: Use stratified k-fold validation")
print(f"   • Model monitoring: Track performance on new patients")
print(f"   • Explainability: Use SHAP for medical decision interpretation")

print(f"\n🔗 AUTOML LIBRARIES COMPARISON:")
sklearn_acc = all_results['auto-sklearn']['test_accuracy']
tpot_acc = all_results['TPOT']['test_accuracy']
print(f"   • auto-sklearn: Accuracy = {sklearn_acc:.3f} | Focus: Robust, ensemble methods")
print(f"   • TPOT: Accuracy = {tpot_acc:.3f} | Focus: Genetic programming, pipeline optimization")
print(f"   • Both excel at different aspects of classification")

print(f"\n🏥 MEDICAL APPLICATION INSIGHTS:")
print(f"   • High accuracy is crucial for patient safety")
print(f"   • False negatives (missing disease) can be dangerous")
print(f"   • False positives (false alarms) cause unnecessary anxiety")
print(f"   • Model interpretability is essential for clinical acceptance")

print(f"\n🎊 CONGRATULATIONS!")
print(f"You've successfully implemented AutoML for classification and achieved:")
print(f"🏆 Best Model: {best_model}")
print(f"📈 Accuracy: {comparison_df.iloc[0]['Test_Accuracy']:.1%}")
print(f"🎯 F1-Score: {comparison_df.iloc[0]['Test_F1']:.3f}")
print(f"✨ A starting point for applying AutoML to real, clinically validated data!")

## 🎉 Conclusion

### What We Accomplished

In this notebook, we successfully:

1. **🏗️ Created a synthetic medical-style dataset** with 20 named clinical features for diagnosis classification
2. **📊 Performed comprehensive EDA** to understand medical feature relationships
3. **🤖 Implemented two AutoML approaches**: auto-sklearn and TPOT for classification
4. **📈 Compared AutoML with traditional models** on held-out validation and test sets
5. **🔍 Analyzed predictions** with detailed classification metrics
6. **📊 Created interactive visualizations** including confusion matrices and performance dashboards

### Key Takeaways

- **AutoML democratizes classification** by automating algorithm selection and tuning
- **Classification metrics tell different stories** - accuracy, precision, recall, and F1-score each matter
- **Class distribution matters** - stratified splitting keeps class proportions consistent across train, validation, and test sets
- **Confusion matrices reveal model behavior** across different classes
- **Medical applications require high accuracy** and interpretability
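
To see why the metrics "tell different stories", consider a sketch with hand-made toy labels (an assumption for illustration): 95 healthy patients, 5 with disease, and a "model" that always predicts healthy.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# 95 negatives (healthy) and 5 positives (disease)
y_true = np.array([0] * 95 + [1] * 5)
# A trivial "model" that never predicts the positive class
y_pred = np.zeros(100, dtype=int)

acc = accuracy_score(y_true, y_pred)
# zero_division=0 returns 0 instead of warning when no positives are predicted
prec = precision_score(y_true, y_pred, zero_division=0)
rec = recall_score(y_true, y_pred, zero_division=0)
f1 = f1_score(y_true, y_pred, zero_division=0)

print(f"Accuracy:  {acc:.2f}")   # 0.95
print(f"Precision: {prec:.2f}")  # 0.00
print(f"Recall:    {rec:.2f}")   # 0.00
print(f"F1-score:  {f1:.2f}")    # 0.00
```

Accuracy looks strong while every disease case is missed, which is exactly the failure a medical classifier cannot afford.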

### Classification vs Regression

| Aspect | Classification | Regression |
|--------|---------------|------------|
| **Output** | Categories/Classes | Continuous Numbers |
| **Metrics** | Accuracy, Precision, Recall, F1 | RMSE, MAE, R² |
| **Examples** | Disease/Healthy, Spam/Ham | House Price, Temperature |
| **Visualization** | Confusion Matrix, ROC Curve | Scatter Plot, Residuals |
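
The ROC curve row applies to multi-class problems as well. The notebook stores class probabilities but does not score them; a minimal sketch of a one-vs-rest ROC AUC with `roc_auc_score` is shown below on toy data with a stand-in model (both assumptions for illustration).

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Toy 3-class dataset (illustrative only)
X_toy, y_toy = make_classification(
    n_samples=600, n_features=10, n_informative=6, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=0, stratify=y_toy
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)  # shape (n_samples, 3), rows sum to 1

# One-vs-rest: one ROC AUC per class, then the unweighted (macro) mean
auc_ovr = roc_auc_score(y_te, proba, multi_class="ovr", average="macro")
print(f"Macro one-vs-rest ROC AUC: {auc_ovr:.3f}")
```

In the notebook, the same call could be applied to `y_test` and any of the stored `probabilities_test` arrays.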

### Next Steps

1. **Try with your own data**: Upload a classification CSV and adapt this notebook
2. **Handle class imbalance**: Use SMOTE or other balancing techniques
3. **Add feature selection**: Use statistical tests or feature importance
4. **Explore ensemble methods**: Combine multiple AutoML models
5. **Deploy your classifier**: Create a medical diagnosis web app
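
As a starting point for step 2, here is a sketch that avoids extra dependencies: instead of SMOTE, it uses scikit-learn's built-in `class_weight='balanced'` option and evaluates with stratified k-fold cross-validation. The imbalanced toy dataset and the choice of logistic regression are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Imbalanced binary toy dataset: roughly 90% class 0, 10% class 1
X_imb, y_imb = make_classification(
    n_samples=1000, n_features=10, n_informative=5,
    weights=[0.9, 0.1], random_state=0,
)

# Stratified folds keep the 90/10 mix in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

recall_by_setting = {}
for label, weight in [("unweighted", None), ("balanced", "balanced")]:
    model = LogisticRegression(max_iter=1000, class_weight=weight)
    # Recall of the minority (positive) class, averaged over folds
    scores = cross_val_score(model, X_imb, y_imb, cv=cv, scoring="recall")
    recall_by_setting[label] = scores.mean()
    print(f"{label:10} mean minority-class recall: {scores.mean():.3f}")
```

Reweighting typically raises minority-class recall at some cost in precision, so compare both metrics before choosing a setting.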

---

**🚀 Happy AutoML Classification!**

AutoML makes sophisticated classification accessible to everyone - from medical professionals to business analysts!