# Week 3: Model Comparison Lab
## Side-by-Side Algorithm Comparison

### Learning Objectives
By the end of this lab, you will be able to:
- Compare multiple machine learning algorithms on the same dataset
- Evaluate model performance using various metrics
- Visualize and interpret comparison results
- Make informed decisions about algorithm selection
- Understand the trade-offs between different algorithms

### Dataset Overview
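We'll use scikit-learn's built-in **Wine** dataset (`load_wine`) to compare different classification algorithms. It contains 13 physicochemical measurements for 178 wines, and the target is one of three cultivars (`class_0`, `class_1`, `class_2`). Note that this is not the UCI Wine Quality dataset: there are no quality ratings here.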

## Part 1: Setup and Data Loading

In [None]:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, cross_val_score, StratifiedKFold
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, classification_report
import warnings
warnings.filterwarnings('ignore')

# Set style for better plots
plt.style.use('seaborn-v0_8')
sns.set_palette("husl")

print("Libraries imported successfully!")

In [None]:
# Load the Wine dataset bundled with scikit-learn (3 cultivar classes)
# Note: this is NOT the UCI Wine Quality dataset, which is available from:
# https://archive.ics.uci.edu/ml/datasets/wine+quality

from sklearn.datasets import load_wine

# Load the wine dataset
wine_data = load_wine()
X = pd.DataFrame(wine_data.data, columns=wine_data.feature_names)
y = wine_data.target

# Display basic information about the dataset
print(f"Dataset shape: {X.shape}")
print(f"Number of classes: {len(np.unique(y))}")
print(f"Class distribution: {np.bincount(y)}")

# Display first few rows
print("\nFirst 5 rows:")
display(X.head())

print("\nTarget classes:")
print(f"Classes: {wine_data.target_names}")

## Part 2: Exploratory Data Analysis

In [None]:
# Basic statistics
print("Dataset Info:")
print(X.info())

print("\nBasic Statistics:")
display(X.describe())

# Check for missing values
print(f"\nMissing values: {X.isnull().sum().sum()}")

In [None]:
# Visualize class distribution
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
unique, counts = np.unique(y, return_counts=True)
plt.bar(wine_data.target_names, counts, color=['#FF9999', '#66B2FF', '#99FF99'])
plt.title('Class Distribution')
plt.ylabel('Count')
plt.xticks(rotation=45)

plt.subplot(1, 2, 2)
plt.pie(counts, labels=wine_data.target_names, autopct='%1.1f%%', startangle=90)
plt.title('Class Distribution (Percentage)')

plt.tight_layout()
plt.show()

In [None]:
# Feature correlation heatmap
plt.figure(figsize=(14, 10))
correlation_matrix = X.corr()
sns.heatmap(correlation_matrix, annot=False, cmap='coolwarm', center=0)
plt.title('Feature Correlation Heatmap')
plt.tight_layout()
plt.show()

## Part 3: Data Preprocessing

In [None]:
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

print(f"Training set size: {X_train.shape[0]}")
print(f"Testing set size: {X_test.shape[0]}")
print(f"Training class distribution: {np.bincount(y_train)}")
print(f"Testing class distribution: {np.bincount(y_test)}")

In [None]:
# Scale the features
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Convert back to DataFrame for easier handling
X_train_scaled = pd.DataFrame(X_train_scaled, columns=X.columns)
X_test_scaled = pd.DataFrame(X_test_scaled, columns=X.columns)

print("Feature scaling completed!")
print(f"Original feature means: {X_train.mean().round(2).values[:5]}...")
print(f"Scaled feature means: {X_train_scaled.mean().round(2).values[:5]}...")

## Part 4: Algorithm Implementation and Comparison

We'll compare the following algorithms:
1. **Logistic Regression** - Linear classifier
2. **Decision Tree** - Tree-based classifier
3. **Random Forest** - Ensemble method
4. **Support Vector Machine (SVM)** - Kernel-based classifier
5. **K-Nearest Neighbors (KNN)** - Instance-based classifier
6. **Naive Bayes** - Probabilistic classifier
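
Several of these models (Logistic Regression, SVM, KNN) are sensitive to feature scale. One way to keep scaling inside each cross-validation fold is to bundle the scaler and the classifier in a scikit-learn pipeline. The sketch below is illustrative only and separate from the lab's main flow; the `*_demo` names are hypothetical and chosen so they do not collide with the lab's variables.

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Reload the data so this sketch runs on its own
X_demo, y_demo = load_wine(return_X_y=True)

# Scaler and classifier travel together, so every CV fold fits its own
# scaler on that fold's training portion only
pipe_demo = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv_demo = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
pipe_scores = cross_val_score(pipe_demo, X_demo, y_demo, cv=cv_demo, scoring="accuracy")

print(f"Pipeline CV accuracy: {pipe_scores.mean():.3f} ± {pipe_scores.std():.3f}")
```

Any classifier from the list above could take the place of `LogisticRegression` in the pipeline; for scale-insensitive models such as trees the scaler is harmless but unnecessary.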

In [None]:
# Import algorithms
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize
import time

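# Initialize algorithms with mostly default parameters
# (fixed seeds for reproducibility, max_iter raised so Logistic Regression
# converges, probability=True so SVC exposes predict_proba)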
algorithms = {
    'Logistic Regression': LogisticRegression(random_state=42, max_iter=1000),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Random Forest': RandomForestClassifier(random_state=42, n_estimators=100),
    'SVM': SVC(random_state=42, probability=True),
    'K-Nearest Neighbors': KNeighborsClassifier(n_neighbors=5),
    'Naive Bayes': GaussianNB()
}

print(f"Initialized {len(algorithms)} algorithms for comparison")

In [None]:
# Function to evaluate a single algorithm
def evaluate_algorithm(name, algorithm, X_train, X_test, y_train, y_test):
    """
    Evaluate a single algorithm and return performance metrics
    """
    print(f"\nEvaluating {name}...")
    
    # Measure training time (perf_counter is better suited to short intervals)
    start_time = time.perf_counter()
    algorithm.fit(X_train, y_train)
    training_time = time.perf_counter() - start_time
    
    # Measure prediction time
    start_time = time.perf_counter()
    y_pred = algorithm.predict(X_test)
    prediction_time = time.perf_counter() - start_time
    
    # Calculate metrics
    accuracy = accuracy_score(y_test, y_pred)
    precision = precision_score(y_test, y_pred, average='weighted')
    recall = recall_score(y_test, y_pred, average='weighted')
    f1 = f1_score(y_test, y_pred, average='weighted')
    
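    # Cross-validation score on the training set
    # (cross_val_score fits clones, so the fitted `algorithm` is left untouched;
    # the data was scaled before CV, so these estimates can be slightly optimistic)
    cv_scores = cross_val_score(algorithm, X_train, y_train, cv=5, scoring='accuracy')
    cv_mean = cv_scores.mean()
    cv_std = cv_scores.std()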
    
    return {
        'Algorithm': name,
        'Accuracy': accuracy,
        'Precision': precision,
        'Recall': recall,
        'F1-Score': f1,
        'CV_Mean': cv_mean,
        'CV_Std': cv_std,
        'Training_Time': training_time,
        'Prediction_Time': prediction_time,
        'Model': algorithm
    }

print("Evaluation function defined successfully!")

In [None]:
# Evaluate all algorithms
results = []

for name, algorithm in algorithms.items():
    result = evaluate_algorithm(name, algorithm, X_train_scaled, X_test_scaled, y_train, y_test)
    results.append(result)

print("\nAll algorithms evaluated successfully!")

## Part 5: Results Analysis and Visualization

In [None]:
# Create results DataFrame
results_df = pd.DataFrame(results)
results_df = results_df.drop('Model', axis=1)  # Remove model objects for display

# Sort by accuracy
results_df = results_df.sort_values('Accuracy', ascending=False).reset_index(drop=True)

print("\n=== ALGORITHM COMPARISON RESULTS ===")
print("\nPerformance Metrics:")
display(results_df.round(4))

In [None]:
# Visualize performance metrics
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# Accuracy comparison
axes[0, 0].bar(results_df['Algorithm'], results_df['Accuracy'], color='skyblue')
axes[0, 0].set_title('Accuracy Comparison')
axes[0, 0].set_ylabel('Accuracy')
axes[0, 0].tick_params(axis='x', rotation=45)
axes[0, 0].set_ylim(0, 1)

# F1-Score comparison
axes[0, 1].bar(results_df['Algorithm'], results_df['F1-Score'], color='lightcoral')
axes[0, 1].set_title('F1-Score Comparison')
axes[0, 1].set_ylabel('F1-Score')
axes[0, 1].tick_params(axis='x', rotation=45)
axes[0, 1].set_ylim(0, 1)

# Cross-validation scores with error bars
axes[1, 0].bar(results_df['Algorithm'], results_df['CV_Mean'], 
               yerr=results_df['CV_Std'], capsize=5, color='lightgreen')
axes[1, 0].set_title('Cross-Validation Accuracy (Mean ± Std)')
axes[1, 0].set_ylabel('CV Accuracy')
axes[1, 0].tick_params(axis='x', rotation=45)
axes[1, 0].set_ylim(0, 1)

# Training time comparison
axes[1, 1].bar(results_df['Algorithm'], results_df['Training_Time'], color='gold')
axes[1, 1].set_title('Training Time Comparison')
axes[1, 1].set_ylabel('Training Time (seconds)')
axes[1, 1].tick_params(axis='x', rotation=45)

plt.tight_layout()
plt.show()

In [None]:
# Radar chart for comprehensive comparison
from math import pi

# Select top 4 algorithms for radar chart
top_algorithms = results_df.head(4)

# Metrics for radar chart
metrics = ['Accuracy', 'Precision', 'Recall', 'F1-Score']
num_metrics = len(metrics)

# Create angles for each metric
angles = [n / float(num_metrics) * 2 * pi for n in range(num_metrics)]
angles += angles[:1]  # Complete the circle

# Create radar chart
fig, ax = plt.subplots(figsize=(10, 8), subplot_kw=dict(projection='polar'))

colors = ['red', 'blue', 'green', 'orange']

for i, (_, row) in enumerate(top_algorithms.iterrows()):
    values = [row[metric] for metric in metrics]
    values += values[:1]  # Complete the circle
    
    ax.plot(angles, values, 'o-', linewidth=2, label=row['Algorithm'], color=colors[i])
    ax.fill(angles, values, alpha=0.25, color=colors[i])

# Customize the chart
ax.set_xticks(angles[:-1])
ax.set_xticklabels(metrics)
ax.set_ylim(0, 1)
ax.set_title('Top 4 Algorithms - Performance Radar Chart', size=16, y=1.1)
ax.legend(loc='upper right', bbox_to_anchor=(1.3, 1.0))
ax.grid(True)

plt.tight_layout()
plt.show()

## Part 6: Detailed Analysis of Best Performing Algorithm

In [None]:
# Get the best performing algorithm
best_algorithm_name = results_df.iloc[0]['Algorithm']
best_algorithm = next(result['Model'] for result in results if result['Algorithm'] == best_algorithm_name)

print(f"\n=== DETAILED ANALYSIS: {best_algorithm_name} ===")

# Predictions
y_pred_best = best_algorithm.predict(X_test_scaled)

# Detailed classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred_best, target_names=wine_data.target_names))

In [None]:
# Confusion Matrix
cm = confusion_matrix(y_test, y_pred_best)

plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', 
            xticklabels=wine_data.target_names, 
            yticklabels=wine_data.target_names)
plt.title(f'Confusion Matrix - {best_algorithm_name}')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()

# Calculate per-class accuracy
per_class_accuracy = cm.diagonal() / cm.sum(axis=1)
print("\nPer-class Accuracy:")
for i, acc in enumerate(per_class_accuracy):
    print(f"{wine_data.target_names[i]}: {acc:.3f}")

## Part 7: Feature Importance Analysis

In [None]:
# Feature importance (for tree-based algorithms)
if hasattr(best_algorithm, 'feature_importances_'):
    feature_importance = pd.DataFrame({
        'feature': X.columns,
        'importance': best_algorithm.feature_importances_
    }).sort_values('importance', ascending=False)
    
    plt.figure(figsize=(12, 8))
    sns.barplot(data=feature_importance.head(10), x='importance', y='feature')
    plt.title(f'Top 10 Feature Importances - {best_algorithm_name}')
    plt.xlabel('Importance')
    plt.tight_layout()
    plt.show()
    
    print("\nTop 10 Most Important Features:")
    display(feature_importance.head(10))
    
elif hasattr(best_algorithm, 'coef_'):
    # For linear models, coef_ has one row per class in the multiclass case,
    # so average the absolute coefficients across classes
    coef_df = pd.DataFrame({
        'feature': X.columns,
        'coef_magnitude': np.abs(best_algorithm.coef_).mean(axis=0)
    }).sort_values('coef_magnitude', ascending=False)
    
    plt.figure(figsize=(12, 8))
    sns.barplot(data=coef_df.head(10), x='coef_magnitude', y='feature')
    plt.title(f'Top 10 Feature Coefficient Magnitudes - {best_algorithm_name}')
    plt.xlabel('Coefficient Magnitude')
    plt.tight_layout()
    plt.show()
    
    print("\nTop 10 Features by Coefficient Magnitude:")
    display(coef_df.head(10))
else:
    print(f"Feature importance not available for {best_algorithm_name}")

## Part 8: Learning Curves

In [None]:
# Learning curves for top 3 algorithms
from sklearn.model_selection import learning_curve

top_3_algorithms = results_df.head(3)

fig, axes = plt.subplots(1, 3, figsize=(18, 5))

for i, (_, row) in enumerate(top_3_algorithms.iterrows()):
    algorithm_name = row['Algorithm']
    algorithm = next(result['Model'] for result in results if result['Algorithm'] == algorithm_name)
    
    train_sizes, train_scores, val_scores = learning_curve(
        algorithm, X_train_scaled, y_train, cv=5, 
        train_sizes=np.linspace(0.1, 1.0, 10),
        scoring='accuracy', n_jobs=-1
    )
    
    train_mean = np.mean(train_scores, axis=1)
    train_std = np.std(train_scores, axis=1)
    val_mean = np.mean(val_scores, axis=1)
    val_std = np.std(val_scores, axis=1)
    
    axes[i].plot(train_sizes, train_mean, 'o-', color='blue', label='Training Score')
    axes[i].fill_between(train_sizes, train_mean - train_std, train_mean + train_std, alpha=0.1, color='blue')
    
    axes[i].plot(train_sizes, val_mean, 'o-', color='red', label='Validation Score')
    axes[i].fill_between(train_sizes, val_mean - val_std, val_mean + val_std, alpha=0.1, color='red')
    
    axes[i].set_title(f'Learning Curve - {algorithm_name}')
    axes[i].set_xlabel('Training Set Size')
    axes[i].set_ylabel('Accuracy Score')
    axes[i].legend()
    axes[i].grid(True)

plt.tight_layout()
plt.show()

## Part 9: Summary and Insights

In [None]:
# Generate comprehensive summary
print("\n" + "="*60)
print("           ALGORITHM COMPARISON SUMMARY")
print("="*60)

print(f"\n🏆 BEST PERFORMING ALGORITHM: {results_df.iloc[0]['Algorithm']}")
print(f"   • Accuracy: {results_df.iloc[0]['Accuracy']:.4f}")
print(f"   • F1-Score: {results_df.iloc[0]['F1-Score']:.4f}")
print(f"   • Cross-validation: {results_df.iloc[0]['CV_Mean']:.4f} ± {results_df.iloc[0]['CV_Std']:.4f}")

print(f"\n⚡ FASTEST TRAINING: {results_df.loc[results_df['Training_Time'].idxmin(), 'Algorithm']}")
print(f"   • Training time: {results_df['Training_Time'].min():.4f} seconds")

print(f"\n🎯 MOST CONSISTENT (lowest CV std): {results_df.loc[results_df['CV_Std'].idxmin(), 'Algorithm']}")
print(f"   • CV std: {results_df['CV_Std'].min():.4f}")

print("\n📊 RANKING BY ACCURACY:")
for i, (_, row) in enumerate(results_df.iterrows(), 1):
    print(f"   {i}. {row['Algorithm']}: {row['Accuracy']:.4f}")

print("\n💡 KEY INSIGHTS:")
accuracy_range = results_df['Accuracy'].max() - results_df['Accuracy'].min()
if accuracy_range < 0.05:
    print("   • All algorithms perform similarly (accuracy range < 0.05)")
else:
    print(f"   • Noticeable performance differences observed (range: {accuracy_range:.3f})")

if results_df['Training_Time'].max() > 1.0:
    print("   • Some algorithms take over 1 second to train; consider faster ones for real-time applications")
else:
    print("   • Training times are under 1 second for all algorithms")

print("\n" + "="*60)

## Part 10: Exercises and Questions

### Exercise 1: Parameter Tuning
Choose the best-performing algorithm and try to improve its performance by tuning its hyperparameters with `GridSearchCV` or `RandomizedSearchCV`. Compare the tuned model's test accuracy with that of the untuned model.
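
As a starting point, here is a minimal sketch of a grid search. It assumes Random Forest is the model being tuned, which may not match your own results, and the grid values are placeholders rather than recommendations. The `*_gs` and `*_demo` names are hypothetical and kept separate from the lab's variables.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# Reload and split the data so this sketch runs on its own
X_gs, y_gs = load_wine(return_X_y=True)
X_gs_train, X_gs_test, y_gs_train, y_gs_test = train_test_split(
    X_gs, y_gs, test_size=0.3, random_state=42, stratify=y_gs
)

# Small illustrative grid (assumed values, not tuned recommendations)
param_grid_demo = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

grid_demo = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid=param_grid_demo,
    cv=5,
    scoring="accuracy",
)
grid_demo.fit(X_gs_train, y_gs_train)  # searches using the training data only

print("Best parameters:", grid_demo.best_params_)
print(f"Best CV accuracy: {grid_demo.best_score_:.3f}")
print(f"Held-out test accuracy: {grid_demo.score(X_gs_test, y_gs_test):.3f}")
```

The test set is touched only once, after the search has finished, so the final score is not used to pick the hyperparameters.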

### Exercise 2: Feature Selection
Implement feature selection techniques and see how they affect algorithm performance. Try:
- SelectKBest
- Recursive Feature Elimination
- Feature importance thresholding
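
The first option can be sketched as follows, under the assumption that Logistic Regression is the downstream model and that `k` values of 3, 6, and 13 are worth comparing (both are illustrative choices). Placing `SelectKBest` inside a pipeline means the features are re-selected within each cross-validation fold. The `*_fs` names are hypothetical.

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

wine_fs = load_wine()
X_fs, y_fs = wine_fs.data, wine_fs.target

# Compare CV accuracy for a few feature counts (13 keeps every feature)
results_fs = {}
for k in (3, 6, 13):
    pipe_fs = make_pipeline(
        StandardScaler(),
        SelectKBest(score_func=f_classif, k=k),
        LogisticRegression(max_iter=1000),
    )
    results_fs[k] = cross_val_score(pipe_fs, X_fs, y_fs, cv=5, scoring="accuracy").mean()
    print(f"k={k:2d}: CV accuracy = {results_fs[k]:.3f}")

# Inspect which features the ANOVA F-test ranks highest for k=6
selector_fs = SelectKBest(score_func=f_classif, k=6).fit(X_fs, y_fs)
selected_names = [
    name for name, keep in zip(wine_fs.feature_names, selector_fs.get_support()) if keep
]
print("Selected features (k=6):", selected_names)
```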

### Exercise 3: Ensemble Methods
Create a voting classifier that combines the top 3 algorithms. Compare its performance to individual algorithms.
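
One possible shape for such an ensemble is sketched below. It assumes the top three models are Logistic Regression, Random Forest, and SVM, which may differ from your ranking; substitute your own. Soft voting averages predicted probabilities, so each member must support `predict_proba` (hence `probability=True` for `SVC`). The `*_vote` and `*_demo` names are hypothetical.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_vote, y_vote = load_wine(return_X_y=True)

# Scale-sensitive members get their own scaler; the forest does not need one
voting_demo = VotingClassifier(
    estimators=[
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=42)),
        ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=42))),
    ],
    voting="soft",
)

vote_scores = cross_val_score(voting_demo, X_vote, y_vote, cv=5, scoring="accuracy")
print(f"Voting CV accuracy: {vote_scores.mean():.3f} ± {vote_scores.std():.3f}")
```

To complete the exercise, run the same cross-validation on each member alone and compare the means; on a small, easy dataset the ensemble may not beat its best member.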

### Discussion Questions:
1. Why might different algorithms perform differently on the same dataset?
2. What factors should you consider when choosing an algorithm for a real-world problem?
3. How would you explain the trade-offs between accuracy and training time to a non-technical stakeholder?
4. When might you choose a slightly less accurate but much faster algorithm?
5. How reliable are these results? What could you do to make them more robust?
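
For question 5, one common way to probe robustness is to repeat stratified k-fold cross-validation with different shuffles and look at the spread of scores. The sketch below assumes KNN as the model under test and 10 repeats of 5 folds; both are illustrative choices, and the `*_rep` names are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X_rep, y_rep = load_wine(return_X_y=True)

# 5 folds repeated 10 times with different shuffles gives 50 scores
cv_rep = RepeatedStratifiedKFold(n_splits=5, n_repeats=10, random_state=42)
knn_rep = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

scores_rep = cross_val_score(knn_rep, X_rep, y_rep, cv=cv_rep, scoring="accuracy")

print(f"Number of scores: {len(scores_rep)}")
print(f"Mean accuracy: {scores_rep.mean():.3f}")
print(f"Std deviation: {scores_rep.std():.3f}")
print(f"Min / max: {scores_rep.min():.3f} / {scores_rep.max():.3f}")
print(f"5th to 95th percentile: {np.percentile(scores_rep, 5):.3f} to {np.percentile(scores_rep, 95):.3f}")
```

If two algorithms' score ranges overlap heavily, a ranking based on a single 54-sample test split should be read with caution.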

In [None]:
# Exercise 1 Solution Template - Hyperparameter Tuning
from sklearn.model_selection import GridSearchCV

# TODO: Choose best algorithm and define parameter grid
# Example for Random Forest:
# param_grid = {
#     'n_estimators': [50, 100, 200],
#     'max_depth': [None, 10, 20],
#     'min_samples_split': [2, 5, 10]
# }

# TODO: Implement GridSearchCV
# grid_search = GridSearchCV(...)

print("\nExercise 1: Implement hyperparameter tuning here!")

In [None]:
# Exercise 2 Solution Template - Feature Selection
from sklearn.feature_selection import SelectKBest, f_classif

# TODO: Implement feature selection
# selector = SelectKBest(score_func=f_classif, k=10)
# X_train_selected = selector.fit_transform(X_train_scaled, y_train)

print("\nExercise 2: Implement feature selection here!")

In [None]:
# Exercise 3 Solution Template - Ensemble Methods
from sklearn.ensemble import VotingClassifier

# TODO: Create voting classifier with top 3 algorithms
# voting_clf = VotingClassifier(
#     estimators=[...],
#     voting='soft'
# )

print("\nExercise 3: Implement ensemble methods here!")

## Conclusion

In this lab, you have:

1. **Compared multiple algorithms** on the same dataset using consistent evaluation metrics
2. **Analyzed performance trade-offs** between accuracy, speed, and consistency
3. **Visualized results** using various charts and plots
4. **Identified the best performing algorithm** for this specific problem
5. **Learned about feature importance** and model interpretability

### Key Takeaways:
- **No single algorithm** works best for all problems
- **Context matters**: Consider accuracy, speed, interpretability, and deployment constraints
- **Cross-validation** provides more reliable performance estimates than a single train/test split
- **Feature engineering** and preprocessing can significantly impact results
- **Ensemble methods** often, though not always, improve performance by combining multiple models

### Next Steps:
- Try this comparison on different datasets
- Experiment with hyperparameter tuning
- Explore advanced ensemble techniques
- Consider algorithm-specific optimizations
- Practice explaining results to non-technical audiences