# Components of Machine Learning

This notebook demonstrates the **components of Machine Learning**:

1. **Data** - Features and Labels
2. **Hypothesis Space (Model)** - Both traditional ML and PyTorch Neural Networks
3. **Loss Functions** - Optimization objectives

## Environment setup

First, let's import all necessary libraries and set up our environment.

In [1]:
# Import necessary libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Traditional ML libraries
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, LabelEncoder
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# PyTorch libraries
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import TensorDataset, DataLoader

# Suppress warnings
import warnings
warnings.filterwarnings('ignore')

# Set random seeds for reproducibility
np.random.seed(42)
torch.manual_seed(42)

# Configure matplotlib
plt.style.use('seaborn-v0_8' if 'seaborn-v0_8' in plt.style.available else 'default')
sns.set_palette("husl")

print("Environment Setup Complete!")
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

Environment Setup Complete!
PyTorch version: 2.7.1+cu126
CUDA available: True
Using device: cuda


---

## Component: Data

**Data** is the foundation of machine learning. It consists of:
- **Features (X)**: Input variables or measurements
- **Labels/Targets (y)**: What we want to predict

### The Iris Dataset

We'll use the famous Iris dataset which contains:
- **150 samples** of iris flowers
- **4 features**: sepal length, sepal width, petal length, petal width
- **3 classes**: Setosa, Versicolor, Virginica

This dataset works well for demonstrating multi-class classification, and it can be reduced to a binary task by grouping classes (for example, setosa vs. the rest).
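
To make the binary vs. multi-class distinction concrete, here is a minimal sketch of one-vs-rest relabeling. It assumes the class encoding 0 = setosa, 1 = versicolor, 2 = virginica and 50 samples per class; the names `multiclass_labels` and `to_binary` are illustrative and not part of scikit-learn.

```python
from collections import Counter

# Illustrative stand-in for the Iris targets: 50 samples per class,
# encoded 0 = setosa, 1 = versicolor, 2 = virginica (assumed ordering).
multiclass_labels = [0] * 50 + [1] * 50 + [2] * 50

def to_binary(labels, positive_class):
    """One-vs-rest relabeling: 1 for the chosen class, 0 for all others."""
    return [1 if label == positive_class else 0 for label in labels]

# Binary task: "is this flower a setosa?"
binary_labels = to_binary(multiclass_labels, positive_class=0)

print(Counter(multiclass_labels))  # three balanced classes
print(Counter(binary_labels))      # one positive class vs. the rest
```

Note that one-vs-rest relabeling makes the classes imbalanced (50 positives against 100 negatives), which matters when interpreting accuracy.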

In [2]:
# Load the Iris dataset
iris_data = load_iris()
X = iris_data.data
y = iris_data.target

print("🌸 IRIS DATASET OVERVIEW")
print("=" * 40)
print(f"Dataset shape: {X.shape}")
print(f"Number of classes: {len(np.unique(y))}")
print(f"Feature names: {iris_data.feature_names}")
print(f"Class names: {iris_data.target_names}")
print(f"Classes distribution: {np.bincount(y)}")

# Create a DataFrame for better visualization
df = pd.DataFrame(X, columns=iris_data.feature_names)
df['target'] = y
df['species'] = [iris_data.target_names[i] for i in y]

print(f"\nFirst 5 samples:")
display(df.head())

🌸 IRIS DATASET OVERVIEW
Dataset shape: (150, 4)
Number of classes: 3
Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
Class names: ['setosa' 'versicolor' 'virginica']
Classes distribution: [50 50 50]

First 5 samples:


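| | sepal length (cm) | sepal width (cm) | petal length (cm) | petal width (cm) | target | species |
|---|---|---|---|---|---|---|
| 0 | 5.1 | 3.5 | 1.4 | 0.2 | 0 | setosa |
| 1 | 4.9 | 3.0 | 1.4 | 0.2 | 0 | setosa |
| 2 | 4.7 | 3.2 | 1.3 | 0.2 | 0 | setosa |
| 3 | 4.6 | 3.1 | 1.5 | 0.2 | 0 | setosa |
| 4 | 5.0 | 3.6 | 1.4 | 0.2 | 0 | setosa |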


In [None]:
# Visualize the data
fig, axes = plt.subplots(2, 2, figsize=(15, 10))

# Scatter plot: Sepal dimensions
sns.scatterplot(data=df, x='sepal length (cm)', y='sepal width (cm)', 
                hue='species', s=60, ax=axes[0,0])
axes[0,0].set_title('Sepal Dimensions')

# Scatter plot: Petal dimensions  
sns.scatterplot(data=df, x='petal length (cm)', y='petal width (cm)',
                hue='species', s=60, ax=axes[0,1])
axes[0,1].set_title('Petal Dimensions')

# Distribution plots
for i, feature in enumerate(['sepal length (cm)', 'sepal width (cm)']):
    sns.histplot(data=df, x=feature, hue='species', alpha=0.7, 
                 ax=axes[1,i], kde=True)
    axes[1,i].set_title(f'{feature.title()} Distribution')

plt.tight_layout()
plt.show()

print("Observations:")
print("- Setosa is clearly separable from the other two species")
print("- Versicolor and Virginica have some overlap")  
print("- Petal dimensions show better separation than sepal dimensions")

---

## Component: Hypothesis Space (Models)

The **hypothesis space** defines the set of all possible functions our model can represent. Let's compare two approaches:

### Approach 1: Traditional Machine Learning (Scikit-learn)

Traditional ML uses well-established algorithms with built-in assumptions:
- **Logistic Regression**: Linear decision boundaries
- **Random Forest**: Tree-based ensemble methods
- **SVM**: Margin-based classification (listed for context; not trained in this notebook)

**Advantages**: Simple, fast, interpretable  
**Disadvantages**: Less expressive on unstructured data, often needs manual feature engineering
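
The idea of a hypothesis space with linear decision boundaries can be sketched in a few lines. This is an illustrative stand-in rather than scikit-learn's implementation: `make_linear_classifier` is a hypothetical helper, and the weights and the two-feature input are made up. Each choice of weights and bias selects one function from the same linear family.

```python
import math

def make_linear_classifier(weights, bias):
    """Return one hypothesis h(x) = sigmoid(w . x + b) from the linear family."""
    def hypothesis(features):
        score = sum(w * x for w, x in zip(weights, features)) + bias
        return 1.0 / (1.0 + math.exp(-score))
    return hypothesis

# Two parameter settings are two different members of the same hypothesis space.
h_a = make_linear_classifier(weights=[1.0, -1.0], bias=0.0)
h_b = make_linear_classifier(weights=[0.5, 2.0], bias=-1.0)

point = [2.0, 1.0]  # hypothetical two-feature input
print(f"h_a(point) = {h_a(point):.3f}")
print(f"h_b(point) = {h_b(point):.3f}")

# Inputs with w . x + b = 0 lie on the decision boundary, where the output is 0.5.
print(h_a([1.0, 1.0]))
```

Training amounts to searching this family for the parameters that minimize a loss on the training data.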

In [None]:
# Traditional ML: Scikit-learn implementation
print("TRADITIONAL MACHINE LEARNING (Scikit-learn)")
print("=" * 50)

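# Split first so the test set stays unseen during preprocessing
X_train_raw, X_test_raw, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Fit the scaler on the training data only to avoid test-set leakage
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train_raw)
X_test = scaler.transform(X_test_raw)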

print(f"Training set size: {X_train.shape[0]} samples")
print(f"Test set size: {X_test.shape[0]} samples")

# Traditional ML models
models = {
    'Logistic Regression': LogisticRegression(random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42)
}

results_traditional = {}

for name, model in models.items():
    # Train model
    model.fit(X_train, y_train)
    
    # Make predictions
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    
    results_traditional[name] = {
        'model': model,
        'accuracy': accuracy,
        'predictions': y_pred
    }
    
    print(f"\n{name}:")
    print(f"  Accuracy: {accuracy:.4f}")
    print(f"  Training: model.fit(X_train, y_train)")
    print(f"  Prediction: model.predict(X_test)")

print("\nWith scikit-learn, training and prediction take one call each: fit() and predict().")

### Approach 2: Deep Learning (PyTorch)

PyTorch allows us to build **custom neural networks** with complete control over:
- **Architecture**: Number of layers, neurons, activations
- **Training Process**: Optimizers, learning rates, batch sizes
- **Loss Functions**: Custom objectives for specific tasks

**Advantages**: Extremely flexible, can learn complex patterns  
**Disadvantages**: More complex, requires more data and tuning
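
As a quick sanity check on architecture and training-process choices, the arithmetic below counts parameters and optimizer steps. It is a sketch that assumes the sizes used in this notebook (4 inputs, hidden layers of 64 and 32 units, 3 outputs, 120 training samples, batch size 16, 100 epochs); the helper `linear_param_count` is illustrative.

```python
import math

# Layer sizes assumed from this notebook's MLP: 4 inputs -> 64 -> 32 -> 3 classes.
layer_sizes = [4, 64, 32, 3]

def linear_param_count(n_in, n_out):
    """A fully connected layer has an n_in x n_out weight matrix plus n_out biases."""
    return n_in * n_out + n_out

# ReLU and Dropout layers add no trainable parameters.
per_layer = [linear_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]
expected_params = sum(per_layer)
print(per_layer, expected_params)  # [320, 2080, 99] 2499

# Training-process arithmetic: 120 training samples (80% of 150), batch size 16.
n_train, batch_size, epochs = 120, 16, 100
batches_per_epoch = math.ceil(n_train / batch_size)
print(batches_per_epoch, batches_per_epoch * epochs)  # 8 800
```

The parameter total should match what `sum(p.numel() for p in model.parameters())` reports for a network of these sizes.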

In [None]:
# PyTorch Neural Network Implementation
print("DEEP LEARNING (PyTorch)")  
print("=" * 30)

class MultiLayerPerceptron(nn.Module):
    """PyTorch Neural Network with multiple hidden layers"""
    
    def __init__(self, input_dim=4, hidden_dim=64, num_classes=3):
        super().__init__()
        
        self.layers = nn.Sequential(
            # First hidden layer
            nn.Linear(input_dim, hidden_dim),
            nn.ReLU(),                          # Non-linearity
            nn.Dropout(0.2),                    # Regularization
            
            # Second hidden layer
            nn.Linear(hidden_dim, hidden_dim//2),
            nn.ReLU(),
            nn.Dropout(0.2),
            
            # Output layer
            nn.Linear(hidden_dim//2, num_classes)
        )
    
    def forward(self, x):
        """Forward pass through the network"""
        return self.layers(x)

# Create model instance
model = MultiLayerPerceptron(input_dim=4, hidden_dim=64, num_classes=3)
print(f"Model Architecture:")
print(model)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"\nTotal parameters: {total_params}")
print(f"Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad)}")

In [None]:
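# Prepare PyTorch data (explicit dtypes: float32 features, int64 class labels)
X_train_tensor = torch.tensor(X_train, dtype=torch.float32)
y_train_tensor = torch.tensor(y_train, dtype=torch.long)
X_test_tensor = torch.tensor(X_test, dtype=torch.float32)
y_test_tensor = torch.tensor(y_test, dtype=torch.long)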

# Create data loaders for batch processing
train_dataset = TensorDataset(X_train_tensor, y_train_tensor)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

test_dataset = TensorDataset(X_test_tensor, y_test_tensor)
test_loader = DataLoader(test_dataset, batch_size=16, shuffle=False)

# Move model to device before creating the optimizer
model.to(device)

# Define loss function and optimizer
criterion = nn.CrossEntropyLoss()  # For multi-class classification; expects raw logits
learning_rate = 0.01
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

print("Training Configuration:")
print(f"  Loss Function: {criterion}")
print(f"  Optimizer: Adam (lr={learning_rate})")
print(f"  Batch Size: {train_loader.batch_size}")
print(f"  Device: {device}")

In [None]:
# Training Loop
print("\nStarting PyTorch training.")
print("-" * 40)

num_epochs = 100
train_losses = []
train_accuracies = []

for epoch in range(num_epochs):
    model.train()  # Set to training mode
    epoch_loss = 0.0
    correct_predictions = 0
    total_samples = 0
    
    for batch_X, batch_y in train_loader:
        batch_X, batch_y = batch_X.to(device), batch_y.to(device)
        
        # Forward pass
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)
        
        # Backward pass and optimization
        optimizer.zero_grad()  # Clear gradients
        loss.backward()        # Compute gradients
        optimizer.step()       # Update parameters (Adam magic!)
        
        # Track metrics
        epoch_loss += loss.item()
        predicted = outputs.argmax(dim=1)  # Class with the highest logit
        total_samples += batch_y.size(0)
        correct_predictions += (predicted == batch_y).sum().item()
    
    # Calculate average loss and accuracy
    avg_loss = epoch_loss / len(train_loader)
    accuracy = 100 * correct_predictions / total_samples
    
    train_losses.append(avg_loss)
    train_accuracies.append(accuracy)
    
    if (epoch + 1) % 20 == 0:
        print(f'Epoch [{epoch+1:3d}/{num_epochs}] | Loss: {avg_loss:.4f} | Accuracy: {accuracy:.2f}%')

print("\nTraining complete.")

---

## Component: Loss Functions

**Loss functions** measure how wrong our predictions are and guide the learning process. Different tasks require different loss functions:

### Classification Loss Functions

1. **CrossEntropyLoss** - Multi-class classification
   - Formula: `CE = -Σ(y_true * log(y_pred))`, summed over classes, with one-hot `y_true` and predicted probabilities `y_pred`
   - Used when predicting among multiple classes

2. **BCELoss** - Binary classification
   - Formula: `BCE = -[y*log(p) + (1-y)*log(1-p)]`
   - Used for yes/no, true/false predictions
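
The two classification formulas above can be worked through by hand. The following is a pure-Python sketch for a single sample, not PyTorch's implementation; the logits and probabilities are made-up values, and `softmax`, `cross_entropy`, and `binary_cross_entropy` are illustrative helpers.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_class):
    """CE for one sample with a one-hot target: -log of the true class probability."""
    return -math.log(softmax(logits)[true_class])

def binary_cross_entropy(p, y):
    """BCE for one sample, where p is the predicted probability of the positive class."""
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

logits = [2.0, -1.0, 0.5]  # hypothetical scores for three classes
print(f"CE when class 0 is correct: {cross_entropy(logits, 0):.4f}")
print(f"CE when class 1 is correct: {cross_entropy(logits, 1):.4f}")
print(f"BCE for p=0.9, y=1: {binary_cross_entropy(0.9, 1):.4f}")
print(f"BCE for p=0.9, y=0: {binary_cross_entropy(0.9, 0):.4f}")
```

With a one-hot target the sum in the CE formula collapses to a single term, so the loss is small when the true class receives a high probability and grows quickly for confident wrong predictions. PyTorch's `nn.CrossEntropyLoss` takes raw logits and, by default, averages this per-sample value over the batch.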

### Regression Loss Functions

3. **MSELoss** - Mean Squared Error
   - Formula: `MSE = (1/n) * Σ(y_true - y_pred)²`
   - Penalizes large errors heavily

4. **L1Loss** - Mean Absolute Error
   - Formula: `MAE = (1/n) * Σ|y_true - y_pred|`
   - More robust to outliers than MSE, because errors are not squared
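
The difference in outlier sensitivity between the two regression losses is easy to see numerically. This sketch implements both formulas in plain Python; the targets and predictions are made-up values, and `y_pred_outlier` replaces the last prediction with a deliberately bad one.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: average of absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [1.0, 2.5, 0.8, 1.9, 0.3]
y_pred = [1.1, 2.3, 0.9, 2.1, 0.1]
y_pred_outlier = [1.1, 2.3, 0.9, 2.1, 5.3]  # last prediction is off by 5.0

print(f"Clean:   MSE={mse(y_true, y_pred):.3f}  MAE={mae(y_true, y_pred):.3f}")
# Clean:   MSE=0.028  MAE=0.160
print(f"Outlier: MSE={mse(y_true, y_pred_outlier):.3f}  MAE={mae(y_true, y_pred_outlier):.3f}")
# Outlier: MSE=5.020  MAE=1.120
```

One bad prediction multiplies the MSE by roughly 180 but the MAE by only 7, which is why MAE is often preferred when the data contain outliers.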

In [None]:
# Demonstrate different PyTorch loss functions
print("PYTORCH LOSS FUNCTIONS DEMO")
print("=" * 35)

# Sample data for demonstration
y_true_class = torch.tensor([0, 1, 2, 1, 0])  # True class labels
y_pred_logits = torch.tensor([
    [2.0, -1.0, 0.5],   # Prediction for sample 1
    [-0.5, 1.5, 0.2],   # Prediction for sample 2
    [0.1, -0.8, 1.2],   # Prediction for sample 3
    [0.3, 0.9, -0.1],   # Prediction for sample 4
    [1.8, -0.2, 0.0]    # Prediction for sample 5
])

# Convert logits to probabilities
y_pred_probs = torch.softmax(y_pred_logits, dim=1)

print("Sample Data:")
print(f"True labels: {y_true_class.tolist()}")
print(f"Predicted probabilities shape: {y_pred_probs.shape}")
print(f"Predicted probabilities:\n{y_pred_probs}")

print("\n" + "-" * 50)

# 1. Cross Entropy Loss (for classification)
ce_loss = nn.CrossEntropyLoss()
ce_value = ce_loss(y_pred_logits, y_true_class)
print(f"1. CrossEntropyLoss: {ce_value:.4f}")
print("   ↳ Used for multi-class classification")
print("   ↳ Takes raw logits (no softmax needed)")

# 2. Negative Log Likelihood Loss (expects log-probabilities)
nll_loss = nn.NLLLoss()
nll_value = nll_loss(torch.log_softmax(y_pred_logits, dim=1), y_true_class)
print(f"\n2. NLLLoss: {nll_value:.4f}")
print("   ↳ CrossEntropyLoss = LogSoftmax + NLLLoss")

# 3. Mean Squared Error (for regression)
y_true_reg = torch.tensor([1.0, 2.5, 0.8, 1.9, 0.3])
y_pred_reg = torch.tensor([1.1, 2.3, 0.9, 2.1, 0.1])

mse_loss = nn.MSELoss()
mse_value = mse_loss(y_pred_reg, y_true_reg)
print(f"\n3. MSELoss: {mse_value:.4f}")
print("   ↳ Used for regression tasks")
print("   ↳ Squares the differences (penalizes large errors)")

# 4. Mean Absolute Error (for regression)
mae_loss = nn.L1Loss()
mae_value = mae_loss(y_pred_reg, y_true_reg)
print(f"\n4. L1Loss (MAE): {mae_value:.4f}")
print("   ↳ Used for regression tasks")
print("   ↳ Less sensitive to outliers than MSE")

In [None]:
# Evaluate PyTorch model
print("MODEL EVALUATION")
print("=" * 20)

model.eval()  # Set to evaluation mode
with torch.no_grad():
    test_outputs = model(X_test_tensor.to(device))
    test_predictions = test_outputs.argmax(dim=1)
    test_predictions = test_predictions.cpu().numpy()
    
    pytorch_accuracy = accuracy_score(y_test, test_predictions)

print(f"PyTorch MLP Accuracy: {pytorch_accuracy:.4f}")

# Compare all approaches
print("\n📊 FINAL COMPARISON")
print("=" * 25)
print(f"Logistic Regression: {results_traditional['Logistic Regression']['accuracy']:.4f}")
print(f"Random Forest:       {results_traditional['Random Forest']['accuracy']:.4f}")
print(f"PyTorch MLP:         {pytorch_accuracy:.4f}")

# Determine the best model(s); report ties explicitly instead of picking one arbitrarily
scores = {
    'Logistic Regression': results_traditional['Logistic Regression']['accuracy'],
    'Random Forest': results_traditional['Random Forest']['accuracy'],
    'PyTorch MLP': pytorch_accuracy,
}
best_score = max(scores.values())
winners = [name for name, score in scores.items() if np.isclose(score, best_score)]

print(f"\nBest performing model: {best_score:.4f} accuracy")
print(f"   Winner: {', '.join(winners)}")

In [None]:
# Data Visualizations
print("VISUALIZATIONS")

fig, axes = plt.subplots(2, 3, figsize=(18, 12))

# 1. PyTorch Training History - Loss
axes[0, 0].plot(train_losses, 'b-', linewidth=2, label='Training Loss')
axes[0, 0].set_title('PyTorch Training Loss', fontsize=14, fontweight='bold')
axes[0, 0].set_xlabel('Epoch')
axes[0, 0].set_ylabel('CrossEntropy Loss')
axes[0, 0].grid(True, alpha=0.3)
axes[0, 0].legend()

# 2. PyTorch Training History - Accuracy
axes[0, 1].plot(train_accuracies, 'g-', linewidth=2, label='Training Accuracy')
axes[0, 1].set_title('PyTorch Training Accuracy', fontsize=14, fontweight='bold')
axes[0, 1].set_xlabel('Epoch')
axes[0, 1].set_ylabel('Accuracy (%)')
axes[0, 1].grid(True, alpha=0.3)
axes[0, 1].legend()

# 3. Model Comparison Bar Chart
models_names = ['Logistic Reg', 'Random Forest', 'PyTorch MLP']
accuracies = [
    results_traditional['Logistic Regression']['accuracy'],
    results_traditional['Random Forest']['accuracy'], 
    pytorch_accuracy
]
colors = ['skyblue', 'lightgreen', 'salmon']

bars = axes[0, 2].bar(models_names, accuracies, color=colors, alpha=0.8)
axes[0, 2].set_title('Model Accuracy Comparison', fontsize=14, fontweight='bold')
axes[0, 2].set_ylabel('Accuracy')
axes[0, 2].set_ylim(0.8, 1.05)  # Headroom so value labels above the bars are not clipped

# Add value labels on bars
for bar, acc in zip(bars, accuracies):
    height = bar.get_height()
    axes[0, 2].text(bar.get_x() + bar.get_width()/2., height + 0.01,
                   f'{acc:.3f}', ha='center', va='bottom', fontweight='bold')

# 4. Confusion Matrix for PyTorch
cm_pytorch = confusion_matrix(y_test, test_predictions)
sns.heatmap(cm_pytorch, annot=True, fmt='d', cmap='Blues',
            xticklabels=iris_data.target_names,
            yticklabels=iris_data.target_names,
            ax=axes[1, 0])
axes[1, 0].set_title('PyTorch MLP - Confusion Matrix', fontsize=14, fontweight='bold')
axes[1, 0].set_xlabel('Predicted')
axes[1, 0].set_ylabel('Actual')

# 5. Feature Importance Visualization (Random Forest)
feature_importance = results_traditional['Random Forest']['model'].feature_importances_
axes[1, 1].barh(iris_data.feature_names, feature_importance, color='lightcoral')
axes[1, 1].set_title('Random Forest - Feature Importance', fontsize=14, fontweight='bold')
axes[1, 1].set_xlabel('Importance')

# 6. Loss Function Comparison
loss_names = ['CrossEntropy', 'NLL', 'MSE', 'L1/MAE']
loss_values = [ce_value.item(), nll_value.item(), mse_value.item(), mae_value.item()]
loss_colors = ['red', 'orange', 'purple', 'brown']

axes[1, 2].bar(loss_names, loss_values, color=loss_colors, alpha=0.7)
axes[1, 2].set_title('PyTorch Loss Functions', fontsize=14, fontweight='bold')
axes[1, 2].set_ylabel('Loss Value')
axes[1, 2].tick_params(axis='x', rotation=45)

# Add value labels
for i, (name, value) in enumerate(zip(loss_names, loss_values)):
    axes[1, 2].text(i, value + 0.05, f'{value:.3f}', 
                   ha='center', va='bottom', fontweight='bold')

plt.tight_layout()
plt.show()

print("All visualizations complete!")

---

## Summary

### The 3 Components of Machine Learning

| Component | Traditional ML | Deep Learning (PyTorch) |
|-----------|---------------|------------------------|
| **Data** | Features + Labels | Tensors + DataLoaders |
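| **Model** | Built-in estimators (`model.fit()`) | Custom `nn.Module` classes |
| **Loss** | Fixed by the estimator (e.g., log loss for logistic regression) | Explicitly chosen loss functions (e.g., `nn.CrossEntropyLoss`) |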

### **When to use traditional ML vs. deep learning (PyTorch)**

**Traditional ML (Scikit-learn):**
- ✅ Small datasets (roughly < 10,000 samples, as a rule of thumb)
- ✅ Tabular data
- ✅ Need interpretability
- ✅ Quick prototyping
- ✅ Limited computational resources

**Deep Learning (PyTorch):**
- ✅ Large datasets (roughly > 100,000 samples, as a rule of thumb)
- ✅ Complex patterns (images, text, audio)
- ✅ Need custom architectures
- ✅ GPU resources available

### **PyTorch concepts**

1. **Adam Optimizer**: Adaptive learning rates, momentum, bias correction
2. **Neural Architecture**: Linear layers + ReLU activations + Dropout
3. **Loss Functions**: CrossEntropy (classification), MSE (regression)
4. **Training Loop**: Forward pass → Loss → Backward pass → Update
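
To show what "adaptive learning rates, momentum, bias correction" mean in practice, here is a scalar sketch of the Adam update rule applied to a toy loss. It is not PyTorch's implementation: `adam_minimize` is an illustrative helper, the loss `f(x) = x^2` and the learning rate of 0.1 are assumptions chosen for the demo, and the `beta1`, `beta2`, and `eps` values mirror the commonly used defaults.

```python
import math

def adam_minimize(grad_fn, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Minimize a scalar function with the Adam update rule; returns all iterates."""
    x, m, v = x0, 0.0, 0.0
    history = [x]
    for t in range(1, steps + 1):
        g = grad_fn(x)                        # gradient of the loss at the current point
        m = beta1 * m + (1 - beta1) * g       # momentum: running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g   # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)          # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)  # step scaled by gradient magnitude
        history.append(x)
    return history

# Toy loss f(x) = x^2 with gradient 2x; its minimum is at x = 0.
history = adam_minimize(lambda x: 2 * x, x0=5.0)
print(f"start: {history[0]}, after 1 step: {history[1]:.4f}, after {len(history) - 1} steps: {history[-1]:.4f}")
```

Because of bias correction, the first step has size close to `lr` regardless of how large the gradient is, which is one reason Adam tends to be less sensitive to gradient scale than plain SGD.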


### **Next steps**

1. **Architecture**:
   - Change `hidden_dim` from 64 to 32 or 128
   - Add more layers to the neural network
   - Try different activation functions (`nn.Tanh`, `nn.LeakyReLU`)

2. **Optimizer**:
   - Replace Adam with `optim.SGD(model.parameters(), lr=0.1)`
   - Try different learning rates: 0.001, 0.01, 0.1

3. **Loss Function**:
   - For binary classification, use `nn.BCEWithLogitsLoss()`
   - For regression tasks, try `nn.L1Loss()` instead of `nn.MSELoss()`

4. **Data**:
   - Try other sklearn datasets: `load_wine()`, `load_breast_cancer()`
   - Experiment with different train/test split ratios

### **Resources**

- **PyTorch Tutorials**: https://pytorch.org/tutorials/
- **Deep Learning Book**: http://www.deeplearningbook.org/
- **Fast.ai Course**: https://www.fast.ai/
- **PyTorch Documentation**: https://pytorch.org/docs/

---
