# Lab 2: Machine Learning with PyTorch - SOLUTIONS

## From Classical ML to Deep Learning Foundations

**Duration:** 90-120 minutes | **Difficulty:** Intermediate | **Prerequisites:** Lab 1

**This notebook contains all solutions. Use for reference after attempting the exercises.**

---

## Overview

This lab bridges classical machine learning and deep learning by teaching PyTorch fundamentals through hands-on implementation of ML algorithms. You'll learn to build, train, and evaluate models using the same patterns used in production deep learning systems.

### Lab Structure

| Part | Topic | Key Concepts |
|------|-------|--------------|
| **Part 1** | PyTorch Tensors | Creating tensors, tensor operations, automatic differentiation (autograd) |
| **Part 2** | Linear Regression | Training loop from scratch, MSE loss, nn.Module, gradient descent |
| **Part 3** | Logistic Regression | Sigmoid function, BCE loss, binary classification, decision boundaries |
| **Part 4** | Support Vector Machines | Kernel trick (linear, RBF, polynomial), margins, support vectors |
| **Part 5** | Model Evaluation | Confusion matrix, precision, recall, F1-score, classification report |
| **Part 6** | Natural Language Processing | Text preprocessing, TF-IDF, Naive Bayes, sentiment analysis |

### Key Pattern You'll Learn

The PyTorch training loop that recurs, with minor variations, in nearly every deep learning project:

```python
for epoch in range(n_epochs):
    y_pred = model(X)           # Forward pass
    loss = criterion(y_pred, y) # Compute loss
    optimizer.zero_grad()       # Clear gradients
    loss.backward()             # Backward pass
    optimizer.step()            # Update weights
```

---

In [None]:
# SETUP
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from sklearn.svm import SVC
from sklearn.datasets import make_classification, make_moons

# NLP imports
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
import re

%matplotlib inline
plt.rcParams['figure.figsize'] = [10, 6]
plt.rcParams['font.size'] = 12

np.random.seed(42)
torch.manual_seed(42)

print("Setup Complete!")

---
# Part 1: PyTorch Tensors - Solutions
---

In [None]:
"""
Exercise 1.1 Solution: Creating PyTorch Tensors

Tensors are the fundamental data structure in PyTorch - like NumPy arrays
but with GPU acceleration and automatic differentiation support.
"""

# Create a 1D tensor from a Python list
# torch.tensor() converts Python/NumPy data to a PyTorch tensor
tensor_a = torch.tensor([10, 20, 30, 40, 50])

# Create a 3x4 tensor filled with zeros
# Useful for initializing weight matrices or placeholders
tensor_b = torch.zeros(3, 4)

# Create a 2x5 tensor with random values between 0 and 1
# torch.rand() uses uniform distribution [0, 1)
tensor_c = torch.rand(2, 5)

# Print results to verify
print(f"a) {tensor_a}")
print(f"b) Shape: {tensor_b.shape}")
print(f"c) Random tensor with shape {tensor_c.shape}")

"""
Exercise 1.2 Solution: Tensor Operations

PyTorch tensors support element-wise operations, aggregations,
and shape transformations - all essential for neural network computations.
"""

x = torch.tensor([[1., 2., 3.], [4., 5., 6.]])
y = torch.tensor([[7., 8., 9.], [10., 11., 12.]])

print("x =")
print(x)
print("\ny =")
print(y)
print()

# Element-wise addition: adds corresponding elements
# Same as x.add(y) or torch.add(x, y)
result_add = x + y

# Mean of all elements: returns single scalar value
# Can also specify dimension: x.mean(dim=0) for column means
result_mean = x.mean()

# Sum along dimension 1: adds across the columns within each row,
# giving one value per row (dim=0 would instead give one value per column)
result_row_sum = x.sum(dim=1)

# Reshape: change tensor dimensions while preserving data
# Total elements must remain same: 2*3 = 3*2 = 6
result_reshape = x.reshape(3, 2)

# Print results
print(f"a) x + y =\n{result_add}")
print(f"\nb) Mean of x = {result_mean}")
print(f"\nc) Sum of each row = {result_row_sum}")
print(f"\nd) x reshaped to 3x2:\n{result_reshape}")

"""
Exercise 1.3 Solution: Automatic Differentiation (Autograd)

Autograd is PyTorch's automatic differentiation engine that powers neural network training.
It tracks operations on tensors and computes gradients automatically.
"""

# Create tensor with gradient tracking enabled
# requires_grad=True tells PyTorch to track all operations on this tensor
x = torch.tensor([2.0], requires_grad=True)

# Compute y = 3x² - 4x + 5
# PyTorch builds a computation graph as we perform operations
y = 3*x**2 - 4*x + 5

# Compute gradients via backpropagation
# This calculates dy/dx and stores it in x.grad
y.backward()

# Print results
# At x=2: y = 3(4) - 4(2) + 5 = 12 - 8 + 5 = 9
# dy/dx = 6x - 4, at x=2: 6(2) - 4 = 8
print(f"y = 3x² - 4x + 5 at x=2: y = {y.item()}")
print(f"dy/dx = 6x - 4 at x=2: dy/dx = {x.grad.item()}")

### Code Explanation: Exercise 1.3

| Line | Code | Explanation |
|------|------|-------------|
| 1 | `x = torch.tensor([2.0], requires_grad=True)` | **Creates tensor with gradient tracking.** PyTorch will record all operations on x for backpropagation. |
| 2 | `y = 3*x**2 - 4*x + 5` | **Builds computation graph.** Each operation creates a node. PyTorch stores how to compute gradients. |
| 3 | `y.backward()` | **Backpropagation.** Computes dy/dx using chain rule and stores result in x.grad. |
| 4 | `x.grad` | **Access computed gradient.** Contains the derivative value after backward() is called. |

**Mathematical Verification:**
```
Function:    y = 3x² - 4x + 5
Derivative:  dy/dx = 6x - 4

At x = 2:
  y = 3(2)² - 4(2) + 5 = 12 - 8 + 5 = 9
  dy/dx = 6(2) - 4 = 12 - 4 = 8
```
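
The hand calculation above can also be cross-checked numerically. The following is a minimal sketch, not part of the original exercise, that compares the autograd result with a central finite difference; the helper name `f` and the step size `h` are illustrative choices.

```python
import torch

def f(x):
    # Same polynomial as in Exercise 1.3 (helper name is illustrative)
    return 3 * x**2 - 4 * x + 5

# Autograd gradient at x = 2 (float64 keeps finite-difference rounding small)
x = torch.tensor([2.0], dtype=torch.float64, requires_grad=True)
y = f(x)
y.backward()
autograd_grad = x.grad.item()

# Central finite difference: (f(x + h) - f(x - h)) / (2h)
h = 1e-5
x0 = torch.tensor([2.0], dtype=torch.float64)
numeric_grad = ((f(x0 + h) - f(x0 - h)) / (2 * h)).item()

print(f"autograd: {autograd_grad:.6f}")
print(f"numeric:  {numeric_grad:.6f}")
```

Both values should agree with the analytic derivative 6x - 4 evaluated at x = 2.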

**The Computation Graph:**
```
x (leaf) ─┬─→ x² ─→ 3x² ─┐
          │              ├─→ 3x² - 4x ─→ 3x² - 4x + 5 (y)
          └─→ 4x ────────┘
```

**Why autograd matters:**
- Eliminates manual gradient calculation (error-prone for complex networks)
- Enables training of deep networks with millions of parameters
- Foundation of all deep learning: gradients tell us how to adjust weights

---
# Part 2: Linear Regression - Solutions
---

In [None]:
"""
Exercise 2.1 Solution: Linear Regression Model

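This solution implements linear regression using PyTorch's nn.Module.
The training loop demonstrates the core pattern used throughout deep learning.

Note: X and y come from the lab's data-setup cell, which is not reproduced
in this solutions notebook. They are expected to be float tensors of shape (n, 1).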
"""

torch.manual_seed(123)

class MyLinearRegression(nn.Module):
    """
    Linear regression model: y = wx + b
    
    Uses nn.Linear which contains learnable weight and bias parameters.
    """
    def __init__(self):
        super().__init__()
        # nn.Linear(in_features, out_features) creates y = xW^T + b
        # 1 input feature (x), 1 output (y)
        self.linear = nn.Linear(1, 1)
    
    def forward(self, x):
        # Forward pass: compute prediction given input
        return self.linear(x)

# Instantiate the model
my_model = MyLinearRegression()

# Mean Squared Error loss: measures average squared difference
# L = (1/n) * Σ(y_pred - y_true)²
my_criterion = nn.MSELoss()

# Stochastic Gradient Descent optimizer
# lr=0.01 means weights update by: w = w - 0.01 * gradient
# (the full dataset is passed at every step here, so this is plain batch gradient descent)
my_optimizer = optim.SGD(my_model.parameters(), lr=0.01)

# Training loop: the core pattern of all deep learning
my_losses = []
for epoch in range(100):
    # STEP 1: Forward pass - compute predictions
    y_pred = my_model(X)
    
    # STEP 2: Compute loss - how wrong are we?
    loss = my_criterion(y_pred, y)
    
    # STEP 3: Zero gradients - clear previous gradients
    # (gradients accumulate by default, which we don't want)
    my_optimizer.zero_grad()
    
    # STEP 4: Backward pass - compute gradients via backprop
    loss.backward()
    
    # STEP 5: Update weights - apply gradient descent step
    my_optimizer.step()
    
    my_losses.append(loss.item())
    
    if (epoch + 1) % 20 == 0:
        print(f'Epoch {epoch+1}/100 | Loss: {loss.item():.4f}')

# Print learned parameters
print(f"\nLearned: w = {my_model.linear.weight.item():.4f}, b = {my_model.linear.bias.item():.4f}")
print(f"True:    w = 3.0000, b = 2.0000")

### Code Explanation: Exercise 2.1

**Model Definition:**
| Line | Code | Explanation |
|------|------|-------------|
| 1 | `class MyLinearRegression(nn.Module)` | **Inherits from nn.Module.** All PyTorch models must inherit from this base class. |
| 2 | `super().__init__()` | **Initializes parent class.** Required for nn.Module to set up internal tracking. |
| 3 | `self.linear = nn.Linear(1, 1)` | **Creates linear layer.** Parameters: (input_features, output_features). Contains weight w and bias b. |
| 4 | `def forward(self, x)` | **Defines forward pass.** Called when you do `model(x)`. Returns predictions. |

**Training Components:**
| Component | Code | Purpose |
|-----------|------|---------|
| Loss Function | `nn.MSELoss()` | Measures prediction error: mean of (prediction - target)² |
| Optimizer | `optim.SGD(..., lr=0.01)` | Updates weights using gradients. lr = learning rate. |

**Training Loop (The 5 Essential Steps):**
```python
for epoch in range(n_epochs):
    y_pred = model(X)           # 1. Forward: compute predictions
    loss = criterion(y_pred, y) # 2. Loss: how wrong are we?
    optimizer.zero_grad()       # 3. Zero: clear old gradients
    loss.backward()             # 4. Backward: compute new gradients
    optimizer.step()            # 5. Update: adjust weights
```
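
To run the loop above end to end you need data. The sketch below is an assumption, not the lab's actual setup cell: it generates synthetic points from y = 3x + 2 plus Gaussian noise (matching the "True" values printed in the solution) and uses an illustrative learning rate and epoch count chosen so that the fit converges.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# ASSUMPTION: synthetic stand-in for the lab's X and y,
# generated as y = 3x + 2 plus a little Gaussian noise.
X = torch.rand(100, 1) * 2
y = 3 * X + 2 + 0.1 * torch.randn(100, 1)

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
# Illustrative settings (the lab solution uses lr=0.01 for 100 epochs)
optimizer = optim.SGD(model.parameters(), lr=0.1)
n_epochs = 500

for epoch in range(n_epochs):
    y_pred = model(X)            # 1. Forward
    loss = criterion(y_pred, y)  # 2. Loss
    optimizer.zero_grad()        # 3. Zero
    loss.backward()              # 4. Backward
    optimizer.step()             # 5. Update

learned_w = model.weight.item()
learned_b = model.bias.item()
print(f"Learned: w = {learned_w:.3f}, b = {learned_b:.3f} | final loss = {loss.item():.4f}")
```

With these settings the learned parameters should land close to w = 3 and b = 2, up to the noise in the data.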

**Why each step matters:**
1. **Forward pass** - runs input through model to get predictions
2. **Compute loss** - quantifies error (lower is better)
3. **Zero gradients** - gradients accumulate by default; we want fresh ones
4. **Backward pass** - autograd computes ∂loss/∂weights
5. **Optimizer step** - applies gradient descent: w_new = w_old - lr × gradient
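
Step 3 is the one most often forgotten, so it helps to see what happens without it. This is a small illustrative sketch (the tensor `w` and the toy function 2w are made up for the demonstration) showing that `backward()` adds to an existing `.grad` instead of overwriting it.

```python
import torch

w = torch.tensor([1.0], requires_grad=True)

# First backward pass: d(2w)/dw = 2
(2 * w).sum().backward()
first = w.grad.clone()

# Second backward pass WITHOUT zeroing: the new gradient is added to the old one
(2 * w).sum().backward()
accumulated = w.grad.clone()

# Zero the gradient, then run backward again: a fresh value
w.grad.zero_()
(2 * w).sum().backward()
fresh = w.grad.clone()

print(f"first: {first.item()}, accumulated: {accumulated.item()}, fresh: {fresh.item()}")
```

`optimizer.zero_grad()` performs this reset for every parameter the optimizer manages.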

---
# Part 3: Logistic Regression - Solutions
---

In [None]:
"""
Exercise 3.1 Solution: Logistic Regression Classifier

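Logistic regression extends linear regression to binary classification
by applying the sigmoid function, which squashes outputs into (0, 1) probabilities.

Note: X_train_t, y_train_t, X_test_t and y_test_t come from the
"Setup: Generate classification data" cell, which appears later in this
notebook and must be run first.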
"""

torch.manual_seed(42)

class MyClassifier(nn.Module):
    """
    Binary classifier using logistic regression.
    
    Architecture: Linear layer → Sigmoid activation
    Output is probability of class 1.
    """
    def __init__(self, input_dim):
        super().__init__()
        # Linear layer: maps input features to single output
        self.linear = nn.Linear(input_dim, 1)
        # Sigmoid: squashes output to (0, 1) range
        self.sigmoid = nn.Sigmoid()
    
    def forward(self, x):
        # Apply linear transformation then sigmoid
        z = self.linear(x)    # z = wx + b (can be any real number)
        return self.sigmoid(z) # probability in (0, 1)

# Create model with 2 input features
my_classifier = MyClassifier(input_dim=2)

# Binary Cross Entropy loss for classification
# Penalizes confident wrong predictions heavily
my_bce_loss = nn.BCELoss()

# Adam optimizer: adaptive learning rate, often faster than SGD
my_opt = optim.Adam(my_classifier.parameters(), lr=0.1)

# Training loop
for epoch in range(100):
    # Forward pass: get predicted probabilities
    y_pred = my_classifier(X_train_t)
    
    # Compute BCE loss
    loss = my_bce_loss(y_pred, y_train_t)
    
    # Standard backprop steps
    my_opt.zero_grad()
    loss.backward()
    my_opt.step()
    
    if (epoch + 1) % 20 == 0:
        # Calculate accuracy: threshold at 0.5
        acc = ((y_pred >= 0.5).float() == y_train_t).float().mean()
        print(f'Epoch {epoch+1}/100 | Loss: {loss.item():.4f} | Acc: {acc.item():.4f}')

# Evaluate on test set
my_classifier.eval()  # Switch to evaluation mode
with torch.no_grad():  # Disable gradient tracking for inference
    y_pred = my_classifier(X_test_t)
    y_class = (y_pred >= 0.5).float()  # Convert probabilities to 0/1
    acc = (y_class == y_test_t).float().mean()
print(f"\nTest Accuracy: {acc.item():.4f}")

### Code Explanation: Exercise 3.1

**Model Architecture:**
| Component | Code | Purpose |
|-----------|------|---------|
| Linear layer | `nn.Linear(input_dim, 1)` | Computes z = w₁x₁ + w₂x₂ + b |
| Sigmoid | `nn.Sigmoid()` | Converts z to probability: σ(z) = 1/(1+e^(-z)) |

**The Sigmoid Function:**
```
σ(z) = 1 / (1 + e^(-z))

z → -∞  ⇒  σ(z) → 0
z = 0   ⇒  σ(z) = 0.5
z → +∞  ⇒  σ(z) → 1
```

**Loss and Optimizer:**
| Component | Code | Explanation |
|-----------|------|-------------|
| `nn.BCELoss()` | Binary Cross Entropy | L = -[y·log(p) + (1-y)·log(1-p)]. Heavily penalizes confident wrong predictions. |
| `optim.Adam()` | Adaptive optimizer | Combines momentum and adaptive learning rates. Often faster than SGD. |
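
To make the BCE formula concrete, here is a short sketch that evaluates it by hand and compares the result with `nn.BCELoss`. The probabilities and labels are made-up illustrative values; the last sample is a confident wrong prediction.

```python
import torch
import torch.nn as nn

# Illustrative values only: predicted P(class 1) and true labels
p = torch.tensor([0.9, 0.6, 0.1, 0.01])
y = torch.tensor([1.0, 1.0, 0.0, 1.0])

# L = -[y*log(p) + (1-y)*log(1-p)], computed per sample
per_sample = -(y * torch.log(p) + (1 - y) * torch.log(1 - p))
manual = per_sample.mean()

# nn.BCELoss averages the per-sample losses by default
builtin = nn.BCELoss()(p, y)

print("per-sample losses:", per_sample)
print(f"manual mean: {manual.item():.4f} | nn.BCELoss: {builtin.item():.4f}")
```

The last sample (p = 0.01 for a true label of 1) contributes far more loss than the others, which is the "heavy penalty" described in the table.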

**Inference Mode:**
| Code | Purpose |
|------|---------|
| `model.eval()` | Switches layers such as dropout and batch norm to inference behavior (it has no effect on this model's outputs, but is a good habit) |
| `torch.no_grad()` | Disables gradient tracking (faster, less memory) |
| `y_pred >= 0.5` | Converts probabilities to class labels |

**Why logistic regression for classification:**
- Linear regression outputs can be any real number
- Classification needs probabilities (0 to 1)
- Sigmoid transforms linear output to valid probability
- Decision boundary: predict 1 if probability ≥ 0.5, which is equivalent to z = w·x + b ≥ 0, so the boundary is a straight line in feature space

In [None]:
"""
Exercise 4.1 Solution: Support Vector Machine

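SVMs find the hyperplane that separates the classes with maximum margin.
The RBF kernel enables non-linear decision boundaries.

Note: X_train_m, y_train_m, X_test_m and y_test_m come from the
"Setup: Generate moons data" cell under Part 4, which must be run first.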
"""

# Create SVM with RBF (Radial Basis Function) kernel
# kernel='rbf': uses Gaussian kernel for non-linear boundaries
# C=10: regularization parameter (higher = less regularization, fits training data more closely)
my_svm = SVC(kernel='rbf', C=10)

# Train the SVM on the training data
# SVM finds support vectors: points closest to decision boundary
my_svm.fit(X_train_m, y_train_m)

# Evaluate on test data
# .score() returns accuracy: (correct predictions / total)
accuracy = my_svm.score(X_test_m, y_test_m)

# Get number of support vectors
# These are the training points that define the decision boundary
n_support = len(my_svm.support_vectors_)

print(f"Test Accuracy: {accuracy:.4f}")
print(f"Number of Support Vectors: {n_support}")

### Code Explanation: Exercise 4.1

| Line | Code | Explanation |
|------|------|-------------|
| 1 | `SVC(kernel='rbf', C=10)` | **Creates SVM classifier.** RBF kernel enables non-linear boundaries. C controls regularization. |
| 2 | `my_svm.fit(X_train_m, y_train_m)` | **Trains the SVM.** Finds optimal hyperplane and identifies support vectors. |
| 3 | `my_svm.score(X_test_m, y_test_m)` | **Computes test accuracy.** Returns fraction of correctly classified samples. |
| 4 | `my_svm.support_vectors_` | **Accesses support vectors.** Training points that lie on or within the margin. |

**SVM Kernels:**
| Kernel | Formula | Use Case |
|--------|---------|----------|
| `linear` | K(x,y) = x·y | Linearly separable data |
| `rbf` | K(x,y) = exp(-γ‖x-y‖²) | Most common, handles non-linear |
| `poly` | K(x,y) = (γ·x·y + r)^d | Polynomial decision boundaries (in scikit-learn, d is `degree` and r is `coef0`) |
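
As a quick sanity check of the RBF formula in the table, the sketch below evaluates it by hand for two made-up points and compares it with scikit-learn's `rbf_kernel`. The points and the value of `gamma` are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

# Illustrative points and gamma
x = np.array([[1.0, 2.0]])
z = np.array([[2.0, 0.0]])
gamma = 0.5

# K(x, z) = exp(-gamma * ||x - z||^2); here ||x - z||^2 = 1 + 4 = 5
manual = np.exp(-gamma * np.sum((x - z) ** 2))
library = rbf_kernel(x, z, gamma=gamma)[0, 0]

print(f"manual: {manual:.6f} | sklearn: {library:.6f}")
```

The kernel value shrinks toward 0 as the points move apart and equals 1 when they coincide.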

**The C Parameter:**
```
C = 0.1  → High regularization, wider margin, may underfit
C = 1    → Balanced (default)
C = 10   → Low regularization, narrow margin, fits training data closely
C = 100  → Very low regularization, may overfit
```
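
The effect of C is easiest to see by sweeping it. The following sketch regenerates the moons data with the same settings as the lab's setup cell and trains one RBF SVM per value of C; the variable names and the `results` dictionary are illustrative.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Same data settings as the lab's moons setup cell
X, y = make_moons(n_samples=300, noise=0.2, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

results = {}
for C in [0.1, 1, 10, 100]:
    svm = SVC(kernel='rbf', C=C).fit(X_tr, y_tr)
    n_sv = len(svm.support_vectors_)
    acc = svm.score(X_te, y_te)
    results[C] = (n_sv, acc)
    print(f"C = {C:<5} | support vectors: {n_sv:3d} | test accuracy: {acc:.4f}")
```

The support-vector count usually shrinks as C grows, because fewer points are allowed to sit inside the margin. Test accuracy does not necessarily keep improving, which is the overfitting risk noted above.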

**What are Support Vectors?**
- Training points that lie on or inside the margin boundary
- The decision boundary depends ONLY on these points
- Removing non-support vectors doesn't change the model
- Fewer support vectors usually means a simpler decision boundary, which often (though not always) generalizes better

In [None]:
# Setup: Generate classification data
X_class, y_class = make_classification(
    n_samples=300, n_features=2, n_redundant=0, n_informative=2,
    n_clusters_per_class=1, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X_class, y_class, test_size=0.2, random_state=42
)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
X_train_t = torch.tensor(X_train_scaled, dtype=torch.float32)
X_test_t = torch.tensor(X_test_scaled, dtype=torch.float32)
y_train_t = torch.tensor(y_train, dtype=torch.float32).reshape(-1, 1)
y_test_t = torch.tensor(y_test, dtype=torch.float32).reshape(-1, 1)

In [None]:
"""
Exercise 5.1 Solution: Model Evaluation

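This solution demonstrates model evaluation using a confusion matrix
and a classification report for per-class metrics.

Note: log_model is trained in the "Setup: Train a logistic regression model
for evaluation" cell under Part 5, which must be run first.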
"""

# Set model to evaluation mode and get predictions
log_model.eval()
with torch.no_grad():
    # Get probability predictions
    y_pred_prob = log_model(X_test_t)
    # Convert to class labels: 1 if probability >= 0.5, else 0
    y_pred_log = (y_pred_prob >= 0.5).numpy().astype(int).flatten()

# Create confusion matrix
# Rows = actual class, Columns = predicted class
# [[TN, FP], [FN, TP]]
cm_log = confusion_matrix(y_test, y_pred_log)

print("Confusion Matrix:")
print(cm_log)
print()

# Print detailed classification report
# Shows precision, recall, and F1-score for each class
print("Classification Report:")
print(classification_report(y_test, y_pred_log, target_names=['Class 0', 'Class 1']))

### Code Explanation: Exercise 5.1

| Line | Code | Explanation |
|------|------|-------------|
| 1 | `confusion_matrix(y_test, y_pred_log)` | **Creates 2×2 confusion matrix.** Compares actual vs predicted labels. |
| 2 | `classification_report(...)` | **Generates detailed metrics.** Precision, recall, F1-score per class. |
| 3 | `target_names=['Class 0', 'Class 1']` | **Labels for the report.** Makes output more readable. |

**Understanding the Confusion Matrix:**
```
                  Predicted
                  0     1
Actual  0      [[TN,   FP],    TN = True Negative (correct 0)
        1       [FN,   TP]]    FP = False Positive (wrong 1)
                               FN = False Negative (wrong 0)
                               TP = True Positive (correct 1)
```

**Classification Metrics:**
| Metric | Formula | Interpretation |
|--------|---------|----------------|
| **Precision** | TP / (TP + FP) | Of predicted positives, how many are correct? |
| **Recall** | TP / (TP + FN) | Of actual positives, how many did we find? |
| **F1-Score** | 2 × (P × R) / (P + R) | Harmonic mean of precision and recall |
| **Support** | Count | Number of actual instances per class |
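
The formulas in the table can be checked against scikit-learn directly. This sketch uses a small made-up set of labels and predictions, computes each metric from the confusion-matrix counts, and compares the results with the library functions.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score

# Illustrative labels and predictions
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 1, 0, 0])

# For binary labels, ravel() yields the counts in the order TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision: {precision:.4f} (sklearn: {precision_score(y_true, y_pred):.4f})")
print(f"recall:    {recall:.4f} (sklearn: {recall_score(y_true, y_pred):.4f})")
print(f"F1:        {f1:.4f} (sklearn: {f1_score(y_true, y_pred):.4f})")
```

These are the class-1 metrics; `classification_report` repeats the same calculation with each class treated as the positive one in turn.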

**When to prioritize each metric:**
- **Precision**: When false positives are costly (spam detection)
- **Recall**: When false negatives are costly (disease detection)
- **F1-Score**: When you need balance between precision and recall
- **Accuracy**: When classes are balanced and errors are equally costly

---
# Part 4: Support Vector Machines - Solutions
---

In [None]:
# Setup: Generate moons data
X_moons, y_moons = make_moons(n_samples=300, noise=0.2, random_state=42)
X_train_m, X_test_m, y_train_m, y_test_m = train_test_split(
    X_moons, y_moons, test_size=0.2, random_state=42
)

In [None]:
# ============================================
# EXERCISE 4.1 SOLUTION: Train an SVM
# ============================================

# a) Create an RBF SVM with C=10
my_svm = SVC(kernel='rbf', C=10)  # SOLUTION

# b) Fit it on the training data
my_svm.fit(X_train_m, y_train_m)  # SOLUTION

# c) Calculate test accuracy
accuracy = my_svm.score(X_test_m, y_test_m)
n_support = len(my_svm.support_vectors_)
print(f"Test Accuracy: {accuracy:.4f}")
print(f"Number of Support Vectors: {n_support}")

---
# Part 5: Model Evaluation - Solutions
---

In [None]:
# Setup: Train a logistic regression model for evaluation
torch.manual_seed(42)

class LogisticRegressionModel(nn.Module):
    def __init__(self, input_dim):
        super().__init__()
        self.linear = nn.Linear(input_dim, 1)
        self.sigmoid = nn.Sigmoid()
    
    def forward(self, x):
        return self.sigmoid(self.linear(x))

log_model = LogisticRegressionModel(input_dim=2)
criterion_bce = nn.BCELoss()
optimizer_log = optim.Adam(log_model.parameters(), lr=0.1)

for epoch in range(100):
    y_pred_prob = log_model(X_train_t)
    loss = criterion_bce(y_pred_prob, y_train_t)
    optimizer_log.zero_grad()
    loss.backward()
    optimizer_log.step()

In [None]:
# ============================================
# EXERCISE 5.1 SOLUTION: Evaluate Your Model
# ============================================

# Get predictions from the logistic regression model
log_model.eval()
with torch.no_grad():
    y_pred_prob = log_model(X_test_t)
    y_pred_log = (y_pred_prob >= 0.5).numpy().astype(int).flatten()

# a) Create confusion matrix
cm_log = confusion_matrix(y_test, y_pred_log)  # SOLUTION

print("Confusion Matrix:")
print(cm_log)
print()

# b) Print classification report
print("Classification Report:")  # SOLUTION
print(classification_report(y_test, y_pred_log, target_names=['Class 0', 'Class 1']))

---
# Part 6: Natural Language Processing - Solutions
---

In [None]:
"""
Exercise 6.1 Solution: Text Preprocessing

Text preprocessing cleans and normalizes raw text before feeding it to ML models.
This is a critical step - garbage in, garbage out!
"""

def preprocess_text(text):
    """
    Preprocess text for NLP tasks.
    
    Steps:
    1. Convert to lowercase - makes 'Hello' and 'hello' the same token
    2. Remove punctuation - keeps only alphanumeric and spaces
    
    Args:
        text: Raw input string
    
    Returns:
        Cleaned string
    """
    # Step 1: Convert to lowercase
    text = text.lower()
    
    # Step 2: Remove non-alphanumeric characters (keep spaces)
    # [^a-z0-9\s] matches anything that's NOT a-z, 0-9, or whitespace
    text = re.sub(r'[^a-z0-9\s]', '', text)
    
    return text

# Test the function
test_text = "Hello, World! This is NLP 101."
cleaned = preprocess_text(test_text)

print(f"Original: {test_text}")
print(f"Cleaned: {cleaned}")

In [None]:
"""
Exercise 6.2 Solution: Bag of Words with CountVectorizer

Bag of Words converts text into numerical vectors by counting word occurrences.
Each document becomes a vector where each dimension represents a word in the vocabulary.
"""

# Create sample documents
documents = [
    "I love machine learning",
    "Machine learning is great",
    "I love programming"
]

# Create CountVectorizer
# This builds a vocabulary from all documents and counts occurrences
count_vectorizer = CountVectorizer()

# fit_transform: builds vocabulary AND transforms documents to vectors
X_counts = count_vectorizer.fit_transform(documents)

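# Get the vocabulary (feature names)
# Note: the default tokenizer lowercases text and drops single-character
# tokens, so "I" does not appear in the vocabulary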
vocabulary = count_vectorizer.get_feature_names_out()

print("Vocabulary:", vocabulary)
print(f"\nDocument-Term Matrix:")
print(X_counts.toarray())

# Explanation of the matrix:
print("\nInterpretation:")
print("Row 0 (Doc 1 'I love machine learning'):", X_counts.toarray()[0])
print("Row 1 (Doc 2 'Machine learning is great'):", X_counts.toarray()[1])
print("Row 2 (Doc 3 'I love programming'):", X_counts.toarray()[2])

In [None]:
"""
Exercise 6.3 Solution: TF-IDF Vectorization

TF-IDF weighs words by their importance:
- TF (Term Frequency): How often a word appears in a document
- IDF (Inverse Document Frequency): How rare the word is across all documents

Words that are common in one document but rare overall get HIGH scores.
Words that appear everywhere (like 'the', 'is') get LOW scores.
"""

# Create TF-IDF vectorizer
tfidf_vectorizer = TfidfVectorizer()

# Transform the same documents
X_tfidf = tfidf_vectorizer.fit_transform(documents)

print("Vocabulary:", tfidf_vectorizer.get_feature_names_out())
print(f"\nTF-IDF Matrix (rounded to 2 decimals):")
print(np.round(X_tfidf.toarray(), 2))

# Key observations
print("\nKey Observations:")
print("- 'programming' has the HIGHEST weight in Doc 3 (about 0.80) - it appears in no other document")
print("- 'love' has a LOWER weight in Doc 3 (about 0.61) - it also appears in Doc 1")
print("- 'machine' and 'learning' have moderate weights - each appears in 2 docs")

In [None]:
# Setup: Sentiment Analysis Dataset
reviews = [
    "This movie was amazing and wonderful",
    "I loved this film, it was great",
    "Excellent movie, highly recommended",
    "Best film I have ever seen",
    "Wonderful story and great acting",
    "This movie was terrible and boring",
    "I hated this film, it was awful",
    "Worst movie ever, do not watch",
    "Boring and disappointing film",
    "Terrible acting and bad story",
    "The movie was okay, nothing special",
    "It was an average film",
    "Not bad but not great either",
    "Mediocre movie with some good moments",
    "Fantastic cinematography and brilliant performances",
    "Absolutely dreadful, waste of time",
]

# Labels: 1 = positive, 0 = negative
# (the neutral or mixed reviews, items 11-14, are labeled 1 in this toy dataset)
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0]

# Split into train and test
X_train_text, X_test_text, y_train_nlp, y_test_nlp = train_test_split(
    reviews, labels, test_size=0.25, random_state=42
)

print(f"Training samples: {len(X_train_text)}")
print(f"Test samples: {len(X_test_text)}")

In [None]:
"""
Exercise 6.5 Solution: Text Classification with Naive Bayes

Naive Bayes is a probabilistic classifier based on Bayes' Theorem.
It works particularly well with text because:
1. It's fast to train and predict
2. It handles high-dimensional sparse data (like TF-IDF vectors)
3. It works well even with limited training data

The 'naive' assumption: words are conditionally independent given the class.
"""

# Step 1: Create TF-IDF vectorizer and transform text
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train_text)  # fit AND transform training data
X_test_vec = vectorizer.transform(X_test_text)        # only transform test data (use same vocabulary!)

print(f"Training vectors shape: {X_train_vec.shape}")
print(f"Test vectors shape: {X_test_vec.shape}")
print(f"Vocabulary size: {len(vectorizer.get_feature_names_out())}")

# Step 2: Create and train Naive Bayes classifier
clf = MultinomialNB()
clf.fit(X_train_vec, y_train_nlp)

# Step 3: Make predictions
y_pred_nlp = clf.predict(X_test_vec)

# Step 4: Calculate accuracy
accuracy = accuracy_score(y_test_nlp, y_pred_nlp)
print(f"\nAccuracy: {accuracy:.2f}")

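# Step 5: Print classification report
# labels=[0, 1] keeps the report valid even if the tiny test split
# happens to contain only one class
print("\nClassification Report:")
print(classification_report(y_test_nlp, y_pred_nlp, labels=[0, 1],
                            target_names=['Negative', 'Positive'], zero_division=0))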

In [None]:
"""
Exercise 6.6 Solution: Predict on New Text

Use the trained model to classify completely new reviews.
Important: Use .transform() (not .fit_transform()) to use the same vocabulary!
"""

# Create new reviews to classify
new_reviews = [
    "This was the best experience ever!",
    "Horrible waste of my time",
    "It was pretty good overall",
    "I absolutely loved every moment",
    "Disappointing and frustrating"
]

# Vectorize using the SAME vectorizer (important!)
new_vectors = vectorizer.transform(new_reviews)

# Predict sentiment
predictions = clf.predict(new_vectors)

# Get probability scores
probabilities = clf.predict_proba(new_vectors)

# Display results
print("Predictions on New Reviews:\n")
for review, pred, prob in zip(new_reviews, predictions, probabilities):
    sentiment = "Positive" if pred == 1 else "Negative"
    confidence = max(prob) * 100
    print(f"Review: \"{review}\"")
    print(f"Predicted: {sentiment} (confidence: {confidence:.1f}%)")
    print()

### Code Explanation: NLP Pipeline

| Step | Code | Explanation |
|------|------|-------------|
| 1 | `TfidfVectorizer()` | Creates a vectorizer that converts text to TF-IDF weighted vectors |
| 2 | `.fit_transform(X_train)` | Builds vocabulary from training data AND transforms it to vectors |
| 3 | `.transform(X_test)` | Transforms test data using the SAME vocabulary (no fitting!) |
| 4 | `MultinomialNB()` | Creates a Naive Bayes classifier designed for count-like features; fractional values such as TF-IDF weights also work in practice |
| 5 | `.fit(X, y)` | Trains the classifier on vectorized text and labels |
| 6 | `.predict(X)` | Makes predictions on new vectorized text |
| 7 | `.predict_proba(X)` | Returns probability scores for each class |

**Why use .transform() not .fit_transform() for test data?**
- `fit_transform()` creates a NEW vocabulary from the data
- `transform()` uses the EXISTING vocabulary from training
- If we fit on the test data, the columns would correspond to different words, so the features would no longer line up with what the classifier learned
- The model was trained on the training vocabulary - we must use the same one!
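
The mismatch is easy to demonstrate. In this sketch the tiny training and test corpora are made up for illustration; it compares the matrix shapes produced by the correct and the incorrect approach.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Illustrative toy corpora
train_docs = ["great movie", "terrible movie", "great acting"]
test_docs = ["boring plot", "great plot twist ending"]

# Correct: fit on training data, then only transform the test data
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train_docs)
X_test_ok = vectorizer.transform(test_docs)

# Incorrect: a second vectorizer fitted on the test data learns its own vocabulary
X_test_wrong = TfidfVectorizer().fit_transform(test_docs)

print("train:              ", X_train.shape)
print("test (transform):   ", X_test_ok.shape)
print("test (fit_transform):", X_test_wrong.shape)
```

With `transform()`, the test matrix has the same number of columns as the training matrix, and words never seen in training are simply ignored. The refitted version has a different column count, so a classifier trained on `X_train` could not consume it.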

**Common NLP Pipeline Pattern:**
```python
# Training
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train)  # fit + transform
clf = MultinomialNB()
clf.fit(X_train_vec, y_train)

# Testing/Inference
X_test_vec = vectorizer.transform(X_test)  # only transform!
predictions = clf.predict(X_test_vec)
```
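
The same pattern can be wrapped in a scikit-learn `Pipeline`, which makes it harder to fit the vectorizer on the wrong data by accident. This is an optional sketch rather than part of the lab solution; the training texts and labels are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Illustrative toy data: 1 = positive, 0 = negative
train_texts = [
    "great wonderful film",
    "loved it, excellent",
    "terrible boring film",
    "awful, hated it",
]
train_labels = [1, 1, 0, 0]

# fit() runs fit_transform on the vectorizer, then fits the classifier
pipe = make_pipeline(TfidfVectorizer(), MultinomialNB())
pipe.fit(train_texts, train_labels)

# predict() runs transform (never fit) on the vectorizer, then predicts
preds = pipe.predict(["excellent wonderful story", "boring and awful"])
print(preds)
```

Calling `pipe.predict()` on raw strings applies the training vocabulary automatically, so the vectorize-then-predict steps cannot drift apart.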

---

# Solutions Summary

## Exercise 1.1: Create Tensors
```python
tensor_a = torch.tensor([10, 20, 30, 40, 50])
tensor_b = torch.zeros(3, 4)
tensor_c = torch.rand(2, 5)
```

## Exercise 1.2: Tensor Operations
```python
result_add = x + y
result_mean = x.mean()
result_row_sum = x.sum(dim=1)
result_reshape = x.reshape(3, 2)
```

## Exercise 1.3: Autograd
```python
x = torch.tensor([2.0], requires_grad=True)
y = 3*x**2 - 4*x + 5
y.backward()
```

## Exercise 2.1: Linear Regression
```python
self.linear = nn.Linear(1, 1)
return self.linear(x)
my_criterion = nn.MSELoss()
my_optimizer = optim.SGD(my_model.parameters(), lr=0.01)
```

## Exercise 3.1: Logistic Regression
```python
self.linear = nn.Linear(input_dim, 1)
self.sigmoid = nn.Sigmoid()
return self.sigmoid(self.linear(x))
my_bce_loss = nn.BCELoss()
my_opt = optim.Adam(my_classifier.parameters(), lr=0.1)
```

## Exercise 4.1: SVM
```python
my_svm = SVC(kernel='rbf', C=10)
my_svm.fit(X_train_m, y_train_m)
```

## Exercise 5.1: Evaluation
```python
cm_log = confusion_matrix(y_test, y_pred_log)
print(classification_report(y_test, y_pred_log))
```

## Exercise 6.1-6.6: NLP
```python
# Text preprocessing
text = text.lower()
text = re.sub(r'[^a-z0-9\s]', '', text)

# Bag of Words
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(documents)

# TF-IDF
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(documents)

# Naive Bayes Classification
vectorizer = TfidfVectorizer()
X_train_vec = vectorizer.fit_transform(X_train_text)
X_test_vec = vectorizer.transform(X_test_text)
clf = MultinomialNB()
clf.fit(X_train_vec, y_train_nlp)
predictions = clf.predict(X_test_vec)
```