# Lab 3 – PyTorch Fundamentals: From Keras to PyTorch

> **📚 Prerequisites**: Lab 1 (Keras) and Lab 2 (GradientTape) completed

In this lab, you'll learn the fundamentals of **PyTorch**, the core library behind many modern LLMs and Unsloth. We'll rebuild the concepts you covered in Keras using PyTorch's more explicit, flexible approach.

## Why PyTorch?

- **LLM Standard**: Many major LLMs (GPT, LLaMA, Qwen) are built with PyTorch
- **Unsloth Foundation**: Unsloth is built on PyTorch for efficient fine-tuning
- **Research Flexibility**: More control than Keras for custom architectures
- **Industry Standard**: Used by Meta, OpenAI, Hugging Face, and more

## Objectives

- Understand PyTorch's tensor operations and automatic differentiation
- Build neural networks using `torch.nn` (equivalent to Keras layers)
- Implement training loops with optimizers and loss functions
- Compare PyTorch vs Keras syntax and concepts
- Prepare for Unsloth and advanced LLM techniques

**Note**: This lab focuses on PyTorch basics. You'll use these concepts in all subsequent labs!


In [None]:
# Install PyTorch (uncomment if needed)
# !pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# For GPU support: !pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118


### Step 1: PyTorch Tensors (like NumPy but with GPU support)

**Documentation:**
- PyTorch Tensors: https://pytorch.org/docs/stable/tensors.html
- Tensor operations: https://pytorch.org/docs/stable/torch.html
- Device management: https://pytorch.org/docs/stable/notes/cuda.html

**Key Concepts:**
- `torch.tensor()` - create tensors (like `np.array()`)
- `.to(device)` - move tensors to GPU/CPU
- `requires_grad=True` - enable automatic differentiation
- `.backward()` - compute gradients
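
The last two concepts are easiest to see on a single number. The following is a minimal sketch, assuming a made-up function y = x² + 3x chosen only for illustration: marking `x` with `requires_grad=True` tells autograd to track it, and calling `.backward()` on the result stores the derivative in `x.grad`.

```python
import torch

# A scalar tensor that autograd should track
x = torch.tensor(2.0, requires_grad=True)

# Build a small computation: y = x^2 + 3x (illustrative function, not part of the lab data)
y = x ** 2 + 3 * x

# Populate x.grad with dy/dx evaluated at the current value of x
y.backward()

print(x.grad)  # dy/dx = 2x + 3 = 7 at x = 2
```

This mirrors what `GradientTape` did in Lab 2, except that PyTorch records operations on any tensor that requires gradients without an explicit tape context.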

**Your Task:**
1. Create tensors for input data (x) and targets (y)
2. Set up a simple linear relationship: y = 2x + 1
3. Add some noise to make it realistic
4. Move tensors to GPU if available
5. Print tensor shapes and device information
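
Before filling in the cell below, it may help to see how closely tensor code tracks NumPy code. This is a small sketch on three made-up values (not the lab's training data); it also shows what `unsqueeze(1)` does to a shape and how a tensor is moved to the selected device.

```python
import numpy as np
import torch

# The same three values in both libraries (illustrative data only)
a_np = np.array([1.0, 2.0, 3.0], dtype=np.float32)
a_pt = torch.tensor([1.0, 2.0, 3.0])

# Elementwise arithmetic reads the same in NumPy and PyTorch
b_np = 2 * a_np + 1
b_pt = 2 * a_pt + 1

# Pick a device once, then move the tensor to it (.to returns the moved tensor)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
b_pt = b_pt.to(device)

# unsqueeze(1) turns shape (3,) into (3, 1): 3 samples with 1 feature each
col = b_pt.unsqueeze(1)

# Bring the tensor back to the CPU before converting it to a NumPy array
print(b_np, b_pt.cpu().numpy())
print(col.shape, col.device)
```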


In [None]:
# TODO: Create PyTorch tensors for training data
# Step 1a: Import PyTorch
# import torch
# import torch.nn as nn
# import numpy as np

# Step 1b: Check if GPU is available
# device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# print(f"Using device: {device}")

# Step 1c: Create input data (x values from 0 to 10)
# x_data = torch.linspace(0, 10, 100, device=device)

# Step 1d: Create target data (y = 2x + 1 + noise)
# y_data = 2 * x_data + 1 + torch.randn_like(x_data) * 0.5

# Step 1e: Reshape for neural network (add a feature dimension: 100 samples, 1 feature each)
# x_train = x_data.unsqueeze(1)  # Shape: (100, 1)
# y_train = y_data.unsqueeze(1)  # Shape: (100, 1)

# Step 1f: Print information
# print(f"x_train shape: {x_train.shape}")
# print(f"y_train shape: {y_train.shape}")
# print(f"x_train device: {x_train.device}")

print("TODO: Implement tensor creation above")


### Step 2: Building Neural Networks with `torch.nn`

**Documentation:**
- PyTorch nn module: https://pytorch.org/docs/stable/nn.html
- Sequential models: https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html
- Linear layers: https://pytorch.org/docs/stable/generated/torch.nn.Linear.html

**Key Concepts:**
- `nn.Sequential()` - like Keras Sequential model
- `nn.Linear(in_features, out_features)` - like Keras Dense layer
- `nn.ReLU()` - activation function
- Model parameters: `model.parameters()`

**Your Task:**
1. Create a neural network with 2 hidden layers
2. Use ReLU activation between layers
3. Print model architecture and parameter count
4. Compare with Keras syntax from Lab 1
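
For task 3, you can sanity-check the parameter count by hand. The sketch below assumes the 1 → 16 → 8 → 1 layer sizes used in this step's scaffold: each `nn.Linear(in_features, out_features)` holds `in_features * out_features` weights plus `out_features` biases, and `nn.ReLU()` has no parameters.

```python
import torch.nn as nn

# Layer sizes assumed from this step's scaffold: 1 -> 16 -> 8 -> 1
model = nn.Sequential(
    nn.Linear(1, 16),
    nn.ReLU(),
    nn.Linear(16, 8),
    nn.ReLU(),
    nn.Linear(8, 1),
)

# Count by hand: weights (in * out) plus biases (out) for each Linear layer
by_hand = (1 * 16 + 16) + (16 * 8 + 8) + (8 * 1 + 1)

# Count by summing the number of elements in every parameter tensor
by_torch = sum(p.numel() for p in model.parameters())

print(by_hand, by_torch)  # both are 177
```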


In [None]:
# TODO: Create a PyTorch neural network
# Step 2a: Define the model architecture
# model = nn.Sequential(
#     nn.Linear(1, 16),      # First hidden layer: 1 input feature -> 16 neurons
#     nn.ReLU(),             # Activation function
#     nn.Linear(16, 8),      # Second hidden layer: 16 -> 8 neurons
#     nn.ReLU(),             # Activation function
#     nn.Linear(8, 1)        # Output layer: 8 -> 1 neuron (regression)
# )

# Step 2b: Move model to device (GPU/CPU)
# model = model.to(device)

# Step 2c: Print model information
# print("Model architecture:")
# print(model)
# print(f"\nTotal parameters: {sum(p.numel() for p in model.parameters())}")
# print(f"Trainable parameters: {sum(p.numel() for p in model.parameters() if p.requires_grad)}")

print("TODO: Implement neural network creation above")


### Step 3: Training Loop (PyTorch's Explicit Approach)

**Documentation:**
- Optimizers: https://pytorch.org/docs/stable/optim.html
- Loss functions: https://pytorch.org/docs/stable/nn.html#loss-functions
- Training loops: https://pytorch.org/tutorials/beginner/introyt/trainingyt.html

**Key Concepts:**
- `optimizer.zero_grad()` - clear gradients
- `loss.backward()` - compute gradients
- `optimizer.step()` - update weights
- Manual training loop (vs Keras `.fit()`)

**Your Task:**
1. Set up optimizer (Adam) and loss function (MSE)
2. Implement training loop for 100 epochs
3. Track loss during training
4. Print progress every 20 epochs
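
It is worth seeing why gradients have to be cleared on every iteration: `.backward()` adds to whatever is already stored in `.grad` rather than overwriting it. The following is a minimal sketch on a single made-up weight `w` (a hypothetical name, not part of the lab's model); here the gradient is cleared directly on the tensor, which plays the same role that `optimizer.zero_grad()` plays for all model parameters.

```python
import torch

# A single illustrative weight
w = torch.tensor(1.0, requires_grad=True)

# First backward pass: d(3w)/dw = 3
(3 * w).backward()
first = w.grad.item()

# Second backward pass without clearing: the new gradient is added to the old one
(3 * w).backward()
accumulated = w.grad.item()

# Clear the stored gradient, then run backward again from a clean slate
w.grad.zero_()
(3 * w).backward()
after_clear = w.grad.item()

print(first, accumulated, after_clear)  # 3.0 6.0 3.0
```

If the clearing step were skipped in a training loop, every update would use the sum of all gradients seen so far instead of the gradient for the current step.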


In [None]:
# TODO: Implement PyTorch training loop
# Step 3a: Set up optimizer and loss function
# optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
# criterion = nn.MSELoss()

# Step 3b: Training loop
# num_epochs = 100
# loss_history = []

# model.train()  # Set model to training mode
# for epoch in range(num_epochs):
#     # Forward pass
#     predictions = model(x_train)
#     loss = criterion(predictions, y_train)
#     
#     # Backward pass
#     optimizer.zero_grad()  # Clear gradients
#     loss.backward()         # Compute gradients
#     optimizer.step()        # Update weights
#     
#     # Track loss
#     loss_history.append(loss.item())
#     
#     # Print progress
#     if (epoch + 1) % 20 == 0:
#         print(f"Epoch {epoch + 1:3d}: Loss = {loss.item():.6f}")

print("TODO: Implement training loop above")


### Step 4: Model Evaluation and Visualization

**Documentation:**
- Model evaluation: https://pytorch.org/docs/stable/nn.html#torch.nn.Module.eval
- No gradient context: https://pytorch.org/docs/stable/autograd.html#torch.autograd.no_grad

**Key Concepts:**
- `model.eval()` - set to evaluation mode
- `torch.no_grad()` - disable gradient computation
- `.item()` - convert single-element tensor to Python number

**Your Task:**
1. Plot training loss curve
2. Make predictions on test data
3. Compare predictions with true values
4. Calculate final loss
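
`model.eval()` and `torch.no_grad()` do different jobs, which is why they are usually used together. The sketch below assumes a small throwaway model with a `Dropout` layer (not the lab's model) so the difference is visible: `eval()` switches layers such as dropout to inference behavior, while `no_grad()` stops autograd from tracking operations.

```python
import torch
import torch.nn as nn

# Throwaway model with dropout, used only to illustrate the two switches
model = nn.Sequential(nn.Linear(1, 4), nn.Dropout(p=0.5), nn.Linear(4, 1))
x = torch.ones(2, 1)

# eval() changes layer behavior: dropout becomes a no-op
model.eval()
print(model.training)  # False

# no_grad() disables gradient tracking inside the block
with torch.no_grad():
    out_a = model(x)
    out_b = model(x)

# Outside no_grad(), outputs are tracked again even though the model is in eval mode
tracked = model(x)

print(torch.equal(out_a, out_b))                 # dropout is off, so repeated calls agree
print(out_a.requires_grad, tracked.requires_grad)
```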


In [None]:
# TODO: Evaluate the trained model
# Step 4a: Plot training loss
# import matplotlib.pyplot as plt
# plt.figure(figsize=(10, 6))
# plt.plot(loss_history)
# plt.title('Training Loss')
# plt.xlabel('Epoch')
# plt.ylabel('Loss')
# plt.grid(True)
# plt.show()

# Step 4b: Make predictions on test data
# model.eval()  # Set to evaluation mode
# with torch.no_grad():  # Disable gradient computation
#     test_x = torch.tensor([[2.0], [5.0], [8.0]], device=device)
#     predictions = model(test_x)
#     
#     print("\nPredictions:")
#     for x_val, pred_val in zip(test_x.flatten(), predictions.flatten()):
#         expected = 2 * x_val.item() + 1  # True relationship
#         print(f"  x={x_val.item():.1f}: predicted={pred_val.item():.2f}, expected={expected:.2f}")

print("TODO: Implement model evaluation above")


### Step 5: PyTorch vs Keras Comparison

**Key Differences:**

| Concept | Keras | PyTorch |
|---------|-------|---------|
| **Model Definition** | `Sequential([Dense(16), ReLU()])` | `Sequential(Linear(1,16), ReLU())` |
| **Training** | `model.fit(x, y, epochs=100)` | Manual loop with optimizer |
| **Gradients** | Automatic in `.fit()` | Explicit `loss.backward()` call (autograd still computes them) |
| **Device** | Automatic placement (`with strategy.scope():` for multi-device) | `.to(device)` |
| **Evaluation** | `model.evaluate()` | `model.eval()` + `torch.no_grad()` |
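
To make the **Training** row concrete, the sketch below wraps the manual loop in a small function. The `fit` helper here is hypothetical, written only for this comparison; it is not a PyTorch API and only approximates a fraction of what Keras's `.fit()` automates (no batching, callbacks, or validation). The model size and data are also assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


def fit(model, x, y, epochs=100, lr=0.01):
    """Hypothetical helper: a tiny stand-in for the loop that Keras's fit() hides."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.MSELoss()
    history = []
    model.train()
    for _ in range(epochs):
        loss = criterion(model(x), y)  # forward pass
        optimizer.zero_grad()          # clear old gradients
        loss.backward()                # compute new gradients
        optimizer.step()               # update weights
        history.append(loss.item())
    return history


# Illustrative noise-free data following y = 2x + 1
torch.manual_seed(0)
x = torch.linspace(0, 10, 100).unsqueeze(1)
y = 2 * x + 1

model = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
history = fit(model, x, y, epochs=100)
print(f"first loss: {history[0]:.4f}, last loss: {history[-1]:.4f}")
```

Writing the loop yourself is more code, but every step is visible and can be changed, which is the trade-off the table summarizes.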

**Why PyTorch for LLMs?**
- **Flexibility**: Custom architectures (Transformers, attention)
- **Research**: Easy to implement new techniques
- **Ecosystem**: Hugging Face, Unsloth, research libraries
- **Performance**: Direct control over device placement, precision, and memory when training large models

**Your Task:**
1. Compare the code you wrote with Lab 1 (Keras)
2. Note the differences in syntax and approach
3. Understand why PyTorch is preferred for LLMs


In [None]:
# TODO: Compare PyTorch vs Keras approaches
# Step 5a: Reflect on the differences
# print("PyTorch vs Keras Comparison:")
# print("=" * 50)
# print("\n1. Model Definition:")
# print("   Keras: Sequential([Dense(16), ReLU()])")
# print("   PyTorch: Sequential(Linear(1,16), ReLU())")
# print("\n2. Training:")
# print("   Keras: model.fit(x, y, epochs=100)  # One line!")
# print("   PyTorch: Manual loop with optimizer.step()  # More control")
# print("\n3. Device Management:")
# print("   Keras: automatic placement; strategy.scope() for multi-device  # TensorFlow strategy")
# print("   PyTorch: model.to(device)  # Explicit device placement")
# print("\n4. Why PyTorch for LLMs?")
# print("   - More flexible for custom architectures")
# print("   - Better research ecosystem (Hugging Face, Unsloth)")
# print("   - Explicit control over training process")
# print("   - Industry standard for large language models")

print("TODO: Complete the comparison above")


## Reflection

**PyTorch Fundamentals Learned:**
- ✅ Tensor operations and device management
- ✅ Neural network construction with `nn.Sequential`
- ✅ Manual training loops with optimizers
- ✅ Model evaluation and gradient control

**Key Takeaways:**
- **Explicit vs Implicit**: PyTorch gives you more control but requires more code
- **Device Management**: Always move tensors and models to the same device
- **Training Loop**: Understand the forward → backward → step cycle
- **Gradient Control**: Wrap inference in `torch.no_grad()`, and call `optimizer.zero_grad()` before each backward pass during training

**Next Steps:**
- You're now ready for **Lab 4: Hello Unsloth**!
- Unsloth builds on these PyTorch concepts
- You'll see how Unsloth simplifies LLM fine-tuning

**Questions to Consider:**
- How does PyTorch's explicit approach help with debugging?
- Why might researchers prefer PyTorch over Keras?
- How will these concepts apply to large language models?
