# 📘 Day 2: Introduction to PyTorch

**🎯 Goal:** Master PyTorch for building deep learning models

**⏱️ Time:** 60-90 minutes

**🌟 Why This Matters for AI:**
- PyTorch powers cutting-edge AI research: GPT, DALL-E, Meta's LLaMA, Tesla Autopilot
- Widely used by researchers and AI labs (OpenAI, Meta, Tesla)
- Often considered more Pythonic and intuitive than TensorFlow
- Dynamic computation graphs = easier debugging and experimentation

**🔥 2024-2025 AI Trends:**
- The most widely used transformer implementations (GPT-style models, BERT, LLaMA) are available in PyTorch
- PyTorch dominates research papers and new AI innovations
- Hugging Face (the largest AI model hub) supports several frameworks, with PyTorch the most widely used
- Essential for fine-tuning LLMs and building RAG systems

---

## 🚀 What is PyTorch?

**PyTorch** = An open-source deep learning framework originally developed at Meta (Facebook) and now governed by the PyTorch Foundation

**Why PyTorch is Special:**
- 🐍 **Pythonic**: Feels like natural Python code
- 🔧 **Dynamic**: Change model structure on the fly
- 🔍 **Debuggable**: Use standard Python debuggers
- 🚀 **Research-friendly**: Easy to experiment with new ideas

**Real-World Uses:**
- GPT models (OpenAI standardized on PyTorch in 2020)
- LLaMA and Llama 2 (Meta)
- Tesla Autopilot (computer vision)
- Stable Diffusion (image generation)

**TensorFlow vs PyTorch:**
- TensorFlow: Traditionally stronger for production, mobile, and web deployment
- PyTorch: Traditionally stronger for research, experimentation, and prototyping

---

## 📦 Installation & Setup

In [None]:
# Install PyTorch (run this once)
!pip install torch torchvision torchaudio numpy matplotlib scikit-learn

# Import libraries
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset
import torchvision
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt

# Check PyTorch version
print(f"✅ PyTorch Version: {torch.__version__}")
print(f"✅ CUDA Available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"✅ GPU: {torch.cuda.get_device_name(0)}")
else:
    print("💻 Using CPU (GPU not available)")

# Set device
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"\n🎯 Using device: {device}")

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

## 🎯 Section 1: PyTorch Tensors

**Tensors** = The fundamental data structure in PyTorch (like NumPy arrays but GPU-enabled)

**Key Difference from NumPy:**
- PyTorch tensors can run on GPU
- Automatic differentiation (autograd) for backpropagation
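
A minimal sketch of the two differences listed above, assuming NumPy and PyTorch are installed. The variable names (`arr`, `shared`, `copied`, `t`) are illustrative only. It also shows a detail worth knowing early: `torch.from_numpy` shares memory with the source array, while `torch.tensor` makes a copy.

```python
import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])
shared = torch.from_numpy(arr)  # shares the NumPy buffer (no copy)
copied = torch.tensor(arr)      # independent copy of the data

arr[0] = 100.0                  # modify the NumPy array in place
print(shared)                   # the shared tensor sees the change
print(copied)                   # the copy does not

# Unlike a NumPy array, a tensor can record operations for autograd
t = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
(t ** 2).sum().backward()       # d/dt sum(t^2) = 2t
print(t.grad)
```

The same tensor could also be moved to a GPU with `.to('cuda')` when one is available, which a NumPy array cannot do.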

---

In [None]:
# Creating tensors

# From Python list
tensor_from_list = torch.tensor([1, 2, 3, 4, 5])
print(f"From list: {tensor_from_list}")
print(f"Data type: {tensor_from_list.dtype}\n")

# 2D tensor (matrix)
matrix = torch.tensor([[1, 2, 3],
                       [4, 5, 6]])
print(f"Matrix:\n{matrix}")
print(f"Shape: {matrix.shape}\n")

# From NumPy array (the resulting tensor shares memory with the array)
numpy_array = np.array([1, 2, 3])
tensor_from_numpy = torch.from_numpy(numpy_array)
print(f"From NumPy: {tensor_from_numpy}\n")

# Special tensors
zeros = torch.zeros(2, 3)  # 2x3 tensor of zeros
ones = torch.ones(2, 3)    # 2x3 tensor of ones
random = torch.randn(2, 3) # 2x3 tensor sampled from a standard normal distribution

print(f"Zeros:\n{zeros}\n")
print(f"Ones:\n{ones}\n")
print(f"Random:\n{random}")

### 🔢 Tensor Operations

In [None]:
# Basic operations
a = torch.tensor([1, 2, 3])
b = torch.tensor([4, 5, 6])

# Addition
print(f"Addition: {a + b}")

# Multiplication (element-wise)
print(f"Multiplication: {a * b}")

# Matrix multiplication
mat1 = torch.tensor([[1, 2], [3, 4]])
mat2 = torch.tensor([[5, 6], [7, 8]])
print(f"\nMatrix multiplication:\n{torch.matmul(mat1, mat2)}")
# Or use @ operator
print(f"\nUsing @ operator:\n{mat1 @ mat2}")

# Reshaping
x = torch.randn(6)
print(f"\nOriginal shape: {x.shape}")
x_reshaped = x.view(2, 3)  # Reshape to 2x3
print(f"Reshaped:\n{x_reshaped}")
print(f"New shape: {x_reshaped.shape}")
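
The reshaping step above uses `.view()`, which returns a new shape over the same underlying data. The sketch below (illustrative names, not part of the lesson code) shows what that implies, and when `.reshape()` is the safer choice.

```python
import torch

x = torch.arange(6)        # tensor([0, 1, 2, 3, 4, 5])
v = x.view(2, 3)           # same storage, new shape
v[0, 0] = 99               # also changes x[0], because the data is shared

auto = x.view(-1, 2)       # -1 lets PyTorch infer that dimension (3 here)

t = v.t()                  # transpose: no longer contiguous in memory
try:
    t.view(6)              # view needs a compatible memory layout
    view_failed = False
except RuntimeError:
    view_failed = True

flat = t.reshape(6)        # reshape falls back to a copy when a view is impossible
print(x, auto.shape, view_failed, flat)
```

A reasonable rule of thumb is to use `.view()` when you want a guarantee that no copy is made, and `.reshape()` when you only care about the resulting shape.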

### ⚡ GPU Acceleration

In [None]:
# Moving tensors to GPU (if available)
cpu_tensor = torch.randn(3, 3)
print(f"CPU Tensor:\n{cpu_tensor}")
print(f"Device: {cpu_tensor.device}\n")

# Move to GPU
if torch.cuda.is_available():
    gpu_tensor = cpu_tensor.to('cuda')
    print(f"GPU Tensor:\n{gpu_tensor}")
    print(f"Device: {gpu_tensor.device}")
else:
    print("GPU not available, staying on CPU")

# Recommended: Use device variable
tensor = torch.randn(3, 3).to(device)
print(f"\nTensor on {device}:\n{tensor}")
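
As a follow-up to the device pattern above, here is a small device-agnostic sketch. It assumes nothing beyond PyTorch itself; `layer` and `batch` are illustrative names. The model and its input must live on the same device, and a tensor has to come back to the CPU before it is converted to NumPy.

```python
import torch
import torch.nn as nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

layer = nn.Linear(3, 2).to(device)         # move the layer's parameters
batch = torch.randn(4, 3, device=device)   # create the input directly on the device

out = layer(batch)                         # both operands are on the same device
result = out.detach().cpu().numpy()        # detach from autograd, move to CPU, then convert

print(out.device, result.shape)
```

Creating tensors with `device=device` avoids an extra host-to-device copy compared with building them on the CPU and calling `.to(device)` afterwards.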

## 🎯 Section 2: Autograd - Automatic Differentiation

**Autograd** = PyTorch's automatic differentiation engine

**Why This Matters:**
- Automatically computes gradients for backpropagation
- No need to manually calculate derivatives
- Core of deep learning training

**How it works:**
1. Set `requires_grad=True` on tensors you want to track
2. Perform operations
3. Call `.backward()` to compute gradients
4. Access gradients with `.grad`
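
One way to build trust in these four steps is to compare autograd's result with a numerical estimate. The sketch below does that for an arbitrary example function `f(x) = x^3 - 4x` (chosen for illustration, not taken from the lesson) using a central finite difference.

```python
import torch

def f(x):
    return x ** 3 - 4 * x   # analytic derivative: 3x^2 - 4

# Steps 1-4: track, compute, backward, read .grad
x = torch.tensor(1.5, dtype=torch.float64, requires_grad=True)
y = f(x)
y.backward()
autograd_grad = x.grad.item()

# Numerical check with a central difference (no graph needed here)
eps = 1e-6
with torch.no_grad():
    numeric_grad = ((f(x + eps) - f(x - eps)) / (2 * eps)).item()

print(autograd_grad, numeric_grad)  # both should be close to 3 * 1.5**2 - 4 = 2.75
```

`float64` is used so that the finite-difference estimate is not dominated by rounding error.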

---

In [None]:
# Example: Computing gradients

# Create a tensor and enable gradient tracking
x = torch.tensor([2.0], requires_grad=True)
print(f"x = {x}")

# Define a function: y = x^2 + 2x + 1
y = x**2 + 2*x + 1
print(f"y = x^2 + 2x + 1 = {y}")

# Compute gradients (dy/dx)
y.backward()

# Access gradient: dy/dx = 2x + 2 = 2(2) + 2 = 6
print(f"\nGradient dy/dx at x=2: {x.grad}")
print("Expected: 2x + 2 = 2(2) + 2 = 6 ✓")

In [None]:
# Example: Neural network gradient

# Weights and biases
w = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
b = torch.tensor([1.0], requires_grad=True)

# Input
x = torch.tensor([1.0, 2.0, 3.0])

# Forward pass: y = w·x + b
y = torch.dot(w, x) + b
print(f"Output y = {y}")

# Backward pass
y.backward()

# Gradients
print(f"\nGradient dw: {w.grad}")
print(f"Gradient db: {b.grad}")
print("\n🧠 These gradients are used to update weights during training!")

## 🏗️ Section 3: Building Models with nn.Module

**nn.Module** = Base class for all neural network modules in PyTorch

**Structure:**
1. `__init__()`: Define layers
2. `forward()`: Define forward pass (how data flows)

**PyTorch automatically handles:**
- Backward pass (gradient computation)
- Parameter management
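- Moving all of a model's parameters between CPU and GPU with a single `.to(device)` call (input tensors still have to be moved explicitly)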
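
To make "parameter management" concrete, here is a minimal sketch with a hypothetical two-layer module named `TinyNet` (not part of the lesson). Layers assigned as attributes in `__init__` are registered automatically, so their weights show up in `parameters()` and follow the module when it is moved.

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # hypothetical example module
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4, 8)   # registered as "fc1"
        self.fc2 = nn.Linear(8, 2)   # registered as "fc2"

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

net = TinyNet()

# Every registered parameter is discoverable by name
names = [name for name, _ in net.named_parameters()]
n_params = sum(p.numel() for p in net.parameters())  # 4*8 + 8 + 8*2 + 2

# One call moves all parameters; inputs must be moved separately
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
net.to(device)
param_devices = {p.device.type for p in net.parameters()}

print(names, n_params, param_devices)
```

This is also why an optimizer can be built from `model.parameters()` without listing the layers by hand.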

---

In [None]:
# Example 1: Simple Neural Network
class SimpleNN(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(SimpleNN, self).__init__()
        # Define layers
        self.fc1 = nn.Linear(input_size, hidden_size)  # First hidden layer
        self.fc2 = nn.Linear(hidden_size, hidden_size) # Second hidden layer
        self.fc3 = nn.Linear(hidden_size, output_size) # Output layer
        
    def forward(self, x):
        # Define forward pass
        x = F.relu(self.fc1(x))  # Hidden layer 1 + ReLU
        x = F.relu(self.fc2(x))  # Hidden layer 2 + ReLU
        x = self.fc3(x)          # Output layer (no activation)
        return x

# Create model
model = SimpleNN(input_size=10, hidden_size=64, output_size=1)
print(model)

# Count parameters
total_params = sum(p.numel() for p in model.parameters())
print(f"\nTotal parameters: {total_params:,}")

In [None]:
# Test the model with random input
sample_input = torch.randn(5, 10)  # Batch of 5 samples, 10 features each
output = model(sample_input)
print(f"Input shape: {sample_input.shape}")
print(f"Output shape: {output.shape}")
print(f"\nOutput:\n{output}")

### 🖼️ Example 2: CNN for Image Classification

In [None]:
# Convolutional Neural Network
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Convolutional layers
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)  # 1 input channel, 32 output
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1) # 32 input, 64 output
        self.conv3 = nn.Conv2d(64, 128, kernel_size=3, padding=1) # 64 input, 128 output
        
        # Pooling layer
        self.pool = nn.MaxPool2d(2, 2)
        
        # Fully connected layers
        self.fc1 = nn.Linear(128 * 3 * 3, 128)  # After 3 pooling: 28->14->7->3
        self.fc2 = nn.Linear(128, 10)           # 10 classes
        
        # Dropout for regularization
        self.dropout = nn.Dropout(0.5)
        
    def forward(self, x):
        # Conv block 1
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        
        # Conv block 2
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        
        # Conv block 3
        x = F.relu(self.conv3(x))
        x = self.pool(x)
        
        # Flatten
        x = x.view(x.size(0), -1)  # Flatten: (batch_size, 128*3*3)
        
        # Fully connected layers
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        
        return x

# Create CNN model
cnn_model = CNN().to(device)
print(cnn_model)

# Count parameters
total_params = sum(p.numel() for p in cnn_model.parameters())
print(f"\nTotal parameters: {total_params:,}")
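
The `128 * 3 * 3` input size of `fc1` above follows from the standard output-size formula for convolution and pooling layers, `floor((size + 2*padding - kernel) / stride) + 1`. The sketch below applies it to a 28x28 input and cross-checks the result with real layers. The helper `out_size` and the small channel counts are illustrative assumptions.

```python
import torch
import torch.nn as nn

def out_size(size, kernel, stride=1, padding=0):
    """Spatial output size of a conv or pooling layer along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

size = 28
sizes = []
for _ in range(3):
    size = out_size(size, kernel=3, padding=1)  # 3x3 conv with padding=1 keeps the size
    size = out_size(size, kernel=2, stride=2)   # 2x2 max pool halves it (rounding down)
    sizes.append(size)
print(sizes)  # 28 -> 14 -> 7 -> 3

# Cross-check by pushing a dummy image through equivalent layers
layers = nn.Sequential(
    nn.Conv2d(1, 4, 3, padding=1), nn.MaxPool2d(2, 2),
    nn.Conv2d(4, 4, 3, padding=1), nn.MaxPool2d(2, 2),
    nn.Conv2d(4, 4, 3, padding=1), nn.MaxPool2d(2, 2),
)
spatial = tuple(layers(torch.zeros(1, 1, 28, 28)).shape[-2:])
print(spatial)
```

Passing a dummy tensor through the convolutional part of a model is a quick way to find the right `nn.Linear` input size when the arithmetic gets tedious.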

## 🎯 Section 4: Training Loop in PyTorch

**PyTorch Training Loop** = You write the loop explicitly (unlike Keras's `.fit()`)

**Advantages:**
- ✅ Full control over training process
- ✅ Easy to customize
- ✅ Better for research and experimentation

**Standard Training Loop:**
```python
for epoch in range(num_epochs):
    for inputs, targets in dataloader:
        # 1. Forward pass
        outputs = model(inputs)
        loss = criterion(outputs, targets)
        
        # 2. Backward pass
        optimizer.zero_grad()  # Clear gradients
        loss.backward()        # Compute gradients
        optimizer.step()       # Update weights
```
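
The `optimizer.zero_grad()` call in the loop above matters because `.backward()` adds to any gradient already stored in `.grad` rather than overwriting it. A minimal sketch with a single scalar parameter (illustrative, not the lesson's model):

```python
import torch

w = torch.tensor(3.0, requires_grad=True)

(w * 2).backward()
first = w.grad.item()          # d(2w)/dw = 2

(w * 2).backward()             # second backward without clearing
accumulated = w.grad.item()    # gradients add up: 2 + 2

w.grad.zero_()                 # what optimizer.zero_grad() does for each parameter
(w * 2).backward()
after_reset = w.grad.item()    # back to 2

print(first, accumulated, after_reset)
```

Accumulation is sometimes used on purpose, for example to simulate a larger batch by calling `backward()` on several mini-batches before a single `optimizer.step()`.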

---

In [None]:
# Simple example: Train on synthetic data (labels are random, so the model can only memorize noise)

# Generate synthetic data
X_train = torch.randn(1000, 10).to(device)
y_train = torch.randint(0, 2, (1000,)).to(device)  # Binary classification

# Create DataLoader
train_dataset = TensorDataset(X_train, y_train)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)

# Create model, loss, optimizer
model = SimpleNN(10, 64, 2).to(device)  # 2 output classes
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Training loop
num_epochs = 5
for epoch in range(num_epochs):
    epoch_loss = 0.0
    correct = 0
    total = 0
    
    for batch_X, batch_y in train_loader:
        # 1. Forward pass
        outputs = model(batch_X)
        loss = criterion(outputs, batch_y)
        
        # 2. Backward pass
        optimizer.zero_grad()  # Clear previous gradients
        loss.backward()        # Compute gradients
        optimizer.step()       # Update weights
        
        # Track metrics
        epoch_loss += loss.item()
        predicted = outputs.argmax(dim=1)  # Index of the highest logit per sample
        total += batch_y.size(0)
        correct += (predicted == batch_y).sum().item()
    
    # Print epoch stats
    avg_loss = epoch_loss / len(train_loader)
    accuracy = 100 * correct / total
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {avg_loss:.4f}, Accuracy: {accuracy:.2f}%")

print("\n✅ Training complete!")
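
The model in this section returns raw scores (logits) with no softmax because `nn.CrossEntropyLoss` applies log-softmax internally. The sketch below, using made-up logits and targets, checks that equivalence and shows that taking the argmax of the logits gives the same prediction as the argmax of the probabilities.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 2)               # raw model outputs: 4 samples, 2 classes
targets = torch.tensor([0, 1, 1, 0])     # class indices, not one-hot vectors

loss_ce = nn.CrossEntropyLoss()(logits, targets)
loss_manual = F.nll_loss(F.log_softmax(logits, dim=1), targets)

probs = F.softmax(logits, dim=1)         # only needed when you want probabilities
predicted = logits.argmax(dim=1)         # softmax is monotonic, so argmax is unchanged

print(loss_ce.item(), loss_manual.item(), predicted)
```

Adding a softmax layer before `nn.CrossEntropyLoss` would apply it twice and usually hurts training.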

## 🖼️ Section 5: REAL AI EXAMPLE - Text Classifier

**Project:** Build a sentiment classifier for movie reviews

**Real-World Applications:**
- Social media sentiment analysis
- Customer review classification
- Brand monitoring
- Market sentiment prediction

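**Dataset:** Modeled on IMDB Movie Reviews (25,000 training and 25,000 test reviews). To keep the notebook self-contained, the cells below substitute random synthetic embeddings and labels of a similar shape, so accuracy will stay near 50%.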

---

In [None]:
# For this example, we'll simulate text data with embeddings
# In real applications, you'd tokenize and embed real text, e.g. with Hugging Face tokenizers
# (torchtext is no longer actively developed)

# Simulated text embeddings (pretend we already embedded text)
# Real dimensions: (num_samples, sequence_length, embedding_dim)
train_samples = 10000
test_samples = 2000
sequence_length = 200  # Max words in review
embedding_dim = 100    # Word embedding dimension

# Generate synthetic data (in reality, these would be real text embeddings)
# Note: the training tensor holds 10,000 x 200 x 100 float32 values, about 800 MB of RAM
X_train_text = torch.randn(train_samples, sequence_length, embedding_dim)
y_train_text = torch.randint(0, 2, (train_samples,))  # 0=negative, 1=positive

X_test_text = torch.randn(test_samples, sequence_length, embedding_dim)
y_test_text = torch.randint(0, 2, (test_samples,))

print(f"Training data shape: {X_train_text.shape}")
print(f"Training labels shape: {y_train_text.shape}")
print(f"\nEach review: {sequence_length} words, each word: {embedding_dim}-dim embedding")
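
The synthetic tensors above stand in for embedded text. As a rough sketch of how raw reviews could be turned into a tensor with the same `(num_samples, sequence_length, embedding_dim)` layout, the example below builds a toy vocabulary, pads each review to a fixed length, and looks up vectors with `nn.Embedding`. The reviews, the whitespace tokenizer, and the sizes are all illustrative assumptions; a real pipeline would use a proper tokenizer and trained embeddings.

```python
import torch
import torch.nn as nn

reviews = ["great movie loved it", "terrible plot", "loved the plot"]

# Toy vocabulary: index 0 is reserved for padding
vocab = {"<pad>": 0}
for review in reviews:
    for word in review.split():
        vocab.setdefault(word, len(vocab))

max_len = 5  # fixed sequence length (the lesson uses 200)

def encode(text):
    """Map words to ids, truncate to max_len, and right-pad with 0."""
    ids = [vocab[w] for w in text.split()][:max_len]
    return ids + [0] * (max_len - len(ids))

token_ids = torch.tensor([encode(r) for r in reviews])    # (3, 5)

# padding_idx=0 keeps the padding vector at zero and excludes it from updates
embedding = nn.Embedding(len(vocab), 8, padding_idx=0)
embedded = embedding(token_ids)                           # (3, 5, 8)

print(token_ids.shape, embedded.shape)
```

In a trainable model the `nn.Embedding` layer would typically live inside the classifier so that its weights are learned together with the LSTM.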

In [None]:
# Build Text Classifier with LSTM
class TextClassifierLSTM(nn.Module):
    def __init__(self, embedding_dim, hidden_dim, output_dim, num_layers=2, dropout=0.5):
        super(TextClassifierLSTM, self).__init__()
        
        # LSTM layer
        self.lstm = nn.LSTM(
            embedding_dim,
            hidden_dim,
            num_layers=num_layers,
            batch_first=True,
            dropout=dropout if num_layers > 1 else 0
        )
        
        # Fully connected layer
        self.fc = nn.Linear(hidden_dim, output_dim)
        
        # Dropout
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, x):
        # x shape: (batch_size, sequence_length, embedding_dim)
        
        # LSTM output
        lstm_out, (hidden, cell) = self.lstm(x)
        
        # Use the last hidden state
        hidden = hidden[-1, :, :]  # Shape: (batch_size, hidden_dim)
        
        # Dropout and fully connected
        hidden = self.dropout(hidden)
        output = self.fc(hidden)
        
        return output

# Create model
text_model = TextClassifierLSTM(
    embedding_dim=100,
    hidden_dim=128,
    output_dim=2,  # 2 classes: positive/negative
    num_layers=2,
    dropout=0.5
).to(device)

print("📝 Text Classifier Architecture:")
print(text_model)
print(f"\nTotal parameters: {sum(p.numel() for p in text_model.parameters()):,}")
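
The `hidden[-1, :, :]` selection in `forward` above picks the final hidden state of the top LSTM layer. For a unidirectional LSTM with `batch_first=True`, that is the same as the last time step of `lstm_out`. The sketch below checks this on a small, randomly initialized LSTM whose sizes are chosen only for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
lstm = nn.LSTM(input_size=4, hidden_size=6, num_layers=2, batch_first=True)
x = torch.randn(3, 5, 4)                 # (batch, sequence_length, features)

lstm_out, (hidden, cell) = lstm(x)
# lstm_out: top-layer output at every time step -> (batch, seq_len, hidden_size)
# hidden:   final state of every layer          -> (num_layers, batch, hidden_size)

same = torch.allclose(hidden[-1], lstm_out[:, -1, :])
print(lstm_out.shape, hidden.shape, same)
```

With `bidirectional=True` this shortcut no longer holds, because the backward direction finishes at the first time step.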

In [None]:
# Prepare data loaders
train_dataset = TensorDataset(X_train_text, y_train_text)
test_dataset = TensorDataset(X_test_text, y_test_text)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(text_model.parameters(), lr=0.001)

print("✅ Data loaders ready!")
print(f"Training batches: {len(train_loader)}")
print(f"Test batches: {len(test_loader)}")

In [None]:
# Training loop with validation
def train_model(model, train_loader, test_loader, criterion, optimizer, num_epochs=5):
    train_losses = []
    test_accuracies = []
    
    for epoch in range(num_epochs):
        # Training phase
        model.train()
        epoch_loss = 0.0
        train_correct = 0
        train_total = 0
        
        for batch_X, batch_y in train_loader:
            batch_X, batch_y = batch_X.to(device), batch_y.to(device)
            
            # Forward pass
            outputs = model(batch_X)
            loss = criterion(outputs, batch_y)
            
            # Backward pass
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            
            # Track metrics
            epoch_loss += loss.item()
            predicted = outputs.argmax(dim=1)  # Index of the highest logit per sample
            train_total += batch_y.size(0)
            train_correct += (predicted == batch_y).sum().item()
        
        # Validation phase
        model.eval()
        test_correct = 0
        test_total = 0
        
        with torch.no_grad():  # No gradient computation during validation
            for batch_X, batch_y in test_loader:
                batch_X, batch_y = batch_X.to(device), batch_y.to(device)
                outputs = model(batch_X)
                predicted = outputs.argmax(dim=1)
                test_total += batch_y.size(0)
                test_correct += (predicted == batch_y).sum().item()
        
        # Calculate metrics
        avg_loss = epoch_loss / len(train_loader)
        train_acc = 100 * train_correct / train_total
        test_acc = 100 * test_correct / test_total
        
        train_losses.append(avg_loss)
        test_accuracies.append(test_acc)
        
        print(f"Epoch [{epoch+1}/{num_epochs}]")
        print(f"  Loss: {avg_loss:.4f} | Train Acc: {train_acc:.2f}% | Test Acc: {test_acc:.2f}%")
    
    return train_losses, test_accuracies

# Train the model
print("🚀 Training Text Classifier...\n")
losses, accuracies = train_model(text_model, train_loader, test_loader, criterion, optimizer, num_epochs=5)
print("\n✅ Training complete!")
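
The training function above switches between `model.train()` and `model.eval()` because layers such as dropout behave differently in the two modes. A minimal sketch on a standalone `nn.Dropout` layer (illustrative, separate from the classifier):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)

drop.train()                  # training mode: randomly zero elements
y_train = drop(x)             # survivors are scaled by 1 / (1 - p) = 2.0

drop.eval()                   # evaluation mode: dropout is a no-op
y_eval = drop(x)

print(sorted(set(y_train.tolist())), torch.equal(y_eval, x))
```

Forgetting `model.eval()` before validation leaves dropout active and makes the reported accuracy noisier and lower than it should be.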

In [None]:
# Visualize training progress
plt.figure(figsize=(12, 4))

plt.subplot(1, 2, 1)
plt.plot(losses, marker='o')
plt.title('Training Loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.grid(True)

plt.subplot(1, 2, 2)
plt.plot(accuracies, marker='o', color='green')
plt.title('Test Accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy (%)')
plt.grid(True)

plt.tight_layout()
plt.show()

## 🎯 Section 6: TensorFlow vs PyTorch Comparison

Let's look at the SAME model written in both frameworks! The cell below only prints the two snippets for side-by-side reading; it does not run them.

---

In [None]:
print("="*60)
print("TensorFlow vs PyTorch: Simple CNN Comparison")
print("="*60)

# PyTorch Version
print("\n🔥 PyTorch Version:")
print("""
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, 3)
        self.conv2 = nn.Conv2d(32, 64, 3)
        self.fc1 = nn.Linear(64*5*5, 128)
        self.fc2 = nn.Linear(128, 10)
    
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = x.view(-1, 64*5*5)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x

model = CNN()
optimizer = optim.Adam(model.parameters())
criterion = nn.CrossEntropyLoss()

# Training loop (you write it!)
for epoch in range(epochs):
    for inputs, labels in dataloader:
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
""")

# TensorFlow Version
print("\n🔵 TensorFlow Version:")
print("""
model = keras.Sequential([
    layers.Conv2D(32, 3, activation='relu'),
    layers.MaxPooling2D(2),
    layers.Conv2D(64, 3, activation='relu'),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(128, activation='relu'),
    layers.Dense(10)
])

model.compile(
    optimizer='adam',
    loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True),  # final Dense layer outputs logits
    metrics=['accuracy']
)

# Training (one line!)
model.fit(X_train, y_train, epochs=epochs)
""")

print("\n" + "="*60)
print("Key Differences:")
print("="*60)
print("""
PyTorch:
✅ More control over training loop
✅ Pythonic and intuitive
✅ Better for research and experimentation
✅ Dynamic computation graphs
✅ Easier debugging (use Python debugger)

TensorFlow/Keras:
✅ Simpler API (model.fit())
✅ Better for production deployment
✅ TensorFlow Lite for mobile
✅ TensorFlow.js for web
✅ Better documentation and tutorials

🎯 Recommendation:
- Research & Prototyping → PyTorch
- Production & Deployment → TensorFlow
- Learn BOTH for maximum flexibility!
""")

## 🎯 Interactive Exercise 1: Build Your Own PyTorch Model

**Challenge:** Build a CNN to classify MNIST digits using PyTorch

**Requirements:**
1. Use torchvision to load MNIST
2. Build a CNN with 2 conv layers
3. Train for 3 epochs
4. Report test accuracy

**Starter Code Below** 👇

In [None]:
# Exercise 1: MNIST Classifier in PyTorch

# Step 1: Load MNIST
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])

train_dataset = torchvision.datasets.MNIST(
    root='./data',
    train=True,
    transform=transform,
    download=True
)

test_dataset = torchvision.datasets.MNIST(
    root='./data',
    train=False,
    transform=transform,
    download=True
)

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

print(f"✅ Training samples: {len(train_dataset)}")
print(f"✅ Test samples: {len(test_dataset)}")

# Step 2: Build your CNN
# TODO: Define your CNN class here

# Step 3: Create model, loss, optimizer
# TODO: Your code here

# Step 4: Train the model
# TODO: Your code here

# Step 5: Evaluate
# TODO: Your code here

### ✅ Solution to Exercise 1

In [None]:
# Solution: MNIST Classifier

class MNISTNet(nn.Module):
    def __init__(self):
        super(MNISTNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)
        self.dropout = nn.Dropout(0.5)
    
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = F.relu(self.conv2(x))
        x = self.pool(x)
        x = x.view(-1, 64 * 7 * 7)
        x = F.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.fc2(x)
        return x

# Create model
mnist_model = MNISTNet().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(mnist_model.parameters(), lr=0.001)

# Train
print("🚀 Training MNIST model...\n")
for epoch in range(3):
    mnist_model.train()
    running_loss = 0.0
    for i, (images, labels) in enumerate(train_loader):
        images, labels = images.to(device), labels.to(device)
        
        optimizer.zero_grad()
        outputs = mnist_model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
    
    print(f"Epoch [{epoch+1}/3], Loss: {running_loss/len(train_loader):.4f}")

# Evaluate
mnist_model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        images, labels = images.to(device), labels.to(device)
        outputs = mnist_model(images)
        predicted = outputs.argmax(dim=1)  # Index of the highest logit per image
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

accuracy = 100 * correct / total
print(f"\n📊 Test Accuracy: {accuracy:.2f}%")

## 🎯 Interactive Exercise 2: Compare TensorFlow vs PyTorch

**Challenge:** Implement the SAME model in both frameworks and compare:
1. Code simplicity
2. Training speed
3. Final accuracy

**Task:** Choose one and explain why you prefer it!

In [None]:
# Exercise 2: Your comparison notes
print("""
My Framework Comparison:
========================

TensorFlow:
- Pros: ___________
- Cons: ___________
- Use when: ___________

PyTorch:
- Pros: ___________
- Cons: ___________
- Use when: ___________

My Preference: ___________
Reason: ___________
""")

## 💾 Saving and Loading Models in PyTorch

In [None]:
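# Method 1: Save entire model (pickles the whole object, so loading it later
# requires the MNISTNet class definition to be importable)
torch.save(mnist_model, 'mnist_model.pth')
print("✅ Model saved as 'mnist_model.pth'")

# Method 2: Save only state dict (recommended)
torch.save(mnist_model.state_dict(), 'mnist_model_state.pth')
print("✅ State dict saved as 'mnist_model_state.pth'")

# Load model: rebuild the architecture, then restore the weights
loaded_model = MNISTNet().to(device)
# map_location lets weights saved on a GPU load on a CPU-only machine
loaded_model.load_state_dict(torch.load('mnist_model_state.pth', map_location=device))
loaded_model.eval()  # Inference mode (disables dropout)
print("✅ Model loaded successfully!")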
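
To resume training rather than just run inference, it is common to save the optimizer state alongside the model weights. The sketch below shows one possible checkpoint layout on a tiny stand-in model. The dictionary keys (`epoch`, `model_state`, `optimizer_state`) and the file name are arbitrary choices for illustration, not a PyTorch convention.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(3, 2)                                  # stand-in for a real model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint.pth")

    # Save everything needed to resume training in one file
    torch.save({
        "epoch": 3,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)

    checkpoint = torch.load(path, map_location="cpu")

# Rebuild the objects, then restore their state
restored = nn.Linear(3, 2)
restored.load_state_dict(checkpoint["model_state"])
restored_optimizer = torch.optim.Adam(restored.parameters(), lr=1e-3)
restored_optimizer.load_state_dict(checkpoint["optimizer_state"])
start_epoch = checkpoint["epoch"] + 1

print(start_epoch, torch.equal(restored.weight, model.weight))
```

Restoring the optimizer matters for Adam in particular, since it keeps running averages of past gradients.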

## 🎉 Congratulations!

**You just learned:**
- ✅ PyTorch tensors and operations
- ✅ Autograd for automatic differentiation
- ✅ Building models with nn.Module
- ✅ Writing custom training loops
- ✅ Building CNN and LSTM models
- ✅ Text classification with PyTorch
- ✅ TensorFlow vs PyTorch comparison

**🔥 Real-World Skills:**
- Build production-ready PyTorch models
- Implement custom architectures for research
- Fine-tune pre-trained models (like GPT, BERT)
- Deploy models with TorchServe

**🎯 Practice Challenges:**
1. Build a ResNet-style model with skip connections
2. Implement a Transformer encoder block
3. Create a custom loss function
4. Add learning rate scheduling
5. Implement early stopping
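
As a possible starting point for challenge 4, the sketch below attaches a `StepLR` scheduler to an optimizer and records the learning rate per epoch. The model, the step size, and the decay factor are placeholder values, and the training step itself is left out.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import StepLR

model = nn.Linear(2, 1)                                  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=2, gamma=0.5)    # halve the LR every 2 epochs

lrs = []
for epoch in range(6):
    lrs.append(optimizer.param_groups[0]["lr"])
    # ... forward pass, loss.backward() would go here ...
    optimizer.step()     # update weights first
    scheduler.step()     # then advance the schedule once per epoch

print(lrs)  # 0.1, 0.1, 0.05, 0.05, 0.025, 0.025
```

Calling `scheduler.step()` after `optimizer.step()` is the order PyTorch expects; the reverse order skips the first value of the schedule.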

**🔥 2024-2025 Trends:**
- Fine-tuning LLMs with PyTorch
- Building RAG systems with PyTorch embeddings
- Multimodal AI with CLIP (PyTorch)
- Deploying models with HuggingFace

---

**📚 Next Lesson:** Day 3 - Deep Learning Project (End-to-end Fashion-MNIST classifier!)

**💬 Questions?** Experiment with different architectures, optimizers, and hyperparameters!

---

*Remember: PyTorch powers the latest AI breakthroughs - GPT, LLaMA, Stable Diffusion, and more!* 🚀