# Lab 5: Deploying Your Language Model (SLM-2)
## From Training to Production

**Duration**: ~2-3 hours

### Learning Objectives
By the end of this lab, you will be able to:
1. Save and load trained PyTorch models
2. Understand inference settings (temperature, top-k, top-p)
3. Build a web interface with Gradio
4. Deploy your model for others to use!

### Prerequisites
- Completed Lab 4 (SLM Building)

### The Big Picture

```
Lab 4: TRAINING               Lab 5: DEPLOYMENT

┌─────────────┐              ┌─────────────┐
│   Dataset   │              │ Saved Model │
└──────┬──────┘              └──────┬──────┘
       │                            │
       ▼                            ▼
┌─────────────┐              ┌─────────────┐
│  Training   │     ──►      │  Inference  │
│    Loop     │   save       │   Engine    │
└──────┬──────┘              └──────┬──────┘
       │                            │
       ▼                            ▼
┌─────────────┐              ┌─────────────┐
│   Weights   │              │  Gradio UI  │
└─────────────┘              └─────────────┘
                                    │
                                    ▼
                             ┌─────────────┐
                             │   Users!    │
                             └─────────────┘
```

In [None]:
# Install Gradio (for web interface)
# !pip install gradio

In [None]:
import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np
import os

torch.manual_seed(42)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print(f"Using device: {device}")

---

# Part 1: Saving & Loading Models

## Why Save Models?

Training takes time! We don't want to retrain every time we use the model.

**Save once, use forever.**

In [None]:
# First, let's recreate our model from Lab 4
# (Or load it if you saved it!)

# Download dataset
import urllib.request

url = "https://raw.githubusercontent.com/karpathy/makemore/master/names.txt"
filename = "names.txt"

if not os.path.exists(filename):
    urllib.request.urlretrieve(url, filename)

with open(filename, 'r') as f:
    names = f.read().splitlines()

# Build vocabulary
chars = sorted(list(set(''.join(names).lower())))
chars = ['.'] + chars
vocab_size = len(chars)

char_to_idx = {c: i for i, c in enumerate(chars)}
idx_to_char = {i: c for i, c in enumerate(chars)}

print(f"Vocabulary: {chars}")
print(f"Vocab size: {vocab_size}")

In [None]:
# Model definition (same as Lab 4)

class CharLM(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, context_length):
        super().__init__()
        self.context_length = context_length
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.fc1 = nn.Linear(context_length * embed_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, vocab_size)
        
    def forward(self, x):
        emb = self.embedding(x)
        emb = emb.view(emb.size(0), -1)
        h = torch.tanh(self.fc1(emb))
        logits = self.fc2(h)
        return logits

# Config
context_length = 3
embed_dim = 10
hidden_dim = 100

In [None]:
# SOLVED EXAMPLE: Quick Training (or load saved model)

def build_dataset(names, context_length):
    X, Y = [], []
    for name in names:
        name = '.' * context_length + name.lower() + '.'
        for i in range(len(name) - context_length):
            context = name[i:i+context_length]
            target = name[i+context_length]
            X.append([char_to_idx[c] for c in context])
            Y.append(char_to_idx[target])
    return torch.tensor(X), torch.tensor(Y)

# Build and train
X, Y = build_dataset(names, context_length)
X, Y = X.to(device), Y.to(device)

model = CharLM(vocab_size, embed_dim, hidden_dim, context_length).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# Quick training (10 steps, each on a random mini-batch of 5,000 examples)
print("Training model...")
for epoch in range(10):
    # Each "epoch" here is a single update on one random mini-batch
    idx = torch.randperm(len(X))[:5000]
    X_batch, Y_batch = X[idx], Y[idx]
    
    logits = model(X_batch)
    loss = criterion(logits, Y_batch)
    
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    
    if (epoch + 1) % 2 == 0:
        print(f"Epoch {epoch+1}: Loss = {loss.item():.4f}")

print("Training complete!")

In [None]:
# SOLVED EXAMPLE: Saving the Model

# Method 1: Save state_dict (RECOMMENDED)
# Only saves the weights, not the architecture

save_path = "char_lm_weights.pt"

torch.save({
    'model_state_dict': model.state_dict(),
    'vocab_size': vocab_size,
    'embed_dim': embed_dim,
    'hidden_dim': hidden_dim,
    'context_length': context_length,
    'chars': chars,
    'char_to_idx': char_to_idx,
    'idx_to_char': idx_to_char,
}, save_path)

print(f"Model saved to {save_path}")
print(f"File size: {os.path.getsize(save_path) / 1024:.1f} KB")

In [None]:
# SOLVED EXAMPLE: Loading the Model

# Load checkpoint
checkpoint = torch.load(save_path, map_location=device)

# Recreate model with saved config
loaded_model = CharLM(
    vocab_size=checkpoint['vocab_size'],
    embed_dim=checkpoint['embed_dim'],
    hidden_dim=checkpoint['hidden_dim'],
    context_length=checkpoint['context_length']
).to(device)

# Load weights
loaded_model.load_state_dict(checkpoint['model_state_dict'])
loaded_model.eval()  # Set to evaluation mode

# Restore vocabulary mappings
chars = checkpoint['chars']
char_to_idx = checkpoint['char_to_idx']
idx_to_char = checkpoint['idx_to_char']

print("Model loaded successfully!")
print(f"Context length: {checkpoint['context_length']}")
print(f"Vocab size: {checkpoint['vocab_size']}")

## Model Checkpoints

For longer training, save periodically:

```python
for epoch in range(100):
    loss = train()  # placeholder: run one epoch and return the final loss tensor
    
    # Save every 10 epochs
    if (epoch + 1) % 10 == 0:
        torch.save({
            'epoch': epoch,
            'model_state_dict': model.state_dict(),
            'optimizer_state_dict': optimizer.state_dict(),
            'loss': loss.item(),  # store a plain float, not a tensor
        }, f'checkpoint_epoch_{epoch+1}.pt')
```

This lets you:
- Resume training if it crashes
- Go back to earlier versions if model gets worse
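
Resuming from a checkpoint like the one above can be sketched as follows. This is a minimal illustration, not the Lab 4 training code: the `nn.Linear` model is a hypothetical stand-in for `CharLM`, and the file name `checkpoint_demo.pt` and the temporary directory are assumptions made so the example runs on its own.

```python
import os
import tempfile

import torch
import torch.nn as nn

# Hypothetical stand-in model; in the lab this would be CharLM
model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'checkpoint_demo.pt')

    # Pretend training reached epoch index 9 before the checkpoint was written
    torch.save({
        'epoch': 9,
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
    }, path)

    # Later (for example after a crash): rebuild the objects, then restore their state
    resumed_model = nn.Linear(4, 2)
    resumed_optimizer = torch.optim.Adam(resumed_model.parameters(), lr=0.01)

    checkpoint = torch.load(path, map_location='cpu')
    resumed_model.load_state_dict(checkpoint['model_state_dict'])
    resumed_optimizer.load_state_dict(checkpoint['optimizer_state_dict'])

start_epoch = checkpoint['epoch'] + 1  # continue with the next epoch
resumed_model.train()                  # training mode, not eval mode
print(f"Resuming from epoch {start_epoch}")
```

Restoring the optimizer state as well as the weights matters for optimizers such as Adam, which keep running statistics per parameter.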

## Question 1.1

Compare the file sizes of:
1. Saving just `model.state_dict()`
2. Saving the entire model with `torch.save(model, ...)`

Which is smaller? Why?

In [None]:
# YOUR CODE HERE

# Save just state_dict
torch.save(model.state_dict(), 'just_weights.pt')

# Save entire model
torch.save(model, 'full_model.pt')

# Compare sizes
# ...

---

# Part 2: Inference & Sampling Strategies

## Temperature Revisited

Temperature divides the logits before the softmax, which controls how random ("creative") sampling is:

```
TEMPERATURE EFFECT:

Original probabilities:  [0.7, 0.2, 0.05, 0.03, 0.02]

Low temp (0.5):         [0.918, 0.075, 0.005, 0.002, 0.001]
                        (Very confident - almost always picks 'a')

High temp (2.0):        [0.459, 0.245, 0.123, 0.095, 0.078]
                        (More uniform - might pick anything)

(Values rounded to three decimals.)
```
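
The effect illustrated above can be computed without a model. Dividing logits by a temperature T is equivalent to raising each probability to the power 1/T and renormalizing. The sketch below uses plain Python; `apply_temperature` is a hypothetical helper name introduced here, not part of Lab 4 or PyTorch.

```python
def apply_temperature(probs, temperature):
    """Rescale a probability distribution the way softmax(logits / T) would."""
    # softmax(log(p) / T) is proportional to p ** (1 / T)
    powered = [p ** (1.0 / temperature) for p in probs]
    total = sum(powered)
    return [x / total for x in powered]

original = [0.7, 0.2, 0.05, 0.03, 0.02]
for t in (0.5, 1.0, 2.0):
    scaled = apply_temperature(original, t)
    print(f"T={t}: {[round(x, 3) for x in scaled]}")
```

With T = 1.0 the distribution is unchanged; values below 1 sharpen it and values above 1 flatten it.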

In [None]:
# SOLVED EXAMPLE: Temperature Visualization

import matplotlib.pyplot as plt

# Fake logits from a model
logits = torch.tensor([3.0, 1.5, 0.5, -0.5, -1.0])

temperatures = [0.3, 0.5, 1.0, 1.5, 2.0]

fig, axes = plt.subplots(1, len(temperatures), figsize=(15, 3))

for ax, temp in zip(axes, temperatures):
    probs = F.softmax(logits / temp, dim=0).numpy()
    ax.bar(range(5), probs, color=['#2E86AB', '#A23B72', '#F18F01', '#C73E1D', '#592941'])
    ax.set_title(f'Temp = {temp}')
    ax.set_ylim(0, 1)
    ax.set_xticks(range(5))
    ax.set_xticklabels(['a', 'b', 'c', 'd', 'e'])

plt.suptitle('Effect of Temperature on Probability Distribution', fontsize=14)
plt.tight_layout()
plt.show()

## Top-K Sampling

Only consider the K most likely tokens.

```
TOP-K SAMPLING (k=3):

Original:   [0.7, 0.15, 0.08, 0.04, 0.03]  (5 options)
After top-k: [0.7, 0.15, 0.08, 0,    0   ]  (only top 3)
Renormalized: [0.75, 0.16, 0.09, 0,   0   ]  (sums to 1)
```

**Why?** Prevents very unlikely tokens from being sampled.
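
The filter-then-renormalize arithmetic from the diagram can be reproduced in a few lines of plain Python. This is a sketch that works directly on probabilities, whereas the lab's PyTorch version works on logits; `top_k_renormalize` is a hypothetical helper name.

```python
def top_k_renormalize(probs, k):
    """Zero out everything except the k largest probabilities, then renormalize."""
    # Indices of the k largest entries
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = set(order[:k])
    filtered = [p if i in kept else 0.0 for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]

probs = [0.7, 0.15, 0.08, 0.04, 0.03]
print([round(p, 2) for p in top_k_renormalize(probs, k=3)])
# [0.75, 0.16, 0.09, 0.0, 0.0]
```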

In [None]:
# SOLVED EXAMPLE: Top-K Sampling

def top_k_sampling(logits, k):
    """Keep only top k logits, set rest to -inf."""
    # Get top k values and indices
    top_k_values, top_k_indices = torch.topk(logits, k)
    
    # Create new logits with only top k
    filtered_logits = torch.full_like(logits, float('-inf'))
    filtered_logits.scatter_(0, top_k_indices, top_k_values)
    
    return filtered_logits

# Example
logits = torch.tensor([3.0, 1.5, 0.5, -0.5, -1.0])
chars_example = ['a', 'b', 'c', 'd', 'e']

print("Original probabilities:")
probs = F.softmax(logits, dim=0)
for c, p in zip(chars_example, probs):
    print(f"  '{c}': {p.item():.2%}")

print("\nAfter top-3 filtering:")
filtered = top_k_sampling(logits, k=3)
probs = F.softmax(filtered, dim=0)
for c, p in zip(chars_example, probs):
    print(f"  '{c}': {p.item():.2%}")

## Top-P (Nucleus) Sampling

Keep the most likely tokens until their cumulative probability exceeds P.

```
TOP-P SAMPLING (p=0.9):

Sorted probs:   [0.7, 0.15, 0.08, 0.04, 0.03]
Cumulative:     [0.7, 0.85, 0.93, 0.97, 1.0]
                             ↑
                 Stop here (0.93 > 0.9)
                 
Keep:           [0.7, 0.15, 0.08]  (top 3, sum=0.93)
```

**Why?** Adapts to the distribution - uses more tokens when uncertain.
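
To see the adaptive behaviour concretely, the sketch below counts how many tokens survive top-p filtering for a peaked and a flat distribution. It uses plain Python on probabilities; `top_p_keep` is a hypothetical helper name, and the `flat` distribution is made up for illustration.

```python
def top_p_keep(probs, p):
    """Return indices of the most likely tokens, stopping once their mass exceeds p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative > p:  # include the token that crosses the threshold
            break
    return kept

peaked = [0.7, 0.15, 0.08, 0.04, 0.03]   # model is fairly sure
flat = [0.25, 0.22, 0.20, 0.18, 0.15]    # model is uncertain

print(len(top_p_keep(peaked, 0.9)))  # 3
print(len(top_p_keep(flat, 0.9)))    # 5
```

The same threshold keeps three tokens when the model is confident and all five when it is not, which a fixed `k` cannot do.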

In [None]:
# SOLVED EXAMPLE: Top-P (Nucleus) Sampling

def top_p_sampling(logits, p):
    """Keep tokens until cumulative probability reaches p."""
    probs = F.softmax(logits, dim=0)
    
    # Sort probabilities
    sorted_probs, sorted_indices = torch.sort(probs, descending=True)
    cumulative_probs = torch.cumsum(sorted_probs, dim=0)
    
    # Find cutoff
    cutoff_idx = (cumulative_probs > p).nonzero()
    if len(cutoff_idx) > 0:
        cutoff_idx = cutoff_idx[0].item() + 1  # Include the token that crosses threshold
    else:
        cutoff_idx = len(probs)
    
    # Keep only tokens before cutoff
    filtered_logits = torch.full_like(logits, float('-inf'))
    filtered_logits.scatter_(0, sorted_indices[:cutoff_idx], logits[sorted_indices[:cutoff_idx]])
    
    return filtered_logits

# Example
logits = torch.tensor([3.0, 1.5, 0.5, -0.5, -1.0])

print("Original probabilities:")
probs = F.softmax(logits, dim=0)
for c, p in zip(chars_example, probs):
    print(f"  '{c}': {p.item():.2%}")

print("\nAfter top-p=0.9 filtering:")
filtered = top_p_sampling(logits, p=0.9)
probs = F.softmax(filtered, dim=0)
for c, p in zip(chars_example, probs):
    print(f"  '{c}': {p.item():.2%}")

In [None]:
# SOLVED EXAMPLE: Complete Generation Function

def generate(model, max_len=20, temperature=1.0, top_k=None, top_p=None):
    """
    Generate a name with various sampling strategies.
    
    Args:
        model: The trained language model
        max_len: Maximum name length
        temperature: Sampling temperature (higher = more random)
        top_k: If set, only sample from top k tokens
        top_p: If set, only sample from the most likely tokens, kept until
            their cumulative prob exceeds p
    """
    model.eval()
    context = [char_to_idx['.']] * context_length
    name = []
    
    with torch.no_grad():
        for _ in range(max_len):
            x = torch.tensor([context]).to(device)
            logits = model(x)[0]  # Get logits for single sample
            
            # Apply temperature
            logits = logits / temperature
            
            # Apply top-k if specified
            if top_k is not None:
                logits = top_k_sampling(logits, top_k)
            
            # Apply top-p if specified
            if top_p is not None:
                logits = top_p_sampling(logits, top_p)
            
            # Sample
            probs = F.softmax(logits, dim=-1)
            next_idx = torch.multinomial(probs, 1).item()
            next_char = idx_to_char[next_idx]
            
            if next_char == '.':
                break
            
            name.append(next_char)
            context = context[1:] + [next_idx]
    
    return ''.join(name).capitalize()

# Test different settings
print("Generation with different settings:")
print("=" * 50)

settings = [
    {'temperature': 0.5, 'top_k': None, 'top_p': None},
    {'temperature': 1.0, 'top_k': 5, 'top_p': None},
    {'temperature': 1.0, 'top_k': None, 'top_p': 0.9},
    {'temperature': 1.2, 'top_k': 10, 'top_p': None},
]

for setting in settings:
    print(f"\nSettings: {setting}")
    # Use a new variable so the training list `names` is not overwritten
    samples = [generate(loaded_model, **setting) for _ in range(5)]
    print(f"  Generated: {samples}")

## Question 2.1

Generate 20 names with each setting:
1. `temperature=0.5, top_k=5`
2. `temperature=1.0, top_p=0.9`
3. `temperature=1.5, top_k=10`

Which produces the most realistic names? The most creative?

In [None]:
# YOUR CODE HERE


---

# Part 3: Building a Web Interface with Gradio

## What is Gradio?

Gradio lets you create web interfaces for ML models with just a few lines of Python!

```
GRADIO WORKFLOW:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Python    │     │   Gradio    │     │   Users     │
│  Function   │ ──► │   Creates   │ ──► │  Access via │
│             │     │   Web UI    │     │   Browser   │
└─────────────┘     └─────────────┘     └─────────────┘
```

In [None]:
import gradio as gr

In [None]:
# SOLVED EXAMPLE: Simple Gradio Interface

def generate_names_gradio(num_names, temperature, use_top_k, top_k_value):
    """
    Generate names with user-specified settings.
    Returns a formatted string of generated names.
    """
    names = []
    for _ in range(int(num_names)):
        if use_top_k:
            name = generate(loaded_model, temperature=temperature, top_k=int(top_k_value))
        else:
            name = generate(loaded_model, temperature=temperature)
        names.append(name)
    
    return "\n".join([f"• {name}" for name in names])

# Test the function
print(generate_names_gradio(5, 0.8, False, 5))

In [None]:
# SOLVED EXAMPLE: Create Gradio Interface

# Define the interface
demo = gr.Interface(
    fn=generate_names_gradio,
    inputs=[
        gr.Slider(1, 20, value=5, step=1, label="Number of Names"),
        gr.Slider(0.1, 2.0, value=0.8, step=0.1, label="Temperature"),
        gr.Checkbox(label="Use Top-K Sampling", value=False),
        gr.Slider(3, 20, value=5, step=1, label="Top-K Value")
    ],
    outputs=gr.Textbox(label="Generated Names", lines=10),
    title="Name Generator",
    description="Generate unique names using a character-level language model!",
    examples=[
        [5, 0.5, False, 5],   # Conservative
        [10, 1.0, True, 8],   # Balanced
        [5, 1.5, False, 5],   # Creative
    ]
)

# Launch (will open in browser or inline)
demo.launch(share=False)

## Enhanced Interface with More Features

In [None]:
# SOLVED EXAMPLE: Advanced Gradio Interface

def generate_with_prefix(prefix, num_names, temperature, top_p):
    """
    Generate names starting with a given prefix.
    """
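    # Keep only characters the model knows; '.' is reserved as the start/end marker
    prefix = ''.join(c for c in prefix.lower().strip() if c in char_to_idx and c != '.')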
    names = []
    
    for _ in range(int(num_names)):
        # Start with the prefix
        if len(prefix) >= context_length:
            context = [char_to_idx.get(c, 0) for c in prefix[-context_length:]]
        else:
            padding = [char_to_idx['.']] * (context_length - len(prefix))
            context = padding + [char_to_idx.get(c, 0) for c in prefix]
        
        name = list(prefix)
        
        loaded_model.eval()
        with torch.no_grad():
            for _ in range(20 - len(prefix)):
                x = torch.tensor([context]).to(device)
                logits = loaded_model(x)[0] / temperature
                
                if top_p < 1.0:
                    logits = top_p_sampling(logits, top_p)
                
                probs = F.softmax(logits, dim=-1)
                next_idx = torch.multinomial(probs, 1).item()
                next_char = idx_to_char[next_idx]
                
                if next_char == '.':
                    break
                
                name.append(next_char)
                context = context[1:] + [next_idx]
        
        names.append(''.join(name).capitalize())
    
    return "\n".join([f"• {name}" for name in names])

# Test
print("Names starting with 'Al':")
print(generate_with_prefix('al', 5, 0.8, 0.95))

In [None]:
# SOLVED EXAMPLE: Full Featured Interface

with gr.Blocks(title="Name Generator Pro") as demo_advanced:
    gr.Markdown(
        """
        # Name Generator Pro
        Generate unique names using a neural language model trained on 32,000+ names!
        """
    )
    
    with gr.Row():
        with gr.Column():
            prefix_input = gr.Textbox(
                label="Starting Letters (optional)",
                placeholder="e.g., 'Al' or 'Jo'",
                value=""
            )
            num_names = gr.Slider(1, 20, value=5, step=1, label="Number of Names")
            temperature = gr.Slider(0.3, 1.5, value=0.8, step=0.1, label="Creativity (Temperature)")
            top_p = gr.Slider(0.5, 1.0, value=0.95, step=0.05, label="Top-P Sampling")
            generate_btn = gr.Button("Generate Names", variant="primary")
        
        with gr.Column():
            output = gr.Textbox(label="Generated Names", lines=12)
    
    # Event handler
    generate_btn.click(
        fn=generate_with_prefix,
        inputs=[prefix_input, num_names, temperature, top_p],
        outputs=output
    )
    
    gr.Markdown(
        """
        ### Tips:
        - **Low temperature (0.3-0.5)**: More common, safe names
        - **High temperature (1.0-1.5)**: More creative, unusual names
        - **Top-P**: Lower values = more focused on likely characters
        """
    )

demo_advanced.launch()

## Question 3.1

Add a new feature to the Gradio interface: a "style" dropdown that lets users choose between:
- "Classic" (temperature=0.5)
- "Balanced" (temperature=0.8)
- "Creative" (temperature=1.2)
- "Experimental" (temperature=1.5)

In [None]:
# YOUR CODE HERE

# Hint: Use gr.Dropdown with choices
# style = gr.Dropdown(
#     choices=["Classic", "Balanced", "Creative", "Experimental"],
#     value="Balanced",
#     label="Style"
# )


---

# Part 4: Deployment Options

## Sharing Your App

### Option 1: Gradio Share Link (Temporary)
```python
demo.launch(share=True)  # Creates temporary public URL
```

### Option 2: Hugging Face Spaces (Permanent, Free)
1. Create account at huggingface.co
2. Create new Space (Gradio SDK)
3. Upload your code and model

### Option 3: Self-hosted
- Deploy on your own server
- Use Docker for portability

In [None]:
# SOLVED EXAMPLE: Create files for Hugging Face Spaces

# app.py content
app_code = '''
import torch
import torch.nn as nn
import torch.nn.functional as F
import gradio as gr

# Model definition
class CharLM(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, context_length):
        super().__init__()
        self.context_length = context_length
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.fc1 = nn.Linear(context_length * embed_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, vocab_size)
        
    def forward(self, x):
        emb = self.embedding(x)
        emb = emb.view(emb.size(0), -1)
        h = torch.tanh(self.fc1(emb))
        return self.fc2(h)

# Load model
checkpoint = torch.load("char_lm_weights.pt", map_location="cpu")
model = CharLM(
    vocab_size=checkpoint["vocab_size"],
    embed_dim=checkpoint["embed_dim"],
    hidden_dim=checkpoint["hidden_dim"],
    context_length=checkpoint["context_length"]
)
model.load_state_dict(checkpoint["model_state_dict"])
model.eval()

chars = checkpoint["chars"]
char_to_idx = checkpoint["char_to_idx"]
idx_to_char = checkpoint["idx_to_char"]
context_length = checkpoint["context_length"]

def generate(num_names, temperature):
    names = []
    for _ in range(int(num_names)):
        context = [char_to_idx["."]] * context_length
        name = []
        with torch.no_grad():
            for _ in range(20):
                x = torch.tensor([context])
                logits = model(x)[0] / temperature
                probs = F.softmax(logits, dim=-1)
                next_idx = torch.multinomial(probs, 1).item()
                if idx_to_char[next_idx] == ".":
                    break
                name.append(idx_to_char[next_idx])
                context = context[1:] + [next_idx]
        names.append("".join(name).capitalize())
    return "\\n".join([f"• {n}" for n in names])

demo = gr.Interface(
    fn=generate,
    inputs=[
        gr.Slider(1, 20, value=5, step=1, label="Number of Names"),
        gr.Slider(0.3, 1.5, value=0.8, step=0.1, label="Temperature"),
    ],
    outputs=gr.Textbox(label="Generated Names", lines=10),
    title="Name Generator",
    description="Generate unique names using AI!",
)

demo.launch()
'''

# requirements.txt content
requirements = '''
torch
gradio
'''

# Save files
with open('app.py', 'w', encoding='utf-8') as f:  # app_code contains a non-ASCII bullet
    f.write(app_code)

with open('requirements.txt', 'w', encoding='utf-8') as f:
    f.write(requirements)

print("Created deployment files:")
print("  - app.py")
print("  - requirements.txt")
print("  - char_lm_weights.pt (already exists)")
print("\nUpload these to Hugging Face Spaces to deploy!")

---

# Challenge Problems

## Challenge 1: Batch Generation

Modify the generation function to generate multiple names in parallel (batched), which is faster than generating one at a time.

In [None]:
# YOUR CODE HERE

def generate_batch(model, num_names, max_len=20, temperature=1.0):
    """
    Generate multiple names in parallel.
    """
    # Hint: Process all names at once using batch dimension
    pass

## Challenge 2: Model Comparison UI

Create a Gradio interface that compares outputs from two different models (e.g., bigram vs neural) side by side.

In [None]:
# YOUR CODE HERE


## Challenge 3: Add Filtering

Add options to filter generated names by:
- Minimum/maximum length
- Must contain certain letters
- Must not contain certain letters

In [None]:
# YOUR CODE HERE


---

# Summary

## What We Learned

| Concept | Description |
|---------|-------------|
| **Model Saving** | `torch.save(model.state_dict(), ...)` |
| **Model Loading** | `model.load_state_dict(torch.load(...))` |
| **Temperature** | Controls randomness in generation |
| **Top-K Sampling** | Only sample from K most likely tokens |
| **Top-P Sampling** | Sample from the most likely tokens, kept until their cumulative prob exceeds P |
| **Gradio** | Quick way to build ML web interfaces |

## Key Code Patterns

```python
# Save model
torch.save({
    'model_state_dict': model.state_dict(),
    'config': config,
}, 'model.pt')

# Load model
checkpoint = torch.load('model.pt')
model.load_state_dict(checkpoint['model_state_dict'])

# Gradio interface
demo = gr.Interface(
    fn=my_function,
    inputs=[...],
    outputs=[...]
)
demo.launch()
```

## What's Next?

In **Lab 6**, we'll switch gears to **Computer Vision** and learn about **Object Detection** with YOLO!