---

## **Section 1: Setup and Data Loading**

### 🎯 Objective
Import necessary libraries and load the Tiny Shakespeare dataset.

### 📝 Your Tasks

1. Import the following libraries:
   - `torch` and `torch.nn`
   - `torch.utils.data` (Dataset, DataLoader)
   - `numpy`, `matplotlib.pyplot`
   - `urllib.request` for downloading data
   - `time` for tracking training time

2. Check if GPU is available and set the device

3. Download the Tiny Shakespeare dataset from:
   - URL: `'https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt'`
   - Save it as `'tinyshakespeare.txt'`

4. Load and print:
   - Total number of characters
   - First 500 characters

### 💡 Hints
- Use `urllib.request.urlretrieve(url, filename)` to download
- Use `open(filename, 'r', encoding='utf-8')` to read the file
- Check device with: `'cuda' if torch.cuda.is_available() else 'cpu'`
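
A minimal sketch of the setup cells, assuming you want to download the file only once. `load_text` is a hypothetical helper name, not part of any library; it skips the download when the file already exists and then reads it as UTF-8.

```python
import os
import time
import urllib.request

import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset

URL = 'https://raw.githubusercontent.com/karpathy/char-rnn/master/data/tinyshakespeare/input.txt'
FILENAME = 'tinyshakespeare.txt'

# Use the GPU if PyTorch can see one, otherwise fall back to the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


def load_text(filename=FILENAME, url=URL):
    """Hypothetical helper: download `url` to `filename` once, then read it as UTF-8 text."""
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)
    with open(filename, 'r', encoding='utf-8') as f:
        return f.read()


# Usage in the notebook (downloads on the first call only):
# text = load_text()
# print(f"Device: {device}")
# print(f"Total characters: {len(text):,}")
# print(text[:500])
```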

In [None]:
# TODO: Import all necessary libraries
# Your code here

In [None]:
# TODO: Check device (GPU or CPU)
# Your code here

In [None]:
# TODO: Download the dataset
# Your code here

In [None]:
# TODO: Load the text and print statistics
# Your code here

---

## **Section 2: Character-Level Tokenization**

### 🎯 Objective
Build vocabulary mappings for characters instead of words.

### 📝 Your Tasks

1. **Get all unique characters** in the text:
   - Use `sorted(set(text))` to get unique characters
   - Create a list called `chars`

2. **Build character mappings**:
   - `char2idx`: Dictionary mapping each character to a unique index
   - `idx2char`: Dictionary mapping each index back to its character
   - Remember: dictionaries use `{key: value}` syntax

3. **Calculate vocabulary size**:
   - Store in variable `vocab_size`

4. **Convert text to indices**:
   - Create a list `data` containing the index for each character
   - Use list comprehension: `[char2idx[ch] for ch in text]`

5. **Print information**:
   - Vocabulary size
   - First 20 characters and their indices
   - Sample of the character-to-index mapping

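### 💡 Hints
- For `char2idx`: `{ch: i for i, ch in enumerate(chars)}`
- For `idx2char`: `{i: ch for i, ch in enumerate(chars)}`
- No special tokens are needed at the character level: every character in the text is in the vocabulary, and every training sequence has the same fixed length. (Word-level models, by contrast, often need `<PAD>`, `<UNK>`, etc.)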

### 🤔 Think About It
How many unique characters do you expect? Compare with word-level vocabulary (thousands of words)!
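
The mapping steps can be tried on a short string first; the same code applies unchanged to the full text. `decode` is a hypothetical helper added only to confirm that the mapping round-trips without loss.

```python
text = "To be, or not to be"

# Unique characters, sorted so the index assignment is reproducible
chars = sorted(set(text))
vocab_size = len(chars)

char2idx = {ch: i for i, ch in enumerate(chars)}
idx2char = {i: ch for i, ch in enumerate(chars)}

# Every character in the text maps to exactly one index
data = [char2idx[ch] for ch in text]


def decode(indices):
    """Hypothetical helper: map a list of indices back to a string."""
    return ''.join(idx2char[i] for i in indices)


print(f"Vocabulary size: {vocab_size}")
print(list(zip(text[:20], data[:20])))
assert decode(data) == text  # the mapping is lossless
```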

In [None]:
# TODO: Get all unique characters
# Your code here

In [None]:
# TODO: Build char2idx and idx2char mappings
# Your code here

In [None]:
# TODO: Convert entire text to indices
# Your code here

In [None]:
# TODO: Print vocabulary information
# Your code here

---

## **Section 3: Creating the Dataset**

### 🎯 Objective
Build a PyTorch Dataset that creates character sequence pairs for training.

### 📝 Your Tasks

Create a class `CharDataset` that inherits from `torch.utils.data.Dataset`:

1. **`__init__` method**:
   - Parameters: `data` (list of character indices), `seq_length` (how many characters to use as input)
   - Store both parameters as instance variables
   - Calculate `num_sequences`, the total number of sequences you can create: each one needs `seq_length + 1` consecutive characters, so this is `len(data) - seq_length`

2. **`__len__` method**:
   - Return the total number of sequences

3. **`__getitem__` method**:
   - Input: `idx` (which sequence to get)
   - Extract a window of `seq_length + 1` characters starting at position `idx`
   - Split into:
     - `input_seq`: First `seq_length` characters (as torch.long tensor)
     - `target`: The LAST character only (as torch.long tensor)
   - Return both

### 💡 Hints
- For slicing: `data[idx:idx + seq_length + 1]`
- Split: `input_seq = sequence[:-1]`, `target = sequence[-1]`
- Convert to tensor: `torch.tensor(..., dtype=torch.long)`

### 🤔 Think About It
**Example:** If text is "HELLO" and seq_length=3:
- Sample 0: Input="HEL", Target="L"
- Sample 1: Input="ELL", Target="O"

After creating the class, instantiate it with `seq_length=100`.
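
A sketch of `CharDataset` following the steps above, checked against the "HELLO" example with `seq_length=3`. Converting `data` to a single tensor once in `__init__` is a choice made here (it avoids building a new tensor from a Python list on every `__getitem__` call), not something the task requires; the `max(0, ...)` guard for texts shorter than `seq_length` is also an addition.

```python
import torch
from torch.utils.data import Dataset


class CharDataset(Dataset):
    """Each sample: `seq_length` input characters and the single character that follows."""

    def __init__(self, data, seq_length):
        self.data = torch.tensor(data, dtype=torch.long)
        self.seq_length = seq_length
        # Each window needs seq_length + 1 characters, so there are len(data) - seq_length windows
        self.num_sequences = max(0, len(data) - seq_length)

    def __len__(self):
        return self.num_sequences

    def __getitem__(self, idx):
        sequence = self.data[idx:idx + self.seq_length + 1]
        input_seq = sequence[:-1]  # shape: (seq_length,)
        target = sequence[-1]      # 0-dimensional long tensor
        return input_seq, target


# Check against the "HELLO" example
chars = sorted(set("HELLO"))
char2idx = {ch: i for i, ch in enumerate(chars)}
idx2char = {i: ch for i, ch in enumerate(chars)}
toy = CharDataset([char2idx[ch] for ch in "HELLO"], seq_length=3)

for i in range(len(toy)):
    x, y = toy[i]
    print(''.join(idx2char[j.item()] for j in x), '->', idx2char[y.item()])
# HEL -> L
# ELL -> O
```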

In [None]:
# TODO: Create CharDataset class
# Your code here

In [None]:
# TODO: Create dataset instance with seq_length=100
# Print dataset size and show a few examples
# Your code here

---

## **Section 4: Creating the DataLoader**

### 🎯 Objective
Set up a DataLoader to batch and shuffle the data efficiently.

### 📝 Your Tasks

1. Create a `DataLoader` with:
   - Your dataset from Section 3
   - `batch_size = 64`
   - `shuffle = True`
   - `drop_last = True` (drops the final batch if it has fewer than 64 samples, so every batch has the same shape)

2. Test the DataLoader:
   - Get one batch using `iter()` and `next()`
   - Print the shapes of inputs and targets
   - Convert a few examples back to characters to verify correctness

### 💡 Hints
- `dataloader = DataLoader(dataset, batch_size=..., shuffle=..., drop_last=...)`
- Get batch: `batch_iter = iter(dataloader)`, `inputs, targets = next(batch_iter)`
- Convert back: `''.join([idx2char[idx.item()] for idx in input_seq])`

### 🤔 Expected Shapes
- Inputs: `(batch_size, seq_length)` → `(64, 100)`
- Targets: `(batch_size,)` → `(64,)`
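
To check the expected shapes without the full dataset, the sketch below swaps in an alternative to `CharDataset`: `Tensor.unfold` builds every window of `seq_length + 1` characters as a view, and `TensorDataset` pairs the inputs with their targets. The toy sizes (`seq_length=10`, `batch_size=4`) and the `torch.arange` stand-in data are assumptions so it runs instantly; with your Section 3 dataset and the real sizes you should see `(64, 100)` and `(64,)`.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

seq_length, batch_size = 10, 4
data = torch.arange(50)  # stand-in for the encoded text: value i sits at position i

# All windows of length seq_length + 1 with stride 1: shape (len(data) - seq_length, seq_length + 1)
windows = data.unfold(0, seq_length + 1, 1)
dataset = TensorDataset(windows[:, :-1], windows[:, -1])

dataloader = DataLoader(dataset, batch_size=batch_size, shuffle=True, drop_last=True)

inputs, targets = next(iter(dataloader))
print(inputs.shape, targets.shape)    # torch.Size([4, 10]) torch.Size([4])
print(len(dataset), len(dataloader))  # 40 10

# With this stand-in data, each target is the value right after the last input
assert torch.equal(targets, inputs[:, -1] + 1)
```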

In [None]:
# TODO: Create DataLoader
# Your code here

In [None]:
# TODO: Test the DataLoader - get one batch and examine it
# Your code here

---

## **Section 5: Building the RNN Model**

### 🎯 Objective
Create a character-level RNN for next character prediction.

### 📝 Your Tasks

Build a class `CharRNN` that inherits from `nn.Module`:

1. **`__init__` method** - Initialize layers:
   - Parameters: `vocab_size`, `embedding_dim`, `hidden_dim`, `num_layers`
   - Create:
     - `self.embedding`: `nn.Embedding(vocab_size, embedding_dim)`
     - `self.lstm`: `nn.LSTM(embedding_dim, hidden_dim, num_layers, batch_first=True, dropout=0.2)`
     - `self.dropout`: `nn.Dropout(0.2)`
     - `self.fc`: `nn.Linear(hidden_dim, vocab_size)`
   - Store `hidden_dim` and `num_layers` as instance variables

2. **`forward` method**:
   - Input: `x` (character indices, shape `(batch, seq_length)`), `hidden` (previous `(h, c)` state, optional; default it to `None`, which the LSTM treats as all zeros)
   - Steps:
     1. Embed the input: `embedded = self.embedding(x)`
     2. Pass through LSTM: `lstm_out, hidden = self.lstm(embedded, hidden)`
     3. Take ONLY the last time step: `last_output = lstm_out[:, -1, :]`
     4. Apply dropout: `last_output = self.dropout(last_output)`
     5. Map to vocabulary: `output = self.fc(last_output)`
   - Return: `output, hidden`

3. **`init_hidden` method**:
   - Input: `batch_size`
   - Create two zero tensors for the LSTM, `h0` and `c0`:
     - Shape: `(num_layers, batch_size, hidden_dim)`
     - On the same device as the model: `.to(device)`
   - Return them as a tuple: `(h0, c0)`

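### 💡 Hints
- LSTM returns `(output, (h_n, c_n))`, unlike GRU, which returns `(output, h_n)`
- Use `batch_first=True` so the input shape is `(batch, seq, features)`
- The last time step: `[:, -1, :]` means "all batches, last time step, all features"
- The `dropout=0.2` argument of `nn.LSTM` is applied only between stacked layers, so it has an effect only when `num_layers > 1` (PyTorch warns otherwise)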

### 🎨 Recommended Hyperparameters
- `embedding_dim = 128`
- `hidden_dim = 256`
- `num_layers = 2`

After creating the class, instantiate the model and print its architecture.
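
A sketch of `CharRNN` following the steps above. Two details are choices made here rather than requirements of the task: `hidden` defaults to `None` (the LSTM then starts from zeros), and `init_hidden` allocates its tensors on the device of the model's own weights instead of a global `device`. `vocab_size=65` and the random batch are placeholders used only to check shapes; use your own `vocab_size` and call `model.to(device)` in the notebook.

```python
import torch
import torch.nn as nn


class CharRNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, hidden_dim, num_layers):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.num_layers = num_layers
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # dropout=0.2 is applied between the stacked LSTM layers
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, num_layers, batch_first=True, dropout=0.2)
        self.dropout = nn.Dropout(0.2)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        embedded = self.embedding(x)                    # (batch, seq, embedding_dim)
        lstm_out, hidden = self.lstm(embedded, hidden)  # (batch, seq, hidden_dim); None -> zero state
        last_output = self.dropout(lstm_out[:, -1, :])  # (batch, hidden_dim)
        output = self.fc(last_output)                   # (batch, vocab_size), raw logits
        return output, hidden

    def init_hidden(self, batch_size):
        # Zero state on whatever device (and dtype) the model's weights use
        weight = next(self.parameters())
        h0 = weight.new_zeros(self.num_layers, batch_size, self.hidden_dim)
        c0 = weight.new_zeros(self.num_layers, batch_size, self.hidden_dim)
        return (h0, c0)


model = CharRNN(vocab_size=65, embedding_dim=128, hidden_dim=256, num_layers=2)
print(model)

x = torch.randint(0, 65, (8, 100))  # fake batch: 8 sequences of 100 character indices
logits, (h, c) = model(x, model.init_hidden(8))
print(logits.shape, h.shape, c.shape)
# torch.Size([8, 65]) torch.Size([2, 8, 256]) torch.Size([2, 8, 256])
```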

In [None]:
# TODO: Create CharRNN class
# Your code here

In [None]:
# TODO: Instantiate the model and print its structure
# Your code here

---

## **Section 6: Training Setup**

### 🎯 Objective
Set up the loss function and optimizer for training.

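### 📝 Your Tasks

1. **Define the loss function**:
   - Use `nn.CrossEntropyLoss()`
   - Predicting the next character is classification over `vocab_size` classes. The loss expects raw logits of shape `(batch, vocab_size)` and integer targets of shape `(batch,)`, and it applies log-softmax internally, so the model must not apply softmax itself

2. **Define the optimizer**:
   - Use `torch.optim.Adam(model.parameters(), lr=0.002)`
   - Adam is a common, robust default optimizer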

3. **Print the setup**:
   - Confirm loss function and optimizer
   - Print learning rate
   - Count total trainable parameters

### 💡 Hints
- Count parameters: `sum(p.numel() for p in model.parameters() if p.requires_grad)`
- Format with commas: `f"{count:,}"`
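
A sketch of the setup together with a hand check of the parameter count. `lstm_lm_param_count` is a hypothetical helper; it assumes PyTorch's LSTM layout, where each layer has input weights, recurrent weights, and two bias vectors for its 4 gates. `vocab_size=65` is an illustrative value, so substitute your own. The `nn.ModuleDict` holds the same parameterized layers as `CharRNN` (dropout has no parameters) only so the cell runs on its own; in the notebook pass `model.parameters()` instead.

```python
import torch
import torch.nn as nn


def lstm_lm_param_count(vocab_size, embedding_dim, hidden_dim, num_layers):
    """Hypothetical helper: parameter count of Embedding -> stacked LSTM -> Linear."""
    total = vocab_size * embedding_dim  # embedding table
    for layer in range(num_layers):
        input_size = embedding_dim if layer == 0 else hidden_dim
        # 4 gates x (input weights + recurrent weights), plus two bias vectors of size 4 * hidden_dim
        total += 4 * hidden_dim * (input_size + hidden_dim) + 2 * 4 * hidden_dim
    total += hidden_dim * vocab_size + vocab_size  # output layer weights + bias
    return total


vocab_size, embedding_dim, hidden_dim, num_layers = 65, 128, 256, 2

# Same parameterized layers as CharRNN
layers = nn.ModuleDict({
    'embedding': nn.Embedding(vocab_size, embedding_dim),
    'lstm': nn.LSTM(embedding_dim, hidden_dim, num_layers, batch_first=True, dropout=0.2),
    'fc': nn.Linear(hidden_dim, vocab_size),
})

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(layers.parameters(), lr=0.002)

count = sum(p.numel() for p in layers.parameters() if p.requires_grad)
print(f"Loss: {criterion.__class__.__name__}, optimizer: {optimizer.__class__.__name__}")
print(f"Learning rate: {optimizer.param_groups[0]['lr']}")
print(f"Trainable parameters: {count:,}")  # Trainable parameters: 946,625

# CrossEntropyLoss takes raw logits (batch, vocab_size) and class indices (batch,)
logits = torch.zeros(64, vocab_size)  # equal scores, i.e. a uniform prediction
targets = torch.randint(0, vocab_size, (64,))
print(criterion(logits, targets).item())  # ln(65) ≈ 4.174
```

For the recommended sizes the count works out to `385 * vocab_size + 921,600`, so almost all of the parameters sit in the two LSTM layers.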

In [None]:
# TODO: Define loss function and optimizer
# Your code here

In [None]:
# TODO: Print training setup information
# Your code here

---

## **Section 7: The Training Loop**

### 🎯 Objective
Train your character-level RNN model.

### 📝 Your Tasks

Create a training function that:

1. **Function signature**: `train_model(model, dataloader, criterion, optimizer, num_epochs)`

2. **Training loop structure**:
   ```
   for each epoch:
       model.train()
       for each batch in dataloader:
           1. Move inputs and targets to device
           2. Initialize a fresh hidden state: hidden = model.init_hidden(inputs.size(0))
           3. Zero gradients: optimizer.zero_grad()
           4. Forward pass: outputs, hidden = model(inputs, hidden)
           5. Calculate loss: loss = criterion(outputs, targets)
           6. Backward pass: loss.backward()
           7. Clip gradients: torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
           8. Update weights: optimizer.step()
           9. Track loss: total_loss += loss.item()
   ```
   A fresh hidden state per batch is correct here because the shuffled windows are independent of each other.

3. **Tracking and printing**:
   - Store average loss per epoch in a list
   - Print progress every 200 batches
   - Print epoch summary (avg loss, time taken)
   - Return the loss history

4. **Train for 5-10 epochs**

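### 💡 Hints
- Set the model to training mode: `model.train()` (enables dropout)
- Track time: `start_time = time.time()`, then `elapsed = time.time() - start_time`
- Average loss: `total_loss / len(dataloader)`
- Gradient clipping rescales the gradients whenever their combined norm exceeds `max_norm`, which guards against exploding gradients, a common problem when backpropagating through many time steps in an RNN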

### ⚠️ Important
- Training might take 5-15 minutes depending on your hardware
- Loss should decrease over epochs
- Don't worry if it's slow. With a stride of one character, the dataset has one sample per position in the text (over a million samples), so each epoch contains many batches
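
A sketch of `train_model` following the pseudocode above. It reads the device from the model's parameters (a choice made here so the function has no global dependency), so move the model with `model.to(device)` before creating the optimizer. `TinyCharModel` and the periodic toy data are hypothetical stand-ins with the same interface as `CharRNN`, included only so the cell runs by itself; for the real run, call `loss_history = train_model(model, dataloader, criterion, optimizer, num_epochs=5)`.

```python
import time

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train_model(model, dataloader, criterion, optimizer, num_epochs, print_every=200):
    """Train a model whose forward returns (logits, hidden); return the average loss per epoch."""
    device = next(model.parameters()).device  # inputs follow the model's device
    loss_history = []
    for epoch in range(num_epochs):
        model.train()  # enable dropout
        start_time = time.time()
        total_loss = 0.0
        for batch_idx, (inputs, targets) in enumerate(dataloader, start=1):
            inputs, targets = inputs.to(device), targets.to(device)
            # Fresh zero state per batch: the shuffled windows are independent
            hidden = model.init_hidden(inputs.size(0))
            optimizer.zero_grad()
            outputs, hidden = model(inputs, hidden)
            loss = criterion(outputs, targets)
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=5.0)
            optimizer.step()
            total_loss += loss.item()  # a Python float, so no autograd graph is kept
            if batch_idx % print_every == 0:
                print(f"  Epoch {epoch + 1}, batch {batch_idx}/{len(dataloader)}, loss {loss.item():.4f}")
        avg_loss = total_loss / len(dataloader)
        loss_history.append(avg_loss)
        elapsed = time.time() - start_time
        print(f"Epoch {epoch + 1}/{num_epochs}: avg loss {avg_loss:.4f}, time {elapsed:.1f}s")
    return loss_history


class TinyCharModel(nn.Module):
    """Hypothetical stand-in with CharRNN's interface, only so this cell runs on its own."""

    def __init__(self, vocab_size, hidden_dim=32):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.embedding = nn.Embedding(vocab_size, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        out, hidden = self.lstm(self.embedding(x), hidden)
        return self.fc(out[:, -1, :]), hidden

    def init_hidden(self, batch_size):
        w = next(self.parameters())
        return (w.new_zeros(1, batch_size, self.hidden_dim), w.new_zeros(1, batch_size, self.hidden_dim))


# Smoke test on a periodic toy sequence
torch.manual_seed(0)
toy_data = torch.arange(200) % 4    # "abcdabcd..." encoded as 0, 1, 2, 3, 0, ...
windows = toy_data.unfold(0, 9, 1)  # seq_length = 8, plus one target character
toy_loader = DataLoader(TensorDataset(windows[:, :-1], windows[:, -1]),
                        batch_size=16, shuffle=True, drop_last=True)
toy_model = TinyCharModel(vocab_size=4)
history = train_model(toy_model, toy_loader, nn.CrossEntropyLoss(),
                      torch.optim.Adam(toy_model.parameters(), lr=0.01), num_epochs=3)
```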

In [None]:
# TODO: Create the training function
# Your code here

In [None]:
# TODO: Train the model
# Your code here

---

## **Section 8: Visualizing Training Progress**

### 🎯 Objective
Plot the training loss to see how your model improved.

### 📝 Your Tasks

1. Create a line plot showing:
   - X-axis: Epoch number
   - Y-axis: Average loss
   - Add markers, grid, labels, and title

2. Print:
   - Initial loss (first epoch)
   - Final loss (last epoch)
   - Total improvement (initial loss minus final loss)

### 💡 Hints
- Use `plt.plot(epochs, losses, marker='o')`
- Add grid: `plt.grid(True, alpha=0.3)`
- Show plot: `plt.show()`

### 🤔 What to Look For
- Loss should decrease over time
- Curve should start to flatten (model converging)
- If loss is still decreasing steeply, you could train longer!
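
A sketch of the plotting cell. `plot_loss` is a hypothetical helper name. The optional dashed line marks ln(vocab_size), the cross-entropy of a model that gives every character equal probability; an untrained model's loss starts roughly there, which makes it a useful reference when reading the initial loss.

```python
import math

import matplotlib.pyplot as plt


def plot_loss(loss_history, vocab_size=None):
    """Plot the average loss per epoch and return (initial, final, improvement)."""
    epochs = range(1, len(loss_history) + 1)
    plt.figure(figsize=(8, 5))
    plt.plot(epochs, loss_history, marker='o', label='Average training loss')
    if vocab_size is not None:
        # Loss of a model that predicts every character with equal probability
        plt.axhline(math.log(vocab_size), color='gray', linestyle='--',
                    label=f'Uniform baseline ln({vocab_size})')
    plt.xlabel('Epoch')
    plt.ylabel('Average loss')
    plt.title('Training loss per epoch')
    plt.grid(True, alpha=0.3)
    plt.legend()
    plt.show()

    initial, final = loss_history[0], loss_history[-1]
    improvement = initial - final
    print(f"Initial loss: {initial:.4f}")
    print(f"Final loss:   {final:.4f}")
    print(f"Improvement:  {improvement:.4f} ({improvement / initial:.1%})")
    return initial, final, improvement


# Usage with the history returned by train_model:
# plot_loss(loss_history, vocab_size=vocab_size)
```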

In [None]:
# TODO: Plot training loss
# Your code here

---

## **Section 9: Text Generation Function**

### 🎯 Objective
Create a function that generates text character by character.

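### 📝 Your Tasks

Create a function `generate_text(model, char2idx, idx2char, seed_text, length, temperature)`:

1. **Preparation**:
   - Set model to eval mode: `model.eval()` (disables dropout)
   - Convert seed_text to indices
   - Initialize the hidden state (`None` works: the LSTM then starts from zeros)

2. **Generation loop** (repeat `length` times):
   ```
   with torch.no_grad():
       1. Convert the seed indices to a tensor of shape (1, len(seed_text))
       2. Forward pass over the whole seed: output, hidden = model(x, hidden)
       for _ in range(length):
           3. Get logits for the last character: logits = output[0]
           4. Apply temperature: logits = logits / temperature
           5. Convert to probabilities: probs = torch.softmax(logits, dim=0)
           6. Sample next character: next_idx = torch.multinomial(probs, 1).item()
           7. Append next_idx to the sequence
           8. Feed ONLY the new character, shape (1, 1): output, hidden = model(x_next, hidden)
   ```
   Carry `hidden` forward only when you feed one new character per step. If you instead re-feed the whole sequence (or its last `seq_length` characters) each step, reset `hidden` to `None` every time, or the model will see the same characters twice.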

3. **Return**:
   - Convert indices back to characters
   - Join into string
   - Return the generated text

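### 💡 Hints
- Temperature divides the logits before the softmax, so it controls randomness:
  - `temperature < 1.0`: sharper distribution, more predictable (conservative) text
  - `temperature = 1.0`: samples from the model's own distribution (balanced)
  - `temperature > 1.0`: flatter distribution, more random (creative) text
- Use `torch.no_grad()` to disable gradient tracking (faster and uses less memory)
- Convert character: `char2idx.get(ch, 0)` (falls back to index 0 if a seed character is not in the vocabulary)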

### 🎨 Function Parameters
- `model`: Your trained model
- `char2idx`: Character to index mapping
- `idx2char`: Index to character mapping
- `seed_text`: Starting text (e.g., "The king ")
- `length`: How many characters to generate
- `temperature`: Controls randomness (default 1.0)
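
A sketch of `generate_text` that feeds the seed once and then one new character per step, carrying `hidden` forward. (An alternative is to re-feed the last `seq_length` characters each step with a fresh hidden state; the two approaches should not be mixed.) The empty-seed and temperature checks are additions. `_RandomCharModel` is a hypothetical untrained stand-in used only so the cell runs by itself, so its continuation is gibberish; with your trained model, call `generate_text(model, char2idx, idx2char, "The king ", 300, temperature=0.8)`.

```python
import torch
import torch.nn as nn


def generate_text(model, char2idx, idx2char, seed_text, length, temperature=1.0):
    """Sample `length` characters after `seed_text`, one character at a time."""
    if not seed_text:
        raise ValueError("seed_text must not be empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    model.eval()  # disable dropout
    device = next(model.parameters()).device
    indices = [char2idx.get(ch, 0) for ch in seed_text]  # unknown characters fall back to index 0

    with torch.no_grad():
        # Read the whole seed once; `hidden` then summarizes it
        x = torch.tensor([indices], dtype=torch.long, device=device)  # shape (1, len(seed))
        output, hidden = model(x, None)
        for _ in range(length):
            logits = output[0] / temperature  # (vocab_size,)
            probs = torch.softmax(logits, dim=0)
            next_idx = torch.multinomial(probs, 1).item()
            indices.append(next_idx)
            # Feed only the new character and carry the hidden state forward
            x = torch.tensor([[next_idx]], dtype=torch.long, device=device)  # shape (1, 1)
            output, hidden = model(x, hidden)

    return ''.join(idx2char[i] for i in indices)


class _RandomCharModel(nn.Module):
    """Hypothetical untrained stand-in with CharRNN's forward(x, hidden) interface."""

    def __init__(self, vocab_size, hidden_dim=16):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, hidden_dim)
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, hidden=None):
        out, hidden = self.lstm(self.embedding(x), hidden)
        return self.fc(out[:, -1, :]), hidden


chars = sorted(set("The king is dead"))
char2idx = {ch: i for i, ch in enumerate(chars)}
idx2char = {i: ch for i, ch in enumerate(chars)}
torch.manual_seed(0)
sample = generate_text(_RandomCharModel(len(chars)), char2idx, idx2char,
                       "The king ", length=50, temperature=0.8)
print(sample)  # untrained stand-in, so the continuation is gibberish
```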

In [None]:
# TODO: Create text generation function
# Your code here

---

## **Section 10: Testing Your Model**

### 🎯 Objective
Generate text with different seeds and temperatures to evaluate your model.

### 📝 Your Tasks

1. **Test with different seed texts**:
   - "The king "
   - "To be or not to be"
   - "What is thy name"
   - Your own creative seed!

2. **Test with different temperatures**:
   - 0.5 (conservative)
   - 1.0 (balanced)
   - 1.5 (creative)

3. **Generate**:
   - 200-500 characters per sample
   - Print the results clearly formatted

4. **Analyze**:
   - Does it spell words correctly?
   - Does it use proper grammar?
   - Does it sound like Shakespeare?
   - How does temperature affect quality?

### 💡 Hints
- Use a loop to test multiple combinations
- Format output nicely:
  ```python
  print(f"\nSeed: '{seed}'")
  print(f"Temperature: {temp}")
  print("-" * 80)
  print(generated_text)
  ```

### 🤔 What to Observe
- **Low temperature**: Repetitive but more correct
- **High temperature**: Creative but might make mistakes
- Character-level models learn spelling naturally!
- Compare with word-level model - what differences do you notice?
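
To see why low temperatures read as repetitive and high ones as error-prone, the sketch below applies the three temperatures to one made-up set of logits (illustrative numbers, not model output) and reports the top character's probability and the entropy of the resulting distribution.

```python
import torch

logits = torch.tensor([3.0, 1.5, 0.5, -1.0])  # made-up scores for 4 candidate characters

results = {}
for temperature in (0.5, 1.0, 1.5):
    probs = torch.softmax(logits / temperature, dim=0)
    # Entropy in nats: 0 means fully deterministic, ln(4) means uniform over the 4 candidates
    entropy = -(probs * probs.log()).sum().item()
    results[temperature] = (probs.max().item(), entropy)
    print(f"T={temperature}: top prob {probs.max().item():.3f}, entropy {entropy:.3f}")
```

As the temperature rises, the top probability falls and the entropy grows: the sampler picks the most likely character less often, which produces more varied text and more mistakes.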

In [None]:
# TODO: Test text generation with different seeds and temperatures
# Your code here

---

## **Section 11: Saving Your Model**

### 🎯 Objective
Save your trained model so you can use it later without retraining.

### 📝 Your Tasks

1. Create a checkpoint dictionary containing:
   - `'model_state_dict'`: Model weights (`model.state_dict()`)
   - `'optimizer_state_dict'`: Optimizer state
   - `'vocab_size'`: Size of character vocabulary
   - `'embedding_dim'`: Embedding dimension used
   - `'hidden_dim'`: Hidden dimension used
   - `'num_layers'`: Number of LSTM layers
   - `'char2idx'`: Character to index mapping
   - `'idx2char'`: Index to character mapping
   - `'loss_history'`: Training loss history

2. Save using:
   - `torch.save(checkpoint, 'char_rnn_model.pth')`

3. Print confirmation:
   - File saved location
   - What's included in the checkpoint

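### 💡 Hints
- This saves everything needed to recreate and use your model, except the `CharRNN` class definition itself, which must be defined again before reloading
- Load it later with `checkpoint = torch.load('char_rnn_model.pth', map_location=device)`, rebuild `CharRNN` from the stored hyperparameters, then call `model.load_state_dict(checkpoint['model_state_dict'])`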
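
A sketch of saving and reloading, assuming `hparams` is a dict you build from the values used in Section 5. `save_checkpoint` and `load_checkpoint` are hypothetical helper names. `load_checkpoint` takes the model class (for example `CharRNN`) as an argument because the checkpoint stores weights and hyperparameters, not the class. Casting the loss history to plain floats is a defensive choice: recent PyTorch versions default to `torch.load(..., weights_only=True)`, which only unpickles a restricted set of types.

```python
import torch


def save_checkpoint(path, model, optimizer, hparams, char2idx, idx2char, loss_history):
    """Bundle everything needed to rebuild the model and resume training or generate text."""
    checkpoint = {
        'model_state_dict': model.state_dict(),
        'optimizer_state_dict': optimizer.state_dict(),
        'vocab_size': hparams['vocab_size'],
        'embedding_dim': hparams['embedding_dim'],
        'hidden_dim': hparams['hidden_dim'],
        'num_layers': hparams['num_layers'],
        'char2idx': char2idx,
        'idx2char': idx2char,
        # Plain Python floats: safe under restricted (weights_only) loading
        'loss_history': [float(v) for v in loss_history],
    }
    torch.save(checkpoint, path)
    print(f"Saved checkpoint to {path}")
    print("Contents:", ', '.join(checkpoint))
    return checkpoint


def load_checkpoint(path, model_cls, device='cpu'):
    """Rebuild a model from a checkpoint; `model_cls` must accept the four hyperparameters."""
    checkpoint = torch.load(path, map_location=device)
    model = model_cls(checkpoint['vocab_size'], checkpoint['embedding_dim'],
                      checkpoint['hidden_dim'], checkpoint['num_layers'])
    model.load_state_dict(checkpoint['model_state_dict'])
    return model.to(device), checkpoint


# Usage in the notebook:
# hparams = {'vocab_size': vocab_size, 'embedding_dim': 128, 'hidden_dim': 256, 'num_layers': 2}
# save_checkpoint('char_rnn_model.pth', model, optimizer, hparams, char2idx, idx2char, loss_history)
# model, checkpoint = load_checkpoint('char_rnn_model.pth', CharRNN, device)
```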

In [None]:
# TODO: Save the model checkpoint
# Your code here