# Transformer News Classifier: Training Notebook

This notebook walks through training the Transformer-based news classifier. It uses the custom `transformer_news` Python package, which provides the data processing, model definition, and training code.

**Note:** Before running, ensure you have installed the package in editable mode from the project's root directory:
```bash
pip install -e .
```

## Acknowledgments
The core concepts and architectural patterns implemented here were learned from and inspired by several excellent educational resources, including Jay Alammar's "The Illustrated Transformer", Andrej Karpathy's "Let's build GPT", and Josh Starmer's course on DeepLearning.AI.

### 1. Imports

First, we import all necessary libraries. Most importantly, we import the `train` module from our own `transformer_news` package, which contains the main training orchestration logic.

In [1]:
```python
# Standard library
import logging
import sys

# Import the training module from our package; its main() function runs the training
from transformer_news import train
```

```text
2025-08-19 13:29:16,612 - Starting training run...
```


### 2. Setup Logging

We'll configure the logging system to print messages directly to the notebook's output. This allows us to see the progress from the training functions inside our package.

In [2]:
```python
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    # Send log output to stdout so that it appears in the notebook
    handlers=[
        logging.StreamHandler(sys.stdout)
    ]
)
```
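One detail worth knowing: `logging.basicConfig` does nothing if the root logger already has a handler, unless `force=True` is passed (available since Python 3.8). The log lines shown in this notebook carry no level name even though the format string asks for one, which may mean that something configured logging before this cell ran. The sketch below reproduces that situation with in-memory streams; the stream and handler names are illustrative only and are not part of `transformer_news`.

```python
import io
import logging

root = logging.getLogger()
root.setLevel(logging.INFO)

# Simulate a handler that an earlier import or cell has already attached.
earlier_stream = io.StringIO()
earlier_handler = logging.StreamHandler(earlier_stream)
earlier_handler.setFormatter(logging.Formatter("%(message)s"))
root.addHandler(earlier_handler)

# Without force=True this call is silently ignored,
# because the root logger already has at least one handler.
ignored_stream = io.StringIO()
logging.basicConfig(
    level=logging.INFO,
    format="%(levelname)s - %(message)s",
    handlers=[logging.StreamHandler(ignored_stream)],
)
logging.info("first")

# With force=True the existing root handlers are removed first,
# so the new format and handler actually take effect.
forced_stream = io.StringIO()
logging.basicConfig(
    level=logging.INFO,
    format="%(levelname)s - %(message)s",
    handlers=[logging.StreamHandler(forced_stream)],
    force=True,
)
logging.info("second")

print(repr(earlier_stream.getvalue()))  # 'first\n'
print(repr(ignored_stream.getvalue()))  # ''
print(repr(forced_stream.getvalue()))   # 'INFO - second\n'
```

If the notebook's own format is not showing up in the output, adding `force=True` to the `basicConfig` call is one thing to try.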

### 3. Set Training Parameters

Here, we define the parameters for this specific training run. This is the only place you need to make changes to experiment with different settings.

In [3]:
```python
# Set to False to use the full AG_NEWS dataset (takes longer)
USE_SAMPLE_DATASET = True

# Number of training epochs
NUM_EPOCHS = 5

# The size of the sample to use if USE_SAMPLE_DATASET is True
SAMPLE_SIZE = 2000
```
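When experimenting with many different settings, it can help to validate them before a long run starts. The following is a minimal sketch of that idea. `build_train_kwargs` is a hypothetical helper, not part of `transformer_news`; the only names taken from this notebook are the keyword arguments `full_dataset`, `num_epochs`, and `sample_size`, which the training call in the next section uses.

```python
def build_train_kwargs(use_sample_dataset: bool, num_epochs: int, sample_size: int) -> dict:
    """Hypothetical helper: check the notebook settings and map them to keyword arguments."""
    if num_epochs < 1:
        raise ValueError("num_epochs must be at least 1")
    if use_sample_dataset and sample_size < 1:
        raise ValueError("sample_size must be at least 1 when sampling")
    return {
        # The notebook flag is inverted: sampling means *not* the full dataset.
        "full_dataset": not use_sample_dataset,
        "num_epochs": num_epochs,
        "sample_size": sample_size,
    }


kwargs = build_train_kwargs(use_sample_dataset=True, num_epochs=5, sample_size=2000)
print(kwargs)
# {'full_dataset': False, 'num_epochs': 5, 'sample_size': 2000}
```

The resulting dictionary could then be unpacked into the training call with `**kwargs`.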

### 4. Run Training

With everything set up, we can now start the training process with a single call to `train.main()`. This function, located in `src/transformer_news/train.py`, will handle everything: loading data, creating the model, and executing the training loop.

In [4]:
```python
train.main(
    full_dataset=(not USE_SAMPLE_DATASET),
    num_epochs=NUM_EPOCHS,
    sample_size=SAMPLE_SIZE
)
```

```text
2025-08-19 13:29:16,685 - Loading SAMPLE of 2000...
2025-08-19 13:29:16,868 - ✅ Vocab saved to /home/hyd_in_zrh/projects/personal_projects/transformer-from-scratch-news-classifier/transformer-from-scratch-news-classifier/models/newsclassification_vocab.pth
2025-08-19 13:29:16,869 - ✅ Train and Test DataLoaders created successfully.
2025-08-19 13:29:16,869 - --- Inspecting the first batch from the DataLoader ---
2025-08-19 13:29:16,875 - Batch token shape: torch.Size([64, 119])
2025-08-19 13:29:16,876 - Batch label shape: torch.Size([64])
2025-08-19 13:29:16,876 - Unique labels in this batch: tensor([0, 1, 2, 3])
2025-08-19 13:29:16,877 - Labels range: 0–3
2025-08-19 13:29:16,877 - ✅ Labels are correct.
2025-08-19 13:29:18,465 - ✅ Model, optimizer, and loss function defined.
2025-08-19 13:29:22,481 - Epoch 01/5 | Train Loss: 1.3428 | Test Accuracy: 0.3020
2025-08-19 13:29:22,565 -   -> New best model saved with accuracy: 0.3020
2025-08-19 13:29:24,767 - Epoch 02/5 | Train Loss: 1.2080 |
```
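To put the first-epoch numbers in context, we can compare them with a chance-level baseline. This sketch assumes the loss is standard cross-entropy over the four labels (0–3) seen in the batch inspection; the log does not say which loss function `train.main()` uses, so treat that as an assumption. Under it, a model that assigns equal probability to every class scores a loss of ln 4, and guessing uniformly at random gives 25% accuracy.

```python
import math

NUM_CLASSES = 4  # labels 0-3, as reported in the batch inspection

# Cross-entropy of a uniform prediction: -ln(1 / K) = ln(K)
chance_loss = math.log(NUM_CLASSES)
# Expected accuracy when guessing uniformly at random
chance_accuracy = 1 / NUM_CLASSES

# Values copied from the epoch 1 log line
first_epoch_loss = 1.3428
first_epoch_accuracy = 0.3020

print(f"chance loss     = {chance_loss:.4f}")      # chance loss     = 1.3863
print(f"chance accuracy = {chance_accuracy:.4f}")  # chance accuracy = 0.2500
print(first_epoch_loss < chance_loss)              # True
print(first_epoch_accuracy > chance_accuracy)      # True
```

So after one epoch on the sample, the model is only slightly better than chance, which is unsurprising for a small sample and a model trained from scratch.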

--- 
**Training complete!** The best model has been saved to the `models/` directory, and the vocabulary is stored as `newsclassification_vocab.pth`.
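The `-> New best model saved with accuracy` log line suggests that a checkpoint is written only when test accuracy improves. Below is a minimal sketch of that pattern in plain Python. It is not necessarily how `train.py` implements it: `track_best` and the `save` callback are hypothetical names, and the accuracy values are made-up toy numbers rather than results from this run.

```python
from typing import Callable, Iterable


def track_best(accuracies: Iterable[float], save: Callable[[int, float], None]) -> float:
    """Call `save(epoch, accuracy)` whenever an epoch beats the best accuracy so far."""
    best = float("-inf")
    for epoch, accuracy in enumerate(accuracies, start=1):
        if accuracy > best:
            best = accuracy
            save(epoch, accuracy)  # in a real run this would write the model to disk
    return best


saved = []  # stands in for checkpoint files
best = track_best([0.30, 0.28, 0.41], lambda epoch, acc: saved.append((epoch, acc)))

print(saved)  # [(1, 0.3), (3, 0.41)]
print(best)   # 0.41
```

Because the second toy epoch is worse than the first, no checkpoint is written for it, and the file on disk always corresponds to the best epoch seen so far.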