# Transformer News Classifier: Prediction Notebook

This notebook shows how to use the trained Transformer model to classify new headlines, using functions from the custom `transformer_news` Python package.

**Note:** Before running, make sure the trained model (`transformer_news_classifier_best.pth`) and the vocabulary (`newsclassification_vocab.pth`) are present in the `models/` directory. The package must also be installed in editable mode:
```bash
pip install -e .
```
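A quick pre-flight check can catch a missing artifact before anything is loaded. The sketch below assumes the notebook's working directory is the repository root, so that `models/` resolves relative to it; `missing_artifacts` is a hypothetical helper for illustration, not part of `transformer_news`.

```python
from pathlib import Path

# File names are taken from the note above; the directory location is an assumption.
REQUIRED_ARTIFACTS = (
    "transformer_news_classifier_best.pth",
    "newsclassification_vocab.pth",
)


def missing_artifacts(models_dir="models"):
    """Return the required artifact file names that are absent from models_dir."""
    base = Path(models_dir)
    return [name for name in REQUIRED_ARTIFACTS if not (base / name).is_file()]


missing = missing_artifacts()
if missing:
    print(f"Missing from models/: {', '.join(missing)}")
else:
    print("All model artifacts found.")
```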

### 1. Imports

We import two modules from our package: `predict`, which contains the logic for loading artifacts and running inference, and `config`, which supplies settings such as `config.DEVICE`.

In [6]:
# Standard Libraries
import logging
import sys
import torch

# Import the prediction functions from our package
from transformer_news import predict, config

### 2. Setup Logging

We'll configure logging to see the status messages from our prediction module.

In [7]:
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    # Force logging to be sent to the notebook's stdout
    handlers=[
        logging.StreamHandler(sys.stdout)
    ]
)
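One caveat: `logging.basicConfig` does nothing when the root logger already has handlers, which can happen if an earlier cell or an imported module configured logging first. The log lines shown later in this notebook omit the level name even though the format string includes `%(levelname)s`, which would be consistent with the call above not taking effect. Since Python 3.8, `force=True` removes existing root handlers before applying the new configuration. A minimal sketch, assuming it is acceptable to replace whatever handlers are already attached:

```python
import logging
import sys

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(levelname)s - %(message)s",
    handlers=[logging.StreamHandler(sys.stdout)],
    force=True,  # remove any handlers already attached to the root logger
)

logging.getLogger(__name__).info("Logging configured")
```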

### 3. High-Level Prediction (Using `main`)

The simplest way to get a prediction is to use the `predict.main()` function. This orchestrates the entire process: loading the vocab, loading the model, and running the prediction.

In [8]:
news_headline = "Will AI make language dubbing easy for film and TV?"
predict.main(news_headline)

2025-08-19 13:47:37,949 - ▶ Prediction run started...
2025-08-19 13:47:37,951 -    News Article Headline: Will AI make language dubbing easy for film and TV?
2025-08-19 13:47:38,014 - ✅ Vocab loaded from /home/hyd_in_zrh/projects/personal_projects/transformer-from-scratch-news-classifier/transformer-from-scratch-news-classifier/models/newsclassification_vocab.pth | Size: 11571
2025-08-19 13:47:39,192 - ✅ Model weights loaded from /home/hyd_in_zrh/projects/personal_projects/transformer-from-scratch-news-classifier/transformer-from-scratch-news-classifier/models/transformer_news_classifier_best.pth.
2025-08-19 13:47:39,200 - 
Article: 'Will AI make language dubbing easy for film and TV?'
2025-08-19 13:47:39,201 - Predicted Category: Sci/Tech



### 4. Interactive Prediction (Using Individual Functions)

For more interactive use cases, like in a notebook, it can be useful to call the individual functions from the `predict` module. This allows us to load the model and vocabulary once, and then run predictions on multiple headlines efficiently.

#### Step 4a: Load the Artifacts

First, we load the vocabulary and the trained model using the helper functions from our `predict` module. `predict.get_vocab()` returns three values; we keep the vocabulary and its size and discard the third.

In [9]:
# Load vocab and get its size
vocab, vocab_size, _ = predict.get_vocab()

# Load the inference model, passing the vocab size to it
inference_model = predict.get_model(vocab_size)

2025-08-19 13:47:39,261 - ✅ Vocab loaded from /home/hyd_in_zrh/projects/personal_projects/transformer-from-scratch-news-classifier/transformer-from-scratch-news-classifier/models/newsclassification_vocab.pth | Size: 11571
2025-08-19 13:47:39,422 - ✅ Model weights loaded from /home/hyd_in_zrh/projects/personal_projects/transformer-from-scratch-news-classifier/transformer-from-scratch-news-classifier/models/transformer_news_classifier_best.pth.


#### Step 4b: Run Predictions on a Batch of Headlines

Now that the model and vocab are loaded into memory, we can create a list of headlines and loop through them, calling the core `predict.exec_predict` function for each one. The tokenizer is also fetched once, before the loop, and reused for every headline.

In [10]:
headlines_to_test = [
    "The US economy is a puzzle but the pieces aren't fitting together",
    "Premier League: Chelsea held by Crystal Palace & Forest beat Brentford - reaction",
    "Putin agreed to security guarantees for Ukraine being part of potential peace deal, US envoy says",
    "Breakthrough in quantum computing promises to revolutionize data encryption"
]

# Get the tokenizer from the predict module
tokenizer = predict.get_tokenizer("basic_english")

for headline in headlines_to_test:
    predicted_category = predict.exec_predict(
        text=headline,
        model=inference_model,
        vocab=vocab,
        tokenizer=tokenizer,
        device=config.DEVICE
    )
    print(f"Article  : '{headline}'")
    print(f"Predicted: {predicted_category}\n")

Article  : 'The US economy is a puzzle but the pieces aren't fitting together'
Predicted: Sci/Tech

Article  : 'Premier League: Chelsea held by Crystal Palace & Forest beat Brentford - reaction'
Predicted: Sci/Tech

Article  : 'Putin agreed to security guarantees for Ukraine being part of potential peace deal, US envoy says'
Predicted: Sci/Tech

Article  : 'Breakthrough in quantum computing promises to revolutionize data encryption'
Predicted: Sci/Tech
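
All four headlines above received the same label, including the ones about sport and politics, so it is worth tallying predictions before trusting a batch. A minimal sketch using `collections.Counter`; the `results` list simply copies the labels printed above, and `label_distribution` is a hypothetical helper, not part of `transformer_news`.

```python
from collections import Counter


def label_distribution(labels):
    """Map each predicted label to its share of the batch."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


# Labels copied from the output above.
results = ["Sci/Tech", "Sci/Tech", "Sci/Tech", "Sci/Tech"]

distribution = label_distribution(results)
for label, share in distribution.items():
    print(f"{label}: {share:.0%}")

if len(distribution) == 1 and len(results) > 1:
    print("Every headline received the same label; check preprocessing and weights.")
```

A single label across unrelated topics can point to a mismatch between training-time and inference-time preprocessing, or to weights that did not load as expected, though four examples are too few to draw a firm conclusion.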

