Problem Statement:
News headlines cover diverse topics, and manually categorizing them is time-consuming and error-prone. An automated system is needed to accurately classify news headlines into predefined categories.

Objective:
To fine-tune a BERT-based transformer model to classify news headlines from the AG News dataset into topic categories and evaluate its performance using accuracy and F1-score.

In [27]:
# Step 1: Setup & Install Dependencies
!pip install datasets transformers torch scikit-learn gradio accelerate




In [1]:
import torch

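cuda_available = torch.cuda.is_available()
print("CUDA available:", cuda_available)
# get_device_name raises an error when no CUDA device is present, so guard the call
if cuda_available:
    print("GPU:", torch.cuda.get_device_name(0))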


CUDA available: True
GPU: NVIDIA RTX A4000


In [3]:
import datasets
import transformers
import torch
import sklearn
import gradio

print("All packages installed correctly!")




All packages installed correctly!


Step 2: Load the AG News Dataset

AG News has 4 classes:

0 → World

1 → Sports

2 → Business

3 → Sci/Tech
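
The label mapping above can be kept in code so that predictions are reported by name rather than by integer id. The sketch below is an illustration, not part of the original notebook. The names `id2label` and `label2id` are chosen to match the optional keyword arguments that `from_pretrained` accepts for storing the mapping in the model config; `label_name` is a hypothetical helper.

```python
# Label ids as listed above for AG News.
id2label = {0: "World", 1: "Sports", 2: "Business", 3: "Sci/Tech"}
# Reverse mapping, derived from id2label so the two cannot drift apart.
label2id = {name: idx for idx, name in id2label.items()}

def label_name(class_id: int) -> str:
    """Return the topic name for a predicted class id (hypothetical helper)."""
    return id2label[class_id]

print(label_name(3))
```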

In [5]:
from datasets import load_dataset

dataset = load_dataset("ag_news")

print(dataset)


To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
Xet Storage is enabled for this repo, but the 'hf_xet' package is not installed. Falling back to regular HTTP download. For better performance, install the package with: `pip install huggingface_hub[hf_xet]` or `pip install hf_xet`
Generating train split: 100%|███████████████████████████████████████| 120000/120000 [00:00<00:00, 906353.96 examples/s]
Generating test split: 100%|███████████████████████████████████████████| 7600/7600 [00:00<00:00, 1574780.67 examples/s]


DatasetDict({
    train: Dataset({
        features: ['text', 'label'],
        num_rows: 120000
    })
    test: Dataset({
        features: ['text', 'label'],
        num_rows: 7600
    })
})


Step 3: Tokenization & Preprocessing

In [7]:
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

def tokenize_function(examples):
    return tokenizer(
        examples["text"],
        padding="max_length",
        truncation=True,
        max_length=128
    )

tokenized_datasets = dataset.map(tokenize_function, batched=True)

tokenized_datasets = tokenized_datasets.remove_columns(["text"])
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets.set_format("torch")


Map: 100%|█████████████████████████████████████████████████████████████| 120000/120000 [02:02<00:00, 979.49 examples/s]
Map: 100%|████████████████████████████████████████████████████████████████| 7600/7600 [00:07<00:00, 1004.35 examples/s]
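
To make the effect of `padding="max_length"` and `truncation=True` concrete, here is a toy sketch in plain Python. It is not the tokenizer's implementation: the helper `pad_or_truncate` is hypothetical and the content token ids are illustrative. The special ids assume the `bert-base-uncased` vocabulary, where 0 is `[PAD]`, 101 is `[CLS]`, and 102 is `[SEP]`.

```python
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102  # assumed bert-base-uncased special token ids

def pad_or_truncate(token_ids, max_length):
    """Wrap token ids in [CLS]/[SEP], then force the result to max_length."""
    # Reserve two positions for the special tokens and drop any overflow.
    body = token_ids[: max_length - 2]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)
    # Pad on the right; the mask marks padding positions with 0.
    n_pad = max_length - len(input_ids)
    return input_ids + [PAD_ID] * n_pad, attention_mask + [0] * n_pad

ids, mask = pad_or_truncate([7592, 2088], max_length=6)
print(ids)   # short input: padded up to 6 positions
print(mask)
```

With `max_length=128`, every example occupies 128 positions regardless of its real length. Padding each batch only to its longest member (for instance with `DataCollatorWithPadding`) is a common alternative that may reduce wasted computation on short headlines.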


Step 4: Load BERT for Sequence Classification

In [9]:
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=4
)


Some weights of BertForSequenceClassification were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


Step 5: Training Configuration

In [13]:
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",
    save_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
    fp16=True,  # mixed-precision training; requires a CUDA-capable GPU
    load_best_model_at_end=True,
    metric_for_best_model="f1"
)


Step 6: Evaluation Metrics (Accuracy & F1)

In [15]:
import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=1)
    
    return {
        "accuracy": accuracy_score(labels, predictions),
        "f1": f1_score(labels, predictions, average="weighted")
    }
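
As a worked illustration of what `average="weighted"` computes, the sketch below calculates a per-class F1 and weights each class by its support, in plain Python. It is a toy under stated assumptions, not scikit-learn's implementation: the function name `weighted_f1` and the sample labels are made up for the example.

```python
from collections import Counter

def weighted_f1(labels, predictions):
    """Support-weighted mean of per-class F1 scores (illustrative helper)."""
    support = Counter(labels)
    total = 0.0
    for cls, n in support.items():
        tp = sum(1 for y, p in zip(labels, predictions) if y == cls and p == cls)
        fp = sum(1 for y, p in zip(labels, predictions) if y != cls and p == cls)
        fn = n - tp
        # F1 = 2*TP / (2*TP + FP + FN), taken as 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n * f1
    return total / len(labels)

labels      = [0, 0, 1, 1, 2, 3]
predictions = [0, 1, 1, 1, 2, 3]
accuracy = sum(y == p for y, p in zip(labels, predictions)) / len(labels)
score = weighted_f1(labels, predictions)
print(round(accuracy, 4), round(score, 4))
```

When every class has the same number of examples, the weighted average reduces to the plain (macro) mean of the per-class scores, so the choice of averaging matters mainly for imbalanced label distributions.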


Step 7: Fine-Tune the Model

In [17]:
from transformers import Trainer

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    processing_class=tokenizer,  # newer name for the deprecated tokenizer= argument
    compute_metrics=compute_metrics
)

trainer.train()




Epoch   Training Loss   Validation Loss   Accuracy   F1
1       0.1983          0.175362          0.945263   0.945298
2       0.1305          0.178245          0.950395   0.950457
3       0.0809          0.220323          0.949342   0.949375


TrainOutput(global_step=22500, training_loss=0.14848362189398873, metrics={'train_runtime': 1883.5102, 'train_samples_per_second': 191.132, 'train_steps_per_second': 11.946, 'total_flos': 2.368042020864e+16, 'train_loss': 0.14848362189398873, 'epoch': 3.0})
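
The `global_step` and throughput figures above can be sanity-checked with a little arithmetic. The sketch assumes a single GPU and no gradient accumulation (the `TrainingArguments` defaults), which the notebook does not state explicitly.

```python
import math

train_examples = 120_000   # size of the train split
batch_size = 16            # per_device_train_batch_size
epochs = 3                 # num_train_epochs
train_runtime = 1883.5102  # seconds, from TrainOutput

# Assumption: one device and gradient_accumulation_steps=1.
steps_per_epoch = math.ceil(train_examples / batch_size)
total_steps = steps_per_epoch * epochs
samples_per_second = train_examples * epochs / train_runtime

print(steps_per_epoch, total_steps, round(samples_per_second, 1))
```

Under those assumptions this gives 7,500 steps per epoch and 22,500 steps in total, matching the reported `global_step`, and roughly 191 samples per second, matching `train_samples_per_second`.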

Step 8: Final Evaluation

In [19]:
results = trainer.evaluate()
print(results)


{'eval_loss': 0.17824463546276093, 'eval_accuracy': 0.9503947368421053, 'eval_f1': 0.9504574953167206, 'eval_runtime': 9.0741, 'eval_samples_per_second': 837.552, 'eval_steps_per_second': 52.347, 'epoch': 3.0}
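
The final `eval_loss` of 0.1782 matches the epoch 2 row of the training log rather than epoch 3, which is consistent with `load_best_model_at_end=True` restoring the checkpoint with the highest F1. The snippet below reproduces that selection from the logged metrics. The `history` list is transcribed from the training log, and the variable names are illustrative.

```python
# Per-epoch metrics transcribed from the training log.
history = [
    {"epoch": 1, "train_loss": 0.1983, "eval_loss": 0.175362, "accuracy": 0.945263, "f1": 0.945298},
    {"epoch": 2, "train_loss": 0.1305, "eval_loss": 0.178245, "accuracy": 0.950395, "f1": 0.950457},
    {"epoch": 3, "train_loss": 0.0809, "eval_loss": 0.220323, "accuracy": 0.949342, "f1": 0.949375},
]

# metric_for_best_model="f1": the checkpoint with the highest F1 is kept.
best = max(history, key=lambda row: row["f1"])
print("best epoch:", best["epoch"], "eval_loss:", best["eval_loss"])

# A widening gap between eval and train loss is a common overfitting signal.
gaps = [round(row["eval_loss"] - row["train_loss"], 4) for row in history]
print(gaps)
```

The gap between validation and training loss grows with each epoch, so training for fewer epochs or adding early stopping may be worth trying.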


Step 9: Save the Model

In [21]:
model.save_pretrained("news_classifier")
tokenizer.save_pretrained("news_classifier")

('news_classifier\\tokenizer_config.json',
 'news_classifier\\special_tokens_map.json',
 'news_classifier\\vocab.txt',
 'news_classifier\\added_tokens.json')

Step 10: Deploy with Gradio (Live Demo)

In [23]:
import gradio as gr
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("news_classifier")
model = BertForSequenceClassification.from_pretrained("news_classifier")
model.eval()

labels = ["World", "Sports", "Business", "Sci/Tech"]

def classify_news(text):
    # Truncate to the same length used during training (max_length=128)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=1).item()
    return labels[prediction]

interface = gr.Interface(
    fn=classify_news,
    inputs=gr.Textbox(lines=2, placeholder="Enter a news headline..."),
    outputs="text",
    title="News Topic Classifier (BERT)",
    description="Classifies news headlines into World, Sports, Business, or Sci/Tech."
)

interface.launch()


* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.
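
The demo returns only the top label. If confidence scores are wanted as well, the logits can be turned into probabilities with a softmax. The sketch below shows the computation in plain Python on made-up logits; the function `softmax_scores` is hypothetical and not part of the notebook. A label-to-probability dictionary of this shape is the kind of value Gradio's `Label` output component is designed to display, though wiring it into the interface above is left as an assumption to verify.

```python
import math

labels = ["World", "Sports", "Business", "Sci/Tech"]

def softmax_scores(logits):
    """Map raw logits to a {label: probability} dict (illustrative helper)."""
    # Subtract the max for numerical stability before exponentiating.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return {name: e / total for name, e in zip(labels, exps)}

scores = softmax_scores([0.2, 4.1, -1.0, 0.5])  # illustrative logits, not model output
top_label = max(scores, key=scores.get)
print(top_label, round(scores[top_label], 3))
```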




Final Summary / Insights:

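The fine-tuned BERT model reached about 95% accuracy and weighted F1 on the 7,600-example AG News test split (best checkpoint: epoch 2, accuracy 0.9504, F1 0.9505), with no task-specific feature engineering beyond tokenization. This supports the strength of transformer-based transfer learning for text classification. Validation loss rose from 0.175 after epoch 1 to 0.220 after epoch 3 while training loss kept falling, which suggests the model starts to overfit after the second epoch; load_best_model_at_end restored the epoch 2 checkpoint for the final evaluation. Because the test split was also used to select that checkpoint, the reported scores are likely slightly optimistic, and a validation split held out from the training data would give a cleaner estimate. Per-class metrics and inference latency were not measured, so uniform performance across categories and suitability for real-time deployment remain to be confirmed.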