# **Fine-Tuning BERT with Hugging Face**
This notebook fine-tunes a pre-trained BERT model (`bert-base-uncased`) for binary sentiment analysis on the IMDb movie-review dataset.
It covers training, debugging common issues, evaluation, and inference on new text.
---
**Enable GPU before running the notebook!**

In [None]:
import torch
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
# The model is created in section 4; Trainer moves it to the available device automatically.


In [None]:
!pip install transformers datasets evaluate accelerate huggingface_hub

## **1️⃣ Load Dataset**
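We use the IMDb dataset for sentiment analysis: 25,000 training and 25,000 test movie reviews, each labeled 0 (negative) or 1 (positive). `load_dataset` also returns an unlabeled `unsupervised` split, which is not used for training here.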

In [None]:

from datasets import load_dataset

# Load IMDb dataset
dataset = load_dataset("imdb")
dataset

## **2️⃣ Tokenization**
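We tokenize the reviews with BERT's tokenizer and truncate each one to the model's 512-token limit. Padding is left to the data collator in the next step, so each batch is padded only to its own longest review.

In [None]:

from transformers import AutoTokenizer

# Load BERT tokenizer
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Tokenization function (no padding here; the data collator pads per batch)
def tokenize_function(example):
    return tokenizer(example["text"], truncation=True)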

# Tokenize dataset
tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets

## **3️⃣ Prepare Data for Training**
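Drop the raw `text` column, rename `label` to `labels` (the argument name the model expects), set the datasets to return PyTorch tensors, and create a data collator that pads each batch to the length of its longest sequence.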

In [None]:

import torch
from transformers import DataCollatorWithPadding

# Remove unnecessary columns
tokenized_datasets = tokenized_datasets.remove_columns(["text"])
tokenized_datasets = tokenized_datasets.rename_column("label", "labels")
tokenized_datasets.set_format("torch")

# Data collator for dynamic padding
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
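To see what dynamic padding saves, here is a minimal pure-Python sketch. It is not the `DataCollatorWithPadding` implementation: the helper `pad_batch` and the toy token IDs are hypothetical and serve only to compare padding a batch to its longest sequence against padding everything to 512 tokens.

```python
def pad_batch(sequences, pad_id=0, max_length=None):
    """Pad token-ID lists to a common length and build attention masks.

    Hypothetical helper for illustration only. If max_length is None,
    the batch is padded to its own longest sequence (dynamic padding).
    """
    target = max_length if max_length is not None else max(len(s) for s in sequences)
    input_ids = [s + [pad_id] * (target - len(s)) for s in sequences]
    attention_mask = [[1] * len(s) + [0] * (target - len(s)) for s in sequences]
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Two toy sequences of different lengths (IDs are illustrative)
batch = [[101, 2023, 102], [101, 2023, 3185, 2001, 102]]

dynamic = pad_batch(batch)                 # padded to the longest sequence: 5
fixed = pad_batch(batch, max_length=512)   # padded to a fixed 512 tokens

print(len(dynamic["input_ids"][0]), len(fixed["input_ids"][0]))
```

With short reviews, the dynamically padded batch is a small fraction of the fixed-length one, so the model spends far less compute on padding tokens.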


## **4️⃣ Load Pre-Trained BERT Model**
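We use `bert-base-uncased` for sentiment classification. The classification head on top of the encoder is newly initialized, so the warning about uninitialized weights when loading is expected.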

In [None]:

from transformers import AutoModelForSequenceClassification

# Load pre-trained BERT model for sentiment analysis (2 classes: positive/negative)
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)


## **5️⃣ Define Training Arguments**
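Set hyperparameters for training. The model is evaluated and checkpointed after every epoch, and the best checkpoint (lowest evaluation loss by default) is reloaded when training ends.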

In [None]:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",
    eval_strategy="epoch",  # older transformers releases call this evaluation_strategy
    save_strategy="epoch",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    logging_dir="./logs",
    logging_steps=100,
    load_best_model_at_end=True,
)
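As a rough check on how long training will run, the number of optimizer steps follows from the dataset size and the arguments above. This is a back-of-the-envelope sketch that assumes a single device; the helper `total_optimizer_steps` is hypothetical, and 25,000 is the size of the IMDb training split. The Trainer's own count may differ slightly when gradient accumulation does not divide the number of batches evenly.

```python
import math

def total_optimizer_steps(num_examples, per_device_batch_size, num_epochs,
                          grad_accum_steps=1, num_devices=1):
    """Approximate number of optimizer updates over a full training run."""
    effective_batch = per_device_batch_size * grad_accum_steps * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return steps_per_epoch * num_epochs

# IMDb train split, batch size 8, 3 epochs
steps = total_optimizer_steps(25_000, per_device_batch_size=8, num_epochs=3)
print(steps)  # 9375
```

With `logging_steps=100`, that works out to roughly 93 logged loss values over the run.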


## **6️⃣ Train BERT Using Hugging Face Trainer**

In [None]:

from transformers import Trainer

# Define Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    tokenizer=tokenizer,  # newer transformers releases rename this argument to processing_class
    data_collator=data_collator,
)

# Train the model
trainer.train()


## **7️⃣ Save the Fine-Tuned Model**

In [None]:

model.save_pretrained("./bert-sentiment-analysis")
tokenizer.save_pretrained("./bert-sentiment-analysis")


## **8️⃣ Debugging Common Issues**

In [None]:

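# Example: if training runs out of GPU memory, halve the batch size and
# accumulate gradients over 2 steps so the effective batch size stays at 8.
# Pass these arguments to a new Trainer for them to take effect.
training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=4,  # Reduce batch size
    gradient_accumulation_steps=2,  # Accumulate gradients
    num_train_epochs=3,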
)
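Gradient accumulation keeps the effective batch size at 4 × 2 = 8 while holding only 4 examples in memory at a time. The toy sketch below, which uses a hypothetical one-parameter model `y = w * x` rather than BERT, illustrates why this works for a mean-reduced loss: the accumulated gradient from two micro-batches of 4 matches the gradient from one batch of 8.

```python
def grad_mse(w, batch):
    """Gradient of the mean squared error for the toy model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

# 8 toy examples generated from y = 3x (illustrative data)
data = [(float(i), 3.0 * i) for i in range(1, 9)]
w = 0.5

# Gradient from a single batch of 8
full = grad_mse(w, data)

# Two micro-batches of 4; each contribution is scaled by 1/accum_steps
accum_steps = 2
micro_batches = [data[:4], data[4:]]
accumulated = sum(grad_mse(w, mb) / accum_steps for mb in micro_batches)

print(abs(full - accumulated) < 1e-9)
```

The equivalence is exact here because the micro-batches are the same size; layers that depend on batch statistics would not behave identically.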


## **9️⃣ Evaluate the Model**

In [None]:

import numpy as np
import evaluate  # needs `pip install evaluate`; `load_metric` is no longer available in recent `datasets` releases

# Load accuracy metric
metric = evaluate.load("accuracy")

# Compute accuracy from the logits and labels that Trainer passes in
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

trainer.compute_metrics = compute_metrics

# Evaluate Model
results = trainer.evaluate()
print(results)
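For intuition about what the accuracy metric computes, here is a dependency-free sketch: take the index of the largest logit in each row as the predicted class, then count how often it matches the label. The function `accuracy_from_logits` and the logits below are made up for illustration and are not outputs of the fine-tuned model.

```python
def accuracy_from_logits(logits, labels):
    """Fraction of rows whose arg-max logit matches the reference label."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)

# Four toy examples: columns are [negative, positive] scores
logits = [[2.1, -1.0], [0.3, 0.9], [-0.5, 1.7], [1.2, 0.4]]
labels = [0, 1, 0, 0]

print(accuracy_from_logits(logits, labels))  # 0.75
```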


## **🔟 Real-World Sentiment Analysis**

In [None]:

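def predict_sentiment(text):
    model.eval()  # disable dropout for inference
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    inputs = {k: v.to(model.device) for k, v in inputs.items()}  # match the model's device
    with torch.no_grad():  # no gradients needed for prediction
        outputs = model(**inputs)
    prediction = torch.argmax(outputs.logits, dim=-1).item()
    return "Positive" if prediction == 1 else "Negative"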

# Example Predictions
print(predict_sentiment("This movie was absolutely fantastic!"))
print(predict_sentiment("I hated this movie, it was the worst."))
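In practice it is often useful to report how confident the model is, not just the label. One possible approach, sketched here in plain Python, is to pass the two logits through a softmax and return the winning probability alongside the label. The helper `label_with_confidence` and the example logits are hypothetical; in the notebook the logits would come from `outputs.logits`.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_with_confidence(logits, labels=("Negative", "Positive")):
    """Return the predicted label and its softmax probability."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return labels[idx], probs[idx]

# Illustrative logits in [negative, positive] order
label, confidence = label_with_confidence([-1.2, 2.3])
print(label, round(confidence, 3))
```

Softmax probabilities from a fine-tuned classifier are not guaranteed to be well calibrated, so treat the value as a relative score.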
