# 🤗 Hugging Face LLM Course Practical Exercise

## 📝 Exercise: Train and Share Your First Transformer

### 🎯 Goal
By completing this exercise, you will apply the concepts from Chapters 1–4 of the Hugging Face LLM course:

- Use a pretrained Transformer model for text classification
- Fine-tune the model on a small dataset
- Evaluate its performance
- Share the model publicly on the Hugging Face Hub

---

### 📚 Chapters Covered
- **Chapter 1-2**: Using pretrained models with pipelines
- **Chapter 3**: Fine-tuning models on custom datasets
- **Chapter 4**: Sharing models on the Hugging Face Hub

---

### 🚀 Let's get started!

## 📦 Install Required Libraries

First, let's install all the necessary libraries:

In [None]:
# Install required libraries
!pip install transformers datasets torch evaluate accelerate huggingface_hub ipywidgets

## 🔹 1. Load a Pretrained Model (Chapter 1–2)

Let's start by using the Hugging Face `pipeline` API to load a sentiment analysis model.

In [1]:
from transformers import pipeline
import torch

# Load the sentiment analysis pipeline
sentiment_pipeline = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english"
)

print("✅ Sentiment analysis pipeline loaded successfully!")
print(f"Model: {sentiment_pipeline.model.name_or_path}")

Device set to use mps:0


✅ Sentiment analysis pipeline loaded successfully!
Model: distilbert-base-uncased-finetuned-sst-2-english


### Test the Pretrained Model

Now let's test our model with some example sentences:

In [2]:
# Test sentences
test_sentences = [
    "I love this movie, it's absolutely fantastic!",
    "This is the worst experience I've ever had.",
    "The food was okay, nothing special.",
    "I'm feeling great about this project!",
    "This product is terrible, don't buy it."
]

print("🔍 Testing the pretrained model:")
print("=" * 50)

for i, sentence in enumerate(test_sentences, 1):
    result = sentiment_pipeline(sentence)
    label = result[0]['label']
    score = result[0]['score']
    
    print(f"{i}. Sentence: {sentence}")
    print(f"   Prediction: {label} (Confidence: {score:.3f})")
    print()

🔍 Testing the pretrained model:
1. Sentence: I love this movie, it's absolutely fantastic!
   Prediction: POSITIVE (Confidence: 1.000)

2. Sentence: This is the worst experience I've ever had.
   Prediction: NEGATIVE (Confidence: 1.000)

3. Sentence: The food was okay, nothing special.
   Prediction: NEGATIVE (Confidence: 0.983)

4. Sentence: I'm feeling great about this project!
   Prediction: POSITIVE (Confidence: 1.000)

5. Sentence: This product is terrible, don't buy it.
   Prediction: NEGATIVE (Confidence: 1.000)



## 🔹 2. Fine-tune the Model on a Dataset (Chapter 3)

Now let's fine-tune our model on the IMDb dataset. We'll use a **balanced random subset** to ensure proper training.

In [3]:
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification, TrainingArguments, Trainer
import evaluate
import numpy as np
import random

# Load IMDb dataset
print("📚 Loading IMDb dataset...")
dataset = load_dataset("imdb")

# Create a BALANCED random subset for proper training
print("🎲 Creating balanced random subset...")
random.seed(42)  # For reproducibility

# Get random indices for balanced sampling
all_indices = list(range(len(dataset["train"])))
random.shuffle(all_indices)

# Take 500 negative and 500 positive samples
neg_indices = [i for i in all_indices if dataset["train"][i]["label"] == 0][:500]
pos_indices = [i for i in all_indices if dataset["train"][i]["label"] == 1][:500]

balanced_indices = neg_indices + pos_indices
random.shuffle(balanced_indices)  # Shuffle again for training

train_dataset = dataset["train"].select(balanced_indices)
test_dataset = dataset["test"].select(range(200))

print(f"✅ Training set: {len(train_dataset)} samples")
print(f"✅ Test set: {len(test_dataset)} samples")
print(f"✅ Dataset features: {train_dataset.features}")

# Verify balance
train_labels = train_dataset["label"]
unique, counts = np.unique(train_labels, return_counts=True)
print(f"✅ Training set balance:")
for label, count in zip(unique, counts):
    percentage = (count / len(train_labels)) * 100
    print(f"   - Label {label}: {count} samples ({percentage:.1f}%)")

📚 Loading IMDb dataset...


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


🎲 Creating balanced random subset...
✅ Training set: 1000 samples
✅ Test set: 200 samples
✅ Dataset features: {'text': Value('string'), 'label': ClassLabel(names=['neg', 'pos'])}
✅ Training set balance:
   - Label 0: 500 samples (50.0%)
   - Label 1: 500 samples (50.0%)


In [4]:
# Load tokenizer and model
print("🔧 Loading tokenizer and model...")

model_name = "distilbert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# FIXED: Use correct label mapping for IMDb dataset
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, 
    num_labels=2,
    id2label={0: "neg", 1: "pos"},  # IMDb uses 'neg' and 'pos'
    label2id={"neg": 0, "pos": 1}   # IMDb uses 'neg' and 'pos'
)

print(f"✅ Tokenizer: {tokenizer.name_or_path}")
print(f"✅ Model: {model.name_or_path}")
print(f"✅ Number of labels: {model.num_labels}")
print(f"✅ Label mapping: {model.config.id2label}")

🔧 Loading tokenizer and model...


Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight', 'pre_classifier.bias', 'pre_classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.


✅ Tokenizer: distilbert-base-uncased
✅ Model: distilbert-base-uncased
✅ Number of labels: 2
✅ Label mapping: {0: 'neg', 1: 'pos'}


In [5]:
# Tokenization function
def tokenize_function(examples):
    return tokenizer(
        examples["text"], 
        padding="max_length", 
        truncation=True, 
        max_length=512
    )

# Tokenize datasets
print("🔤 Tokenizing datasets...")
tokenized_train = train_dataset.map(tokenize_function, batched=True)
tokenized_test = test_dataset.map(tokenize_function, batched=True)

print("✅ Tokenization completed!")

🔤 Tokenizing datasets...


Map:   0%|          | 0/1000 [00:00<?, ? examples/s]

✅ Tokenization completed!


In [6]:
# Load accuracy metric
accuracy_metric = evaluate.load("accuracy")

# Define compute_metrics function
def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return accuracy_metric.compute(predictions=predictions, references=labels)

# Training arguments
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,  # Increased epochs for better learning
    weight_decay=0.01,
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    push_to_hub=False,  # We'll do this manually later
    report_to=None,  # Disable wandb logging
)

# Initialize trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)

print("🚀 Starting training...")
print(f"Training arguments: {training_args}")

  trainer = Trainer(


🚀 Starting training...
Training arguments: TrainingArguments(
_n_gpu=1,
accelerator_config={'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None, 'use_configured_state': False},
adafactor=False,
adam_beta1=0.9,
adam_beta2=0.999,
adam_epsilon=1e-08,
auto_find_batch_size=False,
average_tokens_across_devices=False,
batch_eval_metrics=False,
bf16=False,
bf16_full_eval=False,
data_seed=None,
dataloader_drop_last=False,
dataloader_num_workers=0,
dataloader_persistent_workers=False,
dataloader_pin_memory=True,
dataloader_prefetch_factor=None,
ddp_backend=None,
ddp_broadcast_buffers=None,
ddp_bucket_cap_mb=None,
ddp_find_unused_parameters=None,
ddp_timeout=1800,
debug=[],
deepspeed=None,
disable_tqdm=False,
dispatch_batches=None,
do_eval=True,
do_predict=False,
do_train=False,
eval_accumulation_steps=None,
eval_delay=0,
eval_do_concat_batches=True,
eval_on_start=False,
eval_steps=None,


In [7]:
# Train the model
print("🔥 Training in progress...")
trainer.train()

print("✅ Training completed!")
print(f"Best model saved at: {trainer.state.best_model_checkpoint}")

🔥 Training in progress...


  0%|          | 0/375 [00:00<?, ?it/s]

  0%|          | 0/25 [00:00<?, ?it/s]

{'eval_loss': 0.34465354681015015, 'eval_accuracy': 0.855, 'eval_runtime': 10.0093, 'eval_samples_per_second': 19.982, 'eval_steps_per_second': 2.498, 'epoch': 1.0}


  0%|          | 0/25 [00:00<?, ?it/s]

{'eval_loss': 0.5784940123558044, 'eval_accuracy': 0.84, 'eval_runtime': 9.9742, 'eval_samples_per_second': 20.052, 'eval_steps_per_second': 2.506, 'epoch': 2.0}


  0%|          | 0/25 [00:00<?, ?it/s]

{'eval_loss': 0.5707761645317078, 'eval_accuracy': 0.845, 'eval_runtime': 9.6925, 'eval_samples_per_second': 20.634, 'eval_steps_per_second': 2.579, 'epoch': 3.0}
{'train_runtime': 524.0186, 'train_samples_per_second': 5.725, 'train_steps_per_second': 0.716, 'train_loss': 0.2763664957682292, 'epoch': 3.0}
✅ Training completed!
Best model saved at: ./results/checkpoint-125


## 🔹 3. Evaluate the Model

Let's evaluate our fine-tuned model and test it with some examples.

In [8]:
# Evaluate the model
print("📊 Evaluating the fine-tuned model...")
eval_results = trainer.evaluate()

print("\n📈 Evaluation Results:")
print("=" * 30)
for key, value in eval_results.items():
    print(f"{key}: {value:.4f}")

📊 Evaluating the fine-tuned model...


  0%|          | 0/25 [00:00<?, ?it/s]


📈 Evaluation Results:
eval_loss: 0.3447
eval_accuracy: 0.8550
eval_runtime: 10.0009
eval_samples_per_second: 19.9980
eval_steps_per_second: 2.5000
epoch: 3.0000


In [9]:
# Test the fine-tuned model with some examples
print("\n🧪 Testing the fine-tuned model:")
print("=" * 40)

# Create a pipeline with our fine-tuned model
fine_tuned_pipeline = pipeline(
    "sentiment-analysis",
    model=trainer.model,
    tokenizer=tokenizer
)

# Test sentences
test_sentences = [
    "This movie was absolutely incredible!",
    "I hated every minute of this film.",
    "The acting was superb and the story was engaging.",
    "This is the worst movie I've ever seen.",
    "I really enjoyed this film, highly recommended!"
]

for i, sentence in enumerate(test_sentences, 1):
    result = fine_tuned_pipeline(sentence)
    label = result[0]['label']
    score = result[0]['score']
    
    print(f"{i}. Sentence: {sentence}")
    print(f"   Prediction: {label} (Confidence: {score:.3f})")
    print()

Device set to use mps:0



🧪 Testing the fine-tuned model:
1. Sentence: This movie was absolutely incredible!
   Prediction: pos (Confidence: 0.916)

2. Sentence: I hated every minute of this film.
   Prediction: neg (Confidence: 0.837)

3. Sentence: The acting was superb and the story was engaging.
   Prediction: pos (Confidence: 0.934)

4. Sentence: This is the worst movie I've ever seen.
   Prediction: neg (Confidence: 0.873)

5. Sentence: I really enjoyed this film, highly recommended!
   Prediction: pos (Confidence: 0.943)



## 🔹 4. Share the Model on the Hub (Chapter 4)

Now let's share our fine-tuned model on the Hugging Face Hub!

In [None]:
# First, let's login to Hugging Face
print("🔐 Logging in to Hugging Face Hub...")
print("Please run 'huggingface-cli login' in your terminal first if you haven't already.")
print("Or use the notebook_login() function below:")

from huggingface_hub import notebook_login

# Uncomment the line below to login from the notebook
notebook_login()

In [10]:
# Save the model and tokenizer locally first
print("💾 Saving model and tokenizer locally...")

output_dir = "./my-fine-tuned-sentiment-model"
trainer.save_model(output_dir)
tokenizer.save_pretrained(output_dir)

print(f"✅ Model and tokenizer saved to: {output_dir}")

💾 Saving model and tokenizer locally...
✅ Model and tokenizer saved to: ./my-fine-tuned-sentiment-model


In [11]:
# Push to Hub
print("🚀 Pushing model to Hugging Face Hub...")

# You can customize this repository name
repo_name = "sebastiancaraballo/imdb-sentiment-finetuned"  # Change this!

print(f"Repository name: {repo_name}")

# Push the model
model.push_to_hub(repo_name)
tokenizer.push_to_hub(repo_name)

print(f"✅ Model successfully pushed to: https://huggingface.co/{repo_name}")

🚀 Pushing model to Hugging Face Hub...
Repository name: sebastiancaraballo/imdb-sentiment-finetuned


model.safetensors:   0%|          | 0.00/268M [00:00<?, ?B/s]

No files have been modified since last commit. Skipping to prevent empty commit.


✅ Model successfully pushed to: https://huggingface.co/sebastiancaraballo/imdb-sentiment-finetuned


## 🎉 Congratulations!

You've successfully completed the Hugging Face LLM Course practical exercise!

### 📋 What You've Accomplished:

✅ **Chapter 1-2**: Loaded and used a pretrained sentiment analysis model
✅ **Chapter 3**: Fine-tuned the model on the IMDb dataset with proper balanced sampling
✅ **Chapter 4**: Prepared to share your model on the Hugging Face Hub

### 🚀 Next Steps:

1. **Set your repository name** in the cell above
2. **Run the push_to_hub commands** to share your model
3. **Visit your model page** on huggingface.co
4. **Test your model** using the inference widget

### 🔗 Useful Links:
- [Hugging Face Hub](https://huggingface.co/)
- [Model Hub Documentation](https://huggingface.co/docs/hub/index)
- [Transformers Documentation](https://huggingface.co/docs/transformers/)

### 💡 Tips for Future Projects:
- **Always verify dataset balance** before training
- **Use random sampling** instead of taking the first N samples
- **Ensure label mapping matches** your dataset
- **Monitor training metrics** to detect issues early
- **Experiment with different model architectures**
- **Try different datasets and tasks**
- **Use larger training sets for better performance**
- **Explore hyperparameter tuning**
- **Consider using model evaluation metrics beyond accuracy**

### 🐛 Common Issues and Solutions:

1. **All predictions are the same class**: Check dataset balance and label mapping
2. **100% accuracy**: Usually indicates overfitting or dataset issues
3. **Poor performance**: Try more training data or different hyperparameters
4. **Label mismatch**: Ensure your model's label mapping matches the dataset

Happy modeling! 🎯