<a href="https://colab.research.google.com/github/pmadhyastha/INM434/blob/main/Large_Language_Models.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

__author__ = "Pranava Madhyastha"

__version__ = "INM434/IN3045 City, University of London, Spring 2025"

In [None]:
!pip uninstall -y wandb  # Explicitly uninstall wandb
import os
os.environ["WANDB_DISABLED"] = "true" # Set environment variable to disable wandb
os.environ["WANDB_MODE"] = "disabled"
!echo "WandB disabled forcefully."

Found existing installation: wandb 0.19.8
Uninstalling wandb-0.19.8:
  Successfully uninstalled wandb-0.19.8
WandB disabled forcefully.


# We will cover the following topics:

# 1. **Loading and Using a Pre-trained LLM:** We'll start by loading the pre-trained GPT-2 model and use it for text generation.
# 2. **Fine-tuning an LLM for Sentiment Analysis:** We will then fine-tune GPT-2 for a specific task - sentiment analysis of movie reviews.
# 3. **Direct Preference Optimization (DPO):** Finally, we'll delve into a simplified Reinforcement Learning from Human Feedback (RLHF) technique called Direct Preference Optimization to align our model with preferences, using a summarization dataset.

# **Before you begin:**

# *   **Runtime Environment:** Make sure you are running this notebook in an environment with **GPU acceleration enabled**. In Google Colab, you can do this by going to "Runtime" -> "Change runtime type" and selecting a GPU as the hardware accelerator (for example "A100 GPU" if it is available to you; a smaller GPU such as "T4 GPU" should also be enough for `distilgpt2`).


In [None]:
!pip install -U fsspec
!pip install -q -U transformers datasets accelerate bitsandbytes peft trl

# Let's start by loading a pre-trained **GPT-2** model and using it for text generation. We will use the `pipeline` function from the Hugging Face `transformers` library, which simplifies the process of using pre-trained models for various NLP tasks.

# **Model Hub - Hugging Face:**

# Before we load the model, let's briefly talk about the **Hugging Face Hub** ([https://huggingface.co/models](https://huggingface.co/models)). This is a central repository where thousands of pre-trained models, datasets, and other resources are shared by the community. You can find models for various tasks and languages here.

# **Finding GPT-2 Models:** You can find GPT-2 models by searching for "GPT-2" on the Hugging Face Hub. The official GPT-2 checkpoints are hosted under the organization "openai-community".

# We will use **"distilgpt2"**, a smaller distilled version of GPT-2, as an example.

In [None]:
!pip install -U huggingface_hub
from huggingface_hub import login
login()

In [None]:
from transformers import pipeline

model_name = "distilgpt2" # Specify the model name
generator = pipeline('text-generation', model=model_name)

prompt = "Write a short story about a cat who goes on an adventure."
result = generator(prompt, max_length=100, do_sample=True, num_return_sequences=2) # Sample 2 different continuations; max_length counts prompt + generated tokens

print("Generated texts:")
for output in result:
    print(f"- {output['generated_text']}")

# What does the model generate?

# Examine the generated stories. Are they coherent? Do they follow the prompt?
# distilgpt2 is a small, distilled version of GPT-2, so expect fluent-sounding text that often drifts from the prompt or loses coherence. Larger models generally follow prompts much better.

# The `pipeline` handles all the complexities behind the scenes, including:
# *   **Downloading the model weights:** It automatically downloads the model weights from the Hugging Face Hub if they are not already cached locally.
# *   **Loading the tokenizer:** It loads the appropriate tokenizer associated with the GPT-2 model.
# *   **Performing text generation:** It uses the model and tokenizer to generate text based on your prompt.



# 2. Fine-tuning GPT-2 for Sentiment Analysis

# Pre-trained LLMs are versatile, but fine-tuning them on a specific dataset can significantly improve their performance on a particular task. Let's fine-tune GPT-2 for **sentiment analysis**. We'll use the **IMDB reviews dataset**, a standard dataset for sentiment classification.


In [None]:
from datasets import load_dataset

# Load the IMDB sentiment analysis dataset -- we have seen this before, haven't we?
dataset = load_dataset("imdb")
print(dataset) # check the dataset please -- does this make sense?

The IMDB dataset is divided into 'train' and 'test' splits. Each example contains:
*   `text`: The movie review text.
*   `label`: The sentiment label (0 for negative, 1 for positive).

Now, we need to tokenize the text data so that it can be processed by the GPT-2 model. We'll use the tokenizer associated with GPT-2.

In [None]:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_name) #
tokenizer.pad_token = tokenizer.eos_token # Important: Set pad token to EOS token for GPT-2 models -- we are trying a hack here!

def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True, max_length=128) # Tokenize and pad/truncate sequences

tokenized_datasets = dataset.map(tokenize_function, batched=True) # Apply tokenization to the entire dataset
print(tokenized_datasets) # Inspect the tokenized dataset now

DatasetDict({
    train: Dataset({
        features: ['text', 'label', 'input_ids', 'attention_mask'],
        num_rows: 25000
    })
    test: Dataset({
        features: ['text', 'label', 'input_ids', 'attention_mask'],
        num_rows: 25000
    })
    unsupervised: Dataset({
        features: ['text', 'label', 'input_ids', 'attention_mask'],
        num_rows: 50000
    })
})


In this step, we used the `AutoTokenizer` to load the tokenizer for GPT-2.  `AutoTokenizer` automatically fetches the correct tokenizer configuration from the model repository on Hugging Face Hub.

We then defined a `tokenize_function` that:
 *   Takes text examples as input.
 *   Tokenizes the text using the GPT-2 tokenizer.
 *   Applies `padding="max_length"` to pad sequences to a maximum length (128 in this case).
 *   Applies `truncation=True` to truncate sequences longer than the maximum length.

The `dataset.map()` function applies this `tokenize_function` to the entire IMDB dataset in batches, resulting in `tokenized_datasets`.
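
To make the effect of `padding="max_length"` and `truncation=True` concrete, here is a minimal pure-Python sketch of what happens to a single sequence of token IDs. The helper `pad_or_truncate` is hypothetical (it is not part of `transformers`) and the toy token IDs are made up. The value 50256 is GPT-2's end-of-text token ID, which this notebook reuses as the pad token.

```python
# Illustrative only: mimics padding="max_length" + truncation=True for one sequence.
PAD_ID = 50256  # GPT-2's <|endoftext|> id, reused as the pad token in this notebook


def pad_or_truncate(token_ids, max_length, pad_id=PAD_ID):
    """Return (input_ids, attention_mask), both exactly max_length long."""
    kept = token_ids[:max_length]                    # truncation: drop tokens past max_length
    n_pad = max_length - len(kept)                   # number of pad tokens needed (0 if truncated)
    input_ids = kept + [pad_id] * n_pad              # pad on the right, as the GPT-2 tokenizer does by default
    attention_mask = [1] * len(kept) + [0] * n_pad   # 1 = real token, 0 = padding
    return input_ids, attention_mask


short_ids, short_mask = pad_or_truncate([11, 22, 33], max_length=5)
long_ids, long_mask = pad_or_truncate([1, 2, 3, 4, 5, 6, 7], max_length=5)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

The attention mask is what lets the model ignore the padded positions, which matters here because the pad token and the end-of-text token share the same ID.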

Now, we will set up the training pipeline using Hugging Face `Trainer` to fine-tune GPT-2 for sentiment classification.


In [None]:
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer

# please have a look at AutoModelForSequenceClassification -- this is a specialised module for prediction over a few classes!

# Load distilgpt2 for sequence classification with 2 output labels (positive/negative)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
model.config.pad_token_id = model.config.eos_token_id

training_args = TrainingArguments(
    output_dir="./distilgpt2-sentiment-model",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=2,
    weight_decay=0.01,
    eval_strategy="epoch",  # named `evaluation_strategy` in older transformers releases
    save_strategy="epoch",
    logging_steps=100,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    push_to_hub=False,
    report_to="none",  # Disable WandB and other logging integrations (the string "none", not None)
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_datasets["train"],
    eval_dataset=tokenized_datasets["test"],
    tokenizer=tokenizer,
    # We can add metrics computation here if needed, but for simplicity we'll focus on loss
)

trainer.train() # Start the fine-tuning process

Some weights of GPT2ForSequenceClassification were not initialized from the model checkpoint at distilgpt2 and are newly initialized: ['score.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
Using the `WANDB_DISABLED` environment variable is deprecated and will be removed in v5. Use the --report_to flag to control the integrations used for logging result (for instance --report_to none).

The `trainer.train()` command starts the fine-tuning process. This will take some time depending on your GPU and the number of epochs. You will see training logs printed during the process, including loss values and evaluation results at the end of each epoch.

# **Important Note:** Training large models like GPT-2 can be computationally intensive. If you are running into memory issues (crashes or out-of-memory errors), try reducing the `per_device_train_batch_size`, `per_device_eval_batch_size`, or `max_length` in the `tokenize_function`. You can also reduce `num_train_epochs` for faster experimentation, but keep in mind that fewer epochs might lead to less optimal performance.
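
To get a feel for how these settings trade off, here is a rough sketch of the step arithmetic. The helper `training_steps` is hypothetical, and a single GPU is assumed. `gradient_accumulation_steps` is a `TrainingArguments` option that is not used in this notebook; it is included only to show how a smaller per-device batch can keep the same effective batch size.

```python
import math


def training_steps(num_examples, per_device_batch_size, num_epochs,
                   num_devices=1, grad_accum_steps=1):
    """Rough count of optimizer steps (an estimate, not the Trainer's exact bookkeeping)."""
    effective_batch = per_device_batch_size * num_devices * grad_accum_steps
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch, steps_per_epoch * num_epochs


# Settings used above: 25,000 training reviews, batch size 8, 2 epochs
print(training_steps(25_000, 8, 2))
# Half the per-device batch, two accumulation steps: same effective batch, less memory per step
print(training_steps(25_000, 4, 2, grad_accum_steps=2))
```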

After training is complete, the best model (based on `eval_loss`) will be loaded. Let's evaluate the fine-tuned model.

In [None]:
evaluation_results = trainer.evaluate()
print(evaluation_results)

{'eval_loss': 0.3653855323791504, 'eval_runtime': 29.1474, 'eval_samples_per_second': 857.71, 'eval_steps_per_second': 107.214, 'epoch': 2.0}


The `trainer.evaluate()` function calculates the loss on the evaluation dataset (the 'test' split of IMDB).  The `eval_loss` value indicates how well the model is performing on unseen data. A lower `eval_loss` generally means better performance.
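
Loss alone is hard to interpret for a classifier, so you may also want accuracy. The following is a minimal sketch of a metrics function, assuming it receives a `(logits, labels)` pair of NumPy arrays, which is how `Trainer` passes predictions to a `compute_metrics` callback. The toy logits and labels are made up for illustration.

```python
import numpy as np


def compute_metrics(eval_pred):
    """Accuracy from a (logits, labels) pair."""
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)  # index of the highest-scoring class
    return {"accuracy": float((predictions == labels).mean())}


# Toy check with made-up logits for four reviews (columns: negative, positive)
toy_logits = np.array([[2.0, -1.0], [0.1, 0.3], [-0.5, 1.5], [1.2, 0.4]])
toy_labels = np.array([0, 1, 0, 0])
print(compute_metrics((toy_logits, toy_labels)))
```

To use it, you would pass `compute_metrics=compute_metrics` when constructing the `Trainer`; the evaluation results should then include an `eval_accuracy` entry alongside `eval_loss`.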

In [None]:
import torch

def predict_sentiment(sentence):
    # Tokenize the sentence and move the tensors to the same device as the model
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True, padding=True).to(model.device)
    model.eval()  # Switch off dropout for inference
    with torch.no_grad():  # No gradients are needed at prediction time
        outputs = model(**inputs)
    predictions = outputs.logits.argmax(dim=-1)  # Predicted class: 0 = negative, 1 = positive
    sentiment = "positive" if predictions.item() == 1 else "negative"
    return f"Sentiment: {sentiment}"

sentences = [
    "This movie was amazing! The acting was superb and the plot was captivating.",
    "I absolutely hated this film. It was boring and predictable.",
    "It was a mediocre movie, nothing special but not terrible."
]

print("\nSentiment Predictions from Fine-tuned GPT-2:")
for sentence in sentences:
    print(f"- Sentence: '{sentence}' - {predict_sentiment(sentence)}")

Observe the sentiment predictions for the example sentences. Does the fine-tuned GPT-2 model correctly classify the sentiment?  Try experimenting with your own sentences to see how well it performs!
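
The classifier always picks one of the two labels, even for a lukewarm review like the third example. Turning the logits into probabilities shows how confident the model actually is. Below is a minimal pure-Python sketch; the helpers `softmax` and `describe` are hypothetical and the logits are made up rather than taken from the fine-tuned model.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)                            # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def describe(logits, labels=("negative", "positive")):
    """Return the most likely label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


label, confidence = describe([0.2, 0.5])  # made-up logits for a lukewarm review
print(label, round(confidence, 3))
```

A probability close to 0.5 suggests the model is nearly undecided, which the bare label hides.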

You have now successfully fine-tuned GPT-2 for sentiment analysis! This demonstrates the power of fine-tuning pre-trained LLMs for specific downstream tasks.


In [None]:
from trl import DPOTrainer
from transformers import AutoModelForCausalLM

Now, let's explore a more advanced technique called **Direct Preference Optimization (DPO)**. DPO is a simplified approach to **Reinforcement Learning from Human Feedback (RLHF)**. RLHF aims to align LLMs with human preferences, making them generate outputs that humans find more desirable.

Traditional RLHF methods can be complex, involving training a reward model and using algorithms like Proximal Policy Optimization (PPO). DPO simplifies this by directly optimizing the language model based on pairwise preference data, without explicitly training a separate reward model.

In DPO, we use datasets where, for a given prompt, there are two candidate outputs: a "chosen" output and a "rejected" output. The "chosen" output is preferred over the "rejected" one. DPO trains the model to raise the likelihood of chosen outputs and lower the likelihood of rejected ones, with both measured relative to a frozen reference model (typically a copy of the model before DPO training).
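
To make the objective concrete, here is a minimal sketch of the DPO loss for a single preference pair, following the formula from the original DPO paper: the negative log-sigmoid of `beta` times the difference between the chosen and rejected log-probability ratios. It assumes you already have the summed token log-probabilities of each response under the model being trained (the "policy") and under a frozen reference model. The function name and the numbers are illustrative, and this is not how `trl` implements it internally.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given summed token log-probabilities."""
    chosen_logratio = policy_chosen_logp - ref_chosen_logp        # how much the policy up-weights the chosen answer
    rejected_logratio = policy_rejected_logp - ref_rejected_logp  # same for the rejected answer
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)), written in a form that is stable for either sign of margin
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Policy identical to the reference: margin is 0, so the loss is log(2)
loss_start = dpo_loss(-12.0, -15.0, -12.0, -15.0)
# Policy has raised the chosen answer and lowered the rejected one: the loss drops
loss_better = dpo_loss(-10.0, -18.0, -12.0, -15.0)
print(loss_start, loss_better)
```

A larger `beta` makes the loss more sensitive to the same shift in log-probabilities, which in practice keeps the trained model closer to the reference.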

We'll use the `trl` (Transformer Reinforcement Learning) library from Hugging Face to demonstrate DPO. We'll use the **"CarperAI/openai_summarize_comparisons"** dataset, which contains summarization preferences.


In [None]:
dpo_dataset = load_dataset("CarperAI/openai_summarize_comparisons", split="train")
print(dpo_dataset[0]) # Inspect an example from the DPO dataset

Generating train split: 92534 examples
Generating test split: 83629 examples
Generating valid1 split: 33082 examples
Generating valid2 split: 50715 examples

{'prompt': 'SUBREDDIT: r/relationships\nTITLE: To admit or not to admit snooping...\nPOST: I [25M] have snooped in the past and copped up to it to my gf [25F] of 6 years.  We talked it through.  It had been a year or two since the last time.  That\'s an issue I\'m working on.\n\nNow she has a new close male work friend.  I won\'t go into details, but she hides things from me with him and does other things to make me a bit suspicious.  So...I snooped again, and this time, all texts from her new friend have been deleted and I saw a google search for "how to get over a guy" near some searches of his name and views of his Facebook profile.\n\nI asked her about this guy, not mentioning the snooping, and she denied any feelings, we talked for a long time about our relationship and she insisted that she only loves me and I mean the world to her, and that she really wants to work towards getting this relationship back out of the rut we\'ve been in (we both work all the time and barely see each

 The "CarperAI/openai_summarize_comparisons" dataset contains examples with:
 *   `prompt`: The original text to be summarized.
 *   `chosen`: A preferred summary of the prompt.
 *   `rejected`: A less preferred summary of the prompt.

We need to format and tokenize this dataset for DPO training. We'll use the GPT-2 tokenizer again and prepare the dataset in the format expected by `DPOTrainer`.

In [None]:
def format_dpo_dataset(examples):
    return {
        "prompt": examples["prompt"],
        "chosen": examples["chosen"],
        "rejected": examples["rejected"],
    }

formatted_dpo_dataset = dpo_dataset.map(format_dpo_dataset)

def tokenize_dpo_dataset(examples):
    tokenized_prompts = tokenizer(examples["prompt"], truncation=True, padding="longest")
    tokenized_chosen = tokenizer(examples["chosen"], truncation=True, padding="longest")
    tokenized_rejected = tokenizer(examples["rejected"], truncation=True, padding="longest")
    return {
        "prompt_input_ids": tokenized_prompts["input_ids"],
        "prompt_attention_mask": tokenized_prompts["attention_mask"],
        "chosen_input_ids": tokenized_chosen["input_ids"],
        "chosen_attention_mask": tokenized_chosen["attention_mask"],
        "rejected_input_ids": tokenized_rejected["input_ids"],
        "rejected_attention_mask": tokenized_rejected["attention_mask"],
    }

tokenized_dpo_dataset = formatted_dpo_dataset.map(tokenize_dpo_dataset, batched=True)
print(tokenized_dpo_dataset[0]) # Inspect a tokenized DPO example

{'prompt': 'SUBREDDIT: r/relationships\nTITLE: To admit or not to admit snooping...\nPOST: I [25M] have snooped in the past and copped up to it to my gf [25F] of 6 years.  We talked it through.  It had been a year or two since the last time.  That\'s an issue I\'m working on.\n\nNow she has a new close male work friend.  I won\'t go into details, but she hides things from me with him and does other things to make me a bit suspicious.  So...I snooped again, and this time, all texts from her new friend have been deleted and I saw a google search for "how to get over a guy" near some searches of his name and views of his Facebook profile.\n\nI asked her about this guy, not mentioning the snooping, and she denied any feelings, we talked for a long time about our relationship and she insisted that she only loves me and I mean the world to her, and that she really wants to work towards getting this relationship back out of the rut we\'ve been in (we both work all the time and barely see each

Now we will set up the `DPOTrainer` from the `trl` library. We'll use GPT-2 as our base model for DPO fine-tuning. For DPO, we need a causal language model (for text generation).


In [None]:
model_name_dpo = "distilgpt2"
ref_model_name_dpo = "distilgpt2" # Using the same model as reference for simplicity

# Load the causal language model (for text generation)
model_dpo = AutoModelForCausalLM.from_pretrained(model_name_dpo)
ref_model_dpo = AutoModelForCausalLM.from_pretrained(ref_model_name_dpo) # Reference model for DPO

from trl import DPOConfig  # DPO-specific subclass of TrainingArguments

# Assumes a recent trl release, where DPO settings such as beta live in DPOConfig
dpo_training_args = DPOConfig(
    output_dir="./dpo-distilgpt2-model",
    learning_rate=1e-5, # DPO often benefits from a smaller learning rate
    per_device_train_batch_size=2, # Reduce batch size if memory issues occur
    num_train_epochs=1, # For demonstration, we'll use a small number of epochs
    logging_steps=100,
    save_strategy="epoch",
    eval_strategy="no", # Evaluation in DPO is less straightforward, skipping for simplicity
    beta=0.1, # Higher beta keeps the trained model closer to the reference model
    report_to="none", # Disable WandB logging
)

dpo_trainer = DPOTrainer(
    model=model_dpo,
    ref_model=ref_model_dpo, # Frozen reference model
    args=dpo_training_args,
    train_dataset=formatted_dpo_dataset, # Raw prompt/chosen/rejected text; recent trl versions tokenize it internally
    processing_class=tokenizer, # Older trl versions call this argument `tokenizer`
)

dpo_trainer.train() # Start DPO training


After training, let's generate text using the DPO fine-tuned model and see if it produces summaries that are more aligned with the preferences it learned.

In [None]:
def generate_dpo_text(prompt_text):
    inputs = tokenizer(prompt_text, return_tensors="pt").to(model_dpo.device)
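    # Sampling (or beam search) is required to return more than one sequence per prompt
    outputs = model_dpo.generate(**inputs, max_length=100, do_sample=True, num_return_sequences=2, pad_token_id=tokenizer.eos_token_id)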
    generated_texts = [tokenizer.decode(output, skip_special_tokens=True) for output in outputs]
    return generated_texts

prompt_for_dpo = "Summarize the following article about the benefits of exercise:" # A summarization prompt

print("\nGenerated Summaries from DPO Fine-tuned GPT-2:")
summaries = generate_dpo_text(prompt_for_dpo)
for summary in summaries:
    print(f"- {summary}")

TODO: To see the effect of DPO, compare these summaries with summaries generated for the same prompt by the original `distilgpt2` model (the one loaded in section 1, before any DPO training). Ideally, the DPO fine-tuned model's summaries are closer to the human preferences captured in the comparison dataset.

* Try different prompts for text generation with the base GPT-2 model and the DPO fine-tuned model. Observe how the outputs change.
* Try fine-tuning other models on other datasets for different tasks, such as text classification, question answering, or text summarization. You can find many datasets on the Hugging Face Hub ([https://huggingface.co/datasets](https://huggingface.co/datasets)).
* Experiment with other GPT-2 variants or different LLMs available on the Hugging Face Hub. Compare their performance and characteristics.
* Modify the training hyperparameters (learning rate, batch size, number of epochs, weight decay, beta in DPO) and observe how they affect the fine-tuning process and model performance.
* Investigate Parameter-Efficient Fine-Tuning (PEFT) techniques like LoRA (Low-Rank Adaptation) to fine-tune large models more efficiently, especially when resources are limited. The Hugging Face `peft` library (already installed) provides tools for this.
* Explore more advanced RLHF techniques and the theoretical underpinnings of preference learning and alignment.
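
As a starting point for the LoRA exercise above, the following back-of-the-envelope sketch shows why low-rank adapters are cheap. LoRA freezes a weight matrix of shape `d_in x d_out` and trains two small matrices of shapes `d_in x r` and `r x d_out` instead. The helper `lora_param_counts` is hypothetical, the rank of 8 is an arbitrary choice, and the 768 x 2304 shape is that of the fused attention projection in a GPT-2-small-sized block such as those in `distilgpt2`.

```python
def lora_param_counts(d_in, d_out, rank):
    """Trainable parameters for full fine-tuning vs. a rank-`rank` LoRA adapter on one matrix."""
    full = d_in * d_out            # every entry of the original weight matrix
    lora = rank * (d_in + d_out)   # the two low-rank factors: (d_in x rank) and (rank x d_out)
    return full, lora


full, lora = lora_param_counts(768, 2304, rank=8)
print(full, lora, f"{lora / full:.2%}")
```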