In this notebook we evaluate and then fine-tune an LLM on our dataset. Three sections prompt the model with zero examples (zero-shot), one example (one-shot), and a few examples (few-shot) so that we can compare performance.

In [1]:
from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


In [2]:
import os

os.chdir('/content/drive/MyDrive/Polimi/NLP/project')
os.getcwd()

'/content/drive/MyDrive/Polimi/NLP/project'

# Data upload and library imports

In [3]:
!pip install -q "transformers" "peft" "accelerate" "bitsandbytes" "trl" "safetensors" "tiktoken"

In [4]:
!pip install -q -U langchain
!pip install -q langchain-community langchain-core

In [5]:
!pip install -q --no-deps peft

In [6]:
!pip install -q lightning

In [7]:
import pandas as pd
import torch
from transformers import AutoTokenizer, Trainer, TrainingArguments, pipeline
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from sentence_transformers import SentenceTransformer, util
from peft import LoraConfig
from peft import prepare_model_for_kbit_training, get_peft_model
from datasets import Dataset
from torch.utils.data import DataLoader
import lightning as L
from torch.optim import AdamW
import torch.nn.functional as F
from huggingface_hub import login

In [8]:
# Run this on colab
train_set = pd.read_parquet("tuning_data.parquet")
test_set = pd.read_parquet("test_data.parquet")

In [None]:
# Run this on kaggle
train_set=pd.read_parquet("/kaggle/input/nlp-resources/tuning_data.parquet")
test_set=pd.read_parquet("/kaggle/input/nlp-resources/test_data.parquet")

In [9]:
login(token=os.environ["HF_TOKEN"])  # read the token from an environment variable instead of hard-coding it

# Loading the model

Here we load the model and its tokenizer.

In [10]:
model_id = "NousResearch/Llama-2-7b-chat-hf"

The model is too big for the available GPU memory, so we load a 4-bit quantized version.
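
To see why 4-bit loading matters, the weight memory can be estimated from the layer shapes that `model` prints further down (vocabulary 32000, hidden size 4096, MLP size 11008, 32 decoder layers). The sketch below is a back-of-the-envelope count, not a measurement. It assumes an untied `lm_head` with the same shape as the embedding (the printout is cut off before that layer), and it ignores quantization constants, activations, and any layers that stay in 16-bit.

```python
def llama_param_count(vocab=32000, hidden=4096, inter=11008, layers=32):
    """Count parameters from the shapes shown in the model printout."""
    embed = vocab * hidden                # embed_tokens
    attn = 4 * hidden * hidden            # q_proj, k_proj, v_proj, o_proj
    mlp = 3 * hidden * inter              # gate_proj, up_proj, down_proj
    norms = 2 * hidden                    # two RMSNorm weights per layer
    per_layer = attn + mlp + norms
    final_norm = hidden
    lm_head = vocab * hidden              # assumption: untied output head
    return embed + layers * per_layer + final_norm + lm_head


def weight_gib(n_params, bits_per_param):
    """Memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


n_params = llama_param_count()
fp16_gib = weight_gib(n_params, 16)
int4_gib = weight_gib(n_params, 4)
print(f"parameters: {n_params:,}")
print(f"16-bit weights: {fp16_gib:.2f} GiB, 4-bit weights: {int4_gib:.2f} GiB")
```

Under these assumptions the 16-bit weights alone need roughly 12.5 GiB, which leaves almost no room on a GPU of about 15 GiB, while the 4-bit weights need roughly 3.1 GiB.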

In [11]:
# BitsAndBytesConfig int-4 config
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)

In [12]:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config
)

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

In [13]:
model

LlamaForCausalLM(
  (model): LlamaModel(
    (embed_tokens): Embedding(32000, 4096, padding_idx=0)
    (layers): ModuleList(
      (0-31): 32 x LlamaDecoderLayer(
        (self_attn): LlamaAttention(
          (q_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (k_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (v_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
          (o_proj): Linear4bit(in_features=4096, out_features=4096, bias=False)
        )
        (mlp): LlamaMLP(
          (gate_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
          (up_proj): Linear4bit(in_features=4096, out_features=11008, bias=False)
          (down_proj): Linear4bit(in_features=11008, out_features=4096, bias=False)
          (act_fn): SiLU()
        )
        (input_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
        (post_attention_layernorm): LlamaRMSNorm((4096,), eps=1e-05)
      )
    )
    (norm): Lla

In [13]:
tokenizer = AutoTokenizer.from_pretrained(model_id)

## Settings

Here we set the generation parameters and build the function that generates an answer from a prompt.

In [15]:
generation_args = {
    "max_new_tokens": 3000,
    "return_full_text": False,
    "temperature": 0.1,
    "do_sample": True,
}
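
The effect of `"temperature": 0.1` with `"do_sample": True` can be illustrated with a small sketch. It uses made-up logits (an assumption for illustration only, not values from the model) and shows how dividing the logits by a low temperature concentrates the sampling distribution on the top token, so sampling behaves almost greedily.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into sampling probabilities at a given temperature."""
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


toy_logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
probs_default = softmax_with_temperature(toy_logits, 1.0)
probs_low = softmax_with_temperature(toy_logits, 0.1)
print("T=1.0:", [round(p, 4) for p in probs_default])
print("T=0.1:", [round(p, 4) for p in probs_low])
```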

In [16]:
qa_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer)

Device set to use cuda:0


In [17]:
def chatbot(prompt):
    output = qa_pipeline(prompt, **generation_args)
    return output[0]['generated_text']

We will use a SentenceTransformer to give a score to the answers generated by our model.

In [18]:
# Load the semantic embedding model once
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')


In [19]:
def compute_similarity_score(candidate_answer, target_answer):
    """
    Given a single candidate answer and a target answer,
    return the semantic similarity score between them.
    """
    # Encode the target and the candidate
    target_embedding = embedding_model.encode(target_answer, convert_to_tensor=True)
    candidate_embedding = embedding_model.encode(candidate_answer, convert_to_tensor=True)

    # Compute cosine similarity
    similarity = util.cos_sim(target_embedding, candidate_embedding).item()

    return similarity
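
For reference, the score returned above is a cosine similarity between two embedding vectors. The sketch below computes the same quantity in plain Python on toy vectors (the vectors are illustrative assumptions, not real sentence embeddings).

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Parallel vectors score 1, orthogonal vectors 0, opposite vectors -1
print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))
```

Because the score depends only on direction, a long generated answer that covers the same content as a short target can still score close to 1.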

# Zero-shot performance

We begin with the zero-shot LLM.

## Test of the model

Now we check how the model performs question answering without any example. We print the answers for the first 5 samples of the test set.

In [None]:
n = 5

for i, row in test_set.head(n).reset_index(drop=True).iterrows():
    question = row['question']
    context = row['context']
    target_answer = row['answer']

    prompt = (
        "[INST] <<SYS>>\nYou are a helpful assistant.\n<</SYS>>\n\n"
        f"Question: {question}\nContext: {context}\nAnswer: [/INST]"
    )

    # Generate the answer
    result = chatbot(prompt)

    # Compute similarity between generated and target answer
    score = compute_similarity_score(result, target_answer)

    # Output
    print(f"Example {i+1}:")
    print("Question:", question)
    print("\nTarget answer:", target_answer)
    print("\nGenerated answer:", result)
    print("\nSimilarity score:", score)
    print("-" * 40)


Example 1:
Question: What are the criticisms of the movie "Pitch Perfect" according to the review?

Target answer: The review criticizes "Pitch Perfect" for being more annoying than entertaining, encouraging sexual promiscuity, having ludicrous setup scenes, featuring irritating characters, and containing only a few worth-watching production numbers. It also criticizes the performances of Brittany Snow and Anna Kendrick, and the direction of Jason Moore. The review also finds the premise of the movie, which is based on a book by Mickey Rapkin, to be flimsy.

Generated answer: According to the review, the criticisms of the movie "Pitch Perfect" include:

1. Lack of entertainment value: The reviewer found the movie to be more annoying than entertaining, with few production numbers worth watching.
2. Promotion of sexual promiscuity: The reviewer argues that the movie's portrayal of sexual promiscuity among teenagers is irresponsible and deplorable, particularly since it is aimed at a youn


Example 2:
Question: Who was John Sinclair and what was his contribution to California in the 1800s?

Target answer: John Sinclair was among the earliest Anglo-Saxon adventurers to California. He emigrated to this country fifteen years ago and was a member of the pioneer company of Mr. Graham. He obtained a grant of land from the Mexican Government, raised his lone cot in the then unsettled valley of the Sacramento, and turned the first furrow in that wild region. He was one of the first to adapt to the purposes of agriculture the rich waste surrounding him, and encounter the dangers of hardships, and endure the privations attendant upon its settlement. During his residence in California, he had enjoyed offices of trust.

Generated answer: John Sinclair was an early Anglo-Saxon settler in California during the 1800s. He emigrated to California in 1834, fifteen years before his death, and was one of the first pioneers to settle in the Sacramento Valley. Sinclair was granted land by the 


Example 3:
Question: What were the two flagship programs completed by EcoKnights in 2015?

Target answer: The two flagship programs completed by EcoKnights in 2015 were the 8th Kuala Lumpur Eco Film Festival (KLEFF) and the 7th Anugerah Hijau (Green Awards) competition.

Generated answer: According to the November 2015 issue of the EcoKnights newsletter, the two flagship programs completed by EcoKnights in 2015 were:

1. The 8th Kuala Lumpur Eco Film Festival (KLEFF)
2. The 7th Anugerah Hijau (Green Awards) competition.

Similarity score: 0.9532694816589355
----------------------------------------



Example 4:
Question: What was the author's main concern about the Girl Guides' decision to replace the promise to God with a promise to be 'true to myself'?

Target answer: The author's main concern about the Girl Guides' decision to replace the promise to God with a promise to be 'true to myself' was that it orients the girls' fulfilment towards themselves and not toward others.

Generated answer: The author's main concern about the Girl Guides' decision to replace the promise to God with a promise to be "true to myself" is that it orients the girls' fulfillment towards themselves rather than towards others. The author believes that this approach undermines the weight of the promise for those who believe in a god, and suggests that the new promise is too vague and nonsensical. The author also argues that the maintenance of the Queen in the promise somehow undermines the attempt to make it a promise for all, and that the change to the promise is not inclusive but rather hides behind pu


Example 5:
Question: What are the five ways to attract the best intern talent?

Target answer: The five ways to attract the best intern talent are: start early, get personal, be straightforward, add value, and cultivate a program.

Generated answer: Great, I'd be happy to help you with that! Here are five ways to attract the best intern talent:

1. Start Early: Advertise your internship up to three months before your target start date. This will give you a larger pool of candidates to choose from and also allow you to attract more proactive students.
2. Get Personal: Make the application and interview processes as friendly and personal as possible. This will help you gauge the personalities of your intern candidates and find the best fit for your organization.
3. Be Straightforward: Be honest and concise in your job descriptions and throughout the interview period. Don't sensationalize your internship, and be clear about the responsibilities and what the intern will take away from the 

# One-shot performance

In this section we give the model a single solved example in the prompt before asking the test question.

In [21]:
EOS_TOKEN = tokenizer.eos_token

# Prompt template matching LLaMA-2 chat format
llama_prompt = """<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

Context: {context}

Question: {question}
[/INST]
{answer}"""

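def format_prompt(context, question, answer=None):
    # Without an answer, return an open prompt for generation. No EOS token is
    # appended, otherwise the prompt would already mark the sequence as finished.
    if answer is None:
        return llama_prompt.format(context=context, question=question, answer="")
    # With an answer, close the solved example with the EOS token
    return llama_prompt.format(context=context, question=question, answer=answer) + EOS_TOKEN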

In [None]:
one_shot_example = train_set.iloc[0]
one_shot_prompt = format_prompt(
    one_shot_example["context"],
    one_shot_example["question"],
    one_shot_example["answer"]
)

In [None]:
n = 5

for i, row in test_set.head(n).reset_index(drop=True).iterrows():
    question = row['question']
    context = row['context']
    target_answer = row['answer']

    # Format current test prompt WITHOUT answer
    test_prompt = format_prompt(context, question, answer=None)

    # Combine one-shot example + current test prompt
    full_prompt = one_shot_prompt + "\n" + test_prompt

    # Generate the answer from the full prompt (one-shot example + test question)
    result = chatbot(full_prompt)

    # Compute similarity
    score = compute_similarity_score(result, target_answer)

    print(f"Example {i+1}:")
    print("Question:", question)
    print("\nTarget answer:", target_answer)
    print("\nGenerated answer:", result)
    print("\nSimilarity score:", score)
    print("-" * 40)


Example 1:
Question: What are the criticisms of the movie "Pitch Perfect" according to the review?

Target answer: The review criticizes "Pitch Perfect" for being more annoying than entertaining, encouraging sexual promiscuity, having ludicrous setup scenes, featuring irritating characters, and containing only a few worth-watching production numbers. It also criticizes the performances of Brittany Snow and Anna Kendrick, and the direction of Jason Moore. The review also finds the premise of the movie, which is based on a book by Mickey Rapkin, to be flimsy.

Generated answer: 
The criticisms of the movie "Pitch Perfect" according to the review are:

1. The movie is too violent and cold-bloodedly violent.
2. The film encourages sexual promiscuity by making light of it.
3. The setup scenes are ludicrous and feature a campus where singing teams hang around the campus and break out into song with the least possible provocation.
4. Some of the characters are ludicrous, including an Asian gi


Example 2:
Question: Who was John Sinclair and what was his contribution to California in the 1800s?

Target answer: John Sinclair was among the earliest Anglo-Saxon adventurers to California. He emigrated to this country fifteen years ago and was a member of the pioneer company of Mr. Graham. He obtained a grant of land from the Mexican Government, raised his lone cot in the then unsettled valley of the Sacramento, and turned the first furrow in that wild region. He was one of the first to adapt to the purposes of agriculture the rich waste surrounding him, and encounter the dangers of hardships, and endure the privations attendant upon its settlement. During his residence in California, he had enjoyed offices of trust.

Generated answer: John Sinclair was an early Anglo-Saxon settler in California during the 1800s. He was one of the first to settle in the then unsettled valley of the Sacramento River and was among the first to adapt the surrounding land to agriculture. He was known f


Example 3:
Question: What were the two flagship programs completed by EcoKnights in 2015?

Target answer: The two flagship programs completed by EcoKnights in 2015 were the 8th Kuala Lumpur Eco Film Festival (KLEFF) and the 7th Anugerah Hijau (Green Awards) competition.

Generated answer: The two flagship programs completed by EcoKnights in 2015 were the 8th Kuala Lumpur Eco Film Festival (KLEFF) and the 7th Anugerah Hijau (Green Awards) competition.</s>

Similarity score: 0.9949971437454224
----------------------------------------



Example 4:
Question: What was the author's main concern about the Girl Guides' decision to replace the promise to God with a promise to be 'true to myself'?

Target answer: The author's main concern about the Girl Guides' decision to replace the promise to God with a promise to be 'true to myself' was that it orients the girls' fulfilment towards themselves and not toward others.

Generated answer: The author's main concern about the Girl Guides' decision to replace the promise to God with a promise to be "true to myself" is that it orients the girls' fulfillment towards themselves and not towards others. The author believes that this approach undermines the weight of the promise for those who base their commitment to do good and serve on a humanist premise, and that it may make some feel more included but lessens the promise for those whose belief in a god is the orienting perspective of their life.

Similarity score: 0.9425569772720337
----------------------------------------



Example 5:
Question: What are the five ways to attract the best intern talent?

Target answer: The five ways to attract the best intern talent are: start early, get personal, be straightforward, add value, and cultivate a program.

Generated answer: The five ways to attract the best intern talent are:

1. Start early: Advertise the internship up to three months before the target start date to attract more total candidates and find the best fit for your organization.
2. Get personal: Make the internship application and interview processes as friendly and personal as possible to gauge the personalities of the intern candidates and find the best fit.
3. Be straightforward: Be concise and honest in job descriptions and throughout the interview period to attract the right candidates.
4. Add value: Offer a reasonable pay rate, in-depth training and education, and make the program appealing to attract more interested and energetic candidates.
5. Cultivate a program: Create an entire internshi

# Few-shot performance

Finally, we give the model a few solved examples and see how much it improves over zero-shot and one-shot.

We build the prompt containing the few-shot examples.

In [19]:
def build_few_shot_prompt(train_set, format_prompt, n_shots=2):
    # Use the first n_shots training rows as solved demonstrations
    few_shot_examples = train_set.iloc[:n_shots]

    few_shot_prompts = [
        format_prompt(row["context"], row["question"], row["answer"])
        for _, row in few_shot_examples.iterrows()
    ]

    # Return only the solved demonstrations. The caller appends the test
    # question, so no unanswered training question is left in the prompt.
    return "\n\n".join(few_shot_prompts)

In [22]:
few_shot_prompt = build_few_shot_prompt(train_set, format_prompt)

In [23]:
n = 5

for i, row in test_set.head(n).reset_index(drop=True).iterrows():
    question = row['question']
    context = row['context']
    target_answer = row['answer']

    # Format current test prompt WITHOUT answer
    test_prompt = format_prompt(context, question, answer=None)

    # Combine few-shot example + current test prompt
    full_prompt = few_shot_prompt + "\n" + test_prompt

    result = chatbot(full_prompt)

    # Compute similarity
    score = compute_similarity_score(result, target_answer)

    print(f"Example {i+1}:")
    print("Question:", question)
    print("\nTarget answer:", target_answer)
    print("\nGenerated answer:", result)
    print("\nSimilarity score:", score)
    print("-" * 40)

This is a friendly reminder - the current text generation call will exceed the model's predefined maximum length (4096). Depending on the model, you may observe exceptions, performance degradation, or nothing at all.


KeyboardInterrupt: 
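
The length warning above indicates that the sequence (prompt plus generated tokens) reached the 4096-token limit. A quick budget check, sketched below, shows how little room the few-shot prompt has when `max_new_tokens` is 3000. The helper names are hypothetical, and the prompt token count would have to come from the tokenizer; the numbers passed in here are illustrative.

```python
CONTEXT_WINDOW = 4096   # limit quoted in the warning
MAX_NEW_TOKENS = 3000   # value set in generation_args


def prompt_budget(context_window=CONTEXT_WINDOW, max_new_tokens=MAX_NEW_TOKENS):
    """Largest prompt (in tokens) that still leaves room for a full generation."""
    return context_window - max_new_tokens


def fits(prompt_token_count, context_window=CONTEXT_WINDOW, max_new_tokens=MAX_NEW_TOKENS):
    """True if prompt plus the maximum generation stays inside the window."""
    return prompt_token_count <= prompt_budget(context_window, max_new_tokens)


print("prompt budget:", prompt_budget())
print("1000-token prompt fits:", fits(1000))
print("2500-token prompt fits:", fits(2500))
print("2500-token prompt fits with 512 new tokens:", fits(2500, max_new_tokens=512))
```

With several full contexts in one prompt, lowering `max_new_tokens` or shortening the contexts are the two obvious ways to stay inside the window.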

# Fine-tuning

We use a LoRA adapter because the model is too large to fine-tune in full on our hardware.

In [14]:
adapter_configs = {
    'target_modules': 'all-linear',
    'lora_alpha': 16,
    'lora_dropout': 0.1,
    'r': 16,
    'bias': 'none',
    'task_type': 'CAUSAL_LM'
}

lora_configs = LoraConfig(**adapter_configs)
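
As a rough check of how small the adapter is, the sketch below counts the LoRA parameters implied by this configuration. It assumes that `'all-linear'` adapts the seven linear layers of each decoder layer shown in the model printout and leaves the output head alone, and that each adapted `Linear(in, out)` gains two matrices of shapes `r x in` and `out x r`.

```python
R = 16        # LoRA rank from adapter_configs
LAYERS = 32   # decoder layers in the model printout

# (in_features, out_features) of the linear layers in one decoder layer
LINEAR_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 4096),
    "v_proj": (4096, 4096),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 11008),
    "up_proj": (4096, 11008),
    "down_proj": (11008, 4096),
}


def lora_params(in_features, out_features, r=R):
    """Parameters in the two low-rank matrices added to one linear layer."""
    return r * (in_features + out_features)


def total_lora_params(shapes=LINEAR_SHAPES, layers=LAYERS, r=R):
    """Trainable adapter parameters across all decoder layers."""
    per_layer = sum(lora_params(i, o, r) for i, o in shapes.values())
    return layers * per_layer


print(f"trainable LoRA parameters: {total_lora_params():,}")
```

Under these assumptions the adapter has about 40 million trainable parameters, well under 1% of the base model.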

In [27]:
prepared_model_4bit = prepare_model_for_kbit_training(model)
model2 = get_peft_model(prepared_model_4bit, lora_configs)

In [22]:
# Prompt format for LLaMA-2 chat models
llama_prompt = """<s>[INST] <<SYS>>
You are a helpful assistant.
<</SYS>>

Context: {context}

Question: {question}
[/INST]
{answer}"""  # No </s> here; will be added with EOS_TOKEN

EOS_TOKEN = tokenizer.eos_token  # Should be "</s>"

def formatting_prompts_func(examples):
    contexts  = examples["context"]
    questions = examples["question"]
    answers   = examples["answer"]
    texts = []
    for context, question, answer in zip(contexts, questions, answers):
        # Format the prompt and add the EOS token to ensure generation stops
        text = llama_prompt.format(context=context, question=question, answer=answer) + EOS_TOKEN
        texts.append(text)
    return { "text": texts }

train_dataset = Dataset.from_pandas(train_set)
formatted_train_set = train_dataset.map(formatting_prompts_func, batched=True)

Map:   0%|          | 0/9600 [00:00<?, ? examples/s]

In [23]:
tokenizer.pad_token = tokenizer.eos_token

def collate(mini_batch):
    input_encodings = tokenizer([sample['text'] for sample in mini_batch], return_tensors='pt', padding=True)
    labels = input_encodings.input_ids.clone()
    labels[~input_encodings.attention_mask.bool()] = -100

    return input_encodings, labels

data_loader = DataLoader(
    formatted_train_set, collate_fn=collate, shuffle=True, batch_size=1
)

In [24]:
class LightningWrapper(L.LightningModule):
    def __init__(self, model, tokeniser, lr=1.e-4):
        super().__init__()
        self._model = model
        self._tokeniser = tokeniser
        self._lr = lr

    def configure_optimizers(self):
        # Build optimiser
        optimiser = AdamW(self.parameters(), lr=self._lr)

        return optimiser

    def forward(self, *args, **kwargs):
        return self._model.forward(*args, **kwargs)

    def training_step(self, mini_batch, mini_batch_idx):
        # Unpack the encoding and the target labels
        input_encodings, labels = mini_batch
        # Run generic forward step
        output = self.forward(**input_encodings)
        # Get the logits
        logits: torch.Tensor = output.logits
        # Drop the last position: it has no next token to predict
        logits = logits[..., :-1, :].contiguous()
        # Drop the first label so that position t is scored against token t+1
        labels = labels[..., 1:].contiguous()
        # Compute the LM loss token-wise (labels equal to -100 are ignored)
        loss: torch.Tensor = F.cross_entropy(logits.view(-1, logits.size(-1)), labels.view(-1))

        return loss

lightning_model = LightningWrapper(model2, tokenizer)
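
The shift in `training_step` can be easier to follow on a toy case. The sketch below re-implements the same idea in plain Python for a single sequence: the scores at position t are compared with the token at position t+1, and labels equal to -100 are skipped, mirroring the default `ignore_index` of `F.cross_entropy`. The function name and inputs are illustrative assumptions.

```python
import math

IGNORE_INDEX = -100  # same value used to mask padding in collate


def shifted_lm_loss(logits, labels):
    """Mean next-token cross-entropy for one sequence.

    logits: list of per-position score lists (one score per vocabulary entry)
    labels: list of token ids, with IGNORE_INDEX at masked positions
    """
    total, count = 0.0, 0
    # Position t predicts token t+1: drop the last logits and the first label
    for scores, target in zip(logits[:-1], labels[1:]):
        if target == IGNORE_INDEX:
            continue
        top = max(scores)
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total += log_norm - scores[target]
        count += 1
    return total / count


# Uniform scores over a 4-token vocabulary: every prediction costs ln(4)
toy_logits = [[0.0, 0.0, 0.0, 0.0] for _ in range(3)]
toy_labels = [1, 2, IGNORE_INDEX]
print(shifted_lm_loss(toy_logits, toy_labels), math.log(4))
```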

In [25]:
trainer = L.Trainer(
    accumulate_grad_batches=8,
    precision='bf16-mixed',  # Mixed precision (bf16-mixed or 16-mixed)
    gradient_clip_val=1.0,  # Gradient clipping
    max_epochs=1
)
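
With `batch_size=1` in the data loader and `accumulate_grad_batches=8`, each optimizer step averages gradients over 8 examples. A small sketch of the resulting step count, assuming the 9600 training examples reported by the `Map` output above and a single device:

```python
import math


def optimizer_steps(n_examples, batch_size, accumulate_grad_batches, epochs=1):
    """Number of optimizer updates for a given accumulation setting."""
    batches_per_epoch = math.ceil(n_examples / batch_size)
    steps_per_epoch = math.ceil(batches_per_epoch / accumulate_grad_batches)
    return steps_per_epoch * epochs


effective_batch_size = 1 * 8
print("effective batch size:", effective_batch_size)
print("optimizer steps:", optimizer_steps(9600, batch_size=1, accumulate_grad_batches=8))
```

Accumulation changes how often the weights are updated, not the memory needed for a single forward and backward pass, so it does not by itself avoid an out-of-memory error on long sequences.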

INFO: Using bfloat16 Automatic Mixed Precision (AMP)
INFO: Using default `ModelCheckpoint`. Consider installing `litmodels` package to enable `LitModelCheckpoint` for automatic upload to the Lightning model registry.
INFO: GPU available: True (cuda), used: True
INFO: TPU available: False, using: 0 TPU cores
INFO: HPU available: False, using: 0 HPUs


In [26]:
trainer.fit(lightning_model, train_dataloaders=data_loader)

INFO: LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]


OutOfMemoryError: CUDA out of memory. Tried to allocate 22.00 MiB. GPU 0 has a total capacity of 14.74 GiB of which 18.12 MiB is free. Process 85386 has 14.72 GiB memory in use. Of the allocated memory 14.28 GiB is allocated by PyTorch, and 324.48 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

In [None]:
torch.save(model.state_dict(), 'model_Llama.pth')