<a href="https://colab.research.google.com/github/nicolaiberk/llm_ws/blob/main/notebooks/06a_rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Informed Prompting

In an earlier session, we explored how to query generative models and how to enrich these queries with examples (or 'context') in one- or few-shot prompts. In those cases, we provided the *same* context regardless of the query. Today, we will see that model responses can be substantially improved by carefully selecting the context provided to the model.

> ❗ ACTIVATE THE GPU BY SELECTING RUNTIME IN THE UPPER RIGHT > CONNECT TO RUNTIME > T4 GPU

In [36]:
!pip install sentence_transformers datasets faiss-gpu-cu12 transformers torch



> ❗ RESTART THE NOTEBOOK (DROPDOWN NEXT TO RUN ALL > RESTART SESSION)

The [sentence-transformers](https://sbert.net/) library provides an ecosystem of models designed specifically for efficient embedding generation. It works very similarly to `transformers`:

In [37]:
from sentence_transformers import SentenceTransformer, CrossEncoder
import torch

# Check for GPU availability and set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

Using device: cuda


We load a pretrained model:

In [38]:
similarity_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2").to(device)

Then we define some sentences of interest:

In [68]:
sentences = [
    "The Great Wall of China was built over several dynasties, with most of the existing structure dating from the Ming Dynasty (1368-1644).",
    "Polar bears and Kodiak bears are the world's largest land carnivores, with most adult males weighing 300-600 kg 660-1320 lb; adult females are about half the size of males.",
    "Studies show that the Dunning-Kruger effect causes people with low ability in a domain to overestimate their competence in that area.",
]

And encode them as embeddings:

In [69]:

# 2. Calculate embeddings by calling model.encode()
embeddings = similarity_model.encode(sentences)
print(embeddings.shape)

(3, 384)


We can then calculate the cosine similarity of the sentences with each other:

In [70]:
# 3. Calculate the embedding similarities
similarities = similarity_model.similarity(embeddings, embeddings)
print(similarities)

tensor([[ 1.0000, -0.0065, -0.0810],
        [-0.0065,  1.0000, -0.0898],
        [-0.0810, -0.0898,  1.0000]])
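
As a rough illustration of what the similarity call computes here, cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below uses plain Python and made-up three-dimensional vectors; the helper name `cosine_similarity` and the toy values are illustrative and not part of the library:

```python
import math

def cosine_similarity(u, v):
    """Dot product of u and v divided by the product of their norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy vectors standing in for sentence embeddings (illustrative values)
a = [1.0, 0.0, 0.0]
b = [2.0, 0.0, 0.0]  # same direction as a, different length
c = [0.0, 3.0, 0.0]  # orthogonal to a

print(cosine_similarity(a, b))  # same direction -> 1.0
print(cosine_similarity(a, c))  # orthogonal -> 0.0
```

Because the length of each vector is divided out, only the direction matters, which is why each sentence has a similarity of exactly 1 with itself on the diagonal above.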


## Similarity Search

This is particularly useful when we search for something using a query:

In [92]:
query = "Are there other bears as large as polar bears??"
query_embedding = similarity_model.encode([query])
similarities = similarity_model.similarity(query_embedding, embeddings)
print(similarities)

tensor([[ 0.0132,  0.6603, -0.0465]])


Looks good! Now we can select the most similar sentence to add to the prompt as context:

In [93]:
best_index = similarities.squeeze().argmax().item() # get the index of the highest similarity
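
With more candidates, one might want several passages rather than a single best match, and only those that are similar enough. A minimal sketch in plain Python follows; the helper `top_k_indices` and the cut-off of 0.3 are arbitrary assumptions for illustration:

```python
def top_k_indices(scores, k=2, min_score=0.3):
    """Return indices of the k highest scores that reach min_score, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:k] if scores[i] >= min_score]

# Similarity scores copied from the query example above
scores = [0.0132, 0.6603, -0.0465]
print(top_k_indices(scores))  # [1]
```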

We can now add this context to our query, providing the relevant information to our model:



In [94]:
prompt = [
    {"role": "system", "content": "Answer the Question."},
    {"role": "user", "content": query},
    {"role": "system", "content": "Context: " + sentences[best_index]}
]
print(prompt)

[{'role': 'system', 'content': 'Answer the Question.'}, {'role': 'user', 'content': 'Are there other bears as large as polar bears??'}, {'role': 'system', 'content': "Context: Polar bears and Kodiak bears are the world's largest land carnivores, with most adult males weighing 300-600 kg 660-1320 lb; adult females are about half the size of males."}]


Let's provide this prompt to the model and see how it responds (it will take a moment to load the model):

In [95]:
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
import torch

tokenizer = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct")
model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM2-1.7B-Instruct", dtype=torch.float16).to(device)

In [96]:
inputs = tokenizer.apply_chat_template(
    prompt,
    add_generation_prompt=True,
    padding=True,
    return_dict=True,  # retains attention mask
    return_tensors="pt",  # returns tensors
).to(model.device)  # more efficient to put on device

In [97]:
output = model.generate(**inputs, max_new_tokens=100)

In [98]:
tokenizer.decode(output[0])

"<|im_start|>system\nAnswer the Question.<|im_end|>\n<|im_start|>user\nAre there other bears as large as polar bears??<|im_end|>\n<|im_start|>system\nContext: Polar bears and Kodiak bears are the world's largest land carnivores, with most adult males weighing 300-600 kg 660-1320 lb; adult females are about half the size of males.<|im_end|>\n<|im_start|>assistant\nYes, there are other large land carnivores that are similar in size to polar bears and Kodiak bears. Some examples include:\n\n1. Brown bears (Ursus arctos): Also known as grizzly bears, brown bears are the second-largest land carnivores after polar bears. They can weigh between 130-550 kg (290-1210 lb) and reach lengths of up to 3.5 meters (11.5 feet)."

## Retrieval-Augmented Generation

This is, of course, more useful when you have a larger set of information from which to select the context. Let's therefore get a mini version of Wikipedia content to retrieve the relevant context from. This data is conveniently available on the Hugging Face Hub:

In [99]:
from datasets import load_dataset

dataset = load_dataset("rag-datasets/rag-mini-wikipedia", "text-corpus")

As you can see below, the data consists of different text passages from Wikipedia articles:

In [100]:
dataset['passages'][1234]

{'passage': 'The ears are also used in certain displays of aggression and during the males\' mating period. If an elephant wants to intimidate a predator or rival, it will spread its ears out wide to make itself look more massive and imposing. During the breeding season, males give off an odour from a gland located behind their eyes. Joyce Poole, a well-known elephant researcher, has theorized that the males will fan their ears in an effort to help propel this "elephant cologne" great distances.',
 'id': 1235}

Let's clean this corpus up a little and encode all texts as embeddings. We start by writing a cleaning function that strips unusual characters and normalizes whitespace:

In [114]:
import re

## cleanup function
def clean_text(example):
    text = example["passage"]
    text = re.sub(r"[^a-zA-Z0-9\s.,!?;:'/\"-]", "", text)  # remove weird chars
    text = re.sub(r"\s+", " ", text).strip()  # normalize spaces
    example["passage"] = text
    return example
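
To see what the two substitutions do, here is a standalone sketch that applies the same regular expressions to a made-up string. The helper name `clean_passage` and the sample text are illustrative. Note that removed characters are dropped without a replacement, so neighbouring tokens can merge:

```python
import re

def clean_passage(text):
    """Apply the same two substitutions as clean_text to a plain string."""
    text = re.sub(r"[^a-zA-Z0-9\s.,!?;:'/\"-]", "", text)  # drop characters outside the allow-list
    text = re.sub(r"\s+", " ", text).strip()  # collapse whitespace runs to single spaces
    return text

# Made-up sample with an accent, parentheses, an en dash, a newline and a hash sign
sample = "Café  (1368–1644)\n was #1!"
print(clean_passage(sample))  # Caf 13681644 was 1!
```

The ASCII hyphen is on the allow-list, so a range written as `1368-1644` would survive, while the en dash in the sample does not.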

And apply it to our texts:

In [115]:
dataset = dataset.map(clean_text)

Map:   0%|          | 0/3197 [00:00<?, ? examples/s]

Lastly, we remove empty texts:

In [116]:
dataset = dataset.filter(lambda example: example["passage"].strip() != "")

Filter:   0%|          | 0/3197 [00:00<?, ? examples/s]

Now we can use the embedding model from above to generate the embeddings:

In [117]:
corpus_embeddings = similarity_model.encode([text for text in dataset["passages"]['passage']], convert_to_tensor=True).cpu().numpy()

In [118]:
corpus_embeddings.shape # we get our vectors

(3197, 384)

We then use a library called [`faiss`](https://github.com/facebookresearch/faiss) for fast search through our vectors. This is especially important when the context dataset is large.

In [119]:
import faiss

# FAISS index
index = faiss.IndexFlatL2(corpus_embeddings.shape[1])
index.add(corpus_embeddings)
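
Note that `IndexFlatL2` ranks by squared Euclidean distance rather than cosine similarity. Assuming the embeddings are normalized to unit length (this can be enforced by passing `normalize_embeddings=True` to `encode`), both give the same ordering, because for unit vectors the squared distance equals 2 - 2 * cosine. A small plain-Python check with made-up vectors (all names and values here are illustrative):

```python
import math

def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_l2(u, v):
    """Squared Euclidean distance between u and v."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def dot(u, v):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))

query_vec = normalize([1.0, 2.0, 0.5])
docs = [normalize(v) for v in ([1.0, 1.9, 0.4], [-1.0, 0.2, 0.3], [0.5, 0.5, 2.0])]

# Smallest distance first vs. largest cosine first
by_l2 = sorted(range(len(docs)), key=lambda i: squared_l2(query_vec, docs[i]))
by_cos = sorted(range(len(docs)), key=lambda i: dot(query_vec, docs[i]), reverse=True)
print(by_l2 == by_cos)  # True
```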

In [120]:
query_embedding = similarity_model.encode([query], convert_to_tensor=True).cpu().numpy()

In [121]:
# Retrieve top-k from FAISS
D, I = index.search(query_embedding, k=5)
retrieved_docs = [dataset['passages'][int(idx)]['passage'] for idx in I[0]]

In [122]:
context = '\n Context passage: '.join(retrieved_docs)
print(context)

Polar bears rank with the Kodiak bear as among the largest living land carnivores, and male polar bears may weigh twice as much as a Siberian tiger. Most adult males weigh 350 650 kg 770 1500 lb and measure 2.5 3.0 m 8.2 9.8 ft in length. Adult females are roughly half the size of males and normally weigh 150 250 kg 330 550 lb, measuring 2 2.5 m 6.6 8.2 ft, but double their weight during pregnancy. Stirling makes no mention of length, these are from SeaWorld The great difference in body size makes the polar bear among the most sexually dimorphic of mammals, surpassed only by the eared seals. At birth, cubs weigh only 600 700 g or about a pound and a half. The largest polar bear on record was a huge male, allegedly weighing 1002 kg 2200 lb shot at Kotzebue Sound in northwestern Alaska in 1960.
 Context passage: The polar bear Ursus maritimus is a bear native to the Arctic. Polar bears and Kodiak bears are the world's largest land carnivores, with most adult males weighing 300-600 kg 660

In [123]:
prompt = [
    {"role": "system", "content": "Answer the Question. If no relevant information is provided in the context, respond with 'I cannot answer this question based on the provided context'."},
    {"role": "user", "content": query},
    {"role": "context", "content": context}
]
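
As a possible variation (not the approach used in this notebook), the passages could be numbered individually and placed inside the user turn, so that only the `system` and `user` roles are needed. The helper `build_rag_prompt`, its wording, and the toy passages are assumptions for illustration:

```python
def build_rag_prompt(query, passages):
    """Build chat messages with numbered context passages inside the user turn."""
    context = "\n".join(
        f"Context passage {i}: {text}" for i, text in enumerate(passages, start=1)
    )
    return [
        {"role": "system", "content": "Answer the question using only the context passages."},
        {"role": "user", "content": f"{context}\n\nQuestion: {query}"},
    ]

# Toy passages standing in for the retrieved documents
messages = build_rag_prompt(
    "Are there other bears as large as polar bears?",
    ["Kodiak bears rival polar bears in size.", "The polar bear is native to the Arctic."],
)
print(messages[1]["content"])
```

Whether this layout works better than a separate context message depends on the model and its chat template, so it is worth comparing both.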

Tokenize the chat template and provide it to the model:

In [124]:
inputs = tokenizer.apply_chat_template(
    prompt,
    add_generation_prompt=True,
    padding=True,
    return_dict=True,  # retains attention mask
    return_tensors="pt",  # returns tensors
).to(model.device)  # more efficient to put on device

In [125]:
output = model.generate(**inputs, max_new_tokens=1000)

In [126]:
print(tokenizer.decode(output[0]))

<|im_start|>system
Answer the Question. If no relevant information is provided in the context, respond with 'I cannot answer this question based on the provided context'.<|im_end|>
<|im_start|>user
Are there other bears as large as polar bears??<|im_end|>
<|im_start|>context
Polar bears rank with the Kodiak bear as among the largest living land carnivores, and male polar bears may weigh twice as much as a Siberian tiger. Most adult males weigh 350 650 kg 770 1500 lb and measure 2.5 3.0 m 8.2 9.8 ft in length. Adult females are roughly half the size of males and normally weigh 150 250 kg 330 550 lb, measuring 2 2.5 m 6.6 8.2 ft, but double their weight during pregnancy. Stirling makes no mention of length, these are from SeaWorld The great difference in body size makes the polar bear among the most sexually dimorphic of mammals, surpassed only by the eared seals. At birth, cubs weigh only 600 700 g or about a pound and a half. The largest polar bear on record was a huge male, allegedly 

...and this is how a chatbot works!

Note that simple similarity is often not sufficient to retrieve the relevant documents. It works well for annotation tasks (where the context should provide the most relevant examples and their associated labels), but in retrieval tasks a query may differ substantially from the text that contains the answer. In such cases, [cross-encoders](https://sbert.net/examples/cross_encoder/applications/README.html) are used to decide which of the texts retrieved based on similarity are most relevant.
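
The re-ranking step can be sketched as follows. To keep the example runnable without downloading a model, the scorer is a toy word-overlap function; the helper names `rerank` and `word_overlap_scores` are hypothetical, and a real cross-encoder would take the scorer's place:

```python
def rerank(query, passages, score_pairs, top_k=3):
    """Re-order passages by a relevance score computed on (query, passage) pairs."""
    pairs = [(query, passage) for passage in passages]
    scores = score_pairs(pairs)
    ranked = sorted(zip(passages, scores), key=lambda item: item[1], reverse=True)
    return [passage for passage, _ in ranked[:top_k]]

def word_overlap_scores(pairs):
    """Toy stand-in for a cross-encoder: counts words shared by query and passage."""
    return [len(set(q.lower().split()) & set(p.lower().split())) for q, p in pairs]

# Toy candidates standing in for the passages retrieved by similarity search
candidates = [
    "Elephants spread their ears to look more imposing.",
    "Kodiak bears are as large as polar bears.",
    "The Great Wall was built over several dynasties.",
]
best = rerank("Are there bears as large as polar bears?", candidates, word_overlap_scores, top_k=1)
print(best)  # ['Kodiak bears are as large as polar bears.']
```

In the notebook, `score_pairs` could be the `predict` method of a `CrossEncoder` (imported at the top), for example `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2").predict` applied to `retrieved_docs`. That model name is an assumption, not something this notebook specifies.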