# Retrieval Augmented Generation Workshop

<span style="text-transform: uppercase;
        font-size: 14px;
        letter-spacing: 1px;
        font-family: 'Segoe UI', sans-serif;">
    Author
</span><br>
efrén cruz cortés
<hr style="border: none; height: 1px; background: linear-gradient(to right, transparent 0%, #ccc 10%, transparent 100%); margin-top: 10px;">

## Section 2 - Adding Generation

OK, we are done with the R of RAG. Time to get the A and the G done!

For this, we need a generation model and a library that helps us run such a model.

I recommend either using `transformers` (default) or `ollama` (only if you already have it installed locally).

### Imports

In [None]:
try:
    import google.colab
    import subprocess
    print("Looks like you are working on Google Colab! Let's install the necessary packages...\n")
    try:
      # check=True makes a failing nvidia-smi raise CalledProcessError (caught below)
      subprocess.run(["nvidia-smi"], check=True) # add stdout=subprocess.PIPE, stderr=subprocess.PIPE to silence the output
      print("You're using Colab's GPU runtime!\n")
      !pip install faiss-gpu-cu12==1.12.0 hf_xet
      # if the above doesn't work, try faiss-gpu-cu11
    except (subprocess.CalledProcessError, FileNotFoundError):
      print("You're NOT using Colab's GPU runtime. Embeddings will be slow :-/\n")
      !pip install faiss-cpu hf_xet
except ModuleNotFoundError:
    print("Looks like you are working locally!"
          " Make sure you create a virtual environment and install the necessary packages as described in the README file.")

In [None]:
# LLM libraries
from transformers import pipeline
from sentence_transformers import SentenceTransformer
import faiss
try:
    import ollama
    print('Looks like you have ollama working. Nice!')
except ImportError:
    print('ollama not available. Make sure you select transformers below')

# Other packages
import pandas as pd
from pathlib import Path

In [None]:
# Setting up local vs github directories:
try:
    import google.colab
    !wget -O rag_system_prompt.txt "https://raw.githubusercontent.com/nuitrcs/AI_Week_RAG/refs/heads/main/system_prompts/rag_system_prompt.txt"
    sysm_dir = Path("rag_system_prompt.txt")
except ModuleNotFoundError:
    sysm_dir = Path('system_prompts/rag_system_prompt.txt')
    

### Setup

The following is just to help us accommodate the use of both `transformers` and `ollama`.

In [None]:
# Normalize the answer so that " Ollama " and "ollama" are treated the same
gen_lib = input("Which library are you using? [transformers/ollama] ").strip().lower()

# Choose model based on library
if gen_lib == 'transformers':
    generation_model = "allenai/OLMo-2-0425-1B-Instruct"
    print(f"You're using {generation_model}")
elif gen_lib == 'ollama':
    generation_model = "llama3.2"
    print(f"You're using {generation_model}. Make sure Ollama is installed on your computer!")
else:
    # No model is selected in this case, so the cells below will fail until this cell is rerun
    print("I don't know that library. Please rerun this cell and enter transformers or ollama.")

In [None]:
# Download model if necessary. This may take a few minutes
if gen_lib == 'transformers':
    gen_ppln = pipeline(task="text-generation", model=generation_model)

In [None]:
# I'll make a handy wrapper function to account for the possible use of both libraries:
def generate_response(prompt, gen_model, gen_lib):
    if gen_lib == 'transformers':
        response = gen_ppln(prompt, max_new_tokens=200)
        response = response[0]['generated_text']
    elif gen_lib == 'ollama':
        response = ollama.generate(model=gen_model, prompt=prompt)
        response = response.response
    else:
        # Fail loudly: without this, `response` would be undefined at the return below
        raise ValueError("Please specify either transformers or ollama")
    return response

### Generating text

Some models are better at chat behavior than others. If you're using a llama model through `ollama`, you can just submit your question. Otherwise, you need to add some extra instructions (a system prompt) and format the query. We'll talk about this in more detail below.

In [None]:
if gen_lib == 'ollama':
    sample_query = "Why is the sky blue and not pink?"
elif gen_lib == 'transformers':
    sample_query = (    
        "System: You are a knowledgeable assistant who will respond to the user's query. Once a query is answered, you stop. \n"
        "User: Why is the sky blue and not pink? \n"
        "Assistant:"
    )
response = generate_response(sample_query, gen_model=generation_model, gen_lib=gen_lib)
print(response)
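
The role-tagged string used for `transformers` above can be built with a small helper. The sketch below is only an illustration: `build_plain_prompt` and `strip_prompt` are hypothetical names of mine, not part of any library, and the `System:`/`User:`/`Assistant:` layout simply mirrors the cell above rather than any official chat template.

```python
def build_plain_prompt(system_msg, user_msg):
    """Join a system message and a user message into one role-tagged string."""
    return (
        f"System: {system_msg}\n"
        f"User: {user_msg}\n"
        "Assistant:"
    )


def strip_prompt(generated_text, prompt):
    """Return only the continuation when the output starts with the prompt."""
    if generated_text.startswith(prompt):
        return generated_text[len(prompt):].strip()
    return generated_text.strip()


sample_prompt = build_plain_prompt(
    "You are a knowledgeable assistant. Answer the user's query, then stop.",
    "Why is the sky blue and not pink?",
)
print(sample_prompt)

# Pretend this came back from the model: the prompt followed by an answer
fake_output = sample_prompt + " Because of Rayleigh scattering."
print(strip_prompt(fake_output, sample_prompt))
```

As far as I know, the `transformers` text-generation pipeline returns the prompt followed by the continuation by default, which is why a helper like `strip_prompt` can be handy when you only want the answer.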

**NOTE**

The above response may not look as good as what you're used to with ChatGPT. That's OK, we're using a much smaller model.

<hr>

**<span style="color:red">EXERCISE</span>** <span style="color:darkred"></span>

1. Play a little bit with your generation model. Feel free to ask either serious or silly questions. Provide specific instructions on how you want it to behave, etc.

<hr>

### Creating a system prompt

Let's create a system prompt with instructions for the LLM. This system prompt will accept some extra grounding data, which we obtain by searching our index. **This is the heart of RAG**: the prompt for our generation model is supplemented with query-relevant information drawn from a large pool by the index search.

We could write our system prompt here, but prompts can get quite long and may be used in different scripts. Hence I recommend writing your prompts in text files and loading them when needed. Let's head to `system_prompts/rag_system_prompt.txt`.

**(Optional) Quick Review of Python Strings**

We'll be filling the system prompt with variable content (the retrieved data and the user's query), so here's a quick review of string formatting:

In [None]:
temp_q = "What is the answer to the ultimate question of life, the universe, and everything?"
temp_a = 42

In [None]:
# f-strings
f_string = f"Question: {temp_q}\nAnswer: {temp_a}"
print(f_string)

In [None]:
# .format() with format fields {}
temp_text = "Question: {question}\nAnswer: {answer}"
temp_text = temp_text.format(question=temp_q, answer=temp_a)
print(temp_text)
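
One detail worth knowing before writing prompt files: `.format()` treats every `{...}` as a field, so literal braces (for example, a JSON snippet inside a prompt) must be doubled. The templates below are made-up examples for illustration only.

```python
# Doubled braces {{ }} come out as literal { } after formatting
good_template = 'Reply in JSON like {{"answer": "..."}}.\nQuestion: {question}'
filled = good_template.format(question="Why is the sky blue?")
print(filled)

# Single braces are read as a format field, which fails at format time
bad_template = 'Reply in JSON like {"answer": "..."}.\nQuestion: {question}'
format_failed = False
try:
    bad_template.format(question="Why is the sky blue?")
except (KeyError, ValueError) as err:
    format_failed = True
    print(f"Unescaped braces were read as a field: {err!r}")
```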

**Load the system prompt**

In [None]:
sysm_text = sysm_dir.read_text()
print(sysm_text)
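
To see the load-then-fill pattern in isolation, here is a minimal round trip through a temporary file. The template text is a stand-in I made up (it is not the content of `rag_system_prompt.txt`), and `demo_sysm.txt` is a hypothetical file name.

```python
from pathlib import Path
import tempfile

# Hypothetical template with two format fields
template_text = (
    "You are a music expert.\n"
    "Songs:\n{songs}\n"
    "User: {user_input}\n"
)

with tempfile.TemporaryDirectory() as tmp_dir:
    prompt_path = Path(tmp_dir) / "demo_sysm.txt"
    # Being explicit about the encoding avoids surprises across operating systems
    prompt_path.write_text(template_text, encoding="utf-8")
    loaded = prompt_path.read_text(encoding="utf-8")

# The loaded text is an ordinary string, so .format() works as usual
demo_prompt = loaded.format(
    songs="Title: Example Song",
    user_input="Any songs about rain?",
)
print(demo_prompt)
```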

<hr>

**<span style="color:red">EXERCISE</span>** <span style="color:darkred"></span>

1. Create a new text file called `exercise_sysm.txt` or something like that. Write another system prompt with any instructions your heart desires. Incorporate as format fields both the user input (which can be generic, as in my case, or can be a question, a thought, a chat exchange, etc.) and another piece of information to be filled in later, like songs.
2. If you are using Colab, you will need to upload the file. Click on the folder icon on the left menu bar to do so.
3. Load the exercise system prompt (call it `exercise_sysm`) and print it to make sure it works.

<hr>

**Format the System Prompt**

In [None]:
test_song = 'Hey Macarena, ay!'
test_question = 'What is the most philosophical song ever?'
full_prompt = sysm_text.format(songs=test_song, user_input=test_question)
print(full_prompt)
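
If a template has a field that you forget to pass, `.format()` raises a `KeyError`. One way to check which fields a template expects is the standard library's `string.Formatter`. This is a small sketch: `template_fields` is a hypothetical helper, and `demo_template` is a made-up template that only borrows the field names used above.

```python
from string import Formatter


def template_fields(template):
    """Return the set of named format fields found in a template string."""
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion)
    return {name for _, name, _, _ in Formatter().parse(template) if name}


demo_template = "Songs:\n{songs}\n\nUser: {user_input}"
fields = template_fields(demo_template)
print(sorted(fields))

# Compare against the values we plan to supply
supplied = {"songs"}
missing = fields - supplied
print(f"Fields still missing a value: {sorted(missing)}")
```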

<hr>

**<span style="color:red">EXERCISE</span>** <span style="color:darkred"></span>

1. Try the above with your exercise prompt. Feel free to input other text.

<hr>

**Generate an answer based on the full prompt**

In [None]:
response = generate_response(prompt=full_prompt, gen_model=generation_model, gen_lib=gen_lib)
print(response)

<hr>

**<span style="color:red">EXERCISE</span>** <span style="color:darkred"></span>

1. Repeat, with your exercise system message + prompt.

<hr>

### Connecting retrieval and generation

OK, we have all the elements now. Let's put it all together. Let's perform our first RAAAAG!!!

**Step 1 - You create an index from a large database**

In [None]:
# :: Load the data ::
data_path = "https://raw.githubusercontent.com/nuitrcs/AI_Week_RAG/refs/heads/main/data/songs.csv"
lyrics = pd.read_csv(data_path)

# :: Create embeddings ::
model_name = 'all-mpnet-base-v2' 
emb_model = SentenceTransformer(model_name)
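# Pass a plain list of strings so encoding does not depend on the DataFrame's index labels
embeddings = emb_model.encode(lyrics['Lyrics'].tolist(), normalize_embeddings=True)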

# :: Create index ::
d_emb = len(embeddings[0])
faiss_index = faiss.IndexFlatIP(d_emb)
# Add embeddings to index
faiss_index.add(embeddings)
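
Why combine `normalize_embeddings=True` with an inner-product index (`IndexFlatIP`)? For unit-length vectors, the inner product equals the cosine similarity. Here is a tiny check in plain Python, with toy 2-D vectors standing in for real embeddings; the helper names are mine.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(inner_product(vec, vec))
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return inner_product(a, b) / (
        math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    )


vec_a = [3.0, 4.0]
vec_b = [4.0, 3.0]

ip_of_normalized = inner_product(l2_normalize(vec_a), l2_normalize(vec_b))
cosine = cosine_similarity(vec_a, vec_b)
print(ip_of_normalized, cosine)  # the two values agree up to rounding
```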

**Step 2 - You perform a query and find the most relevant vectors in your index**

In [None]:
# :: Search ::
# Make query and embed it
user_query = "Throughout the echoes of history, nobody has thought as originally as me: I will make a song about August! About the memories and moments we shared in August, before it got away from us. Surely there are no other songs about August, right?"
query_emb = emb_model.encode(user_query, normalize_embeddings=True)
# Reshape if necessary
query_emb = query_emb.reshape((1, query_emb.shape[0]))
# Search
k = 2
D_matched, I_matched = faiss_index.search(query_emb, k)
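
To get a feel for what the search returns, here is a brute-force stand-in written in plain Python. It is not how FAISS is implemented, just a sketch of the same idea for a flat inner-product index: score the query against every stored vector and keep the `k` best. `brute_force_search` and the toy vectors are hypothetical.

```python
def brute_force_search(index_vectors, query_vectors, k):
    """Return (scores, ids), one row per query, best match first."""
    all_scores, all_ids = [], []
    for query in query_vectors:
        # Inner product of the query with every stored vector
        scores = [sum(q * v for q, v in zip(query, vec)) for vec in index_vectors]
        # Positions of the k largest scores, in decreasing order
        top_ids = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
        all_ids.append(top_ids)
        all_scores.append([scores[i] for i in top_ids])
    return all_scores, all_ids


# Three unit-length "embeddings" and a single unit-length query
toy_index = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
toy_query = [[0.8, 0.6]]

D_toy, I_toy = brute_force_search(toy_index, toy_query, k=2)
print(I_toy)  # row 0 holds the ids of the two best matches for the only query
print(D_toy)
```

Like `D_matched` and `I_matched`, the outputs have one row per query, which is why the formatting step reads `I_matched[0]`.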

**Step 3a - You format the relevant songs**

In [None]:
relevant_data = []
for i, song_idx in enumerate(I_matched[0]):
    row = lyrics.iloc[song_idx]
    song_data = f"Song {i}\nTitle: {row['Title']}\nAuthor: {row['Artist']}\nLyrics: {row['Lyrics']}"
    relevant_data.append(song_data)
relevant_data = "\n\n".join(relevant_data)

In [None]:
print(relevant_data)
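
The formatting loop above can also be wrapped in a function. The sketch below works on a plain list of dictionaries instead of a DataFrame, and adds two optional touches that are my own assumptions: it shows the similarity score, and it skips ids equal to `-1`, which FAISS uses as padding when it finds fewer than `k` neighbors. `format_hits` and the toy records are hypothetical.

```python
def format_hits(records, ids, scores):
    """Turn matched ids into one text block per song, skipping padded ids."""
    blocks = []
    for rank, (idx, score) in enumerate(zip(ids, scores)):
        if idx < 0:
            # -1 means "no match"; indexing with it would silently pick the last record
            continue
        rec = records[idx]
        blocks.append(
            f"Song {rank} (score {score:.2f})\n"
            f"Title: {rec['Title']}\n"
            f"Author: {rec['Artist']}\n"
            f"Lyrics: {rec['Lyrics']}"
        )
    return "\n\n".join(blocks)


toy_records = [
    {"Title": "Example A", "Artist": "Artist A", "Lyrics": "la la la"},
    {"Title": "Example B", "Artist": "Artist B", "Lyrics": "na na na"},
]

# One real hit (id 1) and one padded slot (id -1)
hits_text = format_hits(toy_records, ids=[1, -1], scores=[0.91, 0.0])
print(hits_text)
```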

**Step 3b - You add to the system prompt!**

In [None]:
sysm_text = sysm_dir.read_text()
full_prompt = sysm_text.format(songs=relevant_data, user_input=user_query)
print(full_prompt)

**Step 4 - You feed the full prompt to the LLM**

In [None]:
response = generate_response(prompt=full_prompt, gen_model=generation_model, gen_lib=gen_lib)
print(response)
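
The whole flow (retrieve, format, fill the prompt, generate) fits in one small function. Below is a sketch with stub components so that it runs without any model. `rag_answer`, `stub_retrieve`, `stub_generate`, and `demo_template` are all hypothetical; in the notebook, the retriever would wrap the FAISS search and formatting steps, and the generator would wrap `generate_response`.

```python
def rag_answer(user_query, retrieve, generate, template, k=2):
    """Retrieve k text blocks, put them in the prompt, and generate a reply."""
    docs = retrieve(user_query, k)            # list of formatted text blocks
    context = "\n\n".join(docs)
    prompt = template.format(songs=context, user_input=user_query)
    return generate(prompt)


def stub_retrieve(query, k):
    """Stand-in retriever: ignores the query and returns the first k items."""
    corpus = [
        "Song 0\nTitle: Example A",
        "Song 1\nTitle: Example B",
        "Song 2\nTitle: Example C",
    ]
    return corpus[:k]


def stub_generate(prompt):
    """Stand-in generator: reports the prompt size instead of calling a model."""
    return f"[stub reply to a prompt of {len(prompt)} characters]"


demo_template = "Songs:\n{songs}\n\nUser: {user_input}\nAssistant:"

answer = rag_answer("Any songs about August?", stub_retrieve, stub_generate, demo_template)
print(answer)

# Passing an identity function as the generator lets us inspect the final prompt
final_prompt = rag_answer("Any songs about August?", stub_retrieve, lambda p: p, demo_template)
print(final_prompt)
```

Passing the retriever and generator in as arguments keeps the function independent of the library choice, so swapping `transformers` for `ollama` does not require touching it.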

And Abracadabra!

<hr>

**<span style="color:red">EXERCISE</span>** <span style="color:darkred"></span>

Perform all of the above steps, but:
1. Using the embeddings from the exercises, keeping our original system prompt.
2. Keeping our original embeddings, using the exercise system prompt.
3. Using both the exercise embeddings and the exercise system prompt.

<hr>