# GenAI-Camp: Day 03
## Lesson: Basic RAG

This lesson introduces the basics of a Retrieval-Augmented Generation (RAG) system.

In this lesson you will learn how to:

- implement the different building blocks of RAG
- create an ingestion pipeline from the building blocks
- create a retrieval pipeline from the building blocks
- use a RAG system to generate responses to user inputs

### Set up the environment
Import the necessary libraries, set constants, and define helper functions.

In [23]:
import os
from google import genai
from google.genai import types
from pydantic import BaseModel
import json

In [24]:
if os.getenv("COLAB_RELEASE_TAG"):
   from google.colab import userdata
   GOOGLE_API_KEY=userdata.get('GEMINI_API_KEY')
   COLAB = True
   print("Running on COLAB environment.")
else:
   from dotenv import load_dotenv, find_dotenv
   load_dotenv(find_dotenv())
   GOOGLE_API_KEY = os.getenv("GEMINI_API_KEY")
   COLAB = False
   print("WARNING: Running on LOCAL environment.")
client = genai.Client(api_key=GOOGLE_API_KEY)



In [None]:
# Install additional libraries
if COLAB:
  !pip install -qU langchain-text-splitters chromadb

In [None]:
# Import additional libraries
from langchain_text_splitters import RecursiveCharacterTextSplitter
from chromadb import EphemeralClient
import uuid

In [27]:
# Define the paths of the resources
if COLAB:
    # Clone the data repository into colab
    !git clone https://github.com/openknowledge/workshop-genai-camp-data.git
    ROOT_PATH = "/content/workshop-genai-camp-data/day-03"
else:
    ROOT_PATH = ".."
DATA_PATH = ROOT_PATH + "/data"
KNOWLEDGEBASE_PATH = ROOT_PATH + "/knowledgebase"
BOOK_CATALOG_FILE = DATA_PATH + "/books.json"

In [28]:
# Set default values for model, model parameters and prompt
DEFAULT_MODEL = "gemini-1.5-flash"
DEFAULT_CONFIG_TEMPERATURE = 0.9
DEFAULT_CONFIG_TOP_K = 1
DEFAULT_CONFIG_MAX_OUTPUT_TOKENS = 200
DEFAULT_SYSTEM_PROMPT = "You are a friendly assistant"
DEFAULT_USER_PROMPT = " "

In [None]:
# This will be the chromadb collection we use as a knowledge base. We do not need persistence, so the in-memory EphemeralClient is sufficient.
chromadb_collection =  EphemeralClient().get_or_create_collection(name="default")

# Have a look into the knowledgebase
def peek_knowledgebase():
  """Shows the first ten items of the knowledgebase"""
  print(chromadb_collection.peek())

In [None]:
# Function to generate a completion with the Gemini model
def generate_gemini_completion(
        model_name: str = DEFAULT_MODEL, 
        temperature:float = DEFAULT_CONFIG_TEMPERATURE,
        top_k: int = DEFAULT_CONFIG_TOP_K, 
        max_output_tokens: int = DEFAULT_CONFIG_MAX_OUTPUT_TOKENS, 
        system_prompt : str = DEFAULT_SYSTEM_PROMPT, 
        user_prompt : str = DEFAULT_USER_PROMPT,
        verbose: bool = False
        ) -> str: 
    
    """ Calls a gemini model with a given set of parameters and returns the completions 
    
    Parameters
    ----------
    model_name : str, optional [default: DEFAULT_MODEL]
        The name of the model to use for the completion
    temperature : float, optional [default: DEFAULT_CONFIG_TEMPERATURE]
        The sampling temperature of the model (higher values produce more varied output)
    top_k : int, optional [default: DEFAULT_CONFIG_TOP_K]
        The number of highest-probability tokens the model samples from at each generation step
    max_output_tokens : int, optional [default: DEFAULT_CONFIG_MAX_OUTPUT_TOKENS]
        The maximum number of output tokens to return
    system_prompt : str, optional [default: DEFAULT_SYSTEM_PROMPT]
        The system prompt to use for the completion
    user_prompt : str, optional [default: DEFAULT_USER_PROMPT]
        The user prompt to use for the completion
    verbose : bool, optional [default: False]
        Whether to print details of the completion process or not. Defaults to False            
    Returns 
    -------
    str :
        the generated text      
    """    
    if verbose: 
        # print out summary of input values / parameters
        print(f'Generating answer for following config:')
        print(f'  - SYSTEM PROMPT used:\n {system_prompt}')
        print(f'  - USER PROMPT used:\n {user_prompt}')
        print(f'  - MODEL used:\n {model_name} (temperature = {temperature}, top_k = {top_k}, max_output_tokens = {max_output_tokens})')

    # create generation config 
    model_config = types.GenerateContentConfig(
        max_output_tokens=max_output_tokens,
        temperature=temperature,
        top_k=top_k,
        system_instruction=system_prompt,
    )
    
    # create generation request
    response = client.models.generate_content(
        model=model_name,
        contents=user_prompt,
        config=model_config,
    )
    
    return response.text

In [None]:
# Define a function to read objects from a JSON file
def read_objects_from_json(file_path: str, cls: type[BaseModel]) -> list:
    """Reads list of objects from a JSON file and returns the list."""
    with open(file_path, 'r') as file:
        data = json.load(file)
        objects = [cls(**item) for item in data]
    return objects

# Define classes used in the ingestion process
class Metadata(BaseModel):
    """Represents the metadata of a document which is stored in the knowledgebase."""
    url: str
    title: str
    pub_year: int

class Book(BaseModel):
    """Represents a book with its metadata."""
    metadata: Metadata
    summary: str   

### Configure the genAI models

In [33]:
GENERATION_MODEL = "gemini-1.5-flash"
EMBEDDING_MODEL = "models/text-embedding-004"

### Configure retriever

In [34]:
DEFAULT_K = 3
DEFAULT_CHUNK_SIZE = 2000
DEFAULT_CHUNK_OVERLAP = 100

### Define RAG Building Blocks

In [35]:
# Building Block "Chunking": Split the content into smaller chunks
def do_chunk(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
  """ Chunks a given text by a given chunk size and chunk overlap and returns a list of chunks

  Parameters
  ----------
  text : str
      The text to be chunked
  chunk_size : int
        The desired maximum chunk size in characters
  chunk_overlap : int
        The desired chunk overlap in characters
  Returns
  -------
  chunks: [str]
      The created chunks
  """
  text_splitter = RecursiveCharacterTextSplitter(
      chunk_size=chunk_size,
      chunk_overlap=chunk_overlap,
      length_function=len,
  )
  return text_splitter.split_text(text=text)
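
To see how `chunk_size` and `chunk_overlap` interact, the following is a minimal sketch of a plain sliding window over characters. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to cut at natural separators such as paragraph and line breaks); `sliding_window_chunks` is a hypothetical helper used only for illustration.

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Hypothetical helper: cut text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    # Each new window starts (chunk_size - chunk_overlap) characters after the previous one
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop as soon as the current window reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


demo_chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=1)
print(demo_chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of its predecessor, so text that straddles a chunk boundary still appears intact in at least one chunk when the overlap is large enough.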

In [36]:
# Building Block "Embedding": Create multi dimensional embeddings for a given chunk.
def do_embed(chunk: str) -> list[float]:
  """ Embeds a given chunk and returns the embedding

  Parameters
  ----------
  chunk : str
      The chunk to be embedded
  Returns
  -------
  embedding: [float]
      The created embedding
  """
  content_embeddings = client.models.embed_content(model=EMBEDDING_MODEL, contents=chunk).embeddings
  return content_embeddings[0].values

def do_batch_embed(chunks: list[str]) -> list[list[float]]:
  """ Embeds a list of chunks and returns the embeddings

  Parameters
  ----------
  chunks : [str]
      The chunks to be embedded
  Returns
  -------
  embeddings: [list[float]]
      The created embeddings
  """
  content_embeddings = client.models.embed_content(model=EMBEDDING_MODEL, contents=chunks).embeddings
  return [content_embedding.values for content_embedding in content_embeddings]

In [37]:
# Building Block "Knowledgebase": Store embeddings and the corresponding content in a vectorstore
def persist_embeddings(chunks: list[str], embeddings: list[list[float]], metadatas: list[dict]) -> None:
  """ Persists the embeddings, the chunks and their metadata in the knowledgebase

  Parameters
  ----------
  chunks : [str]
      The chunks to be persisted
  embeddings: [list[float]]
      The corresponding embeddings to be persisted
  metadatas: [dict]
      The corresponding metadata to be persisted (one dictionary per chunk)
  """
  ids = [str(uuid.uuid4()) for _ in chunks]
  chromadb_collection.add(ids=ids, documents=chunks, embeddings=embeddings, metadatas=metadatas)

In [38]:
# Building Block "Augmentation": Create an updated prompt by merging the original user input with the provided context
def augment(user_input: str, context: list[str]) -> str:
  """ Augments a given user input by merging it with the provided context and returns the augmented prompt

  Parameters
  ----------
  user_input : str
      The user input to be augmented
  context : [str]
      The context to be merged with the user input
  Returns
  -------
  augmented_prompt: str
      The created augmented prompt
  """
  prepared_context = "\n".join(context)
  augmented_prompt = f"""
    Answer the question in as much detail as possible using only the provided context. If the answer is not in
    the provided context, just say "answer is not available in the context"; do not provide a wrong answer.\n\n
    Context:\n{prepared_context}\n
    Question: \n{user_input}\n

    Answer:
  """
  return augmented_prompt

In [39]:
# Building Block "Top-k Fetching": Get the k semantically closest chunks to the user input from the knowledgebase
def do_top_k_fetching(user_input_embedding: list[float], top_k: int) -> list[str]:
  """ Fetches the k semantically closest chunks to the user input from the knowledgebase

  Parameters
  ----------
  user_input_embedding : [float]
      The embedding of the user input
  top_k : int
      The number of semantically closest chunks to be fetched

  Returns
  -------
  context: [str]
      The fetched chunks
  """
  # The chroma API allows querying for multiple embeddings simultaneously
  # and returns one list of documents per query embedding.
  # Since we always fetch for a single user input, we wrap its embedding in a list
  # and return only the first list of documents (chunks)
  return chromadb_collection.query(
      query_embeddings=[user_input_embedding],
      n_results=top_k,
  )["documents"][0]
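
Conceptually, top-k fetching ranks all stored embeddings by their closeness to the query embedding and keeps the best k. The sketch below shows this with cosine similarity on toy two-dimensional vectors. It is an illustration only: the distance metric Chroma actually uses depends on the collection's configuration, and `cosine_similarity`, `top_k_by_similarity` and the toy data are hypothetical names and values, not part of the lesson code.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_by_similarity(query: list[float], stored: list[tuple[str, list[float]]], k: int) -> list[str]:
    """Hypothetical helper: return the k chunks whose embeddings are most similar to the query."""
    ranked = sorted(stored, key=lambda item: cosine_similarity(query, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


# Toy "knowledge base": (chunk, embedding) pairs with made-up 2D embeddings
toy_store = [
    ("river adventure", [1.0, 0.0]),
    ("space opera", [0.0, 1.0]),
    ("raft journey", [0.8, 0.6]),
]
closest = top_k_by_similarity([1.0, 0.1], toy_store, k=2)
print(closest)  # ['river adventure', 'raft journey']
```

A vector store does the same ranking, but uses an index so that it does not have to compare the query against every stored embedding.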

In [None]:
# Building Block "Generation": Use the generation model to create a response
def generate_response(prompt: str) -> str:
  """ Generates a response for a given prompt

  Parameters
  ----------
  prompt : str
      The prompt to be used for the generation
  Returns
  -------
  response: str
      The generated response
  """
  response = generate_gemini_completion(
      model_name=GENERATION_MODEL,
      user_prompt=prompt,
  )
  print(response)
  return response

### Create the ingestion pipeline

In [None]:
# This function defines the ingestion process for the book catalog
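def do_ingestion(book_catalog_file: str, chunk_size: int, chunk_overlap: int, batch_size: int = 100) -> None:
  """ Ingests the book catalog: chunks each book summary, embeds the chunks in batches and persists them in the knowledgebase

  Parameters
  ----------
  book_catalog_file : str
      The path of the JSON file containing the book catalog
  chunk_size : int
      The desired chunk size
  chunk_overlap : int
      The desired chunk overlap
  batch_size : int, optional [default: 100]
      The number of chunks to embed and persist per batch
  """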
  books = read_objects_from_json(file_path=book_catalog_file, cls=Book)

  all_chunks = []  # Collect all chunks
  all_metadatas = []  # Collect all metadatas

  # Iterate over all books
  for book in books:
    # Load prepared book content
    text_content = book.summary

    # Chunk the content into smaller chunks
    chunks = do_chunk(text_content, chunk_size=chunk_size, chunk_overlap=chunk_overlap)

    # Add the chunks and their corresponding metadata to the lists
    # Note: We use the book.metadata for each chunk, assuming that the metadata is the same for all chunks of a book
    all_chunks.extend(chunks)
    all_metadatas.extend([book.metadata] * len(chunks))  # Create a list of metadatas for each chunk

  # Process chunks in batches. Otherwise, we might run into quota issues.
  for i in range(0, len(all_chunks), batch_size):

    # Get the current batch of chunks and their corresponding metadata
    chunk_batch = all_chunks[i:i + batch_size]
    metadata_batch = all_metadatas[i:i + batch_size]
    metadatas = [metadata.model_dump() for metadata in metadata_batch]  # Convert Metadata objects to dictionaries

    # Embed the batch of chunks
    embeddings = do_batch_embed(chunk_batch)

    # Persist the embeddings and the chunks in the knowledgebase
    persist_embeddings(chunk_batch, embeddings, metadatas)
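
The batching step relies on the chunk list and the metadata list staying aligned index by index, even when a batch spans two books. A minimal sketch of that slicing logic on toy data follows; `iter_aligned_batches` and the sample values are hypothetical and only mirror the loop above.

```python
from typing import Iterator


def iter_aligned_batches(
    chunks: list[str], metadatas: list[dict], batch_size: int
) -> Iterator[tuple[list[str], list[dict]]]:
    """Hypothetical helper: yield (chunk_batch, metadata_batch) pairs of at most batch_size items."""
    if len(chunks) != len(metadatas):
        raise ValueError("chunks and metadatas must have the same length")
    for start in range(0, len(chunks), batch_size):
        # The same slice bounds are applied to both lists, so entry i of one batch
        # always belongs to entry i of the other
        yield chunks[start:start + batch_size], metadatas[start:start + batch_size]


# Toy data: three chunks from one book, two from another
toy_chunks = ["a1", "a2", "a3", "b1", "b2"]
toy_metadatas = [{"title": "Book A"}] * 3 + [{"title": "Book B"}] * 2

batches = list(iter_aligned_batches(toy_chunks, toy_metadatas, batch_size=2))
for chunk_batch, metadata_batch in batches:
    print(chunk_batch, [m["title"] for m in metadata_batch])
```

The last batch may be smaller than `batch_size`; slicing past the end of a Python list simply returns the remaining items.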



### Perform ingestion

In [42]:
# Perform ingestion. Depending on the chunk_size this might take a few minutes.
do_ingestion(book_catalog_file=BOOK_CATALOG_FILE, chunk_size=DEFAULT_CHUNK_SIZE, chunk_overlap=DEFAULT_CHUNK_OVERLAP)

In [46]:
# Use helper function to peek into knowledgebase
peek_knowledgebase()

{'ids': ['a01718be-9141-4997-808d-3184c788c353', 'bc47ebb8-015e-4635-8fd9-7da482f5aa61', 'd3df5fe8-f00c-4a03-813e-b72493d5fe5c', 'a7126423-0de2-483e-8158-27c0fea3e4b8', '14ad51b2-7f78-40b0-b35d-49e284d63805', '6fa5a2cf-9b47-4e91-a042-4819027280bf', '01c86af0-666f-48c1-8e38-3561c125837b', '7f975e82-c9fd-4251-8808-47c5e0e339ea', '7744cc86-63c5-4443-a6ff-ef8561d4574c', 'ee599d9c-a77b-4db0-b161-60ab94609836'], 'embeddings': array([[ 0.00450815,  0.0427744 ,  0.02168448, ..., -0.03367527,
         0.01990015, -0.08333741],
       [-0.02189463,  0.03620462,  0.0077862 , ..., -0.06787197,
        -0.01269853, -0.03503731],
       [ 0.02164824,  0.03319897,  0.02389836, ..., -0.03954765,
         0.02527347, -0.03462225],
       ...,
       [ 0.04369463,  0.03214902,  0.00271296, ..., -0.02699474,
         0.02897209, -0.03008798],
       [ 0.074299  ,  0.03978257, -0.01100965, ..., -0.04376825,
         0.04558893, -0.02382861],
       [ 0.01814854,  0.01455541, -0.00995817, ..., -0.01635353,

### Exercise 01: Create RAG pipeline
In this exercise you will create a RAG pipeline that retrieves relevant chunks and generates a grounded response.

In [None]:
# TODO: Update the following function to perform the retrieval pipeline
def do_rag(user_input: str, verbose: bool = False) -> None:
  """ Runs the RAG pipeline with a given user input and prints the response

  Parameters
  ----------
  user_input : str
      The user input to be used for the RAG pipeline
  verbose : bool, optional [default: False]
      Whether to print details of the RAG process or not. Defaults to False
  """
  # TODO: Embed the user input
  user_input_embedding = do_embed(chunk=user_input)

  # TODO: "R" like "Retrieval": Get the top-k semantically closest chunks to the user input from the knowledgebase
  context = do_top_k_fetching(user_input_embedding=user_input_embedding, top_k=DEFAULT_K)
  if verbose:
    print(f'Retrieved context:\n {context}')

  # TODO: "A" like "Augmented": Create the augmented prompt
  augmented_prompt = augment(user_input=user_input, context=context)
  if verbose:
    print(f'Augmented prompt:\n {augmented_prompt}')

  # TODO: "G" like "Generation": Generate a response
  generate_response(prompt=augmented_prompt)


In [None]:
# Define user input. This should be a question regarding one ingested book
user_input= "Hey, I'm looking for a novel about a young boy at the Mississippi river. I think the boy is named Huck?"

# TODO: Perform retrieval by executing the do_rag function
do_rag(user_input=user_input, verbose=False)

Yes, that sounds like *Adventures of Huckleberry Finn* by Mark Twain.  The novel follows the adventures of a young boy named Huckleberry Finn as he travels down the Mississippi River.  The provided text describes the book in detail, including Huck's escape from his restrictive life, his journey with Jim (a runaway slave), and his grappling with moral dilemmas related to societal expectations and personal conscience.

