<img align="left" src="https://ithaka-labs.s3.amazonaws.com/static-files/images/tdm/tdmdocs/tapi-logo-small.png" />

This notebook is free for educational reuse under a [Creative Commons CC BY License](https://creativecommons.org/licenses/by/4.0/).

Created by [Grant Glass](https://glassgrant.com) for the 2024 Text Analysis Pedagogy Institute, with support from [Constellate](https://constellate.org).

For questions/comments/improvements, email grantg@unc.edu.<br />
____

# Large Language Models and Embeddings for Retrieval Augmented Generation: Day 3 7/19/24

This is lesson `3` of 3 in the educational series on `Large Language Models (LLMs) and Retrieval Augmented Generation (RAG)`. This notebook focuses on advanced techniques for optimizing RAG systems.

**Skills:** 
* Data analysis
* Machine learning
* Text analysis
* Language models
* Vector embeddings
* Retrieval Augmented Generation
* Performance optimization

**Audience:** `Learners`

**Use case:** `Tutorial`

This tutorial guides users through advanced techniques for optimizing Retrieval Augmented Generation systems.



**Difficulty:** `Advanced`

Advanced assumes users are very familiar with Python and have been programming for years, but they may not be familiar with the specific optimization techniques for RAG systems.


**Completion time:** `90 minutes`

**Knowledge Required:** 
* Python programming (including object-oriented programming)
* Understanding of LLMs and embeddings (covered in Days 1 and 2)
* Basic knowledge of RAG systems (covered in Day 2)


**Knowledge Recommended:**
* Experience with natural language processing (NLP)
* Familiarity with information retrieval concepts

**Learning Objectives:**
After this lesson, learners will be able to:
1. Implement advanced retrieval techniques for RAG systems
2. Optimize prompt engineering for improved RAG performance
3. Develop strategies for handling long contexts in RAG
4. Implement and evaluate different reranking methods
5. Create a more sophisticated RAG pipeline integrating multiple optimization techniques


**Research Pipeline:**
1. Introduction to LLMs and their applications (Day 1)
2. Exploring embeddings and introduction to RAG (Day 2)
3. **Optimizing RAG systems for enhanced performance**
4. Applying optimized RAG in research contexts

___

# Required Python Libraries

* [OpenAI](https://github.com/openai/openai-python) for generating embeddings and interacting with GPT models
* [Pandas](https://pandas.pydata.org/) for data manipulation
* [NumPy](https://numpy.org/) for numerical operations
* [Scikit-learn](https://scikit-learn.org/) for similarity calculations and evaluation metrics
* [FAISS](https://github.com/facebookresearch/faiss) for efficient similarity search
* [NLTK](https://www.nltk.org/) for text preprocessing

## Install Required Libraries

In [None]:
### Install Libraries ###
!pip install openai pandas numpy scikit-learn faiss-cpu nltk

In [None]:
# Import Libraries

# Import the OpenAI library to interact with the OpenAI API, useful for tasks like text generation or semantic search.
import openai

# Import the OpenAI client class directly so we can write OpenAI(...) instead of openai.OpenAI(...).
from openai import OpenAI

# Import pandas, a powerful data manipulation and analysis library for Python.
import pandas as pd

# Import numpy, a library for numerical operations on large, multi-dimensional arrays and matrices.
import numpy as np

# Import cosine_similarity from sklearn, a method for calculating the cosine similarity between vectors, useful in various machine learning tasks.
from sklearn.metrics.pairwise import cosine_similarity

# Import precision_score, recall_score, f1_score from sklearn for evaluating the accuracy of a classification.
from sklearn.metrics import precision_score, recall_score, f1_score

# Import faiss, a library for efficient similarity search and clustering of dense vectors.
import faiss

# Import nltk, a toolkit for natural language processing (NLP) tasks.
import nltk

# From nltk, import word_tokenize for splitting strings into words and stopwords for filtering out common words.
from nltk.tokenize import word_tokenize
from nltk.corpus import stopwords

# Import os, a module for interacting with the operating system, useful for file paths, environment variables, etc.
import os

# Import ast so that embeddings stored as strings in the CSV can be parsed back into lists
# with ast.literal_eval.
import ast

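# Download the 'punkt' tokenizer models, which word_tokenize needs to split text into words.
# Recent NLTK releases (3.9 and later) look for 'punkt_tab' instead, so we request both;
# a failed download of either one only prints a message.
nltk.download('punkt')
nltk.download('punkt_tab')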

# Download the list of stopwords to filter out common words that are usually irrelevant in NLP tasks.
nltk.download('stopwords')

# Required Data

We'll continue using the dataset from Day 2.

## Prepare Data

In [None]:
# Load the dataset
df = pd.read_csv('day2_dataset_adaptation.csv')  # Assuming we saved the DataFrame from Day 2


# Introduction

In this final lesson of our LLMs with RAG workshop, we'll focus on advanced techniques for optimizing Retrieval Augmented Generation (RAG) systems. We'll explore methods to enhance retrieval accuracy, improve prompt engineering, handle long contexts, and implement reranking strategies.

Key topics we'll cover:
1. Advanced retrieval techniques
2. Optimizing prompt engineering
3. Handling long contexts
4. Implementing reranking methods
5. Building an optimized RAG pipeline

Let's begin by setting up our OpenAI API access and defining some utility functions:

## Configure the OpenAI client

To set up the client, we need an API key to send with our requests. Skip these steps if you already have one.

You can get an API key by following these steps:

1. [Create a new project](https://help.openai.com/en/articles/9186755-managing-your-work-in-the-api-platform-with-projects)
2. [Generate an API key in your project](https://platform.openai.com/api-keys)
3. (RECOMMENDED, BUT NOT REQUIRED) [Set up your API key for all projects as an env var](https://platform.openai.com/docs/quickstart/step-2-set-up-your-api-key)

In [None]:
# Method 1: Directly paste the API key (not recommended for production or shared code)
client = OpenAI(api_key="your_actual_openai_api_key_here")

# Method 2: Use an environment variable (recommended for most use cases)
# Ensure the environment variable OPENAI_API_KEY is set in your environment before running the script
#client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# Method 3: Use a configuration file (alternative for keeping keys out of code)
# Create a file named config.py (or similar) and define OPENAI_API_KEY in it, then import it here
#from config import OPENAI_API_KEY
#client = OpenAI(api_key=OPENAI_API_KEY)

# Method 4: Use Python's built-in `getpass` module to securely input the API key at runtime (useful for notebooks or temporary scripts)
#from getpass import getpass
#api_key = getpass("Enter your OpenAI API key: ")
#client = OpenAI(api_key=api_key)

In [None]:
# Define a function to get the embedding of a given text.
# This function takes a text string and an optional model name as input.
def get_embedding(text, model="text-embedding-3-small"):
    # Replace newline characters with spaces in the text to ensure it's in a single line.
    text = text.replace("\n", " ")
    # Use the OpenAI API client to create an embedding for the text using the specified model.
    # The function returns the embedding of the first (and only) input text.
    return client.embeddings.create(input=[text], model=model).data[0].embedding

# Define a function to get a completion for a given prompt using GPT.
# This function takes a prompt string and an optional model name as input.
def get_completion(prompt, model="gpt-4o-mini"):
    # Structure the prompt into a format suitable for the OpenAI API, specifying the role as "user".
    messages = [{"role": "user", "content": prompt}]
    # Use the OpenAI API client to create a chat completion using the specified model.
    # The temperature is set to 0 for deterministic output, meaning no randomness in the response.
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    # Return the content of the first (and only) message in the response.
    return response.choices[0].message.content

# Print a message indicating that the OpenAI API is ready for use.
print("OpenAI API is ready.")

# Lesson

## 1. Advanced Retrieval Techniques

Let's implement a more sophisticated retrieval system using FAISS for efficient similarity search:

In [None]:

class FAISSRetriever:
    # Initialize the retriever with a DataFrame containing the documents and their embeddings
    def __init__(self, df, client):
        self.df = df  # Store the DataFrame
        self.index = None  # Initialize the FAISS index as None
        self.client = client  # Store the OpenAI API client
        # Check that the DataFrame is not empty, has an 'embeddings' column, and contains at least one non-null embedding
        if not df.empty and 'embeddings' in df.columns and not df['embeddings'].isnull().all():
            self.build_index()  # Build the FAISS index
        else:
            # If conditions are not met, print a warning and do not build the index
            print("DataFrame is empty or does not contain valid embeddings. Index will not be built.")

    # Method to build the FAISS index
    def build_index(self):
        try:
            # Check if embeddings are stored as strings; if so, convert them to lists
            if isinstance(self.df['embeddings'].iloc[0], str):
                # Convert string embeddings to lists using ast.literal_eval and stack them vertically
                embeddings = np.vstack([ast.literal_eval(emb) for emb in self.df['embeddings'].values])
            else:
                # If embeddings are not strings, stack them vertically as they are
                embeddings = np.vstack(self.df['embeddings'].values)
        except ValueError as e:
            # Catch and print any ValueError that occurs during stacking
            print(f"Error stacking embeddings: {e}")
            return

        # Check if the embeddings are a 2D array as expected
        if embeddings.ndim != 2:
            print(f"Expected embeddings to be a 2D array, got {embeddings.ndim}D instead.")
            return

        # Create a FAISS index for L2 distance and add the embeddings
        self.index = faiss.IndexFlatL2(embeddings.shape[1])
        self.index.add(embeddings.astype('float32'))

    # Method to get embedding using the OpenAI API
    def get_embedding(self, text, model="text-embedding-3-small"):
        # Replace newline characters with spaces in the text to ensure it's in a single line.
        text = text.replace("\n", " ")
        # Use the OpenAI API client to create an embedding for the text using the specified model.
        # The function returns the embedding of the first (and only) input text.
        response = self.client.embeddings.create(input=[text], model=model)
        return response.data[0].embedding

    # Method to retrieve documents similar to a given query
    def retrieve(self, query, k=10):
        # Generate query embedding using the get_embedding method
        query_embedding = np.array(self.get_embedding(query)).reshape(1, -1)
        # Perform the search with the query embedding
        _, indices = self.index.search(query_embedding.astype('float32'), k)
        # Filter out-of-bounds indices
        valid_indices = [index for index in indices[0] if index < len(self.df)]
        # Check if any indices were filtered out
        if len(valid_indices) != len(indices[0]):
            print(f"Warning: {len(indices[0]) - len(valid_indices)} out-of-bounds indices were removed.")
        # Return the DataFrame rows corresponding to the valid indices of the retrieved documents
        return self.df.iloc[valid_indices]
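
To see what a flat L2 index computes, here is a minimal pure-Python sketch of exhaustive nearest-neighbour search by squared Euclidean distance. It is an illustration, not FAISS itself: the function name `brute_force_l2_search` and the toy three-dimensional vectors are made up for this example. FAISS's `IndexFlatL2` performs the same exhaustive comparison (and also reports squared distances), only much faster.

```python
def brute_force_l2_search(doc_vectors, query_vector, k=2):
    """Return the k nearest documents as (squared_distance, row_position) pairs."""
    scored = []
    for position, vector in enumerate(doc_vectors):
        # Squared Euclidean distance between this document vector and the query
        squared_distance = sum((a - b) ** 2 for a, b in zip(vector, query_vector))
        scored.append((squared_distance, position))
    # Smallest distance first; ties are broken by row position
    scored.sort()
    return scored[:k]

# Toy "embeddings" (hypothetical; real embeddings have many more dimensions)
toy_vectors = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
]
toy_query = [1.0, 0.0, 0.0]

nearest = brute_force_l2_search(toy_vectors, toy_query, k=2)
nearest_positions = [position for _, position in nearest]
print(nearest_positions)  # rows 0 and 2 are closest to the query
```

The row positions returned here play the same role as the `indices` array that `self.index.search` returns in `FAISSRetriever.retrieve`.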

## 2. Optimizing Prompt Engineering

Let's create a more sophisticated prompt template that includes multiple retrieved documents and encourages the model to synthesize information:

In [None]:
# Define a function to create an optimized prompt for text generation tasks.
# This function limits the total number of tokens, the number of documents, and the length of each document.
def create_optimized_prompt(query, retrieved_docs, max_tokens=16385, max_docs=3, max_length_per_doc=500):
    """
    Create an optimized prompt with a limit on the total number of tokens, the number of documents,
    and the length of each document.
    """
    # Start building the prompt with an introduction and the query.
    prompt = f"""Given the following query and relevant information from multiple sources, 
    provide a comprehensive and accurate answer. Synthesize the information from all sources,
    and if there are any contradictions or gaps in the information, point them out.

    Query: {query}

    Relevant Information:
    """
    
    # Initialize a counter for the number of documents added to the prompt.
    doc_count = 0
    # Iterate over the retrieved documents.
    for i, doc in retrieved_docs.iterrows():
        # Stop adding documents if the maximum number of documents has been reached.
        if doc_count >= max_docs:
            break
        # Truncate the document text to the maximum length per document.
        doc_text = doc['fullText'][:max_length_per_doc]
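        # Add the document title and truncated text to the prompt.
        # Number sources by their position in the results; the DataFrame index label i is arbitrary.
        prompt += f"\nSource {doc_count+1} - {doc['title']}:\n{doc_text}\n"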
        # Increment the document counter.
        doc_count += 1
    
    # Append a section for the synthesized answer to the prompt.
    prompt += "\nSynthesized Answer:"
    
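    # Check if the prompt exceeds the size limit.
    # Note: len(prompt) counts characters, not tokens, so max_tokens acts as a conservative
    # character budget here (a token is usually several characters long).
    if len(prompt) > max_tokens:
        # Truncate the prompt to fit the limit.
        prompt = prompt[:max_tokens]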
    
    # Return the constructed prompt.
    return prompt

# Example usage of the function.
retriever = FAISSRetriever(df, client)
# Define a query.
query = "How do different authors explore the concept of adaptation?"
# Retrieve documents relevant to the query.
retrieved = retriever.retrieve(query)  # Uses the FAISSRetriever instance created above.
# Create an optimized prompt using the retrieved documents.
optimized_prompt = create_optimized_prompt(query, retrieved)
# Generate a response using the optimized prompt.
response = get_completion(optimized_prompt)
# Print the query and the generated response.
print(f"Query: {query}")
print(f"Optimized RAG Response: {response}")
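
One weakness of trimming the finished prompt with a slice is that the closing `Synthesized Answer:` cue can be cut off when the sources are long. A possible alternative, sketched below, reserves room for the header and the closing cue first and then adds sources only while they fit. The function name `build_prompt_within_budget`, the `(title, text)` tuple format, and the character-based budget are assumptions made for this example, not part of the notebook's pipeline.

```python
def build_prompt_within_budget(query, sources, max_chars=1000, max_chars_per_source=200):
    """Assemble a prompt that never exceeds max_chars and always ends with the answer cue."""
    header = f"Query: {query}\n\nRelevant Information:\n"
    footer = "\nSynthesized Answer:"
    # Reserve space for the parts that must always be present
    remaining = max_chars - len(header) - len(footer)
    if remaining < 0:
        raise ValueError("max_chars is too small for the header and footer")
    body_parts = []
    for number, (title, text) in enumerate(sources, start=1):
        block = f"\nSource {number} - {title}:\n{text[:max_chars_per_source]}\n"
        # Stop as soon as the next source would overflow the budget
        if len(block) > remaining:
            break
        body_parts.append(block)
        remaining -= len(block)
    return header + "".join(body_parts) + footer

# Hypothetical sources standing in for retrieved documents
toy_sources = [
    ("First essay", "Adaptation as translation between media. " * 10),
    ("Second essay", "Adaptation as cultural memory. " * 10),
    ("Third essay", "Adaptation as intertextual dialogue. " * 10),
]
toy_prompt = build_prompt_within_budget("What is adaptation?", toy_sources, max_chars=400)
print(len(toy_prompt) <= 400)
print(toy_prompt.endswith("Synthesized Answer:"))
```

With a 400-character budget only the first source fits, but the prompt still ends with the cue that tells the model where its answer begins.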

## 3. Handling Long Contexts

To handle longer contexts, we'll split each document into overlapping chunks using a sliding window, so that text cut off at one chunk boundary still appears intact at the start of the next chunk:

In [None]:
# Define a function to create an optimized prompt for a Retrieval-Augmented Generation (RAG) model.
def create_optimized_prompt_long(query, retrieved_docs, max_tokens=16385, max_docs=3, max_length_per_doc=1500, chunk_size=550, window_size=50):
    """
    Create an optimized prompt with a limit on the total number of tokens, the number of documents,
    and the length of each document. Implements a chunking strategy with a sliding window for longer contexts.
    """
    # Initialize the prompt with a standard introduction and the user's query.
    prompt = f"""Given the following query and relevant information from multiple sources, 
    provide a comprehensive and accurate answer. Synthesize the information from all sources,
    and if there are any contradictions or gaps in the information, point them out.

    Query: {query}

    Relevant Information:
    """
    
    # Initialize a counter for the number of documents processed.
    doc_count = 0
    # Iterate over each document retrieved.
    for i, doc in retrieved_docs.iterrows():
        # Stop adding documents if the maximum number has been reached.
        if doc_count >= max_docs:
            break
        # Extract the full text of the current document.
        doc_text = doc['fullText']
        # Split the document text into overlapping chunks: each chunk is chunk_size characters long
        # and starts chunk_size - window_size characters after the previous one.
        chunks = [doc_text[start:start+chunk_size] for start in range(0, len(doc_text), chunk_size - window_size)]
        # Add at most max_docs chunks per document, stopping early if the prompt would exceed the size limit.
        for j, chunk in enumerate(chunks[:max_docs]):
            if len(prompt) + len(chunk) > max_tokens:
                break
            # Add each chunk to the prompt, including a header indicating the source and part number.
            prompt += f"\nSource {doc_count+1}, Part {j+1} - {doc['title']}:\n{chunk}\n"
        # Increment the document counter.
        doc_count += 1
    
    # Append a section for the synthesized answer to the prompt.
    prompt += "\nSynthesized Answer:"
    
    # If the prompt exceeds the maximum token limit, truncate it.
    if len(prompt) > max_tokens:
        prompt = prompt[:max_tokens]
    
    # Return the final prompt.
    return prompt

# Initialize a FAISSRetriever object with the DataFrame and the OpenAI client.
retriever = FAISSRetriever(df, client)
# Define a query.
query = "How do different authors explore the concept of adaptation?"
# Retrieve documents relevant to the query using the retriever.
retrieved = retriever.retrieve(query)  # Uses the FAISSRetriever instance created above.
# Create an optimized prompt using the retrieved documents.
optimized_prompt = create_optimized_prompt_long(query, retrieved)
# Generate a response using the optimized prompt.
response = get_completion(optimized_prompt)
# Print the query and the generated response.
print(f"Query: {query}")
print(f"Optimized RAG Response: {response}")
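
The chunking step can be isolated into a small helper to make the overlap easier to see. This is a minimal sketch: the name `sliding_window_chunks` and its `overlap` parameter are chosen for this example, and `overlap` plays the role that `window_size` plays in `create_optimized_prompt_long`.

```python
def sliding_window_chunks(text, chunk_size=550, overlap=50):
    """Split text into chunks of chunk_size characters that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    # Each chunk starts this many characters after the previous one
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

# A short string with small sizes makes the overlap visible
toy_text = "abcdefghijklmnopqrstuvwxyz"
toy_chunks = sliding_window_chunks(toy_text, chunk_size=10, overlap=3)
for chunk in toy_chunks:
    print(chunk)
```

Each chunk begins with the last three characters of the previous one, and the final chunk is shorter because the text runs out. Larger overlaps lose less context at chunk boundaries but repeat more text in the prompt.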

## 4. Implementing Reranking Methods

Let's implement a simple reranking method based on keyword matching:

In [None]:
# Define a function to count the number of keyword matches in a document.
def keyword_match_count(document, keywords):
    # Convert the document text to lowercase and split it on whitespace.
    # Punctuation stays attached to words, so "adaptation," will not match "adaptation".
    words = document.lower().split()
    # Count how many words in the document are keywords.
    count = sum(word in keywords for word in words)
    return count  # Return the total count of keyword matches.

# Define a function to rerank documents based on keyword match count.
def rerank_documents(retrieved_docs, keywords):
    # Score each document by the number of keyword matches.
    scored_documents = [(index, doc, keyword_match_count(doc['fullText'], keywords)) for index, doc in retrieved_docs.iterrows()]
    # Sort the documents by their score in descending order.
    scored_documents.sort(key=lambda x: x[2], reverse=True)
    # Return the sorted documents, discarding the scores for simplicity.
    return [doc for _, doc, _ in scored_documents]

# Define a function to create an optimized prompt for a RAG model, incorporating document reranking.
def create_optimized_prompt_long(query, retrieved_docs, keywords, max_tokens=16385, max_docs=3, max_length_per_doc=1500, chunk_size=550, window_size=50):
    """
    This function reranks documents based on keyword matching before creating the prompt.
    It ensures the prompt is constructed from the most relevant documents.
    """
    # Start constructing the prompt with the query and an introduction.
    prompt = f"""Given the following query and relevant information from multiple sources, 
    provide a comprehensive and accurate answer. Synthesize the information from all sources,
    and if there are any contradictions or gaps in the information, point them out.

    Query: {query}

    Relevant Information:
    """
    
    # Rerank the documents based on keyword matching.
    reranked_docs = rerank_documents(retrieved_docs, keywords)
    
    # Initialize a counter for the number of documents added to the prompt.
    doc_count = 0
    # Iterate over the reranked documents to add their content to the prompt.
    for i, doc in enumerate(reranked_docs):
        if doc_count >= max_docs:
            break  # Stop if the maximum number of documents has been reached.
        doc_text = doc['fullText']
        # Break the document text into overlapping chunks to fit within the prompt size limits.
        chunks = [doc_text[start:start+chunk_size] for start in range(0, len(doc_text), chunk_size - window_size)]
        for j, chunk in enumerate(chunks[:max_docs]):
            if len(prompt) + len(chunk) > max_tokens:
                break  # Stop adding chunks if the prompt would exceed the token limit.
            # Add the chunk to the prompt, including a header with the source number and part.
            prompt += f"\nSource {i+1}, Part {j+1} - {doc['title']}:\n{chunk}\n"
        doc_count += 1  # Increment the document counter.
    
    # Finalize the prompt with a section for the synthesized answer.
    prompt += "\nSynthesized Answer:"
    
    # Trim the prompt if it exceeds the maximum token limit.
    if len(prompt) > max_tokens:
        prompt = prompt[:max_tokens]
    
    return prompt  # Return the constructed prompt.

# Example usage of the functions to generate an optimized RAG response.
keywords = ["adaptation", "authors", "concept"]  # Define relevant keywords for reranking.
retrieved = retriever.retrieve(query)  # Retrieve documents relevant to the query.
optimized_prompt = create_optimized_prompt_long(query, retrieved, keywords)  # Adjusted to include keywords for reranking.
response = get_completion(optimized_prompt)  # Generate a response using the optimized prompt.
print(f"Query: {query}")
print(f"Optimized RAG Response: {response}")
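
Splitting on whitespace leaves punctuation attached to words, so a keyword followed by a comma or a full stop is not counted. The sketch below compares whitespace splitting with a regular-expression tokenizer on a made-up sentence. The helper name `keyword_match_count_regex` and the sample text are assumptions for illustration only.

```python
import re

def keyword_match_count_regex(document, keywords):
    """Count word tokens in document that appear in keywords, ignoring case and punctuation."""
    keyword_set = {keyword.lower() for keyword in keywords}
    # \w+ matches runs of letters, digits and underscores, so surrounding punctuation is dropped
    tokens = re.findall(r"\w+", document.lower())
    return sum(token in keyword_set for token in tokens)

toy_document = "Adaptation, as a concept, divides authors. Adaptation studies thrive."
toy_keywords = ["adaptation", "authors", "concept"]

# Same counting rule as keyword_match_count, written inline for comparison
whitespace_count = sum(word in toy_keywords for word in toy_document.lower().split())
regex_count = keyword_match_count_regex(toy_document, toy_keywords)
print(whitespace_count, regex_count)
```

Whitespace splitting finds only the one occurrence that has no punctuation next to it, while the regex tokenizer finds all four. This gap is one motivation for using a proper tokenizer when reranking by keywords.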

In [None]:
# Import the PorterStemmer class for stemming words
from nltk.stem import PorterStemmer

# Define a function to count the overlap of stemmed keywords with stemmed document tokens
def enhanced_keyword_match_count(document, keywords):
    stemmer = PorterStemmer()  # Initialize the stemmer
    # Tokenize and stem the document text
    doc_tokens = word_tokenize(document.lower())
    doc_stems = [stemmer.stem(token) for token in doc_tokens]
    # Stem the keywords for comparison
    keyword_stems = [stemmer.stem(keyword.lower()) for keyword in keywords]
    # Count how many of the stemmed keywords appear at least once among the document's stems
    count = sum(stem in doc_stems for stem in keyword_stems)
    return count  # Return the number of keywords found in the document

# Function to rerank documents based on the count of keyword matches
def rerank_documents(retrieved_docs, keywords):
    # Ensure the input is a DataFrame for easier manipulation
    if not isinstance(retrieved_docs, pd.DataFrame):
        retrieved_docs = pd.DataFrame(retrieved_docs)
    # Score each document by the number of keyword matches
    scored_documents = [(index, doc, enhanced_keyword_match_count(doc['fullText'], keywords)) for index, doc in retrieved_docs.iterrows()]
    # Sort the documents by their score in descending order
    scored_documents.sort(key=lambda x: x[2], reverse=True)
    # Return the sorted documents, discarding the scores for simplicity
    return [doc for _, doc, _ in scored_documents]

# Function to create an optimized prompt for a RAG model, incorporating document reranking
def create_optimized_prompt_long(query, retrieved_docs, keywords, max_tokens=16385, max_docs=3, max_length_per_doc=1500, chunk_size=550, window_size=50):
    """
    This function reranks documents based on keyword matching before creating the prompt.
    It ensures the prompt is constructed from the most relevant documents.
    """
    # Start constructing the prompt with the query and an introduction
    prompt = f"""Given the following query and relevant information from multiple sources, 
    provide a comprehensive and accurate answer. Synthesize the information from all sources,
    and if there are any contradictions or gaps in the information, point them out.

    Query: {query}

    Relevant Information:
    """
    
    # Rerank the documents based on keyword matching
    reranked_docs = rerank_documents(retrieved_docs, keywords)
    
    # Initialize a counter for the number of documents added to the prompt
    doc_count = 0
    # Iterate over the reranked documents to add their content to the prompt
    for i, doc in enumerate(reranked_docs):
        if doc_count >= max_docs:
            break  # Stop if the maximum number of documents has been reached
        doc_text = doc['fullText']
        # Break the document text into overlapping chunks to fit within the prompt size limits
        chunks = [doc_text[start:start+chunk_size] for start in range(0, len(doc_text), chunk_size - window_size)]
        for j, chunk in enumerate(chunks[:max_docs]):
            if len(prompt) + len(chunk) > max_tokens:
                break  # Stop adding chunks if the prompt would exceed the token limit
            # Add the chunk to the prompt, including a header with the source number and part
            prompt += f"\nSource {i+1}, Part {j+1} - {doc['title']}:\n{chunk}\n"
        doc_count += 1  # Increment the document counter
    
    # Finalize the prompt with a section for the synthesized answer
    prompt += "\nSynthesized Answer:"
    
    # Trim the prompt if it exceeds the maximum token limit
    if len(prompt) > max_tokens:
        prompt = prompt[:max_tokens]
    
    return prompt  # Return the constructed prompt

# Example usage of the functions to generate an optimized RAG response
keywords = ["adaptation", "authors", "concept"]  # Define relevant keywords for reranking
retrieved = retriever.retrieve(query)  # Retrieve documents relevant to the query.
optimized_prompt = create_optimized_prompt_long(query, retrieved, keywords)  # Adjusted to include keywords for reranking
response = get_completion(optimized_prompt)  # Generate a response using the optimized prompt.
print(f"Query: {query}")
print(f"Optimized RAG Response: {response}")
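
To judge whether reranking actually helps, the ranked lists can be compared against a set of documents known to be relevant. The sketch below computes precision@k and recall@k in plain Python. The document ids and relevance judgments are invented for illustration; in practice they would come from hand-labelling a sample of queries from your own corpus.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant."""
    top_k = ranked_ids[:k]
    if not top_k:
        return 0.0
    return sum(doc_id in relevant_ids for doc_id in top_k) / len(top_k)

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top-k."""
    if not relevant_ids:
        return 0.0
    return sum(doc_id in relevant_ids for doc_id in ranked_ids[:k]) / len(relevant_ids)

# Hypothetical document ids and hand-made relevance judgments for one query
relevant = {"doc_b", "doc_d"}
ranking_before = ["doc_a", "doc_b", "doc_c", "doc_d"]
ranking_after = ["doc_b", "doc_d", "doc_a", "doc_c"]

for name, ranking in [("before reranking", ranking_before), ("after reranking", ranking_after)]:
    print(name, precision_at_k(ranking, relevant, k=2), recall_at_k(ranking, relevant, k=2))
```

In this toy case both scores rise from 0.5 to 1.0 at k=2, because reranking moved the two relevant documents to the top. Averaging these scores over many labelled queries gives a fairer comparison between reranking methods than inspecting single responses.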

## 5. Building an Optimized RAG Pipeline

Now, let's put everything together into an optimized RAG pipeline:

In [None]:
# Function to count the number of keyword matches in a document
def enhanced_keyword_match_count(document, keywords):
    # Initialize the Porter Stemmer for stemming words
    stemmer = PorterStemmer()
    # Tokenize and stem the document text
    doc_tokens = word_tokenize(document.lower())
    doc_stems = set(stemmer.stem(token) for token in doc_tokens)
    # Stem the keywords for matching
    keyword_stems = set(stemmer.stem(keyword.lower()) for keyword in keywords)
    # Count the number of keyword stems present in the document stems
    count = sum(stem in doc_stems for stem in keyword_stems)
    return count

# Function to rerank documents based on the number of keyword matches
def rerank_documents(retrieved_docs, keywords):
    # Convert retrieved_docs to a DataFrame if it's not already one
    if not isinstance(retrieved_docs, pd.DataFrame):
        retrieved_docs = pd.DataFrame(retrieved_docs)
    # Score each document by the number of keyword matches
    scored_documents = [(index, doc, enhanced_keyword_match_count(doc['fullText'], keywords)) for index, doc in retrieved_docs.iterrows()]
    # Sort the documents by their score in descending order
    scored_documents.sort(key=lambda x: x[2], reverse=True)
    # Return the sorted documents, discarding the scores
    return [doc for _, doc, _ in scored_documents]

# Function to create an optimized prompt for a RAG model
def create_optimized_prompt_long(query, retrieved_docs, keywords, max_tokens=16385, max_docs=3, max_length_per_doc=1500, chunk_size=550, window_size=50):
    # Initial prompt setup with the user's query
    prompt = f"""Given the following query and relevant information from multiple sources, 
    provide a comprehensive and accurate answer. Synthesize the information from all sources,
    and if there are any contradictions or gaps in the information, point them out.

    Query: {query}

    Relevant Information:
    """
    # Rerank documents based on keyword matching
    reranked_docs = rerank_documents(retrieved_docs, keywords)
    doc_count = 0
    # Iterate over the reranked documents to add them to the prompt
    for i, doc in enumerate(reranked_docs):
        if doc_count >= max_docs:
            break
        doc_text = doc['fullText']
        # Break the document text into overlapping chunks for inclusion in the prompt
        chunks = [doc_text[start:start+chunk_size] for start in range(0, len(doc_text), chunk_size - window_size)]
        for j, chunk in enumerate(chunks[:max_docs]):
            # Ensure the prompt does not exceed the maximum token limit
            if len(prompt) + len(chunk) > max_tokens:
                break
            # Add the chunk to the prompt
            prompt += f"\nSource {i+1}, Part {j+1} - {doc['title']}:\n{chunk}\n"
        doc_count += 1
    # Finalize the prompt with a section for the synthesized answer
    prompt += "\nSynthesized Answer:"
    # Trim the prompt if it exceeds the maximum token limit
    if len(prompt) > max_tokens:
        prompt = prompt[:max_tokens]
    return prompt

# Example usage of the functions to generate an optimized RAG response
query = "How do literary scholars understand adaptation theory?"
keywords = ["adaptation", "authors", "concept"]
# Retrieve relevant documents with the FAISSRetriever instance created earlier
retrieved = retriever.retrieve(query)
# Generate an optimized prompt based on the retrieved documents and keywords
optimized_prompt = create_optimized_prompt_long(query, retrieved, keywords)
# Generate a response with the get_completion function defined earlier
response = get_completion(optimized_prompt)
# Print the query and the optimized RAG response
print(f"Query: {query}")
print(f"Optimized RAG Response: {response}")
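
The cell above runs the stages one after another at the top level. One way to package them, sketched here with stand-in functions so that it runs without an API key or a dataset, is a small function that receives each stage as a callable. The name `run_rag_pipeline` and the three `fake_*` stages are hypothetical and exist only for this illustration.

```python
def run_rag_pipeline(query, keywords, retrieve, build_prompt, complete):
    """Run retrieval, prompt construction, and generation in order; return every intermediate result."""
    retrieved = retrieve(query)
    prompt = build_prompt(query, retrieved, keywords)
    answer = complete(prompt)
    return {"query": query, "retrieved": retrieved, "prompt": prompt, "answer": answer}

# Stand-in stages (hypothetical) so the sketch runs offline
def fake_retrieve(query):
    return ["A note on adaptation theory.", "A note on fidelity criticism."]

def fake_build_prompt(query, retrieved, keywords):
    sources = "\n".join(f"Source {n}: {text}" for n, text in enumerate(retrieved, start=1))
    return f"Query: {query}\n{sources}\nSynthesized Answer:"

def fake_complete(prompt):
    return f"(model output for a prompt of {len(prompt)} characters)"

result = run_rag_pipeline(
    "How do literary scholars understand adaptation theory?",
    ["adaptation"],
    retrieve=fake_retrieve,
    build_prompt=fake_build_prompt,
    complete=fake_complete,
)
print(result["answer"])
```

In this notebook the real stages would be passed in instead: `retrieve=retriever.retrieve`, `build_prompt=create_optimized_prompt_long`, and `complete=get_completion`. Keeping the intermediate results in the returned dictionary makes it easier to inspect which documents and which prompt produced a given answer.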

# Exercises

1. Implement a cross-encoder reranking method using a pre-trained model from the `sentence-transformers` library.

2. Develop a method to dynamically adjust the number of retrieved documents based on the complexity of the query.

3. Implement a simple caching mechanism to store and reuse embeddings and retrieved results for frequently asked questions.

4. Create a method to generate follow-up questions based on the initial RAG response, encouraging a more interactive and in-depth exploration of the topic.

# Conclusion

In this final lesson of our LLMs with RAG workshop, we've explored advanced techniques for optimizing Retrieval Augmented Generation systems. We've implemented efficient retrieval using FAISS, developed strategies for handling long contexts, created optimized prompts, and built a reranking method. Finally, we combined these techniques into a comprehensive RAG pipeline.

These optimization techniques can significantly enhance the performance of RAG systems, making them more accurate, efficient, and capable of handling complex queries and large knowledge bases.

As you continue to work with RAG systems, remember that ongoing experimentation and refinement are key to achieving the best results for your specific use case.

# References

1. Karpukhin, V., et al. (2020). [Dense Passage Retrieval for Open-Domain Question Answering](https://arxiv.org/abs/2004.04906). arXiv preprint arXiv:2004.04906.
2. Khattab, O., & Zaharia, M. (2020). [ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT](https://arxiv.org/abs/2004.12832). arXiv preprint arXiv:2004.12832.
3. Gao, L., et al. (2021). [Making Pre-trained Language Models Better Few-shot Learners](https://arxiv.org/abs/2012.15723). arXiv preprint arXiv:2012.15723.
4. Johnson, J., Douze, M., & Jégou, H. (2019). [Billion-scale similarity search with GPUs](https://arxiv.org/abs/1702.08734). IEEE Transactions on Big Data.

___
This concludes our LLMs with RAG workshop series. I hope you found these lessons informative and practical for your research and applications!