<a href="https://mng.bz/8wdg" target="_blank">
    <img src="../../Assets/Images/NewMEAPHeader.png" alt="New MEAP" style="width: 100%;" />
</a>


# Chapter 04 - Generation Pipeline: Generating Contextual LLM Responses

### Welcome to chapter 4 of A Simple Guide to Retrieval Augmented Generation.

In this chapter, we introduce the concepts behind the real-time generation pipeline that uses the knowledge base created by the indexing pipeline. This will complete the development of a simple RAG system.

The generation pipeline consists of three steps:

1. Retrieval
2. Augmentation
3. Generation
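Before bringing in FAISS and an LLM, the three steps can be sketched end to end in plain Python. This is a minimal sketch under stated assumptions: the retriever scores chunks by simple word overlap (a stand-in for the embedding similarity used later in this chapter), and `fake_llm` is a hypothetical placeholder for a real model call. None of these function names come from a library.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Step 1 (Retrieval): rank chunks by word overlap with the query.

    Word overlap is only a stand-in for embedding similarity.
    """
    query_words = tokenize(query)
    ranked = sorted(chunks, key=lambda c: len(query_words & tokenize(c)), reverse=True)
    return ranked[:k]


def augment(query: str, context_chunks: list[str]) -> str:
    """Step 2 (Augmentation): combine the query and retrieved context into a prompt."""
    context = "\n\n".join(context_chunks)
    return (
        "Given the context below answer the question.\n\n"
        f"Question: {query}\n\n"
        f"Context: {context}\n\n"
        "If the question cannot be answered based on the provided context, say I don't know."
    )


def fake_llm(prompt: str) -> str:
    """Step 3 (Generation): hypothetical placeholder for a real LLM call."""
    return f"(an LLM would answer here, given a {len(prompt)}-character prompt)"


chunks = [
    "Australia won the final by six wickets.",
    "India beat South Africa by 243 runs at Eden Gardens.",
    "The tournament was contested by ten national teams.",
]
query = "Who won the final?"
top_chunks = retrieve(query, chunks, k=1)
print(top_chunks)
print(fake_llm(augment(query, top_chunks)))
```

Keeping the steps as separate functions mirrors the pipeline diagram: each step can later be swapped out (word overlap for embeddings, `fake_llm` for a real model) without touching the others.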

<img src="../../Assets/Images/4.1.png" width=800>

## Installing Dependencies

All the libraries needed to run this notebook, along with their versions, are listed in the __requirements.txt__ file in the root directory of this repository.

Go to the root directory and run the following command to install them:

```
pip install -r requirements.txt
```

This is the recommended way to install the dependencies.

___
Alternatively, you can run the command from within this notebook. The relative path may vary depending on where the notebook is located.

```
%pip install -r ../../requirements.txt --quiet
```

Note: you may need to restart the kernel to use updated packages.


## 1. Load the Vector Index

In Chapter 3, we indexed the Wikipedia page for the 2023 Cricket World Cup. Recall that we used OpenAI embeddings to encode the text, used FAISS as the vector index to store the embeddings, and saved the FAISS index to a local directory. Let's reuse this index.

Note: To reuse the embeddings, you will need an __OpenAI API key__, which you can obtain from [OpenAI](https://platform.openai.com/api-keys). The query must be embedded with the same model that was used to build the index.

To initialize the __OpenAI client__, we need to pass the API key. There are several ways to do this.

#### [Option 1] Store the API key in a .env file (recommended)

Install the __python-dotenv__ library (imported in Python as `dotenv`).

_dotenv libraries are popular tools, available for several programming languages including Python and Node.js, for managing environment variables in development and deployment environments. They let developers load environment variables from a .env file into their application's environment._

- Create a file named .env in the root directory of your project.
- Inside the .env file, define environment variables in the format VARIABLE_NAME=value.

For example:

```
OPENAI_API_KEY=YOUR_API_KEY
```
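To make the `VARIABLE_NAME=value` format concrete, here is a simplified, standard-library-only sketch of what loading a .env file involves. The helpers `parse_env_lines` and `load_env` are hypothetical illustrations, not part of python-dotenv; the real `load_dotenv()` also handles `export` prefixes, variable expansion, multi-line values, and other details, so use the library in practice.

```python
import os


def parse_env_lines(lines: list[str]) -> dict[str, str]:
    """Hypothetical helper: parse simple VARIABLE_NAME=value lines."""
    values = {}
    for line in lines:
        line = line.strip()
        # Skip empty lines, comments, and lines without an "=" sign
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Strip surrounding whitespace and one pair of matching quotes
        value = value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[name.strip()] = value
    return values


def load_env(lines: list[str], override: bool = False) -> bool:
    """Hypothetical helper: copy parsed values into os.environ.

    Returns True if any variables were found, roughly like load_dotenv().
    """
    values = parse_env_lines(lines)
    for name, value in values.items():
        # Like load_dotenv's default (override=False), keep variables already set
        if override or name not in os.environ:
            os.environ[name] = value
    return bool(values)


example = ["# OpenAI settings", "DEMO_API_KEY=sk-example-123", ""]
print(load_env(example))
print(os.environ["DEMO_API_KEY"])
```

The `override=False` default matters in practice: a key already exported in your shell wins over the value in the .env file.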

```python
from dotenv import load_dotenv
import os

# load_dotenv() reads the .env file and returns True if it found any variables
if load_dotenv():
    print("Success: .env file found with some environment variables")
else:
    print("Caution: No environment variables found. Please create a .env file in the root directory or add environment variables to the .env file")
```

Success: .env file found with some environment variables


#### [Option 2] Set the API key in code
This method is not recommended because it can leave your key exposed to potential misuse. Uncomment the cell below to use it.

```python
# import os
# os.environ["OPENAI_API_KEY"] = "sk-proj-******"  # Important: replace with your OpenAI API key
```

We can also test whether the key is valid:

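```python
import os

import openai
from openai import OpenAI

# Read the key without raising a KeyError if it is missing
api_key = os.environ.get("OPENAI_API_KEY")

if api_key:
    # The client picks up OPENAI_API_KEY from the environment
    client = OpenAI()
    try:
        # A cheap authenticated call; it fails if the key is invalid
        client.models.list()
        print("OPENAI_API_KEY is set and is valid")
    # Catch the more specific error types before their base class, APIError
    except openai.APIConnectionError as e:
        print(f"Failed to connect to OpenAI API: {e}")
    except openai.RateLimitError as e:
        print(f"OpenAI API request exceeded rate limit: {e}")
    except openai.APIError as e:
        print(f"OpenAI API returned an API Error: {e}")
else:
    print("Please set your OpenAI API key as an environment variable OPENAI_API_KEY")
```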


OPENAI_API_KEY is set and is valid


```python
# Import OpenAIEmbeddings from the LangChain OpenAI integration
from langchain_openai import OpenAIEmbeddings

# Import FAISS from langchain_community
from langchain_community.vectorstores import FAISS

# Use the same embedding model that was used to build the index in Chapter 3
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Load the FAISS index saved in Chapter 3.
# allow_dangerous_deserialization=True is required because the stored docstore
# is loaded with pickle; only enable it for index files you created or trust.
vector_store = FAISS.load_local(
    folder_path="../../Assets/Data/",
    index_name="CWC_index",
    embeddings=embeddings,
    allow_dangerous_deserialization=True,
)
```


## 2. Retrieval

We will now retrieve the passages from the knowledge base that are most relevant to our query, __"Who won the world cup?"__

<img src="../../Assets/Images/4.2.png" width=800>

```python
import textwrap

# Define the query
query = "Who won the world cup?"

# Perform similarity search and get the top 2 most relevant chunks
retrieved_docs = vector_store.similarity_search(query, k=2)

# Display results
for i, doc in enumerate(retrieved_docs):
    print(textwrap.fill(f"\nRetrieved Chunk {i+1}:\n{doc.page_content}", width=100))
    print("\n\n")
```

 Retrieved Chunk 1: The tournament was contested by ten national teams, maintaining the same format
used in  2019 . After six weeks of round-robin matches,  India ,  South Africa ,  Australia , and
New Zealand  finished as the top four and qualified for the knockout stage. In the knockout stage,
India and Australia beat New Zealand and South Africa, respectively, to advance to the final, played
on 19 November at the  Narendra Modi Stadium  in  Ahmedabad . Australia won the final by six
wickets, winning their sixth Cricket World Cup title.



 Retrieved Chunk 2: The host  India  was the first team to qualify for the semi-finals after their
302-run win against  Sri Lanka , their seventh successive win in the World Cup. [ 42 ]  India
secured the top place amongst the semi-finalists after they beat  South Africa  by 243 runs on 5
November at  Eden Gardens  in  Kolkata . [ 43 ]





This is the most basic implementation of a retriever in the generation pipeline of a RAG-enabled system. This method of retrieval relies on embeddings: the query is embedded with OpenAI's text-embedding-3-small model (the same model used to build the index), and FAISS compares the query embedding with the stored chunk embeddings to find the most similar chunks.
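Conceptually, an exact (flat) vector search compares the query embedding with every stored chunk embedding and keeps the k closest. The sketch below does this with cosine similarity on tiny, made-up 3-dimensional vectors; real text-embedding-3-small embeddings have 1,536 dimensions. Depending on how the index was built, FAISS may rank by Euclidean (L2) distance instead, but for unit-length vectors such as OpenAI embeddings the two give the same ranking. Treat this as an illustration of the idea, not of FAISS internals.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Exhaustively score every chunk against the query and return the k best."""
    scored = [(chunk_id, cosine_similarity(query_vec, vec)) for chunk_id, vec in chunk_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings" (made up for illustration only)
chunk_vectors = {
    "final_result": [0.9, 0.1, 0.0],
    "semi_final_qualification": [0.6, 0.7, 0.1],
    "squad_deadline": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]

for chunk_id, score in top_k(query_vector, chunk_vectors, k=2):
    print(f"{chunk_id}: {score:.3f}")
```

Scoring every vector is what makes this search exact; approximate FAISS index types trade some accuracy for speed on large collections.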

## 3. Augmentation

The information fetched by the retriever must then be sent to the LLM, together with the user query, in the form of a natural language prompt. This process of combining the user query with the retrieved information is called augmentation.

<img src="../../Assets/Images/4.3.png" width=800>

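We will now perform augmentation with a simple contextual prompt that also controls generation: it instructs the LLM to answer only from the retrieved context and to say "I don't know" otherwise.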
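One detail worth noting: concatenating the chunks with `+` glues them together, which is why the printed prompt in this section contains "title.The host". Here is a minimal sketch of a prompt builder that keeps the same instructions but labels each chunk and separates chunks with a blank line. The function name `build_augmented_prompt` and the "Chunk N:" labels are illustrative choices, not part of the original code.

```python
def build_augmented_prompt(query: str, chunks: list[str]) -> str:
    """Hypothetical helper: combine the query and retrieved chunks into one grounded prompt."""
    # Label each chunk and separate chunks with a blank line so they stay distinct
    context = "\n\n".join(f"Chunk {i}: {chunk.strip()}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Given the context below answer the question.\n\n"
        f"Question: {query}\n\n"
        f"Context:\n{context}\n\n"
        "Remember to answer only based on the context provided and not from any other source.\n\n"
        "If the question cannot be answered based on the provided context, say I don't know."
    )


prompt = build_augmented_prompt(
    "Who won the world cup?",
    [
        "Australia won the final by six wickets, winning their sixth Cricket World Cup title.",
        "The host India was the first team to qualify for the semi-finals.",
    ],
)
print(prompt)
```

Because it accepts any number of chunks, the same helper also works if you change `k` in the similarity search.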

```python
# Concatenate the two retrieved chunks into a single context string
retrieved_context = retrieved_docs[0].page_content + retrieved_docs[1].page_content

# Create the augmented prompt
augmented_prompt = f"""

Given the context below answer the question.

Question: {query} 

Context : {retrieved_context}

Remember to answer only based on the context provided and not from any other source. 

If the question cannot be answered based on the provided context, say I don’t know.

"""

print(textwrap.fill(augmented_prompt, width=150))
```

  Given the context below answer the question.  Question: Who won the world cup?   Context : The tournament was contested by ten national teams,
maintaining the same format used in  2019 . After six weeks of round-robin matches,  India ,  South Africa ,  Australia , and  New Zealand  finished
as the top four and qualified for the knockout stage. In the knockout stage, India and Australia beat New Zealand and South Africa, respectively, to
advance to the final, played on 19 November at the  Narendra Modi Stadium  in  Ahmedabad . Australia won the final by six wickets, winning their sixth
Cricket World Cup title.The host  India  was the first team to qualify for the semi-finals after their 302-run win against  Sri Lanka , their seventh
successive win in the World Cup. [ 42 ]  India secured the top place amongst the semi-finalists after they beat  South Africa  by 243 runs on 5
November at  Eden Gardens  in  Kolkata . [ 43 ]  Remember to answer only based on the context provided and not f

## 4. Generation

Generation is the final step of this pipeline. While LLMs may be used in any of the earlier steps of the pipeline, the generation step relies entirely on the LLM. Popular LLMs include those developed by OpenAI, Anthropic, Meta, Google, Microsoft, and Mistral, among others.

We have built a simple retriever using FAISS and OpenAI embeddings, and we have created a simple augmented prompt. Now we will use OpenAI's GPT-4o-mini model to generate the response. To do this, we import the __ChatOpenAI__ class from the `langchain_openai` package.
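LangChain chat models accept a list of `(role, content)` tuples such as `("human", augmented_prompt)`. As a rough illustration, the sketch below shows how such tuples correspond to the role/content dictionaries of OpenAI's Chat Completions API, where LangChain's "human" and "ai" roles map to "user" and "assistant". The helper `to_openai_messages` is hypothetical and simplified; it is not LangChain's actual conversion code.

```python
# Map LangChain-style role names to OpenAI Chat Completions role names
ROLE_MAP = {
    "system": "system",
    "human": "user",
    "user": "user",
    "ai": "assistant",
    "assistant": "assistant",
}


def to_openai_messages(messages: list[tuple[str, str]]) -> list[dict[str, str]]:
    """Hypothetical helper: convert (role, content) tuples into OpenAI-style dicts."""
    converted = []
    for role, content in messages:
        if role not in ROLE_MAP:
            raise ValueError(f"Unsupported role: {role!r}")
        converted.append({"role": ROLE_MAP[role], "content": content})
    return converted


messages = [
    # An optional system message can carry the grounding instruction separately
    ("system", "Answer only from the provided context."),
    ("human", "Question: Who won the world cup?"),
]
print(to_openai_messages(messages))
```

In this chapter the whole augmented prompt goes in a single "human" message; splitting instructions into a system message is an alternative design, not what the code below does.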

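```python
from langchain_openai import ChatOpenAI

# Set up the LLM
llm = ChatOpenAI(
    model="gpt-4o-mini",
    temperature=0,      # minimize randomness for more consistent answers
    max_tokens=None,    # no explicit cap on the response length
    timeout=None,
    max_retries=2,      # retry a failed request up to two times
)

# Send the augmented prompt as a single human (user) message
messages = [("human", augmented_prompt)]
ai_msg = llm.invoke(messages)

# Extract the answer from the response object
answer = ai_msg.content

print(answer)
```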


Australia won the World Cup.


## 5. RAG function

Let us build a function that takes three inputs:
1. User query
2. Location of the vector index (knowledge base)
3. Index name

The function returns the retrieved context along with the generated answer.

#### RAG function

```python
import re

from langchain_community.vectorstores import FAISS
from langchain_openai import ChatOpenAI, OpenAIEmbeddings


def clean_text(text):
    """Normalize whitespace and strip HTML tags and bracketed references."""
    # Replace non-breaking spaces with regular spaces
    text = text.replace('\xa0', ' ')

    # Remove any HTML tags
    text = re.sub(r'<[^>]+>', '', text)

    # Remove references in square brackets (e.g., [7], [39])
    text = re.sub(r'\[.*?\]', '', text)

    # Collapse repeated spaces and newlines into single spaces
    text = ' '.join(text.split())

    return text


def rag_function(query, db_path, index_name):
    """Retrieve context for the query, augment the prompt, and generate an answer."""
    # Load the vector index with the same embedding model used to build it
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    db = FAISS.load_local(
        folder_path=db_path,
        index_name=index_name,
        embeddings=embeddings,
        allow_dangerous_deserialization=True,
    )

    # Retrieval: fetch the top 2 most similar chunks
    retrieved_docs = db.similarity_search(query, k=2)

    # Concatenate and clean the two chunks (returned as a one-element list)
    retrieved_context = [clean_text(retrieved_docs[0].page_content + retrieved_docs[1].page_content)]

    # Augmentation: combine the query and the context in a grounded prompt
    augmented_prompt = f"""

    Given the context below answer the question.

    Question: {query} 

    Context : {retrieved_context}

    Remember to answer only based on the context provided and not from any other source. 

    If the question cannot be answered based on the provided context, say I don’t know.

    """

    # Generation: ask the LLM to answer from the augmented prompt
    llm = ChatOpenAI(
        model="gpt-4o-mini",
        temperature=0,
        max_tokens=None,
        timeout=None,
        max_retries=2,
    )
    messages = [("human", augmented_prompt)]
    ai_msg = llm.invoke(messages)
    response = ai_msg.content

    return retrieved_context, response
```


Let's try sending our question to this function.

```python
rag_function(query="Who won the world cup?", db_path="../../Assets/Data", index_name="CWC_index")
```

(['The tournament was contested by ten national teams, maintaining the same format used in 2019 . After six weeks of round-robin matches, India , South Africa , Australia , and New Zealand finished as the top four and qualified for the knockout stage. In the knockout stage, India and Australia beat New Zealand and South Africa, respectively, to advance to the final, played on 19 November at the Narendra Modi Stadium in Ahmedabad . Australia won the final by six wickets, winning their sixth Cricket World Cup title.The host India was the first team to qualify for the semi-finals after their 302-run win against Sri Lanka , their seventh successive win in the World Cup. India secured the top place amongst the semi-finalists after they beat South Africa by 243 runs on 5 November at Eden Gardens in Kolkata .'],
 'Australia won the world cup.')

Let's ask another one.

```python
rag_function("What was Virat Kohli's achievement in the Cup?", db_path="../../Assets/Data", index_name="CWC_index")
```

(['The ICC announced its team of the tournament on 21 November 2023, with Virat Kohli being named as player of the tournament , and Rohit Sharma as captain of the team. Runs Player Team 765 Virat Kohli India 597 Rohit Sharma India 594 Quinton de Kock South Africa 578 Rachin Ravindra New Zealand 552 Daryl Mitchell New Zealand Source: ESPNcricinfo Most wickets'],
 'Virat Kohli was named player of the tournament in the ICC Cup.')

We can also pass a list of questions and inspect the responses.

```python
list_of_queries = [
    'What was the outcome of the match between Australia and the Netherlands on 25 October 2023?',
    'What ongoing cricket competition is currently taking place that involves multiple international teams?',
    'What was the deadline for teams to finalize their 15-player squads for the 2023 Cricket World Cup?',
    "What were the key highlights of the 2023 ICC Men's Cricket World Cup?",
    'What were the key outcomes of the 2023 Cricket World Cup, including the final match results and notable player statistics?',
    'What years had Cricket World Cup finals and their host nations?',
    "Which org has managed the Cricket World Cup since '75?",
    "What was India's winning margin vs. S. Africa on Nov 5, 2023?",
    'What teams qualified for the semi-finals in the 2023 Cricket World Cup?',
]
```

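```python
for query in list_of_queries:
    # Compute the answer outside the f-string: reusing double quotes inside a
    # double-quoted f-string only works on Python 3.12+
    response = rag_function(query, db_path="../../Assets/Data", index_name="CWC_index")[1]
    print(f"Query:{query}")
    print(f"Response: {response}\n")
```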


Query:What was the outcome of the match between Australia and the Netherlands on 25 October 2023?
Response: I don’t know.

Query:What ongoing cricket competition is currently taking place that involves multiple international teams?
Response: I don’t know.

Query:What was the deadline for teams to finalize their 15-player squads for the 2023 Cricket World Cup?
Response: The deadline for teams to finalize their 15-player squads for the 2023 Cricket World Cup was 28 September.

Query:What were the key highlights of the 2023 ICC Men's Cricket World Cup?
Response: The key highlights of the 2023 ICC Men's Cricket World Cup include:

- Dates: 5 October – 19 November 2023
- Host: India (first time as the sole host)
- Tournament format: Round-robin and knockout
- Participants: 10 teams
- Total matches played: 48
- Attendance: 1,250,307 (average of 26,048 per match)
- Champions: Australia (6th title)
- Runners-up: India
- Player of the series: Virat Kohli
- Most runs: Virat Kohli (765 runs)
- Mo

For some of the questions above, the response is "I don't know". This happens when the LLM cannot find the answer in the retrieved context, which is exactly what our augmented prompt instructed it to do.
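Note that the prompt spells the fallback with a typographic apostrophe ("I don’t know"), and the model echoes it back that way, so a naive check for the straight-apostrophe string "I don't know" would miss those responses. Here is a minimal sketch of a hypothetical helper, `is_abstention`, that normalizes apostrophes before matching; it could be used to count how many queries the system declined to answer.

```python
def is_abstention(response: str) -> bool:
    """Hypothetical helper: True if the response is the prompt's "I don't know" fallback."""
    # Normalize typographic apostrophes (U+2019, U+2018) to a plain apostrophe
    normalized = response.replace("\u2019", "'").replace("\u2018", "'")
    # Drop surrounding whitespace and trailing punctuation, then compare case-insensitively
    normalized = normalized.strip().rstrip(".!").lower()
    return normalized == "i don't know"


responses = [
    "I don\u2019t know.",
    "I don't know",
    "The deadline for teams to finalize their 15-player squads was 28 September.",
]
abstained = sum(is_abstention(r) for r in responses)
print(f"{abstained} of {len(responses)} responses were 'I don't know'")
```

Exact matching is deliberate here: a longer answer that merely starts with "I don't know" is not counted as an abstention.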

---

Is the RAG system we have created generating responses along the expected lines? Is the LLM still hallucinating? Before trying to improve the system's performance, we need to be able to measure and benchmark it. That is what we will do in Chapter 5, where we look at evaluation metrics and popular benchmarks for RAG.

---

<img src="../../Assets/Images/profile_s.png" width=100> 

Hi! I'm Abhinav! I am an entrepreneur and Vice President of Artificial Intelligence at Yarnit. I have spent over 15 years in consulting and leadership roles in data science, machine learning and AI. My current focus is the applied generative AI domain, solving enterprise needs through contextual intelligence. I'm passionate about AI advancements and constantly explore emerging technologies to push the boundaries and create positive impacts in the world. Let’s build the future, together!

[If you haven't already, please subscribe to the MEAP of A Simple Guide to Retrieval Augmented Generation here](https://mng.bz/8wdg)

<a href="https://mng.bz/8wdg" target="_blank">
    <img src="../../Assets/Images/NewMEAPFooter.png" alt="New MEAP" style="width: 100%;" />
</a>

#### If you'd like to chat, I'd be very happy to connect

[![GitHub followers](https://img.shields.io/badge/Github-000000?style=for-the-badge&logo=github&logoColor=black&color=orange)](https://github.com/abhinav-kimothi)
[![LinkedIn](https://img.shields.io/badge/LinkedIn-000000?style=for-the-badge&logo=linkedin&logoColor=orange&color=black)](https://www.linkedin.com/comm/mynetwork/discovery-see-all?usecase=PEOPLE_FOLLOWS&followMember=abhinav-kimothi)
[![Medium](https://img.shields.io/badge/Medium-000000?style=for-the-badge&logo=medium&logoColor=black&color=orange)](https://medium.com/@abhinavkimothi)
[![Insta](https://img.shields.io/badge/Instagram-000000?style=for-the-badge&logo=instagram&logoColor=orange&color=black)](https://www.instagram.com/akaiworks/)
[![Mail](https://img.shields.io/badge/email-000000?style=for-the-badge&logo=gmail&logoColor=black&color=orange)](mailto:abhinav.kimothi.ds@gmail.com)
[![X](https://img.shields.io/badge/Follow-000000?style=for-the-badge&logo=X&logoColor=orange&color=black)](https://twitter.com/abhinav_kimothi)
[![Linktree](https://img.shields.io/badge/Linktree-000000?style=for-the-badge&logo=linktree&logoColor=black&color=orange)](https://linktr.ee/abhinavkimothi)
[![Gumroad](https://img.shields.io/badge/Gumroad-000000?style=for-the-badge&logo=gumroad&logoColor=orange&color=black)](https://abhinavkimothi.gumroad.com/)

---