## RAG Implementation With LangChain

RAG is a technique for augmenting LLM knowledge with additional data.

LLMs can reason about wide-ranging topics, but their knowledge is <i>limited to the public data available up to the point in time at which they were trained</i>. If you want to build AI applications that can reason about private data or data introduced after a model’s cutoff date, you need to augment the model's knowledge with the specific information it needs. The process of retrieving the appropriate information and inserting it into the model prompt is known as <b>Retrieval Augmented Generation (RAG)</b>.
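
As a rough illustration of the retrieve-then-insert flow described above, the following sketch ranks a tiny list of documents by word overlap with the query and places the best match into the prompt. The documents, the `tokenize`, `retrieve`, and `build_prompt` helpers, and the word-overlap scoring are all assumptions made for this example; a real pipeline would use embeddings and a vector database instead.

```python
# Toy illustration of the RAG flow: retrieve relevant text, then insert it into the prompt.
# The documents and the word-overlap scoring are illustrative assumptions only.
import re

DOCUMENTS = [
    "Llama 2 is a collection of pretrained and fine-tuned large language models.",
    "Australia Day is celebrated annually on January 26th.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query: str, documents: list[str], k: int = 1) -> list[str]:
    """Return the k documents sharing the most words with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, contexts: list[str]) -> str:
    """Insert the retrieved contexts into the prompt ahead of the query."""
    joined = "\n".join(contexts)
    return (
        "Using the contexts below, answer the query.\n\n"
        f"Contexts:\n{joined}\n\n"
        f"Query: {query}"
    )


query = "What is Llama 2?"
prompt = build_prompt(query, retrieve(query, DOCUMENTS))
print(prompt)
```

The prompt string, not the model itself, carries the extra knowledge, which is why no retraining is needed.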

#### LangChain

LangChain is a framework for developing applications powered by language models. It enables applications that:

- Are context-aware: connect a language model to sources of context (prompt instructions, few-shot examples, content to ground its response in, etc.)
- Reason: rely on a language model to reason (about how to answer based on provided context, what actions to take, etc.)

#### Build a simple chatbot using LangChain

We will rely heavily on the LangChain library to bring together the different components needed for the chatbot.

<b>Step-1</b>

Run the following command once to set the OpenAI key as an environment variable.

In [2]:
# Setting up the openAI key
import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass.getpass()

Initialise a ChatOpenAI object backed by gpt-3.5-turbo to generate responses.

In [4]:
# NOTE: You need an API key from OpenAI to use this functionality
import os
from langchain.chat_models import ChatOpenAI

# Read the key set in Step 1; os.environ[...] raises KeyError if it is missing
openai_api_key = os.environ["OPENAI_API_KEY"]

# temperature=0 keeps the responses as deterministic as possible
llm_chat = ChatOpenAI(
    temperature=0,
    openai_api_key=openai_api_key,
    model='gpt-3.5-turbo'
)

<b> Step -2 </b>

Chats with *OpenAI's gpt-3.5-turbo and gpt-4 chat models* are typically structured (in plain text) like this:

System: You are a helpful assistant.

User: Hi AI, how are you today?

Assistant: I'm great thank you. How can I help you?

User: I'd like to understand string theory.

Assistant:

The final "Assistant:" without a response is what prompts the model to continue the conversation. In the official OpenAI ChatCompletion endpoint, these messages would be passed to the model in a format like:

[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."}
]
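
To make the link between the two representations concrete, here is a minimal sketch that turns a list of role/content dictionaries back into the plain-text transcript shown above. The `render_transcript` helper is hypothetical and written only for illustration; it is not part of the OpenAI or LangChain libraries.

```python
def render_transcript(messages: list[dict]) -> str:
    """Render role/content dictionaries as a plain-text chat transcript."""
    labels = {"system": "System", "user": "User", "assistant": "Assistant"}
    lines = [f"{labels[m['role']]}: {m['content']}" for m in messages]
    # The trailing "Assistant:" cue marks where the model would continue.
    lines.append("Assistant:")
    return "\n\n".join(lines)


chat_messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."},
]

print(render_transcript(chat_messages))
```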

LangChain uses a slightly different format. We use message objects like so:

In [5]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
   
]

<b> Step -3 </b>
Send message to ChatGPT to get a response

In [6]:
response = llm_chat.invoke(messages)
print(response.content)

Hello! As an AI, I don't have feelings, but I'm here and ready to assist you. How can I help you today?


Because response is just another AIMessage object, we can append it to messages, add another HumanMessage, and generate the next response in the conversation.

In [7]:
# add latest AI response to messages
messages.append(response)

# now create a new user prompt
prompt = HumanMessage(
    content="I would like to know about Australia day")

# add to messages
messages.append(prompt)

# send to chat-gpt
response = llm_chat.invoke(messages)

print(response.content)

Australia Day is a national public holiday in Australia that is celebrated annually on January 26th. It commemorates the arrival of the First Fleet of British ships in 1788, which marked the beginning of European settlement in Australia.

Australia Day is often celebrated with various events and activities, including community gatherings, barbecues, fireworks, concerts, and citizenship ceremonies. It is a day for Australians to come together and reflect on the country's history, culture, and achievements.

However, it is important to note that Australia Day is also a day of controversy and debate. For some Indigenous Australians, it represents the invasion and colonization of their lands, and they refer to it as "Invasion Day" or "Survival Day." There have been ongoing discussions about changing the date of Australia Day to a more inclusive day that acknowledges the history and culture of Indigenous Australians.

Overall, Australia Day is a significant day in the Australian calendar, b
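
The append-and-resend pattern used in the cell above can be seen in isolation with a stand-in model. In this sketch the `Message` dataclass and the `echo_model` function are hypothetical substitutes for LangChain's message classes and the real chat model, so that the history handling can be run without an API key.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical stand-in for LangChain's message classes."""
    role: str
    content: str


def echo_model(history: list[Message]) -> Message:
    """Stub model: reports how many messages it received instead of calling an API."""
    return Message("ai", f"I have seen {len(history)} messages so far.")


history = [
    Message("system", "You are a helpful assistant."),
    Message("human", "Hi AI, how are you today?"),
]

for user_text in ["I would like to know about Australia day", "Thanks!"]:
    reply = echo_model(history)                   # the model sees the whole history
    history.append(reply)                         # keep the reply for the next turn
    history.append(Message("human", user_text))   # then add the next user message

print([m.role for m in history])
```

Because the full list is sent on every call, the model has no memory of its own; the conversation state lives entirely in the list.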

Now we change the persona of the model through the system message so that it translates everything into Urdu.

In [8]:
messages = [
    SystemMessage(content="You are a helpful assistant that translates from english to urdu."),
    HumanMessage(content="Hi AI, how are you today?"),
]

response = llm_chat.invoke(messages)
print(response.content)

مرحبا! میں بہترین ہوں، شکریہ۔ آپ کیسے ہیں؟


In [9]:
# add latest AI response to messages
messages.append(response)

# now create a new user prompt
prompt = HumanMessage(
    content="I would like to know about Australia day")

# add to messages
messages.append(prompt)

# send to chat-gpt
response = llm_chat.invoke(messages)

print(response.content)

آسٹریلیا ڈے کے بارے میں معلومات درکار ہے۔

آسٹریلیا ڈے 26 جنوری کو منایا جاتا ہے اور یہ آسٹریلیا کا قومی دن ہے۔ یہ دن آسٹریلیا کے بنیادی تشکیلیں کو یاد کرنے اور منانے کا موقع فراہم کرتا ہے۔

اس دن کو مختلف طریقوں سے منایا جاتا ہے جو آسٹریلیا کی تاریخ، ثقافت اور تراث کو نمایاں کرتے ہیں۔ یہ دن عوامی تقریبات، جشنوں، میلوں، موسیقی کے اجراءت اور آتش بازی کے ساتھ منایا جاتا ہے۔

آسٹریلیا ڈے کو مختلف طریقوں سے منایا جاتا ہے۔ یہ دن عوامی تقریبات، جشنوں، میلوں، موسیقی کے اجراءت اور آتش بازی کے ساتھ منایا جاتا ہے۔

آسٹریلیا ڈے کو مختلف طریقوں سے منایا جاتا ہے۔ یہ دن عوامی تقریبات، جشنوں، میلوں، موسیقی کے اجراءت اور آتش بازی کے ساتھ منایا جاتا ہے۔

آسٹریلیا ڈے کو مختلف طریقوں سے منایا جاتا ہے۔ یہ دن عوامی تقریبات، جشنوں، میلوں، موسیقی کے اجراءت اور آتش بازی کے ساتھ منایا جاتا ہے۔

آسٹریلیا ڈے کو مختلف طریقوں سے منایا جاتا ہے۔ یہ دن عوامی تقریبات، جشنوں، میلوں، موسیقی کے اجراءت اور آتش بازی کے ساتھ منایا جاتا ہے۔

آسٹریلیا ڈے کو مختلف طریقوں سے منایا جاتا ہے۔ یہ دن عوامی تقریبات، جشنوں، میلوں، موس

<b> Step -4 Dealing with Hallucinations </b>

The knowledge of LLMs can be limited. The reason for this is that LLMs learn all they know during training. An LLM essentially compresses the "world" as seen in the training data into the internal parameters of the model. We call this knowledge the parametric knowledge of the model.

By default, LLMs have no access to the external world.

The result of this is very clear when we ask LLMs about more recent information, like about the new (and very popular) Llama 2 LLM.

In [11]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="I want to know about Llama 2"),
]

# send to chat-gpt
response = llm_chat.invoke(messages)

print(response.content)

Llama 2 is a fictional character from the video game "Fortnite." It is a part of the Llama Crew Set and was introduced in Season 3 of the game. Llama 2 is a variation of the original Llama character, known for its vibrant colors and unique design.

In the game, Llama 2 is often found as a loot box that players can open to obtain various items such as weapons, resources, and materials. It is highly sought after by players due to the valuable loot it contains.

Llama 2 has become an iconic symbol in the Fortnite community and is often associated with luck and good fortune. Its appearance in the game is always a cause for excitement among players, as it can significantly enhance their chances of success.

Overall, Llama 2 is a beloved character in Fortnite, known for its distinctive appearance and the valuable loot it provides to players.


To tackle this issue, we can feed knowledge into LLMs in another way. It is called source knowledge, and it refers to any information fed into the LLM via the prompt. We can try that with the Llama 2 question by taking a description of Llama 2 from its source page.

In [16]:
llama2_information = [
    "Code Llama is a code generation model built on Llama 2, trained on 500B tokens of code. It supports common programming languages being used today, including Python, C++, Java, PHP, Typescript (Javascript), C#, and Bash.",
    "In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned LLMs, called Llama 2-Chat, are optimized for dialogue use cases. Our models outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat in order to enable the community to build on our work and contribute to the responsible development of LLMs."
]

source_knowledge = "\n".join(llama2_information)

We can feed this additional knowledge into our prompt with some instructions telling the LLM how we'd like it to use this information alongside our original query.

In [13]:
query = "Can you tell me about the llama 2 ?"

augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""

Now we feed this additional information to the model.

In [15]:
# create a new user prompt
prompt = HumanMessage(
    content=augmented_prompt
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = llm_chat.invoke(messages)
print(res.content)

Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) developed and released by the creators. These models range in scale from 7 billion to 70 billion parameters. One specific variant of the fine-tuned LLMs is called Llama 2-Chat, which is optimized for dialogue use cases. The Llama 2 models have been found to outperform open-source chat models on most benchmarks tested. Additionally, based on human evaluations for helpfulness and safety, Llama 2-Chat may be considered a suitable substitute for closed-source models. The creators have provided a detailed description of their approach to fine-tuning and safety improvements of Llama 2-Chat, with the intention of enabling the community to build on their work and contribute to the responsible development of large language models.


## Building RAG Chatbots with LangChain

In this example, we will build an AI chatbot from start to finish so that it can answer questions about Llama 2 automatically, without us providing the information manually. We will use LangChain, HuggingFace embeddings, OpenAI, and a vector database to build a chatbot capable of learning from the external world using Retrieval Augmented Generation (RAG).

We will use two techniques to build our chatbot:

1- Use a dataset sourced from the Llama 2 ArXiv paper and other related papers to help our chatbot answer questions about the latest and greatest in the world of GenAI.

2- Scrape multiple webpages to help our chatbot answer questions about the latest and greatest in the world of GenAI.

### Chatbot (no RAG)

We will be relying heavily on the LangChain library to bring together the different components needed for our chatbot. To begin, we'll create a simple chatbot without any retrieval augmentation. We do this by initializing a ChatOpenAI object. For this we do need an OpenAI API key.

In [None]:
# Setting up the openAI key
import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass.getpass()

In [None]:
import os
from langchain.chat_models import ChatOpenAI

# Read the key set in the previous cell; raises KeyError if it is missing
openai_api_key = os.environ["OPENAI_API_KEY"]

# Creating an OpenAI chat object
chat = ChatOpenAI(
    openai_api_key=openai_api_key,
    model='gpt-3.5-turbo'
)

Chats with *OpenAI's gpt-3.5-turbo and gpt-4 chat models* are typically structured (in plain text) like this:

System: You are a helpful assistant.

User: Hi AI, how are you today?

Assistant: I'm great thank you. How can I help you?

User: I'd like to understand string theory.

Assistant:

The final "Assistant:" without a response is what prompts the model to continue the conversation. In the official OpenAI ChatCompletion endpoint, these messages would be passed to the model in a format like:

[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."}
]

In LangChain there is a slightly different format. We use three message objects like so:

In [None]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
   
]

In [None]:
res = chat.invoke(messages)
print(res.content)

Because res is just another AIMessage object, we can append it to messages, add another HumanMessage, and generate the next response in the conversation.

In [None]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="I would like to understand string theory")

# add to messages
messages.append(prompt)

# send to chat-gpt
res = chat.invoke(messages)

print(res.content)

### Dealing with Hallucinations

The knowledge of LLMs can be limited. The reason for this is that LLMs learn all they know during training. An LLM essentially compresses the "world" as seen in the training data into the internal parameters of the model. We call this knowledge the parametric knowledge of the model.

By default, LLMs have no access to the external world.

The result of this is very clear when we ask LLMs about more recent information, like about the new (and very popular) Llama 2 LLM.

In [None]:
# now create a new user prompt
prompt = HumanMessage(
    content="What is so special about Llama 2?"
)

# add to messages
messages = [
    SystemMessage(content="You are a helpful assistant."),
]
messages.append(prompt)

# send to OpenAI
res = chat.invoke(messages)
print(res.content)

As we can see, the model's answer is completely wrong: Llama 2 is not a game. Let's see how the model responds to a question about LangChain.

In [None]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Can you tell me about the LLMChain in LangChain?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat.invoke(messages)

In [None]:
print(res.content)

Our chatbot can no longer help us because it does not have the information needed to answer the question. In this case the answer makes it clear that the LLM doesn't know the information, but sometimes an LLM responds as though it does know the answer, and this can be very hard to detect.

##### Adding source knowledge

There is another way of feeding knowledge into LLMs. It is called source knowledge and it refers to any information fed into the LLM via the prompt. We can try that with the LLMChain question. We can take a description of this object from the LangChain documentation.

In [None]:
llmchain_information = [
    "A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.",
    "Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.",
    "LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those types of applications."
]

source_knowledge = "\n".join(llmchain_information)

We can feed this additional knowledge into our prompt with some instructions telling the LLM how we'd like it to use this information alongside our original query.

In [None]:
query = "Can you tell me about the LLMChain in LangChain?"

augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""

Now we feed this information to our chatbot.

In [None]:
# create a new user prompt
prompt = HumanMessage(
    content=augmented_prompt
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat.invoke(messages)
print(res.content)

The quality of this answer is much better. This is made possible by augmenting our query with external knowledge (source knowledge). We can use a vector database to retrieve this information automatically.

### Importing the Data

In this task, we will import our data using the Hugging Face Datasets library. Specifically, we will use the <b>"jamescalam/llama-2-arxiv-papers-chunked"</b> dataset. This dataset contains chunked text from a collection of ArXiv papers, which will serve as the external knowledge base for our chatbot.

In [None]:
from datasets import load_dataset
dataset = load_dataset(
    "jamescalam/llama-2-arxiv-papers-chunked",
    split="train"
)


In [None]:
dataset[0]

##### Dataset Overview

The dataset we are using is sourced from the Llama 2 ArXiv papers. It is a collection of academic papers from ArXiv, a repository of electronic preprints approved for posting after moderation. Each entry in the dataset represents a "chunk" of text from these papers.
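
The text does not say how the papers were split into chunks, so the following is only a sketch of one common approach: fixed-size character windows with a small overlap, so that a sentence cut at a boundary still appears whole in one of the chunks. The `chunk_text` function and its default sizes are assumptions made for illustration.

```python
def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop early enough that the final window is not fully contained in the previous one.
    last_start = max(len(text) - overlap, 1)
    return [text[start:start + chunk_size] for start in range(0, last_start, step)]


sample = "abcdefghij" * 7  # 70 characters of placeholder text
chunks = chunk_text(sample, chunk_size=40, overlap=10)
for chunk in chunks:
    print(len(chunk), chunk)
```

Real pipelines often split on tokens or sentences rather than characters, but the overlap idea is the same.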

Because most Large Language Models (LLMs) only contain knowledge of the world as it was during training, they cannot answer our questions about Llama 2 — at least not without this data.

### Building the Knowledge Base

We now have a dataset that can serve as our chatbot knowledge base. Our next task is to transform that dataset into the knowledge base that our chatbot can use. To do this we must use an embedding model and vector database.

We use a HuggingFace model to generate embeddings like so:

In [None]:
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.vectorstores import DocArrayInMemorySearch

model_id = 'sentence-transformers/all-MiniLM-L6-v2'
model_kwargs = {'device': 'cpu'}
hf_embedding_model = HuggingFaceEmbeddings(
    model_name=model_id,
    model_kwargs=model_kwargs
)

texts = [
    'this is the first chunk of text',
    'my name is xyz'
]

res = hf_embedding_model.embed_documents(texts)
len(res), len(res[0])

In [None]:
print(res[0])

From this we get two (aligning to our two chunks of text) 384-dimensional embeddings.
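
Embeddings are useful because texts with similar meaning map to vectors that lie close together. One common way to compare two embeddings is cosine similarity; the sketch below computes it with the standard library on made-up 3-dimensional vectors rather than real 384-dimensional ones. The `cosine_similarity` helper is written for this example, and the vector database used later may rely on a different distance metric.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for real embeddings
query_vec = [1.0, 0.0, 1.0]
close_vec = [0.9, 0.1, 1.1]
far_vec = [0.0, 1.0, 0.0]

print(cosine_similarity(query_vec, close_vec))
print(cosine_similarity(query_vec, far_vec))
```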

We're now ready to embed and index all of our data! We do this by looping through our dataset, embedding the text, and inserting everything in batches.

In [None]:
from langchain.vectorstores import Chroma
import chromadb
# Define a vector data base client
chroma_client = chromadb.Client()
#chroma_client.delete_collection(name="my_collections2")

In [None]:
from tqdm.auto import tqdm  # for progress bar
from chromadb.utils import embedding_functions
import chromadb

# Persist the vector store on disk in the 'db' folder
chroma_client = chromadb.PersistentClient(path="db/")
# Embedding function Chroma uses for query texts (same model as hf_embedding_model)
em = embedding_functions.SentenceTransformerEmbeddingFunction("sentence-transformers/all-MiniLM-L6-v2")


In [None]:
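# Drop the collection left over from a previous run (skip this cell on the first run, when it does not exist yet)
chroma_client.delete_collection("lang_chain_1")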

In [None]:
collection = chroma_client.create_collection(name="lang_chain_1",embedding_function=em)

data = dataset.to_pandas()  # this makes it easier to iterate over the dataset
batch_size = 100
for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i+batch_size)
    # get batch of data
    batch = data.iloc[i:i_end]
    # generate unique ids for each chunk
    ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
    # get text to embed
    texts = [x['chunk'] for _, x in batch.iterrows()]
    # embed text with the HuggingFace model defined earlier
    embeds = hf_embedding_model.embed_documents(texts)
    # get metadata to store in Chroma
    metadata = [
        {'text': x['chunk'],
         'source': x['source'],
         'title': x['title']} for _, x in batch.iterrows()
    ]

    # Adding collections
    collection.add(
                documents= texts,
                metadatas=metadata,
                embeddings=embeds,
                ids=ids
                )
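
The batching and id-building logic in the loop above can be tried on its own without a vector database. In this sketch the `batched` helper is hypothetical, and the rows are made-up placeholders that only reuse the doi and chunk-id field names from the dataset.

```python
from typing import Iterator, Sequence


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Made-up rows; only the field names mirror the dataset
rows = [{"doi": "0000.00000", "chunk-id": str(n)} for n in range(5)]

for batch in batched(rows, batch_size=2):
    # Combining the paper id and the chunk id gives each chunk a unique key
    ids = [f"{row['doi']}-{row['chunk-id']}" for row in batch]
    print(ids)
```

Inserting in batches keeps memory use bounded and makes a progress bar meaningful; the last batch is simply shorter when the row count is not a multiple of the batch size.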

Query the most relevant result

In [None]:
# query the top 5 results (the print below shows the third hit)
results = collection.query(
    query_texts='What is so special about Llama 2',
    n_results=5
)

print(results['documents'][0][2])
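
The double index in `results['documents'][0][2]` reflects a nested layout: the outer list has one entry per query text and the inner list holds the hits for that query. The sketch below uses a hand-written dictionary that mimics this shape; the ids, texts, and distances are invented, and the `top_hits` helper is hypothetical.

```python
# Hand-written stand-in for a query result: one inner list per query text.
mock_results = {
    "ids": [["a-0", "b-3"]],
    "documents": [["first chunk", "second chunk"]],
    "distances": [[0.21, 0.48]],
}


def top_hits(results: dict, query_index: int = 0) -> list[tuple[str, str, float]]:
    """Pair up id, document, and distance for the hits of one query."""
    return list(
        zip(
            results["ids"][query_index],
            results["documents"][query_index],
            results["distances"][query_index],
        )
    )


for hit_id, document, distance in top_hits(mock_results):
    print(hit_id, distance, document)
```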


### Retrieval Augmented Generation
We've built a fully-fledged knowledge base. Now it's time to connect that knowledge base to our chatbot. To do that we'll be diving back into LangChain and reusing our template prompt from earlier.

Here we query the Chroma collection directly, so we do not need to wrap it in a LangChain vectorstore object.

The query returns a lot of text, and it is not obvious which parts are relevant. Fortunately, our LLM can parse this information much faster than we can. All we need to do is connect the output of our knowledge base to our chatbot, using the same logic as earlier.

In [None]:
def augment_prompt(query: str):
    # get top 5 results from knowledge base
    results = collection.query(
            query_texts=query,
            n_results=5
        )

    source_knowledge= ""
    # get the text from the results
    for i in range(0, len(results['documents'])):
        source_knowledge+= "\n".join(results['documents'][i])
    
    # feed into an augmented prompt
    augmented_prompt = f"""Using the contexts below, answer the query.

    Contexts:
    {source_knowledge}

    Query: {query}"""
    
    return augmented_prompt
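
One possible variation, shown here only as a sketch, is to pass the retrieval step in as a function so that the prompt-building logic can be checked without a vector database. The `make_augmenter` and `fake_retrieve` names are hypothetical, and the single context sentence paraphrases the Llama 2 description quoted earlier.

```python
from typing import Callable


def make_augmenter(retrieve: Callable[[str], list[str]]) -> Callable[[str], str]:
    """Build an augment function around any retriever that maps a query to texts."""
    def augment(query: str) -> str:
        contexts = "\n".join(retrieve(query))
        # Adjacent string literals avoid the stray indentation of a nested triple-quoted string
        return (
            "Using the contexts below, answer the query.\n\n"
            f"Contexts:\n{contexts}\n\n"
            f"Query: {query}"
        )
    return augment


def fake_retrieve(query: str) -> list[str]:
    """Stand-in retriever that always returns the same context."""
    return ["Llama 2 models range in scale from 7 billion to 70 billion parameters."]


augment = make_augmenter(fake_retrieve)
print(augment("How large is Llama 2?"))
```

In the notebook the retriever would be a thin wrapper around the collection query used in the cell above.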

In [None]:
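# ask a Llama 2 question that the knowledge base can answer
query = "What is so special about Llama 2?"

# create a new user prompt
prompt = HumanMessage(
    content=augment_prompt(query)
)
# add to messages
messages.append(prompt)

res = chat.invoke(messages)

print(res.content)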

We can continue with more Llama 2 questions. Let's try without RAG first:



In [None]:
prompt = HumanMessage(
    content="what safety measures were used in the development of llama 2?"
)

res = chat.invoke(messages + [prompt])
print(res.content)

The chatbot is able to respond about Llama 2 thanks to its conversational history stored in messages. However, it doesn't know anything about the safety measures themselves, as we have not provided that information via the RAG pipeline. Let's try again, this time with RAG.

In [None]:
prompt = HumanMessage(
    content=augment_prompt(
        "what safety measures were used in the development of llama 2?"
    )
)

res = chat.invoke(messages + [prompt])
print(res.content)