# RAG with LangChain

Objective: Build a knowledge base chatbot that can draw on information from the external world using **R**etrieval **A**ugmented **G**eneration (RAG).

Stack: LangChain, OpenAI, and Pinecone vector DB

Data: Llama 2 ArXiv paper and other related papers.


## Install libraries

In [None]:
!pip install -qU \
    langchain==0.0.292 \
    openai==0.28.0 \
    datasets==2.10.1 \
    pinecone-client==2.2.4 \
    tiktoken==0.5.1


## First, let's build a simple chat without RAG capabilities.

In [None]:
import os
from langchain.chat_models import ChatOpenAI

os.environ["OPENAI_API_KEY"] = "******"

chat = ChatOpenAI(
    openai_api_key=os.environ["OPENAI_API_KEY"],
    model='gpt-3.5-turbo'
)


#### LangChain chat format _message_ objects:

In [None]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
    HumanMessage(content="I'd like to understand string theory.")
]
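`ChatOpenAI` sends these messages to OpenAI's chat API, which expects a list of `{"role": ..., "content": ...}` dictionaries. LangChain's system, human and AI messages map to the `system`, `user` and `assistant` roles. The sketch below shows that mapping. It uses a plain stand-in dataclass instead of LangChain's own classes, and the names `Message` and `to_openai_format` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical stand-in for LangChain's message classes: each message
# carries a LangChain-style type string and its text content.
@dataclass
class Message:
    type: str  # "system", "human", or "ai"
    content: str

# Map LangChain-style message types to OpenAI chat API roles.
ROLE_MAP = {"system": "system", "human": "user", "ai": "assistant"}

def to_openai_format(messages: list[Message]) -> list[dict]:
    """Convert messages to the role/content dicts the OpenAI chat API expects."""
    return [{"role": ROLE_MAP[m.type], "content": m.content} for m in messages]

conversation = [
    Message("system", "You are a helpful assistant."),
    Message("human", "Hi AI, how are you today?"),
    Message("ai", "I'm great thank you. How can I help you?"),
    Message("human", "I'd like to understand string theory."),
]

for entry in to_openai_format(conversation):
    print(entry["role"], "->", entry["content"])
```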

#### We generate the next response from the AI by passing these messages to the `ChatOpenAI` object.

In [None]:
res = chat(messages)
res

AIMessage(content="String theory is a theoretical framework in physics that aims to explain the fundamental nature of particles and the forces between them. It suggests that at the smallest scales of existence, particles are not point-like but instead tiny, vibrating strings.\n\nHere are some key points to help you understand string theory:\n\n1. Basic Idea: According to string theory, the fundamental building blocks of the universe are not particles but tiny strings, similar to the strings of a musical instrument. These strings can vibrate at different frequencies, and the different vibrational patterns give rise to different particles and their properties.\n\n2. Extra Dimensions: String theory requires extra dimensions beyond the three spatial dimensions (length, width, and height) that we experience in everyday life. These extra dimensions are compactified or curled up so small that we don't directly perceive them.\n\n3. Unifying Theory: One of the main motivations behind string the

#### In response we get another `AIMessage` object. We can print it more clearly like so:

In [None]:
print(res.content)

String theory is a theoretical framework in physics that aims to explain the fundamental nature of particles and the forces between them. It suggests that at the smallest scales of existence, particles are not point-like but instead tiny, vibrating strings.

Here are some key points to help you understand string theory:

1. Basic Idea: According to string theory, the fundamental building blocks of the universe are not particles but tiny strings, similar to the strings of a musical instrument. These strings can vibrate at different frequencies, and the different vibrational patterns give rise to different particles and their properties.

2. Extra Dimensions: String theory requires extra dimensions beyond the three spatial dimensions (length, width, and height) that we experience in everyday life. These extra dimensions are compactified or curled up so small that we don't directly perceive them.

3. Unifying Theory: One of the main motivations behind string theory is its potential to uni

#### Because `res` is just another `AIMessage` object, we can append it to `messages`, add another `HumanMessage`, and generate the next response in the conversation.

In [None]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Why do physicists believe it can produce a 'unified theory'?"
)
# add to messages
messages.append(prompt)

# send to chat-gpt
res = chat(messages)

print(res.content)

Physicists believe that string theory has the potential to produce a unified theory because it incorporates both quantum mechanics and general relativity, two foundational theories in physics that have been highly successful in their respective domains but are incompatible with each other.

1. Resolving Incompatibility: Quantum mechanics describes the behavior of particles on extremely small scales, such as atoms and subatomic particles, while general relativity describes the behavior of gravity on large scales, such as the motion of planets and the structure of the universe. However, when physicists try to combine these two theories, they encounter mathematical inconsistencies and infinities. String theory attempts to overcome these issues by providing a framework that reconciles quantum mechanics and general relativity.

2. Theory of Everything: A unified theory, often referred to as a "theory of everything," would describe all fundamental forces and particles in a consistent and coh
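The append-and-resend pattern above can be sketched without any API calls. In this sketch `fake_model` is a hypothetical stand-in for `chat(...)` and `chat_turn` is a hypothetical helper. The point is that the full history is re-sent on every turn, so the model only "remembers" what is in the list.

```python
from typing import Callable

# In this sketch a message is a (role, content) pair.
Turn = tuple[str, str]

def fake_model(history: list[Turn]) -> Turn:
    """Hypothetical stand-in for chat(messages): reports how much history it saw."""
    return ("ai", f"I can see {len(history)} messages so far.")

def chat_turn(
    history: list[Turn],
    user_text: str,
    model: Callable[[list[Turn]], Turn],
) -> Turn:
    """Append the user prompt, call the model on the full history, store the reply."""
    history.append(("human", user_text))
    reply = model(history)
    history.append(reply)
    return reply

history: list[Turn] = [("system", "You are a helpful assistant.")]
print(chat_turn(history, "I'd like to understand string theory.", fake_model)[1])
print(chat_turn(history, "Why do physicists believe it can produce a 'unified theory'?", fake_model)[1])
```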

### Dealing with Hallucinations

By default, LLMs have no access to the external world.

An LLM perceives the world only as it appears in its training data. This kind of knowledge is called _parametric knowledge_.

So if we ask our LLM about recent topics it was never trained on (like Llama 2 in our case), it will struggle to provide an appropriate response.

Let's give it a shot and see what happens.

In [None]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="What is so special about Llama 2?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [None]:
print(res.content)

I'm sorry, I am not aware of any specific reference or context to "Llama 2" that would allow me to provide an answer. Could you please provide more information or clarify your question?


As expected, the chatbot's parametric knowledge doesn't contain the information needed to answer the question, so it can't help us.

#### Let's see one way of feeding knowledge into LLMs. It is called _source knowledge_ and it refers to any information fed into the LLM via the prompt.

In [None]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Can you tell me about the LLMChain in LangChain?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [None]:
print(res.content)

I apologize, but I am not familiar with specific details about a technology called "LLMChain" in "LangChain." It is possible that these terms are specific to a particular project or system that is not widely known or recognized. Without further information, it is difficult for me to provide any specific details or insights. If you can provide more context or background information, I will do my best to assist you.


Let's take a few descriptions of `LLMChain` and related concepts from the LangChain documentation and use them to answer the LLMChain question.

In [None]:
llmchain_information = [
    "A LLMChain is the most common type of chain. It consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser. This chain takes multiple input variables, uses the PromptTemplate to format them into a prompt. It then passes that to the model. Finally, it uses the OutputParser (if provided) to parse the output of the LLM into a final format.",
    "Chains is an incredibly generic concept which returns to a sequence of modular components (or other chains) combined in a particular way to accomplish a common use case.",
    "LangChain is a framework for developing applications powered by language models. We believe that the most powerful and differentiated applications will not only call out to a language model via an api, but will also: (1) Be data-aware: connect a language model to other sources of data, (2) Be agentic: Allow a language model to interact with its environment. As such, the LangChain framework is designed with the objective in mind to enable those types of applications."
]

source_knowledge = "\n".join(llmchain_information)

We can feed this additional knowledge into our prompt with some instructions telling the LLM how we'd like it to use this information alongside our original query.

In [None]:
query = "Can you tell me about the LLMChain in LangChain?"

augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""

Now we feed this into our chatbot as we did before.

In [None]:
# create a new user prompt
prompt = HumanMessage(
    content=augmented_prompt
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [None]:
print(res.content)

LLMChain is a type of chain within the LangChain framework. Chains, in general, refer to a sequence of modular components combined in a specific way to achieve a common purpose. In the context of LangChain, a chain consists of a PromptTemplate, a model (either an LLM or a ChatModel), and an optional output parser.

Specifically, an LLMChain is the most common type of chain in LangChain. It takes multiple input variables, uses the PromptTemplate to format them into a prompt, and passes it to the language model (LLM). The language model generates a response based on the prompt. Additionally, the LLMChain may include an optional output parser, which is used to parse the output of the language model into a final format.

The purpose of LangChain as a framework is to develop applications powered by language models. It aims to go beyond simply calling out to a language model via an API. LangChain emphasizes two key aspects: being data-aware and being agentic.

Being data-aware means that Lan

By augmenting the query with external knowledge, we can answer questions that lie outside the model's original (parametric) knowledge.
As we can see, the answer closely follows the contexts we provided.

That leaves us with a question: how do we get this information in the first place?
The answer: vector databases, in our case Pinecone.

First, let's get a dataset.

### Importing the Data

We will use the Hugging Face Datasets library to load our data.
The dataset is `jamescalam/llama-2-arxiv-papers-chunked`. It is a chunked collection of the Llama 2 paper and related academic papers from arXiv, a repository of electronic preprints approved for posting after moderation.
This will serve as the external knowledge base for our chatbot.

In [None]:
from datasets import load_dataset

dataset = load_dataset(
    "jamescalam/llama-2-arxiv-papers-chunked",
    split="train"
)

dataset

Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],
    num_rows: 4838
})

In [None]:
dataset[0]

{'doi': '1102.0183',
 'chunk-id': '0',
 'chunk': 'High-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nTechnical Report No. IDSIA-01-11\nJanuary 2011\nIDSIA / USI-SUPSI\nDalle Molle Institute for Arti\x0ccial Intelligence\nGalleria 2, 6928 Manno, Switzerland\nIDSIA is a joint institute of both University of Lugano (USI) and University of Applied Sciences of Southern Switzerland (SUPSI),\nand was founded in 1988 by the Dalle Molle Foundation which promoted quality of life.\nThis work was partially supported by the Swiss Commission for Technology and Innovation (CTI), Project n. 9688.1 IFF:\nIntelligent Fill in Form.arXiv:1102.0183v1  [cs.AI]  1 Feb 2011\nTechnical Report No. IDSIA-01-11 1\nHigh-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nJanuary 2011\nAbs

### Building the Knowledge Base

Now that we have the data for our knowledge base, we need to transform it into a form our chatbot can search.
For that, we need an embedding model and a vector database.
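A vector database finds the stored embeddings that are closest to a query's embedding. The sketch below illustrates that step with a toy, in-memory top-k search by cosine similarity, the same `metric='cosine'` we configure for the Pinecone index below. The 3-dimensional vectors are made up. Real `text-embedding-ada-002` embeddings have 1536 dimensions, and a vector database uses specialized indexes rather than this brute-force scan.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], vectors: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Brute-force search: score every stored vector and keep the k best."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional "embeddings", made up for illustration.
store = {
    "llama-2-chunk": [0.9, 0.1, 0.0],
    "string-theory-chunk": [0.0, 0.2, 0.9],
    "cnn-chunk": [0.5, 0.5, 0.1],
}
query_vec = [1.0, 0.0, 0.0]

for vec_id, score in top_k(query_vec, store, k=2):
    print(f"{vec_id}: {score:.3f}")
```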


We begin by initializing our connection to Pinecone.

In [None]:
import pinecone

# get API key from app.pinecone.io and environment from console
pinecone.init(
    api_key=os.environ.get('PINECONE_API_KEY') or 'YOUR_PINECONE_API_KEY',
    environment=os.environ.get('PINECONE_ENVIRONMENT') or 'gcp-starter'
)

Then we create the index if it doesn't exist yet. We will be using OpenAI's `text-embedding-ada-002` model for creating the embeddings, so we set the `dimension` to `1536`.

In [None]:
import time

index_name = 'llama-2-rag'

if index_name not in pinecone.list_indexes():
    pinecone.create_index(
        index_name,
        dimension=1536,
        metric='cosine'
    )
    # wait for index to finish initialization
    while not pinecone.describe_index(index_name).status['ready']:
        time.sleep(1)

index = pinecone.Index(index_name)
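The `while` loop above polls until the index is ready, but it would spin forever if the index never became ready. The sketch below applies the same polling idea with a timeout. `wait_until` is a hypothetical helper. In the notebook, the predicate would be `lambda: pinecone.describe_index(index_name).status['ready']`.

```python
import time
from typing import Callable

def wait_until(
    predicate: Callable[[], bool],
    timeout: float = 300.0,
    interval: float = 1.0,
) -> None:
    """Poll `predicate` every `interval` seconds until it returns True.

    Raises TimeoutError if it is still False after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)

# Usage with a fake readiness check that succeeds on the third poll.
calls = {"n": 0}

def fake_ready() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3

wait_until(fake_ready, timeout=5.0, interval=0.01)
print(f"ready after {calls['n']} checks")
```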

Then we check the index stats. Since we haven't added any data yet, the index should be empty:

In [None]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

Our index is now ready, but it's empty. It is a vector index, so it needs vectors.
To create these vector embeddings we will use OpenAI's `text-embedding-ada-002` model, which can be accessed via LangChain:

In [None]:
from langchain.embeddings.openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-ada-002")

Let's embed and index all of our data.
We loop through the dataset, embedding the chunks and upserting them into Pinecone in batches.

In [None]:
from tqdm.auto import tqdm  # for progress bar

data = dataset.to_pandas()  # this makes it easier to iterate over the dataset

batch_size = 100

for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i+batch_size)
    # get batch of data
    batch = data.iloc[i:i_end]
    # generate unique ids for each chunk
    ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
    # get text to embed
    texts = [x['chunk'] for _, x in batch.iterrows()]
    # embed text
    embeds = embed_model.embed_documents(texts)
    # get metadata to store in Pinecone
    metadata = [
        {'text': x['chunk'],
         'source': x['source'],
         'title': x['title']} for _, x in batch.iterrows()
    ]
    # add to Pinecone as a list of (id, vector, metadata) tuples
    index.upsert(vectors=list(zip(ids, embeds, metadata)))
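The sketch below shows the batching logic of the loop above without the OpenAI and Pinecone calls, and it omits the embedding step. `batched_records` is a hypothetical helper. The rows are fake, but they use the dataset's field names. With 4,838 rows and a batch size of 100, the loop makes 49 upserts, and the last one holds 38 vectors.

```python
def batched_records(rows: list[dict], batch_size: int = 100):
    """Yield batches of (id, text, metadata) tuples, mirroring the loop above."""
    for i in range(0, len(rows), batch_size):
        i_end = min(len(rows), i + batch_size)
        batch = rows[i:i_end]
        yield [
            (
                f"{row['doi']}-{row['chunk-id']}",  # unique id per chunk
                row["chunk"],  # text that would be sent to the embedding model
                {"text": row["chunk"], "source": row["source"], "title": row["title"]},
            )
            for row in batch
        ]

# Fake rows shaped like the dataset records (values are made up).
rows = [
    {"doi": "0000.0000", "chunk-id": str(n), "chunk": f"chunk {n}",
     "source": "http://example.org", "title": "Example"}
    for n in range(4838)
]

batches = list(batched_records(rows, batch_size=100))
print(len(batches), len(batches[-1]))  # number of batches, size of the last one
```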

Let's check that the index is populated:

In [None]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {'': {'vector_count': 4838}},
 'total_vector_count': 4838}

### Retrieval Augmented Generation

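Now it's time to connect that knowledge base to our chatbot.
We'll reuse the augmented-prompt approach from earlier, this time filling in the contexts with chunks retrieved from Pinecone.

First, we initialize LangChain's `Pinecone` vector store object. We pass it our vector `index`, the function used to embed queries, and the metadata field that holds the text.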

In [None]:
from langchain.vectorstores import Pinecone

text_field = "text"  # the metadata field that contains our text

# initialize the vector store object
vectorstore = Pinecone(
    index, embed_model.embed_query, text_field
)
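The `text_field` tells the vector store which metadata key holds the raw text. In the search output further down, `text` no longer appears in each result's `metadata`, which suggests the wrapper moves that field into `page_content` and keeps the rest as metadata. The sketch below shows that conversion on a fake query response shaped like Pinecone's `matches` list. `matches_to_documents` and the `Document` dataclass are hypothetical stand-ins, not LangChain's implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    """Hypothetical stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def matches_to_documents(response: dict, text_key: str = "text") -> list[Document]:
    """Turn query matches into documents: the `text_key` field becomes page_content."""
    docs = []
    for match in response["matches"]:
        metadata = dict(match["metadata"])  # copy so the response is left untouched
        text = metadata.pop(text_key)
        docs.append(Document(page_content=text, metadata=metadata))
    return docs

# Fake response using the same metadata fields we upserted earlier.
response = {
    "matches": [
        {"id": "example-id-0", "score": 0.87,
         "metadata": {"text": "In this work, we develop and release Llama 2...",
                      "source": "http://arxiv.org/pdf/2307.09288",
                      "title": "Llama 2: Open Foundation and Fine-Tuned Chat Models"}},
    ]
}

for doc in matches_to_documents(response):
    print(doc.metadata["title"], "|", doc.page_content)
```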

Let's query the index through `vectorstore` and see what we get.

In [None]:
query = "What is so special about Llama 2?"

vectorstore.similarity_search(query, k=3)

[Document(page_content='Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\nSergey Edunov Thomas Scialom\x03\nGenAI, Meta\nAbstract\nIn this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\nmodels outperform open-source chat models on most benchmarks we tested, and based on\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety', metadata={'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tun

A lot of text is returned, and it's not immediately clear what is relevant and what is not.

Fortunately, our LLM is capable of making sense of it.

All we need to do is connect the `vectorstore` output to our `chat` model. For that, we'll use the same augmented-prompt logic from earlier.

In [None]:
def augment_prompt(query: str) -> str:
    # get top 3 results from knowledge base
    results = vectorstore.similarity_search(query, k=3)
    # get the text from the results
    source_knowledge = "\n".join([x.page_content for x in results])
    # feed into an augmented prompt (the template lines are not indented,
    # so the prompt contains no stray leading spaces)
    augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""
    return augmented_prompt

Let's pass it to our chat model to see how it performs.

In [None]:
# create a new user prompt
prompt = HumanMessage(
    content=augment_prompt(query)
)
# add to messages
messages.append(prompt)

res = chat(messages)

print(res.content)

Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) that range in scale from 7 billion to 70 billion parameters. These LLMs, such as L/l.sc/a.sc/m.sc/a.sc/t.sc and L/l.sc/a.sc/m.sc/a.sc/t.sc-C/h.sc/a.sc/t.sc, are specifically optimized for dialogue use cases.

The special aspect of Llama 2 is that its fine-tuned LLMs outperform open-source chat models on various benchmarks, demonstrating superior performance. In fact, based on humane evaluations for helpfulness and safety, Llama 2 models are considered as potential substitutes for closed-source models. Closed-source models like ChatGPT, BARD, and Claude are heavily fine-tuned to align with human preferences, enhancing usability and safety.

The development and release of Llama 2 contribute to the progress of AI alignment research, as it provides transparent and reproducible approaches to fine-tuning and safety. This is in contrast to closed-source models, which often lack transparency and hinder community 
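`augment_prompt` depends on a live `vectorstore`. The sketch below shows the same prompt assembly in a testable form, with the search step passed in as a function. `build_rag_prompt` and `fake_search` are hypothetical. In the notebook, the search function could be something like `lambda q, k: [d.page_content for d in vectorstore.similarity_search(q, k=k)]`.

```python
from typing import Callable

def build_rag_prompt(
    query: str,
    search: Callable[[str, int], list[str]],
    k: int = 3,
) -> str:
    """Retrieve the top-k context strings for `query` and wrap them in a prompt."""
    contexts = search(query, k)
    source_knowledge = "\n".join(contexts)
    return (
        "Using the contexts below, answer the query.\n\n"
        f"Contexts:\n{source_knowledge}\n\n"
        f"Query: {query}"
    )

def fake_search(query: str, k: int) -> list[str]:
    """Hypothetical retriever returning canned chunks instead of querying Pinecone."""
    chunks = [
        "Llama 2 is a collection of pretrained and fine-tuned LLMs.",
        "The models range from 7 billion to 70 billion parameters.",
        "The fine-tuned models are optimized for dialogue use cases.",
        "An unrelated chunk that should be cut off by k.",
    ]
    return chunks[:k]

print(build_rag_prompt("What is so special about Llama 2?", fake_search))
```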

Now let's ask a follow-up question _without_ RAG:

In [None]:
prompt = HumanMessage(
    content="what safety measures were used in the development of llama 2?"
)

res = chat(messages + [prompt])
print(res.content)

According to the provided context, the paper mentions that they provide a detailed description of their approach to fine-tuning and safety, similar to other closed-source models like ChatGPT, BARD, and Claude. However, the specific safety measures used in the development of Llama 2 are not mentioned in the given context. To obtain more detailed information about the safety measures employed in Llama 2, it would be necessary to refer to the original paper or additional sources related to Llama 2.


Thanks to the conversation history stored in `messages`, the chatbot can still say something about Llama 2.

However, it doesn't know anything about the safety measures themselves, as we have not provided it with that information via the RAG pipeline. Let's try again, but with RAG.

In [None]:
prompt = HumanMessage(
    content=augment_prompt(
        "what safety measures were used in the development of llama 2?"
    )
)

res = chat(messages + [prompt])
print(res.content)

The safety measures used in the development of Llama 2 include safety-specific data annotation and tuning, conducting red-teaming, and employing iterative evaluations. These measures were taken to increase the safety of the models and ensure responsible development. The paper also provides a thorough description of the fine-tuning methodology and approach to improving LLM safety. By sharing these details and being open about the process, the aim is to enable the community to reproduce fine-tuned LLMs and continue to improve their safety, promoting responsible development in the field.


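This time the model answers with the relevant retrieved information. Because `messages + [prompt]` doesn't modify `messages`, this exchange is not part of the conversation history. To let later turns build on it, append both `prompt` and `res` to `messages`.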
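Concatenation and `append` behave differently on the history list. The sketch below shows the difference, with plain strings standing in for LangChain message objects.

```python
# Plain strings stand in for SystemMessage/HumanMessage/AIMessage objects.
messages = ["system: You are a helpful assistant.", "human: What is so special about Llama 2?"]
prompt = "human: what safety measures were used in the development of llama 2?"

one_off = messages + [prompt]  # new list for a single call; history untouched
print(len(messages), len(one_off))  # 2 3

messages.append(prompt)  # mutates the history in place
messages.append("ai: <model reply>")  # keep the reply so later turns can see it
print(len(messages))  # 4
```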