# Building RAG Chatbots with LangChain

In this example, we'll build an AI chatbot from start to finish. We will use LangChain, OpenAI, and the Pinecone vector DB to build a chatbot capable of learning from the external world using **R**etrieval **A**ugmented **G**eneration (RAG).

We will be using a dataset sourced from the Llama 2 ArXiv paper and other related papers to help our chatbot answer questions about the latest and greatest in the world of GenAI.

By the end of the example we'll have a functioning chatbot and RAG pipeline that can hold a conversation and provide informative responses based on a knowledge base.

### Before you begin

You'll need to get an [OpenAI API key](https://platform.openai.com/account/api-keys) and [Pinecone API key](https://app.pinecone.io). If you have completed previous assignments and followed along in class, you will have these already stored in a .env file. 

In [1]:
from dotenv import load_dotenv, find_dotenv

load_dotenv(find_dotenv())

import os

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
PINECONE_API_KEY = os.getenv("PINECONE_API_KEY")
PINECONE_INDEX_NAME = os.getenv("PINECONE_INDEX_NAME")
PINECONE_ENVIRONMENT = os.getenv("PINECONE_ENVIRONMENT")
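
If any of these variables is missing from the `.env` file, `os.getenv` quietly returns `None` and the error only shows up later, when a client is created. A minimal sketch of an early check follows; the helper name `missing_env_vars` and the example values are assumptions made for illustration, not part of the course setup.

```python
import os

# Names read by the cell above
REQUIRED_VARS = [
    "OPENAI_API_KEY",
    "PINECONE_API_KEY",
    "PINECONE_INDEX_NAME",
    "PINECONE_ENVIRONMENT",
]


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Example with an explicit mapping (made-up values) so it can be tried without a .env file
example_env = {"OPENAI_API_KEY": "sk-example", "PINECONE_API_KEY": ""}
print(missing_env_vars(REQUIRED_VARS, example_env))
```

Calling `missing_env_vars(REQUIRED_VARS)` with no second argument checks the real environment after `load_dotenv` has run.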

### Prerequisites

Before we start building our chatbot, we need to install some Python libraries. Here's a brief overview of what each library does:

- **langchain**: This is a [library for Generative AI](https://python.langchain.com/docs/get_started/introduction). We'll use it to chain together different language models and components for our chatbot.
- **openai**: This is the official [OpenAI Python client](https://github.com/openai/openai-python). We'll use it to interact with the OpenAI API and generate responses for our chatbot.
- **datasets**: This library provides a [vast array of datasets](https://pypi.org/project/datasets/) for machine learning. We'll use it to load our knowledge base for the chatbot.
- **pinecone-client**: This is the official Pinecone Python client. We'll use it to interact with the Pinecone API and store our chatbot's knowledge base in a vector database.
- **tiktoken**: [tiktoken](https://github.com/openai/tiktoken) is a fast BPE ([Byte Pair Encoding](https://en.wikipedia.org/wiki/Byte_pair_encoding)) tokeniser for use with OpenAI's models. It is useful when we want to know how many tokens a message will take up for a given model.



You can install these libraries using conda like so:

```bash
conda install -c conda-forge \
    langchain \
    openai \
    datasets \
    tiktoken
```

Unfortunately, pinecone-client is not yet available on conda-forge, so we will need to resort to using pip:

```bash
pip install pinecone-client
```

### Building a Chatbot (no RAG)

We will be relying heavily on the LangChain library to bring together the different components needed for our chatbot. To begin, we'll create a simple chatbot without any retrieval augmentation. We do this by initializing a `ChatOpenAI` object. For this we do need an [OpenAI API key](https://platform.openai.com/account/api-keys).

In [2]:
import os
from langchain.chat_models import ChatOpenAI

chat = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model='gpt-3.5-turbo'
)

Chats with OpenAI's `gpt-3.5-turbo` and `gpt-4` chat models are typically [structured (in plain text) like this](https://help.openai.com/en/articles/7042661-chatgpt-api-transition-guide):

```
System: You are a helpful assistant.

User: Hi AI, how are you today?

Assistant: I'm great thank you. How can I help you?

User: I'd like to understand string theory.

Assistant:
```

The final `"Assistant:"` without a response is what would prompt the model to continue the conversation. In the official OpenAI `ChatCompletion` endpoint these would be passed to the model in a format like:

```python
[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi AI, how are you today?"},
    {"role": "assistant", "content": "I'm great thank you. How can I help you?"},
    {"role": "user", "content": "I'd like to understand string theory."}
]
```

In LangChain there is a slightly different format. We use three _message_ objects like so:

In [60]:
from langchain.schema import (
    SystemMessage,
    HumanMessage,
    AIMessage
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
    HumanMessage(content="I'd like to understand string theory.")
]

The format is very similar; we've just swapped the role of `"user"` for `HumanMessage`, and the role of `"assistant"` for `AIMessage`.

> NOTE: You may ask, why would LangChain bother changing these? Keep in mind that LangChain remains agnostic as to which service is best (be it LLMs, vector stores, etc.). Therefore, LangChain 'wraps' the messages in a `HumanMessage` or `AIMessage` object to indicate which is which. This allows us to easily swap out the AI service without having to change the message format. LangChain handles any required translation to the new service's format.

```python
[
    SystemMessage(content="You are a helpful assistant."),
    HumanMessage(content="Hi AI, how are you today?"),
    AIMessage(content="I'm great thank you. How can I help you?"),
    HumanMessage(content="I'd like to understand string theory.")
]
```
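
To make the idea of translation concrete, here is a minimal sketch of how typed message objects could be turned back into OpenAI-style role dictionaries. The classes `SketchSystem`, `SketchHuman`, and `SketchAI` and the function `to_role_dicts` are hypothetical stand-ins written for this example; this is not how LangChain implements it internally.

```python
from dataclasses import dataclass


@dataclass
class SketchMessage:
    """Hypothetical stand-in for a LangChain message object."""
    content: str


class SketchSystem(SketchMessage):
    pass


class SketchHuman(SketchMessage):
    pass


class SketchAI(SketchMessage):
    pass


# Mapping from message class to the role name used in OpenAI-style dicts
ROLE_BY_TYPE = {SketchSystem: "system", SketchHuman: "user", SketchAI: "assistant"}


def to_role_dicts(messages):
    """Translate message objects into a list of {"role", "content"} dicts."""
    return [
        {"role": ROLE_BY_TYPE[type(m)], "content": m.content}
        for m in messages
    ]


sketch_messages = [
    SketchSystem("You are a helpful assistant."),
    SketchHuman("Hi AI, how are you today?"),
]
print(to_role_dicts(sketch_messages))
```

Supporting a different provider would then mean swapping the mapping and the output shape, while the code that builds the conversation stays the same.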

We generate the next response from the AI by passing these messages to the `ChatOpenAI` object.

In [61]:
res = chat(messages)
res

AIMessage(content="String theory is a theoretical framework in physics that attempts to explain the fundamental nature of particles and the interactions between them. It suggests that particles are not point-like objects but instead tiny, vibrating strings of energy. These strings can vibrate at different frequencies, giving rise to different types of particles.\n\nHere are some key points to understand about string theory:\n\n1. Dimensions: String theory requires more than the usual three dimensions of space and one dimension of time. In fact, it suggests the existence of additional dimensions beyond our everyday experience. The most well-known version of string theory, called superstring theory, suggests that there are 10 dimensions in total (9 spatial dimensions and 1 time dimension).\n\n2. Unification of forces: String theory aims to unify the fundamental forces of nature, including gravity, electromagnetism, and the strong and weak nuclear forces. In traditional physics, these for

In response we get another AI message object. We can print it more clearly like so:

In [62]:
print(res.content)
print('\nNOTE: res is an AIMessage object from LangChain:')
print(type(res))

String theory is a theoretical framework in physics that attempts to explain the fundamental nature of particles and the interactions between them. It suggests that particles are not point-like objects but instead tiny, vibrating strings of energy. These strings can vibrate at different frequencies, giving rise to different types of particles.

Here are some key points to understand about string theory:

1. Dimensions: String theory requires more than the usual three dimensions of space and one dimension of time. In fact, it suggests the existence of additional dimensions beyond our everyday experience. The most well-known version of string theory, called superstring theory, suggests that there are 10 dimensions in total (9 spatial dimensions and 1 time dimension).

2. Unification of forces: String theory aims to unify the fundamental forces of nature, including gravity, electromagnetism, and the strong and weak nuclear forces. In traditional physics, these forces are described by sepa

Since `res` is just another `AIMessage` object, we can append it to `messages`, add another `HumanMessage`, and generate the next response in the conversation.

In [63]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Why do physicists believe it can produce a 'unified theory'?"
)
# add to messages
messages.append(prompt)

# send to chat-gpt
res = chat(messages)

print(res.content)

Physicists believe that string theory has the potential to produce a unified theory because it incorporates all the fundamental forces of nature within a single framework. This unification is desirable because it would provide a more complete understanding of the fundamental building blocks of the universe and the interactions between them.

Here are a few reasons why physicists are hopeful about string theory's potential for unification:

1. Consistency with quantum mechanics: String theory is inherently a quantum theory, meaning it incorporates the principles of quantum mechanics. Quantum mechanics has been incredibly successful in describing the behavior of particles on a small scale, so it is natural to seek a theory that unifies quantum mechanics with gravity (which is described by general relativity).

2. Gravity as a force-carrying particle: In string theory, the graviton emerges as a vibration mode of the string. This suggests that gravity is not fundamentally different from ot

Notice that the `messages` list is now 6 items long. This is because we are storing the sequence of LangChain messages so that we can use them to provide context when sending a prompt to our LLM.

In [64]:
messages

[SystemMessage(content='You are a helpful assistant.'),
 HumanMessage(content='Hi AI, how are you today?'),
 AIMessage(content="I'm great thank you. How can I help you?"),
 HumanMessage(content="I'd like to understand string theory."),
 AIMessage(content="String theory is a theoretical framework in physics that attempts to explain the fundamental nature of particles and the interactions between them. It suggests that particles are not point-like objects but instead tiny, vibrating strings of energy. These strings can vibrate at different frequencies, giving rise to different types of particles.\n\nHere are some key points to understand about string theory:\n\n1. Dimensions: String theory requires more than the usual three dimensions of space and one dimension of time. In fact, it suggests the existence of additional dimensions beyond our everyday experience. The most well-known version of string theory, called superstring theory, suggests that there are 10 dimensions in total (9 spatia
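
The append-then-call pattern used above can be wrapped in a small helper. The sketch below uses plain dictionaries and a stub model (`echo_model`, an assumption for illustration) instead of `ChatOpenAI`, so it runs without an API key; it only shows how the history grows by two messages per turn.

```python
def chat_turn(history, user_text, model):
    """Append the user message, call the model on the full history, append the reply."""
    history.append({"role": "user", "content": user_text})
    reply = model(history)
    history.append({"role": "assistant", "content": reply})
    return reply


def echo_model(history):
    """Stub model (an assumption for illustration): reports how many messages it saw."""
    return f"I received {len(history)} messages."


history = [{"role": "system", "content": "You are a helpful assistant."}]
chat_turn(history, "Hi AI, how are you today?", echo_model)
chat_turn(history, "I'd like to understand string theory.", echo_model)
print(len(history))
```

Because the whole history is sent on every call, each turn also increases the number of tokens in the request.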

### Dealing with Hallucinations

We have our chatbot, but as mentioned — the knowledge of LLMs can be limited. The reason for this is that LLMs learn all they know during training. An LLM essentially compresses the "world" as seen in the training data into the internal parameters of the model. We call this knowledge the _parametric knowledge_ of the model.

By default, LLMs have no access to the external world.

The result of this is very clear when we ask LLMs about more recent information, such as the new (and very popular) [Llama 2 LLM](https://ai.meta.com/llama/). (I've mentioned Llama and Llama 2 in class: it's an open-source LLM developed by Meta/Facebook.)

In [65]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="What is so special about Llama 2?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [66]:
print(res.content)

I apologize, but I couldn't find any specific information about something called "Llama 2." It's possible that you may be referring to something that is not widely known or that is specific to a particular context. If you can provide more details or clarify your question, I'll do my best to assist you.


Our chatbot can no longer help us; it doesn't contain the information we need to answer the question. It was very clear from this answer that the LLM doesn't know the information, but sometimes an LLM may respond as if it _does_ know the answer, and sometimes this can be difficult to detect.

In [67]:
# add latest AI response to messages
messages.append(res)

# now create a new user prompt
prompt = HumanMessage(
    content="Can you tell me about the prerequisites for University of South Florida's MSBAIS program?"
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [68]:
print(res.content)

To provide accurate information about the prerequisites for the University of South Florida's MSBAIS (Master of Science in Business Analytics and Information Systems) program, it is best to refer directly to the university's official website or contact the admissions office. However, I can provide you with some general information about typical prerequisites for similar programs:

1. Educational Background: Most MSBAIS programs require applicants to hold a bachelor's degree from an accredited institution. The preferred undergraduate degree may vary, but it is often in a field related to business, computer science, information systems, mathematics, statistics, or a related discipline.

2. Academic Requirements: Applicants are typically expected to have a competitive undergraduate GPA. The specific minimum GPA requirement can vary between institutions, but a strong academic record is generally preferred.

3. Coursework: Some programs may require specific prerequisite coursework or founda

There is another way of feeding knowledge into LLMs. It is called _source knowledge_ and it refers to any information fed into the LLM via the prompt. We can try that with the MSBAIS program's prerequisites question. We can take a description of the program from the MSBAIS program website (https://catalog.usf.edu/preview_program.php?catoid=12&poid=3955&returnto=2049).

In [69]:
source_knowledge = """

CURRICULUM REQUIREMENTS
Total Minimum Hours: 33 credit hours

Core Requirements– 12 credit hours
Capstone – 3 credit hours
Concentration or Electives – 18 credit hours
The major requires 33 hours of coursework and may be taken either full-time or part-time. Full-time students with appropriate prerequisites may be able to complete the major in one full year (3 semesters) of study. Part-time students and full-time students who need prerequisites will typically need from 1 ½ to 3 years to complete the degree.

Prerequisites
Incoming students are expected to have the following as prerequisites

A course in high-level, object oriented programming language (e.g., C#, C++, Java and Python) or substantial programming experience;
A course in Information Systems Analysis and Design or equivalent experience;
A course in Database Systems or equivalent experience;
A course in Statistics or equivalent professional qualification or experiences
A course in economics, or equivalent professional qualification or experiences and
A course in financial accounting.
These required prerequisite courses may be taken simultaneously with courses in the M.S./BAIS major. Prerequisiite courses do not count toward the 33 credit hours of course requirements in the M.S./BAIS major.

Students have the choice of two options:

On-Campus Option:
Designed for students who need flexibility in their course work, students will work early in the first semester with their major advisor to complete a formal Major Curriculum of Study meeting the Major Curriculum Requirements that will define a coherent sequence of courses to accomplish the student’s objectives. Students have choice of electives as well as the option to complete a master’s thesis or practicum project, depending upon the availability and approval of a faculty sponsor.

Executive Weekend Option:
Intended for full-time working Information Technology/Information Systems/Business professionals who will pursue this degree while remaining employed. Offered on a cohort basis, students will meet the Major Curriculum Requirements through a pre-determined set of courses, electives, and independent study options selected by faculty and noted on the formal Major Curriculum of Study, based on market needs and student profiles. Students will benefit from an accelerated curriculum with a managerial and leadership approach. To get the full benefit, applicants are expected to have a minimum of 5 years of relevant work experience.

CORE REQUIREMENTS (12 CREDIT HOURS)
The following four courses provide an understanding of the state-of-the-art in research and practice in technical areas of Information Systems Management.

ISM 6124 Advanced Systems Analysis and Design Credit Hours: 3
ISM 6218 Advanced Database Management Credit Hours: 3
ISM 6225 Distributed Information Systems Credit Hours: 3
QMB 6304 Analytical Methods for Business Credit Hours: 3
CAPSTONE COURSE (3 CREDIT HOURS)
This course is considered the capstone of the M.S./BAIS major and as such it must be taken during one of the last two semesters of the student’s major.

ISM 6155 Enterprise Information Systems Management Credit Hours: 3

"""

We can feed this additional knowledge into our prompt with some instructions telling the LLM how we'd like it to use this information alongside our original query.

In [70]:
query = "Can you tell me about the prerequisites for University of South Florida's MSBAIS program?"

augmented_prompt = f"""Using the contexts below, answer the query.

Contexts:
{source_knowledge}

Query: {query}"""

Now we feed this into our chatbot as we were before.

In [71]:
# create a new user prompt
prompt = HumanMessage(
    content=augmented_prompt
)
# add to messages
messages.append(prompt)

# send to OpenAI
res = chat(messages)

In [72]:
print(res.content)

The prerequisites for the University of South Florida's MSBAIS (Master of Science in Business Analytics and Information Systems) program are as follows:

1. A course in high-level, object-oriented programming language (e.g., C#, C++, Java, and Python) or substantial programming experience.
2. A course in Information Systems Analysis and Design or equivalent experience.
3. A course in Database Systems or equivalent experience.
4. A course in Statistics or equivalent professional qualification or experiences.
5. A course in economics or equivalent professional qualification or experiences.
6. A course in financial accounting.

These prerequisites are expected to be completed before or concurrently with the courses in the MSBAIS major. It's worth noting that these prerequisite courses do not count toward the 33 credit hours required for the MSBAIS major.

The program offers two options for students: the On-Campus Option and the Executive Weekend Option.

In the On-Campus Option, students 

The quality of this answer is excellent. This is made possible because we have augmented our query with external knowledge (source knowledge). There's just one problem — how do we get this information in the first place?

We learned in a previous class about Pinecone and vector databases. Let's use a Pinecone vector database. But first, we'll need a dataset.

### Importing the Data

In this task, we will import our data using the Hugging Face Datasets library. Specifically, we will be using the `"jamescalam/llama-2-arxiv-papers-chunked"` dataset. This dataset contains a collection of ArXiv papers, already split into chunks, which will serve as the external knowledge base for our chatbot.

In [73]:
from datasets import load_dataset

dataset = load_dataset(
    "jamescalam/llama-2-arxiv-papers-chunked",
    split="train"
)

dataset

Downloading readme: 100%|██████████| 409/409 [00:00<?, ?B/s] 
Downloading data: 100%|██████████| 14.4M/14.4M [00:01<00:00, 12.2MB/s]
Downloading data files: 100%|██████████| 1/1 [00:01<00:00,  1.18s/it]
Extracting data files: 100%|██████████| 1/1 [00:00<00:00, 166.69it/s]
Generating train split: 4838 examples [00:00, 44631.93 examples/s]


Dataset({
    features: ['doi', 'chunk-id', 'chunk', 'id', 'title', 'summary', 'source', 'authors', 'categories', 'comment', 'journal_ref', 'primary_category', 'published', 'updated', 'references'],
    num_rows: 4838
})

> NOTE: We could also write a screen scraper to search for and download papers on a certain topic, or download these by hand and store them as PDFs or text. There are many other possible sources of information on a given topic. For instance, USF's MSBAIS program has web pages, course descriptions, and other information that could be used as a knowledge base. Also, depending on the business processes you are attempting to automate, you may need to collect documents external to USF, such as information about Tampa, lodging, visa applications, etc.

In [74]:
dataset[0]

{'doi': '1102.0183',
 'chunk-id': '0',
 'chunk': 'High-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nTechnical Report No. IDSIA-01-11\nJanuary 2011\nIDSIA / USI-SUPSI\nDalle Molle Institute for Arti\x0ccial Intelligence\nGalleria 2, 6928 Manno, Switzerland\nIDSIA is a joint institute of both University of Lugano (USI) and University of Applied Sciences of Southern Switzerland (SUPSI),\nand was founded in 1988 by the Dalle Molle Foundation which promoted quality of life.\nThis work was partially supported by the Swiss Commission for Technology and Innovation (CTI), Project n. 9688.1 IFF:\nIntelligent Fill in Form.arXiv:1102.0183v1  [cs.AI]  1 Feb 2011\nTechnical Report No. IDSIA-01-11 1\nHigh-Performance Neural Networks\nfor Visual Object Classi\x0ccation\nDan C. Cire\x18 san, Ueli Meier, Jonathan Masci,\nLuca M. Gambardella and J\x7f urgen Schmidhuber\nJanuary 2011\nAbs

#### Dataset Overview

The dataset we are using is sourced from the Llama 2 ArXiv papers. It is a collection of academic papers from ArXiv, a repository of electronic preprints approved for publication after moderation. Each entry in the dataset represents a "chunk" of text from these papers.

Because most **L**arge **L**anguage **M**odels (LLMs) only contain knowledge of the world as it was during training, they cannot answer our questions about Llama 2 — at least not without this data.
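
The dataset arrives already chunked, but for your own documents you would need to do the splitting yourself. Below is a minimal sketch of overlapping, character-based chunking. The function name `chunk_text` and the sizes are assumptions for illustration; this document does not say how the dataset's chunks were produced, and splitting by token count (for example with tiktoken) is a common alternative.

```python
def chunk_text(text, chunk_size=400, overlap=50):
    """Split text into character windows of chunk_size that overlap by `overlap`."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # stop once the current window already reaches the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghij" * 10  # 100 characters of toy text
chunks = chunk_text(sample, chunk_size=40, overlap=10)
print([len(c) for c in chunks])
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.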

### Building the Knowledge Base

We now have a dataset that can serve as our chatbot knowledge base. Our next task is to transform that dataset into the knowledge base that our chatbot can use. To do this we must use an embedding model and vector database.

We begin by initializing our connection to Pinecone. This requires a [free API key](https://app.pinecone.io).

In [75]:
import pinecone

# get API key from app.pinecone.io and environment from console
pinecone.init(
    api_key=os.environ.get('PINECONE_API_KEY') or 'YOUR_API_KEY',
    environment=os.environ.get('PINECONE_ENVIRONMENT') or 'YOUR_ENV'
)

Then we initialize the index. We will be using OpenAI's `text-embedding-ada-002` model for creating the embeddings, so we set the `dimension` to `1536`.

Most (all?) of you are using the free version of Pinecone. Though this will be sufficient to hold the data required for your coursework, the free service is limited to only one index at a time. Therefore, we will need to delete the existing index if we wish to create a new one.

First, let's look at what indexes we have.

In [76]:
pinecone.list_indexes()

['rag-index']

The following code will remove all existing indexes from your Pinecone account. **Only run this if you are sure you want to delete all indexes.**

In [77]:
index_list = pinecone.list_indexes()
while len(index_list) > 0:
    pinecone.delete_index(index_list[-1])
    index_list = pinecone.list_indexes()

Here we create our Pinecone index. We should make sure it is ready before moving on to the next cell, so the `while` loop checks the status of the index every second until it is ready.


In [78]:
import time

if PINECONE_INDEX_NAME not in pinecone.list_indexes():
    pinecone.create_index(
        PINECONE_INDEX_NAME,
        dimension=1536,
        metric='cosine'
    )
    # wait for index to finish initialization
    while not pinecone.describe_index(PINECONE_INDEX_NAME).status['ready']:
        time.sleep(1)
        

index = pinecone.Index(PINECONE_INDEX_NAME)
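
The `while` loop above waits indefinitely if the index never becomes ready. A minimal sketch of the same polling idea with a timeout is shown below. The helper `wait_until` and the stub readiness check are assumptions for illustration and are not part of the Pinecone client.

```python
import time


def wait_until(is_ready, timeout=60.0, interval=1.0):
    """Poll is_ready() until it returns True; return False if the timeout elapses first."""
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stub readiness check (an assumption for illustration): ready on the third poll
polls = {"count": 0}


def ready_on_third_poll():
    polls["count"] += 1
    return polls["count"] >= 3


print(wait_until(ready_on_third_poll, timeout=5.0, interval=0.01))
```

In the notebook, the predicate would be the same expression the loop already uses, for example `lambda: pinecone.describe_index(PINECONE_INDEX_NAME).status['ready']`.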

With the connection in place, we can check the index statistics:

In [79]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

Our index is now ready but it's empty. It is a vector index, so it needs vectors. As mentioned, to create these vector embeddings we will use OpenAI's `text-embedding-ada-002` model, which we can access via LangChain like so:

In [80]:
from langchain.embeddings.openai import OpenAIEmbeddings

embed_model = OpenAIEmbeddings(model="text-embedding-ada-002")

Using this model we can create embeddings like so:

In [81]:
texts = [
    'this is the first chunk of text',
    'then another second chunk of text is here'
]

res = embed_model.embed_documents(texts)
len(res), len(res[0])

(2, 1536)

From this we get two (aligning to our two chunks of text) 1536-dimensional embeddings.

We're now ready to embed and index all of our data! We do this by looping through our dataset, embedding the text, and inserting everything in batches.

In [82]:
from tqdm.auto import tqdm  # for progress bar

data = dataset.to_pandas()  # this makes it easier to iterate over the dataset

batch_size = 100

for i in tqdm(range(0, len(data), batch_size)):
    i_end = min(len(data), i+batch_size)
    # get batch of data
    batch = data.iloc[i:i_end]
    # generate unique ids for each chunk
    ids = [f"{x['doi']}-{x['chunk-id']}" for _, x in batch.iterrows()]
    # get text to embed
    texts = [x['chunk'] for _, x in batch.iterrows()]
    # embed text
    embeds = embed_model.embed_documents(texts)
    # get metadata to store in Pinecone
    metadata = [
        {'text': x['chunk'],
         'source': x['source'],
         'title': x['title']} for _, x in batch.iterrows()
    ]
    # add to Pinecone
    index.upsert(vectors=zip(ids, embeds, metadata))

100%|██████████| 49/49 [00:59<00:00,  1.21s/it]
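
The progress bar shows 49 steps because the 4838 rows are processed 100 at a time, with a shorter final batch. The index arithmetic of the loop can be isolated in a small generator; `batch_ranges` is a hypothetical helper name used only for this sketch.

```python
def batch_ranges(total, batch_size):
    """Yield (start, end) index pairs that cover range(total) in batches."""
    for start in range(0, total, batch_size):
        yield start, min(total, start + batch_size)


ranges = list(batch_ranges(4838, 100))
print(len(ranges))   # number of batches
print(ranges[-1])    # the final, shorter batch
```

Batching keeps each embedding request and each upsert to a manageable size instead of sending one row at a time or everything at once.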


We can check that the vector index has been populated using `describe_index_stats` like before:

In [83]:
index.describe_index_stats()

{'dimension': 1536,
 'index_fullness': 0.038,
 'namespaces': {'': {'vector_count': 3800}},
 'total_vector_count': 3800}

### Retrieval Augmented Generation

We've built a fully-fledged knowledge base. Now it's time to connect that knowledge base to our chatbot. To do that we'll be diving back into LangChain and reusing our template prompt from earlier.

To use LangChain here we need to load the LangChain abstraction for a vector index, called a `vectorstore`. We pass in our vector `index` to initialize the object.

In [84]:
from langchain.vectorstores import Pinecone

text_field = "text"  # the metadata field that contains our text

# initialize the vector store object
vectorstore = Pinecone(
    index, embed_model.embed_query, text_field
)



Using this `vectorstore` we can already query the index and see if we have any relevant information given our question about Llama 2.

In [91]:
query = "What is Llama 2?"

vectorstore.similarity_search(query, k=3)

[Document(page_content='Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang\nRoss Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang\nAngela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic\nSergey Edunov Thomas Scialom\x03\nGenAI, Meta\nAbstract\nIn this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned\nlarge language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.\nOur ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our\nmodels outperform open-source chat models on most benchmarks we tested, and based on\nourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety', metadata={'source': 'http://arxiv.org/pdf/2307.09288', 'title': 'Llama 2: Open Foundation and Fine-Tun
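
Conceptually, a similarity search embeds the query and returns the stored items whose vectors score highest under the index metric (cosine, in our case). The sketch below does this by brute force over a tiny in-memory list. The vectors are made up and only three-dimensional, and a managed vector database will typically use an index structure rather than scanning every record, so treat this as an illustration of the idea and not of Pinecone's internals.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vector, records, k=3):
    """Return the k records whose vectors are most similar to query_vector."""
    return heapq.nlargest(
        k, records, key=lambda rec: cosine_similarity(query_vector, rec["vector"])
    )


# Toy records (made up for illustration; real embeddings here have 1536 dimensions)
records = [
    {"id": "a", "vector": [1.0, 0.0, 0.0], "text": "about llamas"},
    {"id": "b", "vector": [0.0, 1.0, 0.0], "text": "about physics"},
    {"id": "c", "vector": [0.9, 0.1, 0.0], "text": "about language models"},
]
best = top_k([0.8, 0.2, 0.0], records, k=2)
print([rec["id"] for rec in best])
```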

We return a lot of text here and it's not that clear what we need or what is relevant. Fortunately, our LLM will be able to parse this information much faster than us. All we need is to connect the output from our `vectorstore` to our `chat` chatbot. To do that we can use the same logic as we used earlier.

In [92]:
def augment_prompt(query: str):
    # get top 3 results from knowledge base
    results = vectorstore.similarity_search(query, k=3)
    # get the text from the results
    source_knowledge = "\n".join([x.page_content for x in results])
    # feed into an augmented prompt
    augmented_prompt = f"""Using the contexts below, answer the query.

    Contexts:
    {source_knowledge}

    Query: {query}"""
    return augmented_prompt

Using this we produce an augmented prompt:

In [93]:
print(augment_prompt(query))

Using the contexts below, answer the query.

    Contexts:
    Alan Schelten Ruan Silva Eric Michael Smith Ranjan Subramanian Xiaoqing Ellen Tan Binh Tang
Ross Taylor Adina Williams Jian Xiang Kuan Puxin Xu Zheng Yan Iliyan Zarov Yuchen Zhang
Angela Fan Melanie Kambadur Sharan Narang Aurelien Rodriguez Robert Stojnic
Sergey Edunov Thomas Scialom
GenAI, Meta
Abstract
In this work, we develop and release Llama 2, a collection of pretrained and ﬁne-tuned
large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters.
Our ﬁne-tuned LLMs, called L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , are optimized for dialogue use cases. Our
models outperform open-source chat models on most benchmarks we tested, and based on
ourhumanevaluationsforhelpfulnessandsafety,maybeasuitablesubstituteforclosedsource models. We provide a detailed description of our approach to ﬁne-tuning and safety
asChatGPT,BARD,andClaude. TheseclosedproductLLMsareheavilyﬁne-tunedtoalignwith
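
Notice in the printed prompt that `Contexts:` and the first context line are indented by four spaces. This happens because the f-string sits inside the indented body of `augment_prompt`, so the leading spaces become part of the prompt. It did not stop the model from answering here, but a variant that assembles the prompt from a list of lines avoids it. The function `build_augmented_prompt` below is a hypothetical alternative that takes the retrieved texts as a plain list, so it can be tried without a vector store.

```python
def build_augmented_prompt(query, contexts):
    """Join context strings and the query into a prompt with no stray indentation."""
    lines = [
        "Using the contexts below, answer the query.",
        "",
        "Contexts:",
        "\n".join(contexts),
        "",
        f"Query: {query}",
    ]
    return "\n".join(lines)


# Placeholder context strings, standing in for retrieved chunks
example_prompt = build_augmented_prompt(
    "What is Llama 2?",
    ["First retrieved chunk.", "Second retrieved chunk."],
)
print(example_prompt)
```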

There is still a lot of text here, so let's pass it onto our chat model to see how it performs.

In [94]:
# create a new user prompt
prompt = HumanMessage(
    content=augment_prompt(query)
)
# add to messages
messages.append(prompt)

res = chat(messages)

print(res.content)

Llama 2 is a collection of pretrained and fine-tuned large language models (LLMs) developed and released in a research work. These LLMs range in scale from 7 billion to 70 billion parameters. They are optimized for dialogue use cases and are specifically designed for chat conversations. The models, named L/l.sc/a.sc/m.sc/a.sc /two.taboldstyle-C/h.sc/a.sc/t.sc , have been fine-tuned to perform well on various benchmarks and have shown better performance compared to existing open-source chat models in most cases. The research work also suggests that these models may be suitable alternatives to closed-source models, based on evaluations of helpfulness and safety.

It is worth noting that the Llama 2 models are part of ongoing research and development in the field of language models and their application in dialogue systems. The work describes the approach used for fine-tuning the models and emphasizes the importance of transparency and reproducibility in AI alignment research.


The performance here is rather surprising: notice how the chatbot was able to make sense of this text even though it contained many issues, such as missing spaces between words.

We can continue with more Llama 2 questions. Let's try _without_ RAG first:

In [95]:
prompt = HumanMessage(
    content="what safety measures were used in the development of llama 2?"
)

res = chat(messages + [prompt])
print(res.content)

In the provided context, it is mentioned that Llama 2, a collection of pretrained and fine-tuned large language models (LLMs), was developed with a focus on safety. However, the specific safety measures used in the development of Llama 2 are not detailed in the available information. Without further context or additional information, it is not possible to provide specific details about the safety measures employed in the development of Llama 2.
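
Note that the cell above passes `messages + [prompt]` to the model instead of calling `messages.append(prompt)`. The difference matters for what the chatbot remembers afterwards, as this small sketch with placeholder strings shows.

```python
history = ["system message", "first question", "first answer"]
one_off = "a question we do not want to keep"

# Concatenation builds a new list, so the stored history is unchanged
temporary = history + [one_off]
print(len(temporary), len(history))

# append() changes the stored history in place
history.append(one_off)
print(len(history))
```

Using concatenation lets us ask the same question twice, once without and once with retrieved context, without the first attempt ending up in the conversation history.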


The chatbot is able to respond about Llama 2 thanks to its conversational history stored in `messages`. However, it doesn't know anything about the safety measures themselves as we have not provided it with that information via the RAG pipeline. Let's try again but with RAG.

In [96]:
prompt = HumanMessage(
    content=augment_prompt(
        "what safety measures were used in the development of llama 2?"
    )
)

res = chat(messages + [prompt])
print(res.content)

In the development of Llama 2, several safety measures were employed to increase the safety of the models. These measures include:

1. Safety-specific data annotation and tuning: During the fine-tuning process, specific attention was given to safety-related aspects. The models were trained and fine-tuned using data that was annotated with safety considerations in mind. This approach helps enhance the models' ability to generate safe and appropriate responses.

2. Red-teaming: Red-teaming involves conducting critical evaluations and assessments of the models. This process involves having external experts or teams challenge the models and identify potential safety risks or vulnerabilities. By subjecting the models to rigorous evaluations, any issues or areas of concern can be identified and addressed.

3. Iterative evaluations: The development of Llama 2 involved iterative evaluations. This means that the models were continuously assessed and improved based on feedback and evaluations. T

We get a much more informed response that includes several items missing in the previous non-RAG response, such as "red-teaming", "iterative evaluations", and the intention of the researchers to share this research to help "improve their safety, promoting responsible development in the field".

## For your final project

As the examples above show, using RAG successfully takes a fair amount of work. You will need to:

* Find data sources that contain information related to the process(es) you are trying to support
* Create a suitable knowledge base by embedding and indexing the data
* Create a suitable prompt that can be used to query the knowledge base
* Create a suitable chatbot that can be used to interact with the user
* Connect the chatbot to the knowledge base via the prompt
* Test the chatbot to ensure it is working as expected
* Iterate on the above steps until you have a working chatbot
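
The testing step in the list above can start out very small. The following is a minimal sketch, not a full evaluation framework: it asks each question and reports which expected keywords are missing from the answer. The names `missing_keywords`, `run_smoke_tests`, and `fake_chatbot` are hypothetical helpers introduced here for illustration, and the stand-in chatbot exists only so the sketch runs without API keys.

```python
from typing import Callable, Dict, List


def missing_keywords(answer: str, expected: List[str]) -> List[str]:
    """Return the expected keywords that do not appear in the answer (case-insensitive)."""
    lowered = answer.lower()
    return [kw for kw in expected if kw.lower() not in lowered]


def run_smoke_tests(
    answer_fn: Callable[[str], str],
    cases: Dict[str, List[str]],
) -> Dict[str, List[str]]:
    """Ask each question and map it to the keywords its answer is missing."""
    return {q: missing_keywords(answer_fn(q), kws) for q, kws in cases.items()}


# Hypothetical stand-in for the real chatbot so the sketch runs offline.
def fake_chatbot(question: str) -> str:
    return "Safety measures included red-teaming and iterative evaluations."


cases = {
    "what safety measures were used in the development of llama 2?": [
        "red-teaming",
        "iterative evaluations",
    ],
}

# An empty list for a question means every expected keyword was found.
report = run_smoke_tests(fake_chatbot, cases)
print(report)
```

In this notebook, `answer_fn` could instead be a small wrapper that calls `chat(messages + [HumanMessage(content=augment_prompt(question))])` and returns `.content`. Keyword checks are a crude signal, so treat a passing report as a starting point for manual review rather than proof of quality.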

LangChain has a number of powerful and useful document loaders. You can read entire directories of files or specific files. [You can easily import JSON, CSV, PDF, HTML, and Markdown files](https://python.langchain.com/docs/modules/data_connection/document_loaders.html). You can also access live databases and data sources using [Retrievers](https://python.langchain.com/docs/modules/data_connection/retrievers/), but accessing a live data source is not needed for your final project.

```python

# There are usually multiple ways to load data into LangChain.
# NOTE: these import paths assume the classic `langchain` package layout;
# newer releases expose the same classes from `langchain_community` and
# `langchain_text_splitters`.
from langchain.document_loaders import (
    DirectoryLoader,
    PyPDFDirectoryLoader,
    PyPDFLoader,
)
from langchain.text_splitter import RecursiveCharacterTextSplitter

####
# Option 1: load a single PDF
# loader = PyPDFLoader("./pdfs/paper1.pdf")

####
# Option 2: load a directory of PDF files
# loader = PyPDFDirectoryLoader("./pdfs")

####
# Option 3: load all PDFs from a folder tree, using DirectoryLoader
# with PyPDFLoader for each file. You can replace PyPDFLoader
# with any other loader that you need, such as:
# JSON loader (https://python.langchain.com/docs/modules/data_connection/document_loaders/json)
# HTML loader (https://python.langchain.com/docs/modules/data_connection/document_loaders/html)
# CSV loader (https://python.langchain.com/docs/modules/data_connection/document_loaders/csv)
# Markdown loader (https://python.langchain.com/docs/modules/data_connection/document_loaders/markdown)
loader = DirectoryLoader(
    "../",
    glob="**/*.pdf",  # the pattern must match the file type that loader_cls can parse
    use_multithreading=True,
    loader_cls=PyPDFLoader,
    show_progress=True,
)

# once the loader is defined, we can load our documents
docs = loader.load()

# then split the text into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=100,
)
texts = text_splitter.split_documents(docs)
```
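
To build intuition for `chunk_size` and `chunk_overlap`, here is a deliberately simplified sketch using a fixed sliding window over characters. This is not how `RecursiveCharacterTextSplitter` works internally (it splits on separators such as paragraphs and lines before merging pieces); the hypothetical `sliding_window_chunks` function only illustrates how the two parameters interact.

```python
from typing import List


def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


# 2,500 characters of dummy text, split with the same settings as above.
text = "".join(str(i % 10) for i in range(2500))
chunks = sliding_window_chunks(text, chunk_size=1000, chunk_overlap=100)

print([len(c) for c in chunks])  # [1000, 1000, 700]
print(chunks[0][-100:] == chunks[1][:100])  # True: neighbours share 100 characters
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.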

The final piece of this puzzle is how to create a "chatbot" user interface. This is covered in the next notebook.


