# Build a Simple RAG Engine Using LangChain, ChromaDB and OpenAI

This notebook demonstrates how to build a simple retrieval-augmented generation (RAG) engine. 
Documents are vectorized with OpenAI's embeddings. 
The vectors are stored locally in a Chroma database. 
Python and LangChain tie the pieces together. 

You will need 

LangChain is under heavy development and changes frequently, 
especially in versions below 1.0.0. 
**Make sure that the exact versions of the packages listed in requirements.txt are installed.**
This code runs on LangChain's first stable version, 1.0.0, 
so newer releases will hopefully stay consistent with it. 
Even so, it is always a good idea to use the exact same versions of all packages. 
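
As a quick sanity check, the installed versions can be compared against the pins before running anything else. The sketch below uses only the standard library. The `EXPECTED` dictionary and the `check_versions` helper are illustrative names, and the single pin shown is the LangChain version mentioned above; the real list should be copied from requirements.txt.

```python
from importlib.metadata import PackageNotFoundError, version

# Illustrative pin list (assumption): copy the real pins from requirements.txt.
EXPECTED = {"langchain": "1.0.0"}


def check_versions(expected):
    """Return {package: (installed_or_None, wanted)} for every mismatch."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


print(check_versions(EXPECTED) or "All pinned packages match.")
```

An empty result means every listed package is installed at the pinned version.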

Using a virtual environment is highly recommended. I use [uv](https://astral.sh/blog/uv). 
If you haven't used it before, I have a [quickstart demo post](https://praveenng.medium.com/uv-a-fast-alternative-to-pip-6f1d8c4a30aa) on Medium.com.

I have another Medium.com post on building a RAG engine, which closely follows this notebook. If you like, you can read it [here](https://medium.com/@praveenng/creating-a-vector-database-for-rag-471aca771bce). 
However, note that the package versions I used in that article are slightly different, so the code differs slightly as well. 

Finally, the goal of this notebook is to get started with a simple RAG engine. In a production environment, things are much more complicated, as you know. 

In [1]:
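# Use pysqlite3 instead of the standard-library sqlite3 (for ChromaDB).
# ChromaDB needs a recent SQLite, which the system sqlite3 may not provide.
__import__('pysqlite3')
import sys

# Register pysqlite3 under the name 'sqlite3' so later imports pick it up.
sys.modules["sqlite3"] = sys.modules.pop('pysqlite3')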

In [2]:
# import os
# os.environ["OPENAI_API_KEY"] = "YOUR API KEY HERE"
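
If you prefer to export the key in your shell instead of hard-coding it in the notebook, a small guard can fail early with a readable message. This is only a sketch; `require_env` is a hypothetical helper name, not part of LangChain or the OpenAI client.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the notebook.")
    return value


# Usage (uncomment once the key is exported in your shell):
# api_key = require_env("OPENAI_API_KEY")
```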

In [3]:
# dir where documents are stored
DATA_DIR = 'data'

# file extension ('*.txt', '*.md' etc.)
# all files with these extensions will be read and embedded.
FILE_EXT = ['*.md']

### 1. Building Vector Database

In [4]:
# load the document from folder 'data'
from langchain_community.document_loaders import DirectoryLoader

loader = DirectoryLoader(DATA_DIR, glob=FILE_EXT)
docs = loader.load()
print(f"{len(docs)} document(s) loaded.")

1 document(s) loaded.


In [5]:
# split loaded document into smaller chunks. 
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=300,
    length_function=len,
    add_start_index=True,
)
chunks = splitter.split_documents(docs)
print(f"Documents are split into {len(chunks)} chunks.")

Documents are split into 1719 chunks.
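
To build some intuition for `chunk_size` and `chunk_overlap`, here is a deliberately simplified sliding-window splitter in plain Python. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter tries to break on separators such as paragraphs and sentences first); `sliding_chunks` is a hypothetical helper that only illustrates how consecutive chunks share characters and how a start index can be tracked.

```python
def sliding_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size windows that share `chunk_overlap` characters.

    Returns a list of (start_index, chunk) pairs.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for start, chunk in sliding_chunks(sample, chunk_size=10, chunk_overlap=3):
    print(start, chunk)
# 0 abcdefghij
# 7 hijklmnopq
# 14 opqrstuvwx
# 21 vwxyz
```

Each chunk begins with the last three characters of the previous one, so a sentence cut at a chunk boundary still appears whole in at least one chunk when the overlap is large enough.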


In [4]:
# embed (vectorize) chunks using OpenAIEmbeddings and store in chromadb
# from langchain_community.vectorstores import Chroma
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

embedding_function = OpenAIEmbeddings()


In [None]:
db = Chroma.from_documents(
    chunks,
    embedding_function,
    persist_directory='chroma',
)

### 2. Testing Retrieval
We run a query and check whether we can retrieve relevant documents from the vector database using similarity search. 
Below, we retrieve the 3 documents that have the highest similarity with the query. 
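
Conceptually, a similarity search embeds the query, scores it against every stored vector, and returns the top `k` matches. The toy sketch below uses cosine similarity on hand-made 3-dimensional vectors. Both are assumptions for illustration: real embeddings have far more dimensions, and the metric Chroma actually uses depends on how the collection is configured.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k):
    """Return the ids of the k documents most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Made-up vectors standing in for embedded chunks.
toy_vectors = {
    "fire rescue": [0.9, 0.1, 0.0],
    "trial": [0.2, 0.8, 0.1],
    "weather": [0.0, 0.1, 0.9],
}
print(top_k([1.0, 0.2, 0.0], toy_vectors, k=2))
# ['fire rescue', 'trial']
```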

In [16]:
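# Run a query 
my_query = 'Whom did Raskolnikov rescue when he lived at Five Corners?'
COLLECTION_NAME = 'langchain'  # default collection name used by Chroma.from_documents
K_DOCUMENTS = 3  # number of documents to retrieve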

In [6]:
vector_store = Chroma(
    persist_directory='chroma',
    collection_name=COLLECTION_NAME,
    embedding_function=embedding_function,
)

results = vector_store.similarity_search(
    query=my_query,
    k=K_DOCUMENTS,
)

In [7]:
for i, doc in enumerate(results):
    print(f"\nResult {i+1}:")
    print(f"  Content: {doc.page_content}...")
    print(f"  Source/Metadata: {doc.metadata}")
    print("---\n")


Result 1:
  Content: year, Raskolnikov had got the old man into a hospital and paid for his funeral when he died. Raskolnikov’s landlady bore witness, too, that when they had lived in another house at Five Corners, Raskolnikov had rescued two little children from a house on fire and was burnt in doing so. This was investigated and fairly well confirmed by many witnesses. These facts made an impression in his favour....
  Source/Metadata: {'source': 'data/crime_and_punishment.md', 'start_index': 1104781}
---


Result 2:
  Content: Five months after Raskolnikov’s confession, he was sentenced. Razumihin and Sonia saw him in prison as often as it was possible. At last the moment of separation came. Dounia swore to her brother that the separation should not be for ever, Razumihin did the same. Razumihin, in his youthful ardour, had firmly resolved to lay the foundations at least of a secure livelihood during the next three or four years, and saving up a certain sum, to emigrate to Siberia,

You can see that Result 1 (`results[0]`) contains information related to the query. See below.

In [8]:
print(results[0].page_content[194:])

Raskolnikov had rescued two little children from a house on fire and was burnt in doing so. This was investigated and fairly well confirmed by many witnesses. These facts made an impression in his favour.


<br> But wouldn't it be nice to get the answer directly, without having to read through the documents? That's what a RAG engine does!

### 3. Building RAG Engine

We now put everything together. We build a RAG chain that accepts a query, embeds (vectorizes) it, runs a similarity search of the embedded query against the documents in the database, and retrieves the relevant documents. Finally, the query and the context (the relevant documents) are passed to an LLM, which generates an answer based on both. 
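
Stripped of any framework, these steps amount to three function calls in a row. The sketch below is a plain-Python outline, not the LangChain implementation: `build_rag_pipeline`, `fake_retrieve` and `fake_generate` are hypothetical stand-ins for the retriever and the chat model, so it runs without an API key.

```python
def build_rag_pipeline(retrieve, generate, template):
    """Compose retrieval, prompt formatting and generation into one callable."""
    def answer(question):
        context_docs = retrieve(question)          # 1. fetch relevant chunks
        prompt = template.format(                  # 2. fill in the prompt
            question=question,
            context="\n\n".join(context_docs),
        )
        return generate(prompt)                    # 3. ask the model
    return answer


# Stand-ins (assumptions): a real pipeline would query the vector store
# and call an LLM here.
def fake_retrieve(question):
    return ["Raskolnikov had rescued two little children from a house on fire."]


def fake_generate(prompt):
    return f"[model received a prompt of {len(prompt)} characters]"


pipeline = build_rag_pipeline(
    fake_retrieve,
    fake_generate,
    "Question: {question}\nContext: {context}\nAnswer:",
)
print(pipeline("Whom did Raskolnikov rescue?"))
```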

In [9]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai.chat_models import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

In [10]:
retriever = vector_store.as_retriever()

In [19]:
template="""As an assistant for answering questions related to the documents, 
use the following pieces of retrieved context to answer the question.
If the question cannot be answered based on the context, just say that you don't have the information to answer the question.
Use ten sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
"""

In [20]:
prompt = ChatPromptTemplate.from_template(template)
chat_model = ChatOpenAI(model="gpt-3.5-turbo")
parser = StrOutputParser()

In [21]:
rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | chat_model
    | parser
)
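
In the chain above, the retriever's output (a list of documents) is placed into the prompt as-is, so the model sees the metadata along with the text. A common refinement is to join only the text of the retrieved chunks first. The sketch below shows such a helper on a stand-in class; `FakeDocument` and `format_docs` are illustrative names, and the stand-in mimics only the `page_content` and `metadata` attributes printed earlier in this notebook.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in exposing the two attributes used earlier in the notebook."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the text of the retrieved documents, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


docs = [
    FakeDocument("First chunk."),
    FakeDocument("Second chunk.", {"start_index": 700}),
]
print(format_docs(docs))
```

In the chain, the helper would typically be wired in as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`. Treat that wiring as an assumption to verify against the installed LangChain version; it is not the code this notebook was run with.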

### 4. Testing RAG Engine

In [22]:
# your query about the document
my_query = 'Whom did Raskolnikov rescue when he lived at Five Corners?'

final_response = rag_chain.invoke(my_query)
print(final_response)

Raskolnikov rescued two little children from a house on fire when he lived at Five Corners.


<br><br>
That's great. But what if we ask a question that is unrelated to the documents in the database? We prompted the LLM to say that it doesn't have the information in that case. Let's check. 

In [23]:
# a query unrelated to the document
my_query = 'Who is the 6th president of the US?'

rag_chain.invoke(my_query)

"I don't have the information to answer the question regarding the 6th president of the US based on the provided context."