# Implementation of RAG System

Below is a high-level overview of how a RAG (Retrieval-Augmented Generation) system works:

![RAG](https://www.mongodb.com/developer/_next/image/?url=https%3A%2F%2Fimages.contentstack.io%2Fv3%2Fassets%2Fblt39790b633ee0d5a7%2Fblt2d3edefc63969c9e%2F65cf3ec38d55b016fb614064%2FGenAI_Stack_(4).png&w=3840&q=75)

<a href="https://www.mongodb.com/developer/_next/image/?url=https%3A%2F%2Fimages.contentstack.io%2Fv3%2Fassets%2Fblt39790b633ee0d5a7%2Fblt2d3edefc63969c9e%2F65cf3ec38d55b016fb614064%2FGenAI_Stack_(4).png&w=3840&q=75">Image source</a>

## Tech Stack Used

- **Framework** - LangChain
- **LLM** - Gemini 2.0 Flash
- **Embedding Model** - Llama 3.2 (served via Ollama)
- **Vector Database** - MongoDB Atlas

## Loading Environment Variables

In [2]:
import os 

from dotenv import load_dotenv
load_dotenv()

True
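
Before going further, it can help to confirm that the variables the rest of the notebook depends on were actually loaded. The sketch below is a minimal check and not part of the original notebook: `missing_env_vars` is a hypothetical helper, `MONGODB_ATLAS_CLUSTER_URI` is the name used later in this notebook, and `GOOGLE_API_KEY` is assumed to be the variable your Gemini integration reads.

```python
import os


def missing_env_vars(names, env=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


# Assumed variable names; adjust them to match your own .env file.
REQUIRED_VARS = ["MONGODB_ATLAS_CLUSTER_URI", "GOOGLE_API_KEY"]

missing = missing_env_vars(REQUIRED_VARS)
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All required environment variables are set.")
```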

## Setting Up the LLM, Embeddings and Vector Store

### Gemini 2.0-flash (Chat Model)

In [2]:
from langchain.chat_models import init_chat_model

llm = init_chat_model("gemini-2.0-flash", model_provider="google_genai")

### Llama 3.2 (Embedding Model loaded from Ollama)

In [49]:
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="llama3.2")

### MongoDB Atlas (Vector Database - stores embeddings)

#### Creating Database, Collection and Vector Search Index

**Hierarchy in MongoDB:** 

Cluster -> Database -> Collections (like Tables in SQL) -> Documents (like rows in SQL)

In [52]:
from langchain_mongodb import MongoDBAtlasVectorSearch
from pymongo import MongoClient

# initialize MongoDB python client
client = MongoClient(os.getenv("MONGODB_ATLAS_CLUSTER_URI"))

DB_NAME = "cognichat_vectordatabase"
COLLECTION_NAME = "cognichat_website_embeddings"
ATLAS_VECTOR_SEARCH_INDEX_NAME = "cognichat_website_embeddings_index"

MONGODB_COLLECTION = client[DB_NAME][COLLECTION_NAME]

#### Creating the Vector Store using the above parameters

In [53]:
vector_store = MongoDBAtlasVectorSearch(
    embedding=embeddings,
    collection=MONGODB_COLLECTION,
    index_name=ATLAS_VECTOR_SEARCH_INDEX_NAME,
    relevance_score_fn="cosine",
)

**<u>IMPORTANT:</u> Creating Vector Search Index:**

Creating this index is crucial: it enables MongoDB to search the collection by embedding similarity instead of the usual text search. It is what lets us find the Approximate Nearest Neighbours (ANN) of a search query's embedding.

In [74]:
vector_store.create_vector_search_index(dimensions=3072)  # Must match the embedding model's output size; the llama3.2 embeddings used here have 3072 dimensions.
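
For reference, an Atlas Vector Search index on a collection is described by a small definition document. The sketch below builds one as a plain Python dictionary and checks an embedding's length against it. It assumes the vector is stored under a field named `embedding`; `build_index_definition` and `check_dimensions` are hypothetical helpers for illustration, not part of `langchain_mongodb`.

```python
def build_index_definition(dimensions, path="embedding", similarity="cosine"):
    """Build a definition in the shape Atlas Vector Search expects for a vector field."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,  # assumed field name holding the embedding
                "numDimensions": dimensions,
                "similarity": similarity,
            }
        ]
    }


def check_dimensions(vector, definition):
    """Raise ValueError if the vector length differs from the index's numDimensions."""
    expected = definition["fields"][0]["numDimensions"]
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    return True


definition = build_index_definition(3072)
print(definition)
print(check_dimensions([0.0] * 3072, definition))
```

A mismatch between the embedding length and `numDimensions` is a common reason for a vector search returning nothing, so a check like this is cheap insurance when swapping embedding models.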

## Implementing RAG Architecture

### Step 1: Loading Datasource 

Here, we scrape the content of this <a href="https://lilianweng.github.io/posts/2023-06-23-agent/">blog</a> using LangChain's built-in document loader, which uses the Beautiful Soup library under the hood.

In [54]:
import bs4
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load() # Loading the contents of the blog

print(docs)

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='\n\n      LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistake

### Step 2: Chunking the document into chunks of at most 1000 characters

In [55]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Define a text splitter that breaks the documents into smaller chunks, with chunk size 1000 and overlap 200.
# Chunk size is the maximum number of characters in each chunk. For example, with a chunk size of 1000, each chunk has at most 1000 characters.
# Chunk overlap is the number of characters shared between consecutive chunks. For example, with an overlap of 200, up to the last 200 characters of one chunk are repeated at the start of the next chunk.
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)

# Split the loaded document (from previous step) into 1000 character chunks with 200 character overlap.
all_splits = text_splitter.split_documents(docs)

print(all_splits)

print(len(all_splits)) # 63 chunks are created with each chunk having at most 1000 characters and 200 characters overlap.

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refi
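
To make the interaction between chunk size and overlap concrete, here is a simplified fixed-window splitter. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter prefers to break on separators such as paragraphs and sentences); `split_with_overlap` is a hypothetical function that only illustrates how consecutive chunks share characters.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
        start += step
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=4):
    print(chunk)
```

Each printed chunk starts with the last four characters of the previous one, which is the role the 200-character overlap plays in this notebook.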

### Step 3: Storing the chunks in the Vector Store (MongoDB Atlas)

In [None]:
# Store the chunks in MongoDB Atlas: the vector store embeds each chunk's text with the embedding model and saves the text and its vector in the collection.
_ = vector_store.add_documents(documents=all_splits)

### Step 4: Defining the prompt and state for RAG (using a prompt pulled from the LangChain Hub)

In [8]:
from langchain import hub
from langchain_core.documents import Document
from typing_extensions import List, TypedDict

# Define prompt for question-answering
prompt = hub.pull("rlm/rag-prompt", api_url="https://api.smith.langchain.com")

# Define state for application
class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

### Step 5: Creating the node for Retrieval

Once triggered by the graph, this method passes the user query from the state to the vector store's `similarity_search` method. The vector store converts the query into a vector embedding, finds its Approximate Nearest Neighbours (ANN) using cosine similarity, and returns the matching chunks in raw (text) form.

In [9]:
# This method retrieves relevant documents from the vector store based on the user's question and will be triggered by the "retrieve" action in the application.
def retrieve(state: State):
    
    retrieved_docs = vector_store.similarity_search(state["question"]) # Perform similarity search in the vector store using the user's question after converting user's question into vector embeddings.
    
    return {"context": retrieved_docs} 
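
Conceptually, the search compares the query embedding with each stored embedding and keeps the closest ones. The sketch below does this exactly, by brute force, on toy 3-dimensional vectors; Atlas instead relies on an approximate nearest-neighbour index to avoid scanning every document. `cosine_similarity`, `top_k` and the sample vectors are hypothetical and for illustration only.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query, stored, k=2):
    """Return the k (text, score) pairs most similar to the query, best first."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in stored]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings"; real ones in this notebook have 3072 dimensions.
stored = [
    ("chunk about planning", [1.0, 0.0, 0.0]),
    ("chunk about memory", [0.0, 1.0, 0.0]),
    ("chunk about tool use", [0.7, 0.7, 0.0]),
]
for text, score in top_k([0.9, 0.1, 0.0], stored):
    print(f"{score:.3f}  {text}")
```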

### Step 6: Creating the node which augments the user query with the context and calls the LLM

In [10]:
# This method generates an answer to the user's question using the retrieved context and will be triggered by the "generate" action in the application.
def generate(state: State):

    docs_content = "\n\n".join(doc.page_content for doc in state["context"]) # Concatenate the content of the retrieved documents into a single string to be used as context for the LLM.
     
    messages = prompt.invoke({"question": state["question"], "context": docs_content}) # Invoke the prompt with the user's question and the concatenated context to prepare messages for the LLM.
     
    response = llm.invoke(messages) # Invoke the LLM with the prepared messages to generate an answer to the user's question.

    return {"answer": response.content} # returns the generated answer in raw format to the application state.
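
The prompt step is essentially string templating. The sketch below mimics it with a hypothetical template, `RAG_TEMPLATE`, written in the spirit of a RAG prompt; it is not the actual text of `rlm/rag-prompt`, and `build_prompt` is an illustrative helper rather than a LangChain API.

```python
RAG_TEMPLATE = (
    "Answer the question using only the context below. "
    "If the context does not contain the answer, say you don't know.\n\n"
    "Question: {question}\n\n"
    "Context:\n{context}\n\n"
    "Answer:"
)


def build_prompt(question, chunks):
    """Join the retrieved chunk texts and fill them into the template."""
    context = "\n\n".join(chunks)
    return RAG_TEMPLATE.format(question=question, context=context)


prompt_text = build_prompt(
    "What is task decomposition?",
    ["Chunk one text.", "Chunk two text."],
)
print(prompt_text)
```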

### Step 7: Build the state graph for the application using LangGraph

In [11]:
from langgraph.graph import START, StateGraph

graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()
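
For intuition, a linear graph like this one behaves much like calling each step in order and merging the partial update it returns into the shared state. The sketch below imitates that in plain Python with stub steps; `run_sequence`, `fake_retrieve` and `fake_generate` are hypothetical stand-ins, and LangGraph itself does considerably more (state schemas, reducers, branching, streaming).

```python
def run_sequence(steps, initial_state):
    """Run steps in order, merging each step's returned dict into the state."""
    state = dict(initial_state)
    for step in steps:
        update = step(state)
        state.update(update)  # later steps see keys written by earlier ones
    return state


def fake_retrieve(state):
    # Stand-in for a vector store lookup.
    return {"context": [f"stub chunk relevant to: {state['question']}"]}


def fake_generate(state):
    # Stand-in for an LLM call; it just reports how much context it received.
    return {"answer": f"Answered using {len(state['context'])} chunk(s)."}


result = run_sequence([fake_retrieve, fake_generate], {"question": "What is MIPS?"})
print(result["answer"])
```

This is why each node only returns the keys it changes: `retrieve` fills in `context`, and `generate` reads it and fills in `answer`.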

### Step 8: Testing the Application with user query

In [12]:
response = graph.invoke({"question": "What is Maximum Inner Product Search (MIPS)?"})

print(response["answer"]) # Print the generated answer to the user's question.

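NameError: name 'vector_store' is not defined

This error means the cell that creates `vector_store` was not run in the current kernel session. Re-run the setup cells above before invoking the graph.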