RAG (Retrieval-Augmented Generation)

Why build a RAG system?
RAG pairs a retrieval system with an LLM, giving the model access to factual, access-controlled, and timely information.

1. RAG REDUCES HALLUCINATION
Example: In the financial services industry, providing accurate information on investment options is crucial because it directly impacts customers' purchasing decisions and financial well-being.

2. COST-EFFECTIVE ALTERNATIVE
Example: Banks often need to assess the creditworthiness of potential borrowers. Fine-tuning pre-trained language models to analyze credit histories can be resource-intensive. A RAG architecture can instead retrieve the relevant records at query time, without retraining the model.

3. CREDIBLE AND ACCURATE RESPONSES
Example: In customer support, providing accurate and helpful responses is essential for maintaining customer trust, as it demonstrates the company's commitment to providing reliable information and support.

4. DOMAIN-SPECIFIC INFORMATION
Example: In the legal industry, clients often require advice specific to their case or jurisdiction because different legal systems have unique rules and regulations, and understanding these nuances is crucial for effective legal representation.

https://www.advancinganalytics.co.uk/blog/2023/11/7/10-reasons-why-you-need-to-implement-rag-a-game-changer-in-ai


RAG PRACTICAL USE CASES
1. Document Question Answering Systems
2. Conversational agents
3. Real-time event commentary
4. Content Generation
5. Personalized Recommendation
6. Virtual Assistants

INSTALLING LIBRARIES

In [1]:
!pip3 install langchain openai tiktoken rapidocr-onnxruntime

!pip install -U "langchain[google-genai]"
!pip install -qU langchain-community faiss-cpu
!pip install -qU  langchain-google-genai


FETCHING GEMINI API KEY

In [2]:
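import os
from dotenv import load_dotenv

# Read GEMINI_API_KEY from a local .env file, if one exists
load_dotenv()
gemini_api_key = os.getenv("GEMINI_API_KEY")
if not gemini_api_key:
    raise RuntimeError("GEMINI_API_KEY is not set; add it to your .env file")

# langchain-google-genai reads the key from GOOGLE_API_KEY
os.environ["GOOGLE_API_KEY"] = gemini_api_key

# Confirm the key loaded without printing the secret itself
print("Gemini API key loaded.")


Gemini API key loaded.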
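
When a visual check of which key was loaded is useful, a masked value is safer to display than the key itself. The helper below is a minimal sketch; `mask_secret` and the sample string are hypothetical and not part of any library used in this notebook.

```python
def mask_secret(value: str, visible: int = 4) -> str:
    """Hide all but the last `visible` characters of a secret."""
    if len(value) <= visible:
        # Too short to reveal anything safely: mask everything
        return "*" * len(value)
    return "*" * (len(value) - visible) + value[-visible:]


# Placeholder string, not a real credential
print(mask_secret("example-not-a-real-key"))
```

In the cell above, this could be called as `print(mask_secret(gemini_api_key))` so that only the last few characters appear in saved notebook output.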


1. DATA INGESTION
2. DATA RETRIEVAL
3. DATA GENERATION
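
The three stages can be sketched end to end without any external service. This is a toy illustration only: it assumes a word-overlap score in place of real embeddings and a string template in place of an LLM call. The names `ingest`, `retrieve`, and `generate` are hypothetical, not LangChain APIs.

```python
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag of words; a stand-in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))


def ingest(texts: list[str]) -> list[tuple[str, Counter]]:
    """Stage 1: store each text next to its (toy) vector."""
    return [(text, tokenize(text)) for text in texts]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 1) -> list[str]:
    """Stage 2: rank stored texts by word overlap with the query."""
    q = tokenize(query)
    ranked = sorted(index, key=lambda item: sum((item[1] & q).values()), reverse=True)
    return [text for text, _ in ranked[:k]]


def generate(query: str, context: list[str]) -> str:
    """Stage 3: build the prompt an LLM would receive (no model is called here)."""
    return "Context:\n" + "\n".join(context) + f"\n\nQuestion: {query}"


index = ingest([
    "FAISS is a library for vector similarity search.",
    "Paris is the capital of France.",
])
prompt = generate("What is FAISS?", retrieve(index, "What is FAISS?"))
print(prompt)
```

The cells that follow replace each toy piece with a real component: a web loader and text splitter for ingestion, a FAISS vector store for retrieval, and a Gemini chat model for generation.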

In [3]:
################DATA INGESTION###############

import requests
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
import bs4
from langchain_community.document_loaders import WebBaseLoader



In [5]:
#SELECT A CHAT MODEL

# from langchain.chat_models import init_chat_model

# API key is already set from the previous cell that loads .env
# os.environ["GOOGLE_API_KEY"] is already configured

# model = init_chat_model("google_genai:gemini-2.5-pro")

from langchain_google_genai import ChatGoogleGenerativeAI

llm = ChatGoogleGenerativeAI(
    model="gemini-2.5-flash",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2,
    # other params...
)



In [6]:
#INVOCATION

messages = [
    (
        "system",
        "You are a helpful assistant that translates English to French. Translate the user sentence.",
    ),
    ("human", "I love programming."),
]
ai_msg = llm.invoke(messages)
ai_msg

AIMessage(content="J'adore la programmation.", additional_kwargs={}, response_metadata={'prompt_feedback': {'block_reason': 0, 'safety_ratings': []}, 'finish_reason': 'STOP', 'model_name': 'gemini-2.5-flash', 'safety_ratings': [], 'model_provider': 'google_genai'}, id='lc_run--11fac380-1d8c-4dd7-b862-0c95543fb3a8-0', usage_metadata={'input_tokens': 21, 'output_tokens': 7, 'total_tokens': 28, 'input_token_details': {'cache_read': 0}})

In [7]:
print(ai_msg.content)

J'adore la programmation.


In [None]:
#SELECT AN EMBEDDINGS MODEL

import getpass
import os

if not os.environ.get("GOOGLE_API_KEY"):
    os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter API key for Google Gemini: ")

from langchain_google_genai import GoogleGenerativeAIEmbeddings

embeddings = GoogleGenerativeAIEmbeddings(model="models/gemini-embedding-001")

In [None]:
#SELECT A VECTOR STORE
import faiss
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.vectorstores import FAISS

embedding_dim = len(embeddings.embed_query("hello world"))
index = faiss.IndexFlatL2(embedding_dim)

vector_store = FAISS(
    embedding_function=embeddings,
    index=index,
    docstore=InMemoryDocstore(),
    index_to_docstore_id={},
)
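
`faiss.IndexFlatL2` performs an exhaustive search: it compares the query against every stored vector using squared Euclidean distance. The pure-Python sketch below shows the same idea on tiny hand-made vectors; `flat_l2_search` is a hypothetical helper for illustration and does not reproduce how FAISS is implemented.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def flat_l2_search(vectors: list[list[float]], query: list[float], k: int = 2):
    """Return (index, distance) pairs for the k stored vectors nearest to query."""
    distances = [(i, squared_l2(v, query)) for i, v in enumerate(vectors)]
    return sorted(distances, key=lambda pair: pair[1])[:k]


stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
nearest = flat_l2_search(stored, [0.9, 1.2], k=2)
print([i for i, _ in nearest])  # indices of the two closest vectors
```

Because every vector is scanned, results are exact but query time grows linearly with the number of stored chunks, which is acceptable for a single blog post.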


In [None]:
#LOADING DOCUMENTS FROM THE WEB

bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()

assert len(docs) == 1
print(f"Total characters: {len(docs[0].page_content)}")

In [None]:
print(docs[0].page_content[:5000])

In [None]:
#SPLIT DOCUMENTS
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
all_splits = text_splitter.split_documents(docs)
print(f"Split blog post into {len(all_splits)} sub-documents.")
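
The effect of `chunk_size` and `chunk_overlap` can be seen with a simplified fixed-window splitter. This is only a sketch under that simplification: `RecursiveCharacterTextSplitter` prefers to break at separators such as paragraphs, lines, and words, so its real chunk boundaries will differ. The function name `chunk_with_overlap` is hypothetical.

```python
def chunk_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slice text into windows of chunk_size that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


chunks = chunk_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.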

In [None]:
#STORING DOCUMENTS IN VECTOR STORE
document_ids = vector_store.add_documents(documents=all_splits)

print(document_ids[:3])


##---------------------------RETRIEVAL & GENERATION---------------------------##

In [None]:
##----RAG Agent - DEFINING TOOL----##

from langchain.tools import tool

@tool(response_format="content_and_artifact")
def retrieve_context(query: str):
    """Retrieve information to help answer a query."""
    retrieved_docs = vector_store.similarity_search(query, k=2)
    serialized = "\n\n".join(
        (f"Source: {doc.metadata}\nContent: {doc.page_content}")
        for doc in retrieved_docs
    )
    return serialized, retrieved_docs
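
With `response_format="content_and_artifact"`, the tool returns a two-part tuple: a string that is sent to the model and the raw documents that stay attached to the tool message for later inspection. The sketch below shows that shape with plain Python; `Doc` is a hypothetical stand-in for a LangChain `Document`, and the sample text is made up.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def serialize(docs: list[Doc]) -> tuple[str, list[Doc]]:
    """Return (content for the model, artifact kept alongside the message)."""
    content = "\n\n".join(
        f"Source: {doc.metadata}\nContent: {doc.page_content}" for doc in docs
    )
    return content, docs


sample_docs = [Doc("Task decomposition splits a goal into steps.", {"source": "blog"})]
content, artifact = serialize(sample_docs)
print(content)
```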

In [None]:
##----CONSTRUCT THE AGENT----##
from langchain.agents import create_agent

tools = [retrieve_context]
# If desired, specify custom instructions
prompt = (
    "You have access to a tool that retrieves context from a blog post. "
    "Use the tool to help answer user queries."
)
# `llm` is the chat model created in the "SELECT A CHAT MODEL" cell
agent = create_agent(llm, tools, system_prompt=prompt)

In [None]:
##----TEST IT OUT----##
query = (
    "What is the standard method for Task Decomposition?\n\n"
    "Once you get the answer, look up common extensions of that method."
)

for event in agent.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    event["messages"][-1].pretty_print()

##----RAG AGENT VS. RAG CHAIN----##

The agent built above lets the LLM use its discretion in generating a tool call to help answer user queries.

✅ Benefits of the RAG agent
Search only when needed: The LLM can handle greetings, follow-ups, and simple queries without triggering unnecessary searches.
Contextual search queries: By treating search as a tool with a query input, the LLM crafts its own queries that incorporate conversational context.
Multiple searches allowed: The LLM can execute several searches in support of a single user query.

⚠️ Drawbacks of the RAG agent
Two inference calls: When a search is performed, it requires one call to generate the query and another to produce the final response.
Reduced control: The LLM may skip searches when they are actually needed, or issue extra searches when unnecessary.

A RAG chain takes the opposite approach: it always runs one search and passes the results as context to a single LLM call, which lowers latency at the cost of flexibility.
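
The difference in inference calls between the two approaches can be made concrete with a stub. The sketch below is a toy simulation, not LangChain code: `StubModel`, `search`, `run_agent`, and `run_chain` are hypothetical, and the stub decides to search for anything that is not a greeting.

```python
searches = []  # records every retrieval so the two flows can be compared


def search(query: str) -> str:
    """Hypothetical retriever stub."""
    searches.append(query)
    return f"[context for: {query}]"


class StubModel:
    """Hypothetical stand-in for an LLM that counts its own invocations."""

    def __init__(self):
        self.calls = 0

    def step(self, user_msg, context=None):
        self.calls += 1
        needs_search = context is None and user_msg.lower() not in {"hi", "hello"}
        if needs_search:
            return ("search", user_msg)
        return ("answer", f"reply to {user_msg!r}")


def run_agent(model, user_msg):
    """Agent flow: the model decides whether to search before answering."""
    kind, payload = model.step(user_msg)
    if kind == "search":
        context = search(payload)
        kind, payload = model.step(user_msg, context=context)
    return payload


def run_chain(model, user_msg):
    """Chain flow: always retrieve, then make exactly one model call."""
    context = search(user_msg)
    _, payload = model.step(user_msg, context=context)
    return payload


for runner in (run_agent, run_chain):
    for msg in ("hi", "What is task decomposition?"):
        model = StubModel()
        runner(model, msg)
        print(runner.__name__, repr(msg), "model calls:", model.calls)
```

In this simulation the agent spends two model calls on a real question but only one on a greeting, while the chain always spends one model call and one search, even when the search is not needed.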

In [None]:
#Implement this chain by removing tools from the agent and instead incorporating the retrieval step into a custom prompt:

from langchain.agents.middleware import dynamic_prompt, ModelRequest

@dynamic_prompt
def prompt_with_context(request: ModelRequest) -> str:
    """Inject context into state messages."""
    last_query = request.state["messages"][-1].text
    retrieved_docs = vector_store.similarity_search(last_query)

    docs_content = "\n\n".join(doc.page_content for doc in retrieved_docs)

    system_message = (
        "You are a helpful assistant. Use the following context in your response:"
        f"\n\n{docs_content}"
    )

    return system_message


# No tools: retrieval happens inside the prompt middleware on every call
agent = create_agent(llm, tools=[], middleware=[prompt_with_context])

In [None]:
##----TEST IT OUT----##

query = "What is task decomposition?"
for step in agent.stream(
    {"messages": [{"role": "user", "content": query}]},
    stream_mode="values",
):
    step["messages"][-1].pretty_print()