## Retrieval-Augmented Generation (RAG)

### Install libraries

In [1]:
!pip install langchain-text-splitters langchain-community langchain-openai faiss-cpu python-dotenv



### Building the vector store (FAISS) and retriever

In [2]:
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from dotenv import load_dotenv

# Load environment variables such as OPENAI_API_KEY from a local .env file
load_dotenv()

docs = [
    "LangChain is a framework for working with LLM.",
    "RAG combines context retrieval with answer generation.",
    "FAISS is a library for storing and searching embeddings.",
    "Retriever is used to find the most similar documents to the user's queries. The retriever can return a variable number of matching documents, specified in the k parameter. The retriever uses various text similarity algorithms, e.g., cosine matching, Euclidean distance, MMR."
]

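# Split the documents into chunks of up to 100 characters with a 20-character overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
split = splitter.create_documents(docs)

print(f"Number of chunks: {len(split)}")

# Embed the chunks and index them in FAISS; the retriever returns the k=2 best matches
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(split, embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})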

query = "Why use a retriever?"

context = retriever.invoke(query)
print("Retrieved chunks:")
for i, c in enumerate(context, 1):
    print(f"{i}.", c.page_content)

Number of chunks: 7
Retrieved chunks:
1. Retriever is used to find the most similar documents to the user's queries. The retriever can return
2. The retriever uses various text similarity algorithms, e.g., cosine matching, Euclidean distance,
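
The retrieval step above can be pictured as ranking chunk vectors by their similarity to the query vector and keeping the top k. The following is a minimal pure-Python sketch of that idea, not FAISS's actual implementation: the three-dimensional vectors are made-up stand-ins for real embeddings, and `cosine_similarity` and `top_k` are hypothetical helper names. Cosine similarity is used because it is one of the measures named in the sample text.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, indexed_chunks, k=2):
    """Return the k chunk texts whose vectors are most similar to the query."""
    ranked = sorted(
        indexed_chunks,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]

# Toy "embeddings": hand-picked values, not output from a real model
toy_index = [
    ("LangChain is a framework", [0.9, 0.1, 0.0]),
    ("FAISS stores embeddings", [0.1, 0.9, 0.1]),
    ("A retriever finds similar documents", [0.1, 0.2, 0.9]),
]
toy_query = [0.0, 0.3, 0.8]  # made-up vector standing in for "Why use a retriever?"

results = top_k(toy_query, toy_index, k=2)
print(results)
```

Raising or lowering `k` here plays the same role as `search_kwargs={"k": 2}` in the retriever: it only changes how many of the ranked chunks are returned.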


### Simple RAG chain (prompt + context + LLM)

In [3]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

rag_prompt = ChatPromptTemplate.from_messages([
    ("system", "Give precise answers based solely on CONTEXT. If there is no data, say you don't know."),
    ("system", "CONTEXT:\n{context}"),
    ("user", "{question}")
])

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | rag_prompt
    | llm
    | StrOutputParser()
)

print(rag_chain.invoke("What is FAISS and what is it for?"))

FAISS is a library for storing and searching embeddings.
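
In the chain above, the retriever's output (a list of `Document` objects) goes straight into the `{context}` slot, so the template is filled with the string form of that list rather than with plain text. A common alternative is to join the chunk texts first. The sketch below shows the difference without any API calls; `FakeDocument` is a hypothetical stand-in that has only the `page_content` attribute of LangChain's `Document`, and `format_docs` is an illustrative helper name.

```python
from dataclasses import dataclass

@dataclass
class FakeDocument:
    """Hypothetical stand-in exposing only page_content, like a retrieved chunk."""
    page_content: str

def format_docs(docs):
    """Join chunk texts with blank lines so the prompt sees plain text."""
    return "\n\n".join(doc.page_content for doc in docs)

retrieved = [
    FakeDocument("FAISS is a library for storing and searching embeddings."),
    FakeDocument("RAG combines context retrieval with answer generation."),
]

raw_context = str(retrieved)            # what a template gets from the bare list
clean_context = format_docs(retrieved)  # plain text, one chunk per paragraph

print(raw_context)
print(clean_context)
```

Assuming the `retriever` and `rag_prompt` from the cell above, such a helper could be wired into the chain as `{"context": retriever | format_docs, "question": RunnablePassthrough()}`, since LangChain wraps plain functions used with `|` as runnables.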


### RAG example: full program

In [4]:
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_core.prompts import ChatPromptTemplate
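from langchain_core.output_parsers import StrOutputParser
from dotenv import load_dotenv

# Load OPENAI_API_KEY from a local .env file, as in the first example
load_dotenv()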

# Model
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Source documents
docs = [
    "LangChain is a framework for working with LLMs.",
    "RAG combines context matching with answer generation.",
    "FAISS is a library for storing and retrieving embeddings."
]

# Split into chunks of up to 50 characters with a 10-character overlap
splitter = RecursiveCharacterTextSplitter(chunk_size=50, chunk_overlap=10)
splits = splitter.create_documents(docs)

# Embeddings + vector store
embeddings = OpenAIEmbeddings()
vectorstore = FAISS.from_documents(splits, embedding=embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 2})

# RAG prompt
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using only the following context:\n{context}"),
    ("user", "{question}")
])

# Pipeline: retrieve context for the question, fill the prompt, call the model, return text
rag_chain = (
    {
        "context": lambda x: retriever.invoke(x["question"]),
        "question": lambda x: x["question"]
    }
    | prompt
    | llm
    | StrOutputParser()
)


print(rag_chain.invoke({"question": "What is FAISS?"}))

FAISS is a library for storing and retrieving embeddings, which are numerical representations of data, often used in machine learning and information retrieval tasks.
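
The split step in this program uses `chunk_size=50` and `chunk_overlap=10`. As a rough illustration of what those two parameters mean, here is a simplified fixed-window chunker. It is not how `RecursiveCharacterTextSplitter` works internally (that class prefers to break on separators such as paragraphs, newlines and spaces); `chunk_text` is a hypothetical helper used only to show the size and overlap arithmetic.

```python
def chunk_text(text, chunk_size=50, chunk_overlap=10):
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks

sample = "FAISS is a library for storing and retrieving embeddings."
pieces = chunk_text(sample, chunk_size=50, chunk_overlap=10)

for i, piece in enumerate(pieces, 1):
    print(i, repr(piece))
```

The overlap means the end of one chunk is repeated at the start of the next, which reduces the chance that a sentence relevant to a query is cut in half at a chunk boundary.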


### RAG with a retry loop and answer evaluation

In [5]:
# Reuses prompt, llm, retriever and StrOutputParser from the previous cell
from langchain_core.prompts import ChatPromptTemplate

eval_prompt = ChatPromptTemplate.from_messages([
    ("system", "Evaluate the answer."),
    ("user", "Question: {question}\nAnswer: {answer}\nIs the answer correct? Respond with only 'yes' or 'no'.")
])

def rag_with_eval(question, max_retries):
    for attempt in range(max_retries):
        context = retriever.invoke(question)
        answer = (prompt | llm | StrOutputParser()).invoke({"context": context, "question": question})
        eval_result = (eval_prompt | llm | StrOutputParser()).invoke({"question": question, "answer": answer})
        print(f"Evaluation result {eval_result}")
        if "yes" in eval_result.lower():
            return f"✅ Answer approved:\n{answer}"
        print(f"❌ Answer rejected, retrying:\n{answer}")
    return "Could not get the correct answer."

print(rag_with_eval("What is RAG?", max_retries=3))

Evaluation result Yes.
✅ Answer approved:
RAG stands for Retrieval-Augmented Generation. It combines context matching with answer generation, allowing for more accurate and contextually relevant responses by retrieving information from a knowledge base before generating an answer.
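
The loop above accepts an answer whenever the substring `yes` appears anywhere in the evaluator's reply, which would also match a reply such as "No, yes would be wrong". A stricter check is sketched below. `parse_verdict` and `retry_until_approved` are hypothetical names, and the `fake_generate` and `fake_evaluate` functions are stubs standing in for the LLM chains so that the control flow can run offline.

```python
import string

def parse_verdict(reply):
    """Return True only if the first word of the reply is 'yes'."""
    words = reply.strip().lower().split()
    if not words:
        return False
    return words[0].strip(string.punctuation) == "yes"

def retry_until_approved(question, generate, evaluate, max_retries=3):
    """Call generate() up to max_retries times; return the first approved answer or None."""
    for attempt in range(1, max_retries + 1):
        answer = generate(question, attempt)
        if parse_verdict(evaluate(question, answer)):
            return answer
    return None

# Stubs replacing the real chains: the second attempt produces the accepted answer
def fake_generate(question, attempt):
    return "I don't know." if attempt == 1 else "RAG combines retrieval with generation."

def fake_evaluate(question, answer):
    return "No." if "don't know" in answer else "Yes."

approved = retry_until_approved("What is RAG?", fake_generate, fake_evaluate)
print(approved)
```

Note that with `temperature=0` and an unchanged retrieval result, repeated attempts will usually produce the same answer, so a real retry loop would probably need to vary something between attempts (for example the query wording or `k`).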
