# Lab 2: RAG

## We will build and evaluate a Question Answering Expert for a fictional company: Insurellm!

### BEFORE WE BEGIN:

Look at the knowledge-base - this is the company shared drive.

### For those new to RAG:

Does one of the Experts want to give an explanation?

We will be figuring out ways to insert relevant background information into the prompt.
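
To make that idea concrete before we bring in any libraries, here is a minimal sketch. It assumes a toy three-line knowledge base and ranks snippets by simple word overlap instead of embeddings; the names `KNOWLEDGE`, `words`, `retrieve` and `build_prompt` are made up for illustration and are not part of the lab code.

```python
import re

# Toy "knowledge base": a few snippets standing in for the real documents
KNOWLEDGE = [
    "Avery Lancaster is the Co-Founder and CEO of Insurellm.",
    "Rellm is an enterprise reinsurance product developed by Insurellm.",
    "Insurellm was founded in 2015.",
]

def words(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, k=2):
    """Rank snippets by how many words they share with the question."""
    q = words(question)
    ranked = sorted(KNOWLEDGE, key=lambda s: len(q & words(s)), reverse=True)
    return ranked[:k]

def build_prompt(question, k=2):
    """Insert the retrieved snippets into the prompt ahead of the question."""
    context = "\n".join(retrieve(question, k))
    return f"Answer the question based on the context:\n{context}\n\nQuestion: {question}"

print(build_prompt("Who is Avery Lancaster?"))
```

The rest of this lab swaps the word-overlap ranking for vector similarity, but the shape stays the same: find relevant text, then put it in the prompt.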

Today will be more intense - please ask me lots of questions and ask for clarifications.

In [1]:
import os
import glob
import tiktoken
import numpy as np
from IPython.display import Markdown, display

from langchain_openai import ChatOpenAI
from langchain_chroma import Chroma
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_openai import OpenAIEmbeddings

from langchain_community.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

from sklearn.manifold import TSNE
import plotly.graph_objects as go

In [2]:
MODEL = "gpt-4.1-nano"
db_name = "vector_db"

In [3]:
knowledge_base_path = "knowledge-base/**/*.md"
files = glob.glob(knowledge_base_path, recursive=True)
print(f"Found {len(files)} files in the knowledge base")

entire_knowledge_base = ""

for file_path in files:
    with open(file_path, 'r', encoding='utf-8') as f:
        entire_knowledge_base += f.read()
        entire_knowledge_base += "\n\n"

print(f"Total characters in knowledge base: {len(entire_knowledge_base):,}")

Found 76 files in the knowledge base
Total characters in knowledge base: 304,434


In [4]:
encoding = tiktoken.encoding_for_model("gpt-4.1-mini")
tokens = encoding.encode(entire_knowledge_base)
token_count = len(tokens)
print(f"Total tokens for gpt-4.1-mini: {token_count:,}")

Total tokens for gpt-4.1-mini: 63,555


## LangChain Document Loaders

Loading in the data and splitting it into chunks using LangChain's helper classes

In [5]:
folders = glob.glob("knowledge-base/*")

documents = []
for folder in folders:
    doc_type = os.path.basename(folder)
    loader = DirectoryLoader(folder, glob="**/*.md", loader_cls=TextLoader, loader_kwargs={'encoding': 'utf-8'})
    folder_docs = loader.load()
    for doc in folder_docs:
        doc.metadata["doc_type"] = doc_type
        documents.append(doc)

print(f"Loaded {len(documents)} documents")

Loaded 76 documents


In [6]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents)

print(f"Divided into {len(chunks)} chunks")

Divided into 413 chunks
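
To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch of overlapping chunks. It is an assumption-laden illustration, not how `RecursiveCharacterTextSplitter` works internally: the real splitter prefers to break on separators such as paragraph and line breaks, while this hypothetical `chunk_text` helper just slides a fixed-size character window.

```python
def chunk_text(text, chunk_size=1000, chunk_overlap=200):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

sample = "abcdefghij" * 250  # 2,500 characters of stand-in text
pieces = chunk_text(sample)
print([len(p) for p in pieces])

# The tail of one chunk is repeated at the head of the next
print(pieces[0][-200:] == pieces[1][:200])
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.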


In [7]:
print(chunks[0])

page_content='# Product Summary

# Rellm: AI-Powered Enterprise Reinsurance Solution

## Summary

Rellm is an innovative enterprise reinsurance product developed by Insurellm, designed to transform the way reinsurance companies operate. Harnessing the power of artificial intelligence, Rellm offers an advanced platform that redefines risk management, enhances decision-making processes, and optimizes operational efficiencies within the reinsurance industry. With seamless integrations and robust analytics, Rellm enables insurers to proactively manage their portfolios and respond to market dynamics with agility.

## Features

### AI-Driven Analytics
Rellm utilizes cutting-edge AI algorithms to provide predictive insights into risk exposures, enabling users to forecast trends and make informed decisions. Its real-time data analysis empowers reinsurance professionals with actionable intelligence.' metadata={'source': 'knowledge-base/products/Rellm.md', 'doc_type': 'products'}


In [8]:
#embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
embeddings = OpenAIEmbeddings()

if os.path.exists(db_name):
    Chroma(persist_directory=db_name, embedding_function=embeddings).delete_collection()

vectorstore = Chroma.from_documents(documents=chunks, embedding=embeddings, persist_directory=db_name)
print(f"Vectorstore created with {vectorstore._collection.count()} documents")

Vectorstore created with 413 documents


In [9]:
# How many documents are in the vector store? How many dimensions?

collection = vectorstore._collection
count = collection.count()

sample_embedding = collection.get(limit=1, include=["embeddings"])["embeddings"][0]
dimensions = len(sample_embedding)
print(f"There are {count:,} vectors with {dimensions:,} dimensions in the vector store")

There are 413 vectors with 1,536 dimensions in the vector store
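
Each chunk is now a point in a 1,536-dimensional space, and retrieval means finding the points closest to the question's vector. The sketch below uses cosine similarity on tiny 3-dimensional vectors. Cosine similarity is one common choice and is an assumption here; the metric the vector store actually uses depends on how the collection is configured. The helper names are hypothetical.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_k(query_vector, vectors, k=2):
    """Return (index, score) pairs for the k vectors most similar to the query."""
    scores = [cosine_similarity(query_vector, v) for v in vectors]
    order = np.argsort(scores)[::-1][:k]  # highest score first
    return [(int(i), scores[int(i)]) for i in order]

# Toy 3-dimensional "embeddings" standing in for the real 1,536-dimensional ones
toy_vectors = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
query = [1.0, 0.05, 0.0]

print(top_k(query, toy_vectors))
```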


In [10]:
# Gather the vectors, documents and metadata

result = collection.get(include=['embeddings', 'documents', 'metadatas'])
vectors = np.array(result['embeddings'])
documents = result['documents']
metadatas = result['metadatas']
# Use the doc_type stored at load time; splitting the source path on '/' breaks on Windows
doc_types = [metadata['doc_type'] for metadata in metadatas]
colors = [['blue', 'green', 'red', 'orange'][['products', 'employees', 'contracts', 'company'].index(t)] for t in doc_types]

In [11]:
# We humans find it easier to visualize things in 2D!
# Reduce the dimensionality of the vectors to 2D using t-SNE
# (t-distributed stochastic neighbor embedding)

tsne = TSNE(n_components=2, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)

# Create the 2D scatter plot
fig = go.Figure(data=[go.Scatter(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='2D Chroma Vector Store Visualization',
    xaxis_title='x',  # `scene` only applies to 3D plots; 2D axes are titled directly
    yaxis_title='y',
    width=800,
    height=600,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

In [12]:
# Let's try 3D!

tsne = TSNE(n_components=3, random_state=42)
reduced_vectors = tsne.fit_transform(vectors)

# Create the 3D scatter plot
fig = go.Figure(data=[go.Scatter3d(
    x=reduced_vectors[:, 0],
    y=reduced_vectors[:, 1],
    z=reduced_vectors[:, 2],
    mode='markers',
    marker=dict(size=5, color=colors, opacity=0.8),
    text=[f"Type: {t}<br>Text: {d[:100]}..." for t, d in zip(doc_types, documents)],
    hoverinfo='text'
)])

fig.update_layout(
    title='3D Chroma Vector Store Visualization',
    scene=dict(xaxis_title='x', yaxis_title='y', zaxis_title='z'),
    width=900,
    height=700,
    margin=dict(r=20, b=10, l=10, t=40)
)

fig.show()

## LangChain Code to Call OpenAI

In [13]:
# create a new Chat with OpenAI
llm = ChatOpenAI(temperature=0, model_name=MODEL)

# how many chunks to provide in each prompt
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})

# Simple prompt that inserts the retrieved context (no chat history yet)
prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer the question based on the context:\n{context}"),
    ("human", "{input}")
])

# Create the chain
question_answer_chain = create_stuff_documents_chain(llm, prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

# Invoke it
query = "Please explain what Insurellm is in a couple of sentences"
result = rag_chain.invoke({"input": query})
print(result["answer"])

Insurellm is an innovative insurance technology company founded in 2015 that develops digital platforms for various insurance lines, including auto, home, life, health, and commercial insurance. It focuses on providing modern, customer-centric insurance solutions through its suite of software products, operating primarily remotely across the United States.


# CHALLENGE:

You will be changing or replacing 2 modules:

`ingest.py`

`answer.py`

They are VERY simple! Let's look at them.

## Now check out ingest.py

Then run at the terminal:

`uv run ingest.py`

In [14]:
!uv run ingest.py
!uv run answer.py

There are 265 vectors with 1,536 dimensions in the vector store
Ingestion complete


## Now check out answer.py

In [15]:
from answer import fetch_context, answer_question

fetch_context("Who is Avery?")

[Document(id='1bade4b3-d906-479f-bc78-189aeba5eab0', metadata={'source': 'knowledge-base/employees/Avery Lancaster.md', 'doc_type': 'employees'}, page_content='Avery Lancaster has demonstrated resilience and adaptability throughout her career at Insurellm, positioning the company as a key player in the insurance technology landscape.'),
 Document(id='9adef073-107a-4a3f-a9ef-9fe4e16bfa32', metadata={'doc_type': 'employees', 'source': 'knowledge-base/employees/Avery Lancaster.md'}, page_content="# Avery Lancaster\n\n## Summary\n- **Date of Birth**: March 15, 1985\n- **Job Title**: Co-Founder & Chief Executive Officer (CEO)\n- **Location**: San Francisco, California\n- **Current Salary**: $225,000  \n\n## Insurellm Career Progression\n- **2015 - Present**: Co-Founder & CEO  \n  Avery Lancaster co-founded Insurellm in 2015 and has since guided the company to its current position as a leading Insurance Tech provider. Avery is known for her innovative leadership strategies and risk managemen

In [16]:
result, chunks = await answer_question("Who is Avery?")
display(Markdown(result))

Avery Lancaster is the Co-Founder and Chief Executive Officer (CEO) of Insurellm. She has been with the company since its founding in 2015 and has played a key role in establishing Insurellm as a leading player in the insurance technology industry. Avery is known for her innovative leadership, risk management expertise, and her efforts to drive the company's growth and market presence.

## Now check out app.py

As long as you keep the same 2 functions in `answer.py`, this UI will keep working!!

In [None]:
!uv run app.py

* Running on local URL:  http://127.0.0.1:7860
* To create a public link, set `share=True` in `launch()`.


## OK - Now it's time to EVALUATE!

### First check out tests.jsonl for all the questions

And see how it's loaded in test.py


In [None]:
from test import load_tests

test_data = load_tests()

print(len(test_data))
print(test_data[0])
print(test_data[10])



In [None]:
print(set(test.category for test in test_data))


## Now take a look at eval.py

test_data[0] is a very hard question that the pipeline sometimes gets wrong  
test_data[1] is an easy question

In [None]:
from eval import evaluate_retrieval, evaluate_answer

evaluate_retrieval(test_data[1])

In [None]:
await evaluate_answer(test_data[0])

## AND FINALLY - all come together in a UI

In [None]:
!uv run evaluator.py

## Ideas for your experiments

### Quick wins

- Experiment with the encoder
- Experiment with chunking strategies

### Big change ideas

1. Pre-processing - use an LLM to rewrite (a) the chunks and/or (b) the questions / conversation history
2. Hierarchical RAG - summarize at different levels and do RAG over summaries
3. Tools!

# 10 RAG Techniques

1. **Chunking R&D:** experiment with chunking strategy to optimize for your commercial goal
2. **Encoder R&D:** select the best Encoder model based on a test set
3. **Improve Prompts:** general content, the current date, relevant context and history
4. **Document pre-processing:** use an LLM to make the chunks and/or text for encoding
5. **Query rewriting:** use an LLM to convert the user’s question to a RAG query
6. **Query expansion:** use an LLM to turn the question into multiple RAG queries
7. **Re-ranking:** use an LLM to sub-select from RAG results
8. **Hierarchical:** use an LLM to summarize at multiple levels
9. **Graph RAG:** retrieve content closely related to similar documents
10. **Agentic RAG:** use Agents for retrieval, combining with Memory and Tools such as SQL
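
As a rough illustration of how techniques 6 and 7 fit together, here is a sketch that expands a question into several queries, merges the retrieved chunks without duplicates, and then re-ranks them. Everything here is a stand-in: in practice `expand` and `rerank` would call an LLM and `retrieve` would query the vector store. The `fake_*` functions and `CORPUS` are hypothetical and exist only so the example runs on its own.

```python
import re
from typing import Callable, List

def expand_and_retrieve(
    question: str,
    expand: Callable[[str], List[str]],
    retrieve: Callable[[str], List[str]],
    rerank: Callable[[str, List[str]], List[str]],
    top_n: int = 3,
) -> List[str]:
    """Query expansion, then de-duplicated retrieval, then re-ranking."""
    queries = [question] + expand(question)
    seen = set()
    candidates = []
    for query in queries:
        for chunk in retrieve(query):
            if chunk not in seen:  # keep each chunk once, in first-seen order
                seen.add(chunk)
                candidates.append(chunk)
    return rerank(question, candidates)[:top_n]

# --- Stand-ins so the sketch is runnable without an LLM or vector store ---
CORPUS = [
    "Avery Lancaster is the CEO.",
    "Rellm is a reinsurance product.",
    "Insurellm was founded in 2015.",
]

def tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def fake_expand(question):
    # An LLM would propose rephrasings; these are hard-coded
    return ["Avery Lancaster role", "Insurellm CEO"]

def fake_retrieve(query):
    # A vector store would return nearest chunks; here, any chunk sharing a word
    return [c for c in CORPUS if tokens(query) & tokens(c)]

def fake_rerank(question, chunks):
    # An LLM would judge relevance; here, most words shared with the question first
    return sorted(chunks, key=lambda c: len(tokens(question) & tokens(c)), reverse=True)

print(expand_and_retrieve("Who is Avery?", fake_expand, fake_retrieve, fake_rerank, top_n=2))
```

Passing the three steps in as functions makes it easy to swap one piece at a time and re-run the evaluation to see whether it helped.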


Two hard questions that the techniques above can address:

- Who won the IIOTY award in 2023?

- What proportion of employees have a salary over $90,000?
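
The salary question is hard for plain retrieval because no handful of chunks covers every employee; it needs a calculation over all of them. One possible direction, sketched below, is a small tool that parses the salary line from each employee profile and computes the proportion. The regex assumes every profile has a line shaped like `**Current Salary**: $225,000`, as in the Avery Lancaster chunk shown earlier. The function names and the three "Example Employee" profiles are hypothetical.

```python
import re

# Matches lines such as "- **Current Salary**: $225,000"
SALARY_PATTERN = re.compile(r"\*\*Current Salary\*\*:\s*\$([\d,]+)")

def extract_salary(markdown_text):
    """Return the current salary as an int, or None if no salary line is found."""
    match = SALARY_PATTERN.search(markdown_text)
    return int(match.group(1).replace(",", "")) if match else None

def proportion_over(profiles, threshold=90_000):
    """Fraction of profiles with a parseable salary strictly above the threshold."""
    salaries = [s for s in map(extract_salary, profiles) if s is not None]
    if not salaries:
        return 0.0
    return sum(s > threshold for s in salaries) / len(salaries)

# In the lab these strings would be read from the employee markdown files;
# the entries below (apart from the first salary line) are made-up samples.
profiles = [
    "# Avery Lancaster\n- **Current Salary**: $225,000  \n",
    "# Example Employee A\n- **Current Salary**: $85,000  \n",
    "# Example Employee B\n- **Current Salary**: $120,000  \n",
    "# Example Employee C\n- **Current Salary**: $70,000  \n",
]

print(f"{proportion_over(profiles):.0%} of sample employees earn over $90,000")
```

A function like this could be offered to the model as a tool, so the LLM decides when a question needs a computation over the whole knowledge base instead of a lookup.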

