## RAG Day 3

### Expert Question Answerer for InsureLLM

LangChain 1.0 implementation of a RAG pipeline.

This uses the vector store we created last time (with the Hugging Face `all-MiniLM-L6-v2` embedding model).

In [8]:
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI

# from langchain_ollama import ChatOllama
# from langchain_google_genai import ChatGoogleGenerativeAI
# from langchain_anthropic import ChatAnthropic

from langchain_chroma import Chroma
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_huggingface import HuggingFaceEmbeddings
import gradio as gr

In [9]:
MODEL = "gpt-4.1-nano"
DB_NAME = "vector_db"
load_dotenv(override=True)

True

### Connect to Chroma; use Hugging Face all-MiniLM-L6-v2

In [10]:
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Chroma(persist_directory=DB_NAME, embedding_function=embeddings)

### Set up the 2 key LangChain objects: retriever and llm

#### A sidebar on "temperature":
- Controls how diverse the output is
- A temperature of 0 means that the output should be predictable
- Higher temperature for more variety in answers

Some people describe temperature as being like 'creativity', but that's not quite right:
- It actually controls which tokens get selected during inference
- temperature=0 means: always select the token with the highest probability
- temperature=1 usually means: tokens are sampled in proportion to their predicted probability, so a token with 10% probability should be picked about 10% of the time

Note: a temperature of 0 doesn't mean outputs will always be reproducible. You also need to set a random seed. We will do that in weeks 6-8. (Even then, it's not always reproducible.)

Note 2: if you want creativity, use the System Prompt!
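
To make the sidebar concrete, here is a minimal sketch of how temperature reshapes a next-token distribution. It is illustrative only: the function name `apply_temperature` and the three probabilities are made up, and real inference stacks apply the scaling to logits inside the model server rather than in user code.

```python
import math

def apply_temperature(probs, temperature):
    """Rescale a next-token probability distribution by a sampling temperature.

    Assumes every probability is strictly positive (we take logarithms).
    """
    if temperature == 0:
        # Greedy decoding: all probability mass goes to the most likely token.
        best = max(range(len(probs)), key=lambda i: probs[i])
        return [1.0 if i == best else 0.0 for i in range(len(probs))]
    # Dividing log-probabilities by T has the same effect as dividing logits by T.
    scaled = [math.log(p) / temperature for p in probs]
    peak = max(scaled)  # subtract the max before exponentiating, for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

token_probs = [0.6, 0.3, 0.1]  # made-up probabilities for three candidate tokens
for t in (0, 0.5, 1, 2):
    print(t, [round(p, 3) for p in apply_temperature(token_probs, t)])
```

At temperature 1 the distribution is unchanged, below 1 it sharpens towards the top token, and above 1 it flattens so that unlikely tokens are picked more often.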

In [11]:
retriever = vectorstore.as_retriever()
llm = ChatOpenAI(temperature=0, model_name=MODEL)

### These LangChain objects implement the method `invoke()`

In [12]:
retriever.invoke("Who is Avery?")

[Document(id='180616b2-6ac5-4623-9965-7d56d09be6e8', metadata={'Header 1': 'Avery Lancaster', 'Header 2': 'Other HR Notes', 'doc_type': 'employees', 'source': 'knowledge-base/employees/Avery Lancaster.md'}, page_content='## Other HR Notes\n- **Professional Development**: Avery has actively participated in leadership training programs and industry conferences, representing Insurellm and fostering partnerships.\n- **Diversity & Inclusion Initiatives**: Avery has championed a commitment to diversity in hiring practices, seeing visible improvements in team representation since 2021.\n- **Work-Life Balance**: Feedback revealed concerns regarding work-life balance, which Avery has approached by implementing flexible working conditions and ensuring regular check-ins with the team.'),
 Document(id='5ac7bd80-507f-49b8-96eb-856eaadf0d24', metadata={'Header 2': 'Insurellm Career Progression', 'source': 'knowledge-base/employees/Avery Lancaster.md', 'doc_type': 'employees', 'Header 1': 'Avery Lanc
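
Conceptually, the retriever call above embeds the question and returns the stored chunks whose vectors are closest to it. The sketch below shows that idea with toy 3-dimensional vectors and cosine similarity. Everything in it (the labels, the vectors, the `top_k` helper) is hypothetical, and the distance metric Chroma actually uses depends on how the collection was configured.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vector, chunks, k=2):
    """Return the k chunks whose vectors are most similar to the query vector."""
    ranked = sorted(
        chunks,
        key=lambda chunk: cosine_similarity(query_vector, chunk["vector"]),
        reverse=True,
    )
    return ranked[:k]

# Toy "vector store": in the real pipeline these vectors come from the embedding model.
toy_chunks = [
    {"label": "employees", "vector": [0.9, 0.1, 0.0]},
    {"label": "products", "vector": [0.1, 0.9, 0.1]},
    {"label": "contracts", "vector": [0.0, 0.2, 0.9]},
]
toy_query = [0.8, 0.2, 0.1]  # pretend embedding of the user's question

for chunk in top_k(toy_query, toy_chunks):
    print(chunk["label"])
```

In LangChain, the number of chunks returned can be adjusted when creating the retriever, for example `vectorstore.as_retriever(search_kwargs={"k": 10})`.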

In [14]:
print(llm.invoke("Who is Avery?"))

content='Could you please provide more context or specify which Avery you are referring to? There are many individuals and characters named Avery.' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 24, 'prompt_tokens': 11, 'total_tokens': 35, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_provider': 'openai', 'model_name': 'gpt-4.1-nano-2025-04-14', 'system_fingerprint': 'fp_7f8eb7d1f9', 'id': 'chatcmpl-CxyHHqIvFB1g9MSi86OQc3UtJstNo', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None} id='lc_run--019bbd65-56f7-7c71-aa49-1b96a00ee681-0' tool_calls=[] invalid_tool_calls=[] usage_metadata={'input_tokens': 11, 'output_tokens': 24, 'total_tokens': 35, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}


## Time to put this together!

In [15]:
SYSTEM_PROMPT_TEMPLATE = """
You are a knowledgeable, friendly assistant representing the company Insurellm.
You are chatting with a user about Insurellm.
If relevant, use the given context to answer any question.
If you don't know the answer, say so.
Context:
{context}
"""

In [16]:
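def answer_question(question: str, history):
    # Gradio's ChatInterface passes the chat history as the second argument; it is not used here.
    docs = retriever.invoke(question)  # retrieve the chunks most relevant to the question
    context = "\n\n".join(doc.page_content for doc in docs)  # one blank line between chunks
    system_prompt = SYSTEM_PROMPT_TEMPLATE.format(context=context)
    response = llm.invoke([SystemMessage(content=system_prompt), HumanMessage(content=question)])
    return response.content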
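
To see exactly what the model receives, it can help to run the prompt-assembly step on its own. The sketch below repeats that step from the function above using a hypothetical `FakeDocument` stand-in (only `page_content` is modelled) and a shortened template, so it runs without a vector store or an API key.

```python
from dataclasses import dataclass

@dataclass
class FakeDocument:
    """Hypothetical stand-in for a retrieved document; only page_content is modelled."""
    page_content: str

def build_system_prompt(template, docs):
    """Join the retrieved chunks with blank lines and insert them into the template."""
    context = "\n\n".join(doc.page_content for doc in docs)
    return template.format(context=context)

# Shortened, made-up template and chunks for illustration.
demo_template = "Use the given context if relevant.\nContext:\n{context}\n"
demo_docs = [
    FakeDocument("First retrieved chunk."),
    FakeDocument("Second retrieved chunk."),
]

demo_prompt = build_system_prompt(demo_template, demo_docs)
print(demo_prompt)
```

Printing the assembled prompt like this is a quick way to check whether a poor answer comes from retrieval (the right chunk never made it into the context) or from the model itself.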

In [None]:
# the_answer = answer_question("Who is Averi Lancaster?", [])
the_answer = answer_question("What products do you offer?", [])

print(the_answer)


At Insurellm, we offer a range of products designed to support insurance agencies in various aspects of their business. These include:

- Competitive benchmarking to help you understand your market position
- Consumer behavior insights to better tailor your offerings
- Product mix recommendations to optimize your portfolio
- Reputation management tools, including review collection, response management, review showcase, and negative review mediation support
- Marketing support with co-branded materials, email campaigns, social media suggestions, seasonal campaigns, and referral program integration

If you'd like more details about any specific product or service, feel free to ask!


## What could possibly come next? 😂

In [20]:
gr.ChatInterface(answer_question).launch()


* Running on local URL:  http://127.0.0.1:7863
* To create a public link, set `share=True` in `launch()`.




## Admit it - you thought RAG would be more complicated than that!!