## RAG Day 3

### Expert Question Answerer for InsureLLM

LangChain 1.0 implementation of a RAG pipeline.

This notebook uses the vector store we created last time (built with the HuggingFace `all-MiniLM-L6-v2` embedding model).

In [2]:
! pip install langchain_ollama

Collecting langchain_ollama
  Obtaining dependency information for langchain_ollama from https://files.pythonhosted.org/packages/91/08/7be292aee722692b13a93316247b57eefb83d4309f5fdfe636cc47786efe/langchain_ollama-1.0.0-py3-none-any.whl.metadata
  Downloading langchain_ollama-1.0.0-py3-none-any.whl.metadata (2.1 kB)
Collecting ollama<1.0.0,>=0.6.0 (from langchain_ollama)
  Obtaining dependency information for ollama<1.0.0,>=0.6.0 from https://files.pythonhosted.org/packages/47/4f/4a617ee93d8208d2bcf26b2d8b9402ceaed03e3853c754940e2290fed063/ollama-0.6.1-py3-none-any.whl.metadata
  Downloading ollama-0.6.1-py3-none-any.whl.metadata (4.3 kB)
Downloading langchain_ollama-1.0.0-py3-none-any.whl (29 kB)
Downloading ollama-0.6.1-py3-none-any.whl (14 kB)
Installing collected packages: ollama, langchain_ollama
Successfully installed langchain_ollama-1.0.0 ollama-0.6.1

In [5]:
from dotenv import load_dotenv
# ChatOpenAI is LangChain's chat model wrapper for OpenAI models
from langchain_openai import ChatOpenAI
from langchain_ollama import ChatOllama


from langchain_chroma import Chroma
from langchain_core.messages import SystemMessage, HumanMessage
from langchain_huggingface import HuggingFaceEmbeddings
import gradio as gr

ModuleNotFoundError: No module named 'langchain_ollama'

In [6]:
MODEL = "gpt-4.1-nano"
DB_NAME = "vector_db"
load_dotenv(override=True)

True

### Connect to Chroma; use Hugging Face all-MiniLM-L6-v2

In [7]:
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = Chroma(persist_directory=DB_NAME, embedding_function=embeddings)

NameError: name 'HuggingFaceEmbeddings' is not defined

### Set up the 2 key LangChain objects: retriever and llm

#### A sidebar on "temperature":
- Controls how diverse the output is
- A temperature of 0 means that the output should be predictable
- Higher temperature for more variety in answers

Some people describe temperature as 'creativity', but that's not quite right:
- It actually controls which tokens get selected during inference
- temperature=0 means: always select the token with highest probability
- temperature=1 usually means: a token with 10% probability should be picked 10% of the time

Note: a temperature of 0 doesn't mean outputs will always be reproducible. You also need to set a random seed. We will do that in weeks 6-8. (Even then, it's not always reproducible.)

Note 2: if you want creativity, use the System Prompt!
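To make the sidebar concrete, here is a minimal sketch of temperature-scaled sampling over a toy three-token vocabulary. The logits, the function names `softmax_with_temperature` and `sample_token`, and the choice to treat temperature 0 as a plain argmax are all assumptions for illustration; real inference stacks differ in detail.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, dividing by temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; temperature 0 is handled as argmax")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature, rng):
    """Pick a token index: greedy at temperature 0, weighted random otherwise."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical logits chosen so that temperature=1 gives 70% / 20% / 10%
logits = [math.log(0.7), math.log(0.2), math.log(0.1)]

probs_t1 = softmax_with_temperature(logits, 1.0)   # the model's own distribution
probs_t05 = softmax_with_temperature(logits, 0.5)  # sharper: the top token dominates
probs_t2 = softmax_with_temperature(logits, 2.0)   # flatter: more variety

greedy_choice = sample_token(logits, 0, random.Random(0))
```

At temperature 1 the 10% token keeps its 10% chance, lower temperatures push probability toward the top token, and higher temperatures spread it out.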

In [None]:
retriever = vectorstore.as_retriever()
# temperature controls the randomness of the model output: 0 always picks the most likely token, higher values add variety
llm = ChatOpenAI(temperature=0, model_name=MODEL)

### These LangChain objects implement the method `invoke()`

In [None]:
retriever.invoke("Who is Avery?")

In [None]:
llm.invoke("Who is Avery?")

## Time to put this together!

In [None]:
SYSTEM_PROMPT_TEMPLATE = """
You are a knowledgeable, friendly assistant representing the company Insurellm.
You are chatting with a user about Insurellm.
If relevant, use the given context to answer any question.
If you don't know the answer, say so.
Context:
{context}
"""

In [None]:
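def answer_question(question: str, history):
    # history is unused here, but gr.ChatInterface passes it as the second argument
    # 1. Retrieve the chunks most relevant to the question
    docs = retriever.invoke(question)
    # 2. Join the chunk texts into one context string and fill the system prompt
    context = "\n\n".join(doc.page_content for doc in docs)
    system_prompt = SYSTEM_PROMPT_TEMPLATE.format(context=context)
    # 3. Send the system prompt plus the user's question to the LLM
    response = llm.invoke([SystemMessage(content=system_prompt), HumanMessage(content=question)])
    return response.content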
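The retrieve-then-prompt flow can also be tried without a vector store or an API key. The sketch below is a toy stand-in, not the notebook's pipeline: `Doc`, `toy_retrieve`, `build_system_prompt`, the shortened template, and the sample texts are all hypothetical, and retrieval here ranks by shared words rather than by embedding similarity.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document chunk."""
    page_content: str


TEMPLATE = """Use the given context to answer the question.
Context:
{context}
"""


def toy_retrieve(question, docs, k=2):
    """Rank docs by how many words they share with the question (not embeddings)."""
    q_words = set(question.lower().replace("?", "").split())
    ranked = sorted(
        docs,
        key=lambda d: len(q_words & set(d.page_content.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_system_prompt(question, docs):
    """Retrieve the top chunks and splice them into the prompt template."""
    retrieved = toy_retrieve(question, docs)
    context = "\n\n".join(d.page_content for d in retrieved)
    return TEMPLATE.format(context=context)


# Made-up placeholder documents, purely for illustration
docs = [
    Doc("alpha policy covers cars"),
    Doc("beta policy covers homes"),
    Doc("office hours are nine to five"),
]

prompt = build_system_prompt("What does the alpha policy cover?", docs)
print(prompt)
```

Only the two best-matching chunks reach the prompt, which is the same shape as `answer_question`: retrieve, join, format, then hand the result to the model.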

In [None]:
answer_question("Who is Averi Lancaster?", [])

## What could possibly come next? 😂

In [None]:
gr.ChatInterface(answer_question).launch()

## Admit it - you thought RAG would be more complicated than that!!