# Augmented Research Assistant 🤖 🚀

You've built a great assistant that uses RAG to provide answers. Now let's combine what we've learned and implement memory inside that bot 🧠

<Note type="note">

In this exercise, you will learn a few more advanced concepts, especially regarding chaining. Read carefully and be patient if you want to finish this exercise 😌

</Note>


## Step 0 - Demo setup 

First, make sure you have:

* Your containers running
* The packages below installed

In [1]:
# install package
%pip install -Uqq langchain-weaviate
%pip install langchain langchain_mistralai langchain_huggingface -q
%pip install -qU langchain-community beautifulsoup4
%pip install -qU weaviate-client
%pip install sentence-transformers -q 
%pip install transformers -q

Note: you may need to restart the kernel to use updated packages.


## Step I - Connect to your database 

Now let's connect to your database.

<Note type="important">

If for some reason you deleted your Weaviate container, you will need to recreate one and repopulate your DB as everything would have been lost.

Check out the previous exercises if you need to do so. 

</Note>

1. Connect to your database

In [5]:
from dotenv import load_dotenv

load_dotenv()

True

In [2]:
from langchain_weaviate.vectorstores import WeaviateVectorStore
from langchain_huggingface import HuggingFaceEmbeddings  # installed above; the langchain.embeddings import path is deprecated
import weaviate
from weaviate.classes.init import Auth
import os

weaviate_url = os.environ["WEAVIATE_URL"]
weaviate_api_key = os.environ["WEAVIATE_API_KEY"]

# Connect to Weaviate Cloud
client = weaviate.connect_to_weaviate_cloud(
    cluster_url=weaviate_url,
    auth_credentials=Auth.api_key(weaviate_api_key),
)

# client = weaviate.connect_to_local(
#     # host="host.docker.internal",  # Use host.docker.internal if you are running it inside a docker container
#     port=8080,
#     grpc_port=50051,
# )
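Note that `os.environ["WEAVIATE_URL"]` raises a `KeyError` when the variable is not set. A small check before connecting can give a clearer message. This is only a sketch: `missing_env_vars` is a hypothetical helper, and the URL in the example is a placeholder.

```python
import os


# Hypothetical helper: list the required variables that are unset or empty.
def missing_env_vars(names, env=None):
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]


required = ["WEAVIATE_URL", "WEAVIATE_API_KEY"]

# Example with an explicit dict instead of the real environment.
example_env = {"WEAVIATE_URL": "https://example.weaviate.cloud"}
missing = missing_env_vars(required, env=example_env)
print(missing)
```

Calling `missing_env_vars(required)` without `env` checks the real environment, which is what you would do right after `load_dotenv()`.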

2. Start by displaying the available collections

In [3]:
client.collections.list_all()

{'LangChain_5bc7e27ecd0747218db36fbf82ce55b8': _CollectionConfigSimple(name='LangChain_5bc7e27ecd0747218db36fbf82ce55b8', description=None, generative_config=None, properties=[_Property(name='text', description=None, data_type=<DataType.TEXT: 'text'>, index_filterable=True, index_range_filters=False, index_searchable=True, nested_properties=None, tokenization=<Tokenization.WORD: 'word'>, vectorizer_config=None, vectorizer='none', vectorizer_configs=None), _Property(name='sources', description="This property was generated by Weaviate's auto-schema feature on Fri May 16 13:54:08 2025", data_type=<DataType.TEXT: 'text'>, index_filterable=True, index_range_filters=False, index_searchable=True, nested_properties=None, tokenization=<Tokenization.WORD: 'word'>, vectorizer_config=None, vectorizer='none', vectorizer_configs=None), _Property(name='chunk_id', description="This property was generated by Weaviate's auto-schema feature on Fri May 16 13:54:08 2025", data_type=<DataType.NUMBER: 'numbe

3. Define a vector store using an embedding and the database.

In [4]:
# Instantiate the embeddings model
embeddings = HuggingFaceEmbeddings()

# Attach a vector store to the existing collection.
# The empty list means no new documents are added here;
# the collection was populated in the previous exercises.
vectorstore = WeaviateVectorStore.from_documents(
    [],
    client=client,
    embedding=embeddings,
    index_name="LangChain_5bc7e27ecd0747218db36fbf82ce55b8",  # collection name, as listed by client.collections.list_all()
    use_multi_tenancy=True
)



## Step II - Load a model & Create a retriever

Now let's:
* Import a model
* Create a `retriever` variable based on `vectorstore`

Once this is done, test your `retriever` by running a simple search.

In [6]:
from langchain_mistralai import ChatMistralAI


# Create LLM
llm = ChatMistralAI(model="mistral-large-latest")


# Build a retriever that returns the 5 most similar chunks from the given tenant
retriever = vectorstore.as_retriever(search_type="similarity", search_kwargs={"k": 5, "tenant": "knowledge_base_llm"})
retriever.invoke("Tell me everything I need to know about LLMs")

[Document(metadata={'chunk_id': 0.0, 'sources': 'Large Language Models (LLMs) - Everything You NEED To Know.m4a'}, page_content="This video is going to give you everything you need to go from knowing absolutely nothing about artificial intelligence and large language models to having a solid foundation of how these revolutionary technologies work. Over the past year, artificial intelligence has completely changed the world, with products like ChatGPT potentially appending every single industry and how people interact with technology in general. And in this video, I will be focusing on LLMs, how they work, ethical considerations, applications, and so much more. And this video was created in collaboration with an incredible program called AI Camp, in which high school students learn all about artificial intelligence. And I'll talk more about that later in the video. Let's go. So first, what is an LLM? Is it different from AI? And how is ChatGPT related to all of this? LLMs stand for larg

## Step III - Understand what we are going to do next 🤔

Alright, let's pause the code for a moment, because you need to think about the next steps when building your LLM app. If we want a bot that has memory and uses RAG, we need to choose between several solutions, each with its own trade-offs.

The main problem with LLMs is the **context window**. If you feed the model more tokens than its context window allows, you will overflow it and your code won't work. This is likely to happen (at least with today's models), especially if you send both a chat history and a retrieved context.

So what do we do? You have several solutions: 

* (the one we will choose) - Build an LLM app in two steps:
    1. We will build a *history-aware retriever* that takes the user question **and** the chat history, **then reformulates the question based on both** before sending it to the retriever. This way, the question used for retrieval combines the earlier user/model exchanges with the latest question

    2. Based on the retrieved context, we'll ask the LLM to provide the final answer 


* Build an LLM app with no summary (the full chat history goes straight into the prompt)
    1. This is easier to code, but it comes with the risk of overflowing the context window
    2. However, your LLM will have access to the full history, which can be useful if your user asks unrelated questions


Again, for this exercise we will choose the first solution, but feel free to experiment with the other one and work around the context window problem 😉
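If you do try the second solution, one way to work around the context window is to keep only the most recent messages that fit a budget. The sketch below is plain Python and makes a strong assumption: tokens are approximated by whitespace-separated words, which a real tokenizer will not match. `estimate_tokens`, `trim_history` and `max_tokens` are hypothetical names.

```python
# Rough sketch: keep only the newest messages that fit a token budget.
# Assumption: tokens are approximated by whitespace-separated words;
# a real tokenizer will give different counts.

def estimate_tokens(text: str) -> int:
    return len(text.split())


def trim_history(history: list[tuple[str, str]], max_tokens: int) -> list[tuple[str, str]]:
    """Return the longest suffix of `history` whose estimated size fits `max_tokens`."""
    kept = []
    used = 0
    # Walk backwards so the most recent messages are kept first.
    for role, content in reversed(history):
        cost = estimate_tokens(content)
        if used + cost > max_tokens:
            break
        kept.append((role, content))
        used += cost
    kept.reverse()
    return kept


history = [
    ("human", "What are LLMs?"),
    ("ai", "LLMs are specific models in Artificial Intelligence"),
    ("human", "How can I train them?"),
]
trimmed = trim_history(history, max_tokens=12)
print(trimmed)
```

With a budget of 12 estimated tokens, the oldest message is dropped and the two most recent ones are kept.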

## Step IV - Build the History aware retriever prompt 

Alright, let's tackle the first step of our solution: building the prompt. Here we want:

* A system prompt that takes a `chat_history` and a user `input` (the question) as parameters
* You should be able to build the chat template using `ChatPromptTemplate`

In [7]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

### Contextualize question ###
contextualize_q_system_prompt = """
    Given a chat history and the latest user question, 
    formulate a standalone question which can be understood 
    without the chat history. Do NOT answer the question, 
    just reformulate it if needed and otherwise return it as is.

    Here is the chat history:
    {chat_history}

    Here is the user question:
    {input}
"""
contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

## Step V - Build the history retriever chain 

Now the hard part: **Build the chain**. To do so you can: 

* First think about a simple chain that simply passes the prompt to the retriever 
* Once this is done, you will need to think about a specific case: **What happens when there is no chat_history?**
    * Even without seeing your code, it is very likely that it will throw an error
    * To fix that, you will need some kind of conditional chain that simply calls the retriever without taking the chat history into account
    * To do so, I definitely advise you to look at [`RunnableBranch`](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.branch.RunnableBranch.html#langchain_core.runnables.branch.RunnableBranch) which is great for that 😉
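The conditional logic described above can be sketched in plain Python before wiring it with LangChain. This is only an illustration of the branching idea: `fake_reformulate` and `fake_retriever` are hypothetical stand-ins for the LLM reformulation chain and the vector store retriever.

```python
# Plain-Python sketch of the branching idea (no LangChain involved).
# `fake_reformulate` and `fake_retriever` are hypothetical stand-ins
# for the LLM reformulation chain and the vector store retriever.

def fake_reformulate(inputs: dict) -> str:
    # A real chain would ask the LLM; here we just append the last message.
    last_message = inputs["chat_history"][-1]
    return f"{inputs['input']} (context: {last_message})"


def fake_retriever(query: str) -> list[str]:
    return [f"document matching: {query}"]


def history_aware_retrieve(inputs: dict) -> list[str]:
    # An empty list, an empty string and a missing key are all falsy.
    if not inputs.get("chat_history"):
        return fake_retriever(inputs["input"])
    return fake_retriever(fake_reformulate(inputs))


no_history = history_aware_retrieve({"input": "What are LLMs?"})
with_history = history_aware_retrieve(
    {"input": "How do I train them?", "chat_history": ["LLMs are language models"]}
)
print(no_history)
print(with_history)
```

`RunnableBranch` plays the role of the `if`/`else` here: a list of (condition, runnable) pairs followed by a default runnable.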

In [8]:
from langchain_core.output_parsers import StrOutputParser
from langchain_core.messages import HumanMessage, AIMessage
from langchain_core.runnables import RunnableBranch


history_aware_retriever = RunnableBranch(
    (
        # Both empty string and empty list evaluate to False
        lambda x: not x.get("chat_history", False),
        # If no chat history, then we just pass input to retriever
        (lambda x: x["input"]) | retriever,
    ),
    # If chat history, then we pass inputs to LLM chain, then to retriever
    contextualize_q_prompt | llm | StrOutputParser() | retriever,
).with_config(run_name="chat_retriever_chain")

resp = history_aware_retriever.invoke({
    "input": "What should I learn first If I want to build my own?",
    "chat_history": [HumanMessage("What are LLMs?"), AIMessage("LLMs are specific models in Artificial Intelligence")]
})
for i,doc in enumerate(resp):
    print(f"### DOC {i}\n")
    print(doc.page_content)

### DOC 0

So, let's get started, so I'll be talking about building LLMs today, so I think a lot of you have heard of LLMs before, but just as a quick recap, LLMs, standing for Large Language Models, are basically all the chatbots that you've been hearing about recently, so ChatGPT from OpenAI, Claude from Entropiq, Gemini, and Lama, and other type of models like this, and today we'll be talking about how do they actually work, so it's going to be an overview because it's only one lecture, and it's hard to compress everything, but hopefully I'll touch a little bit about all the components that are needed to train some of these LLMs. Also, if you have questions, please interrupt me and ask. If you have a question, most likely other people in the room or on Zoom have the same question, so please ask. Great, so what matters when training LLMs? So, there are a few key components that matter. One is the architecture, so as you probably all know, LLMs are neural networks, and when you think 

## Step VI - Create the question-answer prompt 

You finished the first step of the LLM app! 👏 Now let's tackle the next (and final) one! 

First, let's create a new system prompt that simply tells our LLM to answer a question based on a given context.

In [9]:
### Answer question ###
system_prompt = """
    You are an assistant for question-answering tasks. 
    Use the following pieces of retrieved context to answer 
    the question. If you don't know the answer, say that you 
    don't know. Use three sentences maximum and keep the 
    answer concise.

    Here is the user question:
    {input}

    Here is the context to help you answer:
    {context}
"""
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt)
    ]
)

## Step VII - Create the question-answer chain 

Alright, this is the hardest part: **the chain**. Let's review what we need:

* The chain should take a `context` and an `input`
* Both variables should be passed through the whole chain, as we will need them in the output of our chain
* The chain should output a dictionary with:
    * `context`
    * `input`
    * `answer`

Now some hints 😘

* If you want to pass variables through the chain you will need to use [`RunnablePassthrough`](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.passthrough.RunnablePassthrough.html#langchain_core.runnables.passthrough.RunnablePassthrough)
    * Also use the `.assign()` method to provide specific keys

* Don't forget that you will also need the `format_docs` function, and you will most likely need to wrap it in a [`RunnableLambda`](https://python.langchain.com/api_reference/core/runnables/langchain_core.runnables.base.RunnableLambda.html#langchain_core.runnables.base.RunnableLambda) to make it chainable

* The question-answer chain must contain the history-aware retriever chain to work! (so part of your Q&A chain should be the history-aware retriever chain)

This is the hardest part of this exercise, but with the hints above you should be able to do it. Take some time to read the documentation carefully to really understand the concepts of chaining in LangChain
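To build an intuition for `.assign()`, here is a plain-Python sketch of the idea: keep every key of the incoming dictionary and add new keys computed from it. The `assign` helper below is hypothetical and much simpler than the LangChain implementation; it only illustrates how `input`, `context` and `answer` can all end up in the final output.

```python
# Plain-Python sketch of the idea behind RunnablePassthrough.assign:
# keep every key of the incoming dict and add new keys computed from it.
# `assign` is a hypothetical helper, not the LangChain implementation.

def assign(**steps):
    def run(inputs: dict) -> dict:
        outputs = dict(inputs)  # pass the original keys through
        for key, step in steps.items():
            outputs[key] = step(inputs)  # each step sees the same input dict
        return outputs
    return run


add_context = assign(context=lambda x: f"docs about {x['input']}")
add_answer = assign(answer=lambda x: f"answer using {x['context']}")

# Chaining the two steps: the second one can read the key added by the first.
result = add_answer(add_context({"input": "LLMs"}))
print(result)
```

The second step can use `context` only because the first step added it to the dictionary, which is why the order of the `.assign()` calls matters.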

In [11]:
from langchain_core.runnables import RunnableLambda, RunnablePassthrough


def format_docs(docs):
    return "\n\n".join(doc.page_content for doc in docs)


# This is the most complicated part, as you have most likely never seen it before.
# Basically, we wrap the whole chain using RunnablePassthrough so that
# the data from the beginning of the chain is passed through
# to the end of the chain.
# Each parameter of the .assign() methods becomes a key of the dictionary
# that will be output at the end of the chain
question_answer_chain = RunnablePassthrough.assign(
    context=history_aware_retriever | RunnableLambda(format_docs),
    input=lambda x: x["input"]
).assign(answer=qa_prompt | llm | StrOutputParser())


resp = question_answer_chain.invoke({
    "input": "What should I learn first If I want to build my own?",
    "chat_history": [HumanMessage("What are LLMs?"), AIMessage("LLMs are specific models in Artificial Intelligence")]
})

for key, value in resp.items():
    print(f"KEY:{key}")
    print(f"VALUE:{value}")

KEY:input
VALUE:What should I learn first If I want to build my own?
KEY:chat_history
VALUE:[HumanMessage(content='What are LLMs?', additional_kwargs={}, response_metadata={}), AIMessage(content='LLMs are specific models in Artificial Intelligence', additional_kwargs={}, response_metadata={})]
KEY:context
VALUE:So, let's get started, so I'll be talking about building LLMs today, so I think a lot of you have heard of LLMs before, but just as a quick recap, LLMs, standing for Large Language Models, are basically all the chatbots that you've been hearing about recently, so ChatGPT from OpenAI, Claude from Entropiq, Gemini, and Lama, and other type of models like this, and today we'll be talking about how do they actually work, so it's going to be an overview because it's only one lecture, and it's hard to compress everything, but hopefully I'll touch a little bit about all the components that are needed to train some of these LLMs. Also, if you have questions, please interrupt me and ask.

## Step VIII - Create the graph

Alright you've done the hardest part! Now all we have to do is to incorporate the final chain in a LangGraph! The only thing that is new is that we want to have custom values to be monitored (not just the messages history). Therefore you will need to build **a custom state**. To do so:

* You will need to create a new `State` class that inherits from `TypedDict`
    * This will allow you to output all parameters as dictionary keys

* This `State` should have the following attributes:
    * `input` (as string)
    * `chat_history`, which should be a `Sequence` of `BaseMessage`
        * Also use the prebuilt `add_messages` function from `langgraph.graph.message`
    * `context` (as string)
    * `answer` (as string)

The rest of the code should be the same as what you've learned so far about LangGraph
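Why annotate `chat_history` with a reducer? A key with a reducer is combined with its previous value, while a key without one is simply overwritten. The sketch below shows that difference in plain Python. `append_messages` and `apply_update` are simplified, hypothetical stand-ins (the real `add_messages` also handles message IDs, for example), not LangGraph's implementation.

```python
# Plain-Python sketch of how a reducer changes state updates.
# `append_messages` is a simplified, hypothetical stand-in for `add_messages`.

def append_messages(current: list, new: list) -> list:
    return list(current) + list(new)


REDUCERS = {"chat_history": append_messages}


def apply_update(state: dict, update: dict) -> dict:
    merged = dict(state)
    for key, value in update.items():
        if key in REDUCERS:
            # Keys with a reducer are combined with the previous value.
            merged[key] = REDUCERS[key](state.get(key, []), value)
        else:
            # Keys without a reducer are simply overwritten.
            merged[key] = value
    return merged


state = {"input": "Hi", "chat_history": ["Hi", "Hello!"], "answer": "Hello!"}
state = apply_update(state, {"chat_history": ["Bye", "See you"], "answer": "See you"})
print(state)
```

This is why the node function only needs to return the two new messages of the current turn instead of the whole history.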

In [12]:
from typing import Sequence
from langchain_core.messages import HumanMessage, AIMessage, BaseMessage
from langgraph.checkpoint.memory import MemorySaver
from langgraph.graph import START, StateGraph
from langgraph.graph.message import add_messages
from typing_extensions import Annotated, TypedDict

# LangGraph models a workflow as a graph.
# The first thing you need to do is instantiate that graph using StateGraph.
# StateGraph needs a schema describing the data it handles; this data is called the state.
# A State holds the data stored at a given moment in your graph, together with functions (called "reducers")
# whose purpose is to update the State.
# Here we define a custom State instead of the prebuilt MessagesState, so that we can also track input, context and answer

### Statefully manage chat history ###
class State(TypedDict):
    input: str
    chat_history: Annotated[Sequence[BaseMessage], add_messages]
    context: str
    answer: str


def call_model(state: State):
    print(state)
    response = question_answer_chain.invoke(state)
    return {
        "chat_history": [
            HumanMessage(state["input"]),
            AIMessage(response["answer"]),
        ],
        "context": response["context"],
        "answer": response["answer"],
    }


workflow = StateGraph(state_schema=State)
workflow.add_edge(START, "model")
workflow.add_node("model", call_model)

memory = MemorySaver()
app = workflow.compile(checkpointer=memory)

## Step IX - Test your app 

Now it's time to test your application! Try it 🤗

In [13]:
# Define the configuration for the workflow
config = {"configurable": {"thread_id": "test_thread"}}

query = "Hi my name is Obi-Wan, what is the purpose of an LLM?"

# Run the workflow
output = app.invoke({"input": query}, config)

# Display the result
output

{'input': 'Hi my name is Obi-Wan, what is the purpose of an LLM?', 'chat_history': []}


{'input': 'Hi my name is Obi-Wan, what is the purpose of an LLM?',
 'chat_history': [HumanMessage(content='Hi my name is Obi-Wan, what is the purpose of an LLM?', additional_kwargs={}, response_metadata={}, id='df4b4ce7-3a69-4865-b825-902c8c2a9c06'),
  AIMessage(content='The purpose of a Large Language Model (LLM) is to understand and generate human language based on patterns it has learned from vast amounts of text data. LLMs are a type of neural network designed to handle a wide range of tasks, including summarization, text generation, creative writing, question-answering, programming, and more. They are trained on massive datasets from various sources like web pages, books, and transcripts, allowing them to learn and adapt to different contexts and tasks.', additional_kwargs={}, response_metadata={}, id='9f050a0f-10d9-4255-8424-207ec27a7305')],
 'context': "This video is going to give you everything you need to go from knowing absolutely nothing about artificial intelligence and lar

## Step X - Limits of your app 

Now your application works! 👏 Though this solution is great, it has some drawbacks. For example, try inserting information unrelated to the topic in your prompt, then test whether your LLM remembers it:

In [14]:
from langchain_core.messages import HumanMessage

# Define the configuration for the workflow
config = {"configurable": {"thread_id": "test_thread"}}

query = "Hi my name is Obi-Wan, what is the purpose of an LLM?"

# Run the workflow
q1 = app.invoke({"input": query}, config)
q2 = app.invoke({"input": "How can I train them?"}, config)
q3 = app.invoke({"input": "What is my name?"}, config)


# pretty_print() prints the message itself and returns None, so no print() is needed
q1["chat_history"][-1].pretty_print()
q2["chat_history"][-1].pretty_print()
q3["chat_history"][-1].pretty_print()

{'input': 'Hi my name is Obi-Wan, what is the purpose of an LLM?', 'chat_history': [HumanMessage(content='Hi my name is Obi-Wan, what is the purpose of an LLM?', additional_kwargs={}, response_metadata={}, id='df4b4ce7-3a69-4865-b825-902c8c2a9c06'), AIMessage(content='The purpose of a Large Language Model (LLM) is to understand and generate human language based on patterns it has learned from vast amounts of text data. LLMs are a type of neural network designed to handle a wide range of tasks, including summarization, text generation, creative writing, question-answering, programming, and more. They are trained on massive datasets from various sources like web pages, books, and transcripts, allowing them to learn and adapt to different contexts and tasks.', additional_kwargs={}, response_metadata={}, id='9f050a0f-10d9-4255-8424-207ec27a7305')], 'context': "This video is going to give you everything you need to go from knowing absolutely nothing about artificial intelligence and large l
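One likely reason the name can get lost: the answer prompt is filled with `input` and `context` only, so anything in `chat_history` that is not also in the retrieved documents never reaches the final LLM call. The sketch below renders a shortened version of the answer prompt by hand with plain string formatting. The shortened template and the `retrieved_context` string are made up for illustration.

```python
# Sketch: render the answer prompt by hand to see what the model receives.
# The template is a shortened version of the system prompt used above;
# the context string is made up.

ANSWER_TEMPLATE = (
    "Use the following pieces of retrieved context to answer the question.\n"
    "Here is the user question:\n{input}\n"
    "Here is the context to help you answer:\n{context}\n"
)

chat_history = ["Hi my name is Obi-Wan, what is the purpose of an LLM?"]
retrieved_context = "LLMs are neural networks trained on large text corpora."

# Only `input` and `context` are filled in; `chat_history` is never used here.
final_prompt = ANSWER_TEMPLATE.format(
    input="What is my name?",
    context=retrieved_context,
)

name_is_visible = "Obi-Wan" in final_prompt
print(final_prompt)
print(name_is_visible)
```

Adding the chat history to the answer prompt would give the model access to the name, at the cost of using more of the context window, which is the trade-off discussed in Step III.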

In [15]:
client.close()