# Chapter 4. Memory: Enabling Your Chatbot to Learn from Interactions

## A simple version of this memory system using LangChain

In [1]:
pip install -U langchain-community


In [86]:
# Import modules
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.prompts.chat import MessagesPlaceholder
from langchain_core.runnables import chain
from langchain_core.chat_history import InMemoryChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.runnables import RunnablePassthrough
from langchain_core.messages import SystemMessage, HumanMessage, AIMessage
from langchain_core.messages.utils import trim_messages, filter_messages
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import FAISS
from langchain_text_splitters import RecursiveCharacterTextSplitter
from operator import itemgetter
from langchain_community.llms import Ollama

In [3]:
# Create a prompt template
prompt_temp = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer all questions to the best of your ability."),
    ("placeholder", "{messages}"),
])

# Obtain a chat model
# llm = Ollama(model="deepseek-r1:7b")
llm = Ollama(model="gemma3:4b")


# Create a simple chain
chain = prompt_temp | llm


In [4]:
# Invoke the chain.
chain.invoke({
    "messages": [
        ("human","What is your name?")
    ],
})

'My name is Gemma. I’m a large language model created by the Gemma team at Google DeepMind.'

In [5]:
# Invoke the chain again, this time including the earlier turns.
# Because the previous conversation is part of the input, the model can answer the follow-up question in a context-aware manner.
chain.invoke({
    "messages": [
        ("human","What is your name?"),
        ("ai", "My name is John."),
        ("human", "Sorry. What is your name again?"),
    ],
})

'My name is John.'

While this may work for demo purposes, it won’t scale in a production environment, because the caller has to build and resend an ever-growing list of conversation messages by hand. Fortunately, LangChain provides chat history classes such as *ChatMessageHistory* and its `langchain_core` counterpart *InMemoryChatMessageHistory* (used in the next cell), which make it easier to implement this memory system.

In [6]:
# Create a chat history object that can store messages in memory
chat_history = InMemoryChatMessageHistory()

# Add a user message to the chat history
chat_history.add_user_message("What is your name?")

# Add an AI message to the chat history
chat_history.add_ai_message("My name is John.")

# Print the chat history
chat_history.messages

[HumanMessage(content='What is your name?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='My name is John.', additional_kwargs={}, response_metadata={})]

In [7]:
# We can then integrate the stored chat messages into our chain:
# append the new user message to the history, then send the whole history to the model
user_input = "Sorry, what is your name again?"
chat_history.add_user_message(user_input)
chain.invoke({
    "messages": chat_history.messages,
})

'My name is John.'

In the previous example, we integrated the chat messages into the chain explicitly, but this requires tedious manual management of each new message. In a production setting, we need a way to persist the chat history and to insert and update it automatically.

To solve this problem, we can utilize LangChain’s RunnableWithMessageHistory class to automatically insert and update chat messages.
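To make the idea concrete before turning to the LangChain class, here is a minimal plain-Python sketch of what a history-managing wrapper has to do on every call: load the stored messages, run the chain, and save both sides of the exchange. It is an illustration only, not LangChain's implementation; `HistoryWrapper`, `echo_model`, and the tuple-based message format are hypothetical names chosen for this example.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, content), like the tuples used earlier


class HistoryWrapper:
    """Hypothetical stand-in for a wrapper that manages chat history."""

    def __init__(self, run_chain: Callable[[List[Message]], str]):
        self.run_chain = run_chain
        # One message list per session_id.
        self.sessions: Dict[str, List[Message]] = {}

    def invoke(self, user_input: str, session_id: str) -> str:
        history = self.sessions.setdefault(session_id, [])
        # 1. Build the model input from the stored history plus the new message.
        messages = history + [("human", user_input)]
        reply = self.run_chain(messages)
        # 2. Persist both sides of the exchange for the next call.
        history.append(("human", user_input))
        history.append(("ai", reply))
        return reply


def echo_model(messages: List[Message]) -> str:
    # Toy "model" that only reports how many messages it was shown.
    return f"I can see {len(messages)} message(s)."


bot = HistoryWrapper(echo_model)
first = bot.invoke("hi", session_id="a")
second = bot.invoke("still there?", session_id="a")
other = bot.invoke("hi", session_id="b")
```

In this sketch the second call in session "a" sees three messages (two stored plus the new one), while session "b" starts from scratch, which is the behaviour we want from per-session memory.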

In [8]:
# First, let’s modify our prompt template to incorporate a history placeholder, which will later contain all prior chat messages
prompt_temp = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer all questions to the best of your ability."),
    ("placeholder", "{history}"),
    ("human", "{input}"),
])
chain = prompt_temp | llm

In [9]:
# Next, let’s use the RunnableWithMessageHistory class to wrap our chain and incorporate the latest user input and chat history.
chat_history_for_chain = InMemoryChatMessageHistory()
chain_with_message_history = RunnableWithMessageHistory(
    chain,
    # session_id identifies the session (conversation thread) that the input messages belong to.
    # This lets you maintain several conversations with the same chain at the same time.
    # Note that this lambda ignores session_id and returns the same history for every session.
    lambda session_id: chat_history_for_chain,
    # input_messages_key specifies which part of the input is tracked and stored in the chat history.
    # In this example, we track the string passed in as input (matching the "input" key in the prompt).
    input_messages_key="input",
    # history_messages_key specifies the prompt variable that the previous messages are injected into.
    # Our prompt has a placeholder named "history", so we set this property to match.
    history_messages_key="history",
)

In [10]:
chat_history_for_chain

InMemoryChatMessageHistory(messages=[])

In [11]:
chain_with_message_history

RunnableWithMessageHistory(bound=RunnableBinding(bound=RunnableBinding(bound=RunnableAssign(mapper={
  history: RunnableBinding(bound=RunnableLambda(_enter_history), kwargs={}, config={'run_name': 'load_history'}, config_factories=[])
}), kwargs={}, config={'run_name': 'insert_history'}, config_factories=[])
| RunnableBinding(bound=RunnableLambda(_call_runnable_sync), kwargs={}, config={'run_name': 'check_sync_or_async'}, config_factories=[]), kwargs={}, config={'run_name': 'RunnableWithMessageHistory'}, config_factories=[]), kwargs={}, config={}, config_factories=[], get_session_history=<function <lambda> at 0x11a7f5240>, input_messages_key='input', history_messages_key='history', history_factory_config=[ConfigurableFieldSpec(id='session_id', annotation=<class 'str'>, name='Session ID', description='Unique identifier for a session.', default='', is_shared=True, dependencies=None)])

Let’s look at an example where we return a chat history corresponding to each session.

In [12]:
# Create the chain we used before
prompt_temp = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer all questions to the best of your ability."),
    ("placeholder", "{history}"),
    ("human", "{input}"),
])
llm = Ollama(model="gemma3:4b")
chain = prompt_temp | llm

# Keep track of the chat history for each session_id.
# Note: The built-in generic annotation dict[str, ...] requires Python 3.9+. It indicates that histories is a dictionary mapping session_id strings to chat history objects.
# When a new session starts, its history is stored in this dictionary like so: histories[session_id] = InMemoryChatMessageHistory()
histories: dict[str, InMemoryChatMessageHistory] = {}

# Define a function that takes a session_id as an argument and returns the matching chat history object, creating it on first use.
# Note: The type hint denotes that session_id is a string; its default value is an empty string.
def get_session_history(session_id: str = ''):
    if session_id not in histories:
        histories[session_id] = InMemoryChatMessageHistory()
    return histories[session_id]

# Chain with history
with_message_history = RunnableWithMessageHistory(
    chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="history",
)

In [13]:
# Invoke the chain, providing the input and a session ID
with_message_history.invoke(
    {"input": "hi im bob!"},
    config={"configurable": {"session_id": "123"}},
)

"Hi Bob! It's nice to meet you. 😊 How’s your day going so far? Is there anything you’d like to chat about, or were you just saying hello?"

In [14]:
# Continue the conversation
with_message_history.invoke(
    {"input": "whats my name?"},
    config={"configurable": {"session_id": "123"}},
)

'Your name is Bob! 😄 It’s really nice to chat with you.'

In [15]:
# A new session_id starts with an empty history, so the model does not remember the name
with_message_history.invoke(
    {"input": "whats my name?"},
    config={"configurable": {"session_id": "456"}},
)

"I have no way of knowing your name! As an AI, I don't have access to personal information unless you explicitly tell me. 😊 \n\nCould you tell me your name?"

In [16]:
# You can print the dictionary
histories

{'123': InMemoryChatMessageHistory(messages=[HumanMessage(content='hi im bob!', additional_kwargs={}, response_metadata={}), AIMessage(content="Hi Bob! It's nice to meet you. 😊 How’s your day going so far? Is there anything you’d like to chat about, or were you just saying hello?", additional_kwargs={}, response_metadata={}), HumanMessage(content='whats my name?', additional_kwargs={}, response_metadata={}), AIMessage(content='Your name is Bob! 😄 It’s really nice to chat with you.', additional_kwargs={}, response_metadata={})]),
 '456': InMemoryChatMessageHistory(messages=[HumanMessage(content='whats my name?', additional_kwargs={}, response_metadata={}), AIMessage(content="I have no way of knowing your name! As an AI, I don't have access to personal information unless you explicitly tell me. 😊 \n\nCould you tell me your name?", additional_kwargs={}, response_metadata={})])}

# How to Modify Chat History
In many cases, the chat history messages aren’t in the best state or format to generate an accurate response from the model. To overcome this problem, we can modify the chat history in a variety of ways.

## Trimming messages
LLMs have limited context windows, so the final prompt sent to the model can’t exceed the model’s input token limit. In addition, excessive prompt information can distract the model and lead to hallucination.

An effective solution to this problem is to limit the number of messages retrieved from chat history and appended to the prompt. In practice, we only need to load and store the n most recent chat history messages. Let’s use an example chat history with some preloaded messages.
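As a rough illustration of the "keep only the most recent messages" idea, the plain-Python sketch below walks the history from newest to oldest until a token budget is used up, always keeps a leading system message, and then drops leading messages until the kept window starts with a human turn. It is a simplified sketch and not LangChain's implementation; `keep_recent` and `flat_counter` are hypothetical helpers, and the flat cost of 10 tokens per message is an assumption made for the example.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content)


def keep_recent(
    messages: List[Message],
    max_tokens: int,
    count_tokens: Callable[[Message], int],
    start_on: str = "human",
) -> List[Message]:
    """Keep the system message plus the newest messages that fit the budget."""
    # Always keep a leading system message and charge it against the budget.
    system = messages[:1] if messages and messages[0][0] == "system" else []
    rest = messages[len(system):]
    budget = max_tokens - sum(count_tokens(m) for m in system)

    kept: List[Message] = []
    for msg in reversed(rest):  # newest first
        cost = count_tokens(msg)
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    kept.reverse()  # restore chronological order

    # Make sure the window does not start with an answer whose question was cut.
    while kept and kept[0][0] != start_on:
        kept.pop(0)
    return system + kept


def flat_counter(message: Message) -> int:
    return 10  # assumption: every message costs 10 tokens


conversation = [
    ("system", "you're a good assistant"),
    ("human", "hi! I'm bob"),
    ("ai", "hi!"),
    ("human", "I like vanilla ice cream"),
    ("ai", "nice"),
    ("human", "whats 2 + 2"),
    ("ai", "4"),
    ("human", "thanks"),
    ("ai", "no problem!"),
    ("human", "having fun?"),
    ("ai", "yes!"),
]
trimmed = keep_recent(conversation, max_tokens=65, count_tokens=flat_counter)
```

With a budget of 65 tokens, the system message uses 10, five more messages fit, and the oldest of those (the answer "4") is dropped because its question no longer fits.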

In [None]:
# We cannot use the OpenAI token counter here, so let's configure a different one.
# The OpenAI-based version is kept below for reference.

# # Configure a trimmer with LangChain's trim_messages function
# trimmer = trim_messages(
#     max_tokens=65,
#     # Keep the most recent messages
#     strategy="last",
#     token_counter=ChatOpenAI(model="gpt-4o"),
#     # Include the system message
#     include_system=True,
#     # Do not allow partial messages
#     allow_partial=False,
#     # start_on="human" makes the trimmed history begin with a HumanMessage, so we never keep an AIMessage
#     # (a response from the model) whose corresponding HumanMessage (the question) was trimmed away.
#     start_on="human",
# )

In [17]:
# Placeholder token counter: we still need to work out a proper way to count tokens.

def dummy_token_counter(messages) -> int:
    # treat each message like it adds 3 default tokens at the beginning
    # of the message and at the end of the message. 3 + 4 + 3 = 10 tokens
    # per message.

    default_content_len = 4
    default_msg_prefix_len = 3
    default_msg_suffix_len = 3

    count = 0
    for msg in messages:
        if isinstance(msg.content, str):
            count += default_msg_prefix_len + default_content_len + default_msg_suffix_len
        if isinstance(msg.content, list):
            count += default_msg_prefix_len + len(msg.content) *  default_content_len + default_msg_suffix_len
    return count
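The counter above charges a flat 10 tokens per message regardless of how long the text is. One possible refinement, sketched below, is to estimate the count from the character length. The ratio of roughly four characters per token and the per-message overhead of three tokens are assumptions (rules of thumb for English text, not the behaviour of any particular tokenizer), and `approx_token_counter` is a hypothetical name. It takes a list of message-like objects with a `content` attribute, the same shape of input that `dummy_token_counter` receives.

```python
import math
from types import SimpleNamespace

CHARS_PER_TOKEN = 4       # assumption: rough average for English text
PER_MESSAGE_OVERHEAD = 3  # assumption: tokens spent on role and formatting


def _part_text(part) -> str:
    """Return the text of one content part, ignoring non-text parts."""
    if isinstance(part, str):
        return part
    if isinstance(part, dict):
        return str(part.get("text", ""))
    return ""


def approx_token_counter(messages) -> int:
    """Estimate the total token count of a list of messages."""
    total = 0
    for msg in messages:
        content = msg.content
        if isinstance(content, list):
            # Multi-part content: count only the textual parts.
            text = " ".join(_part_text(part) for part in content)
        else:
            text = str(content)
        total += PER_MESSAGE_OVERHEAD + math.ceil(len(text) / CHARS_PER_TOKEN)
    return total


# Stand-ins for message objects; any object with a .content attribute works.
sample = [SimpleNamespace(content="hi! I'm bob"), SimpleNamespace(content="4")]
estimate = approx_token_counter(sample)
```

Here the 11-character message is estimated at 3 + 3 tokens and the 1-character message at 3 + 1, so `estimate` is 10. For exact budgets, a tokenizer that matches the deployed model would still be needed.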

In [18]:
# Configure a trimmer with LangChain's trim_messages function
trimmer = trim_messages(
    max_tokens=65,
    # Maintain the last messages 
    strategy="last",
    token_counter=dummy_token_counter,
    # Include the system message
    include_system=True,
    # Do not allow partial messages
    allow_partial=False,
    # start_on="human" makes the trimmed history begin with a HumanMessage, so we never keep an AIMessage
    # (a response from the model) whose corresponding HumanMessage (the question) was trimmed away.
    start_on="human",
)

In [19]:
# Create a long list of messages
messages = [
    SystemMessage(content="you're a good assistant"),
    HumanMessage(content="hi! I'm bob"),
    AIMessage(content="hi!"),
    HumanMessage(content="I like vanilla ice cream"),
    AIMessage(content="nice"),
    HumanMessage(content="whats 2 + 2"),
    AIMessage(content="4"),
    HumanMessage(content="thanks"),
    AIMessage(content="no problem!"),
    HumanMessage(content="having fun?"),
    AIMessage(content="yes!"),
]

# Trim the messages
trimmer.invoke(messages)

[SystemMessage(content="you're a good assistant", additional_kwargs={}, response_metadata={}),
 HumanMessage(content='thanks', additional_kwargs={}, response_metadata={}),
 AIMessage(content='no problem!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='having fun?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='yes!', additional_kwargs={}, response_metadata={})]

Now, let’s incorporate the trimmer into a chain and RunnableWithMessageHistory. To use it in the chain, we need to ensure that the trimmer runs before the messages are passed to our prompt.

In [20]:
# Create a prompt template
prompt_temp = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer all questions to the best of your ability."),
    ("placeholder", "{messages}"),
])

# Obtain an LLM
llm = Ollama(model="gemma3:4b")

# This makes a "messages" key available to prompt template,
# after passing the input messages list through the trimmer 
chain = {"messages": trimmer} | prompt_temp | llm

# Tracking history
history = InMemoryChatMessageHistory()
with_message_history = RunnableWithMessageHistory(
    chain, 
    lambda: history
)

In [21]:
# Use the chain across several turns; only the last call's response is displayed below
with_message_history.invoke(
    [HumanMessage(content="Today is a good day to learn about LangChain. Do you agree?")]
)
with_message_history.invoke(
    [HumanMessage(content="Why is sky blue?")]
)
with_message_history.invoke(
    [HumanMessage(content="What is the capital of France?")]
)
with_message_history.invoke(
    [HumanMessage(content="Tell me a joke.")]
)
with_message_history.invoke(
    [HumanMessage(content="What joke did you tell me? Could you repeat it?")]
)

'AI: Why don’t scientists trust atoms? \n\n... Because they make up everything! \n\n😄 \n\nWould you like to hear another joke, or perhaps we could explore a different topic?'

In [22]:
# Print the history of the conversation
history

InMemoryChatMessageHistory(messages=[HumanMessage(content='Today is a good day to learn about LangChain. Do you agree?', additional_kwargs={}, response_metadata={}), AIMessage(content='Absolutely! Today is a fantastic day to learn about LangChain. It’s a really exciting and rapidly evolving area of AI development, and there’s a huge amount of interesting stuff to discover. \n\nIt’s great that you’re taking the initiative to learn about it. \n\nTo help you get started, would you like me to:\n\n*   **Give you a brief overview of what LangChain is?** (A framework for building applications powered by language models)\n*   **Suggest some resources for beginners?** (Tutorials, documentation, etc.)\n*   **Answer a specific question you have about LangChain?** \n\nJust let me know where you’d like to start!', additional_kwargs={}, response_metadata={}), HumanMessage(content='Why is sky blue?', additional_kwargs={}, response_metadata={}), AIMessage(content='That’s a fantastic question – it’s on

## Summary memory
Aside from trimming messages, we can utilize the LLM to generate a summary of the conversation and then incorporate this summary into the prompt sent to the model.
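The idea can be sketched in plain Python before wiring it into a chain: replace the stored messages with a single summary message, then continue the conversation from that condensed history. This is a sketch under stated assumptions, not LangChain code. `compact_history` and `fake_summarize` are hypothetical names, `fake_summarize` merely joins the texts where a real system would call the LLM, and the `keep_last` parameter is an addition for illustration (its default of 0 summarizes everything).

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content)


def compact_history(
    history: List[Message],
    summarize: Callable[[List[Message]], str],
    keep_last: int = 0,
) -> List[Message]:
    """Replace older messages with one summary message.

    keep_last is the number of most recent messages to keep verbatim.
    """
    if len(history) <= keep_last:
        return list(history)  # nothing old enough to summarize
    split = len(history) - keep_last
    older, recent = history[:split], history[split:]
    summary = summarize(older)
    return [("ai", f"Summary of earlier conversation: {summary}")] + list(recent)


def fake_summarize(messages: List[Message]) -> str:
    # Stand-in for an LLM call: simply joins the message texts.
    return " | ".join(content for _, content in messages)


history = [
    ("human", "Hey there! I'm Nemo."),
    ("ai", "Hello!"),
    ("human", "How are you today?"),
    ("ai", "Fine thanks!"),
]
condensed = compact_history(history, fake_summarize)
partial = compact_history(history, fake_summarize, keep_last=2)
```

The trade-off is an extra model call per summarization in exchange for a prompt that stays short; details that the summary omits are lost to later turns.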

In [23]:
# Use ChatMessageHistory() to save the chat history
# Note: ChatMessageHistory (from langchain_community) and InMemoryChatMessageHistory (from langchain_core)
# are both concrete classes that store messages in memory; either can be used here.
demo_ephemeral_chat_history = ChatMessageHistory()
demo_ephemeral_chat_history.add_user_message("Hey there! I'm Nemo.")
demo_ephemeral_chat_history.add_ai_message("Hello!")
demo_ephemeral_chat_history.add_user_message("How are you today?")
demo_ephemeral_chat_history.add_ai_message("Fine thanks!")
demo_ephemeral_chat_history.messages

[HumanMessage(content="Hey there! I'm Nemo.", additional_kwargs={}, response_metadata={}),
 AIMessage(content='Hello!', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='How are you today?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='Fine thanks!', additional_kwargs={}, response_metadata={})]

In [24]:
# Create a prompt template
prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Answer all questions to the best of your ability. The provided chat history includes facts about the user you are speaking with."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("user", "{input}"),
])

# Create a chain that uses the prompt template
chain = prompt | llm

# Chain with history
chain_with_message_history = RunnableWithMessageHistory(
    chain,
    lambda session_id: demo_ephemeral_chat_history,
    input_messages_key="input",
    history_messages_key="chat_history",
)

In [25]:
# Next, let’s create a function that distills previous interactions into a summary. We will run it at the front of the chain.
def summarize_messages(chain_input):
    stored_messages = demo_ephemeral_chat_history.messages
    if len(stored_messages) == 0:
        return False
    summarization_prompt = ChatPromptTemplate.from_messages(
        [
            MessagesPlaceholder(variable_name="chat_history"),
            (
                "user",
                "Distill the above chat messages into a single summary message. Include as many specific details as you can.",
            ),
        ]
    )
    summarization_chain = summarization_prompt | llm
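    summary = summarization_chain.invoke({"chat_history": stored_messages})
    # The Ollama LLM wrapper returns a plain string, so wrap it in a message
    # before storing it as the new, condensed history.
    demo_ephemeral_chat_history.clear()
    demo_ephemeral_chat_history.add_message(AIMessage(content=summary))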
    return True

# Finally, we can add this function to the chain with the message history.
chain_with_summarization = (
    RunnablePassthrough.assign(messages_summarized=summarize_messages)
    | chain_with_message_history
)

In [26]:
# Now, let’s invoke the chain and see if it remembers the chat history.
chain_with_summarization.invoke(
    {"input": "What did I say my name was?"},
    {"configurable": {"session_id": "unused"}},
)

'You said your name was Nemo.'

## Filtering messages
As the chat history grows, it may accumulate messages of many types, produced by different sub-chains and models. LangChain provides a filter_messages helper that makes it easier to filter the chat history messages by type, id, or name.

In [27]:
# Filtering messages
messages = [
    SystemMessage("you are a good assistant", id="1"),
    HumanMessage("example input", id="2", name="example_user"),
    AIMessage("example output", id="3", name="example_assistant"),
    HumanMessage("real input", id="4", name="bob"),
    AIMessage("real output", id="5", name="alice"),
]
filter_messages(messages, include_types="human")

[HumanMessage(content='example input', additional_kwargs={}, response_metadata={}, name='example_user', id='2'),
 HumanMessage(content='real input', additional_kwargs={}, response_metadata={}, name='bob', id='4')]

In [28]:
# Another filtering example
filter_messages(messages, exclude_names=["example_user", "example_assistant"])

[SystemMessage(content='you are a good assistant', additional_kwargs={}, response_metadata={}, id='1'),
 HumanMessage(content='real input', additional_kwargs={}, response_metadata={}, name='bob', id='4'),
 AIMessage(content='real output', additional_kwargs={}, response_metadata={}, name='alice', id='5')]

In [29]:
# Another way to filter messages
filter_messages(messages, include_types=[HumanMessage, AIMessage], exclude_ids=["3"])

[HumanMessage(content='example input', additional_kwargs={}, response_metadata={}, name='example_user', id='2'),
 HumanMessage(content='real input', additional_kwargs={}, response_metadata={}, name='bob', id='4'),
 AIMessage(content='real output', additional_kwargs={}, response_metadata={}, name='alice', id='5')]

In [67]:
# The filter_messages helper can be used imperatively (as above) or declaratively (as below),
# which makes it easy to compose with other components in a chain
filter_ = filter_messages(exclude_names=["example_user", "example_assistant"])
chain = filter_ | llm

## Chat history with retrieval

In [31]:
from langchain_community.document_loaders import PyPDFLoader

In [32]:
pip install pypdf



In [33]:
filepaths = ["/Users/hitesh.modi/Desktop/Kinda Personal/LLM Marketing Bot/pdf_files/Alex Hormozi 100 million leads.pdf"]

In [34]:
async def read_pdfs_into_pages(filepaths):
    pages = []
    for filepath in filepaths:
        loader = PyPDFLoader(filepath)
        async for page in loader.alazy_load():
            pages.append(page)
    return pages

In [35]:
doc = await read_pdfs_into_pages(filepaths)

In [37]:
doc[0:5]

[Document(metadata={'producer': 'calibre (4.99.5) [https://calibre-ebook.com]', 'creator': 'calibre (4.99.5) [https://calibre-ebook.com]', 'creationdate': '2023-08-21T11:14:02+00:00', 'author': 'Alex Hormozi', 'moddate': '2023-08-21T11:14:02+00:00', 'title': '$100M Leads: How to Get Strangers To Want To Buy Your Stuff', 'source': '/Users/hitesh.modi/Desktop/Kinda Personal/LLM Marketing Bot/pdf_files/Alex Hormozi 100 million leads.pdf', 'total_pages': 385, 'page': 0, 'page_label': '1'}, page_content=''),
 Document(metadata={'producer': 'calibre (4.99.5) [https://calibre-ebook.com]', 'creator': 'calibre (4.99.5) [https://calibre-ebook.com]', 'creationdate': '2023-08-21T11:14:02+00:00', 'author': 'Alex Hormozi', 'moddate': '2023-08-21T11:14:02+00:00', 'title': '$100M Leads: How to Get Strangers To Want To Buy Your Stuff', 'source': '/Users/hitesh.modi/Desktop/Kinda Personal/LLM Marketing Bot/pdf_files/Alex Hormozi 100 million leads.pdf', 'total_pages': 385, 'page': 1, 'page_label': '2'}, 

In [39]:
pip install sentence-transformers


In [38]:
from langchain_community.embeddings import HuggingFaceEmbeddings

In [40]:
embedding = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2"
    ) 



In [41]:
from langchain_community.vectorstores import FAISS

In [62]:
## Load the document 
# loader = TextLoader("TeachingwithGenerativeAI.txt")
# doc = loader.load()

## Split the document into chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=20,
)
chunks = text_splitter.split_documents(doc)

## Define the embedding model
embed_model = embedding

# Create the vector store
vector_db = FAISS.from_documents(
    documents = chunks, 
    embedding = embed_model)

# Create the retriever
retriever = vector_db.as_retriever()

Next, let’s define a sub-chain that takes the historical chat messages and the latest user question, and reformulates the question if it refers to anything in the chat history. We’ll then use this sub-chain inside the final RAG chain, which will, in order:

1. Rephrase the user’s question given the conversation history (if there is history)

2. Pass the rephrased question to the retriever (see above) to get the most relevant documents

3. Pass the original question, chat history and documents to the final prompt to generate an answer.
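The three steps above can be sketched in plain Python with stand-in components, to show the control flow independently of any library. Everything here is hypothetical: `answer_with_history`, the `stub_*` functions, and the two-sentence `CORPUS` only imitate the rewriting model, the retriever, and the answering model.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (role, content)

CORPUS = [
    "FAISS is a library for vector similarity search.",
    "Ollama runs language models locally.",
]


def answer_with_history(question, chat_history, rewrite, retrieve, answer):
    # Step 1: rewrite only when there is history to resolve references against.
    standalone = rewrite(chat_history, question) if chat_history else question
    # Step 2: retrieve with the standalone question.
    documents = retrieve(standalone)
    # Step 3: answer using the ORIGINAL question, the history, and the documents.
    return answer(question, chat_history, documents)


def stub_rewrite(chat_history: List[Message], question: str) -> str:
    # Stand-in for the LLM rewrite: append the latest user message as context.
    user_turns = [content for role, content in chat_history if role == "human"]
    last_user = user_turns[-1] if user_turns else ""
    return f"{question} (regarding: {last_user})"


def stub_retrieve(query: str) -> List[str]:
    # Stand-in for vector search: match on the first word of each document.
    lowered = query.lower()
    return [doc for doc in CORPUS if doc.split()[0].lower() in lowered]


def stub_answer(question, chat_history, documents):
    # Stand-in for the answering model: just report what it received.
    return {"question": question, "turns": len(chat_history), "context": documents}


history = [("human", "Tell me about FAISS"), ("ai", "Sure.")]
with_history = answer_with_history(
    "What is it used for?", history, stub_rewrite, stub_retrieve, stub_answer
)
without_history = answer_with_history(
    "What is it used for?", [], stub_rewrite, stub_retrieve, stub_answer
)
```

In this toy setup the follow-up "What is it used for?" retrieves nothing on its own, but it finds the FAISS document once the rewrite step adds the missing reference, which is the reason for rephrasing before retrieval.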

In [91]:
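# Define a function to extract the text of the model output.
# The Ollama LLM wrapper used here returns a plain string, so the output is passed through unchanged;
# with a chat model that returns a message object, return msg.content instead.
def get_msg_content(msg):
    return msg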

# Define the SYSTEM prompt for contextualizing the chat history to come up with a standalone question
contextualize_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

# Define the prompt for contextualizing the chat history to come up with a standalone question
contextualize_prompt = ChatPromptTemplate.from_messages([
    ("system", contextualize_system_prompt),
    ("placeholder", "{chat_history}"),
    ("human", "{input}"),
])

# Define the chain for contextualizing the chat history to come up with a standalone question
contextualize_chain = (
    contextualize_prompt
    | llm
    | get_msg_content
)

In [92]:
# Define the question-answering SYSTEM prompt to generate the final answer
qa_system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

# Define the question-answering prompt to generate the final answer
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", qa_system_prompt),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
    ]
)

# Define the chain to generate the final answer
qa_chain = (
    qa_prompt
    | llm
    | get_msg_content
) 

In [93]:
# The name `chain` was rebound to chain objects in earlier cells, so re-import the LangChain `chain` decorator
from langchain_core.runnables import chain

In [94]:
# Define the overall chain that uses both the retrieved documents and the chat history to answer the question
@chain
def history_aware_qa(inputs):
    # Rephrase the question if there is chat history to resolve references against
    if inputs.get("chat_history"):
        question = contextualize_chain.invoke(inputs)
    else:
        question = inputs["input"]

    # Get context from the retriever and join the page contents into one string
    docs = retriever.invoke(question)
    context = "\n\n".join(doc.page_content for doc in docs)

    # Get the final answer
    return qa_chain.invoke({
        **inputs,
        "context": context,
    })

In [95]:
# Next, let’s incorporate stateful management of chat history and send the final prompt, 
# including chat history and retrieved context to the model for an output.
chat_history_for_chain = InMemoryChatMessageHistory()
qa_with_history = RunnableWithMessageHistory(
    history_aware_qa,
    lambda _: chat_history_for_chain,
    input_messages_key="input",
    history_messages_key="chat_history",
)

In [98]:
# Finally, let's invoke the chain
qa_with_history.invoke(
    {"input": "Should faculty declare their AI policy in their classes?"},
    config={"configurable": {"session_id": "123"}},
)

"I don't have enough information to answer this question based on the provided context. The documents focus on training principles and automation strategies within a business context, not academic policies regarding AI use."

In [99]:
# Try asking a follow-up question that refers to the previous question
qa_with_history.invoke(
    {"input": "What question did I just ask?"},
    config={"configurable": {"session_id": "123"}},
)

'You just asked: “Should faculty declare their AI policy in their classes?”'

In [100]:
# Give the model a new fact in the conversation
qa_with_history.invoke(
    {"input": "For your information, I would like to tell you that there are 1500 people in the team shaturbhatur"},
    config={"configurable": {"session_id": "123"}},
)

"I don't have enough information to answer this question. The provided documents discuss business strategies and training, not team size or personnel details."

In [101]:
# Ask about the fact we just provided
qa_with_history.invoke(
    {"input": "how many people are there in the team shaturbhatur?"},
    config={"configurable": {"session_id": "123"}},
)

"I don't have enough information to answer this question based on the provided context. The documents focus on training principles and automation strategies within a business context, not team size or personnel details."