# Chat Applications with Memory

In many Q&A applications we want to allow the user to have a back-and-forth conversation. This means the application needs some sort of memory—the ability to recall past questions and answers—and incorporate that memory into its current reasoning.

There are two approaches to incorporating historical messages:
- Chains: always execute a retrieval step, using past interactions.
- Agents: let the LLM decide whether and how to perform retrieval steps, possibly including multiple retrieval actions.
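
The difference between the two approaches is mostly one of control flow. The sketch below is a toy illustration only: `retrieve`, `chain_style`, `agent_style`, and `toy_planner` are hypothetical names, not LangChain APIs, and the planner is a hard-coded rule standing in for the decision an LLM would make.

```python
def retrieve(query: str) -> list[str]:
    """Hypothetical retriever: returns one fake document per query."""
    return [f"doc about {query!r}"]


def chain_style(question: str) -> dict:
    """Chain: exactly one retrieval step on every call."""
    docs = retrieve(question)
    return {"retrievals": 1, "docs": docs}


def agent_style(question: str, plan_queries) -> dict:
    """Agent: a planner (an LLM in practice) picks zero or more queries."""
    queries = plan_queries(question)
    docs = [doc for q in queries for doc in retrieve(q)]
    return {"retrievals": len(queries), "docs": docs}


def toy_planner(question: str) -> list[str]:
    """Stand-in for the LLM's decision: skip retrieval for greetings."""
    if question.lower().strip() in {"hi", "hello"}:
        return []
    return [question, f"background for {question}"]


print(chain_style("hello")["retrievals"])                      # 1
print(agent_style("hello", toy_planner)["retrievals"])         # 0
print(agent_style("What is CoT?", toy_planner)["retrievals"])  # 2
```

The rest of this walkthrough follows the chain approach.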



The walkthrough below builds this step by step with LangChain and the Groq API. It will:
- Load data from a web portal using WebBaseLoader
- Apply HuggingFace embeddings with model: "all-MiniLM-L6-v2"
- Store and search using Chroma as the vector store

## Step 1: Load data from a web URL

In [5]:
import bs4
from langchain_community.document_loaders import WebBaseLoader

url = 'https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/'

# Parse only elements with these CSS classes (post title, header, and body)
loader = WebBaseLoader(
    web_paths=(url,),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

documents = loader.load()
documents

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/'}, page_content='\n\n      Prompt Engineering\n    \nDate: March 15, 2023  |  Estimated Reading Time: 21 min  |  Author: Lilian Weng\n\n\nPrompt Engineering, also known as In-Context Prompting, refers to methods for how to communicate with LLM to steer its behavior for desired outcomes without updating the model weights. It is an empirical science and the effect of prompt engineering methods can vary a lot among models, thus requiring heavy experimentation and heuristics.\nThis post only focuses on prompt engineering for autoregressive language models, so nothing with Cloze tests, image generation or multimodality models. At its core, the goal of prompt engineering is about alignment and model steerability. Check my previous post on controllable text generation.\n[My personal spicy take] In my opinion, some prompt engineering papers are not worthy 8 pages long, since those tricks can be exp

## Step 2: Split documents into chunks

In [7]:
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=100
)
docs = text_splitter.split_documents(documents)
docs

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/'}, page_content='Prompt Engineering\n    \nDate: March 15, 2023  |  Estimated Reading Time: 21 min  |  Author: Lilian Weng'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/'}, page_content='Prompt Engineering, also known as In-Context Prompting, refers to methods for how to communicate with LLM to steer its behavior for desired outcomes without updating the model weights. It is an empirical science and the effect of prompt engineering methods can vary a lot among models, thus requiring heavy experimentation and heuristics.'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/'}, page_content='This post only focuses on prompt engineering for autoregressive language models, so nothing with Cloze tests, image generation or multimodality models. At its core, the goal of prompt engineering is about al
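
To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch that cuts text into fixed-width character windows. It is not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter prefers to break on separators such as paragraph and line breaks, so its chunks are usually shorter than the limit. The function name `split_with_overlap` is made up for this illustration.

```python
def split_with_overlap(text: str, chunk_size: int = 500, chunk_overlap: int = 100) -> list[str]:
    """Cut text into windows whose start offsets are chunk_size - chunk_overlap apart."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


sample = "".join(str(i % 10) for i in range(1200))  # 1200 characters of dummy text
chunks = split_with_overlap(sample)

print([len(c) for c in chunks])           # [500, 500, 400]
print(chunks[0][-100:] == chunks[1][:100])  # True: neighbours share 100 characters
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.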

## Step 3: Initialize HuggingFace Embeddings (MiniLM model)

In [8]:
from dotenv import load_dotenv
from langchain_huggingface import HuggingFaceEmbeddings

load_dotenv()  # makes HF_TOKEN from a local .env file available in os.environ
embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")


## Step 4: Create a Chroma vector store and setup retriever

In [11]:
from langchain_community.vectorstores import Chroma
# Create a Chroma vector store
vectorstore = Chroma.from_documents(documents=docs, embedding=embedding_model)

# Set up retriever from the vector store
retriever = vectorstore.as_retriever()
retriever

VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000002A50D2322F0>, search_kwargs={})
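
Conceptually, the retriever embeds the query and returns the stored chunks whose vectors are closest to it. The sketch below shows that idea with hand-written three-dimensional vectors and cosine similarity. Everything in it is an assumption made for illustration: the vectors are invented rather than produced by `all-MiniLM-L6-v2`, and the distance metric Chroma actually applies depends on how the collection is configured.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the k texts whose vectors are most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Invented "embeddings" for three chunks
toy_index = [
    ("Chain-of-thought prompting", [0.9, 0.1, 0.0]),
    ("Few-shot prompting", [0.2, 0.9, 0.1]),
    ("Retrieval augmentation", [0.0, 0.2, 0.9]),
]
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of the user's question

print(top_k(query_vec, toy_index, k=2))
# ['Chain-of-thought prompting', 'Few-shot prompting']
```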

## Step 5: Initialize Groq LLM

In [12]:
# Load GROQ key
from dotenv import load_dotenv
load_dotenv()

import os
groq_api_key = os.getenv('GROQ_API_KEY')

# Initialize model
from langchain_groq import ChatGroq
llm = ChatGroq(model_name="Llama3-8b-8192", groq_api_key=groq_api_key)
llm

ChatGroq(client=<groq.resources.chat.completions.Completions object at 0x000002A51C1E12D0>, async_client=<groq.resources.chat.completions.AsyncCompletions object at 0x000002A51C1E2290>, model_name='Llama3-8b-8192', model_kwargs={}, groq_api_key=SecretStr('**********'))

## Step 6: Define your custom system prompt using ChatPromptTemplate

In [13]:
## Prompt Template
from langchain.prompts import ChatPromptTemplate

system_prompt = """
    You are an intelligent assistant. Use the following context to answer questions accurately and concisely. 
    If the answer is not in the context, say so clearly.
    Use three sentences maximum and keep the answer concise. {context}
    """
prompt = ChatPromptTemplate.from_messages([
    ("system", system_prompt),
    ("human", "{input}")
])

## Step 7: Create chain with your prompt

In [14]:
from langchain.chains.combine_documents import create_stuff_documents_chain
document_chain = create_stuff_documents_chain(llm=llm, prompt=prompt)
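
A "stuff" chain simply places all retrieved chunks into the prompt's `{context}` slot before calling the model. The sketch below mimics that step in plain Python, assuming the chunks are joined by a blank line. `stuff_documents` and `build_messages` are hypothetical helpers written for this illustration, not the library's internals.

```python
def stuff_documents(page_contents: list[str], separator: str = "\n\n") -> str:
    """Join the text of every retrieved chunk into one context string."""
    return separator.join(page_contents)


def build_messages(system_template: str, question: str, page_contents: list[str]) -> list[tuple[str, str]]:
    """Fill {context} in the system prompt and append the user's question."""
    context = stuff_documents(page_contents)
    return [
        ("system", system_template.format(context=context)),
        ("human", question),
    ]


toy_template = "Answer from the context only. {context}"
toy_chunks = ["CoT generates reasoning steps.", "Zero-shot CoT adds a trigger phrase."]

messages = build_messages(toy_template, "What is Chain-of-Thought?", toy_chunks)
for role, text in messages:
    print(f"{role}: {text}")
```

Because every chunk goes into a single prompt, this approach only works while the combined text fits in the model's context window.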

## Step 8: Create the retrieval chain

In [29]:
from langchain.chains.retrieval import create_retrieval_chain
qa_chain = create_retrieval_chain(retriever, document_chain)
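
The retrieval chain wires the two pieces together: it retrieves documents for the `input`, hands them to the document chain as `context`, and returns a dictionary that carries the original input plus `context` and `answer`. The following is a rough plain-Python sketch of that data flow. `make_retrieval_chain`, `toy_retrieve`, and `toy_combine` are illustrative stand-ins, not LangChain functions.

```python
def make_retrieval_chain(retrieve, combine_docs):
    """Return a callable that mimics the retrieve-then-answer data flow."""
    def invoke(inputs: dict) -> dict:
        docs = retrieve(inputs["input"])             # step 1: fetch relevant chunks
        with_context = {**inputs, "context": docs}   # step 2: attach them as "context"
        answer = combine_docs(with_context)          # step 3: generate the answer
        return {**with_context, "answer": answer}
    return invoke


def toy_retrieve(query: str) -> list[str]:
    """Stand-in retriever."""
    return [f"chunk matching {query!r}"]


def toy_combine(inputs: dict) -> str:
    """Stand-in for the document chain (prompt + LLM)."""
    return f"Answer built from {len(inputs['context'])} chunk(s)."


toy_chain = make_retrieval_chain(toy_retrieve, toy_combine)
result = toy_chain({"input": "What is Chain-of-Thought ?"})

print(sorted(result))    # ['answer', 'context', 'input']
print(result["answer"])  # Answer built from 1 chunk(s).
```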

## Step 9: Ask a question

In [33]:
response = qa_chain.invoke({"input": "What is Chain-of-Thought ?"})
response

{'input': 'What is Chain-of-Thought ?',
 'context': [Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/'}, page_content='Chain-of-Thought (CoT)#\nChain-of-thought (CoT) prompting (Wei et al. 2022) generates a sequence of short sentences to describe reasoning logics step by step, known as reasoning chains or rationales, to eventually lead to the final answer. The benefit of CoT is more pronounced for complicated reasoning tasks, while using large models (e.g. with more than 50B parameters). Simple tasks only benefit slightly from CoT prompting.\nTypes of CoT prompts#\nTwo main types of CoT prompting:'),
  Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-03-15-prompt-engineering/'}, page_content='Chain-of-Thought (CoT)#\nChain-of-thought (CoT) prompting (Wei et al. 2022) generates a sequence of short sentences to describe reasoning logics step by step, known as reasoning chains or rationales, to eventually lead to the fina

In [34]:
response['answer']

'Chain-of-Thought (CoT) is a type of prompting that generates a sequence of short sentences to describe reasoning logics step by step, known as reasoning chains or rationales, to eventually lead to the final answer. The benefit of CoT is more pronounced for complicated reasoning tasks, while using large models (e.g. with more than 50B parameters).'

In [36]:
response = qa_chain.invoke({"input": "How to do"})
response['answer']

'I apologize, but the context provided does not include any specific instructions or examples related to "How to do". Therefore, I cannot provide an accurate answer. If you could provide more context or clarify what you mean by "How to do", I\'ll do my best to assist you.'

In [37]:
response = qa_chain.invoke({"input": "How to do Chain-of-Thought"})
response['answer']

'Chain-of-Thought (CoT) prompting involves generating a sequence of short sentences to describe reasoning logics step by step, known as reasoning chains or rationales, to eventually lead to the final answer.'

## Add Chat History

With chat history, retrieval becomes context-aware, which is essential for multi-turn Q&A such as:

- User: What is Chain-of-Thought?
- User: How to do it?

The second question only makes sense in light of the first. Without history the chain cannot resolve it, as the "How to do" query in Step 9 showed.

### History-aware retriever prompt 

In [38]:
from langchain.prompts import MessagesPlaceholder
retriever_prompt = ChatPromptTemplate.from_messages([
    ("system", "Given the chat history and user question, improve the question for retrieval."),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}")
])

### Create history-aware retriever

In [41]:
from langchain.chains import create_history_aware_retriever
history_aware_retriever = create_history_aware_retriever(
    llm=llm,
    retriever=retriever,
    prompt=retriever_prompt
)
history_aware_retriever

RunnableBinding(bound=RunnableBranch(branches=[(RunnableLambda(lambda x: not x.get('chat_history', False)), RunnableLambda(lambda x: x['input'])
| VectorStoreRetriever(tags=['Chroma', 'HuggingFaceEmbeddings'], vectorstore=<langchain_community.vectorstores.chroma.Chroma object at 0x000002A50D2322F0>, search_kwargs={}))], default=ChatPromptTemplate(input_variables=['chat_history', 'input'], input_types={'chat_history': list[typing.Annotated[typing.Union[typing.Annotated[langchain_core.messages.ai.AIMessage, Tag(tag='ai')], typing.Annotated[langchain_core.messages.human.HumanMessage, Tag(tag='human')], typing.Annotated[langchain_core.messages.chat.ChatMessage, Tag(tag='chat')], typing.Annotated[langchain_core.messages.system.SystemMessage, Tag(tag='system')], typing.Annotated[langchain_core.messages.function.FunctionMessage, Tag(tag='function')], typing.Annotated[langchain_core.messages.tool.ToolMessage, Tag(tag='tool')], typing.Annotated[langchain_core.messages.ai.AIMessageChunk, Tag(tag
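
The `RunnableBranch` in the repr above encodes a simple rule: when `chat_history` is empty, the raw `input` goes straight to the retriever; otherwise the prompt and LLM first rewrite the question, and the rewritten text is what gets retrieved. A minimal plain-Python sketch of that branching follows. `make_history_aware_retriever`, `toy_rewrite`, and `toy_retrieve` are hypothetical stand-ins, and history is modelled as `(role, text)` tuples so the snippet runs without LangChain.

```python
def make_history_aware_retriever(rewrite_question, retrieve):
    """Return a callable that rewrites the query only when history exists."""
    def invoke(inputs: dict) -> list[str]:
        history = inputs.get("chat_history")
        if not history:
            # No earlier turns: search with the question as typed
            return retrieve(inputs["input"])
        # Earlier turns exist: let the "LLM" produce a better search query
        return retrieve(rewrite_question(history, inputs["input"]))
    return invoke


seen_queries = []  # records what actually reaches the retriever


def toy_retrieve(query: str) -> list[str]:
    seen_queries.append(query)
    return [f"chunk for {query!r}"]


def toy_rewrite(history: list[tuple[str, str]], question: str) -> str:
    """Stand-in for the LLM call: tie the question to the last user turn."""
    last_user_turn = [text for role, text in history if role == "human"][-1]
    return f"{question} (follow-up to: {last_user_turn})"


retriever_fn = make_history_aware_retriever(toy_rewrite, toy_retrieve)
retriever_fn({"input": "What is Chain-of-Thought?", "chat_history": []})
retriever_fn({
    "input": "How to do it?",
    "chat_history": [("human", "What is Chain-of-Thought?"), ("ai", "A prompting technique.")],
})

for q in seen_queries:
    print(q)
# What is Chain-of-Thought?
# How to do it? (follow-up to: What is Chain-of-Thought?)
```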

### QA prompt

In [47]:
qa_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a helpful assistant. Use the context below to answer the user's question. {context}"),
    MessagesPlaceholder(variable_name="chat_history"),
    ("human", "{input}")
])


### Create the document chain and compose final retrieval chain

In [49]:
history_document_chain = create_stuff_documents_chain(llm=llm, prompt=qa_prompt)
history_retrieval_chain = create_retrieval_chain(history_aware_retriever, history_document_chain)

### Manual chat history management

In [50]:
from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

# First turn: no history yet
question1 = "What is Chain-of-Thought ?"
response1 = history_retrieval_chain.invoke({"input": question1, "chat_history": chat_history})

# Record the turn so the next question can refer back to it
chat_history.extend(
    [
        HumanMessage(content=question1),
        AIMessage(content=response1["answer"]),
    ]
)

# Follow-up question that only makes sense with the history
question2 = "How to do it?"
response2 = history_retrieval_chain.invoke({"input": question2, "chat_history": chat_history})
print(response2['answer'])

There are two main types of Chain-of-Thought (CoT) prompts:

1. Zero-shot CoT: Use natural language statements like:
	* "Let's think step by step..."
	* "Let's work this out step by step to be sure we have the right answer..."
	* "Therefore, the answer is..."

These prompts explicitly encourage the model to generate reasoning chains and then produce the answer.

Let me know if you have any further questions or if there's anything else I can help you with!
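
The invoke-then-extend pattern repeats on every turn, so it can be wrapped in a small helper. The sketch below is one possible shape, not part of LangChain: `ask` and `EchoChain` are hypothetical names, `EchoChain` only imitates the `.invoke()` interface so the snippet runs on its own, and `(role, text)` tuples stand in for `HumanMessage` and `AIMessage`.

```python
class EchoChain:
    """Hypothetical stand-in exposing the same .invoke() shape as the chain."""

    def invoke(self, inputs: dict) -> dict:
        turns_so_far = len(inputs["chat_history"]) // 2
        return {"answer": f"[{turns_so_far} earlier turn(s)] {inputs['input']}"}


def ask(chain, question: str, chat_history: list) -> str:
    """Run one turn and append both sides of it to chat_history in place."""
    response = chain.invoke({"input": question, "chat_history": chat_history})
    answer = response["answer"]
    chat_history.extend([("human", question), ("ai", answer)])
    return answer


history = []
chain = EchoChain()
first = ask(chain, "What is Chain-of-Thought ?", history)
second = ask(chain, "How to do it?", history)

print(first)         # [0 earlier turn(s)] What is Chain-of-Thought ?
print(second)        # [1 earlier turn(s)] How to do it?
print(len(history))  # 4
```

With the real chain, the same helper would take `history_retrieval_chain` and append `HumanMessage` and `AIMessage` objects instead of tuples.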


In [None]:
from langchain_core.messages import AIMessage, HumanMessage

chat_history = []

while True:
    user_input = input("You: ")
    if user_input.lower() in ["exit", "quit"]:
        break

    response = history_retrieval_chain.invoke({
        "input": user_input,
        "chat_history": chat_history
    })
    print("User:", user_input)
    print("Assistant:", response["answer"])

    chat_history.extend([
        HumanMessage(content=user_input),
        AIMessage(content=response["answer"])
    ])


Assistant: Chain-of-Thought (CoT) is a type of prompting technique that generates a sequence of short sentences to describe reasoning logics step by step, known as reasoning chains or rationales, to eventually lead to the final answer.
Assistant: There are two main types of Chain-of-Thought (CoT) prompts:

1. Zero-shot CoT: Use natural language statements like:
	* "Let's think step by step to explicitly encourage the model to first generate reasoning chains and then to prompt with 'Therefore, the answer is...' to produce answers."
	* "Let's work this out step by step to be sure we have the right answer."

These prompts encourage the model to generate a chain of thoughts to arrive at the answer.
