# Conversational RAG

- Users expect chat tools to remember earlier messages and respond based on that context.
- In other words, users expect AI tools to have some kind of 'memory'.

As studied earlier, chat history management helps achieve the goal of building a conversational RAG.

Two approaches can be used:
```mermaid
flowchart TD;
    A(Approaches) --> B[**Chain**: \n which executes a single retrieval step]
    A(Approaches) --> C[**Agents**: \n which can decide which retrieval step to execute or \n how many steps to execute]
```


## Setup
- External knowledge source is a blog post
- Libraries
  - MistralAI Embeddings
  - Chroma vector store
  - Langchain

In [1]:
%%capture --no-stderr
%pip install --upgrade --quiet  langchain langchain-community langchainhub langchain-chroma bs4




In [5]:
%%capture --no-stderr
%pip install langchain-mistralai




In [2]:
import getpass
import os
from dotenv import load_dotenv
load_dotenv()

# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# .get() returns None for an unset variable instead of raising KeyError
if not os.environ.get("LANGCHAIN_API_KEY"):
    os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()

In [3]:
if not os.environ.get("MISTRAL_API_KEY"):
    os.environ["MISTRAL_API_KEY"] = getpass.getpass()

In [4]:
if not os.environ.get('HF_TOKEN'):
    os.environ['HF_TOKEN'] = getpass.getpass()
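
The three key checks above follow the same pattern, so they could be folded into one helper. The sketch below is only an illustration: `require_env` and its `prompt_fn` parameter are hypothetical names, not part of any library used in this notebook.

```python
import getpass
import os


def require_env(name: str, prompt_fn=getpass.getpass) -> str:
    """Return an environment variable, prompting once if it is missing or empty."""
    value = os.environ.get(name)  # None when unset, so no KeyError is raised
    if not value:
        value = prompt_fn(f"{name}: ")
        os.environ[name] = value  # cache it for the libraries that read it later
    return value
```

With this in place, each check would reduce to a single call such as `require_env("MISTRAL_API_KEY")`.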

## Chains
### Using Chains is our first approach

- LangChain provides several loaders for loading data.
- We will use `WebBaseLoader` here to load data from a blog post.
- BeautifulSoup filters the HTML so that only the required content is loaded.
- A text splitter breaks the large document into small chunks.
- The chunks are stored in a Chroma vector store.
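
To make the roles of chunk size and chunk overlap concrete, here is a minimal sliding-window sketch. It is a simplification and an assumption on my part: `RecursiveCharacterTextSplitter` splits on separators rather than at fixed offsets, and `split_with_overlap` is a hypothetical helper, not a LangChain function.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200):
    """Return (start_index, chunk) pairs produced by a fixed-size sliding window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far the window moves each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append((start, text[start:start + chunk_size]))
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks


demo = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
# demo == [(0, 'abcd'), (3, 'defg'), (6, 'ghij')]
```

Each chunk repeats the last character of the previous one, which is the effect the overlap setting is meant to have: text that straddles a boundary still appears whole in at least one chunk.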

In [18]:
import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_chroma import Chroma
from langchain_mistralai.embeddings import MistralAIEmbeddings
from langchain_mistralai import ChatMistralAI
from langchain.prompts import ChatPromptTemplate
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains.retrieval import create_retrieval_chain

# Only keep post title, headers, and content from the full HTML.
bs4_strainer = bs4.SoupStrainer(class_=("post-title", "post-header", "post-content"))
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={"parse_only": bs4_strainer},
)
docs = loader.load()




In [19]:
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000, chunk_overlap=200, 
    add_start_index=True  #index at which each split starts
)
all_splits = text_splitter.split_documents(docs)


In [32]:
# .get() avoids a KeyError when HF_TOKEN is unset
if not os.environ.get('HF_TOKEN'):
    os.environ['HF_TOKEN'] = getpass.getpass()

from langchain_mistralai import MistralAIEmbeddings
embeddings = MistralAIEmbeddings()

In [63]:
# max_concurrent_requests=1 sends the embedding requests one at a time
vectorstore = Chroma.from_documents(
    documents=all_splits,
    embedding=MistralAIEmbeddings(max_concurrent_requests=1),
)
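
Conceptually, the vector store keeps one embedding per chunk and returns the chunks closest to the query embedding. The sketch below shows that idea with hand-made vectors. It assumes cosine similarity and a linear scan purely for illustration; Chroma's actual distance metric and index are not shown here, and `top_k` and `toy_store` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, store, k=2):
    """Return the k texts whose vectors are most similar to query_vec."""
    ranked = sorted(store, key=lambda item: cosine_similarity(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Hypothetical 3-dimensional "embeddings"; real embedding vectors are much longer.
toy_store = [
    ("task decomposition", [1.0, 0.0, 0.0]),
    ("memory", [0.0, 1.0, 0.0]),
    ("tool use", [0.0, 0.0, 1.0]),
]
result = top_k([0.9, 0.1, 0.0], toy_store, k=2)
```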


In [64]:
retriever = vectorstore.as_retriever()              # create a retriever
llm = ChatMistralAI(model_name="mistral-large-latest")

#2. System prompts

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system",system_prompt),
        ("human", "{input}")
    ]
)

#3. chain
q_and_a_chain = create_stuff_documents_chain(llm,prompt)
rag_chain = create_retrieval_chain(retriever, q_and_a_chain)
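
The "stuff" step amounts to pasting the retrieved chunks into the `{context}` slot of the system prompt. The sketch below illustrates this with plain string formatting. `stuff_documents` is a hypothetical helper, and joining the chunks with a blank line is an assumption about the separator.

```python
def stuff_documents(system_template: str, documents: list[str], separator: str = "\n\n") -> str:
    """Join the retrieved texts and substitute them for {context} in the template."""
    return system_template.format(context=separator.join(documents))


template = "Answer using the context below.\n\n{context}"
filled = stuff_documents(template, ["Chunk one.", "Chunk two."])
```

Because every retrieved chunk ends up in a single prompt, the chunk size and the number of retrieved chunks together decide how much of the model's context window this step consumes.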

### Add a chat history
- We need to maintain the chat history and pass it along as context.

The earlier flow was:
```mermaid
flowchart LR;
    A[query];
    B[retriever];
    C[prompt];
    D[llm];
    E[answer]

    A --> B
    B --> C
    C --> D
    D --> E
```
Now the LLM first has to rephrase the question, using the chat history as context,  
and the rephrased question is then sent to the retriever.



```mermaid
flowchart LR;
    A[query + history];
    B[history aware retriever];
    C[prompt];
    D[llm];
    E[answer]

    A --> B
    B -- Rephrased Prompt --> C
    C --> D
    D --> E
```

The second step here contextualizes the question with the help of a system prompt and the LLM.
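
The control flow of this step can be sketched without any LLM: when there is no history the question goes to the retriever unchanged, otherwise it is rewritten first. In the sketch, `contextualize` and `fake_rewrite` are hypothetical names, and `fake_rewrite` is a crude string substitution standing in for the real LLM call.

```python
def contextualize(question, chat_history, rewrite_fn):
    """Return the query that should be sent to the retriever."""
    if not chat_history:
        return question  # nothing to resolve, so use the question as is
    return rewrite_fn(chat_history, question)


def fake_rewrite(chat_history, question):
    # Hypothetical stand-in for the LLM: replaces "it" with the topic of the last user turn.
    last_user_turn = [text for role, text in chat_history if role == "human"][-1]
    topic = last_user_turn.removeprefix("What is ").rstrip("?")
    return question.replace(" it ", f" {topic} ")


history = [("human", "What is Task Decomposition?"), ("ai", "It breaks a task into steps.")]
standalone = contextualize("How does it affect the efficiency?", history, fake_rewrite)
# standalone == "How does Task Decomposition affect the efficiency?"
```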

In [65]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

contextualized_q_sys_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualized_q_prompt = ChatPromptTemplate.from_messages (
    [
        ("system", contextualized_q_sys_prompt),
        MessagesPlaceholder("chat_history"),
        ("human","{input}")
    ]
)

history_aware_retriever = create_history_aware_retriever(
    llm,
    retriever,
    contextualized_q_prompt
)

The history-aware retriever is a sub-chain that takes the query and the chat history, has the LLM rephrase the query into a standalone question, and retrieves documents for that question.  
Now we have to recreate our RAG chain, changing the prompt to accommodate the chat history in the messages.  

In [93]:
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}")
    ]
)

qa_chain = create_stuff_documents_chain(llm,qa_prompt)
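# The history-aware retriever must run first so that it can supply "context".
# Piping `qa_chain | history_aware_retriever` runs the two steps in the wrong
# order and fails with KeyError: 'context'.
qa_rag_chain = create_retrieval_chain(history_aware_retriever, qa_chain)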


We can try this out.  
All we need to do is ask a question and then ask a follow-up question that refers back to it.  
First question: "What is Task Decomposition?"  
Follow-up question: "How does it affect the efficiency?"  
Note that the follow-up question refers to Task Decomposition only as "it".  
The model needs the chat history to resolve that reference and answer correctly.  
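
The bookkeeping for such a conversation is the same on every turn: call the chain with the history so far, then append the question and the answer. The sketch below uses a hypothetical `echo_chain` in place of the real RAG chain and plain tuples in place of message objects, so it only illustrates the pattern.

```python
def chat_turn(chain_fn, chat_history, question):
    """Run one turn and append both sides of it to chat_history."""
    result = chain_fn({"input": question, "chat_history": list(chat_history)})
    chat_history.append(("human", question))
    chat_history.append(("ai", result["answer"]))
    return result["answer"]


def echo_chain(inputs):
    # Hypothetical stand-in for the RAG chain: reports how many past messages it saw.
    return {"answer": f"{len(inputs['chat_history'])} earlier messages seen"}


history = []
first = chat_turn(echo_chain, history, "What is Task Decomposition?")
second = chat_turn(echo_chain, history, "How does it affect the efficiency?")
```

The second call sees two earlier messages, which is exactly the information the real chain needs in order to resolve "it".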

In [95]:
from langchain_core.messages import HumanMessage, AIMessage

chat_history = []
first_question = "What is Task Decomposition?"
# Invoke the RAG chain; max_concurrency should be a positive integer
ai_answer = qa_rag_chain.invoke(
    {"input": first_question, "chat_history": chat_history},
    config={"max_concurrency": 1},
)

# Only the first 50 characters of the answer are kept in the history
chat_history.extend(
    [
        HumanMessage(content=first_question),
        AIMessage(content=ai_answer["answer"][:50]),
    ]
)

second_question = "How does it affect the efficiency?"
ai_answer = qa_rag_chain.invoke(
    {"input": second_question, "chat_history": chat_history},
    config={"max_concurrency": 1},
)

ai_answer


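KeyError: 'context'

This error was raised while the chain was composed as `qa_chain | history_aware_retriever`: the documents chain ran first and expected a `context` key that only the retriever can supply. Building the chain with `create_retrieval_chain(history_aware_retriever, qa_chain)` avoids it.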

In [88]:
qa_rag_chain.config_schema

<bound method Runnable.config_schema of RunnableBinding(bound=RunnableAssign(mapper={
  context: RunnableBinding(bound=RunnableBranch(branches=[(RunnableLambda(lambda x: not x.get('chat_history', False)), RunnableLambda(lambda x: x['input'])
           | VectorStoreRetriever(tags=['Chroma', 'MistralAIEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x70b2f79f2c50>))], default=ChatPromptTemplate(input_variables=['chat_history', 'input'], input_types={'chat_history': typing.List[typing.Union[langchain_core.messages.ai.AIMessage, langchain_core.messages.human.HumanMessage, langchain_core.messages.chat.ChatMessage, langchain_core.messages.system.SystemMessage, langchain_core.messages.function.FunctionMessage, langchain_core.messages.tool.ToolMessage]]}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='Given a chat history and the latest user question which might reference context in the chat history, formulate a standalone q

In [79]:

chat_history

[HumanMessage(content='What is Task Decomposition?'),
 AIMessage(content='Task Decomposition is a method used to break down ')]

In [83]:
len(chat_history[1].content)

50

HTTPStatusError: Error response 429 while fetching https://api.mistral.ai/v1/chat/completions: {"message":"Requests rate limit exceeded"}
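
A 429 response like the one above means the API's rate limit was hit. One common way to cope is to retry with exponentially growing pauses. The sketch below is a generic illustration and not something the notebook implements: `call_with_backoff`, its parameters, and the idea of detecting the error by looking for "429" in the message are all assumptions.

```python
import time


def call_with_backoff(fn, is_rate_limit, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponentially growing delays on rate-limit errors."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception as exc:
            # Re-raise anything that is not a rate-limit error, and the final failure.
            if not is_rate_limit(exc) or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

It could wrap a chain call as `call_with_backoff(lambda: qa_rag_chain.invoke(...), lambda e: "429" in str(e))`, with the inputs filled in as in the cells above.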