# Conversational RAG

In many Q&A applications we want to allow the user to have a back-and-forth conversation, meaning the application needs some sort of "memory" of past questions and answers, and some logic for incorporating those into its current thinking.


We will cover two approaches:

- **Chains,** in which we always execute a retrieval step;
- **Agents,** in which we give an LLM discretion over whether and how to execute a retrieval step (or multiple steps).


# Setup

In [3]:
! pip install --upgrade --quiet  langchain langchain-community langchainhub langchain-chroma bs4

In [4]:
import os
from google.colab import userdata

In [7]:
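# Read the keys from Colab secrets only if they are not already set.
# os.environ.get() avoids the KeyError that os.environ[...] raises for a missing key.
if not os.environ.get("COHERE_API_KEY"):
    os.environ["COHERE_API_KEY"] = userdata.get("COHERE_API_KEY")
if not os.environ.get("LANGCHAIN_API_KEY"):
    os.environ["LANGCHAIN_API_KEY"] = userdata.get("LANGCHAIN_API_KEY")
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "Conversational-RAG"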

# Chains

In [10]:
! pip uninstall -y transformers
! pip install -q langchain-cohere

Found existing installation: transformers 4.41.2
Uninstalling transformers-4.41.2:
  Successfully uninstalled transformers-4.41.2


In [35]:
from langchain_community.document_loaders import WebBaseLoader # web page loader
from langchain_text_splitters import RecursiveCharacterTextSplitter # splitter
from langchain_chroma import Chroma # vectorstore
from langchain_cohere import CohereEmbeddings # embeddings
from langchain.chains import create_retrieval_chain # chain
from langchain.chains.combine_documents import create_stuff_documents_chain # chain
from langchain_core.prompts import ChatPromptTemplate # prompt


In [42]:
!pip install --quiet bs4

In [43]:
import bs4
# 1. Load data: fetch the blog post and keep only its title, header, and content
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/",
                       bs_kwargs=dict(
                           parse_only=bs4.SoupStrainer(
                               class_=("post-content", "post-title", "post-header")
                           )
                       )
                       )
docs = loader.load()

In [50]:
# 2. chunk documents

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
doc_splits = text_splitter.split_documents(docs)

In [51]:
len(doc_splits)

66

In [52]:
doc_splits[13].page_content

'In comparison with three baselines, including ED (expert distillation, behavior cloning with expert trajectories instead of learning history), source policy (used for generating trajectories for distillation by UCB), RL^2 (Duan et al. 2017; used as upper bound since it needs online RL), AD demonstrates in-context RL with performance getting close to RL^2 despite only using offline RL and learns much faster than other baselines. When conditioned on partial training history of the source policy, AD also improves much faster than ED baseline.'
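
To make the roles of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-window chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on separators such as paragraphs and sentences). The `chunk_text` helper is hypothetical and only illustrates how consecutive chunks share characters.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that overlap (hypothetical helper)."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
chunks = chunk_text(sample, chunk_size=10, chunk_overlap=4)
print(chunks)
```

With an overlap of 4, the last four characters of each chunk reappear at the start of the next one, so a sentence cut at a chunk boundary is still partly visible in both chunks.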

In [54]:
# 3. index the contents of the blog to create a retriever.
vectorstore = Chroma.from_documents(documents=doc_splits, embedding=CohereEmbeddings())
retriever = vectorstore.as_retriever()

In [55]:
# 4. LLM model
from langchain_cohere import ChatCohere
model = ChatCohere(model="command-r")

In [56]:
# 5. Prompt template

system_prompt = """
You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer the question.
If you don't know the answer, say that you don't know.
Use three sentences maximum and keep the answer concise.
\n\n
{context}
"""

prompt = ChatPromptTemplate.from_messages(
    [
        ('system', system_prompt),
        ('human', '{input}')
    ]
)

question_answer_chain = create_stuff_documents_chain(llm=model, prompt=prompt)
rag_chain = create_retrieval_chain(retriever=retriever, combine_docs_chain=question_answer_chain)
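
As a rough mental model of what the combined chain does, the sketch below retrieves documents, "stuffs" their text into the `{context}` slot of the prompt, and calls the model. Everything here is a stand-in: `Doc`, `fake_retriever`, `fake_llm`, and `sketch_rag_chain` are hypothetical names, not LangChain APIs. The returned dictionary mirrors the `input`, `context`, and `answer` keys that the real chain produces.

```python
from dataclasses import dataclass

@dataclass
class Doc:
    """Stand-in for a retrieved document (hypothetical, not LangChain's Document)."""
    page_content: str

def fake_retriever(query: str) -> list[Doc]:
    # Placeholder retrieval: a real retriever would run a similarity search.
    return [
        Doc("Task decomposition splits a task into steps."),
        Doc("Chain of Thought prompts the model to think step by step."),
    ]

def fake_llm(prompt: str) -> str:
    # Placeholder model: only reports how much prompt text it received.
    return f"(answer based on {len(prompt)} prompt characters)"

SYSTEM_TEMPLATE = "Use the following context to answer the question.\n\n{context}"

def sketch_rag_chain(inputs: dict) -> dict:
    docs = fake_retriever(inputs["input"])                # 1. retrieve
    context = "\n\n".join(d.page_content for d in docs)  # 2. stuff documents into one string
    prompt = SYSTEM_TEMPLATE.format(context=context) + "\n\nQuestion: " + inputs["input"]
    answer = fake_llm(prompt)                             # 3. generate
    return {"input": inputs["input"], "context": docs, "answer": answer}

result = sketch_rag_chain({"input": "What is Task Decomposition?"})
print(sorted(result))
```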

In [59]:
response = rag_chain.invoke({"input": "What is Task Decomposition?"})
response['answer']

"Task decomposition is a process that breaks down complex tasks into simpler, more manageable steps. It's a crucial component of planning in autonomous agent systems, enabling them to tackle complicated assignments by first understanding and decomposing them into clear, executable actions. This technique enhances the agent's ability to tackle challenging tasks by making them more accessible and interpretable.\n\nIn the context of my architecture, as illustrated in the provided figure, task decomposition is achieved through methods like Chain of Thought or Tree of Thoughts prompting techniques, which help me think step-by-step and decompose tasks into manageable parts."

# Adding chat history

To enable conversational context in our system, we need to update our app in two ways:

- **Prompt update:** Modify the prompt to support historical messages as input.
- **Contextualizing questions:** Add a sub-chain that takes the latest user question and reformulates it in the context of the chat history. This involves creating a new "history-aware" retriever that takes both the query and conversation history as input, and outputs a rephrased query that can be used by the retriever.

To implement this, we'll use a prompt with a `MessagesPlaceholder` variable named `chat_history`, which lets us pass a list of messages into the prompt. We'll also use the helper function `create_history_aware_retriever`, which handles the case where `chat_history` is empty and otherwise applies the prompt, the LLM, `StrOutputParser()`, and the retriever in sequence.

`create_history_aware_retriever` constructs a chain that accepts the keys `input` and `chat_history` and has the same output schema as a retriever. This lets the system handle user queries that only make sense in light of the conversation so far.

In [64]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

contextualize_q_system_prompt = ("""
    Given a chat history and the latest user question
    which might reference context in the chat history,
    formulate a standalone question which can be understood
    without the chat history. Do NOT answer the question,
    just reformulate it if needed and otherwise return it as is.
"""
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)


history_aware_retriever = create_history_aware_retriever(llm=model, retriever=retriever, prompt=contextualize_q_prompt)




**create_history_aware_retriever()**
- If there is no chat_history, then the input is just passed directly to the retriever.
- If there is chat_history, then the prompt and LLM will be used to generate a search query. That search query is then passed to the retriever.
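
The two cases above can be sketched in plain Python. This is only an illustration of the branching logic, not the library's implementation: `fake_llm_rewrite`, `fake_retriever`, and `sketch_history_aware_retriever` are hypothetical stand-ins, and messages are modelled as simple `(role, text)` tuples.

```python
def fake_llm_rewrite(chat_history: list[tuple[str, str]], question: str) -> str:
    # Placeholder for the prompt + LLM + string-parsing step: a real model would
    # resolve references such as "it" using the history.
    last_user_turn = [text for role, text in chat_history if role == "human"][-1]
    return f"{question} (in the context of: {last_user_turn})"

def fake_retriever(query: str) -> list[str]:
    # Placeholder retriever that just echoes the query it was given.
    return [f"document matching '{query}'"]

def sketch_history_aware_retriever(inputs: dict) -> list[str]:
    chat_history = inputs.get("chat_history", [])
    if not chat_history:
        # No history: pass the raw question straight to the retriever.
        return fake_retriever(inputs["input"])
    # History present: rewrite the question into a standalone query first.
    standalone = fake_llm_rewrite(chat_history, inputs["input"])
    return fake_retriever(standalone)

first = sketch_history_aware_retriever(
    {"input": "What is Task Decomposition?", "chat_history": []}
)
history = [("human", "What is Task Decomposition?"), ("ai", "It breaks tasks into steps.")]
second = sketch_history_aware_retriever(
    {"input": "What are common ways of doing it?", "chat_history": history}
)
print(first)
print(second)
```

Either way the caller gets back a list of documents, which is why the history-aware retriever can be dropped in wherever a plain retriever was used.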

In [65]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import MessagesPlaceholder

from langchain_core.prompts import ChatPromptTemplate

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}")
    ]
)

question_answer_chain = create_stuff_documents_chain(llm=model, prompt=qa_prompt)
rag_chain = create_retrieval_chain(retriever=history_aware_retriever, combine_docs_chain=question_answer_chain)


In [66]:
from langchain_core.messages import HumanMessage, AIMessage

chat_history = []
question = "What is Task Decomposition?"

ai_message_1 = rag_chain.invoke({"input": question, "chat_history": chat_history})

chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content=ai_message_1["answer"]),
    ]
)

second_question = "What are common ways of doing it?"
ai_message_2 = rag_chain.invoke({"input": second_question, "chat_history": chat_history})

ai_message_2["answer"]

'There are three primary methods: \n\n- The LLM can be prompted with simple instructions to break a task down into steps, for instance, asking for "Steps for..." followed by the task description. \n\n- Alternatively, the agent can use more specific instructions designed for the particular task, such as outlining a story. \n\n- The final method involves human input, where a human user provides the breakdown steps.'

In [67]:
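# Record the second turn first, so the follow-up question can build on it.
chat_history.extend([HumanMessage(content=second_question), AIMessage(content=ai_message_2["answer"])])
ai_message_3 = rag_chain.invoke({"input": "any other ways of doing it?", "chat_history": chat_history})
ai_message_3["answer"]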

"Yes, another technique is called Chain of Thought (CoT). It's a prompting method that guides the agent to think step-by-step and use test-time computation to solve complex tasks. Essentially, the agent is instructed to verbalize its thought process, which helps it tackle difficult tasks by breaking them down into smaller, more feasible sub-tasks."

## Stateful management of chat history

To manage chat history in a stateful manner, you can use `BaseChatMessageHistory` to store the chat history and `RunnableWithMessageHistory` to inject it into the chain's inputs and update it after each call.

In [70]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}

def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

conversational_rag_chain = RunnableWithMessageHistory(
    runnable=rag_chain,
    get_session_history=get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)
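
Conceptually, the wrapper looks up the session's history, injects it under the history key, runs the chain, and then records the new question and answer. The sketch below shows that loop in plain Python under simplified assumptions: `sketch_store`, `get_history`, `with_message_history`, and `echo_chain` are hypothetical names, and messages are `(role, text)` tuples rather than LangChain message objects.

```python
from typing import Callable

sketch_store: dict[str, list[tuple[str, str]]] = {}

def get_history(session_id: str) -> list[tuple[str, str]]:
    # Create the session's message list on first use, then reuse it.
    return sketch_store.setdefault(session_id, [])

def with_message_history(chain: Callable[[dict], dict]) -> Callable[[dict, str], dict]:
    def invoke(inputs: dict, session_id: str) -> dict:
        history = get_history(session_id)
        # Inject a copy of the stored messages under the key the chain expects.
        result = chain({**inputs, "chat_history": list(history)})
        # Record the new turn so the next call in this session sees it.
        history.append(("human", inputs["input"]))
        history.append(("ai", result["answer"]))
        return result
    return invoke

def echo_chain(inputs: dict) -> dict:
    # Placeholder chain: reports how many prior messages it was given.
    return {"answer": f"seen {len(inputs['chat_history'])} earlier messages"}

chat = with_message_history(echo_chain)
r1 = chat({"input": "What is Task Decomposition?"}, "abc123")
r2 = chat({"input": "What are common ways of doing it?"}, "abc123")
r3 = chat({"input": "Hello"}, "other")
print(r1["answer"], "|", r2["answer"], "|", r3["answer"])
```

Because history is keyed by session ID, the second call in session `abc123` sees the first exchange, while a different session starts from an empty history.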


In [71]:
conversational_rag_chain.invoke(
    {"input": "What is Task Decomposition?"},
    config={
        "configurable": {"session_id": "abc123"}
    },  # constructs a key "abc123" in `store`.
)["answer"]



"Task decomposition is a process that breaks down complex tasks into simpler, more manageable steps. It's a technique used by autonomous agents to plan and execute multi-step tasks more effectively. This approach improves the agent's ability to complete a task by providing a clear structure and allowing for better interpretation of the model's thought process."

In [72]:
conversational_rag_chain.invoke(
    {"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": "abc123"}},
)["answer"]



'There are three primary methods: \n\n1. The LLM can be prompted with simple instructions to break a task into steps, for instance, asking for "Steps for XYZ".\n\n2. Using more specific instructions tailored to the task, such as outlining story steps for writing a novel.\n\n3. Incorporating human input to guide the task decomposition process. This method often involves human-in-the-loop interactions, where a person provides explicit step-by-step instructions.'

In [73]:
# The conversation history can be inspected in the `store` dict:

for message in store["abc123"].messages:
    if isinstance(message, AIMessage):
        prefix = "AI"
    else:
        prefix = "User"

    print(f"{prefix}: {message.content}\n")

User: What is Task Decomposition?

AI: Task decomposition is a process that breaks down complex tasks into simpler, more manageable steps. It's a technique used by autonomous agents to plan and execute multi-step tasks more effectively. This approach improves the agent's ability to complete a task by providing a clear structure and allowing for better interpretation of the model's thought process.

User: What are common ways of doing it?

AI: There are three primary methods: 

1. The LLM can be prompted with simple instructions to break a task into steps, for instance, asking for "Steps for XYZ".

2. Using more specific instructions tailored to the task, such as outlining story steps for writing a novel.

3. Incorporating human input to guide the task decomposition process. This method often involves human-in-the-loop interactions, where a person provides explicit step-by-step instructions.



![Conversational RAG](https://python.langchain.com/v0.2/assets/images/conversational_retrieval_chain-5c7a96abe29e582bc575a0a0d63f86b0.png)