In [1]:
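import os
from dotenv import load_dotenv

# Load variables from a local .env file into the process environment.
load_dotenv()

# Fail early with a clear message if the LangSmith key is missing.
if not os.environ.get("LANGCHAIN_API_KEY"):
    raise RuntimeError("LANGCHAIN_API_KEY is not set; add it to your .env file.")
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_PROJECT"] = "RAG-memory-chain"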

# Chains
In a conversational RAG application, queries issued to the retriever should be informed by the context of the conversation. LangChain provides a **create_history_aware_retriever** constructor to simplify this. It constructs a chain that accepts the keys **input** and **chat_history**, and has the same output schema as a retriever. **create_history_aware_retriever** requires three inputs:

1. LLM
2. Retriever
3. Prompt
   
First we obtain these objects:

# LLM

In [2]:
# LLM

from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")

# Basic Retriever

In [3]:
# Retriever

import bs4
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
docs = loader.load()

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
splits = text_splitter.split_documents(docs)
vectorstore = Chroma.from_documents(documents=splits, embedding=OpenAIEmbeddings())
retriever = vectorstore.as_retriever()
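
To make the `chunk_size` and `chunk_overlap` settings more concrete, here is a minimal plain-Python sketch of overlapping chunks. It is a simplification: `RecursiveCharacterTextSplitter` splits on separators recursively rather than on fixed windows, and the helper name `sliding_chunks` is hypothetical.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share `chunk_overlap` characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(text, chunk_size=10, chunk_overlap=4)
print(chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk repeats the last four characters of the previous one, so a sentence that straddles a boundary is still seen whole in at least one chunk.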

# Prompt

### First create a contextualized system prompt

We'll use a prompt that includes a MessagesPlaceholder variable under the name "chat_history". This allows us to pass in a list of messages to the prompt using the "chat_history" input key; these messages are inserted after the system message and before the human message containing the latest question.

<span style="color:red">
The purpose of this prerequisite chain is to reformulate the question, if required, in the context of the chat history.

If no reformulation is needed, the question is returned as is.</span>



The chain takes three inputs:
1. The system prompt for question reformulation
2. The chat history
3. The original question

In [4]:
# Prompt

from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
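
The ordering that the placeholder produces can be sketched in plain Python. This is only an illustration of where the history lands, not how `ChatPromptTemplate` is implemented; `build_messages` and the `(role, content)` tuples are hypothetical stand-ins for LangChain message objects.

```python
def build_messages(system_prompt, chat_history, user_input):
    """Return messages in the order: system, prior turns, latest question."""
    return [("system", system_prompt), *chat_history, ("human", user_input)]


history = [
    ("human", "What is Task Decomposition?"),
    ("ai", "Breaking a complex task into smaller steps."),
]
messages = build_messages("Reformulate the question.", history, "What are common ways of doing it?")
for role, content in messages:
    print(f"{role}: {content}")
```

With an empty history the result is just the system message followed by the question, which is why the same prompt works for the first turn of a conversation.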

# 1. Create History Aware Retriever Chain

We can then instantiate the history-aware retriever.

This chain adds a query-rephrasing step in front of our retriever, so that retrieval incorporates the context of the conversation.



In [5]:
history_aware_retriever_chain = create_history_aware_retriever(
    llm, retriever, contextualize_q_prompt
)
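
Conceptually, the history-aware retriever branches on whether any history exists: with no history the input goes straight to the retriever, otherwise the LLM first rewrites it into a standalone query. A minimal sketch of that control flow follows, with hypothetical stand-ins (`fake_rephrase`, `fake_retrieve`) in place of the LLM and the vector store.

```python
def history_aware_retrieve(inputs, rephrase, retrieve):
    """Rephrase the question only when there is chat history to draw on."""
    chat_history = inputs.get("chat_history", [])
    if not chat_history:
        query = inputs["input"]  # first turn: use the question as is
    else:
        query = rephrase(chat_history, inputs["input"])
    return retrieve(query)


# Hypothetical stand-ins for the LLM rephrasing step and the retriever.
def fake_rephrase(chat_history, question):
    return f"{question} (about: {chat_history[0][1]})"


def fake_retrieve(query):
    return [f"doc matching: {query}"]


first_turn = history_aware_retrieve(
    {"input": "What is Task Decomposition?", "chat_history": []},
    fake_rephrase,
    fake_retrieve,
)
follow_up = history_aware_retrieve(
    {
        "input": "What are common ways of doing it?",
        "chat_history": [("human", "What is Task Decomposition?")],
    },
    fake_rephrase,
    fake_retrieve,
)
print(first_turn)
print(follow_up)
```

The follow-up query now carries the topic of the earlier turn, which is what lets a vague question such as "What are common ways of doing it?" retrieve relevant documents.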

# 2. Build QA Chain

As in the RAG tutorial, we will use **create_stuff_documents_chain** to generate a **question_answer_chain**, with input keys 

1. **context**
2. **chat_history**
3. **input**

It accepts the retrieved context alongside the conversation history and query to generate an answer.



In [6]:
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
question_answer_chain = create_stuff_documents_chain(llm, qa_prompt)
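
The "stuff" strategy simply places all retrieved documents into the prompt's `{context}` slot. The sketch below illustrates the idea in plain Python, assuming the documents are joined with a blank line between them; `stuff_documents` and the sample strings are hypothetical.

```python
def stuff_documents(page_contents, separator="\n\n"):
    """Concatenate document texts into one context string."""
    return separator.join(page_contents)


template = "Use the following context to answer the question.\n\n{context}"
docs = ["Chain of thought decomposes hard tasks.", "Tree of Thoughts extends CoT."]
system_message = template.format(context=stuff_documents(docs))
print(system_message)
```

Because every document is included verbatim, this approach works well only while the retrieved chunks fit comfortably in the model's context window.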

# 3. Build RAG Chain


We build our final **rag_chain** with **create_retrieval_chain**.

This chain applies the **history_aware_retriever_chain** and **question_answer_chain** (created above) in sequence, retaining intermediate outputs such as the retrieved context for convenience. Its input keys are **input** and **chat_history**, and its output includes **input**, **chat_history**, **context**, and **answer**.


In [7]:
rag_chain = create_retrieval_chain(history_aware_retriever_chain, question_answer_chain)
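
The data flow through the combined chain can be pictured as two steps that each add a key to the running dictionary. This is a plain-Python sketch of that flow rather than the library's implementation; `run_rag_chain`, `fake_retrieve`, and `fake_answer` are hypothetical names.

```python
def run_rag_chain(inputs, retrieve, answer):
    # Step 1: retrieve documents using the full inputs (question plus history).
    context = retrieve(inputs)
    # Step 2: keep the original inputs and add the retrieved context.
    with_context = {**inputs, "context": context}
    # Step 3: generate the answer and return everything together.
    return {**with_context, "answer": answer(with_context)}


def fake_retrieve(inputs):
    return [f"doc about {inputs['input']}"]


def fake_answer(inputs):
    return f"Answer based on {len(inputs['context'])} document(s)."


result = run_rag_chain(
    {"input": "task decomposition", "chat_history": []},
    fake_retrieve,
    fake_answer,
)
print(list(result))
# ['input', 'chat_history', 'context', 'answer']
```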

# Adding chat history
To manage the chat history, we will need:

1. An object for storing the chat history.
2. An object that wraps our chain and manages updates to the chat history.

For these we will use **BaseChatMessageHistory** and **RunnableWithMessageHistory**. The latter wraps an LCEL chain and a BaseChatMessageHistory, injecting the chat history into the inputs and updating it after each invocation.

In [8]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]
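
The lookup pattern above creates a history on first use and returns the same object afterwards, so each session ID maps to its own conversation. A self-contained sketch of that behaviour, using a hypothetical `DemoHistory` class and separate names so it does not overwrite `store`:

```python
class DemoHistory:
    """Hypothetical stand-in for ChatMessageHistory: just a list of messages."""

    def __init__(self):
        self.messages = []


demo_store = {}


def get_demo_history(session_id):
    # Create the history lazily the first time a session ID is seen.
    if session_id not in demo_store:
        demo_store[session_id] = DemoHistory()
    return demo_store[session_id]


first = get_demo_history("session-a")
first.messages.append("hello")
same = get_demo_history("session-a")   # same object, message still there
other = get_demo_history("session-b")  # separate, empty history
print(same.messages, other.messages)
```

Since the store is an ordinary dictionary, all histories are lost when the process exits; a persistent backend would be needed for anything beyond a demo.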


In [9]:
# Function to generate a unique session ID
import uuid

def generate_session_id() -> str:
    return str(uuid.uuid4())

# 4. Build Conversational RAG Chain


Finally, we build the **conversational_rag_chain** by passing the following to **RunnableWithMessageHistory**:
1. rag_chain
2. get_session_history
3. input_messages_key="input"
4. history_messages_key="chat_history"
5. output_messages_key="answer"


In [10]:
conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)
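
Roughly speaking, the wrapper loads the session's history, passes it to the chain under the history key, and then appends the new question and answer. The following plain-Python sketch mimics that cycle under simplified assumptions; `invoke_with_history` and `fake_chain` are hypothetical, and messages are plain tuples rather than LangChain message objects.

```python
def invoke_with_history(chain, histories, session_id, user_input):
    """Inject stored history into the inputs, then record the new turn."""
    history = histories.setdefault(session_id, [])
    # Pass a copy so the chain sees the history as it was before this turn.
    result = chain({"input": user_input, "chat_history": list(history)})
    history.append(("human", user_input))
    history.append(("ai", result["answer"]))
    return result


def fake_chain(inputs):
    turn = len(inputs["chat_history"]) // 2 + 1
    return {**inputs, "answer": f"answer #{turn}"}


histories = {}
first = invoke_with_history(fake_chain, histories, "s1", "What is Task Decomposition?")
second = invoke_with_history(fake_chain, histories, "s1", "What are common ways of doing it?")
print(first["chat_history"])        # []
print(len(second["chat_history"]))  # 2
```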

In [11]:
# Generating a dynamic session ID
session_id_1 = generate_session_id()

In [12]:
session_id_1

'80d3bb01-0503-46c1-ba97-f2b083450275'

In [13]:
response = conversational_rag_chain.invoke(
    {"input": "What is Task Decomposition?"},
    config={
        "configurable": {"session_id": session_id_1}
    },  # creates an entry keyed by session_id_1 in `store`
)

In [14]:
response

{'input': 'What is Task Decomposition?',
 'chat_history': [],
 'context': [Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.'),
  Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It

In [15]:
response["answer"]

'Task decomposition is the process of breaking down a complex task into smaller, more manageable steps. This can be achieved through techniques like Chain of Thought (CoT), where the model is prompted to think step by step, or Tree of Thoughts, which explores multiple reasoning possibilities at each step. By decomposing tasks, it becomes easier to tackle each component effectively, enhancing overall performance.'

In [16]:
conversational_rag_chain.invoke(
    {"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": session_id_1}},
)["answer"]

'Common ways of task decomposition include using simple prompting techniques like asking for "Steps for XYZ" or "What are the subgoals for achieving XYZ?". It can also involve task-specific instructions, such as "Write a story outline" for creative tasks, or incorporating human inputs to guide the decomposition process.'

In [17]:
conversational_rag_chain.invoke(
    {"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": session_id_1}},
)

{'input': 'What are common ways of doing it?',
 'chat_history': [HumanMessage(content='What is Task Decomposition?'),
  AIMessage(content='Task decomposition is the process of breaking down a complex task into smaller, more manageable steps. This can be achieved through techniques like Chain of Thought (CoT), where the model is prompted to think step by step, or Tree of Thoughts, which explores multiple reasoning possibilities at each step. By decomposing tasks, it becomes easier to tackle each component effectively, enhancing overall performance.'),
  HumanMessage(content='What are common ways of doing it?'),
  AIMessage(content='Common ways of task decomposition include using simple prompting techniques like asking for "Steps for XYZ" or "What are the subgoals for achieving XYZ?". It can also involve task-specific instructions, such as "Write a story outline" for creative tasks, or incorporating human inputs to guide the decomposition process.')],
 'context': [Document(metadata={'sou

# Streaming final outputs


By default, the .stream method streams each output key in sequence.

In [30]:
stream= conversational_rag_chain.stream(
    {"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": session_id_1}},
)

In [31]:
for chunk in stream:
    print(chunk)

{'input': 'What are common ways of doing it?', 'chat_history': [HumanMessage(content='What is Task Decomposition?'), AIMessage(content='Task decomposition is the process of breaking down a complex task into smaller, more manageable steps. This can be achieved through techniques like Chain of Thought (CoT), where the model is prompted to think step by step, or Tree of Thoughts, which explores multiple reasoning possibilities at each step. By decomposing tasks, it becomes easier to tackle each component effectively, enhancing overall performance.'), HumanMessage(content='What are common ways of doing it?'), AIMessage(content='Common ways of task decomposition include using simple prompting techniques like asking for "Steps for XYZ" or "What are the subgoals for achieving XYZ?". It can also involve task-specific instructions, such as "Write a story outline" for creative tasks, or incorporating human inputs to guide the decomposition process.'), HumanMessage(content='What are common ways o

# Streaming only Answers

We are free to process chunks as they are streamed out. If we just want to stream the answer tokens, for example, we can select chunks with the corresponding key:

In [32]:
stream= conversational_rag_chain.stream(
    {"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": session_id_1}},
)

Stream with a separator character (`|`) between chunks

In [33]:
for chunk in stream:
    if answer_chunk := chunk.get("answer"):
        print(f"{answer_chunk}|", end="")

Common| ways| of| task| decomposition| include| using| simple| prompts| like| "|Steps| for| XYZ|"| or| "|What| are| the| sub|go|als| for| achieving| XYZ|?",| applying| task|-specific| instructions| such| as| "|Write| a| story| outline|,"| and| incorporating| human| inputs| for| additional| guidance|.| These| methods| help| break| down| complex| tasks| into| smaller|,| more| manageable| steps|.|

Stream without a separator character

In [34]:
stream= conversational_rag_chain.stream(
    {"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": session_id_1}},
)

In [35]:
for chunk in stream:
    if answer_chunk := chunk.get("answer"):
        print(f"{answer_chunk}", end="")

Common ways of task decomposition include using simple prompting techniques like asking for "Steps for XYZ" or "What are the subgoals for achieving XYZ?" Additionally, it can involve task-specific instructions, such as "Write a story outline," or using human inputs to guide the decomposition process.
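
The filtering loop can be tried without calling the model by feeding it a hand-written stream. The sketch below assumes each chunk is a dictionary carrying a single key and that the first answer chunk may be an empty string; `fake_stream` is a hypothetical stand-in for `conversational_rag_chain.stream(...)`.

```python
def fake_stream():
    """Yield chunks shaped like the chain's streamed output (assumed shape)."""
    yield {"input": "What are common ways of doing it?"}
    yield {"chat_history": []}
    yield {"context": ["doc 1", "doc 2"]}
    for token in ["", "Common", " ways", "."]:
        yield {"answer": token}


answer_parts = []
for chunk in fake_stream():
    # The walrus test skips chunks without an "answer" key and empty tokens.
    if answer_chunk := chunk.get("answer"):
        answer_parts.append(answer_chunk)

answer_text = "".join(answer_parts)
print(answer_text)
# Common ways.
```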

### Stream the answer using the .pick method
More simply, we can use the .pick method to select only the desired key:

Reference: https://python.langchain.com/v0.2/api_reference/core/runnables/langchain_core.runnables.base.Runnable.html#langchain_core.runnables.base.Runnable.pick



In [36]:
stream= conversational_rag_chain.stream(
    {"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": session_id_1}},
)

In [37]:
pick_answer_chain = conversational_rag_chain.pick("answer")


In [40]:
stream= pick_answer_chain.stream(
    {"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": session_id_1}},
)

In [41]:
for chunk in stream:
    print(f"{chunk}", end="")

Common ways of task decomposition include using simple prompts like "Steps for XYZ" or "What are the subgoals for achieving XYZ?", applying task-specific instructions, such as "Write a story outline," and incorporating human inputs for guidance. These methods facilitate breaking down complex tasks into smaller, manageable parts.
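
The effect of picking a single key can be approximated by a small generator that wraps another stream. This is an illustrative sketch only, not the implementation of `Runnable.pick`; `pick_key` and the sample chunks are hypothetical.

```python
def pick_key(chunks, key):
    """Yield only the values stored under `key`, dropping all other chunks."""
    for chunk in chunks:
        if key in chunk:
            yield chunk[key]


chunks = [
    {"input": "q"},
    {"context": ["doc"]},
    {"answer": "Task"},
    {"answer": " decomposition"},
]
picked = list(pick_key(chunks, "answer"))
print("".join(picked))
# Task decomposition
```

Wrapping the stream this way moves the key selection out of the consuming loop, so the caller can print each chunk directly.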