### Set Environment Variables

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()

os.environ["GROQ_API_KEY"] = os.getenv("GROQ_API_KEY")

#langsmith tracking
os.environ["LANGCHAIN_API_KEY"] = os.getenv("LANGCHAIN_API_KEY")
os.environ["LANGCHAIN_TRACKING_V2"]="true"
os.environ["LANGCHAIN_PROJECT"] = os.getenv("LANGCHAIN_PROJECT")

os.environ["HF_TOKEN"] = os.getenv("HUGGINGFACE_API_KEY")

### Loading or Extracting Web Page

In [36]:
from langchain_groq import ChatGroq
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
import bs4

loader = WebBaseLoader(
    web_paths=("https://python.langchain.com/v0.2/docs/tutorials/qa_chat_history/",),
    bs_kwargs=dict(parse_only=bs4.SoupStrainer("div",attrs={"class": "theme-doc-markdown markdown"}))
)

docs=loader.load()
docs

[Document(metadata={'source': 'https://python.langchain.com/v0.2/docs/tutorials/qa_chat_history/'}, page_content='Conversational RAGPrerequisitesThis guide assumes familiarity with the following concepts:Chat historyChat modelsEmbeddingsVector storesRetrieval-augmented generationToolsAgentsIn many Q&A applications we want to allow the user to have a back-and-forth conversation, meaning the application needs some sort of "memory" of past questions and answers, and some logic for incorporating those into its current thinking.In this guide we focus on adding logic for incorporating historical messages. Further details on chat history management is covered here.We will cover two approaches:Chains, in which we always execute a retrieval step;Agents, in which we give an LLM discretion over whether and how to execute a retrieval step (or multiple steps).For the external knowledge source, we will use the same LLM Powered Autonomous Agents blog post by Lilian Weng from the RAG tutorial.Setup\u2

### Split Documents

In [48]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000,chunk_overlap=200)
split_docs = text_splitter.split_documents(docs)
split_docs

[Document(metadata={'source': 'https://python.langchain.com/v0.2/docs/tutorials/qa_chat_history/'}, page_content='Conversational RAGPrerequisitesThis guide assumes familiarity with the following concepts:Chat historyChat modelsEmbeddingsVector storesRetrieval-augmented generationToolsAgentsIn many Q&A applications we want to allow the user to have a back-and-forth conversation, meaning the application needs some sort of "memory" of past questions and answers, and some logic for incorporating those into its current thinking.In this guide we focus on adding logic for incorporating historical messages. Further details on chat history management is covered here.We will cover two approaches:Chains, in which we always execute a retrieval step;Agents, in which we give an LLM discretion over whether and how to execute a retrieval step (or multiple steps).For the external knowledge source, we will use the same LLM Powered Autonomous Agents blog post by Lilian Weng from the RAG tutorial.Setup\u2

### Embed these documents

In [49]:
hf_embed = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
db_faiss = FAISS.from_documents(split_docs, hf_embed)

### Create Prompt Template & Chat History

In [53]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

contextualize_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system",contextualize_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

In [54]:
from langchain.chains import create_history_aware_retriever

groq_llm = ChatGroq()
faissRetriever = db_faiss.as_retriever()
history_aware_retriever = create_history_aware_retriever(llm=groq_llm, retriever=faissRetriever,prompt=contextualize_q_prompt)


In [55]:
system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system",system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

In [56]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

stuff_docs_chain = create_stuff_documents_chain(groq_llm,qa_prompt)
retrieval_chain = create_retrieval_chain(history_aware_retriever, stuff_docs_chain)

In [57]:
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

store = {}
def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]

with_message_history = RunnableWithMessageHistory(retrieval_chain, get_session_history, 
    input_messages_key="question",
    history_messages_key="chat_history",
    output_messages_key="answer")

In [58]:
with_message_history.invoke(
    {"input": "Can you explain me the flow of what is conversational RAG?"},
    config={
        "configurable": {"session_id": "abc1233"}
    },  # constructs a key "abc123" in `store`.
)["answer"]

Error in RootListenersTracer.on_chain_end callback: KeyError('question')


'Conversational Retrieval-Augmented Generation (RAG) involves breaking down complex tasks into smaller steps, known as task decomposition. This can be achieved through methods like Chain of Thought or Tree of Thoughts. In the context of Conversational RAG, an agent uses Language Model (LLM) reasoning capabilities to make decisions during execution, allowing for discretion over the retrieval process. To manage the chat history in a conversation, input and output messages can be appended to a list. For example, to ask a question and a follow-up question, you can create a chat history by extending it with HumanMessage and AIMessage objects.'

In [59]:
print(with_message_history.invoke(
    {"input": "How is Conversational RAG implemented? Give High level steps"},
    config={
        "configurable": {"session_id": "abc1233"}
    },  # constructs a key "abc123" in `store`.
)["answer"])

Error in RootListenersTracer.on_chain_end callback: KeyError('question')


Conversational RAG involves two approaches for incorporating historical messages: Chains, which always execute a retrieval step, and Agents, which give an LLM discretion over retrieval. Task decomposition is a key aspect, which can be achieved through techniques like Chain of Thought, Tree of Thoughts, or by using prompts, task-specific instructions, or human inputs to break down complex tasks into manageable subtasks. The conversation history needs to be managed by appending input and output messages to a list.


In [60]:
print(with_message_history.invoke(
    {"input": "What type of chains are used?"},
    config={
        "configurable": {"session_id": "abc1233"}
    },  # constructs a key "abc123" in `store`.
)["answer"])

Error in RootListenersTracer.on_chain_end callback: KeyError('question')


Chains consisting of predictable applications that generate search queries for each user input are used to build a question-answering application. For more information on different types of retrievers and retrieval strategies, you can visit the retrievers section of the how-to guides.
