### Conversational Q&A Chatbot
In many Q&A applications we want to allow the user to have a back-and-forth conversation, meaning the application needs some sort of "memory" of past questions and answers, and some logic for incorporating those into its current thinking.

In this guide we focus on adding logic for incorporating historical messages. Further details on chat history management are covered in the previous videos.

We will cover two approaches:

- Chains, in which we always execute a retrieval step;
- Agents, in which we give an LLM discretion over whether and how to execute a retrieval step (or multiple steps).

In [18]:
import os
from dotenv import load_dotenv
import bs4
from langchain_community.document_loaders import WebBaseLoader                  # Data Ingestion
from langchain_text_splitters import RecursiveCharacterTextSplitter             # Data Transformation
from langchain_community.embeddings import OllamaEmbeddings                     # Embedding
from langchain_chroma import Chroma                                             # Vector Store
from langchain_core.prompts import ChatPromptTemplate                           # Prompt Template
from langchain_groq import ChatGroq                                             # LLM
from langchain.chains import create_retrieval_chain                             # Retrieval Chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.messages import AIMessage, HumanMessage
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory
load_dotenv()

True

In [2]:
# 1. Data Ingestion
#    Load, chunk and index the contents of the blog to create a retriever.

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)

documents = loader.load()

In [3]:
# 2. Data Transformation
#    We will convert the document into chunks for processing

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = text_splitter.split_documents(documents=documents)
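
To make the effect of `chunk_size` and `chunk_overlap` concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not how `RecursiveCharacterTextSplitter` works internally (that splitter also tries to break on natural separators), and `split_with_overlap` is a hypothetical helper written only for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters, each sharing
    chunk_overlap characters with the previous window."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=4)
for piece in pieces:
    print(piece)
```

Each window repeats the last four characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.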

In [4]:
# 3. Embedding
#    After converting the documents into chunks we need to convert those into vectors

embeddings = OllamaEmbeddings(model="gemma2:2b")

In [5]:
# 4. Vector Database
#    We have created the embedding model; now embed the chunks and store the vectors in the database.

vector_store = Chroma.from_documents(documents=chunks, embedding=embeddings)

In [6]:
# 5. Retriever

retriever = vector_store.as_retriever()
retriever

VectorStoreRetriever(tags=['Chroma', 'OllamaEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x11797faf0>)

In [7]:
# 6. Prompt Template

system_prompt = (
    """
        You are an assistant for question-answering tasks.
        Use the following pieces of retrieved context to answer
        the question. If you don't know the answer, say that you 
        don't know. Use three sentences at maximum and keep the
        answer concise.
        \n\n
        {context}
    """
)

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("user", "{input}")
    ]
)

In [8]:
# 7. LLM Initialisation

groq_api_key=os.getenv("GROQ_API_KEY")
llm = ChatGroq(groq_api_key=groq_api_key,model_name="Llama3-8b-8192")

In [9]:
# 8. Question Answer Chain & RAG Chain

question_answer_chain = create_stuff_documents_chain(llm=llm, prompt=prompt)
rag_chain = create_retrieval_chain(retriever, question_answer_chain)

### 1. **Retrieval Chain (`create_retrieval_chain`)**:
   - **Purpose**: The retrieval chain is responsible for fetching relevant documents or information from a knowledge base (e.g., a vector store) based on the user's query.
   - **How it works**:
     - When the user asks a question, the retrieval chain searches through the stored documents to find the most relevant pieces of information.
     - It uses embeddings to represent both the query and the documents as vectors, allowing it to calculate similarities and identify the closest matches.
     - The retrieved documents are then passed to the combination chain under the `context` key.
   - **Key Components**:
     - **Embeddings**: Convert text into vector representations. Here, `OllamaEmbeddings` are used to generate these vectors.
     - **Vector Store**: Stores the document embeddings. `Chroma` is used as the vector store, enabling fast retrieval based on vector similarity.
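
The similarity search described above can be sketched with plain Python. The three-dimensional vectors below are hand-made stand-ins for real embeddings, and `cosine_similarity` and `top_k` are hypothetical helpers; the metric and index that `Chroma` actually uses are configurable and are not reproduced here.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the names of the k documents most similar to the query."""
    ranked = sorted(
        doc_vecs,
        key=lambda name: cosine_similarity(query_vec, doc_vecs[name]),
        reverse=True,
    )
    return ranked[:k]


# Hand-made toy "embeddings"; real ones have hundreds of dimensions.
doc_vecs = {
    "self_reflection": [0.9, 0.1, 0.0],
    "task_decomposition": [0.1, 0.9, 0.0],
    "tool_use": [0.0, 0.2, 0.9],
}
query_vec = [0.8, 0.2, 0.1]

best_matches = top_k(query_vec, doc_vecs, k=2)
print(best_matches)
```

The query vector points mostly along the first axis, so the document whose vector does the same ranks first.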

### 2. **Combination Chain (`create_stuff_documents_chain`)**:
   - **Purpose**: The combination chain takes the documents retrieved by the retrieval chain and synthesizes them into a single coherent response.
   - **How it works**:
     - After receiving the relevant documents from the retrieval chain, the combination chain inserts ("stuffs") all of them into the prompt's `{context}` placeholder and sends the result to the LLM in a single call.
     - The LLM then summarizes, rephrases, or directly extracts content from that context to form the final output.
   - **Key Components**:
     - **LLM (Language Model)**: The LLM is responsible for understanding the context of the retrieved documents and generating the final text. In this case, the `ChatGroq` model is used.
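
As a rough illustration of the "stuff" strategy, the sketch below joins the retrieved texts and drops them into a prompt template before any model is called. `stuff_documents` is a hypothetical helper, the two-newline separator is an assumption, and no LLM is involved; it only shows what the final prompt might look like.

```python
def stuff_documents(template: str, documents: list[str], question: str,
                    separator: str = "\n\n") -> str:
    """Join all documents into one context string and fill the template."""
    context = separator.join(documents)
    return template.format(context=context, input=question)


template = (
    "Use the following context to answer the question.\n\n"
    "{context}\n\n"
    "Question: {input}"
)
retrieved = [
    "Self-reflection lets agents refine past action decisions.",
    "Task decomposition breaks a complex task into smaller steps.",
]

final_prompt = stuff_documents(template, retrieved, "What is self-reflection?")
print(final_prompt)
```

Because every retrieved document lands in one prompt, this approach is simple but only works while the combined text fits in the model's context window.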

### How the Chains Work Together:
- **Step 1**: The user inputs a question.
- **Step 2**: The **Retrieval Chain** processes the question, searches the vector store, and retrieves the most relevant documents.
- **Step 3**: The **Combination Chain** takes these documents and creates a structured response, which is then returned to the user.
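
The three steps can be sketched as a small function composition. This is a toy stand-in, not the LangChain implementation: `make_rag_chain`, `fake_retrieve`, and `fake_answer` are hypothetical names, and the stubs return canned text instead of calling a vector store or an LLM. The returned dictionary mirrors the `input`, `context`, and `answer` keys used in this notebook.

```python
from typing import Callable


def make_rag_chain(
    retrieve: Callable[[str], list[str]],
    answer_from_docs: Callable[[str, list[str]], str],
) -> Callable[[dict], dict]:
    """Compose a retrieval step and an answering step into one callable."""
    def invoke(inputs: dict) -> dict:
        docs = retrieve(inputs["input"])                   # Step 2: retrieve
        answer = answer_from_docs(inputs["input"], docs)   # Step 3: combine
        return {**inputs, "context": docs, "answer": answer}
    return invoke


def fake_retrieve(question: str) -> list[str]:
    # Stub: a real retriever would query the vector store.
    return ["Self-reflection lets agents refine past action decisions."]


def fake_answer(question: str, docs: list[str]) -> str:
    # Stub: a real chain would prompt an LLM with the documents.
    return f"Based on {len(docs)} document(s): {docs[0]}"


toy_chain = make_rag_chain(fake_retrieve, fake_answer)
result = toy_chain({"input": "What is Self Reflection?"})  # Step 1: user question
print(result["answer"])
```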

In [10]:
response = rag_chain.invoke({"input":"What is Self Reflection?"})
response

{'input': 'What is Self Reflection?',
 'context': [Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Another quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.\nSelf-Reflection#\nSelf-reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action d

## Adding Chat History

In [11]:
from langchain.chains import create_history_aware_retriever
from langchain_core.prompts import MessagesPlaceholder

# create_history_aware_retriever - This works as a retriever but it is aware of the history of the conversation
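
A rough sketch of the behaviour: when the chat history is empty, the user's question goes straight to the retriever; otherwise the question is first rewritten into a standalone query and that rewritten query is retrieved. The names `make_history_aware_retriever`, `fake_rephrase`, and `fake_retrieve` are hypothetical, and the stubs stand in for the LLM and the vector store.

```python
from typing import Callable


def make_history_aware_retriever(
    rephrase: Callable[[list[str], str], str],
    retrieve: Callable[[str], list[str]],
) -> Callable[[dict], list[str]]:
    """Retrieve with the raw question, or with a rephrased one if history exists."""
    def invoke(inputs: dict) -> list[str]:
        history = inputs.get("chat_history")
        if not history:
            query = inputs["input"]                     # nothing to resolve
        else:
            query = rephrase(history, inputs["input"])  # make it standalone
        return retrieve(query)
    return invoke


seen_queries: list[str] = []


def fake_retrieve(query: str) -> list[str]:
    # Stub retriever that records the query it was given.
    seen_queries.append(query)
    return [f"document matching: {query}"]


def fake_rephrase(history: list[str], question: str) -> str:
    # Stub: a real LLM would rewrite the question; this just appends the first turn.
    return f"{question} [context: {history[0]}]"


toy_retriever = make_history_aware_retriever(fake_rephrase, fake_retrieve)
toy_retriever({"input": "What is Self-Reflection", "chat_history": []})
toy_retriever({
    "input": "Tell me more about it",
    "chat_history": ["What is Self-Reflection", "It lets agents refine past actions."],
})
print(seen_queries)
```

Only the second call is rephrased, which is what lets a vague follow-up such as "it" still retrieve relevant documents.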

In [12]:
# 9. Contextualized Prompt

contextualized_q_system_prompt = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

contextualize_q_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", contextualized_q_system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

In [13]:
history_aware_retriever = create_history_aware_retriever(llm, retriever, contextualize_q_prompt)
history_aware_retriever

RunnableBinding(bound=RunnableBranch(branches=[(RunnableLambda(lambda x: not x.get('chat_history', False)), RunnableLambda(lambda x: x['input'])
| VectorStoreRetriever(tags=['Chroma', 'OllamaEmbeddings'], vectorstore=<langchain_chroma.vectorstores.Chroma object at 0x11797faf0>))], default=ChatPromptTemplate(input_variables=['chat_history', 'input'], input_types={'chat_history': typing.List[typing.Union[langchain_core.messages.ai.AIMessage, langchain_core.messages.human.HumanMessage, langchain_core.messages.chat.ChatMessage, langchain_core.messages.system.SystemMessage, langchain_core.messages.function.FunctionMessage, langchain_core.messages.tool.ToolMessage]]}, messages=[SystemMessagePromptTemplate(prompt=PromptTemplate(input_variables=[], template='Given a chat history and the latest user question which might reference context in the chat history, formulate a standalone question which can be understood without the chat history. Do NOT answer the question, just reformulate it if neede

In [14]:
qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

In [15]:
question_answer_chain = create_stuff_documents_chain(llm,qa_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever,question_answer_chain)

In [16]:
chat_history=[]

question="What is Self-Reflection"
response1 = rag_chain.invoke({"input":question,"chat_history":chat_history})

chat_history.extend(
    [
        HumanMessage(content=question),
        AIMessage(content = response1["answer"])
    ]
)

question2 = "Tell me more about it?"
response2 = rag_chain.invoke({"input": question2, "chat_history": chat_history})  # pass the follow-up, not the first question

print(response2['answer'])

Self-Reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes.


In [17]:
chat_history

[HumanMessage(content='What is Self-Reflection'),
 AIMessage(content='According to the provided context, Self-Reflection is a vital aspect that allows autonomous agents to improve iteratively by refining past action decisions and correcting previous mistakes. It plays a crucial role in real-world tasks where trial and error are inevitable.')]

In [19]:
store = {}


def get_session_history(session_id: str) -> BaseChatMessageHistory:
    if session_id not in store:
        store[session_id] = ChatMessageHistory()
    return store[session_id]


conversational_rag_chain = RunnableWithMessageHistory(
    rag_chain,
    get_session_history,
    input_messages_key="input",
    history_messages_key="chat_history",
    output_messages_key="answer",
)
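
Conceptually, the wrapper looks up the history for the given session, injects it into the chain input, and appends the new question and answer afterwards. The sketch below imitates that flow with plain Python; `toy_store`, `with_message_history`, and `echo_chain` are hypothetical names, and the stub chain only reports how many earlier messages it received.

```python
from typing import Callable

toy_store: dict[str, list[tuple[str, str]]] = {}


def get_history(session_id: str) -> list[tuple[str, str]]:
    """Return the message list for a session, creating it on first use."""
    return toy_store.setdefault(session_id, [])


def with_message_history(chain: Callable[[dict], dict]) -> Callable[..., dict]:
    """Wrap a chain so each call reads and updates per-session history."""
    def invoke(inputs: dict, session_id: str) -> dict:
        history = get_history(session_id)
        result = chain({**inputs, "chat_history": list(history)})
        history.append(("human", inputs["input"]))
        history.append(("ai", result["answer"]))
        return result
    return invoke


def echo_chain(inputs: dict) -> dict:
    # Stub chain: reports how much history it was given.
    return {"answer": f"seen {len(inputs['chat_history'])} prior messages"}


chat = with_message_history(echo_chain)
first = chat({"input": "What is Task Decomposition?"}, session_id="abc123")
second = chat({"input": "What are common ways of doing it?"}, session_id="abc123")
other = chat({"input": "What are common ways of doing it?"}, session_id="abc456")
print(first["answer"], second["answer"], other["answer"], sep="\n")
```

The second call in session `abc123` sees the first exchange, while the new session `abc456` starts with no history at all.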

In [20]:
conversational_rag_chain.invoke(
    {"input": "What is Task Decomposition?"},
    config={
        "configurable": {"session_id": "abc123"}
    },  # constructs a key "abc123" in `store`.
)["answer"]

'Task decomposition is a process in which a complex task is broken down into smaller, more manageable subtasks or steps. This can be done by a Large Language Model (LLM) using simple prompting, task-specific instructions, or human inputs. The goal of task decomposition is to make it easier to solve the original task by focusing on one step at a time.'

In [21]:
conversational_rag_chain.invoke(
    {"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": "abc123"}},
)["answer"]

'Common ways of doing task decomposition include:\n\n1. Chain of Thought (CoT): Instructing the model to "think step by step" to utilize more test-time computation and decompose hard tasks into smaller and simpler steps.\n2. History-based prompting: Presenting a history of sequentially improved outputs in context and training the model to take on the trend to produce better outputs.\n\nNote that these methods are specifically mentioned in the context provided. There may be other approaches to task decomposition as well.'

In [27]:
# A new session
conversational_rag_chain.invoke(
    {"input": "What are common ways of doing it?"},
    config={"configurable": {"session_id": "abc456"}},
)["answer"]

'I don\'t know the context of "it", but based on the provided context, it seems that "it" might refer to tool use, generative agents, or task decomposition. If it\'s about tool use, common ways of doing it include equipping LLMs with external tools to extend their capabilities. For generative agents, common ways of doing it include combining LLM with memory, planning, and reflection mechanisms to enable agents to behave conditioned on past experience. If it\'s about task decomposition, common ways of doing it include using simple prompting, task-specific instructions, or human inputs.'

In [28]:
store

{'abc123': InMemoryChatMessageHistory(messages=[HumanMessage(content='What is Task Decomposition?'), AIMessage(content='Task decomposition is a process in which a complex task is broken down into smaller, more manageable subtasks or steps. This can be done by a Large Language Model (LLM) using simple prompting, task-specific instructions, or human inputs. The goal of task decomposition is to make it easier to solve the original task by focusing on one step at a time.'), HumanMessage(content='What are common ways of doing it?'), AIMessage(content='Common ways of doing task decomposition include:\n\n1. Chain of Thought (CoT): Instructing the model to "think step by step" to utilize more test-time computation and decompose hard tasks into smaller and simpler steps.\n2. History-based prompting: Presenting a history of sequentially improved outputs in context and training the model to take on the trend to produce better outputs.\n\nNote that these methods are specifically mentioned in the c