# Part 5: Advanced techniques
* multiquery technique
* chat history

## IMPORTANT: Installation with the exact packages we used
* When you download a full-stack app, make sure that both the backend and the frontend use the exact package versions the app was built with, to avoid errors caused by installing newer versions of these packages.
* Since we used pip to install the original backend packages and froze them using pip freeze, you will now use "pip install -r requirements.txt" to install them. Since we also used poetry, you will also use "poetry install".
* Since we used npx to install the original frontend packages, you will now use "npm ci" to install them.
#### Download the code
* Download the code from the github repository.
#### Backend installation
* Since we used both pyenv and poetry to build this project, you will have to use the following approach to install the backend.
* In the terminal, make sure you are in the root directory of the project (v1-166-part5). Pay attention: the root directory of the project and the backend directory have identical names. Do not confuse them; make sure you are in the root directory of the project now.
* **Create a virtual environment and use pip install to make sure you install the exact same packages we used**:
    * pyenv virtualenv 3.11.4 your-virtual-environment-name
    * pyenv activate your-virtual-environment-name
    * pip install -r requirements.txt
* **Go to the backend directory, create a virtual environment and use poetry install to make sure you install the exact same packages we used**:
    * cd v1-166-part5
    * poetry install
#### Frontend installation
* Open a second terminal window and make sure you are in the root directory of the project (v1-166-part5). Pay attention: the root directory of the project and the backend directory have identical names. Do not confuse them; make sure you are in the root directory of the project now.
* **Go to the frontend directory, and use npm ci to make sure you install the exact same packages we used**:
    * cd frontend
    * npm ci
#### Ready to go!
* You can now see the code of the app in Visual Studio Code.
* Relax and review the following steps. Remember, since you have already installed the packages, you will not have to install them again.
* **IMPORTANT**: due to the changes we have made in rag_chain.py, if you try to run the backend now you will see an error message. For this lecture, just run the frontend and wait until the following lecture to run the backend.

# Multiquery technique

* First, let's update the final_chain in app/rag_chain.py. We will go from this:

In [None]:
# final_chain = (
#         RunnableParallel(
#             context=(itemgetter("question") | vector_store.as_retriever()),
#             question=itemgetter("question")
#         ) |
#         RunnableParallel(
#             answer=(ANSWER_PROMPT | llm),
#             docs=itemgetter("context")
#         )
# ).with_types(input_type=RagInput)

* To this (see the change in line 3: `multiquery`):

In [1]:
# final_chain = (
#         RunnableParallel(
#             context=(itemgetter("question") | multiquery),
#             question=itemgetter("question")
#         ) |
#         RunnableParallel(
#             answer=(ANSWER_PROMPT | llm),
#             docs=itemgetter("context")
#         )
# ).with_types(input_type=RagInput)

* Now we need to define this new multiquery variable. Before the final_chain, we will add:

In [2]:
# multiquery = MultiQueryRetriever.from_llm(
#     retriever=vector_store.as_retriever(),
#     llm=llm,
# )

* To be able to use [MultiQueryRetriever](https://python.langchain.com/docs/modules/data_connection/retrievers/MultiQueryRetriever/), we will have to add this import at the top of the file:
* `from langchain.retrievers.multi_query import MultiQueryRetriever`

#### We can now check this by asking our RAG app a question and tracking the response on LangSmith.

* Before doing it, we are going to include a small modification in frontend/src/App.tsx:

In [None]:
# const handleSendMessage = async (message: string) => {
#     setInputValue("")

#     setMessages(prevMessages => [...prevMessages, {message, isUser: true}]);

#     await fetchEventSource(`http://localhost:8000/rag/stream`, {
#       method: 'POST',
#       openWhenHidden: true,
#       headers: {
#         'Content-Type': 'application/json',
#       },
#       body: JSON.stringify({
#         input: {
#           question: message,
#         }
#       }),
#       onmessage(event) {
#         if (event.event === "data") {
#           handleReceiveMessage(event.data);
#         }
#       },
#     })
#   }

We added `openWhenHidden: true`. Let's explain what this does in simple terms.
* Imagine you're watching a live sports game on your phone but then switch to another app to send a quick message. Normally, your phone might pause the game to save battery and data because you're not directly looking at it. But if there was a special rule saying "Keep showing the game even if I'm not currently watching," your phone would continue to play the game in the background, so you don't miss anything important.
* `openWhenHidden: true` is like that special rule. It tells the website, "Keep listening for updates from the server even if I've moved to a different tab or minimized the browser." This way, the website can keep getting new messages or updates automatically, without you having to keep it open and visible all the time.
* In short, this small change lets us switch to LangSmith while our App is in the middle of delivering a response without interrupting it.

#### Now, if you ask a question to the RAG App and track it with LangSmith, you will see the Multiquery technique at work.
* If you open the trace in LangSmith you will see that the Retriever step now contains 3 nested Retriever steps, one per generated query. If you look at the ChatOpenAI step, you will see the 3 alternative queries the LLM generated to help the app find the best possible answer.
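
To make the idea behind the trace concrete, here is a minimal pure-Python sketch of multi-query retrieval: rephrase the question several ways, run the retriever once per rephrasing, and keep the unique union of the results. It is not the `MultiQueryRetriever` implementation. The toy corpus, `keyword_retriever`, and `fake_llm_rewrites` are hypothetical stand-ins for the vector store and the LLM call.

```python
from typing import Callable, Dict, List

# Hypothetical toy corpus standing in for the vector store.
CORPUS: Dict[str, str] = {
    "doc1": "refund policy for online orders",
    "doc2": "how to return a damaged product",
    "doc3": "shipping times for international orders",
}


def keyword_retriever(query: str) -> List[str]:
    """Return ids of documents that share at least one word with the query."""
    words = set(query.lower().split())
    return [doc_id for doc_id, text in CORPUS.items() if words & set(text.split())]


def fake_llm_rewrites(question: str) -> List[str]:
    """Stand-in for the LLM call that produces alternative phrasings."""
    return ["refund rules", "return a product", "damaged item money back"]


def multi_query_retrieve(
    question: str,
    generate_queries: Callable[[str], List[str]],
    retriever: Callable[[str], List[str]],
) -> List[str]:
    """Run the retriever once per generated query and keep each document once."""
    unique_ids: List[str] = []
    for query in generate_queries(question):
        for doc_id in retriever(query):
            if doc_id not in unique_ids:
                unique_ids.append(doc_id)
    return unique_ids


question = "Can I get my money back?"
single = keyword_retriever(question)  # the original wording matches nothing
combined = multi_query_retrieve(question, fake_llm_rewrites, keyword_retriever)
print(single, combined)
```

In this toy setup the original wording retrieves no documents, while the rephrasings together reach `doc1` and `doc2`; `doc2` is returned by two of the queries but kept only once.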

# Chat history
* We will use [RunnableWithMessageHistory](https://python.langchain.com/docs/expression_language/how_to/message_history/) from LangChain to update app/rag_chain.py:

In [None]:
# old_chain = (
#         RunnableParallel(
#             context=(itemgetter("question") | multiquery),
#             question=itemgetter("question")
#         ) |
#         RunnableParallel(
#             answer=(ANSWER_PROMPT | llm),
#             docs=itemgetter("context")
#         )
# ).with_types(input_type=RagInput)

# postgres_memory_url = "postgresql+psycopg://postgres:postgres@localhost:5432/pdf_rag_history"

# get_session_history = lambda session_id: SQLChatMessageHistory(
#     connection_string=postgres_memory_url,
#     session_id=session_id
# )

# template_with_history="""
# Given the following conversation and a follow
# up question, rephrase the follow up question
# to be a standalone question, in its original
# language

# Chat History:
# {chat_history}
# Follow Up Input: {question}
# Standalone question:"""

# standalone_question_prompt = PromptTemplate.from_template(template_with_history)

# standalone_question_mini_chain = RunnableParallel(
#     question=RunnableParallel(
#         # Pass only the question text (not the whole input dict) into the prompt
#         question=itemgetter("question"),
#         chat_history=lambda x: get_buffer_string(x["chat_history"])
#     )
#     | standalone_question_prompt
#     | llm
#     | StrOutputParser()
# )


# final_chain = RunnableWithMessageHistory(
#     runnable=standalone_question_mini_chain | old_chain,
#     input_messages_key="question",
#     history_messages_key="chat_history",
#     output_messages_key="answer",
#     get_session_history=get_session_history,
# )

* See the new imports we have added at the top of the file.

The previous code outlines a pipeline to handle chat interactions, specifically for transforming follow-up questions into standalone questions and providing answers based on historical context. Here’s a breakdown of the key components and their roles:

**Processing Chains**

1. **`old_chain`**:
   - Uses `RunnableParallel` to process inputs in parallel.
   - Extracts the "question" directly from the input and retrieves related context using `multiquery`.
   - A second parallel step generates the "answer" by sending `ANSWER_PROMPT` to the language model (`llm`) and passes the retrieved documents through under the "docs" key.
   - The chain is typed with `RagInput`, expecting an input format where "question" is a key.

2. **`postgres_memory_url`**:
   - Defines a connection string to a PostgreSQL database that stores chat histories.

3. **`get_session_history`**:
   - A lambda function creating an instance of `SQLChatMessageHistory` for a given session. This allows fetching chat history from the PostgreSQL database.

4. **`template_with_history`**:
   - A multi-line string defining a template to rephrase follow-up questions into standalone questions, including chat history for context.

5. **`standalone_question_prompt`**:
   - Converts `template_with_history` into a `PromptTemplate`, preparing it for use with a language model.

6. **`standalone_question_mini_chain`**:
   - Constructs a mini processing chain that combines the follow-up question and chat history, uses `standalone_question_prompt` to generate a prompt, feeds it through the language model (`llm`), and parses the string output.

**Final Processing Pipeline**

- **`final_chain`**:
  - Combines `standalone_question_mini_chain` with `old_chain`, applying `RunnableWithMessageHistory`.
  - This setup enables the system to process a user's question, factor in historical context, generate a standalone question, and then produce an answer.
  - It manages input and output keys for questions, chat history, and answers, ensuring the correct flow of data through the system.
  - Utilizes `get_session_history` to fetch and incorporate session-based chat history, enriching the context for generating responses.

Let's explain it in simple terms. Imagine you're having a conversation with a really smart robot that can remember everything you've talked about. You're asking questions, and it's answering them, but sometimes you ask a follow-up question that doesn't make much sense on its own. So, the robot has to figure out how to turn that follow-up question into a new, clear question that anyone could understand, even if they didn't know what you were talking about before. Plus, it has to remember all your past questions and answers to give you the best reply. Here’s how the code does this:

1. **The `old_chain` Part**: Think of this as the robot's basic way of understanding your question and finding the best answer from a big book of knowledge it has. It looks at your question, digs up some related stuff, and then uses that to come up with an answer.

2. **Remembering Past Conversations**: There's a part where the robot is told how to remember everything you've talked about. It uses a special notebook stored somewhere safe (like a database) to jot down all the questions and answers from your conversation.

3. **Turning Follow-up Questions into Standalone Questions**: The robot uses a special format to think about how to change your follow-up question into a new, standalone question. This is important because it wants to make sure the question makes sense on its own.

4. **The Mini-Chain for Standalone Questions**: This is a small set of steps the robot takes to actually turn the follow-up question into a standalone question. It looks at the history of your chat, the question you asked, and then rewrites it to be clear and understandable by itself.

5. **The Final Chain**: Finally, all of this comes together. The robot first tries to turn your follow-up question into a standalone question and then goes through its basic steps of understanding and answering the question, remembering everything you've talked about before. This way, it can give you a really good answer.

In simple terms, this code is like instructions for a very smart robot that helps it understand and remember your conversation, and make sure every question you ask is clear and makes sense on its own, even if it's a follow-up question.
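
The flow described above can be sketched in plain Python without LangChain. This is an illustration under stated assumptions, not the library's implementation: the in-memory `_STORE` stands in for the Postgres-backed history, and `fake_llm`, `fake_answer`, `REWRITES`, and `ANSWERS` are hypothetical stubs for the model and for `old_chain`.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (role, text), where role is "Human" or "AI"

# In-memory stand-in for the database-backed history, keyed by session id.
_STORE: Dict[str, List[Message]] = defaultdict(list)

TEMPLATE = (
    "Given the following conversation and a follow up question, rephrase the "
    "follow up question to be a standalone question, in its original language\n\n"
    "Chat History:\n{chat_history}\n"
    "Follow Up Input: {question}\n"
    "Standalone question:"
)


def get_history(session_id: str) -> List[Message]:
    """Return the message list for one session (created empty on first use)."""
    return _STORE[session_id]


def buffer_string(messages: List[Message]) -> str:
    """Flatten the history into 'Human: ...' / 'AI: ...' lines."""
    return "\n".join(f"{role}: {text}" for role, text in messages)


def ask(session_id: str, question: str,
        llm: Callable[[str], str], answer_fn: Callable[[str], str]) -> str:
    """Condense the question using the history, answer it, then record the turn."""
    history = get_history(session_id)
    prompt = TEMPLATE.format(chat_history=buffer_string(history), question=question)
    standalone = llm(prompt)
    answer = answer_fn(standalone)
    history.append(("Human", question))
    history.append(("AI", answer))
    return answer


# Hypothetical stubs so the sketch runs without a model or a vector store.
REWRITES = {"When was he born?": "When was the author of Dune born?"}
ANSWERS = {"Who wrote Dune?": "Frank Herbert",
           "When was the author of Dune born?": "1920"}
prompts_seen: List[str] = []


def fake_llm(prompt: str) -> str:
    prompts_seen.append(prompt)
    line = next(l for l in prompt.splitlines() if l.startswith("Follow Up Input: "))
    follow_up = line[len("Follow Up Input: "):]
    return REWRITES.get(follow_up, follow_up)


def fake_answer(standalone_question: str) -> str:
    return ANSWERS.get(standalone_question, "unknown")


first = ask("session-a", "Who wrote Dune?", fake_llm, fake_answer)
second = ask("session-a", "When was he born?", fake_llm, fake_answer)
print(first, second)
```

The second prompt contains the first exchange, which is what lets the stub (and, in the real app, the LLM) resolve "he". A different session id starts with an empty history.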

## The frontend needs to send the session_id
* In order to do that, let's update frontend/src/App.tsx.

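* First, install the type definitions for the `uuid` package from the terminal:
* `npm install --save-dev @types/uuid`
* `@types/uuid` only provides the TypeScript types. If `uuid` itself is not already listed in frontend/package.json, install it as well with `npm install uuid`.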

* Then, you can update frontend/src/App.tsx like this:

In [None]:
# import {useEffect, useRef} from 'react';
# import {v4 as uuidv4} from 'uuid';


# // Inside function App()
#   const sessionIdRef = useRef<string>(uuidv4());

#   useEffect(() => {
#     sessionIdRef.current = uuidv4();
#   }, []);

The previous code shows the use of the `uuid` library to generate a unique identifier (UUID) and React hooks (`useRef` and `useEffect`) within a functional component named `App()`. Let's break down what each part does:

1. **Importing `uuidv4`**: The `uuid` library provides various methods to generate unique identifiers. `v4` refers to version 4 of the UUID algorithm, which generates random UUIDs. The statement `import {v4 as uuidv4} from 'uuid';` imports this specific method and renames it to `uuidv4` for use within your component.

2. **`useRef` Hook**: `useRef` is a hook that allows you to persist values across re-renders without causing additional renders when its value changes. Here, it's used to create a reference (`sessionIdRef`) that holds a string value. Initially, this reference is set to a new UUID generated by `uuidv4()`. The purpose of using `useRef` here is to have a consistent way to access a value (the UUID) that doesn't trigger re-renders when it changes.

3. **`useEffect` Hook**: `useEffect` lets you perform side effects in functional components. It's akin to lifecycle methods in class components like `componentDidMount`, `componentDidUpdate`, and `componentWillUnmount`. The provided effect function is run after the component mounts (and after every update, depending on the dependencies array). In this case, the effect:
   - Runs only once after the initial render, because the dependencies array (`[]`) is empty. This behavior mimics `componentDidMount` in class components.
   - Inside the effect, it assigns a new UUID to `sessionIdRef.current`, effectively overwriting the initial value assigned during the component's initialization with `useRef`.

#### What Does This Code Do?

When the component `App()` first renders:
- It creates a `sessionIdRef` reference with an initial UUID value.

Then, right after mounting:
- The `useEffect` hook runs and assigns a new UUID to `sessionIdRef.current`.

The practical effect of this code is that it ensures `sessionIdRef` has a unique, stable UUID value that can be accessed throughout the component's lifecycle without causing re-renders. The UUID is refreshed only once (due to `useEffect` with an empty dependencies array) after the component mounts.

Let's explain this in simple terms. Imagine you're at a huge party where everyone needs a unique sticker to enter. When you first decide to go, you make yourself a sticker at home using a special sticker machine (this is like generating the first unique ID with `uuidv4()` when the component is first used).

Now, when you actually step into the party, you notice there's another sticker machine right at the entrance. Even though you already have a sticker, you use this machine to get a new one just because it's there (this is like using `useEffect` to generate a new unique ID right after the component loads for the first time).

So, in simple terms, this code does two main things in our party analogy:
1. **Before going to the party**: It creates a unique sticker for you at home.
2. **As soon as you arrive at the party**: It gives you a new unique sticker, replacing the one you made at home, but after that, it doesn't give you more stickers no matter how many times you go out and come back in.

In your component:
- **At the start (with `useRef`)**: It makes a unique code (like a special sticker) and remembers it.
- **Right after the component appears on the screen (with `useEffect`)**: It makes another new unique code and updates the remembered code with this new one. But, it only does this once, right after the component first shows up.

## Now we need to update the handleSendMessage function
* See the changes after `config: {`

In [None]:
# const handleSendMessage = async (message: string) => {
#     setInputValue("")

#     setMessages(prevMessages => [...prevMessages, {message, isUser: true}]);

#     await fetchEventSource(`http://localhost:8000/rag/stream`, {
#       method: 'POST',
#       openWhenHidden: true,
#       headers: {
#         'Content-Type': 'application/json',
#       },
#       body: JSON.stringify({
#         input: {
#           question: message,
#         },
#         config: {
#           configurable: {
#             // RunnableWithMessageHistory looks for the key "session_id" by default
#             session_id: sessionIdRef.current
#           }
#         }
#       }),
#       onmessage(event) {
#         if (event.event === "data") {
#           handleReceiveMessage(event.data);
#         }
#       },
#     })
#   }

## Now we need to create the postgres database to store the chat history
* In terminal:
    * `psql -U postgres`
    * `CREATE DATABASE pdf_rag_history;`
    * `\q`

## You can now ask one question in the App, and then a follow-up question.
* The key here is that the follow-up question refers back to the first question. The App needs the chat history to answer the follow-up question properly.

## Check the traces in LangSmith
* Notice that the chat history now appears at the beginning of the trace.

## Check how the chat history is being stored in the database
In your terminal:
* `psql -U postgres`
* `\c pdf_rag_history`
* `SELECT * FROM public.message_store;`
* `\q`
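
To read one conversation rather than the whole table, the rows can be filtered by session id. The sketch below uses an in-memory SQLite database as a stand-in for Postgres so that it runs without a server. The table layout (`id`, `session_id`, and a `message` column holding JSON with `type` and `data.content`) is an assumption about what `SQLChatMessageHistory` creates by default, and the sample rows are made up; check the real layout with `\d message_store` in psql before relying on it.

```python
import json
import sqlite3
from typing import List, Tuple

# In-memory SQLite stand-in for the Postgres database (assumed table layout).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE message_store ("
    "id INTEGER PRIMARY KEY, session_id TEXT, message TEXT)"
)

# Hypothetical sample rows: two sessions, messages stored as JSON text.
sample_rows = [
    ("session-a", {"type": "human", "data": {"content": "Who wrote Dune?"}}),
    ("session-a", {"type": "ai", "data": {"content": "Frank Herbert"}}),
    ("session-b", {"type": "human", "data": {"content": "Hello"}}),
]
conn.executemany(
    "INSERT INTO message_store (session_id, message) VALUES (?, ?)",
    [(session_id, json.dumps(message)) for session_id, message in sample_rows],
)


def conversation(connection: sqlite3.Connection, session_id: str) -> List[Tuple[str, str]]:
    """Return (type, content) pairs for one session, in insertion order."""
    cursor = connection.execute(
        "SELECT message FROM message_store WHERE session_id = ? ORDER BY id",
        (session_id,),
    )
    result = []
    for (raw,) in cursor:
        message = json.loads(raw)
        result.append((message["type"], message["data"]["content"]))
    return result


transcript = conversation(conn, "session-a")
print(transcript)
```

The same `WHERE session_id = ... ORDER BY id` filter can be typed directly in psql against `public.message_store`.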