# Redbox RAG chat optimisation guide  <a class="anchor" id="title"></a>

## Overview <a class="anchor" id="one-section"></a>

When optimising the generation part of our RAG system, the only things we can modify are the `RAG prompts` that are passed with context to the LLM. Other components certainly affect the overall generation evaluation score, such as the quality of the retrieved context, but the levers for changing them are further upstream in the RAG pipeline and are evaluated in the Retrieval Evaluation and e2e Evaluation notebooks. These other components are also slower to change than prompts, which are just natural language!

Currently, experimenting with changes to any component in Redbox RAG (core-api) is slow. After each change, the docker image has to be rebuilt, the evaluation dataset has to be passed through the updated RAG, and the resulting evaluation metrics have to be analysed to see whether performance has improved.

**An aim for the future is to make this experimentation process faster and better tracked.**

For now, this notebook links to all the files where `RAG prompts` can be changed to try to optimise Redbox Core API performance.

## Workflow in this Notebook <a class="anchor" id="four-section"></a>

1. Review the various locations in the codebase where `RAG prompts` are used
2. Make a change in one or more of these locations
3. Rebuild the core-api docker image (and any other images modified), using `docker compose build --no-cache`
4. Follow the rag_e2e_evaluation notebook to generate evaluation scores for the modified Redbox RAG based on your changes
5. Record your changes in **TBD**

Follow these steps to run an experiment:
1. Make experimental changes to [`RAG prompts`]() - these will be used by the /chat/rag function
2. (Optional) Make experimental changes to the [/chat/rag function]() 
3. Pass the evaluation dataset through the /chat/rag function to generate `actual_output` and append these to the evaluation dataset
4. Run evaluations on the dataset to calculate generation evaluation metrics
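
Step 3 above can be sketched as a small helper that runs each evaluation input through the RAG and appends the results to the dataset. This is a minimal sketch under assumptions: the dataset is taken to be a list of dicts with an `input` key, and `ask_rag` is a hypothetical callable wrapping the `/chat/rag` request. Neither name comes from the codebase; only the `output_text` and `source_documents` response keys are taken from the example request later in this notebook.

```python
from typing import Callable


def append_actual_outputs(dataset: list[dict], ask_rag: Callable[[str], dict]) -> list[dict]:
    """Return a copy of the dataset with RAG outputs added to each row.

    `ask_rag` is a hypothetical callable that takes a question and returns
    the JSON body of a /chat/rag response as a dict.
    """
    enriched = []
    for row in dataset:
        result = ask_rag(row["input"])
        enriched.append(
            {
                **row,
                "actual_output": result["output_text"],
                "retrieval_context": result["source_documents"],
            }
        )
    return enriched


# Stand-in for the real endpoint so the sketch runs without a server
def fake_ask_rag(question: str) -> dict:
    return {"output_text": f"Answer to: {question}", "source_documents": []}


evaluation_dataset = [{"input": "What is AI?"}]
evaluation_dataset = append_actual_outputs(evaluation_dataset, fake_ask_rag)
```

Passing the request function in as an argument keeps the dataset logic testable without a running core-api container.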

## RAG prompt locations

#### 1. System Prompt

When the `/chat/rag` endpoint is used, a system prompt can be sent, which will be considered by the LLM. In future we may want to consolidate any system prompt sent in a message with the backend prompts, but for now it is a source of variation/optimisation.

In [None]:
import os
from jose import jwt
from uuid import UUID
import requests
import json

Below is an example `/chat/rag/` endpoint payload. Notice `"role": "system"` followed by text. You can make the response talk like a pirate if you ask it to here!

In [None]:
# Build a bearer token for a test user (placeholder UUID and secret key)
bearer_token = jwt.encode({"user_uuid": str(UUID("aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"))}, key="your-secret-key", algorithm="HS512")

headers = {
    'accept': 'application/json',
    'Authorization': 'Bearer ' + bearer_token,
    'Content-Type': 'application/json',
}

url = 'http://127.0.0.1:5002/chat/rag'

# Questions to send; replace with the inputs from your evaluation dataset
inputs = ["What is AI?"]

# Results collected for each question
retrieval_context = []
actual_output = []

for question in inputs:
    payload = {
        "message_history": [
            {
                "role": "system",
                "text": "You are a helpful AI Assistant"
            },
            {
                "role": "user",
                "text": question
            }
        ]
    }

    response = requests.post(url, headers=headers, data=json.dumps(payload))
    response.raise_for_status()  # fail early on HTTP errors
    result = response.json()

    retrieval_context.append(result['source_documents'])
    actual_output.append(result['output_text'])
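
To compare several system prompts against the same question, the payload construction can be pulled into a helper. The following is a minimal sketch: `build_rag_payload` and the variant names are hypothetical, and the payload shape simply mirrors the example request in this section.

```python
import json


def build_rag_payload(system_prompt: str, question: str) -> dict:
    """Build a /chat/rag request body with the given system prompt."""
    return {
        "message_history": [
            {"role": "system", "text": system_prompt},
            {"role": "user", "text": question},
        ]
    }


# Hypothetical system prompt variants to compare
system_prompt_variants = {
    "baseline": "You are a helpful AI Assistant",
    "pirate": "You are a helpful AI Assistant. Talk like a pirate.",
}

payloads = {
    name: build_rag_payload(prompt, "What is AI?")
    for name, prompt in system_prompt_variants.items()
}

# Each payload can then be posted with data=json.dumps(payload)
print(json.dumps(payloads["pirate"], indent=2))
```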

[Back to top](#title)

-------------

#### 2. Prompts in core.py

One prompt, `_core_redbox_prompt`, is located in [core.py](../../redbox/llm/prompts/core.py).

In [None]:
_core_redbox_prompt = """You are RedBox Copilot. An AI focused on helping UK Civil Servants, Political Advisors and\
Ministers triage and summarise information from a wide variety of sources. You are impartial and\
non-partisan. You are not a replacement for human judgement, but you can help humans\
make more informed decisions. If you are asked a question you cannot answer based on your following instructions, you\
should say so. Be concise and professional in your responses. Respond in markdown format.

=== RULES ===

All responses to Tasks **MUST** be translated into the user's preferred language.\
This is so that the user can understand your responses.\
"""

In [None]:
from langchain.prompts import PromptTemplate

# Check where CORE_REDBOX_PROMPT is used in the codebase
CORE_REDBOX_PROMPT = PromptTemplate.from_template(_core_redbox_prompt)
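
When editing this prompt, keep in mind that a trailing backslash inside a triple-quoted Python string joins the next line on with no space in between. The sketch below uses a shortened, illustrative string (not the production prompt) to show the effect, so that words at line ends are not accidentally fused together.

```python
# Illustrative strings only, not the production prompt
without_space = """Political Advisors and\
Ministers"""

with_space = """Political Advisors and \
Ministers"""

print(without_space)  # Political Advisors andMinisters
print(with_space)     # Political Advisors and Ministers
```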

[Back to top](#title)

----------

#### 3. Prompts in chat.py

There are 4 prompts located in [chat.py](../../redbox/llm/prompts/chat.py)

Things to experiment with:
1. `_with_sources_template`
2. `WITH_SOURCES_PROMPT`
3. `_stuff_document_template`
4. `STUFF_DOCUMENT_PROMPT`

In [None]:
_with_sources_template = """Given the following extracted parts of a long document and \
a question, create a final answer with Sources at the end.  \
If you don't know the answer, just say that you don't know. Don't try to make \
up an answer.
Be concise in your response and summarise where appropriate. \
At the end of your response add a "Sources:" section with the documents you used. \
DO NOT reference the source documents in your response. Only cite at the end. \
ONLY PUT CITED DOCUMENTS IN THE "Sources:" SECTION AND NO WHERE ELSE IN YOUR RESPONSE. \
IT IS CRUCIAL that citations only happens in the "Sources:" section. \
This format should be <DocX> where X is the document UUID being cited.  \
DO NOT INCLUDE ANY DOCUMENTS IN THE "Sources:" THAT YOU DID NOT USE IN YOUR RESPONSE. \
YOU MUST CITE USING THE <DocX> FORMAT. NO OTHER FORMAT WILL BE ACCEPTED.
Example: "Sources: <DocX> <DocY> <DocZ>"

Use **bold** to highlight the most question relevant parts in your response.
If dealing dealing with lots of data return it in markdown table format.

QUESTION: {question}
=========
{summaries}
=========
FINAL ANSWER:"""

In [None]:
WITH_SOURCES_PROMPT = PromptTemplate.from_template(_core_redbox_prompt + _with_sources_template)
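
Because `_with_sources_template` asks for citations only in a trailing "Sources:" section using the `<DocX>` format, one way to check whether a response respects a prompt change is to parse it. The helper below is a minimal sketch and not part of the codebase: `extract_cited_docs` is a hypothetical name, and the regular expression is an assumption about what document identifiers look like.

```python
import re

# Matches opening tags such as <Docabc-123>; closing tags (</Doc...>) do not match
DOC_TAG = re.compile(r"<Doc([^<>\s]+)>")


def extract_cited_docs(response_text: str) -> tuple[list[str], bool]:
    """Return (cited document ids, whether a citation leaked into the body).

    The text is split on the last "Sources:" marker; everything before it
    is treated as the body of the answer.
    """
    body, marker, sources = response_text.rpartition("Sources:")
    if not marker:
        # No "Sources:" section at all, so the whole response is body
        body, sources = response_text, ""
    cited = DOC_TAG.findall(sources)
    leaked = bool(DOC_TAG.search(body))
    return cited, leaked


example = "AI is **machine intelligence**.\n\nSources: <Docabc-123> <Docdef-456>"
cited, leaked = extract_cited_docs(example)
print(cited, leaked)
```

A check like this could be run over every `actual_output` in the evaluation dataset to count how often a prompt variant breaks the citation rules.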

In [None]:
_stuff_document_template = "<Doc{parent_doc_uuid}>{page_content}</Doc{parent_doc_uuid}>"

In [None]:
STUFF_DOCUMENT_PROMPT = PromptTemplate.from_template(_stuff_document_template)
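
To see roughly what the LLM receives for each retrieved chunk, the template can be filled in by hand. The sketch below uses plain `str.format` rather than `PromptTemplate` so that it runs without LangChain. The example documents are made up, and joining the formatted documents with blank lines to form `{summaries}` is an assumption about how the chain combines them.

```python
_stuff_document_template = "<Doc{parent_doc_uuid}>{page_content}</Doc{parent_doc_uuid}>"

# Made-up documents for illustration
documents = [
    {"parent_doc_uuid": "1111", "page_content": "AI is the simulation of human intelligence."},
    {"parent_doc_uuid": "2222", "page_content": "Machine learning is a subset of AI."},
]

# Wrap each document in its <DocX>...</DocX> tags
formatted = [_stuff_document_template.format(**doc) for doc in documents]

# Assumption: formatted documents are joined with blank lines into {summaries}
summaries = "\n\n".join(formatted)
print(summaries)
```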

[Back to top](#title)

-------

#### 4. LLM being used

We can also optimise the LLM being used, but please **bear in mind that prompts are per LLM and if you change the LLM you will need to optimise the prompts!**

For now, please stick with gpt-3.5-turbo while we establish a baseline quality.

[Back to top](#title)

-----------

## Promote optimised prompts into production

If you find changes to the prompts above improve the generation evaluation scores, please consider making a PR to update the code in `core_api`. Follow these steps:

1. Create a new branch off `main`
2. Make changes in the locations listed below
3. Run through the e2e RAG evaluation notebook
4. If e2e RAG evaluation metrics are improved, please make a PR!
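
Deciding whether metrics have improved, as in step 4 above, can be sketched as a per-metric comparison between a baseline run and an experiment run. The metric names and scores below are hypothetical placeholders, not results from Redbox, and `metric_deltas` is an illustrative helper rather than part of the evaluation notebooks.

```python
def metric_deltas(baseline: dict[str, float], experiment: dict[str, float]) -> dict[str, float]:
    """Return the change (experiment minus baseline) for each shared metric."""
    return {
        name: round(experiment[name] - baseline[name], 4)
        for name in baseline
        if name in experiment
    }


# Hypothetical metric names and scores, for illustration only
baseline_scores = {"faithfulness": 0.80, "answer_relevancy": 0.70}
experiment_scores = {"faithfulness": 0.85, "answer_relevancy": 0.65}

deltas = metric_deltas(baseline_scores, experiment_scores)
all_improved = all(delta > 0 for delta in deltas.values())
print(deltas, all_improved)
```

Looking at each metric separately makes trade-offs visible, for example a prompt that improves one score while lowering another.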

All these prompts are located in [chat.py](../../redbox/llm/prompts/chat.py), except `_core_redbox_prompt`, which is located in [core.py](../../redbox/llm/prompts/core.py).

[Back to top](#title)

-----------