### Dependencies setup

We will set up the `retriever_chain` and an LLM.

In [None]:
import os
from dotenv import load_dotenv
load_dotenv(".env")

# These variables are required to initialize the LangChain AzureChatOpenAI and AzureOpenAIEmbeddings instances
required_env_vars = [
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_API_VERSION",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_MODEL",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_EMBEDDING_MODEL",
    "AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME",
]

for var in required_env_vars:
    if os.environ.get(var) is None:
        raise Exception(f"Missing `{var}` environment variable")


api_key = os.environ.get("AZURE_OPENAI_API_KEY", "")
api_version = os.environ.get("AZURE_OPENAI_API_VERSION", "2023-03-15-preview")
azure_endpoint = os.environ.get("AZURE_OPENAI_ENDPOINT", "https://public-api.grabgpt.managed.catwalk-k8s.stg-myteksi.com")
deployment_name = os.environ.get("AZURE_OPENAI_DEPLOYMENT_NAME", "gpt-4-turbo")
model = os.environ.get("AZURE_OPENAI_MODEL", "gpt-4-turbo")
embedding_model = os.environ.get("AZURE_OPENAI_EMBEDDING_MODEL", "text-embedding-3-large")
embedding_deployment_name = os.environ.get("AZURE_OPENAI_EMBEDDING_DEPLOYMENT_NAME", "text-embedding-3-large")

In [None]:
from typing import Dict, Any
from langchain_core.callbacks import BaseCallbackHandler

class LoggingHandler(BaseCallbackHandler):
    def on_chain_start(
        self, serialized: Dict[str, Any], inputs: Dict[str, Any], **kwargs
    ) -> None:
        print(f"Chain {serialized.get('name')} started")

    def on_chain_end(self, outputs: Dict[str, Any], **kwargs) -> None:
        print(f"Chain ended, outputs: {outputs}")

In [None]:
from IPython.display import Markdown, display

def beatify_markdown(input: str):
    display(Markdown(input))

In [None]:
from langchain_chroma import Chroma
from langchain_openai import AzureOpenAIEmbeddings

embedding = AzureOpenAIEmbeddings(
    api_key=api_key,
    api_version=api_version,
    azure_deployment=embedding_deployment_name,
)

persist_directory = "./docs/chroma"
vectordb = Chroma(
    persist_directory=persist_directory,
    embedding_function=embedding,
)

vectordb._collection.count()

In [None]:
from langchain_core.runnables import RunnableSequence, RunnablePassthrough

# Retriever chain: looks up candidate documents for the query under the key "input" and stores them under the key "context"
retriever_chain = RunnablePassthrough.assign(
    context=RunnableSequence(
        (lambda x: x["input"]),
        vectordb.as_retriever(),
    ),
).with_config(run_name="retriever")

In [None]:
from langchain_openai import AzureChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

llm = AzureChatOpenAI(
    api_key=api_key,
    api_version=api_version,
    azure_deployment=deployment_name,
)
QA_SYSTEM_PROMPT = """
You are a helpful advisor, collaborating with other agents. \
Don't assume anything you don't know. Use the context below to answer the user question. \
Think carefully about the question and provide the best answer you can. \
If you are unable to fully answer, that's OK, just leave follow-up questions for the user. \
You must respond in Markdown format. \n

context: {context}.\n
input: {input}
helpful answer:
"""
qa_prompt = ChatPromptTemplate.from_template(template=QA_SYSTEM_PROMPT)
answer_chain = RunnablePassthrough.assign(
    answer=RunnableSequence(
        qa_prompt,
        llm,
        StrOutputParser(),
    ),
).with_config(run_name="LLM")

In [None]:
user_query = (
    "I have error `Fail to push image` while running cop_image:envoy-base step in "
    "pre stage while setting up t6 fabric pipeline, how to resolve it?"
)


## Cook the answer with LLM

After retrieving the relevant documents, we pass them to the LLM to generate an answer.

In this stage, we can use several strategies to build the context for the question:
- Naive context
- Map-Reduce
- Refine answer
- Map-rerank

### Naive context

Once a set of relevant documents has been selected for the user query, the next step is to pass them to the LLM to generate the answer. The simplest strategy is to concatenate the text of all documents into the prompt.

![naive_query](./public/naive_context.webp)

In [None]:
from typing import List, Dict, Any
from langchain_core.documents import Document
from langchain_core.runnables import RunnableSequence
from langchain_core.output_parsers import StrOutputParser


def naive_context(_input: Dict[str, Any]) -> Dict[str, Any]:
    # we will read relevant document in the field "context"
    # then concatenate the documents to form the context
    # and pass to the field "context"
    documents: List[Document] = _input.get("context", [])
    context = "\n".join([doc.page_content for doc in documents])
    return {
        **_input,
        "context": context,
    }


naive_query_chain = RunnableSequence(
    retriever_chain,
    naive_context,
    answer_chain,
)

In [None]:
answer = naive_query_chain.with_config(callbacks=[LoggingHandler()]).invoke({"input": user_query})


In [None]:
beatify_markdown(answer['answer'])

### Map-Reduce

The naive context strategy has one weakness: the context can grow very large when many documents are retrieved. The prompt may then exceed the LLM's context window (context overflow).
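To make the overflow concrete, the sketch below concatenates documents the same way the naive strategy does and compares the result against a size budget. The `fits_in_context` helper and the character-based budget are assumptions for illustration only; a real check would count tokens with the model's tokenizer.

```python
from typing import List


def build_naive_context(docs: List[str]) -> str:
    # Same idea as the naive strategy: join every document into one string.
    return "\n".join(docs)


def fits_in_context(text: str, max_chars: int) -> bool:
    # Hypothetical budget check; characters are only a rough proxy for tokens.
    return len(text) <= max_chars


docs = ["x" * 500 for _ in range(10)]  # ten documents of 500 characters each
few = build_naive_context(docs[:2])
many = build_naive_context(docs)

print(len(few), fits_in_context(few, max_chars=2000))    # 1001 True
print(len(many), fits_in_context(many, max_chars=2000))  # 5009 False
```

The context length grows linearly with the number of retrieved documents, so a budget that holds for two documents can fail for ten.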

Instead of passing the documents directly to the answering prompt, the map-reduce strategy iterates through the documents and extracts from each one the information that may be relevant to the question.

This is particularly useful when we have many documents because we can extract information, synthesize results, and repeat this process until the combined text fits the context length of the LLM.

![map_reduce](./public/map_reduce_context.webp)
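The "repeat until it fits" part of the strategy can be sketched in plain Python as follows. This is a minimal illustration, not the notebook's chain: `extract` and `combine` stand in for LLM calls, and `max_chars`, `group_size`, and the `fake_*` helpers are hypothetical names chosen for the example.

```python
from typing import Callable, List


def map_reduce_context(
    docs: List[str],
    extract: Callable[[str], str],
    combine: Callable[[str], str],
    max_chars: int,
    group_size: int = 2,
) -> str:
    # Map step: extract the relevant part of each document independently.
    pieces = [extract(doc) for doc in docs]
    # Reduce step: merge neighbouring pieces until the joined text fits
    # the budget or only one piece is left.
    while len(pieces) > 1 and len("\n".join(pieces)) > max_chars:
        pieces = [
            combine("\n".join(pieces[i:i + group_size]))
            for i in range(0, len(pieces), group_size)
        ]
    return "\n".join(pieces)


def fake_extract(doc: str) -> str:
    # Stand-in for an LLM extraction call: keep the first 20 characters.
    return doc[:20]


def fake_combine(text: str) -> str:
    # Stand-in for an LLM summarization call: keep the first 30 characters.
    return text[:30]


docs = [ch * 100 for ch in "abcd"]
context = map_reduce_context(docs, fake_extract, fake_combine, max_chars=50)
print(len(context))
```

Each round merges groups of `group_size` pieces into one, so the number of pieces shrinks every iteration and the loop terminates.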

In [None]:
EXTRACT_SYSTEM_PROMPT = """
You are an expert in extracting information from text.
Use the context below to extract the relevant information.
This information will be used by other agents to answer the user's question.

Extract as much relevant information as you can, but do not extract irrelevant information.
If you don't find any relevant information, that's OK, just leave the original content.
context: {context}.

question: {input}

valuable information:
"""

extract_prompt = ChatPromptTemplate.from_template(
    template=EXTRACT_SYSTEM_PROMPT,
)
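# Map step: run the extraction prompt on each retrieved document separately.
extract_doc_chain = RunnableSequence(
    extract_prompt,
    llm,
    StrOutputParser(),
)


def extract_documents(inputs: Dict[str, Any]) -> List[str]:
    documents: List[Document] = inputs.get("context", [])
    return extract_doc_chain.batch(
        [{**inputs, "context": doc.page_content} for doc in documents]
    )


extract_chain = RunnablePassthrough.assign(
    context=extract_documents,
).with_config(run_name="EXTRACTOR")


def reduce_document(inputs: Dict[str, Any]) -> Dict[str, Any]:
    # Join the per-document extracts into a single string for the reduce prompt.
    extracted_contexts: List[str] = inputs.get("context", [])
    return {
        **inputs,
        "context": "\n".join(extracted_contexts),
    }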


REDUCE_SYSTEM_PROMPT = """
Combine these summaries: {context}
"""
reduce_prompt = ChatPromptTemplate.from_template(
    template=REDUCE_SYSTEM_PROMPT,
)
reduce_chain = RunnablePassthrough.assign(
    context=RunnableSequence(
        reduce_document,
        reduce_prompt,
        llm,
        StrOutputParser(),
    ),
)

map_reduce_chain = RunnableSequence(
    retriever_chain,
    extract_chain,
    reduce_chain,
    answer_chain,
).with_config(callbacks=[LoggingHandler()])


In [None]:
answer = map_reduce_chain.invoke({"input": user_query})

In [None]:
beatify_markdown(answer['answer'])

### Refine answer

The refine strategy builds the answer incrementally: it iterates through the documents and refines the current summary with each new document's context.

![refine_answer](./public/refine_context.gif)

In [None]:
from langchain_core.runnables import Runnable


INITIAL_SYSTEM_PROMPT = """
Summarize this context, the summary will be used by other agents to answer the user questions.
context: {context}
question: {input}
"""
summarize_prompt = ChatPromptTemplate.from_template(INITIAL_SYSTEM_PROMPT)
initial_llm_chain = RunnableSequence(
    summarize_prompt,
    llm,
    StrOutputParser(),
)

REFINE_SYSTEM_PROMPT = """
Given your last context summary, and the updated context, refine your summary.
Updated summary will be used by other agents to answer the user questions.
Prev summary: {prev_response}
Updated context: {context}
Question: {input}
"""

refine_prompt = ChatPromptTemplate.from_template(REFINE_SYSTEM_PROMPT)
refine_llm_chain = RunnableSequence(
    refine_prompt,
    llm,
    StrOutputParser(),
)

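from langchain_core.runnables import RunnableLambda


def refine_chain(
    initial_llm_chain: Runnable,
    refine_llm_chain: Runnable,
) -> Runnable:
    def helper(inputs: Dict[str, Any]) -> Dict[str, Any]:
        documents: List[Document] = inputs.get("context", [])
        if not documents:
            return {**inputs, "context": ""}
        # Summarize the first document, then refine the summary with each remaining one.
        summary = initial_llm_chain.invoke(
            {**inputs, "context": documents[0].page_content}
        )
        for doc in documents[1:]:
            summary = refine_llm_chain.invoke(
                {**inputs, "prev_response": summary, "context": doc.page_content}
            )

        # Pass the final refined summary on as the context for the answer chain.
        return {
            **inputs,
            "context": summary,
        }

    return RunnableLambda(helper)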


refine_answer_chain = RunnableSequence(
    retriever_chain,
    refine_chain(initial_llm_chain, refine_llm_chain),
    answer_chain,
).with_config(callbacks=[LoggingHandler()])

In [None]:
refine_answer_chain.invoke({"input": user_query})

### Map-rerank

This strategy iterates through the documents and tries to answer the question from each one, attaching a score that indicates the level of confidence in the answer.

The answers are then reranked by score, and the answer with the highest score is selected as the final answer.

![map_rerank](./public/map_rerank.webp)

Note: `ArgMax` is a function that returns the argument that gives the maximum value of a function.
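
The flow can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a LangChain implementation: `answer_with_score` stands in for an LLM call that returns an answer together with a confidence score, and `fake_answer_with_score` is a hypothetical scorer based on word overlap.

```python
from typing import Callable, List, Tuple


def map_rerank(
    docs: List[str],
    question: str,
    answer_with_score: Callable[[str, str], Tuple[str, float]],
) -> str:
    # Map step: answer the question from each document on its own,
    # together with a confidence score.
    candidates = [answer_with_score(doc, question) for doc in docs]
    if not candidates:
        return ""
    # Rerank step: an argmax over the scores picks the most confident answer.
    best_answer, _best_score = max(candidates, key=lambda pair: pair[1])
    return best_answer


def fake_answer_with_score(doc: str, question: str) -> Tuple[str, float]:
    # Stand-in for an LLM call: the score is the fraction of question
    # words that appear in the document.
    words = question.lower().split()
    hits = sum(word in doc.lower() for word in words)
    return f"Based on: {doc}", hits / len(words)


docs = [
    "Check the registry credentials before pushing the image.",
    "The cafeteria opens at nine.",
]
best = map_rerank(docs, "push image registry", fake_answer_with_score)
print(best)
```

In a real chain, the score would typically come from the LLM itself, for example by asking it to return the answer and a numeric confidence in a structured format.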