# RAG From Scratch: Query Transformations

## Purpose

This notebook explains query transformation. Put simply, query transformations are a set of approaches that rewrite or modify a question to improve retrieval. The techniques covered are:

- Multi-Query
    - [Multi-Query Docs](https://python.langchain.com/docs/modules/data_connection/retrievers/MultiQueryRetriever)
- RAG-Fusion
    - [RAG-Fusion Docs](https://github.com/langchain-ai/langchain/blob/master/cookbook/rag_fusion.ipynb?ref=blog.langchain.dev)
    - [Forget RAG the future is RAG Fusion](https://towardsdatascience.com/forget-rag-the-future-is-rag-fusion-1147298d8ad1)
- Decomposition
    - Answer recursively
        - [LEAST-TO-MOST PROMPTING ENABLES COMPLEX REASONING IN LARGE LANGUAGE MODELS](https://arxiv.org/pdf/2205.10625)
        - [Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions](https://arxiv.org/abs/2212.10509.pdf)
    - Answer individually
- Step-back
    - [Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models](https://arxiv.org/pdf/2310.06117)
- HyDE
    - [Improve Document indexing with HyDE](https://github.com/langchain-ai/langchain/blob/master/cookbook/hypothetical_document_embeddings.ipynb)
    - [Precise Zero-Shot Dense Retrieval without Relevance Labels](https://github.com/langchain-ai/langchain/blob/master/cookbook/hypothetical_document_embeddings.ipynb)

## Environment

`(1) Packages`

In [1]:
!pip install langchain_community langchain_openai langchainhub langchain -q
!pip install tiktoken chromadb -q

`(2) LangSmith`

In [2]:
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_ENDPOINT"] = "https://api.smith.langchain.com"

In [None]:
api_key = os.getenv("LANGCHAIN_API_KEY")
if api_key:
    os.environ["LANGCHAIN_API_KEY"] = api_key
else:
    api_key = input("Enter your API key: ")
    os.environ["LANGCHAIN_API_KEY"] = api_key

`(3) API Keys`

In [None]:
api_key = os.getenv("OPENAI_API_KEY")
if api_key:
    os.environ["OPENAI_API_KEY"] = api_key
else:
    api_key = input("Enter your API key: ")
    os.environ["OPENAI_API_KEY"] = api_key

## Multi Query

### Indexing

In [7]:
# Load blog
import bs4
from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=300,
    chunk_overlap=100,
)

# Make splits
splits = text_splitter.split_documents(blog_docs)

# Index
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()

### Prompt

In [8]:
from langchain.prompts import ChatPromptTemplate

# Multi Query: Different Perspectives
template = """You are an AI language model assistant. Your task is to generate five
different versions of the given user question to retrieve relevant documents from
a vector database. By generating multiple perspectives on the user question,
your goal is to help the user overcome some of the limitations of the
distance-based similarity search. Provide these alternative questions separated
by new lines. Original question: {question}
"""
prompt_perspectives = ChatPromptTemplate.from_template(template)

from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

generate_queries = (
    prompt_perspectives
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

In [10]:
from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
    """Unique union of retrieved docs"""
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]

    # Get unique documents
    unique_docs = list(set(flattened_docs))

    # Convert back to Document
    unique_docs = [loads(doc) for doc in unique_docs]

    return unique_docs

# Retrieve
question = "What is task decomposition for LLM agents?"
retrieval_chain = generate_queries | retriever.map() | get_unique_union
docs = retrieval_chain.invoke({"question": question})
len(docs)

5
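`get_unique_union` deduplicates through a `set`, which does not preserve order, so the returned documents may come back in a different order from run to run. The sketch below shows one order-preserving alternative. It is a minimal illustration, not part of the notebook: `unique_union_ordered` is a hypothetical name, and plain strings stand in for the serialized Documents produced by `dumps()`.

```python
def unique_union_ordered(result_lists):
    """Flatten ranked result lists and drop duplicates, keeping first-seen order.

    Items only need to be hashable. Plain strings stand in here for the
    serialized Documents that dumps() produces in the notebook (assumption).
    """
    flattened = [item for sublist in result_lists for item in sublist]
    # dict keys keep insertion order (Python 3.7+), so first occurrences win
    return list(dict.fromkeys(flattened))


results = [
    ["chunk-a", "chunk-b", "chunk-c"],  # hits for query variant 1
    ["chunk-b", "chunk-d"],             # hits for query variant 2
    ["chunk-a", "chunk-e"],             # hits for query variant 3
]
print(unique_union_ordered(results))
```

Keeping first-seen order makes the context passed to the LLM reproducible, which can help when comparing runs.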

In [11]:
from operator import itemgetter
from langchain_openai import ChatOpenAI
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

llm = ChatOpenAI(temperature=0)

final_rag_chain = (
    {
        "context": retrieval_chain,
        "question": itemgetter("question"),
    }
    | prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke(
    {
        "question": question
    }
)


'Task decomposition for LLM agents involves breaking down large tasks into smaller, manageable subgoals using methods such as simple prompting, task-specific instructions, and human inputs. This allows the agent to effectively plan and navigate through complex tasks.'
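In the chain above, the list of retrieved Documents is placed into `{context}` as-is, so the prompt receives their string representation, metadata included. A common alternative is to join only the text of each document before it reaches the prompt. The sketch below illustrates the idea under stated assumptions: `Doc` is a hypothetical stand-in for a LangChain Document (only the `page_content` and `metadata` attributes are mirrored), and `format_docs` is an illustrative helper name.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (not the real class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs):
    """Join the text of each document, separated by a blank line."""
    return "\n\n".join(doc.page_content for doc in docs)


docs = [
    Doc("Task decomposition breaks a task into subgoals.", {"source": "blog"}),
    Doc("Planning is one component of an agent."),
]
context = format_docs(docs)
print(context)
```

In the notebook, a helper like this could plausibly be piped after the retrieval step (for example `retrieval_chain | format_docs`), so that only the passage text is sent to the model.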

## RAG-Fusion

In [12]:
from langchain.prompts import ChatPromptTemplate

# RAG-Fusion: Related
template = """You are a helpful assistant that generates multiple search queries
based on a simple input query. \n
Generate multiple search queries related to: {question} \n
Output (4 queries):
"""
prompt_rag_fusion = ChatPromptTemplate.from_template(template)

In [14]:
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

generate_queries = (
    prompt_rag_fusion
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

In [15]:
from langchain.load import dumps, loads

def reciprocal_rank_fusion(results: list[list], k=60):
    """Reciprocal rank fusion over multiple lists of ranked documents, with
    an optional parameter k used in the RRF formula
    """

    # Initialize a dictionary to hold fused scores for each unique document
    fused_scores = {}

    # Iterate through each list of ranked documents
    for docs in results:
        # Iterate through each document in the list, with its rank (position
        # in the list)
        for rank, doc in enumerate(docs):
            # Convert the document to a string to use as a key (assumes
            # documents can be serialized to JSON)
            doc_str = dumps(doc)
            # If the document is not yet in the fused_scores dictionary, add it
            # with an initial score of 0
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Update the score of the document using the RRF formula:
            # 1 / (rank + k), where rank is zero-based
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort the documents based on their fused scores in descending order to get
    # the final reranked results
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(
            fused_scores.items(),
            key=lambda x: x[1],
            reverse=True,
        )
    ]

    # Return the reranked results as a list of tuples, each containing the
    # document and its fused score
    return reranked_results

retrieval_chain_rag_fusion = (
    generate_queries
    | retriever.map()
    | reciprocal_rank_fusion
)

docs = retrieval_chain_rag_fusion.invoke(
    {
        "question": question
    }
)
len(docs)

7
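To make the scoring concrete, here is a small worked example of the same fusion rule on toy data. It is a sketch under stated assumptions: short string IDs stand in for serialized Documents, and `rrf_scores` is a hypothetical name. As in the notebook, ranks come from `enumerate` and therefore start at 0, which is equivalent to the usual 1-based RRF formula with `k = 59`.

```python
def rrf_scores(ranked_lists, k=60):
    """Fuse ranked lists with 1 / (rank + k), rank starting at 0 as in the notebook."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1 / (rank + k)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked_lists = [
    ["A", "B", "C"],  # results for query 1
    ["B", "A"],       # results for query 2
    ["C", "B"],       # results for query 3
]
fused = rrf_scores(ranked_lists)
for doc_id, score in fused:
    print(doc_id, round(score, 5))
```

Document `B` ends up first because it appears in all three lists, even though it is ranked first in only one of them. `A` edges out `C` because its two appearances are at ranks 0 and 1 rather than 0 and 2.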

In [16]:
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    {
        "context": retrieval_chain_rag_fusion,
        "question": itemgetter("question")
    }
    | prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke(
    {
        "question": question
    }
)

'Task decomposition for LLM agents involves breaking down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks. This can be achieved through simple prompting, task-specific instructions, or with human inputs.'

## Decomposition

In [17]:
from langchain.prompts import ChatPromptTemplate

# Decomposition
template = """You are a helpful assistant that generates multiple sub-questions. \n
The goal is to break down the input into a set of sub-problems / sub-questions that
can be answered in isolation. \n
Generate multiple search queries related to: {question} \n
Output (3 queries):
"""
prompt_decomposition = ChatPromptTemplate.from_template(template)

In [18]:
from langchain_openai import ChatOpenAI
from langchain_core.output_parsers import StrOutputParser

# LLM
llm = ChatOpenAI(
    temperature=0
)

# Chain
generate_queries_decomposition = (
    prompt_decomposition
    | llm
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

# Run
question = "What are the main components of an LLM-powered autonomous agent system?"
questions = generate_queries_decomposition.invoke({"question": question})

In [20]:
questions

['1. What is LLM technology and how is it used in autonomous agent systems?',
 '2. What are the key components of an autonomous agent system?',
 '3. How does an LLM-powered system differ from other autonomous agent systems in terms of components and functionality?']
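The generated sub-questions keep their list numbering (`1.`, `2.`, ...), and a plain `split("\n")` would also keep any blank lines the model emits. Both end up in the text sent to the retriever. The sketch below shows one way to clean the output first. `clean_queries` is a hypothetical helper, and the regular expression only covers simple enumerators, which is an assumption about the model's output format.

```python
import re


def clean_queries(raw_text):
    """Split LLM output into one query per line, dropping blanks and list numbering."""
    queries = []
    for line in raw_text.split("\n"):
        # Remove leading enumerators such as "1. ", "2) ", "- " or "* "
        line = re.sub(r"^\s*(?:\d+[.)]|[-*])\s+", "", line).strip()
        if line:
            queries.append(line)
    return queries


raw = "1. What is an LLM?\n\n2. What are the key components of an agent?\n"
print(clean_queries(raw))
```

A function like this could replace the `lambda x: x.split("\n")` step at the end of the query-generation chains.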

### Answer Recursively

In [21]:
# Prompt
template = """Here is the question you need to answer:
\n --- \n {question} \n --- \n

Here is any available background question + answer pairs:
\n --- \n {q_a_pairs} \n --- \n

Here is additional context relevant to the question:
\n --- \n {context} \n --- \n

Use the above context and any background question + answer pairs to answer
the question: \n {question}
"""

decomposition_prompt = ChatPromptTemplate.from_template(template)

In [22]:
from operator import itemgetter
from langchain_core.output_parsers import StrOutputParser

def format_qa_pairs(question, answer):
    """Format Q and A pair"""

    formatted_string = ""
    formatted_string += f"Question: {question}\n Answer: {answer}\n\n"
    return formatted_string.strip()


# llm
llm = ChatOpenAI(
    model_name="gpt-3.5-turbo", temperature=0
)

q_a_pairs = ""
for q in questions:
    rag_chain = (
        {
            "context": itemgetter("question") | retriever,
            "question": itemgetter("question"),
            "q_a_pairs": itemgetter("q_a_pairs")
        }
        | decomposition_prompt
        | llm
        | StrOutputParser()
    )

    answer = rag_chain.invoke(
        {
            "question": q,
            "q_a_pairs": q_a_pairs
        }
    )
    q_a_pairs += "\n--\n" + format_qa_pairs(q, answer)
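The key idea in the loop above is that each sub-question is answered with all earlier question and answer pairs as background, so the last answer builds on the previous ones. The sketch below isolates that accumulation pattern without any LLM calls. `answer_recursively` and `fake_answer` are hypothetical names, and `answer_fn` stands in for the retrieval plus LLM chain.

```python
def format_qa_pair(question, answer):
    """Format a single question/answer pair."""
    return f"Question: {question}\nAnswer: {answer}"


def answer_recursively(questions, answer_fn):
    """Answer questions in order, feeding earlier Q+A pairs into later calls.

    answer_fn(question, q_a_pairs) is a hypothetical stand-in for the
    retrieval + LLM chain used in the notebook.
    """
    q_a_pairs = ""
    answer = ""
    for q in questions:
        answer = answer_fn(q, q_a_pairs)
        q_a_pairs += "\n---\n" + format_qa_pair(q, answer)
    return answer, q_a_pairs


def fake_answer(question, q_a_pairs):
    # Stub: report how many earlier pairs were visible to this call
    seen = q_a_pairs.count("Question:")
    return f"answered with {seen} earlier pair(s) as background"


final_answer, history = answer_recursively(["Q1?", "Q2?", "Q3?"], fake_answer)
print(final_answer)
```

Because only the final answer is returned, the order of the sub-questions matters: the last one should be the question whose answer is meant to summarize the others.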

In [23]:
answer

"In an LLM-powered system, LLM technology serves as the agent's brain, enabling it to understand and generate human language for communication and decision-making. This is a key component that sets it apart from other autonomous agent systems. Additionally, LLM-powered systems also include components such as planning mechanisms for breaking down complicated tasks into smaller steps, self-reflection capabilities for iterative improvement, memory mechanisms for storing and retrieving past experiences, and interaction with other agents in a coordinated manner.\n\nFurthermore, LLM-powered systems have been shown to handle tasks like scientific discovery, autonomous design, planning, and performance of complex experiments. They can browse the internet, read documentation, execute code, and interact with APIs to carry out various tasks autonomously. This level of functionality and versatility in handling complex tasks sets LLM-powered systems apart from other autonomous agent systems."

### Answer individually

In [24]:
# Answer each sub-question individually
from langchain import hub
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# RAG Prompt
prompt_rag = hub.pull("rlm/rag-prompt")

def retrieve_and_rag(question, prompt_rag, sub_question_generator_chain):
    """RAG on each sub-question"""

    # Use our decomposition
    sub_questions = sub_question_generator_chain.invoke(
        {
            "question": question
        }
    )

    # Initialize a list to hold RAG chain results
    rag_results = []

    for sub_question in sub_questions:
        # Retrieve documents for each sub-question
        retrieved_docs = retriever.invoke(sub_question)

        # Use retrieved documents and sub-question in RAG chain
        answer = (
            prompt_rag
            | llm
            | StrOutputParser()
        ).invoke(
            {
                "context": retrieved_docs,
                "question": sub_question
            }
        )
        rag_results.append(answer)

    return rag_results, sub_questions


# Run retrieval and RAG on each sub-question generated by the
# decomposition chain
answers, questions = retrieve_and_rag(
    question=question,
    prompt_rag=prompt_rag,
    sub_question_generator_chain=generate_queries_decomposition,
)



In [25]:
def format_qa_pairs(questions, answers):
    """Format Q and A pairs"""
    formatted_string = ""

    for i, (question, answer) in enumerate(zip(questions, answers), start=1):
        formatted_string += f"Question {i}: {question}\n Answer: {answer}\n\n"
    return formatted_string.strip()

context = format_qa_pairs(questions, answers)

# Prompt
template = """Here is a set of Q+A pairs:
{context}

Use these to synthesize an answer to the question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)
final_rag_chain.invoke(
    {
        "context": context,
        "question": question
    }
)

'The main components of an LLM-powered autonomous agent system include LLM technology, which functions as the brain of the system, along with key components such as planning, task decomposition, self-reflection, memory, and interaction mechanisms. These components work together to enable agents to plan and execute complex tasks, learn from past experiences, and interact with other agents and external sources. By combining LLM technology with these key components, autonomous agents can enhance their performance in various tasks such as scientific discovery, autonomous design, and executing actions like browsing the internet and calling APIs.'

## Step Back

In [26]:
# Few Shot Examples
from langchain_core.prompts import (
    ChatPromptTemplate,
    FewShotChatMessagePromptTemplate,
)

examples = [
    {
        "input": "Could the members of The Police perform lawful arrests?",
        "output": "What can the members of the Police do?"
    },
    {
        "input": "Jan Sindel was born in what country?",
        "output": "What is Jan Sindel's personal history?"
    },
]
# We now transform these to example messages
example_messages = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)
few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_messages,
    examples=examples,
)

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an expert at world knowledge. Your task is to step back
            and paraphrase a question to a more generic step-back question, which
            is easier to answer. Here are a few examples:
            """
        ),
        # Few shot examples
        few_shot_prompt,
        # New question
        ("user", "{question}")
    ]
)

In [27]:
generate_queries_step_back = (
    prompt
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
)
question = "What is task decomposition for LLM agents?"
generate_queries_step_back.invoke(
    {
        "question": question
    }
)

'What is task decomposition in general?'
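To see what the model is actually shown, it can help to picture the few-shot prompt as a flat list of chat messages: the system instruction, then each example as a human/AI pair, then the new question. The sketch below builds that sequence in plain Python. It is an approximation of how the few-shot template expands, not LangChain's implementation, and `build_step_back_messages` is a hypothetical name.

```python
def build_step_back_messages(system_text, examples, question):
    """Return (role, content) tuples: system, example pairs, then the new question."""
    messages = [("system", system_text)]
    for example in examples:
        messages.append(("human", example["input"]))
        messages.append(("ai", example["output"]))
    messages.append(("human", question))
    return messages


examples = [
    {
        "input": "Could the members of The Police perform lawful arrests?",
        "output": "What can the members of the Police do?",
    },
]
messages = build_step_back_messages(
    "Paraphrase the question into a more generic step-back question.",
    examples,
    "What is task decomposition for LLM agents?",
)
for role, content in messages:
    print(f"{role}: {content}")
```

The example pairs act as demonstrations, so the model's reply to the final human message tends to follow the same pattern of abstraction.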

In [28]:
# Response prompt
response_prompt_template = """You are an expert of world knowledge. I am going
to ask you a question. Your response should be comprehensive and should not
contradict the following context if it is relevant.

# {normal_context}
# {step_back_context}

# Original Question: {question}
# Answer:
"""

response_prompt = ChatPromptTemplate.from_template(response_prompt_template)

chain = (
    {
        # Retrieve context using the normal question
        "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
        # Retrieve context using the step-back question
        "step_back_context": generate_queries_step_back | retriever,
        # Pass on the question
        "question": lambda x: x["question"],
    }
    | response_prompt
    | llm
    | StrOutputParser()
)

chain.invoke(
    {
        "question": question
    }
)

'Task decomposition for LLM agents refers to the process of breaking down large tasks into smaller, more manageable subgoals. This allows the agent to efficiently handle complex tasks by dividing them into smaller, more easily achievable steps. \n\nIn the context of LLM-powered autonomous agent systems, task decomposition is a crucial component of the overall system. The system comprises of four stages, with the first stage being task planning. In this stage, the LLM acts as the brain of the system and parses user requests into multiple tasks. Each task is associated with four attributes: task type, ID, dependencies, and arguments. The LLM uses few-shot examples to guide task parsing and planning.\n\nTask decomposition can be achieved in several ways within the LLM agent system. One method is through simple prompting, where the LLM is asked questions like "Steps for XYZ" or "What are the subgoals for achieving XYZ." Another method is by providing task-specific instructions, such as ask

## HyDE (Hypothetical Documents)

In [29]:
from langchain.prompts import ChatPromptTemplate

# HyDE document generation
template = """Please write a scientific paper passage to answer the question
Question: {question}
Passage:
"""
prompt_hyde = ChatPromptTemplate.from_template(template)

from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

generate_docs_for_retrieval = (
    prompt_hyde
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
)

# Run
question = "What is task decomposition for LLM agents?"
generate_docs_for_retrieval.invoke(
    {
        "question": question
    }
)

"Task decomposition is a crucial concept in the field of reinforcement learning for Large Language Models (LLMs). LLM agents are complex models that are trained to perform a wide range of natural language processing tasks, such as text generation, question answering, and language translation. Task decomposition refers to the process of breaking down a complex task into smaller, more manageable sub-tasks that can be tackled individually by the agent.\n\nBy decomposing a task into smaller sub-tasks, LLM agents can effectively leverage their capabilities to solve complex problems. This approach allows the agent to focus on solving each sub-task independently, which can lead to more efficient and effective problem-solving. Additionally, task decomposition can help improve the interpretability and generalization of the agent's learned policies, as it allows for a more structured and modular approach to learning.\n\nOverall, task decomposition plays a crucial role in enabling LLM agents to e

In [30]:
# Retrieve
retrieval_chain = generate_docs_for_retrieval | retriever
retrieved_docs = retrieval_chain.invoke(
    {
        "question": question
    }
)
retrieved_docs

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Subgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." for writing a novel, or (3) with human inputs.'),
 Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'
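The intuition behind HyDE is that a generated passage tends to share more vocabulary and phrasing with the indexed chunks than a short question does, so it lands closer to them in the search space. The toy sketch below illustrates this with a bag-of-words cosine similarity, which is a deliberately simple substitute for the dense embeddings used in the notebook. The texts and function names are illustrative assumptions.

```python
import math
from collections import Counter


def bow(text):
    """Lowercased bag-of-words vector as a Counter."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two Counter vectors (0.0 if either is empty)."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


chunk = "the agent breaks down large tasks into smaller manageable subgoals"
question = "what is task decomposition"
hypothetical = "task decomposition breaks large tasks into smaller subgoals for the agent"

print("question vs chunk:    ", cosine(bow(question), bow(chunk)))
print("hypothetical vs chunk:", cosine(bow(hypothetical), bow(chunk)))
```

With exact word matching the short question shares no terms with the chunk, while the hypothetical passage overlaps heavily. Real embedding models are far less brittle than this, so the gap is usually smaller in practice.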

In [31]:
# RAG
template = """Answer the following question based on this context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke(
    {
        "context": retrieved_docs,
        "question": question
    }
)

'Task decomposition for LLM agents involves breaking down large tasks into smaller, manageable subgoals using simple prompting, task-specific instructions, or human inputs. This process enables efficient handling of complex tasks by the agent.'