# RAG From Scratch: Query Transformation

Query transformations are a set of approaches focused on rewriting and/or modifying questions to improve retrieval.

<img src="../../data/images/rag.png"  />

## Part 5: Multi Query


### Intuition


The intuition is to take a user question and rewrite it into several questions that ask the same thing from different perspectives. Because each rewrite may capture nuances that the original phrasing misses, we increase the chance of retrieving the document we want. It is like firing a shotgun into the embedding space: every rewritten query is another chance to land near the right documents.

The unique union of the documents returned by all the queries is used as the context for the final answer.
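A minimal, dependency-free sketch of this idea follows. It assumes a tiny hand-written corpus, a hypothetical word-overlap retriever `toy_retrieve` standing in for the vector store, and hand-written rewrites in place of LLM output. Each rewrite retrieves its own top-k list, and the unique union of those lists becomes the context.

```python
# Hypothetical toy corpus: document id -> text (illustration only)
CORPUS = {
    "d1": "task decomposition breaks a complex task into smaller subgoals",
    "d2": "chain of thought prompts the model to think step by step",
    "d3": "tree of thoughts explores multiple reasoning paths per step",
    "d4": "agents use memory to store past observations",
    "d5": "planning lets an agent split work into manageable steps",
}


def toy_retrieve(query: str, k: int = 2) -> list[str]:
    """Hypothetical retriever: rank docs by shared-word count, keep top-k non-zero hits."""
    q_words = set(query.lower().split())
    scores = {doc_id: len(q_words & set(text.split())) for doc_id, text in CORPUS.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)  # stable: ties keep corpus order
    return [doc_id for doc_id in ranked[:k] if scores[doc_id] > 0]


def multi_query_union(queries: list[str]) -> list[str]:
    """Retrieve for every query variant and return the unique union in first-seen order."""
    seen: dict[str, None] = {}
    for q in queries:
        for doc_id in toy_retrieve(q):
            seen.setdefault(doc_id, None)
    return list(seen)


# Hand-written stand-ins for the LLM-generated rewrites
queries = [
    "what is task decomposition",
    "how does an agent split a task into steps",
    "how to think step by step about a complex task",
]
print(toy_retrieve(queries[0]))    # ['d1']
print(multi_query_union(queries))  # ['d1', 'd5', 'd2']
```

Here the first query alone retrieves only `d1`, while the union over the three rewrites also brings in `d5` and `d2`.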

<img src="../../data/images/multi-query.png"  />

In [1]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

True

In [2]:
hugging_face_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")
langchain_token = os.getenv("LANGCHAIN_API_KEY")
pinecone_api_key = os.getenv("PINECONE_API_KEY")
pinecone_env = os.getenv("PINECONE_ENV")

Imports

In [19]:
import bs4
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_huggingface import HuggingFaceEndpoint
from langchain.prompts import ChatPromptTemplate
from tqdm import tqdm
import tiktoken
import numpy as np
from operator import itemgetter
from langchain.load import dumps, loads

LLM Model

In [10]:
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
llm = HuggingFaceEndpoint(repo_id=repo_id,
                          huggingfacehub_api_token=hugging_face_token,
                          temperature=0.1)

The token has not been saved to the git credentials helper. Pass `add_to_git_credential=True` in this function directly or `--add-to-git-credential` if using via `huggingface-cli` if you want to set the git credential as well.
Token is valid (permission: write).
Your token has been saved to C:\Users\Hori\.cache\huggingface\token
Login successful


Indexing

In [4]:
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

Splitting

In [8]:
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300, 
    chunk_overlap=50)

splits = text_splitter.split_documents(blog_docs)
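As a simplified illustration of what `chunk_size=300` and `chunk_overlap=50` mean, the sketch below slides a fixed window over a list of tokens so that neighbouring chunks share 50 tokens. This is an approximation under stated assumptions: `RecursiveCharacterTextSplitter` first splits on separators (paragraphs, lines, spaces) and then merges pieces up to the size limit, so its real chunk boundaries differ. The helper `window_chunks` is hypothetical.

```python
def window_chunks(tokens: list[str], chunk_size: int = 300, chunk_overlap: int = 50) -> list[list[str]]:
    """Hypothetical fixed-window chunker: consecutive chunks share `chunk_overlap` tokens."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # each chunk starts 250 tokens after the previous one
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end of the text
    return chunks


tokens = [f"t{i}" for i in range(700)]  # stand-in for 700 tiktoken tokens
chunks = window_chunks(tokens)
print([len(c) for c in chunks])            # [300, 300, 200]
print(chunks[0][-50:] == chunks[1][:50])   # True: 50 shared tokens
```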

Embeddings

In [11]:
vectorstore = Chroma.from_documents(documents=splits, 
                                    embedding=HuggingFaceEmbeddings())

In [12]:
retriever = vectorstore.as_retriever()

Prompt: Multi Query (Different Perspectives)

In [14]:
template = """You are an AI language model assistant. Your task is to generate five 
different versions of the given user question to retrieve relevant documents from a vector 
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search. 
Provide these alternative questions separated by newlines. Original question: {question}"""

prompt_perspectives = ChatPromptTemplate.from_template(template)

In [15]:
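generate_queries = (
    prompt_perspectives 
    | llm
    | StrOutputParser() 
    # One query per line; drop the empty lines the LLM output often contains
    | (lambda x: [q.strip() for q in x.split("\n") if q.strip()])
)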
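Depending on how the LLM formats its answer, splitting on newlines can leave empty strings or list markers such as `1.` in the generated queries. A hypothetical plain-Python helper like `parse_queries` (not a LangChain API) could clean them up:

```python
import re


def parse_queries(text: str) -> list[str]:
    """Hypothetical helper: turn raw LLM output into a clean list of queries."""
    queries = []
    for line in text.split("\n"):
        # Drop leading list markers like "1.", "2)", "-" or "*" and surrounding spaces
        cleaned = re.sub(r"^\s*(?:\d+[.)]|[-*])\s*", "", line).strip()
        if cleaned:
            queries.append(cleaned)
    return queries


raw = "\n1. What is task decomposition?\n2) How do agents plan?\n\n- Why split tasks?"
print(parse_queries(raw))
# ['What is task decomposition?', 'How do agents plan?', 'Why split tasks?']
```

Because LangChain coerces plain functions into runnables inside a `|` chain, such a helper could be piped after `StrOutputParser()` in place of the lambda.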

Helper to get the unique union of the retrieved docs

In [17]:
def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    return [loads(doc) for doc in unique_docs]
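A `set` only accepts hashable items, which is why `get_unique_union` turns each document into a string with `dumps` before deduplicating; a `set` also does not preserve retrieval order. The sketch below uses a hypothetical `Doc` dataclass instead of LangChain's `Document` and shows the same idea with an explicit hashable key and first-seen ordering via a `dict`.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a document.

    A plain @dataclass (eq=True, not frozen) sets __hash__ to None, so instances
    cannot go into a set without first deriving a hashable key.
    """
    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_union(doc_lists: list[list[Doc]]) -> list[Doc]:
    """Deduplicate by a hashable key while keeping first-seen order.

    Assumes metadata values are hashable (e.g. strings).
    """
    seen: dict[tuple, Doc] = {}
    for docs in doc_lists:
        for doc in docs:
            key = (doc.page_content, tuple(sorted(doc.metadata.items())))
            seen.setdefault(key, doc)  # keep the first occurrence only
    return list(seen.values())


a = Doc("Task decomposition splits work.", {"source": "blog"})
b = Doc("CoT means think step by step.", {"source": "blog"})
dup = Doc("Task decomposition splits work.", {"source": "blog"})

try:
    set([a])
except TypeError as err:
    print("cannot hash directly:", err)  # unhashable type: 'Doc'

result = unique_union([[a, b], [dup]])
print(len(result))  # 2
```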

Retrieve the docs

In [18]:
question = "What is task decomposition for LLM agents?"

retrieval_chain = generate_queries | retriever.map() | get_unique_union
docs = retrieval_chain.invoke({"question": question})
len(docs)

9

RAG

In [20]:
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

In [21]:
final_rag_chain = (
    {"context": retrieval_chain, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"question":question})

'\nAnswer: Task decomposition is a process in which a complex task is broken down into smaller, manageable subgoals by an LLM-powered autonomous agent. This enables efficient handling of complex tasks and improves the quality of final results. It can be done by LLM with simple prompting, using task-specific instructions, or with human inputs. Techniques like Chain of Thought (CoT) and Tree of Thoughts (ToT) are commonly used for task decomposition. CoT instructs the model to "think step by step" to decompose hard tasks into smaller and simpler steps, while ToT explores multiple reasoning possibilities at each step and creates a tree structure.'

## Part 6: RAG-Fusion

RAG-Fusion is similar to Multi Query; the difference is that the retrieved documents are re-ranked with Reciprocal Rank Fusion (RRF) instead of simply being merged. The query-generation step is similar to the previous Multi Query step.
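For reference, with $Q$ the set of generated queries, $R_q$ the ranked list returned for query $q$, $r_q(d)$ the position of document $d$ in $R_q$ (0-based, as in `reciprocal_rank_fusion`), and $k = 60$, the fused score is:

$$
\mathrm{RRF}(d) = \sum_{q \in Q,\; d \in R_q} \frac{1}{k + r_q(d)}
$$

For example, a document ranked first ($r = 0$) by two queries scores $\frac{1}{60} + \frac{1}{60} \approx 0.0333$, while one ranked first by a single query scores $\frac{1}{60} \approx 0.0167$, so agreement across queries pushes a document up. The original RRF formulation counts ranks from 1, which shifts every denominator by one but leaves the idea unchanged.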

<img src="../../data/images/rag-fusion.png"  />

Prompt: RAG-Fusion: Related

In [26]:
template = """You are a helpful assistant that generates multiple search queries based on a single input query. \n
Generate multiple search queries related to: {question} \n
Output (4 queries):"""
prompt_rag_fusion = ChatPromptTemplate.from_template(template)

In [27]:
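generate_queries = (
    prompt_rag_fusion
    | llm
    | StrOutputParser()
    # One query per line; drop the empty lines the LLM output often contains
    | (lambda x: [q.strip() for q in x.split("\n") if q.strip()])
)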

Re-rank the documents with Reciprocal Rank Fusion (RRF)

In [28]:
def reciprocal_rank_fusion(results: list[list], k=60):
    """ Reciprocal Rank Fusion: takes multiple lists of ranked documents 
        and an optional parameter k used in the RRF formula """
    
    # Initialize a dictionary to hold fused scores for each unique document
    fused_scores = {}

    # Iterate through each list of ranked documents
    for docs in results:
        # Iterate through each document in the list with its rank (0-based position in the list)
        for rank, doc in enumerate(docs):
            # Serialize the document to a JSON string so it can be used as a dictionary key
            doc_str = dumps(doc)
            # If the document is not yet in the fused_scores dictionary, add it with an initial score of 0
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Add this list's contribution using the RRF formula: 1 / (rank + k)
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort the documents based on their fused scores in descending order to get the final reranked results
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    # Return the reranked results as a list of tuples, each containing the document and its fused score
    return reranked_results

retrieval_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion
docs = retrieval_chain_rag_fusion.invoke({"question": question})
len(docs)

8
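To see how the scores combine without LangChain serialization, here is a dependency-free sketch of the same formula over plain string IDs. The function `rrf_ids` and the ranked lists are hypothetical; ranks stay 0-based as in `reciprocal_rank_fusion`.

```python
from collections import defaultdict


def rrf_ids(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of IDs with Reciprocal Rank Fusion (0-based ranks)."""
    scores: defaultdict[str, float] = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] += 1 / (rank + k)
    # Highest fused score first; ties keep first-seen order (sorted is stable)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


results = [
    ["d1", "d2", "d3"],  # ranking for query 1
    ["d2", "d1", "d4"],  # ranking for query 2
    ["d2", "d5"],        # ranking for query 3
]
for doc_id, score in rrf_ids(results):
    print(doc_id, round(score, 4))
```

`d2` comes out on top because it sits at or near the top of all three lists, even though query 1 ranked `d1` first.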

RAG

In [29]:
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    {"context": retrieval_chain_rag_fusion, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"question":question})

'\nAnswer: Task decomposition for LLM agents refers to the process of breaking down large tasks into smaller, manageable subgoals. This enables efficient handling of complex tasks by the agent. It can be done using simple prompting, task-specific instructions, or human inputs. The goal is to transform big tasks into multiple manageable tasks, providing insights into the model\'s thinking process. Techniques like Chain of Thought (CoT) and Tree of Thoughts (ToT) are commonly used for task decomposition in LLM agents. CoT instructs the model to "think step by step," while ToT explores multiple reasoning possibilities at each step, creating a tree structure. The search process can be breadth-first search (BFS) or depth-first search (DFS), with each state evaluated by a classifier or majority vote.'

## Part 7: Decomposition 

The goal is to decompose a user question into sub-questions to improve retrieval.