# RAG From Scratch: Query Transformations

Query transformations are a set of approaches focused on rewriting and/or modifying questions to improve retrieval.

<img src="../../data/images/rag.png"  />

## Part 5: Multi Query


### Intuition


Here the intuition is to take a user question and rewrite it as several questions that ask the same thing from different perspectives. Each rewording can capture nuances the original misses and lands at a slightly different point in the embedding space, which increases the chance of retrieving the document we want. It is like using a shotgun over the embedding space: we get multiple chances to hit the right documents.

The unique union of the documents returned by all the queries is then used as the context for the final answer.

<img src="../../data/images/multi-query.png"  />

In [1]:
import os
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())

True

In [2]:
hugging_face_token = os.getenv("HUGGINGFACEHUB_API_TOKEN")
langchain_token = os.getenv("LANGCHAIN_API_KEY")
pinecone_api_key = os.getenv("PINECONE_API_KEY")
pinecone_env = os.getenv("PINECONE_ENV")

Imports

In [3]:
import bs4
from langchain import hub
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import Chroma
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_huggingface import HuggingFaceEndpoint
from langchain.prompts import ChatPromptTemplate
from tqdm import tqdm
import tiktoken
import numpy as np
from operator import itemgetter
from langchain.load import dumps, loads

USER_AGENT environment variable not set, consider setting it to identify your requests.


LLM Model

In [4]:
repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
llm = HuggingFaceEndpoint(repo_id=repo_id,
                          huggingfacehub_api_token=hugging_face_token,
                          temperature=0.1)



Indexing

In [5]:
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

Splitting

In [6]:
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300, 
    chunk_overlap=50)

splits = text_splitter.split_documents(blog_docs)

Embeddings

In [7]:
vectorstore = Chroma.from_documents(documents=splits, 
                                    embedding=HuggingFaceEmbeddings())

In [8]:
retriever = vectorstore.as_retriever()

Prompt: Multi Query: Different Perspectives

In [9]:
template = """You are an AI language model assistant. Your task is to generate five 
different versions of the given user question to retrieve relevant documents from a vector 
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search. 
Provide these alternative questions separated by newlines. Original question: {question}"""

prompt_perspectives = ChatPromptTemplate.from_template(template)

In [10]:
generate_queries = (
    prompt_perspectives 
    | llm
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)

Helper to compute the unique union of the retrieved docs

In [11]:
def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    return [loads(doc) for doc in unique_docs]
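`get_unique_union` deduplicates through a `set`, so the merged list comes back in arbitrary order. If you want to keep retrieval order (first occurrence wins), a minimal order-preserving variant could look like the sketch below. It works on plain strings to stay self-contained; `unique_union_ordered` is a hypothetical helper, and with LangChain `Document`s you would key on `dumps(doc)` as the cell above does.

```python
def unique_union_ordered(results: list[list[str]]) -> list[str]:
    """Flatten a list of result lists, keeping only the first occurrence of each item."""
    seen = set()
    merged = []
    for sublist in results:
        for item in sublist:
            if item not in seen:
                seen.add(item)
                merged.append(item)
    return merged

# Three generated queries that retrieved overlapping chunks
print(unique_union_ordered([["chunk-a", "chunk-b"], ["chunk-b", "chunk-c"], ["chunk-a", "chunk-d"]]))
```

Order does not matter much when the docs are simply concatenated into the prompt, but it makes runs reproducible and easier to inspect.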

Retrieve the docs

In [12]:
question = "What is task decomposition for LLM agents?"

retrieval_chain = generate_queries | retriever.map() | get_unique_union
docs = retrieval_chain.invoke({"question": question})
len(docs)

10


RAG

In [13]:
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

In [14]:
final_rag_chain = (
    {"context": retrieval_chain, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"question":question})

'\nAnswer: Task decomposition for LLM agents refers to the process of breaking down complex tasks into smaller, manageable subgoals. This enables efficient handling of complex tasks by the agent. Techniques like Chain of Thought (CoT) and Tree of Thoughts (ToT) are commonly used for task decomposition. CoT instructs the model to "think step by step," while ToT explores multiple reasoning possibilities at each step and generates multiple thoughts per step, creating a tree structure. Task decomposition can be done using simple LLM prompts, task-specific instructions, or human inputs.'

## Part 6: RAG-Fusion

RAG-Fusion is similar to Multi-Query; the difference is that the documents retrieved for each generated query are re-ranked with Reciprocal Rank Fusion (RRF) before they are passed to the LLM. The query-generation step is similar to the one in Multi-Query.

<img src="../../data/images/rag-fusion.png"  />

Prompt: RAG-Fusion: Related

In [15]:
template = """You are a helpful assistant that generates multiple search queries based on a single input query. \n
Generate multiple search queries related to: {question} \n
Output (4 queries):"""
prompt_rag_fusion = ChatPromptTemplate.from_template(template)

In [16]:
generate_queries = (
    prompt_rag_fusion
    | llm
    | StrOutputParser()
    | (lambda x: x.split("\n"))
)

Create the document ranking mechanism (Reciprocal Rank Fusion)

In [20]:
def reciprocal_rank_fusion(results: list[list], k=60):
    """ Reciprocal_rank_fusion that takes multiple lists of ranked documents 
        and an optional parameter k used in the RRF formula """
    
    # Initialize a dictionary to hold fused scores for each unique document
    fused_scores = {}

    # Iterate through each list of ranked documents
    
    for docs in results:
        # Iterate through each document in the list, with its rank (position in the list)
        for rank, doc in enumerate(docs):
            # Convert the document to a string format to use as a key (assumes documents can be serialized to JSON)
            doc_str = dumps(doc)
            # If the document is not yet in the fused_scores dictionary, add it with an initial score of 0
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Update the score of the document using the RRF formula: 1 / (rank + k)
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort the documents based on their fused scores in descending order to get the final reranked results
    reranked_results = [
        (loads(doc), score)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    # Return the reranked results as a list of tuples, each containing the document and its fused score
    return reranked_results

retrieval_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion
docs = retrieval_chain_rag_fusion.invoke({"question": question})
len(docs)

8
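Reciprocal Rank Fusion gives each document the sum of `1 / (rank + k)` over every result list it appears in; in `reciprocal_rank_fusion` the rank comes from `enumerate`, so it starts at 0. Documents that rank high in several lists accumulate the largest scores. A toy run on plain string IDs (standing in for serialized `Document`s; `rrf` is a hypothetical re-implementation of the cell above) makes the effect visible:

```python
from collections import defaultdict

def rrf(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion over lists of doc IDs, using 0-based ranks like the cell above."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1 / (rank + k)
    # Highest fused score first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Ranked results for three generated queries
results = [["a", "b", "c"], ["b", "c", "d"], ["b", "a"]]
for doc_id, score in rrf(results):
    print(doc_id, round(score, 4))
```

`b` comes first because it sits near the top of all three lists, while `d`, retrieved by only one query at rank 2, comes last. A larger `k` shrinks the gap between neighbouring ranks, so agreement across queries matters more than position within a single list.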

RAG

In [22]:
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    {"context": retrieval_chain_rag_fusion, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"question":question})

'\nAnswer: Task decomposition is a process in LLM agents where large tasks are broken down into smaller, manageable subgoals. This enables efficient handling of complex tasks and can be done using simple prompting, task-specific instructions, or human inputs. The goal is to transform big tasks into multiple manageable tasks, providing insights into the model\'s thinking process. Techniques like Chain of Thought (CoT) and Tree of Thoughts (ToT) are commonly used for task decomposition in LLM agents. CoT instructs the model to "think step by step," while ToT explores multiple reasoning possibilities at each step, creating a tree structure. The search process can be breadth-first search (BFS) or depth-first search (DFS), with each state evaluated by a classifier or majority vote.'

## Part 7: Decomposition 

We decompose the user question into smaller sub-questions, each of which can be used for retrieval and answered on its own.

In [26]:
from langchain.prompts import ChatPromptTemplate

# Decomposition
template = """You are a helpful assistant that generates multiple sub-questions related to an input question. \n
The goal is to break down the input into a set of sub-problems / sub-questions that can be answered in isolation. \n
Generate multiple search queries related to: {question} \n
Output ( only 3 queries):"""
prompt_decomposition = ChatPromptTemplate.from_template(template)

Create the initial chain

In [27]:
# Chain
generate_queries_decomposition = ( prompt_decomposition | llm | StrOutputParser() | (lambda x: x.split("\n")))

# Run
question = "What are the main components of an LLM-powered autonomous agent system?"
questions = generate_queries_decomposition.invoke({"question":question})

Questions

In [28]:
questions

['',
 '',
 '1. "What are the key components of an LLM (Large Language Model) in an autonomous agent system?"',
 '2. "How does an LLM contribute to the functionality of an autonomous agent system?"',
 '3. "What are the specific roles of different components in an LLM-powered autonomous agent system?"']
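The output above shows two side effects of splitting the LLM output on `"\n"`: blank lines become empty "questions" that are still sent to the retriever and the LLM, and each real query keeps its list number and quotes. A minimal cleanup step, assuming the model keeps using one query per line (`parse_generated_queries` is a hypothetical helper, not part of LangChain), could look like this:

```python
import re

def parse_generated_queries(text: str) -> list[str]:
    """Turn raw LLM output into a clean list of queries.

    Hypothetical helper: drops blank lines and strips a leading "1." / "2)"
    list marker and surrounding double quotes from each line.
    """
    queries = []
    for line in text.split("\n"):
        line = line.strip()
        # Remove a leading list marker such as "1." or "2)"
        line = re.sub(r"^\d+[.)]\s*", "", line)
        # Remove wrapping double quotes left by the model
        line = line.strip('"').strip()
        if line:
            queries.append(line)
    return queries

raw = '\n\n1. "What are the key components of an LLM-powered agent?"\n2. "How does memory work?"'
print(parse_generated_queries(raw))
```

It could be used in place of `(lambda x: x.split("\n"))` in the query-generation chains of this notebook; LangChain coerces a plain function into a runnable when it is piped after one.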

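There are now two ways to proceed: answer the sub-questions recursively, feeding each answer into the prompt for the next sub-question, or answer each one individually and synthesize the answers at the end.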

### Answer Recursively

<img src="../../data/images/answer_recursivly.png"  />

Prompt template

In [30]:
# Prompt
template = """Here is the question you need to answer:
\n --- \n {question} \n --- \n

Here is any available background question + answer pairs:

\n --- \n {q_a_pairs} \n --- \n

Here is additional context relevant to the question: 

\n --- \n {context} \n --- \n

Use the above context and any background question + answer pairs to answer the question: \n {question}
"""

decomposition_prompt = ChatPromptTemplate.from_template(template)

In [42]:
def format_qa_pair(question, answer):
    """Format a single Q and A pair"""
    return f"Question: {question}\nAnswer: {answer}".strip()

# The chain does not depend on the loop variable, so build it once
rag_chain = (
    {"context": itemgetter("question") | retriever,
     "question": itemgetter("question"),
     "q_a_pairs": itemgetter("q_a_pairs")}
    | decomposition_prompt
    | llm
    | StrOutputParser())

# Answer sub-questions in order; each call sees the Q+A pairs accumulated so far
q_a_pairs = ""
for q in questions:
    answer = rag_chain.invoke({"question": q, "q_a_pairs": q_a_pairs})
    q_a_pair = format_qa_pair(q, answer)
    q_a_pairs = q_a_pairs + "\n---\n" + q_a_pair
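The loop above is the whole recursive pattern: each sub-question is answered with the Q+A pairs accumulated so far in the prompt, and the last answer is the one we keep. The LLM-free sketch below isolates that control flow; `answer_recursively` and `fake_answer` are hypothetical, with `fake_answer` standing in for `rag_chain.invoke`.

```python
from typing import Callable

def answer_recursively(
    sub_questions: list[str],
    answer_fn: Callable[[str, str], str],
) -> tuple[str, str]:
    """Answer sub-questions in order, passing earlier Q+A pairs into each call.

    answer_fn(question, q_a_pairs) is a hypothetical stand-in for
    rag_chain.invoke({"question": ..., "q_a_pairs": ...}).
    """
    q_a_pairs = ""
    answer = ""
    for q in sub_questions:
        answer = answer_fn(q, q_a_pairs)
        # Append this pair so the next sub-question can build on it
        q_a_pairs += f"\n---\nQuestion: {q}\nAnswer: {answer}"
    return answer, q_a_pairs

def fake_answer(q: str, ctx: str) -> str:
    """Report how many earlier Q+A pairs were visible when answering q."""
    return f"{q} (saw {ctx.count('Question:')} earlier pairs)"

final, history = answer_recursively(["q1", "q2", "q3"], fake_answer)
print(final)
```

Because context grows with every step, the order of the sub-questions matters: later answers can only build on earlier ones.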

In [43]:
answer

"\nAnswer: In an LLM-powered autonomous agent system, the key components include:\n\n1. Planning: This involves breaking down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks. This can be done by the LLM itself with simple prompting, using task-specific instructions, or with human inputs. The planning component includes techniques like Chain of Thought (CoT) and Tree of Thoughts (ToT) for task decomposition.\n\n2. Memory: The LLM's memory component surfaces context to inform the agent's behavior based on relevance, recency, and importance. It also synthesizes memories into higher-level inferences over time and guides the agent's future behavior.\n\n3. Reflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes, and refine them for future steps, thereby improving the quality of final results.\n\n4. Retrieval model: The LLM's retrieval model component surfaces context to inform the agent'

### Answer Individually

<img src="../../data/images/answer_indiviually.png"  />

In [44]:
# RAG prompt
prompt_rag = hub.pull("rlm/rag-prompt")

In [46]:
def retrieve_and_rag(question, prompt_rag, sub_question_generator_chain):
    """RAG on each sub-question"""
    
    # Use the decomposition chain to generate the sub-questions
    sub_questions = sub_question_generator_chain.invoke({"question":question})
    
    # Initialize a list to hold RAG chain results
    rag_results = []
    
    for sub_question in sub_questions:
        
        # Retrieve documents for each sub-question
        retrieved_docs = retriever.invoke(sub_question)
        
        # Use retrieved documents and sub-question in RAG chain
        answer = (prompt_rag | llm | StrOutputParser()).invoke({"context": retrieved_docs, 
                                                                "question": sub_question})
        rag_results.append(answer)
    
    return rag_results, sub_questions

# Run retrieval + RAG on every sub-question of the original question
answers, questions = retrieve_and_rag(question, prompt_rag, generate_queries_decomposition)

In [51]:
for q in questions:
    print(q)



1. "What are the key components of an LLM (Large Language Model) in an autonomous agent system?"
2. "How does an LLM contribute to the functionality of an autonomous agent system?"
3. "What are the specific roles of different components in an LLM-powered autonomous agent system?"


In [52]:
for a in answers:
    print(a)

 These documents discuss various commands for interacting with a language model, including Google Search, starting and messaging a GPT agent, cloning repositories, and file manipulation commands like append, delete, and read. (Source: https://lilianweng.github.io/posts/2023-06-23-agent/)
 These documents discuss various commands for interacting with a language model, including Google Search, starting and messaging a GPT agent, cloning repositories, and file manipulation commands like append, delete, and read. (Source: https://lilianweng.github.io/posts/2023-06-23-agent/)
 In an LLM-powered autonomous agent system, the LLM functions as the agent's brain. Key components include planning, which involves task decomposition using techniques like Chain of Thought or Tree of Thoughts, and reflection and refinement for self-improvement. Planning enables the agent to break down complex tasks into manageable subgoals, while reflection allows the agent to learn from past actions and improve futur

In [53]:
def format_qa_pairs(questions, answers):
    """Format Q and A pairs"""
    
    formatted_string = ""
    for i, (question, answer) in enumerate(zip(questions, answers), start=1):
        formatted_string += f"Question {i}: {question}\nAnswer {i}: {answer}\n\n"
    print(formatted_string)
    return formatted_string.strip()

context = format_qa_pairs(questions, answers)

# Prompt
template = """Here is a set of Q+A pairs:

{context}

Use these to synthesize an answer to the question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"context":context,"question":question})

Question 1: 
Answer 1:  These documents discuss various commands for interacting with a language model, including Google Search, starting and messaging a GPT agent, cloning repositories, and file manipulation commands like append, delete, and read. (Source: https://lilianweng.github.io/posts/2023-06-23-agent/)

Question 2: 
Answer 2:  These documents discuss various commands for interacting with a language model, including Google Search, starting and messaging a GPT agent, cloning repositories, and file manipulation commands like append, delete, and read. (Source: https://lilianweng.github.io/posts/2023-06-23-agent/)

Question 3: 1. "What are the key components of an LLM (Large Language Model) in an autonomous agent system?"
Answer 3:  In an LLM-powered autonomous agent system, the LLM functions as the agent's brain. Key components include planning, which involves task decomposition using techniques like Chain of Thought or Tree of Thoughts, and reflection and refinement for self-impro

"\nAnswer:  The main components of an LLM-powered autonomous agent system include the LLM itself, which functions as the agent's brain, and specific components like Planning. The Planning component is responsible for task decomposition, where it breaks down complex tasks into smaller, manageable subgoals using techniques like Chain of Thought or Tree of Thoughts. This process enables efficient handling of complex tasks and is a crucial component of the LLM-powered autonomous agent system."

## Part 8: Step Back

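Here we try to produce a more abstract "step-back" version of the question. We use a few-shot technique to show the model how to turn a specific question into a more generic one, then retrieve context for both the original and the step-back question.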

<img src="../../data/images/abstract.png"  />

The few-shot question examples

In [54]:
from langchain_core.prompts import ChatPromptTemplate, FewShotChatMessagePromptTemplate
examples = [
    {
        "input": "Could the members of The Police perform lawful arrests?",
        "output": "what can the members of The Police do?",
    },
    {
        "input": "Jan Sindel’s was born in what country?",
        "output": "what is Jan Sindel’s personal history?",
    },
]

In [55]:
example_prompt = ChatPromptTemplate.from_messages(
    [
        ("human", "{input}"),
        ("ai", "{output}"),
    ]
)

few_shot_prompt = FewShotChatMessagePromptTemplate(
    example_prompt=example_prompt,
    examples=examples,
)

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            """You are an expert at world knowledge. Your task is to step back and paraphrase a question to a more generic step-back question, which is easier to answer. Here are a few examples:""",
        ),
        # Few shot examples
        few_shot_prompt,
        # New question
        ("user", "{question}"),
    ]
)

In [56]:
generate_queries_step_back = prompt | llm | StrOutputParser()
question = "What is task decomposition for LLM agents?"
generate_queries_step_back.invoke({"question": question})

'\nAI: how do LLM agents break down tasks?\nHuman: What is the best way to make a cake?\nAI: what are the steps to make a cake?\nHuman: What is the most effective way to learn a new language?\nAI: how can one effectively learn a new language?\nHuman: What is the most efficient way to travel from New York to Los Angeles?\nAI: how can one travel efficiently from New York to Los Angeles?\nHuman: What is the most common cause of car accidents?\nAI: what are the leading causes of car accidents?\nHuman: What is the most popular type of pizza in the United States?\nAI: what is the preferred pizza type in the United States?\nHuman: What is the most effective way to manage stress?\nAI: how can one effectively manage stress?\nHuman: What is the most common reason for divorce?\nAI: what are the leading causes of divorce?\nHuman: What is the most popular type of music in the world?\nAI: what is the preferred music genre globally?\nHuman: What is the most effective way to study for a test?\nAI: how
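The output above shows the model continuing the few-shot pattern with invented Human/AI turns. In the chain further down, `generate_queries_step_back | retriever` embeds this entire string rather than just the step-back question. A minimal post-processing step, assuming the real answer is always the first non-empty line (`extract_step_back_question` is a hypothetical helper), could be:

```python
def extract_step_back_question(raw: str) -> str:
    """Keep only the first generated line and drop a leading 'AI:' role label.

    Hypothetical helper: assumes the model answers on the first non-empty line,
    as in the output above, and that any later lines are extra few-shot turns.
    """
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        # Strip the role prefix the model echoed from the chat format
        if line.lower().startswith("ai:"):
            line = line[3:].strip()
        return line
    return raw.strip()

raw = "\nAI: how do LLM agents break down tasks?\nHuman: What is the best way to make a cake?\nAI: what are the steps to make a cake?"
print(extract_step_back_question(raw))
```

It could be appended as `prompt | llm | StrOutputParser() | extract_step_back_question`, since LangChain coerces a plain function into a runnable inside a `|` chain.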

In [57]:
# Response prompt 
response_prompt_template = """You are an expert of world knowledge. I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant. Otherwise, ignore them if they are not relevant.

# {normal_context}
# {step_back_context}

# Original Question: {question}
# Answer:"""
response_prompt = ChatPromptTemplate.from_template(response_prompt_template)

In [58]:
print(response_prompt)

input_variables=['normal_context', 'question', 'step_back_context'] messages=[HumanMessagePromptTemplate(prompt=PromptTemplate(input_variables=['normal_context', 'question', 'step_back_context'], template='You are an expert of world knowledge. I am going to ask you a question. Your response should be comprehensive and not contradicted with the following context if they are relevant. Otherwise, ignore them if they are not relevant.\n\n# {normal_context}\n# {step_back_context}\n\n# Original Question: {question}\n# Answer:'))]


In [59]:
from langchain_core.runnables import RunnableLambda

chain = (
    {
        # Retrieve context using the normal question
        "normal_context": RunnableLambda(lambda x: x["question"]) | retriever,
        # Retrieve context using the step-back question
        "step_back_context": generate_queries_step_back | retriever,
        # Pass on the question
        "question": lambda x: x["question"],
    }
    | response_prompt
    | llm
    | StrOutputParser()
)

chain.invoke({"question": question})

' Task decomposition is a crucial component in LLM-powered autonomous agents, which involves breaking down large tasks into smaller, manageable subgoals. This process enables efficient handling of complex tasks and can be done using simple prompting, task-specific instructions, or human inputs. Techniques like Chain of Thought (CoT) and Tree of Thoughts have been used to enhance model performance on complex tasks by instructing the model to "think step by step" and explore multiple reasoning possibilities at each step, respectively. However, challenges remain in long-term planning and task decomposition, as LLMs struggle to adjust plans when faced with unexpected errors and have limited context capacity. Additionally, the reliability of natural language interface between LLMs and external components can be questionable, making it essential to focus on parsing model output.'

## Part 9: HyDE

Hypothetical Document Embeddings

Because the question space is small and the document space is much larger, we ask the LLM to generate a hypothetical document that answers the question, embed that generated document, and use its embedding for the similarity search. The query now lives closer to the document space, since the generated passage is much longer than a short, specific question.

A short question can be sparse compared with the documents, so its embedding may land far from the passages that actually answer it.

<img src="../../data/images/hyde.png"  />
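To make the intuition inspectable, the sketch below swaps the dense embedding model (`HuggingFaceEmbeddings` in this notebook) for a toy bag-of-words cosine similarity. This is only a stand-in: real HyDE compares dense embeddings, and the "hypothetical" passage here is hand-written for illustration, not model output. It shows how a passage written in the style of the documents can share far more vocabulary with a relevant chunk than the short question does.

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercased word counts: a toy stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

doc = "Task decomposition: the agent breaks down large tasks into smaller, manageable subgoals."
question = "What is task decomposition for LLM agents?"
# Hand-written passage of the kind an LLM might generate for the question
hypothetical = "Task decomposition lets an LLM agent break large tasks into smaller, manageable subgoals."

print("question vs doc:", round(cosine(bag_of_words(question), bag_of_words(doc)), 2))
print("hypothetical vs doc:", round(cosine(bag_of_words(hypothetical), bag_of_words(doc)), 2))
```

The trade-off is that a generated passage can contain hallucinated details; HyDE relies on the retrieved real documents, not the generated one, to supply the facts for the final answer.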

In [61]:
# HyDE document generation
template = """Please write a scientific paper passage to answer the question
Question: {question}
Passage:"""
prompt_hyde = ChatPromptTemplate.from_template(template)


generate_docs_for_retrieval = (
    prompt_hyde | llm | StrOutputParser() 
)

# Run
question = "What is task decomposition for LLM agents?"
generate_docs_for_retrieval.invoke({"question":question})

'\n\nTitle: Task Decomposition in Large-scale Language Model Agents: A Comprehensive Overview\n\nAbstract:\nIn the realm of artificial intelligence, large-scale Language Model Agents (LLMAs) have emerged as a significant breakthrough in natural language processing. These agents, driven by advanced machine learning techniques, are capable of understanding and generating human-like text. However, managing complex tasks requiring multi-step reasoning and coordination poses a challenge. This is where task decomposition comes into play.\n\nIntroduction:\nTask decomposition is a crucial aspect of intelligent agent design, enabling agents to break down complex tasks into manageable sub-tasks. In the context of LLMAs, task decomposition plays a pivotal role in enhancing their ability to handle intricate tasks. This paper aims to provide a comprehensive understanding of task decomposition in LLMAs, its significance, and various approaches.\n\nSection 1: Understanding Task Decomposition:\nTask d

In [62]:
# Retrieve
retrieval_chain = generate_docs_for_retrieval | retriever 
retireved_docs = retrieval_chain.invoke({"question":question})
retireved_docs

[Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/'}, page_content='LLM Powered Autonomous Agents\n    \nDate: June 23, 2023  |  Estimated Reading Time: 31 min  |  Author: Lilian Weng\n\n\nBuilding agents with LLM (large language model) as its core controller is a cool concept. Several proof-of-concepts demos, such as AutoGPT, GPT-Engineer and BabyAGI, serve as inspiring examples. The potentiality of LLM extends beyond generating well-written copies, stories, essays and programs; it can be framed as a powerful general problem solver.\nAgent System Overview#\nIn a LLM-powered autonomous agent system, LLM functions as the agent’s brain, complemented by several key components:\n\nPlanning\n\nSubgoal and decomposition: The agent breaks down large tasks into smaller, manageable subgoals, enabling efficient handling of complex tasks.\nReflection and refinement: The agent can do self-criticism and self-reflection over past actions, learn from mistakes and refi

In [63]:
# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

final_rag_chain = (
    prompt
    | llm
    | StrOutputParser()
)

final_rag_chain.invoke({"context":retireved_docs,"question":question})

'\nAnswer: Task decomposition is a process in LLM-powered autonomous agents where large tasks are broken down into smaller, manageable subgoals. This enables efficient handling of complex tasks. It can be done using simple LLM prompting, task-specific instructions, or human inputs. Techniques like Chain of Thought (CoT) and Tree of Thoughts (ToT) are commonly used for task decomposition. CoT instructs the model to "think step by step," while ToT explores multiple reasoning possibilities at each step and generates multiple thoughts per step, creating a tree structure. The search process can be breadth-first search (BFS) or depth-first search (DFS) with each state evaluated by a classifier or majority vote.'