# RAG part 2

## Query transformation

The idea is to transform the user's query into one or more rewritten queries that are easier to match against the indexed documents, so that retrieval becomes easier.

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()
os.environ['LANGCHAIN_TRACING_V2'] = 'true'
os.environ['LANGCHAIN_ENDPOINT'] = 'https://api.smith.langchain.com'
os.environ['LANGCHAIN_API_KEY'] = os.getenv("LANGCHAIN_API_KEY")
os.environ['OPENAI_API_KEY'] = os.getenv("OPENAI_API_KEY")

### Idea 1: Multi query

A single query can be reframed into multiple queries, so that we can query the vector DB several times and take the union of the documents fetched each time.
![rag-multi-query](rag_part_2_multi_query.png)
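Stripped of LangChain, the fan-out-and-union idea fits in a few lines. This is a minimal sketch that assumes a retriever is just a function from a query string to a list of document IDs; `multi_query_retrieve` and `toy_index` are hypothetical names used only for illustration.

```python
from typing import Callable

def multi_query_retrieve(
    queries: list[str],
    retrieve: Callable[[str], list[str]],
) -> list[str]:
    """Run every query against the retriever and return the union of hits,
    keeping first-seen order and dropping duplicates."""
    seen: set[str] = set()
    union: list[str] = []
    for q in queries:
        for doc in retrieve(q):
            if doc not in seen:
                seen.add(doc)
                union.append(doc)
    return union

# Hypothetical toy index: each query maps to the chunk IDs it would retrieve.
toy_index = {
    "how do agents break down tasks": ["chunk_3", "chunk_7"],
    "what is task decomposition": ["chunk_7", "chunk_1"],
    "subgoals in LLM planning": ["chunk_1", "chunk_9"],
}

docs = multi_query_retrieve(list(toy_index), lambda q: toy_index.get(q, []))
print(docs)  # ['chunk_3', 'chunk_7', 'chunk_1', 'chunk_9']
```

Each query contributes documents the others might miss, and the union removes the overlap.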

In [2]:
#### INDEXING ####

# Load blog
import bs4
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs=dict(
        parse_only=bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    ),
)
blog_docs = loader.load()

# Split
from langchain.text_splitter import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    chunk_size=300, 
    chunk_overlap=50)

# Make splits
splits = text_splitter.split_documents(blog_docs)

# Index
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
vectorstore = Chroma.from_documents(documents=splits, 
                                    embedding=OpenAIEmbeddings())

retriever = vectorstore.as_retriever()
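`RecursiveCharacterTextSplitter.from_tiktoken_encoder` measures `chunk_size` and `chunk_overlap` in tiktoken tokens, and it first tries to split on separators such as blank lines and newlines. Ignoring that separator logic, the effect of the overlap can be sketched as a fixed sliding window over tokens. This is a simplification for intuition only, not the splitter's actual algorithm; `sliding_window_chunks` is a hypothetical helper.

```python
def sliding_window_chunks(tokens: list[str], chunk_size: int, chunk_overlap: int) -> list[list[str]]:
    """Split a token list into windows of at most chunk_size tokens, where each
    window repeats the last chunk_overlap tokens of the previous one."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("need 0 <= chunk_overlap < chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):  # last window reached the end
            break
    return chunks

tokens = [f"t{i}" for i in range(10)]
for chunk in sliding_window_chunks(tokens, chunk_size=4, chunk_overlap=1):
    print(chunk)
# ['t0', 't1', 't2', 't3']
# ['t3', 't4', 't5', 't6']
# ['t6', 't7', 't8', 't9']
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk (up to the overlap length).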


Generating the alternative queries with a prompt:

In [3]:
from langchain.prompts import ChatPromptTemplate

# Multi Query: Different Perspectives
template = """You are an AI language model assistant. Your task is to generate five 
different versions of the given user question to retrieve relevant documents from a vector 
database. By generating multiple perspectives on the user question, your goal is to help
the user overcome some of the limitations of the distance-based similarity search. 
Provide these alternative questions separated by newlines. Original question: {question}"""
prompt_perspectives = ChatPromptTemplate.from_template(template)

from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4-turbo", temperature=0.0)

generate_queries = (
    prompt_perspectives 
    | llm
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)

In [4]:
from IPython.display import display, Markdown

generate_queries.invoke({"question": "What is task decomposition for LLM agents?"})

['How do LLM agents break down complex tasks into simpler components?',
 'What does task decomposition entail in the context of large language models?',
 'Can you explain the process of task decomposition in large language model agents?',
 'What are the methods used by LLM agents for decomposing tasks?',
 'How do large language models manage task decomposition?']

In [5]:
from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    # Return
    return [loads(doc) for doc in unique_docs]

# Retrieve
question = "What is task decomposition for LLM agents?"
retrieval_chain = generate_queries | retriever.map() | get_unique_union
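`get_unique_union` serializes each Document with `dumps` because Document objects cannot go into a `set` directly, and the round trip through `set` also discards the retrieval order. A hypothetical order-preserving alternative keys each document on its content and metadata instead. `Doc` below is an assumed stand-in for LangChain's Document, not the real class.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (mutable, so not hashable)."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def unique_union_ordered(results: list[list[Doc]]) -> list[Doc]:
    """Flatten per-query results, keep the first copy of each document,
    and preserve the order in which documents were first retrieved."""
    seen: set[tuple] = set()
    unique: list[Doc] = []
    for docs in results:
        for doc in docs:
            # Key on content plus sorted metadata items (assumes hashable metadata values).
            key = (doc.page_content, tuple(sorted(doc.metadata.items())))
            if key not in seen:
                seen.add(key)
                unique.append(doc)
    return unique

a = Doc("Task decomposition splits a goal into subgoals.", {"source": "agent-post"})
b = Doc("Tree of Thoughts explores several reasoning paths.", {"source": "agent-post"})
# The second query returns a separate but identical copy of `a`.
results = [[a, b], [Doc(a.page_content, dict(a.metadata))]]
print([d.page_content for d in unique_union_ordered(results)])
```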

In [6]:
docs = retrieval_chain.invoke({"question": question})
len(docs)

9

In [7]:
from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

# llm = ChatOpenAI(temperature=0)

final_rag_chain = (
    {"context": retrieval_chain, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)
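In `final_rag_chain` the `{context}` slot receives the Python list of Documents returned by `retrieval_chain`, so the prompt gets a string rendering of that list, metadata included. A common alternative is to join only the page contents first. `format_docs` and `Doc` below are hypothetical names for a sketch, not part of the notebook.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def format_docs(docs: list[Doc], separator: str = "\n\n") -> str:
    """Join document texts into one context string, dropping metadata."""
    return separator.join(doc.page_content.strip() for doc in docs)

context = format_docs([
    Doc("Chain of Thought prompts step-by-step reasoning. ", {"source": "agent-post"}),
    Doc("Tree of Thoughts searches over several thoughts per step."),
])
print(context)
```

Because LangChain coerces plain callables into runnables when they are piped, a helper like this should slot in as `retrieval_chain | format_docs` in the `"context"` entry.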

In [8]:
display(Markdown(final_rag_chain.invoke({"question":question})))


Task decomposition for LLM (Large Language Model) agents refers to the process of breaking down a complex task into smaller, more manageable subtasks. This technique enhances the performance of LLMs on intricate tasks by allowing them to focus on simpler components one at a time. The concept is rooted in the "Chain of Thought" (CoT) prompting technique, which instructs the model to think step-by-step, thereby utilizing more test-time computation to dissect difficult tasks into simpler steps. This not only transforms large tasks into multiple manageable tasks but also provides insights into the model’s reasoning process.

Further extending this concept, the "Tree of Thoughts" approach, as mentioned in the provided documents, explores multiple reasoning possibilities at each step of task decomposition. It generates a tree structure by creating multiple thoughts per step, which can be navigated using search strategies like breadth-first search (BFS) or depth-first search (DFS). Each state in this tree can be evaluated using methods such as classifier prompts or majority votes.

Task decomposition can be implemented in various ways:
1. By using simple LLM prompting techniques, such as asking for "Steps for XYZ" or querying about the subgoals for achieving a specific objective.
2. By employing task-specific instructions, for example, instructing the model to "Write a story outline" for novel writing.
3. Through human inputs, where the decomposition is guided by direct human interaction or oversight.

This methodical breakdown not only aids in the systematic handling of tasks but also aligns with enhancing the interpretability and effectiveness of LLMs in performing complex tasks.

In [12]:
display(Markdown(final_rag_chain.invoke({"question":"Describe some key sections of the article in a pointwise manner."})))


Based on the provided documents, here are some key sections of the article described in a pointwise manner:

1. **ReAct Prompt Template**:
   - Introduces a structured format for Language Model (LLM) operations, which includes steps like Thought, Action, and Observation, repeated multiple times to guide the LLM's processing and response generation.

2. **Challenges in Building LLM-Centered Agents**:
   - Discusses common limitations encountered when developing agents centered around large language models, highlighting practical challenges in implementation and operation.

3. **Detailed Coding Instructions**:
   - Provides comprehensive guidelines for coding, including the layout of core classes, functions, and methods with comments on their purposes.
   - Specifies the format for outputting code in markdown blocks, ensuring that filenames and languages are appropriately formatted and that the code is fully functional and compatible across different files.

4. **Performance Evaluation**:
   - Outlines methods for continuous review and analysis of actions to ensure optimal performance.
   - Emphasizes the importance of self-criticism, reflection on past decisions, and efficient command execution to minimize resource usage.

5. **Resources for Agent Operation**:
   - Lists resources available to the agent, such as internet access, long-term memory management, and GPT-3.5 powered agents for task delegation.

6. **Experiments and Observations**:
   - Mentions specific experiments like those conducted in AlfWorld Env and HotpotQA, noting particular issues like hallucination and inefficient planning.

These sections collectively provide insights into the structure, challenges, and operational guidelines for building and managing LLM-centered agents, as well as evaluating their performance in practical scenarios.

In [14]:
display(Markdown(final_rag_chain.invoke({"question":"Describe some of the studies mentioned in this article done in the area of 'Generative Agents Simulation' . "})))


The article mentions a study titled "Generative Agents" by Park et al. (2023), which is a significant experiment in the area of Generative Agents Simulation. In this study, 25 virtual characters, each controlled by a large language model (LLM)-powered agent, interact within a sandbox environment inspired by The Sims. This simulation aims to create believable simulacra of human behavior for interactive applications.

The design of these generative agents integrates LLMs with memory, planning, and reflection mechanisms. This allows the agents to behave based on past experiences and interact with other agents. A key feature of this system is the memory stream, which is an external database functioning as a long-term memory module. It records a comprehensive list of the agents' experiences in natural language, capturing each observation or event directly provided by the agent. This memory stream enables the agents to recall past interactions and use this information to inform future behaviors and decisions.

Inter-agent communication within this simulation can trigger new natural language statements, enhancing the dynamic and interactive nature of the environment. This study showcases the potential of LLMs to simulate complex social interactions and behaviors in a controlled virtual setting.

### Using MultiQueryRetriever

Source: [Link](https://python.langchain.com/docs/modules/data_connection/retrievers/MultiQueryRetriever/)

In [16]:
from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_openai import ChatOpenAI

question = "What are the approaches to Task Decomposition?"
llm = ChatOpenAI(model="gpt-4-turbo", temperature=0)
retriever_from_llm = MultiQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(), llm=llm
)

In [17]:
# Set logging for the queries
import logging

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)

In [20]:
question = "What are the approaches to Task Decomposition in context of LLMs?"
unique_docs = retriever_from_llm.invoke(question)
len(unique_docs)

INFO:langchain.retrievers.multi_query:Generated queries: ['What methods are used for breaking down tasks in large language models?', '', 'How do large language models handle task decomposition?', '', 'Can you describe the strategies for task decomposition in large language models?']


10

In [21]:
# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

# llm = ChatOpenAI(temperature=0)

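final_rag_chain = (
    # Pass only the question string to the retriever, not the whole input dict
    {"context": itemgetter("question") | retriever_from_llm,
     "question": itemgetter("question")}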
    | prompt
    | llm
    | StrOutputParser()
)

In [23]:
display(Markdown(final_rag_chain.invoke({"question": question})))

INFO:langchain.retrievers.multi_query:Generated queries: ['What methods are used for breaking down tasks in large language models?', '', 'How do large language models handle task decomposition?', '', 'Can you describe the strategies for task decomposition in large language models?']


The approaches to task decomposition in the context of Large Language Models (LLMs) as described in the provided documents include:

1. **Chain of Thought (CoT)**: Introduced by Wei et al. in 2022, this technique involves instructing the model to "think step by step." This method allows the LLM to use more test-time computation to break down complex tasks into smaller, more manageable steps. The CoT approach helps in transforming large tasks into multiple manageable tasks and provides insights into the model’s reasoning process.

2. **Tree of Thoughts**: An extension of the CoT approach, developed by Yao et al. in 2023, which explores multiple reasoning possibilities at each step. It starts by decomposing the problem into multiple thought steps and then generates multiple thoughts per step, creating a tree structure. The search process within this structure can be conducted using either breadth-first search (BFS) or depth-first search (DFS), with each state evaluated by a classifier (via a prompt) or by majority vote.

3. **Simple Prompting Techniques**: These involve using straightforward prompts to guide the LLM in decomposing tasks. Examples include prompts like "Steps for XYZ.\n1.", "What are the subgoals for achieving XYZ?", or task-specific instructions such as "Write a story outline." for writing a novel.

4. **Human Inputs**: Involving human interaction to guide or adjust the task decomposition process, ensuring that the decomposition aligns with human understanding and requirements.

These approaches highlight the versatility and adaptability of LLMs in handling complex tasks by breaking them down into simpler, more digestible components, either autonomously or with human guidance.

### Idea 2: RAG Fusion

This is very similar to multi-query; the only difference is that, instead of taking a simple union, an extra re-ranking step (reciprocal rank fusion) is applied to all the retrieved documents.

![RAG Fusion](rag_part_2_rag_fusion.png)

Docs: [Link](https://github.com/langchain-ai/langchain/blob/master/cookbook/rag_fusion.ipynb?ref=blog.langchain.dev)
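To see what the re-ranking adds over a plain union, here is a minimal sketch of reciprocal rank fusion on toy ranked lists of document IDs. It uses the 1-based ranks and `k = 60` of the usual RRF formulation; the IDs and rankings are made up for illustration.

```python
from collections import defaultdict

def rrf(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal rank fusion over ranked lists of document IDs.
    Uses 1-based ranks: score(d) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order (sort is stable).
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

rankings = [
    ["A", "B", "C"],  # hypothetical results for query 1
    ["B", "A", "D"],  # hypothetical results for query 2
    ["B", "C", "E"],  # hypothetical results for query 3
]
for doc_id, score in rrf(rankings):
    print(doc_id, round(score, 4))
```

A union would return the same five IDs with no meaningful order; RRF puts `B` first because it ranks near the top for every query, ahead of `A`, which tops only one list.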

In [9]:
from langchain.prompts import ChatPromptTemplate

# RAG-Fusion: Related
template = """You are a helpful assistant that generates multiple search queries based on a single input query. \n
Generate multiple search queries related to: {question} \n
Output (4 queries):"""
prompt_rag_fusion = ChatPromptTemplate.from_template(template)

In [10]:
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4-turbo", temperature=0.0)

generate_queries = (
    prompt_rag_fusion 
    | llm
    | StrOutputParser() 
    | (lambda x: x.split("\n"))
)

In [11]:
question = "What are the approaches to Task Decomposition in context of LLMs?"
generate_queries.invoke({"question": question})

['1. "Overview of task decomposition strategies in large language models"',
 '2. "How do large language models handle task decomposition?"',
 '3. "Examples of task decomposition in AI language processing"',
 '4. "Effective task decomposition techniques for LLMs"']
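The raw split above keeps the model's `1.` numbering and quotation marks, and the MultiQueryRetriever log earlier shows empty strings coming from blank lines. A hypothetical `clean_queries` helper could replace the `lambda x: x.split("\n")` step; it is a sketch under the assumption that the model emits one query per line.

```python
import re

def clean_queries(raw: str) -> list[str]:
    """Split LLM output into queries, dropping blank lines, leading list
    numbering such as '1.' or '2)', and surrounding double quotes."""
    queries = []
    for line in raw.splitlines():
        line = re.sub(r"^\s*\d+[.)]\s*", "", line)  # drop "1. " / "2) " prefixes
        line = line.strip().strip('"').strip()      # drop whitespace and quotes
        if line:
            queries.append(line)
    return queries

raw_output = (
    '1. "Overview of task decomposition strategies in large language models"\n'
    "\n"
    '2. "Effective task decomposition techniques for LLMs"'
)
print(clean_queries(raw_output))
```

Since plain functions are coerced into runnables when piped, the chain would become `prompt_rag_fusion | llm | StrOutputParser() | clean_queries`.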

In [18]:
from langchain.load import dumps, loads

def reciprocal_rank_fusion(results: list[list], k=60):
    """ Reciprocal_rank_fusion that takes multiple lists of ranked documents 
        and an optional parameter k used in the RRF formula """
    
    # Initialize a dictionary to hold fused scores for each unique document
    fused_scores = {}

    # Iterate through each list of ranked documents
    for docs in results:
        # Iterate through each document in the list, with its rank (position in the list)
        for rank, doc in enumerate(docs):
            # Convert the document to a string format to use as a key (assumes documents can be serialized to JSON)
            doc_str = dumps(doc)
            # If the document is not yet in the fused_scores dictionary, add it with an initial score of 0
            if doc_str not in fused_scores:
                fused_scores[doc_str] = 0
            # Add this list's contribution using the RRF formula: 1 / (rank + k)
            # (rank comes from enumerate, so it is 0-based here)
            fused_scores[doc_str] += 1 / (rank + k)

    # Sort the documents based on their fused scores in descending order to get the final reranked results
    reranked_results = [
        loads(doc)
        for doc, score in sorted(fused_scores.items(), key=lambda x: x[1], reverse=True)
    ]

    # Return only the reranked documents (the fused scores are not included)
    return reranked_results
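Written out, the score that `reciprocal_rank_fusion` assigns to a document $d$ is the following, where $R_i$ is the $i$-th ranked list, $r_i(d)$ is the 0-based position of $d$ in $R_i$ (as produced by `enumerate`), and the sum runs only over lists that contain $d$:

$$
\mathrm{score}(d) = \sum_{i \,:\, d \in R_i} \frac{1}{k + r_i(d)}, \qquad k = 60
$$

For example, a document ranked first in two lists scores $\tfrac{1}{60} + \tfrac{1}{60} = \tfrac{1}{30} \approx 0.0333$, while a document ranked second in three lists scores $\tfrac{3}{61} \approx 0.0492$, so consistent appearances across queries can outweigh a single top hit.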

In [15]:
def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    # Return
    return [loads(doc) for doc in unique_docs]

In [19]:
retrieval_chain_rag_fusion = generate_queries | retriever.map() | reciprocal_rank_fusion
retrieval_chain_mq_union = generate_queries | retriever.map() | get_unique_union

In [20]:
docs_rag_fusion = retrieval_chain_rag_fusion.invoke({"question": question})
docs_mq_union = retrieval_chain_mq_union.invoke({"question": question})

In [22]:
from operator import itemgetter
from langchain_core.runnables import RunnablePassthrough

# RAG
template = """Answer the following question based on this context:

{context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)

In [23]:
final_rag_chain_rag_fusion = (
    {"context": retrieval_chain_rag_fusion, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)

final_rag_chain_mq_union = (
    {"context": retrieval_chain_mq_union, 
     "question": itemgetter("question")} 
    | prompt
    | llm
    | StrOutputParser()
)

In [24]:
display(Markdown(final_rag_chain_rag_fusion.invoke({"question": question})))

The approaches to task decomposition in the context of Large Language Models (LLMs) as described in the provided documents include:

1. **Chain of Thought (CoT)**: Introduced by Wei et al. in 2022, this technique involves prompting the model to "think step by step." This method allows the model to use more test-time computation to break down complex tasks into smaller, simpler steps. The CoT approach helps in transforming large tasks into multiple manageable tasks and provides insights into the model’s reasoning process.

2. **Tree of Thoughts**: An extension of the Chain of Thought, developed by Yao et al. in 2023. This method further explores multiple reasoning possibilities at each step of the task decomposition. It starts by breaking the problem into multiple thought steps and then generates multiple thoughts per step, creating a tree structure. The search process within this structure can be conducted using either breadth-first search (BFS) or depth-first search (DFS), with each state evaluated by a classifier (prompted) or by majority vote.

3. **Simple Prompting Techniques**: These involve using straightforward prompts to guide the LLM in decomposing tasks. Examples include prompts like "Steps for XYZ.\n1." or "What are the subgoals for achieving XYZ?" These prompts are designed to elicit a structured breakdown of tasks directly from the LLM.

4. **Task-Specific Instructions**: This approach uses instructions tailored to specific types of tasks to aid in decomposition. For example, instructing an LLM to "Write a story outline." for writing a novel helps in structuring the task into manageable components.

5. **Human Inputs**: Involving human participation in the task decomposition process. This can include humans providing initial inputs or corrections to the decomposed tasks suggested by the LLM, ensuring that the decomposition aligns with human understanding and requirements.

These approaches highlight the versatility and adaptability of LLMs in handling complex tasks by breaking them down into more manageable sub-tasks, thereby enhancing their performance and applicability in various domains.

In [25]:
display(Markdown(final_rag_chain_mq_union.invoke({"question": question})))

The approaches to task decomposition in the context of Large Language Models (LLMs) as described in the provided documents include:

1. **Chain of Thought (CoT)**: This technique, as mentioned by Wei et al. in 2022, involves instructing the model to "think step by step." This method allows the model to use more test-time computation to break down complex tasks into smaller, more manageable steps. The CoT approach transforms large tasks into multiple manageable tasks and provides insights into the model’s thinking process.

2. **Tree of Thoughts**: An extension of the Chain of Thought, developed by Yao et al. in 2023, this method explores multiple reasoning possibilities at each step of the task decomposition. It starts by breaking the problem into multiple thought steps and then generates multiple thoughts per step, creating a tree structure. The search process within this structure can be conducted using either breadth-first search (BFS) or depth-first search (DFS), with each state evaluated by a classifier (via a prompt) or by majority vote.

3. **Task-Specific Instructions**: This approach involves using specific instructions tailored to particular types of tasks. For example, instructing an LLM to "Write a story outline" when the task is to write a novel. This method relies on the ability of the LLM to understand and generate responses based on the specific instructions provided.

4. **Human Inputs**: Involving human inputs in the task decomposition process allows for a more guided and potentially accurate breakdown of tasks. This can be particularly useful in complex scenarios where human expertise can significantly enhance the model's performance and the accuracy of the task decomposition.

These approaches highlight the versatility and adaptability of LLMs in handling complex tasks by breaking them down into simpler, more manageable components, thereby enhancing their effectiveness and efficiency in various applications.

In [26]:
question

'What are the approaches to Task Decomposition in context of LLMs?'