## Reference - https://learn.deeplearning.ai/courses/building-agentic-rag-with-llamaindex/lesson/5/building-a-multi-document-agent

In [1]:
import os
from dotenv import load_dotenv
import nest_asyncio
nest_asyncio.apply()
load_dotenv()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY')


In [2]:
# get_doc_tools can also be imported from Utils_FN_QE_tool.py;
# it is defined inline below instead.
# from Utils_FN_QE_tool import get_doc_tools


from llama_index.core import SimpleDirectoryReader, VectorStoreIndex, SummaryIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core.tools import FunctionTool, QueryEngineTool
from llama_index.core.vector_stores import MetadataFilters, FilterCondition
from typing import List, Optional, Tuple

def get_doc_tools(
    file_path: str,
    name: str,
) -> Tuple[FunctionTool, QueryEngineTool]:
    """Get vector query and summary query tools from a document."""

    # load documents
    documents = SimpleDirectoryReader(input_files=[file_path]).load_data()
    splitter = SentenceSplitter(chunk_size=1024)
    nodes = splitter.get_nodes_from_documents(documents)
    vector_index = VectorStoreIndex(nodes)
    
    def vector_query(
        query: str, 
        page_numbers: Optional[List[str]] = None
    ) -> str:
        """Use to answer questions over the MetaGPT paper.
    
        Useful if you have specific questions over the MetaGPT paper.
        Always leave page_numbers as None UNLESS there is a specific page you want to search for.
    
        Args:
            query (str): the string query to be embedded.
            page_numbers (Optional[List[str]]): Filter by set of pages. Leave as NONE 
                if we want to perform a vector search
                over all pages. Otherwise, filter by the set of specified pages.
        
        """
    
        page_numbers = page_numbers or []
        metadata_dicts = [
            {"key": "page_label", "value": p} for p in page_numbers
        ]
        
        query_engine = vector_index.as_query_engine(
            similarity_top_k=2,
            filters=MetadataFilters.from_dicts(
                metadata_dicts,
                condition=FilterCondition.OR
            )
        )
        response = query_engine.query(query)
        return response
        
    
    vector_query_tool = FunctionTool.from_defaults(
        name=f"vector_tool_{name}",
        fn=vector_query
    )
    
    summary_index = SummaryIndex(nodes)
    summary_query_engine = summary_index.as_query_engine(
        response_mode="tree_summarize",
        use_async=True,
    )
    summary_tool = QueryEngineTool.from_defaults(
        name=f"summary_tool_{name}",
        query_engine=summary_query_engine,
        description=(
            "Use ONLY IF you want to get a holistic summary of MetaGPT. "
            "Do NOT use if you have specific questions over MetaGPT."
        ),
    )

    return vector_query_tool, summary_tool
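Inside `vector_query`, the `page_numbers` argument becomes a list of `page_label` filters combined with `FilterCondition.OR`, so a node is kept when its page label matches any of the requested pages, and, as the docstring describes, leaving `page_numbers` as `None` searches over all pages. The plain-Python sketch below only mimics that selection logic. The `nodes` data and the `filter_by_pages` helper are hypothetical stand-ins for illustration, not LlamaIndex APIs.

```python
from typing import Dict, List, Optional


def filter_by_pages(
    nodes: List[Dict[str, str]],
    page_numbers: Optional[List[str]] = None,
) -> List[Dict[str, str]]:
    """Keep nodes whose page_label matches ANY requested page (OR semantics)."""
    page_numbers = page_numbers or []
    if not page_numbers:
        # No pages requested: nothing is filtered out.
        return list(nodes)
    wanted = set(page_numbers)
    return [node for node in nodes if node["page_label"] in wanted]


# Hypothetical nodes; real nodes come from SentenceSplitter and carry
# a "page_label" metadata entry set by the PDF reader.
nodes = [
    {"page_label": "1", "text": "Abstract"},
    {"page_label": "2", "text": "Method"},
    {"page_label": "3", "text": "Results"},
]

print([n["text"] for n in filter_by_pages(nodes, ["1", "3"])])
print(len(filter_by_pages(nodes)))
```

Note that page labels are strings here, matching the `Optional[List[str]]` annotation on `page_numbers`.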

In [4]:
from pathlib import Path

papers = [
    "./2_ Multidoc_Agentic_RAG/data/metagpt.pdf",    
    "./2_ Multidoc_Agentic_RAG/data/longlora.pdf",
    "./2_ Multidoc_Agentic_RAG/data/selfrag.pdf",
]

Read each paper and build its vector query tool and summary tool.

In [6]:
paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: ./2_ Multidoc_Agentic_RAG/data/metagpt.pdf


Getting tools for paper: ./2_ Multidoc_Agentic_RAG/data/longlora.pdf
Getting tools for paper: ./2_ Multidoc_Agentic_RAG/data/selfrag.pdf
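Each tool name is derived from the file stem passed as `name`, which is why the agent logs refer to tools such as `summary_tool_longlora`. The small sketch below shows only the naming and the flattening of the per-paper dictionary into one tool list. The shortened paths and the `tool_names` helper are hypothetical, and strings stand in for the real tool objects.

```python
from pathlib import Path

# Hypothetical, shortened paths used only for illustration.
papers = ["./data/metagpt.pdf", "./data/longlora.pdf"]


def tool_names(paper_path: str) -> list:
    """Return the two tool names that would be generated for one paper."""
    stem = Path(paper_path).stem  # file name without directory or extension
    return [f"vector_tool_{stem}", f"summary_tool_{stem}"]


# Same shape as paper_to_tools_dict, with names standing in for tools.
paper_to_names = {paper: tool_names(paper) for paper in papers}

# Flatten the dictionary into a single list, preserving paper order.
flat = [name for paper in papers for name in paper_to_names[paper]]
print(flat)
```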


In [10]:
initial_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]


In [11]:
from llama_index.llms.openai import OpenAI

from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

llm = OpenAI(model="gpt-3.5-turbo")
agent_worker = FunctionCallingAgentWorker.from_tools(
    initial_tools, 
    llm=llm, 
    verbose=True
)
agent = AgentRunner(agent_worker)

In [12]:
response = agent.query(
    "Tell me about the evaluation dataset used in LongLoRA, "
    "and then tell me about the evaluation results"
)

Added user message to memory: Tell me about the evaluation dataset used in LongLoRA, and then tell me about the evaluation results
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "evaluation dataset"}
=== Function Output ===
The evaluation dataset used in the experiments is the Redpajama dataset for training and the PG19 dataset for evaluation. Additionally, the cleaned Arxiv Math proof-pile dataset was also used for evaluation purposes.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "evaluation results"}
=== Function Output ===
The evaluation results indicate that the LongLoRA method effectively extends the context length of large language models while maintaining minimal accuracy compromise. It shows improved perplexity with longer context sizes, demonstrating its effectiveness. The method can fine-tune models to significantly larger context lengths, such as extending Llama2 7B to 100k context length and 70B

In [13]:
response = agent.query("Give me a summary of both Self-RAG and LongLoRA")
print(str(response))

Added user message to memory: Give me a summary of both Self-RAG and LongLoRA
=== Calling Function ===
Calling function: summary_tool_selfrag with args: {"input": "Self-RAG"}
=== Function Output ===
Self-RAG is a framework designed to improve the quality and accuracy of large language models (LLMs) by incorporating retrieval mechanisms and self-assessment. It involves training a Critic LM and a Generator LM to evaluate text using reflection tokens, ensuring that the generated responses are factually grounded and aligned with the predicted reflection tokens. This method outperforms traditional LLMs with more parameters and conventional retrieval-augmented generation approaches, as it enables the LM to adapt its behavior to various task requirements and produce informative and supported text outputs.
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LongLoRA"}
=== Function Output ===
LongLoRA is an efficient fine-tuning approach that extends the conte

## 2. Set up an agent over 11 papers

In [14]:
papers = [
    "./2_ Multidoc_Agentic_RAG/data/metagpt.pdf",
    "./2_ Multidoc_Agentic_RAG/data/longlora.pdf",
    "./2_ Multidoc_Agentic_RAG/data/loftq.pdf",
    "./2_ Multidoc_Agentic_RAG/data/swebench.pdf",
    "./2_ Multidoc_Agentic_RAG/data/selfrag.pdf",
    "./2_ Multidoc_Agentic_RAG/data/zipformer.pdf",
    "./2_ Multidoc_Agentic_RAG/data/values.pdf",
    "./2_ Multidoc_Agentic_RAG/data/finetune_fair_diffusion.pdf",
    "./2_ Multidoc_Agentic_RAG/data/knowledge_card.pdf",
    "./2_ Multidoc_Agentic_RAG/data/metra.pdf",
    "./2_ Multidoc_Agentic_RAG/data/vr_mcl.pdf"
]

In [15]:
paper_to_tools_dict = {}
for paper in papers:
    print(f"Getting tools for paper: {paper}")
    vector_tool, summary_tool = get_doc_tools(paper, Path(paper).stem)
    paper_to_tools_dict[paper] = [vector_tool, summary_tool]

Getting tools for paper: ./2_ Multidoc_Agentic_RAG/data/metagpt.pdf
Getting tools for paper: ./2_ Multidoc_Agentic_RAG/data/longlora.pdf
Getting tools for paper: ./2_ Multidoc_Agentic_RAG/data/loftq.pdf
Getting tools for paper: ./2_ Multidoc_Agentic_RAG/data/swebench.pdf
Getting tools for paper: ./2_ Multidoc_Agentic_RAG/data/selfrag.pdf
Getting tools for paper: ./2_ Multidoc_Agentic_RAG/data/zipformer.pdf
Getting tools for paper: ./2_ Multidoc_Agentic_RAG/data/values.pdf
Getting tools for paper: ./2_ Multidoc_Agentic_RAG/data/finetune_fair_diffusion.pdf
Getting tools for paper: ./2_ Multidoc_Agentic_RAG/data/knowledge_card.pdf
Getting tools for paper: ./2_ Multidoc_Agentic_RAG/data/metra.pdf
Getting tools for paper: ./2_ Multidoc_Agentic_RAG/data/vr_mcl.pdf


### Extend the Agent with Tool Retrieval

In [18]:
all_tools = [t for paper in papers for t in paper_to_tools_dict[paper]]
len(all_tools)

22

The problem with many documents is that, as the number of documents grows, the tool names and descriptions may no longer fit in the LLM's context window, and a long tool list makes it more likely that the LLM selects the wrong tool.
Solution: apply RAG to tool selection. Retrieve only the tools relevant to the query, then pass those tools to the LLM for calling.
- We need to index the tools. Since the tools are Python objects, they must first be serialized into a string representation; in LlamaIndex this is done with an `ObjectIndex`.
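The idea of retrieving tools before calling the LLM can be sketched without any external library. The example below is a toy stand-in, assuming word-count vectors and cosine similarity in place of a real embedding model. The `tool_descriptions` catalogue and the helper names are hypothetical, and this is not how `ObjectIndex` is implemented internally.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical catalogue: tool name -> serialized description.
tool_descriptions = {
    "summary_tool_metagpt": "Summarize the MetaGPT paper on multi-agent collaboration",
    "summary_tool_swebench": "Summarize the SWE-Bench paper on GitHub issue resolution",
    "summary_tool_longlora": "Summarize the LongLoRA paper on long context fine-tuning",
}


def retrieve_tools(query: str, top_k: int = 2) -> list:
    """Return the names of the top_k tools most similar to the query."""
    query_vec = embed(query)
    ranked = sorted(
        tool_descriptions,
        key=lambda name: cosine(query_vec, embed(tool_descriptions[name])),
        reverse=True,
    )
    return ranked[:top_k]


print(retrieve_tools("Tell me about the eval dataset used in MetaGPT and SWE-Bench"))
```

Only the retrieved subset would then be handed to the LLM, so the prompt size stays bounded by `top_k` rather than by the number of papers.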


In [17]:
# define an "object" index and retriever over these tools
from llama_index.core import VectorStoreIndex
from llama_index.core.objects import ObjectIndex

obj_index = ObjectIndex.from_objects(
    all_tools,
    index_cls=VectorStoreIndex,
)

In [19]:
obj_retriever = obj_index.as_retriever(similarity_top_k=3)

In [26]:
tools = obj_retriever.retrieve(
    "Tell me about the eval dataset used in MetaGPT and SWE-Bench"
)

In [27]:
tools[2].metadata

ToolMetadata(description='Use ONLY IF you want to get a holistic summary of MetaGPT. Do NOT use if you have specific questions over MetaGPT.', name='summary_tool_values', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)

In [28]:
tools[0].metadata

ToolMetadata(description='Use ONLY IF you want to get a holistic summary of MetaGPT. Do NOT use if you have specific questions over MetaGPT.', name='summary_tool_swebench', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)

In [29]:
tools[1].metadata

ToolMetadata(description='Use ONLY IF you want to get a holistic summary of MetaGPT. Do NOT use if you have specific questions over MetaGPT.', name='summary_tool_metra', fn_schema=<class 'llama_index.core.tools.types.DefaultToolFnSchema'>, return_direct=False)
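All three retrieved tools carry the same MetaGPT description because `get_doc_tools` hard-codes that text for every paper, so the descriptions give the retriever nothing to tell the papers apart. One possible remedy, sketched here under the assumption that the file stem is an adequate label for each paper, is to build the description from `name`. The `summary_description` helper is hypothetical and not part of the original notebook.

```python
def summary_description(name: str) -> str:
    """Build a per-paper description instead of a hard-coded MetaGPT one."""
    return (
        f"Use ONLY IF you want to get a holistic summary of {name}. "
        f"Do NOT use if you have specific questions over {name}."
    )


print(summary_description("swebench"))
```

The result could be passed as `description=` to `QueryEngineTool.from_defaults`. `FunctionTool.from_defaults` also accepts a `description` argument, which could replace the MetaGPT-specific docstring of `vector_query` in the same way.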

In [30]:
from llama_index.core.agent import FunctionCallingAgentWorker
from llama_index.core.agent import AgentRunner

agent_worker = FunctionCallingAgentWorker.from_tools(
    tool_retriever=obj_retriever,
    llm=llm, 
    system_prompt=""" \
You are an agent designed to answer queries over a set of given papers.
Please always use the tools provided to answer a question. Do not rely on prior knowledge.\

""",
    verbose=True
)
agent = AgentRunner(agent_worker)

In [31]:
response = agent.query(
    "Tell me about the evaluation dataset used "
    "in MetaGPT and compare it against SWE-Bench"
)
print(str(response))

Added user message to memory: Tell me about the evaluation dataset used in MetaGPT and compare it against SWE-Bench
=== Calling Function ===
Calling function: summary_tool_metra with args: {"input": "evaluation dataset used in MetaGPT"}
=== Function Output ===
The evaluation dataset used in MetaGPT is not explicitly mentioned in the provided context information.
=== Calling Function ===
Calling function: summary_tool_swebench with args: {"input": "evaluation dataset used in SWE-Bench"}
=== Function Output ===
The evaluation dataset used in SWE-Bench is constructed by collecting pull requests from the top 100 most downloaded PyPI libraries in August 2023. It consists of task instances derived from merged pull requests that resolve issues in the repositories and introduce new tests. Each task instance includes information about the codebase, problem statements, test patches, and gold patches. The dataset is continuously updated with new task instances based on PRs created after the train

In [32]:
response = agent.query(
    "Compare and contrast the LoRA papers (LongLoRA, LoftQ). "
    "Analyze the approach in each paper first. "
)

Added user message to memory: Compare and contrast the LoRA papers (LongLoRA, LoftQ). Analyze the approach in each paper first. 
=== Calling Function ===
Calling function: summary_tool_longlora with args: {"input": "LongLoRA paper"}
=== Function Output ===
The LongLoRA paper presents an efficient fine-tuning approach for extending the context lengths of large language models by combining improved LoRA with shifted sparse attention (S2-Attn). It introduces an Action Units Relation Learning framework that includes the Action Units Relation Transformer (ART) and the Tampered AU Prediction (TAP) components. The ART encoder models relations between different facial action units at AU-agnostic patches to aid in forgery detection, while TAP tampers with AU-related regions to provide Local Tampering Supervision. The paper achieves state-of-the-art performance on cross-dataset and cross-manipulation evaluations, offers qualitative visualizations of tampered regions, and proposes an effective fr