# File-level and Chunk-Level Retrieval with LlamaCloud and Workflows

In this notebook we will show you how to perform file-level and chunk-level retrieval with LlamaCloud using a custom router query engine and a custom agent built with [Workflows](https://docs.llamaindex.ai/en/latest/module_guides/workflow/).

File-level retrieval is useful for handling user questions that require the entire document context to properly answer the question. Since only doing file-level retrieval can be slow + expensive, we also show you how to build an agent that can dynamically decide whether to do file-level or chunk-level retrieval! 

## Setup

Install LlamaIndex, apply nest_asyncio, and set up your OpenAI API key.

In [None]:
%pip install llama-index llama-index-indices-managed-llama-cloud

In [1]:
import nest_asyncio
nest_asyncio.apply()

In [2]:
import os
os.environ["OPENAI_API_KEY"] = "<Your OpenAI API Key>"

## Load Documents into LlamaCloud

The first order of business is to download the 5 Apple and Tesla 10Ks and upload them into LlamaCloud.

You can easily do this by creating a pipeline and uploading docs via the "Files" mode.

After this is done, proceed to the next section.

In [None]:
# download Apple 
!wget "https://s2.q4cdn.com/470004039/files/doc_earnings/2023/q4/filing/_10-K-Q4-2023-As-Filed.pdf" -O data/apple_2023.pdf
!wget "https://s2.q4cdn.com/470004039/files/doc_financials/2022/q4/_10-K-2022-(As-Filed).pdf" -O data/apple_2022.pdf
!wget "https://s2.q4cdn.com/470004039/files/doc_financials/2021/q4/_10-K-2021-(As-Filed).pdf" -O data/apple_2021.pdf
!wget "https://s2.q4cdn.com/470004039/files/doc_financials/2020/ar/_10-K-2020-(As-Filed).pdf" -O data/apple_2020.pdf
!wget "https://www.dropbox.com/scl/fi/i6vk884ggtq382mu3whfz/apple_2019_10k.pdf?rlkey=eudxh3muxh7kop43ov4bgaj5i&dl=1" -O data/apple_2019.pdf

# download Tesla
!wget "https://ir.tesla.com/_flysystem/s3/sec/000162828024002390/tsla-20231231-gen.pdf" -O data/tesla_2023.pdf
!wget "https://ir.tesla.com/_flysystem/s3/sec/000095017023001409/tsla-20221231-gen.pdf" -O data/tesla_2022.pdf
!wget "https://www.dropbox.com/scl/fi/ptk83fmye7lqr7pz9r6dm/tesla_2021_10k.pdf?rlkey=24kxixeajbw9nru1sd6tg3bye&dl=1" -O data/tesla_2021.pdf
!wget "https://ir.tesla.com/_flysystem/s3/sec/000156459021004599/tsla-10k_20201231-gen.pdf" -O data/tesla_2020.pdf
!wget "https://ir.tesla.com/_flysystem/s3/sec/000156459020004475/tsla-10k_20191231-gen_0.pdf" -O data/tesla_2019.pdf

## Helper Classes

We define the `Answer` model, which is a model that stores whether to pick chunk-level retrieval or document-level retrieval, along with a reason for that choice. We will let the LLM choose given a query string, and we will ask the LLM to produce a JSON output that can be parsed by a Pydantic model.

We will define the `RouterOutputParser` helper class, which parses the output from the LLM into a list of `Answer` models, which is then put into the `Answers` model that contains a list of `Answer`s.

In [3]:
import json

from llama_index.core.bridge.pydantic import BaseModel
from typing import List
from llama_index.core.types import BaseOutputParser
from llama_index.core import PromptTemplate

# tells LLM to select choices given a list
ROUTER_PRMOPT = PromptTemplate(
    "Some choices are given below. It is provided in a numbered list (1 to"
    " {num_choices}), where each item in the list corresponds to a"
    " summary.\n---------------------\n{context_list}\n---------------------\nUsing"
    " only the choices above and not prior knowledge, return the top choices"
    " (no more than {max_outputs}, but only select what is needed) that are"
    " most relevant to the question: '{query_str}'\n"
)

# tells LLM to format list of choices in a certain way
FORMAT_STR = """The output should be formatted as a JSON instance that conforms to 
the JSON schema below. 

Here is the output schema:
{
  "type": "array",
  "items": {
    "type": "object",
    "properties": {
      "choice": {
        "type": "integer"
      },
      "reason": {
        "type": "string"
      }
    },
    "required": [
      "choice",
      "reason"
    ],
    "additionalProperties": false
  }
}
"""

class Answer(BaseModel):
    """Answer model."""

    choice: int
    reason: str


class Answers(BaseModel):
    """List of answers model."""

    answers: List[Answer]

class RouterOutputParser(BaseOutputParser):
    """Custom output parser."""

    def _escape_curly_braces(self, input_string: str):
        """Escape the brackets in the format string so contents are not treated as variables."""

        return input_string.replace("{", "{{").replace("}", "}}")

    def _marshal_output_to_json(self, output: str):
        """Find JSON string within response."""

        output = output.strip()
        left = output.find("[")
        right = output.find("]")
        output = output[left : right + 1]
        return output

    def parse(self, output: str) -> Answers:
        """Parse string"""

        json_output = self._marshal_output_to_json(output)
        json_dicts = json.loads(json_output)
        answers = [Answer.parse_obj(json_dict) for json_dict in json_dicts]
        return Answers(answers=answers)
    
    def format(self, query: str) -> str:
        return query + "\n\n" + self._escape_curly_braces(FORMAT_STR)

## Router Query Workflow

In the code snippet below, we define a router query workflow. This workflow requires 2 events: a `ChooseQueryEngineEvent`, which chooses the document-level or chunk-retrieval query engine, and `SynthesizeAnswersEvent`, which contains the results from the query engines and synthesizes a final response.

The workflow consists of the following steps:
1. Choosing the query engine(s) by passing the prompt and output parser defined above into an LLM. Both query engines can be chosen if the LLM thinks both query engines (defined in `choose_query_engine()`).
2. Queries the engines chosen by the LLM in the previous step (defined in `query_each_engine`).
3. Synthesizes a final response given the results from the queries above (defined in `synthesize_response()`).

In [4]:
from typing import List, Optional, Any

from llama_index.core.query_engine import (
    BaseQueryEngine,
    RetrieverQueryEngine,
)
from llama_index.core import PromptTemplate
from llama_index.llms.openai import OpenAI
from llama_index.core.llms import LLM
from llama_index.core.response_synthesizers import TreeSummarize
from llama_index.core.program import LLMTextCompletionProgram
from llama_index.core.workflow import (
    Workflow,
    Event,
    StartEvent,
    StopEvent,
    step,
)

class ChooseQueryEngineEvent(Event):
    """Query engine event."""

    answers: Answers
    query_str: str

class SynthesizeAnswersEvent(Event):
    """Synthesize answers event."""

    responses: List[Any]
    query_str: str


class RouterQueryWorkflow(Workflow):
    """Router query workflow."""

    def __init__(
        self,
        query_engines: List[BaseQueryEngine],
        choice_descriptions: List[str],
        router_prompt: PromptTemplate,
        timeout: Optional[float] = 10.0,
        disable_validation: bool = False,
        verbose: bool = False,
        llm: Optional[LLM] = None,
        summarizer: Optional[TreeSummarize] = None,
    ):
        """Constructor"""

        super().__init__(timeout=timeout, disable_validation=disable_validation, verbose=verbose)

        self.query_engines: List[BaseQueryEngine] = query_engines
        self.choice_descriptions: List[str] = choice_descriptions
        self.router_prompt: PromptTemplate = router_prompt
        self.llm: LLM = llm or OpenAI(temperature=0, model="gpt-4o")
        self.summarizer: TreeSummarize = summarizer or TreeSummarize()

    def _get_choice_str(self, choices):
        """String of choices to feed into LLM."""

        choices_str = "\n\n".join([f"{idx+1}. {c}" for idx, c in enumerate(choices)])
        return choices_str
    
    async def _query(self, query_str: str, choice_idx: int):
        """Query using query engine"""

        query_engine = self.query_engines[choice_idx]
        return await query_engine.aquery(query_str)

    
    @step()
    async def choose_query_engine(self, ev: StartEvent) -> ChooseQueryEngineEvent:
        """Choose query engine."""

        # get query str
        query_str = ev.get("query_str")
        if query_str is None:
            raise ValueError("'query_str' is required.")
        
        # partially format prompt with number of choices and max outputs
        router_prompt1 = self.router_prompt.partial_format(
            num_choices=len(self.choice_descriptions),
            max_outputs=len(self.choice_descriptions),
        )

        # get structured output of answers
        program = LLMTextCompletionProgram.from_defaults(
            output_cls=Answers,
            prompt=router_prompt1,
            verbose=self._verbose,
            llm=self.llm,
            output_parser=RouterOutputParser()
        )
        # get choices selected by LLM
        choices_str = self._get_choice_str(self.choice_descriptions)
        output: Answers = program(context_list=choices_str, query_str=query_str)

        if self._verbose:
            print(f"Selected choice(s):")
            for answer in output.answers:
                print(f"Choice: {answer.choice}, Reason: {answer.reason}")
        
        return ChooseQueryEngineEvent(answers=output, query_str=query_str)
            
    @step()
    async def query_each_engine(self, ev: ChooseQueryEngineEvent) -> SynthesizeAnswersEvent:
        """Query each engine."""

        query_str = ev.query_str
        answers = ev.answers

        # query using corresponding query engine given in Answers list
        responses = []

        for answer in answers.answers:
            choice_idx = answer.choice - 1
            response = await self._query(query_str, choice_idx)
            responses.append(response)
        
        return SynthesizeAnswersEvent(responses=responses, query_str=query_str)
    
    @step()
    async def synthesize_response(self, ev: SynthesizeAnswersEvent) -> StopEvent:
        """Synthesizes response."""

        responses = ev.responses
        query_str = ev.query_str

        # get result of responses
        if len(responses) == 1:
            return StopEvent(result=responses[0])
        else:
            response_strs = [str(r) for r in responses]
            result_response = self.summarizer.get_response(query_str, response_strs)
            return StopEvent(result=result_response)


## Define LlamaCloud Retriever over documents

We'll define an instance of `LLamaCloudIndex`, which will allow us to access the indexed docs stored on LlamaCloud. We define two separate retrievers for this index: a file-level retriever and a chunk-level retriever. We create two query engines from these retrievers.

After this, we give a description for what each retriever does to allow the LLM to know which one to pick. We'll define our router workflow based on the two query engines and descriptions.

In [5]:
from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

index = LlamaCloudIndex(
    name="<Your Index Name>",
    project_name="<Your Project Name>",
    api_key="<Your API Key>",
    organization_id="<Your Org ID>",
)
llm = OpenAI("gpt-4o")

doc_retriever = index.as_retriever(retrieval_mode="files_via_content", files_top_k=1)
query_engine_doc = RetrieverQueryEngine.from_args(
    doc_retriever, llm=llm, response_mode="tree_summarize"
)

chunk_retriever = index.as_retriever(retrieval_mode="chunks", rerank_top_n=10)
query_engine_chunk = RetrieverQueryEngine.from_args(
    chunk_retriever, llm=llm, response_mode="tree_summarize"
)

DOC_METADATA_EXTRA_STR = """\
Each document represents a complete 10K report for a given year (e.g. Apple in 2019).
Here's an example of relevant documents:
1. apple_2019.pdf
2. tesla_2020.pdf
"""

TOOL_DOC_DESC = f"""\
Synthesizes an answer to your question by feeding in an entire relevant document as context. Best used for higher-level summarization options.
Do NOT use if answer can be found in a specific chunk of a given document. Use the chunk_query_engine instead for that purpose.

Below we give details on the format of each document:
{DOC_METADATA_EXTRA_STR}
"""

TOOL_CHUNK_DESC = f"""\
Synthesizes an answer to your question by feeding in a relevant chunk as context. Best used for questions that are more pointed in nature.
Do NOT use if the question asks seems to require a general summary of any given document. Use the doc_query_engine instead for that purpose.

Below we give details on the format of each document:
{DOC_METADATA_EXTRA_STR}
"""

router_query_workflow = RouterQueryWorkflow(
    query_engines=[query_engine_doc, query_engine_chunk],
    choice_descriptions=[TOOL_DOC_DESC, TOOL_CHUNK_DESC],
    verbose=True,
    llm=llm,
    router_prompt=ROUTER_PRMOPT,
    timeout=60
)

After defining our router query workflow, we'll create a query engine wrapper around this workflow, and we'll define a query engine tool around this wrapper to pass into an agent.

In [6]:
from llama_index.core.query_engine import CustomQueryEngine
from llama_index.core.async_utils import asyncio_run
from llama_index.core.tools import QueryEngineTool, ToolMetadata

class RouterQueryEngine(CustomQueryEngine):
    """Router query engine (for tool usage)."""

    router_query_workflow: RouterQueryWorkflow

    def custom_query(self, query_str: str):
        """Query."""
        
        return asyncio_run(router_query_workflow.run(query_str=query_str))
    
router_query_engine = RouterQueryEngine(router_query_workflow=router_query_workflow)
router_query_engine_tool = QueryEngineTool(router_query_engine, metadata=ToolMetadata(
    name="Query_engine",
    description="Queries 10k reports for a given year."
))

## Creating an Agent Around the Query Engine

We'll create a workflow that acts as an agent around the router query engine. In this workflow, we need four events:
1. `GatherToolsEvent`: Gets all tools that need to be called (which is determined by the LLM).
2. `ToolCallEvent`: An individual tool call. Multiple of these events will be triggered at the same time.
3. `ToolCallEventResult`: Gets result from a tool call.
4. `GatherEvent`: Returned from dispatcher that triggers the `ToolCallEvent`.

This workflow consists of the following steps:
1. `chat()`: Appends the message to the chat history. This chat history is fed into the LLM, along with the given tools, and the LLM determines which tools to call. This returns a `GatherToolsEvent`.
2. `dispatch_calls()`: Triggers a `ToolCallEvent` for each tool call given in the `GatherToolsEvent` using `send_event()`. Returns a `GatherEvent` with the number of tool calls.
3. `call_tool()`: Calls an individual tool. This step will run multiple times if there is more than one tool call. This step calls the tool and appends the result as a chat message to the chat history. It returns a `ToolCallEventResult` with the result of the tool call.
4. `gather()`: Gathers the results from all tool calls using `collect_events()`. Waits for all tool calls to finish, then feeds chat history (following all tool calls) into the LLM. Returns the response from the LLM.

In [7]:
from typing import Dict

from llama_index.core.tools import BaseTool
from llama_index.core.llms import ChatMessage
from llama_index.core.workflow import Context
from openai.types.chat import ChatCompletionMessageToolCall

class GatherToolsEvent(Event):
    """Gather Tools Event"""

    tool_calls: Any
    message: str

class ToolCallEvent(Event):
    """Tool Call event"""

    tool_call: ChatCompletionMessageToolCall

class ToolCallEventResult(Event):
    """Tool call event result."""

    msg: ChatMessage

class GatherEvent(Event):
    """Gather event"""

    num_tool_calls: int

class RouterOutputAgentWorkflow(Workflow):
    """Custom router output agent workflow."""

    def __init__(self,
        tools: List[BaseTool],
        timeout: Optional[float] = 10.0,
        disable_validation: bool = False,
        verbose: bool = False,
        llm: Optional[LLM] = None,
        chat_history: Optional[List[ChatMessage]] = None,
    ):
        """Constructor."""

        super().__init__(timeout=timeout, disable_validation=disable_validation, verbose=verbose)

        self.tools: List[BaseTool] = tools
        self.tools_dict: Optional[Dict[str, BaseTool]] = {tool.metadata.name: tool for tool in self.tools}
        self.llm: LLM = llm or OpenAI(temperature=0, model="gpt-4o")
        self.chat_history: List[ChatMessage] = chat_history or []
    

    def reset(self) -> None:
        """Resets Chat History"""

        self.chat_history = []

    @step()
    async def chat(self, ev: StartEvent) -> GatherToolsEvent | StopEvent:
        """Appends msg to chat history, then gets tool calls."""

        message = ev.get("message")
        if message is None:
            raise ValueError("'message' field is required.")
        
        # add msg to chat history
        chat_history = self.chat_history
        chat_history.append(ChatMessage(role="user", content=message))

        # Put msg into LLM with tools included
        tools = [tool.metadata.to_openai_tool() for tool in self.tools]
        chat_res = await self.llm.achat(chat_history, tools=tools)
        ai_message = chat_res.message
        additional_kwargs = ai_message.additional_kwargs
        chat_history.append(ai_message)

        if self._verbose:
            print(f"Chat message: {ai_message.content}")

        # get tool calls
        tool_calls = additional_kwargs.get("tool_calls", None)

        # no tool calls, return chat message.
        if tool_calls is None:
            return StopEvent(result=ai_message.content)

        return GatherToolsEvent(tool_calls=tool_calls, message=message)

    @step()
    async def dispatch_calls(self, ev: GatherToolsEvent) -> ToolCallEvent | GatherEvent:
        """Dispatches calls."""

        tool_calls = ev.tool_calls

        # trigger tool call events
        for tool_call in tool_calls:
            self.send_event(ToolCallEvent(tool_call=tool_call))
        
        return GatherEvent(num_tool_calls=len(tool_calls))
    
    @step()
    async def call_tool(self, ev: ToolCallEvent) -> ToolCallEventResult:
        """Calls tool."""

        tool_call = ev.tool_call

        # get tool ID and function call
        id_ = tool_call.id
        function_call = tool_call.function

        if self._verbose:
            print(f"Calling function {function_call.name} with msg {function_call.arguments}")

        # call function and put result into a chat message
        tool = self.tools_dict[function_call.name]
        output = await tool.acall(**json.loads(function_call.arguments))
        msg = ChatMessage(
            name=function_call.name,
            content=str(output),
            role="tool",
            additional_kwargs={
                "tool_call_id": id_,
                "name": function_call.name
            }
        )

        # append chat message to history
        self.chat_history.append(msg)

        return ToolCallEventResult(msg=msg)
    
    @step(pass_context=True)
    async def gather(self, ctx: Context, ev: GatherEvent | ToolCallEventResult) -> StopEvent | None:
        """Gathers tool calls."""

        if isinstance(ev, GatherEvent):
            ctx.data["num_tool_calls"] = ev.num_tool_calls

        # wait for all tool call events to finish.
        events = ctx.collect_events(ev, [ToolCallEventResult] * ctx.data["num_tool_calls"])
        if not events:
            return None
        
        # after all tool calls finish, pass chat history into LLM
        # and return result.
        ai_message = self.llm.chat(self.chat_history).message
        self.chat_history.append(ai_message)

        return StopEvent(result=ai_message.content)
    

Creates an instance of the agent.

In [8]:
agent = RouterOutputAgentWorkflow(tools=[router_query_engine_tool], verbose=True, timeout=60)

## Example Queries

In [9]:
from IPython.display import display, Markdown

response = await agent.run(message="Tell me the revenue for Apple and Tesla in 2021.")
display(Markdown(response))

Running step chat
Chat message: None
Step chat produced event GatherToolsEvent
Running step dispatch_calls
Step dispatch_calls produced event GatherEvent
Running step gather
Step gather produced no event
Running step call_tool
Calling function Query_engine with msg {"input": "Apple 2021 revenue"}
Running step choose_query_engine
Selected choice(s):
Choice: 2, Reason: The question 'Apple 2021 revenue' is pointed in nature and likely can be answered by extracting specific information from a relevant chunk of the document. Therefore, using the chunk_query_engine is appropriate.
Step choose_query_engine produced event ChooseQueryEngineEvent
Running step query_each_engine
Step query_each_engine produced event SynthesizeAnswersEvent
Running step synthesize_response
Step synthesize_response produced event StopEvent
Step call_tool produced event ToolCallEventResult
Running step call_tool
Calling function Query_engine with msg {"input": "Tesla 2021 revenue"}
Running step gather
Step gather prod

In 2021, Apple reported total net sales of $365.8 billion, which included $297.4 billion from products and $68.4 billion from services. Tesla, on the other hand, reported total revenue of $53.82 billion for the same year.

In [10]:
response = await agent.run(message="Tell me the tailwinds for Apple and Tesla in 2021.")
display(Markdown(response))

Running step chat
Chat message: None
Step chat produced event GatherToolsEvent
Running step dispatch_calls
Step dispatch_calls produced event GatherEvent
Running step gather
Step gather produced no event
Running step call_tool
Calling function Query_engine with msg {"input": "Apple 2021 tailwinds"}
Running step choose_query_engine
Selected choice(s):
Choice: 2, Reason: The question 'Apple 2021 tailwinds' is pointed in nature and likely requires specific information from a particular section of the 2021 10K report. Therefore, using the chunk_query_engine to find the relevant chunk is the best approach.
Step choose_query_engine produced event ChooseQueryEngineEvent
Running step query_each_engine
Step query_each_engine produced event SynthesizeAnswersEvent
Running step synthesize_response
Step synthesize_response produced event StopEvent
Step call_tool produced event ToolCallEventResult
Running step call_tool
Calling function Query_engine with msg {"input": "Tesla 2021 tailwinds"}
Running

In 2021, both Apple and Tesla experienced several tailwinds that contributed to their strong performance:

### Apple:
1. **Higher Net Sales**: Increased sales of iPhone, Services, and Mac across various regions, including the Americas, Europe, Greater China, Japan, and the Rest of Asia Pacific.
2. **Favorable Currency Movements**: The favorable movement of foreign currencies relative to the U.S. dollar positively impacted net sales in Europe, Greater China, and the Rest of Asia Pacific.
3. **Increased Gross Margin**: Higher product volumes, a different product mix, and improved leverage contributed to an increase in gross margin. The strength in foreign currencies relative to the U.S. dollar also supported the increase in gross margin percentage.

### Tesla:
1. **Increased Revenues**: Total revenues rose by 71% compared to the prior year, driven by ramped-up production and expanded manufacturing capacity, enabling increased deliveries and deployments of their products.
2. **Improved Net Income**: Net income attributable to common stockholders saw a favorable change, increasing by $4.80 billion.
3. **Enhanced Cash Flows**: Cash flows provided by operating activities increased by $5.55 billion.
4. **Operational Efficiencies**: Focus on production and operational efficiencies.
5. **Market Trends**: Ongoing electrification of the automotive sector and increasing environmental awareness supported Tesla's growth.

These factors collectively helped both companies achieve significant financial and operational success in 2021.

In [11]:
response = await agent.run(message="How was apple doing generally in 2019?")
display(Markdown(response))

Running step chat
Chat message: None
Step chat produced event GatherToolsEvent
Running step dispatch_calls
Step dispatch_calls produced event GatherEvent
Running step gather
Step gather produced no event
Running step call_tool
Calling function Query_engine with msg {"input":"Apple 2019 performance"}
Running step choose_query_engine
Selected choice(s):
Choice: 1, Reason: The question 'Apple 2019 performance' requires a general summary of the entire document, which is best handled by synthesizing an answer using the entire relevant document as context.
Step choose_query_engine produced event ChooseQueryEngineEvent
Running step query_each_engine
Step query_each_engine produced event SynthesizeAnswersEvent
Running step synthesize_response
Step synthesize_response produced event StopEvent
Step call_tool produced event ToolCallEventResult
Running step gather
Step gather produced event StopEvent


In 2019, Apple Inc. experienced a mixed performance:

### Financial Highlights:
- **Total Net Sales**: $260.2 billion, a 2% decrease compared to 2018.
- **iPhone Sales**: Declined by 14% to $142.4 billion.
- **Wearables, Home and Accessories**: Increased by 41% to $24.5 billion.
- **Services Sales**: Rose by 16% to $46.3 billion.
- **Operating Income**: $63.9 billion.
- **Net Income**: $55.3 billion.

### Investments and Shareholder Returns:
- **Research and Development**: Expenses increased by 14% to $16.2 billion.
- **Stock Repurchase**: Apple repurchased $67.1 billion of its common stock.
- **Dividends**: Paid $14.1 billion in dividends and dividend equivalents.

### Summary:
Despite a decline in overall net sales, primarily due to lower iPhone sales, Apple saw significant growth in its Wearables, Home and Accessories, and Services segments. The company continued to invest heavily in research and development and returned substantial value to shareholders through stock repurchases and dividends.

## [Advanced] Setup Auto-Retrieval for Files

We make our file-level retrieval more sophisticated by allowing the LLM to infer a set of metadata filters, based on some relevant example documents. This allows document-level retrieval to be more precise, since it allows the LLM to narrow down search results via metadata filters and not just top-k.

We do some advanced things to make this happen
- Define a custom prompt to generate metadata filters
- Dynamically include few-shot examples of metadata as context to infer the set of metadata filters. These initial few-shot examples of metadata are obtained through vector search.

We prompt the LLM to generate a prompt, a list of filters, and optionally a top-k value. We will define another workflow that is subclassed from the `RouterQueryWorkflow`. In this workflow, we will replace the `_query()` method defined in `RouterQueryWorkflow`.

In this `_query()` method, we will check if the choice is the document-level retrieval. If it is, then we'll create a new query engine with certain LLM-generated filters applied. We'll return the response from this query engine.

A lot of the code below is lifted from our **VectorIndexAutoRetriever** module, which provides an out of the box way to do auto-retrieval against a vector index.

Since we are adding some customizations like adding few-shot examples, we re-use prompt pieces and implement auto-retrieval from scratch. 

In [12]:
from llama_index.core.prompts import ChatPromptTemplate
from llama_index.core.vector_stores.types import (
    VectorStoreInfo,
    VectorStoreQuerySpec,
    MetadataInfo,
    MetadataFilters,
)

SYS_PROMPT = """\
Your goal is to structure the user's query to match the request schema provided below.

<< Structured Request Schema >>
When responding use a markdown code snippet with a JSON object formatted in the \
following schema:

{schema_str}

The query string should contain only text that is expected to match the contents of \
documents. Any conditions in the filter should not be mentioned in the query as well.

Make sure that filters only refer to attributes that exist in the data source.
Make sure that filters take into account the descriptions of attributes.
Make sure that filters are only used as needed. If there are no filters that should be \
applied return [] for the filter value.\

If the user's query explicitly mentions number of documents to retrieve, set top_k to \
that number, otherwise do not set top_k.

The schema of the metadata filters in the vector db table is listed below, along with some example metadata dictionaries from relevant rows.
The user will send the input query string.

Data Source:
```json
{info_str}
```

Example metadata from relevant chunks:
{example_rows}
"""


class AutoRetrievalRouterQueryWorkflow(RouterQueryWorkflow):
    """Router query engine with auto retrieval."""

    async def _get_auto_retriever_query_engine(
        self, query: str, verbose: bool = False
    ) -> RetrieverQueryEngine:
        """Gets auto doc retriever query engine"""

        # retriever that retrieves example rows
        example_rows_retriever = index.as_retriever(
            retrieval_mode="chunks", rerank_top_n=4
        )

        def get_example_rows_fn(**kwargs):
            """Retrieve relevant few-shot examples of metadata."""

            query_str = kwargs["query_str"]
            nodes = example_rows_retriever.retrieve(query_str)
            # get the metadata, join them
            metadata_list = [n.metadata for n in nodes]

            return "\n".join([json.dumps(m) for m in metadata_list])

        # define chat prompt template to feed into LLM
        chat_prompt_tmpl = ChatPromptTemplate.from_messages(
            [
                ("system", SYS_PROMPT),
                ("user", "{query_str}"),
            ],
            function_mappings={"example_rows": get_example_rows_fn},
        )

        # information about vector store - used to generate json schema in prompt template
        vector_store_info = VectorStoreInfo(
            content_info="document chunks around Apple and Tesla 10K documents",
            metadata_info=[
                MetadataInfo(
                    name="file_name",
                    type="str",
                    description="Name of the source file",
                ),
            ],
        )

        # use structured output to get metadata filters and query str
        program = LLMTextCompletionProgram.from_defaults(
            output_cls=VectorStoreQuerySpec,
            prompt=chat_prompt_tmpl,
            llm=self.llm,
            verbose=self._verbose,
        )
        query_spec: VectorStoreQuerySpec = await program.acall(
            info_str=vector_store_info.json(indent=4),
            schema_str=VectorStoreQuerySpec.schema_json(indent=4),
            query_str=query,
        )

        # build retriever and query engine
        filters = (
            MetadataFilters(filters=query_spec.filters)
            if len(query_spec.filters) > 0
            else None
        )
        if verbose:
            print(f"> Using query str: {query_spec.query}")

        if filters and verbose:
            print(f"> Using filters{filters.json()}")

        retriever = index.as_retriever(
            retrieval_mode="files_via_content", files_top_k=1, filters=filters
        )

        query_engine = RetrieverQueryEngine.from_args(
            retriever, llm=self.llm, response_mode="tree_summarize"
        )

        return query_engine

    async def _query(self, query_str: str, choice_idx: int):
        """Query with auto retriever"""

        if choice_idx == 0:
            query_engine = await self._get_auto_retriever_query_engine(
                query_str, self._verbose
            )
        else:
            query_engine = self.query_engines[choice_idx]
        return await query_engine.aquery(query_str)


Create the auto retrieval query workflow, then wrap it around a RouterQueryEngine, then create a tool around that engine.

In [13]:
# auto retrieval query engine
auto_retrieval_query_workflow = AutoRetrievalRouterQueryWorkflow(
    query_engines=[query_engine_doc, query_engine_chunk],
    choice_descriptions=[TOOL_DOC_DESC, TOOL_CHUNK_DESC],
    verbose=True,
    llm=llm,
    router_prompt=ROUTER_PRMOPT,
)

# auto retrieval query engine
auto_retrieval_query_engine = RouterQueryEngine(
    router_query_workflow=auto_retrieval_query_workflow
)

# create tool
auto_retrieval_query_engine_tool = QueryEngineTool(
    query_engine=auto_retrieval_query_engine,
    metadata=ToolMetadata(
        name="Query_engine_auto_retrieval",
        description="Queries 10k reports for a given year.",
    ),
)

Create an agent using auto retrieval.

In [14]:
# agent
agent_router_output = RouterOutputAgentWorkflow(tools=[auto_retrieval_query_engine_tool], verbose=True, timeout=120)

## Example Queries

In [15]:
response = await agent_router_output.run(message="How was Tesla doing generally in 2021 and 2022?")
display(Markdown(response))

Running step chat
Chat message: None
Step chat produced event GatherToolsEvent
Running step dispatch_calls
Step dispatch_calls produced event GatherEvent
Running step gather
Step gather produced no event
Running step call_tool
Calling function Query_engine_auto_retrieval with msg {"input": "Tesla 2021"}
Running step choose_query_engine
Selected choice(s):
Choice: 1, Reason: The question 'Tesla 2021' seems to require a general summary of the 10K report for Tesla in 2021. Choice 1 is best suited for higher-level summarization options and synthesizes an answer by feeding in an entire relevant document as context.
Step choose_query_engine produced event ChooseQueryEngineEvent
Running step query_each_engine
Step query_each_engine produced event SynthesizeAnswersEvent
Running step synthesize_response
Step synthesize_response produced event StopEvent
Step call_tool produced event ToolCallEventResult
Running step call_tool
Calling function Query_engine_auto_retrieval with msg {"input": "Tesla 

### Tesla in 2021

In 2021, Tesla experienced significant growth and made strategic investments to bolster its financial and operational capabilities:

1. **Investment in Bitcoin**: Tesla updated its investment policy to allow for more flexibility in diversifying and maximizing returns on its cash reserves. This included a notable $1.5 billion investment in bitcoin and plans to accept bitcoin as a form of payment.

2. **Convertible Senior Notes**: The company received conversion notices for its 2022 and 2024 convertible senior notes, which it intended to settle in cash during the first quarter of 2021. Additionally, Tesla issued 2.00% Convertible Senior Notes due May 15, 2024.

3. **Partnerships and Agreements**: Tesla entered into a 2021 Pricing Agreement for Japan Cells with Panasonic Corporation of North America and SANYO Electric Co., Ltd., highlighting its ongoing efforts to secure essential components for its electric vehicles.

4. **Expansion in Electric Vehicle and Clean Energy Sectors**: Tesla continued to grow in the electric vehicle and clean energy sectors, managing various financial and operational activities, including securing loans and managing assets through agreements with financial institutions.

### Tesla in 2022

In 2022, Tesla built on its previous successes and achieved significant milestones:

1. **Vehicle Production and Delivery**: Tesla produced and delivered a substantial number of vehicles, focusing on increasing production capacity and efficiency to meet growing demand.

2. **Revenue Growth**: The company's revenue for 2022 was $81.46 billion, marking a significant increase from previous years.

3. **Energy Generation and Storage**: Tesla made strides in its energy generation and storage segment, deploying more energy storage products and solar energy systems.

4. **Global Infrastructure Expansion**: Tesla continued to expand its global infrastructure, including service centers and Supercharger stations, to support its growing customer base.

Overall, Tesla's performance in 2021 and 2022 reflected its ongoing commitment to innovation, strategic investments, and expansion in both the electric vehicle and clean energy markets.