# Auto-Retrieval with LlamaCloud

<a href="https://colab.research.google.com/github/run-llama/llamacloud-demo/blob/main/examples/advanced_rag/auto_retrieval.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Auto-retrieval** is an advanced RAG technique that uses an LLM to dynamically infer metadata filter parameters, along with the semantic query, before running vector database retrieval. By contrast, naive RAG sends the user query directly to the vector database's retrieval interface (e.g. dense vector search). If you come from the retrieval world, you can think of auto-retrieval as a form of query expansion/rewriting; it is also a specific form of function calling.
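
To make the contrast concrete, here is a toy, dependency-free sketch. Everything in it is hypothetical: the in-memory corpus, the keyword-overlap `score` function standing in for dense vector search, and `infer_query_spec`, a rule-based stand-in for the LLM call that would normally infer the filters. It only illustrates the control flow, not how LlamaCloud ranks results.

```python
import re
from dataclasses import dataclass, field

# Hypothetical corpus: each chunk has text plus metadata.
CHUNKS = [
    {"text": "SWE-bench collects task instances from GitHub pull requests.", "metadata": {"file_name": "swebench.pdf"}},
    {"text": "Self-RAG trains a model to retrieve and critique its own generations.", "metadata": {"file_name": "selfrag.pdf"}},
    {"text": "We summarize the paper and its benchmark results.", "metadata": {"file_name": "selfrag.pdf"}},
    {"text": "We summarize the paper and the construction of the benchmark.", "metadata": {"file_name": "swebench.pdf"}},
]


def tokens(text: str) -> set[str]:
    # Lowercase word tokens; hyphenated names like "swe-bench" stay intact.
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def score(query: str, text: str) -> int:
    # Toy relevance (stand-in for vector similarity): count shared tokens.
    return len(tokens(query) & tokens(text))


@dataclass
class QuerySpec:
    query: str
    filters: dict = field(default_factory=dict)  # exact-match metadata filters


def infer_query_spec(user_query: str, known_files: dict[str, str]) -> QuerySpec:
    # Hypothetical stand-in for the LLM: if a known paper name appears in the query,
    # move it into a file_name filter and drop it from the semantic query.
    for name, file_name in known_files.items():
        if name.lower() in user_query.lower():
            rewritten = re.sub(re.escape(name), "", user_query, flags=re.IGNORECASE)
            return QuerySpec(query=" ".join(rewritten.split()), filters={"file_name": file_name})
    return QuerySpec(query=user_query)


def retrieve(query: str, filters: dict, top_k: int = 1) -> list[dict]:
    # Apply the metadata filters first, then rank the remaining chunks.
    candidates = [c for c in CHUNKS if all(c["metadata"].get(k) == v for k, v in filters.items())]
    return sorted(candidates, key=lambda c: score(query, c["text"]), reverse=True)[:top_k]


user_query = "Summarize the SWE-bench paper"
naive = retrieve(user_query, filters={})  # naive RAG: raw query, no filters
spec = infer_query_spec(user_query, {"SWE-bench": "swebench.pdf", "Self-RAG": "selfrag.pdf"})
auto = retrieve(spec.query, spec.filters)  # auto-retrieval: rewritten query + inferred filter
```

Here the naive search ranks a Self-RAG chunk first because it shares generic words ("summarize", "the", "paper") with the query, while moving the paper name into a `file_name` filter restricts the search to `swebench.pdf`.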

LlamaCloud helps you easily define chunk-level and document-level retrieval interfaces on top of any documents. In this guide, we show how to build an auto-retrieval pipeline on top of LlamaCloud retrievers over a corpus of research papers.

## Set Up LlamaCloud

Install the core packages and download the relevant files. Then upload these documents to LlamaCloud and define chunk-level and document-level retriever interfaces over them.

For more information on chunk-level and document-level retrieval, check out our demo notebook [here](https://github.com/run-llama/llamacloud-demo/blob/main/examples/10k_apple_tesla/demo_file_retrieval.ipynb).

In [None]:
!pip install llama-index
!pip install llama-index-core
!pip install llama-index-indices-managed-llama-cloud
!pip install llama-parse

In [2]:
# NOTE: to research a larger subset of papers, uncomment matching entries in both `urls` and `papers` (the two lists are paired by position)

urls = [
    # "https://openreview.net/pdf?id=VtmBAGCN7o",
    # "https://openreview.net/pdf?id=6PmJoRfdaK",
    # "https://openreview.net/pdf?id=LzPWWPAdY4",
    "https://openreview.net/pdf?id=VTF8yNQM66",
    "https://openreview.net/pdf?id=hSyW5go0v8",
    # "https://openreview.net/pdf?id=9WD9KwssyT",
    # "https://openreview.net/pdf?id=yV6fD7LYkF",
    # "https://openreview.net/pdf?id=hnrB5YHoYu",
    # "https://openreview.net/pdf?id=WbWtOYIzIK",
    "https://openreview.net/pdf?id=c5pwL0Soay",
    # "https://openreview.net/pdf?id=TpD2aG1h0D",
]

papers = [
    # "metagpt.pdf",
    # "longlora.pdf",
    # "loftq.pdf",
    "swebench.pdf",
    "selfrag.pdf",
    # "zipformer.pdf",
    # "values.pdf",
    # "finetune_fair_diffusion.pdf",
    # "knowledge_card.pdf",
    "metra.pdf",
    # "vr_mcl.pdf",
]

data_dir = "iclr_docs"

In [None]:
!mkdir -p "{data_dir}"
for url, paper in zip(urls, papers):
    !wget "{url}" -O "{data_dir}/{paper}"
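
The `!mkdir`/`!wget` lines are IPython shell escapes, so they only work inside a notebook. Below is a minimal pure-Python sketch of the same step using only the standard library; `plan_downloads` and `download_all` are hypothetical helper names. The length check guards against `urls` and `papers` falling out of step when you uncomment entries in one list but not the other.

```python
import os
import urllib.request


def plan_downloads(urls: list[str], papers: list[str], data_dir: str) -> list[tuple[str, str]]:
    """Pair each URL with its destination path, failing loudly if the lists are misaligned."""
    if len(urls) != len(papers):
        raise ValueError(f"{len(urls)} URLs but {len(papers)} file names; uncomment entries in pairs")
    return [(url, os.path.join(data_dir, paper)) for url, paper in zip(urls, papers)]


def download_all(urls: list[str], papers: list[str], data_dir: str) -> None:
    os.makedirs(data_dir, exist_ok=True)  # equivalent of `mkdir -p`
    for url, path in plan_downloads(urls, papers, data_dir):
        urllib.request.urlretrieve(url, path)  # equivalent of `wget URL -O path`
```

In the notebook this would be called as `download_all(urls, papers, data_dir)`.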

#### Load Documents into LlamaCloud

Create a new index in LlamaCloud and drag and drop these downloaded PDFs into the data source.

For best results, open the Transformation Configuration, click the "Manual" tab, set the segmentation configuration to page-level, and set additional chunking to "None".

#### Set Up the LlamaCloud Index

In [13]:
import os

from llama_index.indices.managed.llama_cloud import LlamaCloudIndex

# `name` and `project_name` must match the index you created in LlamaCloud.
index = LlamaCloudIndex(
    name="research_papers_page",
    project_name="llamacloud_demo",
    api_key=os.environ["LLAMA_CLOUD_API_KEY"],
)

#### Define LlamaCloud File/Chunk Retriever over Documents

The document-level retriever returns entire files, while the chunk-level retriever returns individual chunks.

In [14]:
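# Document-level: return whole files ranked by content relevance (top 1 file).
doc_retriever = index.as_retriever(
    retrieval_mode="files_via_content",
    # retrieval_mode="files_via_metadata",
    files_top_k=1,
)

# Chunk-level: return individual chunks, keeping the top 5 after reranking.
chunk_retriever = index.as_retriever(
    retrieval_mode="chunks",
    rerank_top_n=5,
)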
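
Conceptually, the two modes differ in what they hand to the LLM. The toy sketch below makes that difference concrete under our own assumptions: keyword-overlap scoring, and ranking a file by its best-matching chunk. How LlamaCloud actually ranks files in `files_via_content` mode is not described in this notebook, so treat the file-scoring rule as hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical page-level chunks: (file_name, page, text).
CHUNKS = [
    ("swebench.pdf", 1, "SWE-bench is a benchmark built from GitHub issues."),
    ("swebench.pdf", 2, "Task instances are filtered by executing the tests."),
    ("metra.pdf", 1, "METRA learns skills with a temporal distance metric."),
]


def score(query: str, text: str) -> int:
    # Toy relevance: count shared lowercase word tokens.
    q = set(re.findall(r"\w+", query.lower()))
    return len(q & set(re.findall(r"\w+", text.lower())))


def retrieve_chunks(query: str, top_n: int) -> list[str]:
    # Chunk-level: return the best individual chunks.
    ranked = sorted(CHUNKS, key=lambda c: score(query, c[2]), reverse=True)
    return [text for _, _, text in ranked[:top_n]]


def retrieve_files(query: str, top_k: int) -> list[str]:
    # File-level: rank files by their best chunk, then return each file's full text in page order.
    best: dict[str, int] = defaultdict(int)
    pages: dict[str, list[tuple[int, str]]] = defaultdict(list)
    for file_name, page, text in CHUNKS:
        best[file_name] = max(best[file_name], score(query, text))
        pages[file_name].append((page, text))
    top_files = sorted(best, key=best.get, reverse=True)[:top_k]
    return ["\n".join(text for _, text in sorted(pages[name])) for name in top_files]
```

For a question like "How are SWE-bench tasks filtered?", `retrieve_chunks(q, 1)` returns a single page, while `retrieve_files(q, 1)` returns every page of `swebench.pdf`, which is why the document-level mode suits summaries and the chunk-level mode suits pinpoint questions.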

## Set Up Auto-Retrieval

Now we set up an **auto-retrieval** function over our LlamaCloud retrievers. At a high level, the function uses a function-calling LLM to infer metadata filters for a user query, which makes retrieval more precise and relevant than sending the raw semantic query alone.
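
The inferred filters are a list of `key`/`value`/`operator` entries combined by a `condition`, e.g. `{"filters": [{"key": "file_name", "value": "swebench.pdf", "operator": "=="}], "condition": "and"}`. As a rough illustration of those semantics only, here is a hypothetical pure-Python evaluator for that shape. The operator table is an assumed subset; which operators LlamaCloud supports server-side is not covered here.

```python
import json
import operator

# Assumed subset of filter operators, keyed by the strings used in the filter JSON.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "in": lambda value, allowed: value in allowed,  # metadata value is one of the listed values
}


def matches(metadata: dict, spec: dict) -> bool:
    """Return True if a metadata dict satisfies a filter spec of the shape shown above."""
    results = []
    for f in spec.get("filters", []):
        if f["key"] not in metadata:
            results.append(False)  # a missing key never matches
        else:
            results.append(OPS[f["operator"]](metadata[f["key"]], f["value"]))
    if not results:
        return True  # no filters: every row passes
    condition = spec.get("condition", "and")
    if condition == "and":
        return all(results)
    if condition == "or":
        return any(results)
    raise ValueError(f"unsupported condition: {condition}")


spec = json.loads('{"filters": [{"key": "file_name", "value": "swebench.pdf", "operator": "=="}], "condition": "and"}')
rows = [{"file_name": "swebench.pdf", "page": 3}, {"file_name": "selfrag.pdf", "page": 1}]
kept = [row for row in rows if matches(row, spec)]
```

Only the `swebench.pdf` row survives, so the semantic search that follows runs over a single paper instead of the whole corpus.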

This section shows how to build it from scratch, including dynamic few-shot example selection to make filter inference more reliable:

1. Define a custom prompt to generate metadata filters.
2. Given a user query, first run chunk-level retrieval and collect the metadata of the retrieved chunks.
3. Inject that metadata as few-shot examples into the auto-retrieval prompt. The goal is to show the LLM what existing, relevant metadata values look like, so that it can infer correct metadata filters.
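
Steps 2 and 3 amount to computing one prompt variable from another at format time, which is roughly what `function_mappings` does for the `ChatPromptTemplate` in the prompt code below. Here is a minimal standard-library sketch of the idea; `format_with_mappings`, the toy retriever, and its data are hypothetical, and LlamaIndex's template machinery differs in detail.

```python
import json

# Hypothetical chunk store: (text, metadata) pairs.
STORE = [
    ("METRA maximizes a Wasserstein dependency measure.", {"file_name": "metra.pdf", "page": 4}),
    ("Self-RAG uses reflection tokens.", {"file_name": "selfrag.pdf", "page": 2}),
    ("METRA uses temporal distance as its metric.", {"file_name": "metra.pdf", "page": 5}),
]


def retrieve_metadata(query: str, top_n: int = 2) -> list[dict]:
    # Step 2 (toy version): rank chunks by shared words with the query, keep their metadata.
    words = set(query.lower().split())
    ranked = sorted(STORE, key=lambda row: len(words & set(row[0].lower().split())), reverse=True)
    return [meta for _, meta in ranked[:top_n]]


def example_rows(**kwargs) -> str:
    # Same output format as `get_example_rows_fn` below: one JSON object per line.
    return "\n".join(json.dumps(m) for m in retrieve_metadata(kwargs["query_str"]))


def format_with_mappings(template: str, mappings: dict, **kwargs) -> str:
    # Step 3: compute mapped variables from the caller's kwargs, then fill the template.
    computed = {name: fn(**kwargs) for name, fn in mappings.items()}
    return template.format(**kwargs, **computed)


TEMPLATE = "Example metadata from relevant chunks:\n{example_rows}\n\nQuery: {query_str}"
prompt = format_with_mappings(TEMPLATE, {"example_rows": example_rows}, query_str="what does metra maximize")
```

The caller only supplies `query_str`; the few-shot metadata rows are pulled in automatically, so the LLM sees real `file_name` values (here `metra.pdf`) before it writes its filters.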

Much of the code below is adapted from our **VectorIndexAutoRetriever** module, which provides an out-of-the-box way to do auto-retrieval against a vector index.
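
Whichever way you run it, the LLM's job is to emit a `VectorStoreQuerySpec`: a rewritten `query`, a list of `filters`, and an optional `top_k`. The sketch below shows, with the standard library only, what validating such a reply against the known metadata fields could look like. `parse_query_spec` is a hypothetical helper; LlamaIndex parses this output with pydantic models rather than code like this.

```python
import json

ALLOWED_KEYS = {"file_name"}  # metadata fields described in the data source info


def parse_query_spec(raw: str, allowed_keys: set[str] = ALLOWED_KEYS) -> dict:
    """Parse a reply shaped like VectorStoreQuerySpec and drop filters on unknown fields."""
    data = json.loads(raw)
    if not isinstance(data.get("query"), str):
        raise ValueError("query spec must contain a string 'query'")
    # Keep only filters whose key exists in the data source (mirrors the prompt's instruction).
    filters = [f for f in data.get("filters", []) if f.get("key") in allowed_keys]
    top_k = data.get("top_k")
    if top_k is not None and (not isinstance(top_k, int) or top_k <= 0):
        raise ValueError("top_k must be a positive integer or null")
    return {"query": data["query"], "filters": filters, "top_k": top_k}


reply = (
    '{"query": "summary of the SWE-bench paper", "filters": ['
    '{"key": "file_name", "value": "swebench.pdf", "operator": "=="}, '
    '{"key": "year", "value": 2024, "operator": "=="}], "top_k": null}'
)
spec = parse_query_spec(reply)
```

The hypothetical `year` filter is silently dropped because it does not exist in the data source; raising an error instead would be an equally reasonable choice if you prefer to surface bad LLM output.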

In [15]:
from llama_index.llms.openai import OpenAI

llm = OpenAI(model="gpt-4o")

In [38]:
from llama_index.core.prompts import ChatPromptTemplate
from llama_index.core.vector_stores.types import VectorStoreInfo, VectorStoreQuerySpec, MetadataInfo, MetadataFilters
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core import Response

import json

SYS_PROMPT = """\
Your goal is to structure the user's query to match the request schema provided below.
You MUST call the tool in order to generate the query spec.

<< Structured Request Schema >>
When responding use a markdown code snippet with a JSON object formatted in the \
following schema:

{schema_str}

The query string should contain only text that is expected to match the contents of \
documents. Any conditions in the filter should not be mentioned in the query as well.

Make sure that filters only refer to attributes that exist in the data source.
Make sure that filters take into account the descriptions of attributes.
Make sure that filters are only used as needed. If there are no filters that should be \
applied, return [] for the filter value.

If the user's query explicitly mentions number of documents to retrieve, set top_k to \
that number, otherwise do not set top_k.

The schema of the metadata filters in the vector db table is listed below, along with some example metadata dictionaries from relevant rows.
The user will send the input query string.

Data Source:
```json
{info_str}
```

Example metadata from relevant chunks:
{example_rows}

"""

example_rows_retriever = index.as_retriever(
    retrieval_mode="chunks",
    rerank_top_n=4
)

def get_example_rows_fn(**kwargs):
    """Retrieve relevant few-shot examples."""
    query_str = kwargs["query_str"]
    nodes = example_rows_retriever.retrieve(query_str)
    # get the metadata, join them
    metadata_list = [n.metadata for n in nodes]

    return "\n".join([json.dumps(m) for m in metadata_list])


# Map `example_rows` to a function so its value is computed from `query_str` whenever the prompt is formatted.
chat_prompt_tmpl = ChatPromptTemplate.from_messages(
    [
        ("system", SYS_PROMPT),
        ("user", "{query_str}"),
    ],
    function_mappings={
        "example_rows": get_example_rows_fn
    }
)


# NOTE: VectorStoreInfo describes the indexed content and the metadata fields the LLM may filter on.
vector_store_info = VectorStoreInfo(
    content_info="contains content from various research papers",
    metadata_info=[
        MetadataInfo(
            name="file_name",
            type="str",
            description="Name of the source paper",
        ),
    ],
)

def auto_retriever_rag(query: str, retriever_kwargs: dict) -> Response:
    """Infer a query string and metadata filters with the LLM, then answer the query over the filtered index."""
    print(f"> User query string: {query}")
    # Use structured predict to infer the metadata filters and query string.
    query_spec = llm.structured_predict(
        VectorStoreQuerySpec,
        chat_prompt_tmpl,
        info_str=vector_store_info.json(indent=4),
        schema_str=VectorStoreQuerySpec.schema_json(indent=4),
        query_str=query,
    )
    filters = MetadataFilters(filters=query_spec.filters) if len(query_spec.filters) > 0 else None
    print(f"> Inferred query string: {query_spec.query}")
    if filters:
        print(f"> Inferred filters: {filters.json()}")
    # Build the retriever with the inferred filters so they actually constrain retrieval
    # (assumes `as_retriever` forwards `filters` to the LlamaCloud retriever).
    retriever = index.as_retriever(filters=filters, **retriever_kwargs)
    query_engine = RetrieverQueryEngine.from_args(
        retriever,
        llm=llm,
        response_mode="tree_summarize",
    )
    # Run the rewritten query against the (optionally filtered) retriever.
    return query_engine.query(query_spec.query)


### Try out Auto-Retrieval

Let's run our auto-retriever on some sample queries, using both chunk-level and document-level retrieval.

In [39]:
from functools import partial

# Same retrieval settings as `doc_retriever` and `chunk_retriever` above.
auto_doc_rag = partial(
    auto_retriever_rag,
    retriever_kwargs={"retrieval_mode": "files_via_content", "files_top_k": 1},
)
auto_chunk_rag = partial(
    auto_retriever_rag,
    retriever_kwargs={"retrieval_mode": "chunks", "rerank_top_n": 5},
)

In [34]:
response = auto_chunk_rag("ELI5 the objective function in Metra")
print(str(response))

> User query string: ELI5 the objective function in Metra
> Inferred query string: objective function in Metra
The objective function in METRA involves maximizing the Wasserstein dependency measure (WDM) using a tractable approach. This is achieved by jointly training a 1-Lipschitz-constrained score function \( f(s, z) \) and a skill policy \( \pi(a|s, z) \) with the reward function being an empirical estimate of the WDM. The simplified objective is expressed as:

\[ IW(S; Z) \approx \sup_{\|\varphi\|_L \le 1} \mathbb{E}_{p(s, z)}[\varphi(s)^\top \psi(z)] - \mathbb{E}_{p(s)}[\varphi(s)]^\top \mathbb{E}_{p(z)}[\psi(z)] \]

where \( \varphi(s) \) and \( \psi(z) \) are parameterizations with independent 1-Lipschitz constraints. The full objective also incorporates the temporal distance between states as a distance metric, which is crucial for learning a compact set of useful behaviors.


In [35]:
response = auto_chunk_rag("How was SWE-Bench constructed? Tell me all the stages that went into it.")
print(str(response))

> User query string: How was SWE-Bench constructed? Tell me all the stages that went into it.
> Inferred query string: SWE-Bench construction stages
The construction of SWE-Bench involves a three-stage pipeline:

1. **Repo Selection and Data Scraping**: Pull requests (PRs) are collected from 12 popular open-source Python repositories on GitHub, resulting in approximately 90,000 PRs. These repositories are chosen for their popularity, better maintenance, clear contributor guidelines, and extensive test coverage.

2. **Attribute-Based Filtering**: Candidate tasks are created by selecting merged PRs that resolve a GitHub issue and make changes to the test files of the repository. This indicates that the user likely contributed tests to check whether the issue has been resolved.

3. **Execution-Based Filtering**: For each candidate task, the PR’s test content is applied, and the associated test results are logged before and after the PR’s other content is applied. Tasks are filtered out if

In [40]:
response = auto_doc_rag("Give me a summary of the SWE-bench paper")
print(str(response))

> User query string: Give me a summary of the SWE-bench paper
> Inferred query string: summary of the SWE-bench paper
> Inferred filters: {"filters": [{"key": "file_name", "value": "swebench.pdf", "operator": "=="}], "condition": "and"}

In [41]:
response = auto_doc_rag("Give me a summary of the Self-RAG paper")
print(str(response))

> User query string: Give me a summary of the Self-RAG paper
> Inferred query string: summary of the Self-RAG paper
> Inferred filters: {"filters": [{"key": "file_name", "value": "selfrag.pdf", "operator": "=="}], "condition": "and"}

## Next Steps

Now that you've learned the basics of auto-retrieval, you can build a standalone RAG pipeline on top of it or plug it into a broader agentic system. For instance, you can expose both the chunk-level and document-level auto-retrieval pipelines as tools that an agent can call.
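
As a rough, dependency-free illustration of the tool idea, the sketch below registers two tools with descriptions and lets a hypothetical `choose_tool` function pick one. In a real agent, an LLM reads the tool descriptions and makes this choice; here a keyword heuristic stands in for it, and the two tool functions are placeholders for the `auto_doc_rag` and `auto_chunk_rag` pipelines.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Tool:
    name: str
    description: str
    fn: Callable[[str], str]


# Placeholders standing in for the document-level and chunk-level auto-retrieval pipelines.
def doc_level_answer(query: str) -> str:
    return f"[whole-document answer for: {query}]"


def chunk_level_answer(query: str) -> str:
    return f"[chunk-level answer for: {query}]"


TOOLS = [
    Tool("doc_auto_rag", "Summarize or answer broad questions about an entire paper.", doc_level_answer),
    Tool("chunk_auto_rag", "Answer specific, detailed questions from relevant passages.", chunk_level_answer),
]


def choose_tool(query: str, tools: list[Tool]) -> Tool:
    # Hypothetical heuristic standing in for the agent LLM's tool selection.
    broad = any(word in query.lower() for word in ("summary", "summarize", "overview"))
    return tools[0] if broad else tools[1]


def run_agent_step(query: str) -> str:
    tool = choose_tool(query, TOOLS)
    return tool.fn(query)
```

In LlamaIndex itself, one way to expose a pipeline as an agent tool is `FunctionTool.from_defaults(fn=..., name=..., description=...)`, where the description plays the same role as the `description` field here: it is what the agent reads when deciding which tool to call.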