<a href="https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/tools/eval_query_engine_tool.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Evaluation Query Engine Tool

In this section, we show how to use an `EvalQueryEngineTool` with an agent. Some reasons you may want to use an `EvalQueryEngineTool`:
1. Apply a specific kind of evaluation to a tool's responses, rather than relying only on the agent's reasoning
2. Use a different LLM to evaluate tool responses than the one the agent uses

An `EvalQueryEngineTool` is built on top of the `QueryEngineTool`. In addition to wrapping an existing [query engine](https://docs.llamaindex.ai/en/stable/module_guides/deploying/query_engine/root.html), it must also be given an existing [evaluator](https://docs.llamaindex.ai/en/stable/examples/evaluation/answer_and_context_relevancy.html) to evaluate that query engine's responses.
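The following is a minimal, library-free sketch of the wrapping pattern. It assumes a query engine is any function that returns an answer plus its retrieved contexts, and that an evaluator returns a pass/fail verdict with feedback. The classes `EvalResult`, `KeywordEvaluator`, and `EvalWrappedTool` are hypothetical stand-ins, not the LlamaIndex API. The keyword check also replaces the LLM judgment that a real evaluator such as `RelevancyEvaluator` would make.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical query-engine shape: query -> (answer text, retrieved contexts).
QueryEngine = Callable[[str], Tuple[str, List[str]]]


@dataclass
class EvalResult:
    passing: bool
    feedback: str


class KeywordEvaluator:
    """Hypothetical evaluator: every entity named in the query must appear in
    the retrieved contexts. An LLM-based evaluator would replace this check."""

    def __init__(self, entities: List[str]):
        self.entities = entities

    def evaluate(self, query: str, contexts: List[str]) -> EvalResult:
        text = " ".join(contexts).lower()
        asked = [e for e in self.entities if e.lower() in query.lower()]
        missing = [e for e in asked if e.lower() not in text]
        if missing:
            return EvalResult(False, "contexts do not mention " + ", ".join(missing))
        return EvalResult(True, "contexts are relevant")


class EvalWrappedTool:
    """Hypothetical wrapper: run the engine, evaluate its output, then answer or refuse."""

    def __init__(self, name: str, engine: QueryEngine, evaluator: KeywordEvaluator):
        self.name = name
        self.engine = engine
        self.evaluator = evaluator

    def __call__(self, query: str) -> str:
        answer, contexts = self.engine(query)
        result = self.evaluator.evaluate(query, contexts)
        if not result.passing:
            # Instead of the answer, tell the agent why this tool's output was rejected.
            return f"Could not use tool {self.name}: {result.feedback}"
        return answer


def lyft_engine(query: str) -> Tuple[str, List[str]]:
    # Toy engine that always retrieves Lyft text, whatever is asked (placeholder data).
    return "(placeholder answer about Lyft)", ["Lyft 10-K excerpt (placeholder)"]


tool_1 = EvalWrappedTool("tool_1", lyft_engine, KeywordEvaluator(["Uber", "Lyft"]))
print(tool_1("What was Lyft's revenue growth in 2021?"))  # (placeholder answer about Lyft)
print(tool_1("What was Uber's revenue growth in 2021?"))  # Could not use tool tool_1: contexts do not mention Uber
```

The wrapper rejects the output instead of passing it through. This lets the agent try another tool rather than answer from the wrong document.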


## Install Dependencies

In [None]:
%pip install llama-index-embeddings-huggingface
%pip install llama-index-llms-openai

## Initialize and Set LLM and Local Embedding Model


In [None]:
from llama_index.core.settings import Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.openai import OpenAI

Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)
# Default LLM for this notebook; the OpenAI client reads OPENAI_API_KEY from the environment.
Settings.llm = OpenAI()

## Download and Index Data
We download the data and build the indexes here only for the sake of this demo. In production, data stores and indexes would typically already exist rather than being created on the fly.

### Create Storage Contexts

In [None]:
from llama_index.core import (
    StorageContext,
    load_index_from_storage,
)

try:
    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/lyft",
    )
    lyft_index = load_index_from_storage(storage_context)

    storage_context = StorageContext.from_defaults(
        persist_dir="./storage/uber"
    )
    uber_index = load_index_from_storage(storage_context)

    index_loaded = True
except Exception:
    # Storage not found (e.g. on the first run) or unreadable: build the indexes below.
    index_loaded = False

### Download Data

In [None]:
!mkdir -p 'data/10k/'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'
!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'

### Load Data

In [None]:
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

if not index_loaded:
    # load data
    lyft_docs = SimpleDirectoryReader(
        input_files=["./data/10k/lyft_2021.pdf"]
    ).load_data()
    uber_docs = SimpleDirectoryReader(
        input_files=["./data/10k/uber_2021.pdf"]
    ).load_data()

    # build index
    lyft_index = VectorStoreIndex.from_documents(lyft_docs)
    uber_index = VectorStoreIndex.from_documents(uber_docs)

    # persist index
    lyft_index.storage_context.persist(persist_dir="./storage/lyft")
    uber_index.storage_context.persist(persist_dir="./storage/uber")
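The storage cell and the load cell above follow a load-or-build pattern. They first try to load the persisted indexes and only parse the PDFs and build the indexes if that fails. Here is a minimal, library-free sketch of the same idea. It assumes a JSON file stands in for a persisted index. The `load_or_build` helper and the `index.json` file name are hypothetical and are not LlamaIndex's storage format. The sketch catches only the specific errors that mean "no usable cache".

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Index = Dict[str, str]  # Hypothetical stand-in for a built index.


def load_or_build(persist_dir: Path, build: Callable[[], Index]) -> Index:
    """Load a persisted index if present; otherwise build it and persist it."""
    index_file = persist_dir / "index.json"
    try:
        return json.loads(index_file.read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        # Missing or unreadable cache: rebuild from the source documents and save.
        index = build()
        persist_dir.mkdir(parents=True, exist_ok=True)
        index_file.write_text(json.dumps(index))
        return index


builds: List[str] = []


def build_lyft() -> Index:
    builds.append("lyft")  # record that the (expensive) build actually ran
    return {"doc": "lyft_2021.pdf"}


with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp) / "storage" / "lyft"
    first = load_or_build(storage, build_lyft)   # builds and persists
    second = load_or_build(storage, build_lyft)  # loads from disk, no rebuild
    print(first == second, builds)  # True ['lyft']
```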

## Create Query Engines

In [None]:
lyft_engine = lyft_index.as_query_engine(similarity_top_k=3)
uber_engine = uber_index.as_query_engine(similarity_top_k=3)
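With `similarity_top_k=3`, each query engine retrieves the three chunks whose embeddings are most similar to the query's embedding and passes them to the LLM as context. These retrieved chunks are the `contexts` that appear in the evaluation output later on. Below is a toy sketch of that retrieval step. It assumes cosine similarity and uses made-up 2-D vectors; the vector store's actual scoring and data structures may differ.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec: List[float], chunks: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(cid, cosine(query_vec, vec)) for cid, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 2-D "embeddings"; real ones (e.g. from bge-small-en-v1.5) have hundreds of dimensions.
chunks = {
    "revenue": [1.0, 0.1],
    "riders": [0.8, 0.6],
    "risk": [0.1, 1.0],
    "legal": [0.0, 1.0],
}
print([cid for cid, _ in top_k([1.0, 0.0], chunks, k=3)])  # ['revenue', 'riders', 'risk']
```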

## Create Evaluator

In [None]:
from llama_index.core.evaluation import RelevancyEvaluator

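# RelevancyEvaluator uses Settings.llm by default; pass `llm=...` to judge
# tool responses with a different LLM than the agent's.
evaluator = RelevancyEvaluator()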

## Create Query Engine Tools

For demonstration purposes, we write the tool descriptions to "force" the LLM to choose `tool_1` first. This lets us control the outcome and make evaluation fail.

In [None]:
from llama_index.core.tools import EvalQueryEngineTool, ToolMetadata

query_engine_tools = [
    EvalQueryEngineTool(
        evaluator=evaluator,
        query_engine=lyft_engine,
        metadata=ToolMetadata(
            name="tool_1",
            description=(
                "Provides information about financials for year 2021. Use this tool first. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
    EvalQueryEngineTool(
        evaluator=evaluator,
        query_engine=uber_engine,
        metadata=ToolMetadata(
            name="tool_2",
            description=(
                "Provides information about financials for year 2021. Use this tool if the first one does not provide data. "
                "Use a detailed plain text question as input to the tool."
            ),
        ),
    ),
]
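Each description is built from adjacent string literals inside parentheses, and Python joins these with no separator. If the first fragment lacks a trailing space or period, its sentence runs into the next one in the text the agent sees. This small check shows the difference:

```python
# Adjacent string literals are concatenated at compile time, with no separator.
broken = (
    "Use this tool first"
    "Use a detailed plain text question as input to the tool."
)
fixed = (
    "Use this tool first. "
    "Use a detailed plain text question as input to the tool."
)
print(broken)  # Use this tool firstUse a detailed plain text question as input to the tool.
print(fixed)   # Use this tool first. Use a detailed plain text question as input to the tool.
```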

## Setup ReAct Agent

In [None]:
from llama_index.core.agent import ReActAgent

agent = ReActAgent.from_tools(query_engine_tools, verbose=True)

## Query Engine Fails Evaluation

For demonstration purposes, we will "force" the agent to choose the wrong tool first so that we can observe the effect of the `EvalQueryEngineTool` when evaluation fails. As mentioned earlier, we have used tool descriptions to force this scenario.

You can also try specifying the `tool_choice` argument when calling `chat` (or `achat`, as below). We will pass `tool_1`, which contains Lyft's financials, as the `tool_choice` and ask a question about Uber's financials.

This is what we should expect to happen:
1. The agent will use `tool_1` first, as we have instructed it to, even though it contains the wrong (Lyft) financials
2. The `EvalQueryEngineTool` will evaluate the response of the query engine using its evaluator
3. The query engine output will fail evaluation because it contains Lyft's financials and not Uber's
4. The tool will return a response informing the agent that the tool could not be used, along with the reason
5. The agent will fall back to the second tool, `tool_2`
6. The query engine output of the second tool will pass evaluation because it contains Uber's financials
7. The agent will respond with an answer
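The control flow in these steps can be simulated without any LLM. This minimal sketch makes two simplifying assumptions. First, it tries the tools in a fixed order, whereas a ReAct agent chooses by reasoning over the descriptions and tool output. Second, a keyword check stands in for `RelevancyEvaluator`. `TOOL_CONTEXTS`, `passes_eval`, and `answer_with_fallback` are hypothetical names, and the context strings are placeholders.

```python
from typing import Dict, List, Optional

# Placeholder contexts each hypothetical tool retrieves, regardless of the question.
TOOL_CONTEXTS: Dict[str, List[str]] = {
    "tool_1": ["Lyft 10-K excerpt (placeholder)"],
    "tool_2": ["Uber 10-K excerpt (placeholder)"],
}


def passes_eval(query: str, contexts: List[str]) -> bool:
    """Hypothetical stand-in for an LLM evaluator: the company asked about
    must appear in the retrieved contexts."""
    text = " ".join(contexts).lower()
    company = "uber" if "uber" in query.lower() else "lyft"
    return company in text


def answer_with_fallback(query: str, order: List[str], log: List[str]) -> Optional[str]:
    """Try tools in a fixed order; return the first output that passes evaluation."""
    for name in order:
        contexts = TOOL_CONTEXTS[name]
        if passes_eval(query, contexts):
            log.append(f"{name}: passed")
            return contexts[0]  # the retrieved text stands in for a synthesized answer
        # The wrapped tool reports the failure, so the agent moves on to the next tool.
        log.append(f"{name}: failed evaluation")
    return None


log: List[str] = []
answer = answer_with_fallback("What was Uber's revenue growth in 2021?", ["tool_1", "tool_2"], log)
print(log)     # ['tool_1: failed evaluation', 'tool_2: passed']
print(answer)  # Uber 10-K excerpt (placeholder)
```

If you ask about Lyft instead, the loop stops at `tool_1`. This matches the passing case in the next section.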

In [None]:
response = await agent.achat(
    "What was Uber's revenue growth in 2021?", tool_choice="tool_1"
)
print(str(response))

Thought: I need to use a tool to help me answer the question.
Action: tool_1
Action Input: {'input': "What was Uber's revenue growth in 2021?"}
query="What was Uber's revenue growth in 2021?" contexts=['Revenue per Active Rider reached an all-time high in the three months ended December 31, 2021, increasingcompared\n to the three months ended September 30, 2021. This was driven by an increase in ride frequency as well as a shift toward higher revenue rides such as airportrides,\n reflecting the increased travel experienced in the fourth quarter in 2021 nationwide. Revenue per Active Rider also benefited from revenues from licensing and dataaccess agreements, beg\ninning in the second quarter of 2021.Critical Accounting Policies and Estimates\nOur\n consolidated  financial  statements  and  the  related  notes  thereto  included  elsewhere  in  this  Annual  Report  on  Form  10-K  are  prepared  in  accordance  withGAAP.\n The  preparation  of  consolidated  financia

## Query Engine Passes Evaluation

Here we are asking a question about Lyft's financials. This is what we should expect to happen:
1. The agent will use `tool_1` first, simply based on its description
2. The `EvalQueryEngineTool` will evaluate the response of the query engine using its evaluator
3. The output of the query engine will pass evaluation because it contains Lyft's financials
4. The agent will respond with an answer

In [None]:
response = await agent.achat("What was Lyft's revenue growth in 2021?")
print(str(response))

Thought: I need to use a tool to help me answer the question.
Action: tool_1
Action Input: {'input': "What was Lyft's revenue growth in 2021?"}
query="What was Lyft's revenue growth in 2021?" contexts=['Revenue per Active Rider reached an all-time high in the three months ended December 31, 2021, increasingcompared\n to the three months ended September 30, 2021. This was driven by an increase in ride frequency as well as a shift toward higher revenue rides such as airportrides,\n reflecting the increased travel experienced in the fourth quarter in 2021 nationwide. Revenue per Active Rider also benefited from revenues from licensing and dataaccess agreements, beg\ninning in the second quarter of 2021.Critical Accounting Policies and Estimates\nOur\n consolidated  financial  statements  and  the  related  notes  thereto  included  elsewhere  in  this  Annual  Report  on  Form  10-K  are  prepared  in  accordance  withGAAP.\n The  preparation  of  consolidated  financia