diff --git a/docs/docs/CHANGELOG.md b/docs/docs/CHANGELOG.md index 7da8b3966331c..e40d1b0aa8b67 100644 --- a/docs/docs/CHANGELOG.md +++ b/docs/docs/CHANGELOG.md @@ -1,5 +1,32 @@ # ChangeLog +## [2024-03-23] + +### `llama-index-core` [0.10.23] + +- Added `(a)predict_and_call()` function to base LLM class + openai + mistralai (#12188) +- Fixed bug with `wait()` in async agent streaming (#12187) + +### `llama-index-embeddings-alephalpha` [0.1.0] + +- Added alephalpha embeddings (#12149) + +### `llama-index-llms-alephalpha` [0.1.0] + +- Added alephalpha LLM (#12149) + +### `llama-index-llms-openai` [0.1.7] + +- Fixed bug with `wait()` in async agent streaming (#12187) + +### `llama-index-readers-docugami` [0.1.4] + +- Fixed import errors in docugami reader (#12154) + +### `llama-index-readers-file` [0.1.12] + +- Fixed PDFReader for remote fs (#12186) + ## [2024-03-21] ### `llama-index-core` [0.10.22] diff --git a/docs/docs/CONTRIBUTING.md b/docs/docs/CONTRIBUTING.md index d80321b7f3353..37c7cc660b189 100644 --- a/docs/docs/CONTRIBUTING.md +++ b/docs/docs/CONTRIBUTING.md @@ -375,22 +375,31 @@ Whether if it's the latest research, or what you thought of in the shower, we'd We would love your help in making the project cleaner, more robust, and more understandable. If you find something confusing, it most likely is for other people as well. Help us be better! -## Development Guideline +## Development Guidelines -### Environment Setup +### Setting up environment LlamaIndex is a Python package. We've tested primarily with Python versions >= 3.8. Here's a quick -and dirty guide to getting your environment setup. - -First, create a fork of LlamaIndex, by clicking the "Fork" button on the [LlamaIndex Github page](https://github.com/jerryjliu/llama_index). -Following [these steps](https://docs.github.com/en/get-started/quickstart/fork-a-repo) for more details -on how to fork the repo and clone the forked repo. - -Then, create a new Python virtual environment using poetry. - -- [Install poetry](https://python-poetry.org/docs/#installation) - this will help you manage package dependencies -- `poetry shell` - this command creates a virtual environment, which keeps installed packages contained to this project -- `poetry install --with dev,docs` - this will install all dependencies needed for most local development +and dirty guide to setting up your environment for local development. + +1. Fork the [LlamaIndex Github repo][ghr]\* and clone it locally. (New to GitHub / git? Here's [how][frk].) +2. In a terminal, `cd` into the directory of your local clone of your forked repo. +3. Install [pre-commit hooks][pch]\* by running `pre-commit install`. These hooks are small housekeeping scripts executed every time you make a git commit, which automate away a lot of chores. +4. `cd` into the specific package you want to work on. For example, if you want to work on the core package, execute `cd llama-index-core/`. (New to terminal / command line? Here's a [getting started guide][gsg].) +5. Prepare a [virtual environment][vev]. + 1. [Install Poetry][pet]\*. This will help you manage package dependencies. + 2. Execute `poetry shell`. This command will create a [virtual environment][vev] specific for this package, which keeps installed packages contained to this project. (New to Poetry, the dependency & packaging manager for Python? Read about its basic usage [here][bus].) + 3. Execute `poetry install --with dev,docs`\*. This will install all dependencies needed for local development.
To see what will be installed, read the `pyproject.toml` under that directory. + +[frk]: https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo +[ghr]: https://github.com/run-llama/llama_index/ +[pch]: https://pre-commit.com/ +[gsg]: https://www.freecodecamp.org/news/command-line-for-beginners/ +[pet]: https://python-poetry.org/docs/#installation +[vev]: https://python-poetry.org/docs/managing-environments/ +[bus]: https://python-poetry.org/docs/basic-usage/ + +Steps marked with an asterisk (`*`) are one-time tasks. You don't have to repeat them the next time you contribute. Now you should be set! @@ -401,41 +410,52 @@ let's also make sure to `test` it and perhaps create an `example notebook`. #### Formatting/Linting -You can format and lint your changes with the following commands in the root directory: +We run an assortment of linters: `black`, `ruff`, `mypy`. -```bash -make format; make lint -``` +If you have installed pre-commit hooks in this repo, they should have taken care of the formatting and linting automatically. -You can also make use of our pre-commit hooks by setting up git hook scripts: +If -- for whatever reason -- you would like to do it manually, you can format and lint your changes with the following commands in the root directory: ```bash -pre-commit install +make format; make lint ``` -We run an assortment of linters: `black`, `ruff`, `mypy`. +Under the hood, these commands also install the pre-commit hooks for you, so that you don't have to do this manually next time. #### Testing -For bigger changes, you'll want to create a unit test. Our tests are in the `tests` folder. -We use `pytest` for unit testing. To run all unit tests, run the following in the root dir: +If you modified or added code logic, **create test(s)**, because they help prevent other maintainers from accidentally breaking the nice things you added / re-introducing the bugs you fixed. -```bash -pytest tests -``` +- In almost all cases, add **unit tests**. +- If your change involves adding a new integration, also add **integration tests**. When doing so, please [mock away][mck] the remote system that you're integrating LlamaIndex with, so that when the remote system changes, LlamaIndex developers won't see test failures. -or +Reciprocally, you should **run existing tests** (from every package that you touched) before making a git commit, so that you can be sure you didn't break someone else's good work. -```bash +(By the way, when a test is run with the goal of detecting whether something broke in a new version of the codebase, it's referred to as a "[regression test][reg]". You'll also hear people say "the test _regressed_" as a more diplomatic way of saying "the test _failed_".) + +Our tests are stored in the `tests` folders under each package directory. We use the testing framework [pytest][pyt], so you can **just run `pytest` in each package you touched** to run all its tests. + +Just like with formatting and linting, if you prefer to do things the [make][mkf] way, run: + +```shell make test ``` +Regardless of whether you have run them locally, a [CI system][cis] will run all affected tests on your PR when you submit one anyway. There, tests are orchestrated with [Pants][pts], the build system of our choice. There is a slight chance that tests that broke on CI didn't break on your local machine, or the other way around. When that happens, please take our CI as the source of truth.
This is because our release pipeline (which builds the packages users are going to download from PyPI) is run in the CI, not on your machine (even if you volunteer), so it's the CI that is the gold standard. + +[reg]: https://www.browserstack.com/guide/regression-testing +[mck]: https://pytest-mock.readthedocs.io/en/latest/ +[pyt]: https://docs.pytest.org/ +[mkf]: https://makefiletutorial.com/ +[cis]: https://www.atlassian.com/continuous-delivery/continuous-integration +[pts]: https://www.pantsbuild.org/ + ### Creating an Example Notebook For changes that involve entirely new features, it may be worth adding an example Jupyter notebook to showcase this feature. -Example notebooks can be found in this folder: . +Example notebooks can be found in [this folder](https://github.com/run-llama/llama_index/tree/main/docs/examples). ### Creating a pull request diff --git a/docs/docs/examples/tools/eval_query_engine_tool.ipynb b/docs/docs/examples/tools/eval_query_engine_tool.ipynb new file mode 100644 index 0000000000000..d094153013e8a --- /dev/null +++ b/docs/docs/examples/tools/eval_query_engine_tool.ipynb @@ -0,0 +1,404 @@ +{ + "cells": [ + { + "attachments": {}, + "cell_type": "markdown", + "id": "6b0186a4", + "metadata": {}, + "source": [ + "\"Open" + ] + }, + { + "cell_type": "markdown", + "id": "b50c4af8-fec3-4396-860a-1322089d76cb", + "metadata": {}, + "source": [ + "# Evaluation Query Engine Tool\n", + "\n", + "In this section we will show you how you can use an `EvalQueryEngineTool` with an agent. Some reasons you may want to use an `EvalQueryEngineTool`:\n", + "1. Use a specific kind of evaluation for a tool, and not just the agent's reasoning\n", + "2. Use a different LLM for evaluating tool responses than the agent LLM\n", + "\n", + "An `EvalQueryEngineTool` is built on top of the `QueryEngineTool`. 
Along with wrapping an existing [query engine](https://docs.llamaindex.ai/en/stable/module_guides/deploying/query_engine/root.html), it also must be given an existing [evaluator](https://docs.llamaindex.ai/en/stable/examples/evaluation/answer_and_context_relevancy.html) to evaluate the responses of that query engine.\n" + ] + }, + { + "cell_type": "markdown", + "id": "db402a8b-90d6-4e1d-8df6-347c54624f26", + "metadata": {}, + "source": [ + "## Install Dependencies" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "dd31aff7", + "metadata": {}, + "outputs": [], + "source": [ + "%pip install llama-index-embeddings-huggingface\n", + "%pip install llama-index-llms-openai\n", + "%pip install llama-index-agent-openai" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9f9fcf29", + "metadata": {}, + "outputs": [], + "source": [ + "import os\n", + "\n", + "os.environ[\"OPENAI_API_KEY\"] = \"sk-...\"" + ] + }, + { + "cell_type": "markdown", + "id": "7603dec1", + "metadata": {}, + "source": [ + "## Initialize and Set LLM and Local Embedding Model\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "05fd9050", + "metadata": {}, + "outputs": [], + "source": [ + "from llama_index.core.settings import Settings\n", + "from llama_index.embeddings.huggingface import HuggingFaceEmbedding\n", + "from llama_index.llms.openai import OpenAI\n", + "\n", + "Settings.embed_model = HuggingFaceEmbedding(\n", + " model_name=\"BAAI/bge-small-en-v1.5\"\n", + ")\n", + "Settings.llm = OpenAI()" + ] + }, + { + "cell_type": "markdown", + "id": "6c6bdb82", + "metadata": {}, + "source": [ + "## Download and Index Data\n", + "This is something we are doing for the sake of this demo. In production environments, data stores and indexes should already exist and not be created on the fly." 
+ ] + }, + { + "cell_type": "markdown", + "id": "64df0568", + "metadata": {}, + "source": [ + "### Create Storage Contexts" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "91618236-54d3-4783-86b7-7b7554efeed1", + "metadata": {}, + "outputs": [], + "source": [ + "from llama_index.core import (\n", + " StorageContext,\n", + " load_index_from_storage,\n", + ")\n", + "\n", + "try:\n", + " storage_context = StorageContext.from_defaults(\n", + " persist_dir=\"./storage/lyft\",\n", + " )\n", + " lyft_index = load_index_from_storage(storage_context)\n", + "\n", + " storage_context = StorageContext.from_defaults(\n", + " persist_dir=\"./storage/uber\"\n", + " )\n", + " uber_index = load_index_from_storage(storage_context)\n", + "\n", + " index_loaded = True\n", + "except:\n", + " index_loaded = False" + ] + }, + { + "attachments": {}, + "cell_type": "markdown", + "id": "6a79cbc9", + "metadata": {}, + "source": [ + "Download Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "36d80144", + "metadata": {}, + "outputs": [], + "source": [ + "!mkdir -p 'data/10k/'\n", + "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/uber_2021.pdf' -O 'data/10k/uber_2021.pdf'\n", + "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/10k/lyft_2021.pdf' -O 'data/10k/lyft_2021.pdf'" + ] + }, + { + "cell_type": "markdown", + "id": "4f801ac5", + "metadata": {}, + "source": [ + "### Load Data" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "d3d0bb8c-16c8-4946-a9d8-59528cf3952a", + "metadata": {}, + "outputs": [], + "source": [ + "from llama_index.core import SimpleDirectoryReader, VectorStoreIndex\n", + "\n", + "if not index_loaded:\n", + " # load data\n", + " lyft_docs = SimpleDirectoryReader(\n", + " input_files=[\"./data/10k/lyft_2021.pdf\"]\n", + " ).load_data()\n", + " uber_docs = SimpleDirectoryReader(\n", + " input_files=[\"./data/10k/uber_2021.pdf\"]\n", + " ).load_data()\n", + "\n", + " # build index\n", + " lyft_index = VectorStoreIndex.from_documents(lyft_docs)\n", + " uber_index = VectorStoreIndex.from_documents(uber_docs)\n", + "\n", + " # persist index\n", + " lyft_index.storage_context.persist(persist_dir=\"./storage/lyft\")\n", + " uber_index.storage_context.persist(persist_dir=\"./storage/uber\")" + ] + }, + { + "cell_type": "markdown", + "id": "ccb89178", + "metadata": {}, + "source": [ + "## Create Query Engines" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "31892898-a2dc-43c8-812a-3442feb2108d", + "metadata": {}, + "outputs": [], + "source": [ + "lyft_engine = lyft_index.as_query_engine(similarity_top_k=5)\n", + "uber_engine = uber_index.as_query_engine(similarity_top_k=5)" + ] + }, + { + "cell_type": "markdown", + "id": "880c2007", + "metadata": {}, + "source": [ + "## Create Evaluator" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "911235b3", + "metadata": {}, + "outputs": [], + "source": [ + "from llama_index.core.evaluation import RelevancyEvaluator\n", + "\n", + "evaluator = RelevancyEvaluator()" + ] + }, + { + "cell_type": "markdown", + "id": "60c542c1", + "metadata": {}, + "source": [ + "## Create Query Engine Tools" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f9f3158a-7647-4442-8de1-4db80723b4d2", + "metadata": {}, + "outputs": [], + "source": [ + "from llama_index.core.tools import ToolMetadata\n", + "from llama_index.core.tools.eval_query_engine import 
EvalQueryEngineTool\n", + "\n", + "query_engine_tools = [\n", + " EvalQueryEngineTool(\n", + " evaluator=evaluator,\n", + " query_engine=lyft_engine,\n", + " metadata=ToolMetadata(\n", + " name=\"lyft\",\n", + " description=(\n", + " \"Provides information about Lyft's financials for year 2021. \"\n", + " \"Use a detailed plain text question as input to the tool.\"\n", + " ),\n", + " ),\n", + " ),\n", + " EvalQueryEngineTool(\n", + " evaluator=evaluator,\n", + " query_engine=uber_engine,\n", + " metadata=ToolMetadata(\n", + " name=\"uber\",\n", + " description=(\n", + " \"Provides information about Uber's financials for year 2021. \"\n", + " \"Use a detailed plain text question as input to the tool.\"\n", + " ),\n", + " ),\n", + " ),\n", + "]" + ] + }, + { + "cell_type": "markdown", + "id": "275c01b1-8dce-4216-9203-1e961b7fc313", + "metadata": {}, + "source": [ + "## Setup OpenAI Agent" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "ded93297-fee8-4329-bf37-cf77e87621ae", + "metadata": {}, + "outputs": [], + "source": [ + "from llama_index.agent.openai import OpenAIAgent\n", + "\n", + "agent = OpenAIAgent.from_tools(query_engine_tools, verbose=True)" + ] + }, + { + "cell_type": "markdown", + "id": "699ee1bb", + "metadata": {}, + "source": [ + "## Query Engine Fails Evaluation\n", + "\n", + "For demonstration purposes, we will tell the agent to choose the wrong tool first so that we can observe the effect of the `EvalQueryEngineTool` when evaluation fails. To achieve this, we will set `tool_choice` to `lyft` when calling the agent.\n", + "\n", + "This is what we should expect to happen:\n", + "1. The agent will use the `lyft` tool first, which contains the wrong financials, as we have instructed it to do so\n", + "2. The `EvalQueryEngineTool` will evaluate the response of the query engine using its evaluator\n", + "3. The query engine output will fail evaluation because it contains Lyft's financials and not Uber's\n", + "4. The tool will form a response that informs the agent that the tool could not be used, giving a reason\n", + "5. The agent will fall back to the second tool, `uber`\n", + "6. The query engine output of the second tool will pass evaluation because it contains Uber's financials\n", + "7. The agent will respond with an answer" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "70a82471-9226-42ad-bd8a-aebde3530d95", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Added user message to memory: What was Uber's revenue growth in 2021?\n", + "=== Calling Function ===\n", + "Calling function: lyft with args: {\"input\":\"What was Uber's revenue growth in 2021?\"}\n", + "Got output: Could not use tool lyft because it failed evaluation.\n", + "Reason: NO\n", + "========================\n", + "\n", + "=== Calling Function ===\n", + "Calling function: uber with args: {\"input\":\"What was Uber's revenue growth in 2021?\"}\n", + "Got output: Uber's revenue grew by 57% in 2021.\n", + "========================\n", + "\n", + "Uber's revenue grew by 57% in 2021.\n" + ] + } + ], + "source": [ + "response = await agent.achat(\n", + " \"What was Uber's revenue growth in 2021?\", tool_choice=\"lyft\"\n", + ")\n", + "print(str(response))" + ] + }, + { + "cell_type": "markdown", + "id": "48eec4e4", + "metadata": {}, + "source": [ + "## Query Engine Passes Evaluation\n", + "\n", + "Here we are asking a question about Lyft's financials. This is what we should expect to happen:\n", + "1. 
The agent will use the `lyft` tool first, simply based on its description, as we have **not** set `tool_choice` here\n", + "2. The `EvalQueryEngineTool` will evaluate the response of the query engine using its evaluator\n", + "3. The output of the query engine will pass evaluation because it contains Lyft's financials" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7b114dd1", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Added user message to memory: What was Lyft's revenue growth in 2021?\n", + "=== Calling Function ===\n", + "Calling function: lyft with args: {\"input\": \"What was Lyft's revenue growth in 2021?\"}\n", + "Got output: Lyft's revenue growth in 2021 was $3,208,323, which increased compared to the revenue in 2020 and 2019.\n", + "========================\n", + "\n", + "=== Calling Function ===\n", + "Calling function: uber with args: {\"input\": \"What was Lyft's revenue growth in 2021?\"}\n", + "Got output: Could not use tool uber because it failed evaluation.\n", + "Reason: NO\n", + "========================\n", + "\n", + "Lyft's revenue grew by $3,208,323 in 2021, which increased compared to the revenue in 2020 and 2019.\n" + ] + } + ], + "source": [ + "response = await agent.achat(\"What was Lyft's revenue growth in 2021?\")\n", + "print(str(response))" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/mkdocs.yml b/docs/mkdocs.yml index a6fe2d8859c5e..776b9bc72de5e 100644 --- a/docs/mkdocs.yml +++ b/docs/mkdocs.yml @@ -77,6 +77,7 @@ nav: - ./examples/agent/openai_agent_query_plan.ipynb - ./examples/agent/openai_retrieval_benchmark.ipynb - ./examples/agent/agent_runner/agent_around_query_pipeline_with_HyDE_for_PDFs.ipynb + - ./examples/agent/mistral_agent.ipynb - Callbacks: - ./examples/callbacks/HoneyHiveLlamaIndexTracer.ipynb - ./examples/callbacks/PromptLayerHandler.ipynb @@ -427,6 +428,7 @@ nav: - ./examples/retrievers/videodb_retriever.ipynb - Tools: - ./examples/tools/OnDemandLoaderTool.ipynb + - ./examples/tools/eval_query_engine_tool.ipynb - Transforms: - ./examples/transforms/TransformsEval.ipynb - Use Cases: @@ -814,6 +816,7 @@ nav: - ./api_reference/packs/ollama_query_engine.md - ./api_reference/packs/panel_chatbot.md - ./api_reference/packs/query_understanding_agent.md + - ./api_reference/packs/raft_dataset.md - ./api_reference/packs/rag_cli_local.md - ./api_reference/packs/rag_evaluator.md - ./api_reference/packs/rag_fusion_query_pipeline.md @@ -1134,6 +1137,7 @@ nav: - ./api_reference/storage/chat_store/simple.md - Docstore: - ./api_reference/storage/docstore/dynamodb.md + - ./api_reference/storage/docstore/elasticsearch.md - ./api_reference/storage/docstore/firestore.md - ./api_reference/storage/docstore/index.md - ./api_reference/storage/docstore/mongodb.md @@ -1150,6 +1154,7 @@ nav: - ./api_reference/storage/graph_stores/simple.md - Index Store: - ./api_reference/storage/index_store/dynamodb.md + - ./api_reference/storage/index_store/elasticsearch.md - ./api_reference/storage/index_store/firestore.md - ./api_reference/storage/index_store/index.md - ./api_reference/storage/index_store/mongodb.md @@ -1740,6 +1745,9 
@@ plugins: - ../llama-index-integrations/graph_stores/llama-index-graph-stores-neptune - ../llama-index-integrations/embeddings/llama-index-embeddings-alephalpha - ../llama-index-integrations/llms/llama-index-llms-alephalpha + - ../llama-index-packs/llama-index-packs-raft-dataset + - ../llama-index-integrations/storage/docstore/llama-index-storage-docstore-elasticsearch + - ../llama-index-integrations/storage/index_store/llama-index-storage-index-store-elasticsearch - redirects: redirect_maps: ./api/llama_index.vector_stores.MongoDBAtlasVectorSearch.html: api_reference/storage/vector_store/mongodb.md diff --git a/llama-index-core/llama_index/core/tools/eval_query_engine.py b/llama-index-core/llama_index/core/tools/eval_query_engine.py new file mode 100644 index 0000000000000..506af37dd5f66 --- /dev/null +++ b/llama-index-core/llama_index/core/tools/eval_query_engine.py @@ -0,0 +1,93 @@ +from typing import Any, Optional + +from llama_index.core.base.base_query_engine import BaseQueryEngine +from llama_index.core.evaluation import ( + AnswerRelevancyEvaluator, + BaseEvaluator, + EvaluationResult, +) +from llama_index.core.tools import QueryEngineTool +from llama_index.core.tools.types import ToolMetadata, ToolOutput + + +DEFAULT_NAME = "query_engine_tool" +DEFAULT_DESCRIPTION = """Useful for running a natural language query +against a knowledge base and getting back a natural language response. +""" +FAILED_TOOL_OUTPUT_TEMPLATE = ( + "Could not use tool {tool_name} because it failed evaluation.\n" "Reason: {reason}" +) + + +class EvalQueryEngineTool(QueryEngineTool): + """Evaluating query engine tool. + + A tool that makes use of a query engine and an evaluator, where the + evaluation of the query engine response will determine the tool output. + + Args: + evaluator (BaseEvaluator): An evaluator used to judge the responses of the query engine. + query_engine (BaseQueryEngine): A query engine. + metadata (ToolMetadata): The associated metadata of the query engine. 
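+ failed_tool_output_template (str): Template used to build the tool output when the response fails evaluation. Defaults to `FAILED_TOOL_OUTPUT_TEMPLATE`.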
+ """ + + _evaluator: BaseEvaluator + _failed_tool_output_template: str + + def __init__( + self, + evaluator: BaseEvaluator, + *args, + failed_tool_output_template: str = FAILED_TOOL_OUTPUT_TEMPLATE, + **kwargs + ): + super().__init__(*args, **kwargs) + self._evaluator = evaluator + self._failed_tool_output_template = failed_tool_output_template + + def _process_tool_output( + self, + tool_output: ToolOutput, + evaluation_result: EvaluationResult, + ) -> ToolOutput: + if evaluation_result.passing: + return tool_output + + tool_output.content = self._failed_tool_output_template.format( + tool_name=self.metadata.name, + reason=evaluation_result.feedback, + ) + return tool_output + + @classmethod + def from_defaults( + cls, + query_engine: BaseQueryEngine, + evaluator: Optional[BaseEvaluator] = None, + name: Optional[str] = None, + description: Optional[str] = None, + resolve_input_errors: bool = True, + ) -> "EvalQueryEngineTool": + return cls( + evaluator=evaluator or AnswerRelevancyEvaluator(), + query_engine=query_engine, + metadata=ToolMetadata( + name=name or DEFAULT_NAME, + description=description or DEFAULT_DESCRIPTION, + ), + resolve_input_errors=resolve_input_errors, + ) + + def call(self, *args: Any, **kwargs: Any) -> ToolOutput: + tool_output = super().call(*args, **kwargs) + evaluation_results = self._evaluator.evaluate_response( + tool_output.raw_input["input"], tool_output.raw_output + ) + return self._process_tool_output(tool_output, evaluation_results) + + async def acall(self, *args: Any, **kwargs: Any) -> ToolOutput: + tool_output = await super().acall(*args, **kwargs) + evaluation_results = await self._evaluator.aevaluate_response( + tool_output.raw_input["input"], tool_output.raw_output + ) + return self._process_tool_output(tool_output, evaluation_results) diff --git a/llama-index-core/llama_index/core/tools/query_engine.py b/llama-index-core/llama_index/core/tools/query_engine.py index d4bed51a0b961..f5712b407106b 100644 --- a/llama-index-core/llama_index/core/tools/query_engine.py +++ b/llama-index-core/llama_index/core/tools/query_engine.py @@ -61,18 +61,7 @@ def metadata(self) -> ToolMetadata: return self._metadata def call(self, *args: Any, **kwargs: Any) -> ToolOutput: - if args is not None and len(args) > 0: - query_str = str(args[0]) - elif kwargs is not None and "input" in kwargs: - # NOTE: this assumes our default function schema of `input` - query_str = kwargs["input"] - elif kwargs is not None and self._resolve_input_errors: - query_str = str(kwargs) - else: - raise ValueError( - "Cannot call query engine without specifying `input` parameter." 
- ) - + query_str = self._get_query_str(*args, **kwargs) response = self._query_engine.query(query_str) return ToolOutput( content=str(response), @@ -82,16 +71,7 @@ def call(self, *args: Any, **kwargs: Any) -> ToolOutput: ) async def acall(self, *args: Any, **kwargs: Any) -> ToolOutput: - if args is not None and len(args) > 0: - query_str = str(args[0]) - elif kwargs is not None and "input" in kwargs: - # NOTE: this assumes our default function schema of `input` - query_str = kwargs["input"] - elif kwargs is not None and self._resolve_input_errors: - query_str = str(kwargs) - else: - raise ValueError("Cannot call query engine without inputs") - + query_str = self._get_query_str(*args, **kwargs) response = await self._query_engine.aquery(query_str) return ToolOutput( content=str(response), @@ -112,3 +92,17 @@ def as_langchain_tool(self) -> "LlamaIndexTool": description=self.metadata.description, ) return LlamaIndexTool.from_tool_config(tool_config=tool_config) + + def _get_query_str(self, *args, **kwargs) -> str: + if args is not None and len(args) > 0: + query_str = str(args[0]) + elif kwargs is not None and "input" in kwargs: + # NOTE: this assumes our default function schema of `input` + query_str = kwargs["input"] + elif kwargs is not None and self._resolve_input_errors: + query_str = str(kwargs) + else: + raise ValueError( + "Cannot call query engine without specifying `input` parameter." + ) + return query_str diff --git a/llama-index-core/tests/tools/test_eval_query_engine_tool.py b/llama-index-core/tests/tools/test_eval_query_engine_tool.py new file mode 100644 index 0000000000000..22618f7b5a30e --- /dev/null +++ b/llama-index-core/tests/tools/test_eval_query_engine_tool.py @@ -0,0 +1,81 @@ +"""Test EvalQueryEngine tool.""" +from typing import Optional, Sequence, Any +from unittest import IsolatedAsyncioTestCase +from unittest.mock import AsyncMock + +from llama_index.core.evaluation import EvaluationResult +from llama_index.core.evaluation.base import BaseEvaluator +from llama_index.core.prompts.mixin import PromptDictType +from llama_index.core.query_engine.custom import CustomQueryEngine +from llama_index.core.response import Response +from llama_index.core.tools.eval_query_engine import EvalQueryEngineTool +from llama_index.core.tools.types import ToolOutput + + +class MockEvaluator(BaseEvaluator): + """Mock Evaluator for testing purposes.""" + + def _get_prompts(self) -> PromptDictType: + ... + + def _update_prompts(self, prompts_dict: PromptDictType) -> None: + ... + + async def aevaluate( + self, + query: Optional[str] = None, + response: Optional[str] = None, + contexts: Optional[Sequence[str]] = None, + **kwargs: Any, + ) -> EvaluationResult: + ... 
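+ # NOTE: `aevaluate` is stubbed out only to satisfy the abstract base class; the tests below replace it with an `AsyncMock` in `setUp`.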
+ + +class MockQueryEngine(CustomQueryEngine): + """Custom query engine.""" + + def custom_query(self, query_str: str) -> str: + """Query.""" + return "custom_" + query_str + + +class TestEvalQueryEngineTool(IsolatedAsyncioTestCase): + def setUp(self) -> None: + self.mock_evaluator = MockEvaluator() + self.mock_evaluator.aevaluate = AsyncMock() + self.mock_evaluator.aevaluate.return_value = EvaluationResult(passing=True) + + tool_name = "nice_tool" + self.tool_input = "hello world" + self.expected_content = f"custom_{self.tool_input}" + self.expected_tool_output = ToolOutput( + content=self.expected_content, + raw_input={"input": self.tool_input}, + raw_output=Response( + response=self.expected_content, + source_nodes=[], + ), + tool_name=tool_name, + ) + self.eval_query_engine_tool = EvalQueryEngineTool.from_defaults( + MockQueryEngine(), evaluator=self.mock_evaluator, name=tool_name + ) + + def test_eval_query_engine_tool_with_eval_passing(self) -> None: + """Test eval query engine tool with evaluation passing.""" + tool_output = self.eval_query_engine_tool(self.tool_input) + self.assertEqual(self.expected_tool_output, tool_output) + + def test_eval_query_engine_tool_with_eval_failing(self) -> None: + """Test eval query engine tool with evaluation failing.""" + evaluation_feedback = "The context does not provide a relevant answer." + self.mock_evaluator.aevaluate.return_value = EvaluationResult( + passing=False, feedback=evaluation_feedback + ) + self.expected_tool_output.content = ( + "Could not use tool nice_tool because it failed evaluation.\n" + f"Reason: {evaluation_feedback}" + ) + + tool_output = self.eval_query_engine_tool(self.tool_input) + self.assertEqual(self.expected_tool_output, tool_output)