Implement EvalQueryEngineTool #11679
Conversation
Force-pushed: 433e075 to 8f054ad
@d-mariano Yeah, I think I like this approach. If this was combined with an agent that had the chance to "retry" tool calls, it would be pretty powerful (and it also helps avoid tracebacks when calling a tool).
@logan-markewich great! And totally agree, that's the idea down the road. I would like to test this with the simple ReAct implementation, then test it with more complex use cases like within a query pipeline. Also curious what implications this has on a QueryPlan, but that doesn't need to be addressed today. Okay, great. So with this, I can implement some UTs, of course...should I introduce this in the docs with a notebook example as well? <3
@d-mariano yeah, may as well make this PR a little more complete 💪
Force-pushed: 8f054ad to e3428fb
@logan-markewich I've made a few updates so far:
I would like to update the … I need to get at some other stuff, so I'll have this open a bit longer.
@d-mariano yeah, there's a … There should be some existing evaluation sections you can maybe hook into? Or maybe the section on tools would be better.
Force-pushed: e3428fb to 2ccd6ca
Force-pushed: 0f82f98 to dd606b6
Implement the EvalQueryEngineTool:
* Inherits from QueryEngineTool
* Uses a given evaluator to evaluate query engine responses
* If the response fails evaluation:
  * The tool output is manipulated to deter the LLM from using it
  * The evaluator feedback is used as a reason for failure
Force-pushed: dd606b6 to 54e8b25
* Simplify eval_query_engine_tool imports and avoid circular dependencies
* Fix EvalQueryEngineTool notebook by increasing similarity_top_k to 5
* The above resulted in the Lyft query engine returning a response that passed evaluation
Description
Notice
I would like input on this PR from the llama-index team. If the team agrees with the need and approach, I will provide unit tests, documentation updates, and Google Colab notebooks as required.

Summary
The reason for this change is to provide a plug-and-play method of using on-the-fly evaluation for tools within an agent. Some areas I would like feedback on:
* The llama-index vision

If the team decides this is a good approach, I will provide unit tests, documentation updates, and Google Colab notebooks as required.

Implements the EvalQueryEngineTool:
* Inherits from QueryEngineTool
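The pattern described in this PR can be sketched as follows. This is a hypothetical, self-contained illustration of the idea, not the actual llama-index implementation: the `QueryEngineTool`, `ToolOutput`, and `EvaluationResult` classes below are minimal stand-ins, and the failure-message wording is invented.

```python
from dataclasses import dataclass


@dataclass
class EvaluationResult:
    """Stand-in for an evaluator's verdict on a response."""
    passing: bool
    feedback: str


@dataclass
class ToolOutput:
    """Stand-in for the output an agent receives from a tool call."""
    content: str


class QueryEngineTool:
    """Minimal stand-in for the real QueryEngineTool."""

    def __init__(self, query_engine):
        self.query_engine = query_engine

    def call(self, query: str) -> ToolOutput:
        return ToolOutput(content=self.query_engine.query(query))


class EvalQueryEngineTool(QueryEngineTool):
    """Runs the given evaluator on every query engine response.

    If the response fails evaluation, the tool output is replaced with
    a message that deters the LLM from using it, citing the evaluator's
    feedback as the reason for failure.
    """

    def __init__(self, query_engine, evaluator):
        super().__init__(query_engine)
        self.evaluator = evaluator

    def call(self, query: str) -> ToolOutput:
        output = super().call(query)
        result: EvaluationResult = self.evaluator.evaluate(query, output.content)
        if not result.passing:
            # Manipulate the output so the agent knows not to trust it
            # and can decide to retry or pick another tool.
            output.content = (
                "Could not use tool output: it failed evaluation.\n"
                f"Reason: {result.feedback}"
            )
        return output
```

With an agent that can retry tool calls, the replaced output gives the LLM both a signal that the answer was unusable and the evaluator's feedback to steer the next attempt.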
Type of Change
How Has This Been Tested?
Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration.
Suggested Checklist:
I ran `make format; make lint` to appease the lint gods