fix: accuracy test fixes #651
Pull Request Overview
This PR fixes false negatives in accuracy tests by introducing optional tool call support and updating test expectations. The main change is allowing certain tool calls (like `atlas-list-projects` and `list-databases`) to be marked as optional, meaning LLMs may or may not invoke them depending on context.
Key changes:
- Added an `optional` field to the `ExpectedToolCall` type to mark tool calls that may be skipped by LLMs
- Updated scoring logic to handle optional tool calls correctly
- Refactored test expectations to reduce duplication and mark exploration tools as optional
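The idea behind optional expected tool calls can be sketched roughly as follows. This is a simplified illustration and not the scorer from this PR: the field names other than `optional`, the `ActualToolCall` shape, the `scoreToolCalls` function, and the `create-collection` tool name are assumptions, and the sketch matches on tool names only, ignores parameters, and returns only 0 or 1.

```typescript
// Hypothetical shapes for illustration; the real types live under
// tests/accuracy/sdk/ and may differ.
interface ExpectedToolCall {
    toolName: string;
    // When true, the LLM may skip this call without being penalized.
    optional?: boolean;
}

interface ActualToolCall {
    toolName: string;
}

// Assumed, simplified scoring rule:
//  - any call the LLM made that is not in the expected list scores 0
//  - any required (non-optional) expected call that was not made scores 0
//  - otherwise the run scores 1
function scoreToolCalls(expected: ExpectedToolCall[], actual: ActualToolCall[]): number {
    const expectedNames = new Set(expected.map((call) => call.toolName));
    const actualNames = new Set(actual.map((call) => call.toolName));

    const hasUnexpectedCall = actual.some((call) => !expectedNames.has(call.toolName));
    const missesRequiredCall = expected.some(
        (call) => !call.optional && !actualNames.has(call.toolName)
    );

    return hasUnexpectedCall || missesRequiredCall ? 0 : 1;
}

// The LLM may or may not list databases before creating the collection.
const expectedCalls: ExpectedToolCall[] = [
    { toolName: "list-databases", optional: true },
    { toolName: "create-collection" },
];
```

Under these assumptions, a run that calls only `create-collection` and a run that calls `list-databases` first both pass, while a run that skips `create-collection` does not. Without the optional marker, one of the two passing runs would be reported as a false negative.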
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| tests/accuracy/sdk/accuracyScorer.ts | Modified scoring logic to properly handle optional expected tool calls instead of returning 0 |
| tests/accuracy/sdk/accuracyResultStorage/resultStorage.ts | Added optional field to ExpectedToolCall type definition |
| tests/accuracy/getPerformanceAdvisor.test.ts | Extracted common tool calls into reusable array and marked them as optional to reduce duplication |
| tests/accuracy/find.test.ts | Marked exploration tool call as optional and relaxed filter matching to allow empty objects |
| tests/accuracy/dropCollection.test.ts | Added optional exploration tool calls that LLMs may invoke before dropping collection |
| tests/accuracy/createCollection.test.ts | Added optional list-databases tool call that LLMs may invoke for verification |
| scripts/accuracy/generateTestSummary.ts | Updated UI to wrap optional tool names in parentheses for visual distinction |
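As a rough illustration of the summary change in the last row, optional tool names could be rendered like this. The `formatExpectedToolNames` helper, the comma separator, and the `find` tool name are hypothetical; the actual code in `scripts/accuracy/generateTestSummary.ts` may be structured differently.

```typescript
// Hypothetical minimal shape; only the fields needed for display.
interface ExpectedToolCall {
    toolName: string;
    optional?: boolean;
}

// Wraps optional tool names in parentheses so they stand out from
// required ones in the generated summary.
function formatExpectedToolNames(calls: ExpectedToolCall[]): string {
    return calls
        .map((call) => (call.optional ? `(${call.toolName})` : call.toolName))
        .join(", ");
}

const summaryLine = formatExpectedToolNames([
    { toolName: "list-databases", optional: true },
    { toolName: "find" },
]);
// summaryLine is "(list-databases), find"
```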
📊 Accuracy Test Results
Report generated on: 10/15/2025, 1:50:47 PM
Proposed changes
Fixes several false negatives in the accuracy tests. It also adds the ability to mark expected tool calls as optional, to handle cases where the LLM may or may not call a given tool.
Based on #621.