fix: accuracy test fixes #651

nirinchev · 2025-10-14T11:35:36Z

Proposed changes

Fixes several false negatives in the accuracy tests. It also adds the ability to mark expected tool calls as optional, to handle cases where the LLM may or may not call a given tool.

Based on #621.

Copilot

Pull Request Overview

This PR fixes false negatives in accuracy tests by introducing optional tool call support and updating test expectations. The main change is allowing certain tool calls (like `atlas-list-projects` and `list-databases`) to be marked as optional, meaning LLMs may or may not invoke them depending on context.

Key changes:

- Added `optional` field to `ExpectedToolCall` type to mark tool calls that may be skipped by LLMs
- Updated scoring logic to handle optional tool calls correctly
- Refactored test expectations to reduce duplication and mark exploration tools as optional
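To make the optional-call behaviour concrete, here is a minimal sketch of how a scorer could treat optional expected tool calls. It is not the code from `accuracyScorer.ts`: only the `optional` field name comes from this PR, while `toolName`, `ActualToolCall`, `scoreToolCalls`, name-only matching, and the 0.75 penalty for unexpected extra calls are assumptions made for illustration.

```typescript
// Hypothetical shapes: only `optional` is named in the PR; the rest is assumed.
interface ExpectedToolCall {
  toolName: string;
  optional?: boolean;
}

interface ActualToolCall {
  toolName: string;
}

// Simplified scorer that matches on tool name only (parameters are ignored here).
function scoreToolCalls(expected: ExpectedToolCall[], actual: ActualToolCall[]): number {
  const actualNames = new Set(actual.map((call) => call.toolName));

  // A missing required call fails the prompt; a missing optional call does not.
  for (const call of expected) {
    if (!call.optional && !actualNames.has(call.toolName)) {
      return 0;
    }
  }

  // Assumed penalty: calls that were never expected (required or optional) lower the score.
  const expectedNames = new Set(expected.map((call) => call.toolName));
  const hasUnexpectedCall = actual.some((call) => !expectedNames.has(call.toolName));
  return hasUnexpectedCall ? 0.75 : 1;
}

const expectedCalls: ExpectedToolCall[] = [
  { toolName: "list-databases", optional: true },
  { toolName: "find" },
];

// The optional exploration call may be present or absent without changing the score.
console.log(scoreToolCalls(expectedCalls, [{ toolName: "find" }]));
console.log(scoreToolCalls(expectedCalls, [{ toolName: "list-databases" }, { toolName: "find" }]));
```

Under these assumptions, skipping an optional call no longer produces a false negative, while skipping a required call still scores 0.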

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `tests/accuracy/sdk/accuracyScorer.ts` | Modified scoring logic to properly handle optional expected tool calls instead of returning 0 |
| `tests/accuracy/sdk/accuracyResultStorage/resultStorage.ts` | Added `optional` field to `ExpectedToolCall` type definition |
| `tests/accuracy/getPerformanceAdvisor.test.ts` | Extracted common tool calls into reusable array and marked them as optional to reduce duplication |
| `tests/accuracy/find.test.ts` | Marked exploration tool call as optional and relaxed filter matching to allow empty objects |
| `tests/accuracy/dropCollection.test.ts` | Added optional exploration tool calls that LLMs may invoke before dropping collection |
| `tests/accuracy/createCollection.test.ts` | Added optional `list-databases` tool call that LLMs may invoke for verification |
| `scripts/accuracy/generateTestSummary.ts` | Updated UI to wrap optional tool names in parentheses for visual distinction |
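The parenthesised rendering of optional tool names described for `generateTestSummary.ts` might look roughly like the sketch below. The helper name `formatExpectedToolNames`, the `toolName` field, and the comma separator are hypothetical; only the idea of wrapping optional names in parentheses comes from the summary above.

```typescript
// Hypothetical shape: only `optional` is named in the PR.
interface ExpectedToolCall {
  toolName: string;
  optional?: boolean;
}

// Render expected tool names, wrapping optional ones in parentheses.
function formatExpectedToolNames(calls: ExpectedToolCall[]): string {
  return calls
    .map((call) => (call.optional ? `(${call.toolName})` : call.toolName))
    .join(", ");
}

const summaryLabel = formatExpectedToolNames([
  { toolName: "list-databases", optional: true },
  { toolName: "create-collection" },
]);
console.log(summaryLabel); // prints "(list-databases), create-collection"
```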

github-actions · 2025-10-15T13:50:51Z

📊 Accuracy Test Results

📈 Summary

| Metric | Value |
| --- | --- |
| Commit SHA | `6fb033e01a3554bedffc9c40054c5fbf55c41412` |
| Run ID | `f7eb78a6-32cc-4df9-95e8-e9270500391b` |
| Status | done |
| Total Prompts Evaluated | 73 |
| Models Tested | 1 |
| Average Accuracy | 96.6% |
| Responses with 0% Accuracy | 2 |
| Responses with 75% Accuracy | 2 |
| Responses with 100% Accuracy | 69 |
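As a sanity check on the summary, the reported average is consistent with a plain mean over the per-response scores. The sketch below assumes that is how the average is computed, which the report itself does not state.

```typescript
// Score buckets copied from the summary above.
const buckets = [
  { score: 0, count: 2 },
  { score: 75, count: 2 },
  { score: 100, count: 69 },
];

// Total number of responses across all buckets.
const totalResponses = buckets.reduce((sum, bucket) => sum + bucket.count, 0);

// Mean score, assuming every response is weighted equally.
const averageAccuracy =
  buckets.reduce((sum, bucket) => sum + bucket.score * bucket.count, 0) / totalResponses;

const averageLabel = `${averageAccuracy.toFixed(1)}%`;
console.log(totalResponses, averageLabel); // prints 73 "96.6%"
```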

📊 Baseline Comparison

| Metric | Value |
| --- | --- |
| Baseline Commit | `18fe5495cea00cd3de484077d1e3711ca0a9389e` |
| Baseline Run ID | `ac33cde0-acca-492c-8a1b-6962a9795686` |
| Baseline Run Status | `done` |
| Responses Improved | 7 |
| Responses Regressed | 2 |

📎 Download Full HTML Report - Look for the accuracy-test-summary artifact for detailed results.

Report generated on: 10/15/2025, 1:50:47 PM

nirinchev requested a review from a team as a code owner October 14, 2025 11:35

nirinchev added the accuracy-tests label Oct 14, 2025

nirinchev force-pushed the ni/accuracy-test-fixes branch from b0079de to 0e14c14 Compare October 14, 2025 11:44

himanshusinghs approved these changes Oct 14, 2025

Base automatically changed from ni/create-vector-index to main October 15, 2025 11:45

nirinchev added 2 commits October 15, 2025 14:07

fix accuracy tests

2cb0865

remove import

8ec9c59

Copilot AI review requested due to automatic review settings October 15, 2025 12:08

nirinchev force-pushed the ni/accuracy-test-fixes branch from 0e14c14 to 8ec9c59 Compare October 15, 2025 12:08

Copilot AI reviewed Oct 15, 2025

nirinchev added accuracy-tests and removed accuracy-tests labels Oct 15, 2025

revert before/afterAll solution for createIndex.test.ts

16121e3

nirinchev added accuracy-tests and removed accuracy-tests labels Oct 15, 2025

nirinchev removed the accuracy-tests label Oct 15, 2025

use a date that's way in the past for slow queries

584f02a

nirinchev added the accuracy-tests label Oct 15, 2025

nirinchev merged commit 1cf6f6d into main Oct 15, 2025
17 of 19 checks passed

nirinchev deleted the ni/accuracy-test-fixes branch October 15, 2025 14:19
