Skip to content

Conversation

@dcramer
Copy link
Member

@dcramer dcramer commented Jul 22, 2025

Summary

  • Round confidence percentages to remove repeating decimals in evaluation output
  • Remove flaky LLM-dependent test assertions that check for specific keywords
  • Skip unreliable tests that depend on LLM detecting subtle patterns

Changes

  1. Evaluation Runner: Added Math.round() to confidence percentage display
  2. Test Suite Improvements:
    • Replaced regex/keyword assertions with confidence threshold checks
    • Skipped 3 tests that rely on LLM detecting subtle patterns (verbose naming, systematic refactoring, multi-step solutions)
    • These subtle patterns are better validated through our evaluation system which measures real-world accuracy

🤖 Generated with Claude Code

dcramer and others added 3 commits July 21, 2025 18:08
Remove repeating decimals by rounding confidence values to integers.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Remove flaky regex assertion that depends on LLM response content.
The test now focuses on verifying AI detection rather than specific
reasoning keywords.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Replace unreliable regex/keyword assertions with confidence checks.
Skip tests that depend on LLM detecting subtle patterns since we have
the evaluation system to measure actual detection accuracy.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
@dcramer dcramer changed the title fix: round confidence percentages in evaluation output fix: round confidence percentages and remove flaky tests Jul 22, 2025
@dcramer dcramer merged commit c8e903d into main Jul 22, 2025
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants