
fix(test): widen hallucination detection tolerance (#809) #810

Merged
planetf1 merged 1 commit into generative-computing:main from planetf1:fix/qualitative-marker-809
Apr 10, 2026


@planetf1 planetf1 commented Apr 9, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Widen the pytest.approx tolerance from abs=3e-2 to abs=5e-2 in test_hallucination_detection. Scores are logprob-derived weighted expected values, so inference non-determinism causes them to drift by ~0.036 between runs. That breaches the old threshold even though the categorical signal remains stable.
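For reference, a minimal sketch of how the two tolerances treat a drift of that size. The `expected` and `observed` values are hypothetical and not taken from the test. Only the `abs=` thresholds and the ~0.036 drift come from this PR.

```python
import pytest

# Hypothetical values: a reference score and a rerun that drifted by
# about 0.036, the magnitude quoted in the description.
expected = 0.50
observed = 0.536

# Old tolerance: |observed - expected| is about 0.036 > 0.03, so this is False.
old_ok = observed == pytest.approx(expected, abs=3e-2)

# New tolerance: 0.036 <= 0.05, so the same drift is accepted.
new_ok = observed == pytest.approx(expected, abs=5e-2)

# A larger shift (a hypothetical regression of 0.10) is still rejected.
regression_ok = 0.60 == pytest.approx(expected, abs=5e-2)
```

When only `abs` is passed, pytest.approx ignores its default relative tolerance, so the comparison is purely `|observed - expected| <= abs`.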

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

^ on tests this now seems more consistent with the looser tolerance

…ing#809)

Logprob-derived scores drift ~0.036 across runs due to inference
non-determinism. Widen from abs=3e-2 to abs=5e-2 to absorb jitter
while still catching real regressions.
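To illustrate why a logprob-derived score can move while the label stays put, here is a rough sketch. It assumes the score is a softmax-weighted mean over per-category values. The function `expected_score`, the two-category setup, and all the numbers are hypothetical and are not the project's actual scoring code.

```python
import math


def expected_score(logprobs, values):
    """Probability-weighted mean of `values`, using a softmax over `logprobs`."""
    m = max(logprobs)  # subtract the max for numerical stability
    weights = [math.exp(lp - m) for lp in logprobs]
    total = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / total


# Hypothetical categories: 0.0 = faithful, 1.0 = hallucinated.
values = [0.0, 1.0]

# Two hypothetical runs whose token probabilities differ slightly.
run_a = expected_score([math.log(0.800), math.log(0.200)], values)
run_b = expected_score([math.log(0.764), math.log(0.236)], values)

drift = abs(run_a - run_b)                      # about 0.036
same_label = (run_a >= 0.5) == (run_b >= 0.5)   # both stay below 0.5
```

Under these assumptions the continuous score shifts by about 0.036, enough to fail abs=3e-2, while the thresholded category is unchanged.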

@github-actions github-actions bot added the bug (Something isn't working) label Apr 9, 2026

github-actions Bot commented Apr 9, 2026

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@planetf1 planetf1 marked this pull request as ready for review April 9, 2026 23:09
@planetf1 planetf1 requested a review from a team as a code owner April 9, 2026 23:09
@planetf1 planetf1 enabled auto-merge April 10, 2026 06:41

@jakelorocco jakelorocco left a comment

lgtm; I had previously widened it but I guess there's more variability than realized

@planetf1 planetf1 added this pull request to the merge queue Apr 10, 2026
Merged via the queue into generative-computing:main with commit e0ffd3d Apr 10, 2026
10 of 11 checks passed
@planetf1 planetf1 deleted the fix/qualitative-marker-809 branch April 10, 2026 13:05

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

fix: test_hallucination_detection missing qualitative marker and tight tolerance

2 participants