fix: test_hallucination_detection missing qualitative marker and tight tolerance #809

@planetf1

Problem

test_hallucination_detection in test/stdlib/components/intrinsic/test_rag.py:152 has two issues:

  1. Missing @pytest.mark.qualitative: every other LLM-output-quality test in the file is marked qualitative, but this one is not, so it runs in fast test loops where it should not.

  2. Tolerance too tight: the test asserts pytest.approx(r, abs=3e-2) on a generative model score. An observed drift of 0.036 exceeds that 0.03 bound and causes spurious failures (reported in #804, "fix: evict Ollama models between test modules to prevent memory starvation").
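
To make the arithmetic concrete, here is a minimal sketch of the absolute-tolerance comparison, which is what pytest.approx performs when only abs= is given. The helper name within_abs_tolerance and the reference score 0.50 are hypothetical and chosen for illustration; only the 0.036 drift and the two tolerances come from this issue.

```python
def within_abs_tolerance(observed: float, expected: float, tol: float) -> bool:
    """Return True when observed is within tol of expected (absolute difference)."""
    return abs(observed - expected) <= tol


expected = 0.50              # hypothetical reference score, not a real fixture value
observed = expected + 0.036  # drift of the size reported in this issue

old_ok = within_abs_tolerance(observed, expected, 3e-2)  # current tolerance
new_ok = within_abs_tolerance(observed, expected, 5e-2)  # proposed tolerance
print(old_ok, new_ok)
```

With these numbers the 3e-2 check rejects the score and the 5e-2 check accepts it.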

Fix

  • Add @pytest.mark.qualitative decorator
  • Widen tolerance from abs=3e-2 to abs=5e-2
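
A rough sketch of what the two changes could look like together. This is not the actual test body: fake_hallucination_scores, EXPECTED_SCORES, and the test name with the _sketch suffix are hypothetical stand-ins, since the real test calls a model and compares against its own reference data.

```python
import pytest

# Hypothetical reference values; the real test has its own expected scores.
EXPECTED_SCORES = [0.50, 0.10]


def fake_hallucination_scores():
    """Stand-in for the model call; returns scores that drift slightly."""
    return [0.536, 0.12]


@pytest.mark.qualitative  # marks the test as an LLM-output-quality test
def test_hallucination_detection_sketch():
    scores = fake_hallucination_scores()
    for score, expected in zip(scores, EXPECTED_SCORES):
        # abs=5e-2 tolerates the 0.036 drift that abs=3e-2 rejected
        assert score == pytest.approx(expected, abs=5e-2)
```

How the marker keeps the test out of fast loops depends on the project's pytest configuration; one common approach is deselecting by marker expression, for example pytest -m "not qualitative".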

Labels

bug, testing
