fix: issues with tests (alora example, rag intrinsics, mistral tool use, vllm auto-skip)#570

Merged
jakelorocco merged 3 commits into main from jal/test-fixes on Mar 4, 2026
Conversation

@jakelorocco
Contributor

@jakelorocco jakelorocco commented Mar 3, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

A few issues with tests that I stumbled on. These were errors in our test code, not in the mellea code the tests exercise.

  • test/backends/test_huggingface_tools.py - we are using a Mistral model that requires the sentencepiece package to be installed -> fixed in pyproject.toml

  • test/stdlib/components/intrinsic/test_rag.py - changes to the adapters for citations / hallucination detection resulted in slightly different values -> fixed the expected data

  • docs/examples/aLora/102_example.py - expected input -> fixed by skipping this example and unskipping 101_example.py, which tests the same functionality

  • test/backends/test_openai_vllm.py - exceptions raised during vllm setup were causing the tests to error out instead of being skipped -> fixed so setup failures now skip the tests
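
The vllm auto-skip change can be sketched roughly as follows. This is an illustrative pattern only: the helper and fixture names (`create_vllm_backend`, `backend_or_skip`, `vllm_backend`) are hypothetical, not the actual code in test_openai_vllm.py.

```python
import pytest


def create_vllm_backend():
    # Stand-in for the real setup helper (hypothetical name); the real
    # setup can raise when no vllm server/GPU is available.
    raise ConnectionError("vllm server not reachable")


def backend_or_skip():
    """Return a backend, or skip the calling test when setup fails."""
    try:
        return create_vllm_backend()
    except Exception as exc:
        # Skip instead of letting the exception propagate as an error:
        # a setup failure means the environment lacks vllm, not that
        # the code under test is broken.
        pytest.skip(f"skipping vllm tests; setup failed: {exc}")


@pytest.fixture(scope="module")
def vllm_backend():
    yield backend_or_skip()
```

With this pattern, pytest reports the vllm tests as skipped (with the setup error in the reason) rather than as errors.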

Tests pass.

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

@jakelorocco jakelorocco requested a review from a team as a code owner March 3, 2026 17:30
@github-actions
Contributor

github-actions bot commented Mar 3, 2026

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@mergify

mergify bot commented Mar 3, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:
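
As a quick sanity check, the PR title can be tested against that rule's regex (illustrative snippet, not part of the mergify config):

```python
import re

# Pattern copied from the mergify merge-protection rule above.
pattern = r"^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:"

# This PR's title matches the conventional-commit form.
title = "fix: issues with tests (alora example, rag intrinsics, mistral tool use, vllm auto-skip)"
assert re.match(pattern, title)

# An optional scope in parentheses is also accepted.
assert re.match(pattern, "feat(core): add thing")

# A capitalized or missing type prefix is rejected.
assert re.match(pattern, "Fix stuff") is None
```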

@jakelorocco jakelorocco force-pushed the jal/test-fixes branch 2 times, most recently from 8795893 to 8ed36c8 Compare March 3, 2026 17:41
@jakelorocco jakelorocco requested a review from planetf1 March 3, 2026 17:41
@jakelorocco jakelorocco changed the title fix: issues with tests (alora example, rag intrinsics, mistral tool use) fix: issues with tests (alora example, rag intrinsics, mistral tool use, vllm auto-skip) Mar 3, 2026
@psschwei
Member

psschwei commented Mar 4, 2026

Two of the test_rag tests failed for me:

=================================================================== FAILURES ====================================================================
________________________________________________________________ test_citations _________________________________________________________________

backend = <mellea.backends.huggingface.LocalHFBackend object at 0x351d82480>

    @pytest.mark.qualitative
    def test_citations(backend):
        """Verify that the citations intrinsic functions properly."""
        context, assistant_response, docs = _read_input_json("citations.json")
        expected = _read_output_json("citations.json")

        # First call triggers adapter loading
        result = rag.find_citations(assistant_response, docs, context, backend)
>       assert result == expected
E       assert [{'citation_b...nion. ", ...}] == [{'citation_b...nion. ", ...}]
E
E         At index 0 diff: {'response_begin': 0, 'response_end': 96, 'response_text': 'Murdoch expanded in Australia and New Zealand by acquiring and expanding local newspapers. ', 'citation_doc_id': '0', 'citation_begin'
E
E         ...Full output truncated (2 lines hidden), use '-vv' to show

test/stdlib/components/intrinsic/test_rag.py:130: AssertionError
------------------------------------------------------------- Captured stdout call --------------------------------------------------------------
=== 14:46:06-INFO ======
passing in model options when generating with an adapter; some model options may be overwritten / ignored
------------------------------------------------------------- Captured stderr call --------------------------------------------------------------
Fetching 1 files: 100%|██████████| 1/1 [00:00<00:00, 32263.88it/s]
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 190650.18it/s]
--------------------------------------------------------------- Captured log call ---------------------------------------------------------------
INFO     fancy_logger:huggingface.py:428 passing in model options when generating with an adapter; some model options may be overwritten / ignored
_________________________________________________________ test_hallucination_detection __________________________________________________________

backend = <mellea.backends.huggingface.LocalHFBackend object at 0x351d82480>

    @pytest.mark.qualitative
    def test_hallucination_detection(backend):
        """Verify that the hallucination detection intrinsic functions properly."""
        context, assistant_response, docs = _read_input_json("hallucination_detection.json")
        expected = _read_output_json("hallucination_detection.json")

        # First call triggers adapter loading
        result = rag.flag_hallucinated_content(assistant_response, docs, context, backend)
        # pytest.approx() chokes on lists of records, so we do this complicated dance.
        for r, e in zip(result, expected, strict=True):  # type: ignore
>           assert pytest.approx(r, abs=2e-2) == e
E           AssertionError: assert approx({'resp...he sentence.}) == {'explanation...end': 31, ...}
E
E             comparison failed. Mismatched elements: 1 / 5:
E             Max absolute difference: 5
E             Max relative difference: 0.1388888888888889
E             Index        | Obtained | Expected
E             response_end | 31       | 36 ± 0.02

test/stdlib/components/intrinsic/test_rag.py:164: AssertionError
------------------------------------------------------------- Captured stdout call --------------------------------------------------------------
=== 14:46:15-INFO ======
passing in model options when generating with an adapter; some model options may be overwritten / ignored
------------------------------------------------------------- Captured stderr call --------------------------------------------------------------
Fetching 1 files: 100%|██████████| 1/1 [00:00<00:00, 26886.56it/s]
Fetching 9 files: 100%|██████████| 9/9 [00:00<00:00, 147456.00it/s]
--------------------------------------------------------------- Captured log call ---------------------------------------------------------------
INFO     fancy_logger:huggingface.py:428 passing in model options when generating with an adapter; some model options may be overwritten / ignored

@psschwei
Member

psschwei commented Mar 4, 2026

The other three all passed (or were skipped) successfully

@jakelorocco
Contributor Author

Two of the test_rag tests failed for me:
...

@psschwei, can you please clarify: did these tests fail when running against this branch and with packages updated?

@psschwei
Member

psschwei commented Mar 4, 2026

@psschwei, can you please clarify. Did these tests fail when running against this branch and with packages updated?

Yes, against this branch with a fresh venv (checked out the branch as a new worktree and ran uv sync --all-groups --all-extras in the worktree dir)

@psschwei
Member

psschwei commented Mar 4, 2026

though I think your force push came after I checked out, let me retry

@psschwei
Member

psschwei commented Mar 4, 2026

more failures now (though all seem to be related to modules not found after the granite-common merge)

@jakelorocco
Contributor Author

Updated the commit to fix the pyproject packages; tests now pass for me locally on macOS and on Linux with a clean environment.

Member

@psschwei psschwei left a comment


tests all pass for me now too

@jakelorocco jakelorocco merged commit 4cc75c8 into main Mar 4, 2026
5 checks passed
@jakelorocco jakelorocco deleted the jal/test-fixes branch March 4, 2026 21:25
planetf1 pushed a commit to planetf1/mellea that referenced this pull request Mar 6, 2026
…se, vllm auto-skip) (generative-computing#570)

* fix: issues with tests (alora example, rag intrinsics, mistral tool use)

* fix: uv lock update after pyproject changes
