
test: recategorize llm based astream_incremental & add versions with mocks#567

Open
planetf1 wants to merge 2 commits into generative-computing:main from
planetf1:fix/issue-562-mark-astream-qualitative

Conversation


@planetf1 planetf1 commented Mar 3, 2026

Misc PR

Fixes #562

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

The tests in test_astream_incremental.py were flaking in CI because they depended on live LLMs streaming responses. Due to the non-deterministic nature of token streaming (chunk arrival times and sizes), exact string matching assertions would fail intermittently.

To address this:

  1. All tests in test/core/test_astream_incremental.py are now marked with @pytest.mark.qualitative. This follows the Mellea convention for tests dependent on varying live LLM interaction, ensuring they are skipped in CI but remain available for local execution during development.
  2. A new test file, test/core/test_astream_mock.py, has been introduced. These tests use an asynchronous mock queue to deterministically test the incremental chunking logic of ModelOutputThunk without relying on a live LLM backend or network timing, guaranteeing they run reliably in all environments including CI.
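The mock-queue approach described above can be sketched as follows. This is a minimal illustration only: the `StreamThunk` class, its `astream` method, and the `demo` driver are hypothetical stand-ins for this sketch, not Mellea's actual `ModelOutputThunk` API. The point is that when the test owns the queue, chunk boundaries are fully controlled, so exact-match assertions on the incremental output become deterministic.

```python
import asyncio

# Hypothetical stand-in for the real ModelOutputThunk: accumulates streamed
# chunks from an asyncio.Queue and returns only the newly arrived portion
# on each call, mirroring the incremental chunking logic under test.
class StreamThunk:
    def __init__(self, queue: asyncio.Queue):
        self._queue = queue
        self._value = ""
        self._seen = 0  # length of the value already returned to the caller

    async def astream(self) -> str:
        """Drain any queued chunks and return only the text not yet seen."""
        while not self._queue.empty():
            chunk = await self._queue.get()
            self._value += chunk
        new_text = self._value[self._seen:]
        self._seen = len(self._value)
        return new_text


async def demo() -> list[str]:
    # The test controls exactly which chunks are queued and when, so no
    # live LLM backend or network timing is involved.
    queue: asyncio.Queue = asyncio.Queue()
    thunk = StreamThunk(queue)

    await queue.put("Hello, ")
    first = await thunk.astream()   # only the first chunk

    await queue.put("wor")
    await queue.put("ld!")
    second = await thunk.astream()  # only the chunks queued since the last call

    return [first, second]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the chunk boundaries are chosen by the test rather than by a streaming backend, assertions such as `second == "world!"` hold on every run, in every environment including CI.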

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)


github-actions bot commented Mar 3, 2026

The PR description has been updated. Please fill out the template for your PR to be reviewed.


mergify bot commented Mar 3, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@planetf1 planetf1 force-pushed the fix/issue-562-mark-astream-qualitative branch from 9d3ddf3 to 9c78504 Compare March 3, 2026 10:24
@planetf1 planetf1 marked this pull request as ready for review March 3, 2026 10:24
@planetf1 planetf1 requested a review from a team as a code owner March 3, 2026 10:24
Introduces `test_astream_mock.py` to test `ModelOutputThunk`'s incremental streaming logic deterministically via an async queue, without relying on highly variable LLM backends.
@planetf1 planetf1 force-pushed the fix/issue-562-mark-astream-qualitative branch from 2bd3aa6 to 79e88a4 Compare March 3, 2026 10:42

planetf1 commented Mar 3, 2026

My results (macOS):

> uv run pytest test/core/test_astream_mock.py -v
================== test session starts ===================
platform darwin -- Python 3.12.8, pytest-9.0.0, pluggy-1.6.0 -- /Users/jonesn/src/mellea-review/.venv/bin/python3
...
collected 6 items                                        
test/core/test_astream_mock.py::test_astream_returns_incremental_chunks PASSED [ 16%]
test/core/test_astream_mock.py::test_astream_multiple_calls_accumulate_correctly PASSED [ 33%]
test/core/test_astream_mock.py::test_astream_beginning_length_tracking PASSED [ 50%]
test/core/test_astream_mock.py::test_astream_empty_beginning PASSED [ 66%]
test/core/test_astream_mock.py::test_astream_computed_returns_full_value PASSED [ 83%]
test/core/test_astream_mock.py::test_astream_final_call_returns_full_value PASSED [100%]
====================== 6 passed in 3.81s =======================

@planetf1 planetf1 changed the title test: isolate astream_incremental tests from CI test: recategorize llm based astream_incremental & add versions with mocks Mar 3, 2026
@planetf1 planetf1 requested review from ajbozarth and psschwei March 3, 2026 12:14
@jakelorocco jakelorocco left a comment


I like the addition of the mock tests. But can you please explain:

The tests in test_astream_incremental.py were flaking in CI because they depended on live LLMs streaming responses. Due to the non-deterministic nature of token streaming (chunk arrival times and sizes), exact string matching assertions would fail intermittently.

It looks like all the tests just match the values of chunks. I think these should be exact matches no matter the conditions since we aren't relying on the LLM to reproduce output; we are relying on us saving that non-deterministic output correctly?



Development

Successfully merging this pull request may close these issues.

test: test failed in CI, may be qualitative
