
test: recategorize llm based astream_incremental & add versions with mocks#567

Open
planetf1 wants to merge 2 commits into generative-computing:main from
planetf1:fix/issue-562-mark-astream-qualitative

Conversation


@planetf1 planetf1 commented Mar 3, 2026

Misc PR

Fixes #562

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

The tests in test_astream_incremental.py were flaking in CI because they depended on live LLMs streaming responses. Due to the non-deterministic nature of token streaming (chunk arrival times and sizes), exact string matching assertions would fail intermittently.

To address this:

  1. All tests in test/core/test_astream_incremental.py are now marked with @pytest.mark.qualitative. This follows the Mellea convention for tests dependent on varying live LLM interaction, ensuring they are skipped in CI but remain available for local execution during development.
  2. A new test file, test/core/test_astream_mock.py, has been introduced. These tests use an asynchronous mock queue to deterministically test the incremental chunking logic of ModelOutputThunk without relying on a live LLM backend or network timing, guaranteeing they run reliably in all environments including CI.
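The mock-queue approach described above can be sketched as follows. This is a minimal illustration only: the `StreamThunk` class, its `astream` method, and the `demo` driver are hypothetical stand-ins for this sketch, not Mellea's actual `ModelOutputThunk` API. The point is that when the test owns the queue, chunk boundaries are fully controlled, so exact-match assertions on the incremental output become deterministic.

```python
import asyncio

# Hypothetical stand-in for the real ModelOutputThunk: accumulates streamed
# chunks from an asyncio.Queue and returns only the newly arrived portion
# on each call, mirroring the incremental chunking logic under test.
class StreamThunk:
    def __init__(self, queue: asyncio.Queue):
        self._queue = queue
        self._value = ""
        self._seen = 0  # length of the value already returned to the caller

    async def astream(self) -> str:
        """Drain any queued chunks and return only the text not yet seen."""
        while not self._queue.empty():
            chunk = await self._queue.get()
            self._value += chunk
        new_text = self._value[self._seen:]
        self._seen = len(self._value)
        return new_text


async def demo() -> list[str]:
    # The test controls exactly which chunks are queued and when, so no
    # live LLM backend or network timing is involved.
    queue: asyncio.Queue = asyncio.Queue()
    thunk = StreamThunk(queue)

    await queue.put("Hello, ")
    first = await thunk.astream()   # only the first chunk

    await queue.put("wor")
    await queue.put("ld!")
    second = await thunk.astream()  # only the chunks queued since the last call

    return [first, second]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the chunk boundaries are chosen by the test rather than by a streaming backend, assertions such as `second == "world!"` hold on every run, in every environment including CI.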

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)


github-actions bot commented Mar 3, 2026

The PR description has been updated. Please fill out the template for your PR to be reviewed.


mergify bot commented Mar 3, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@planetf1 planetf1 force-pushed the fix/issue-562-mark-astream-qualitative branch from 9d3ddf3 to 9c78504 Compare March 3, 2026 10:24
@planetf1 planetf1 marked this pull request as ready for review March 3, 2026 10:24
@planetf1 planetf1 requested a review from a team as a code owner March 3, 2026 10:24
Introduces `test_astream_mock.py` to test `ModelOutputThunk`'s incremental streaming logic deterministically via an async queue, without relying on highly variable LLM backends.
@planetf1 planetf1 force-pushed the fix/issue-562-mark-astream-qualitative branch from 2bd3aa6 to 79e88a4 Compare March 3, 2026 10:42

planetf1 commented Mar 3, 2026

My results (macOS):

> uv run pytest test/core/test_astream_mock.py -v
================== test session starts ===================
platform darwin -- Python 3.12.8, pytest-9.0.0, pluggy-1.6.0 -- /Users/jonesn/src/mellea-review/.venv/bin/python3
...
collected 6 items                                        
test/core/test_astream_mock.py::test_astream_returns_incremental_chunks PASSED [ 16%]
test/core/test_astream_mock.py::test_astream_multiple_calls_accumulate_correctly PASSED [ 33%]
test/core/test_astream_mock.py::test_astream_beginning_length_tracking PASSED [ 50%]
test/core/test_astream_mock.py::test_astream_empty_beginning PASSED [ 66%]
test/core/test_astream_mock.py::test_astream_computed_returns_full_value PASSED [ 83%]
test/core/test_astream_mock.py::test_astream_final_call_returns_full_value PASSED [100%]
====================== 6 passed in 3.81s =======================

@planetf1 planetf1 changed the title test: isolate astream_incremental tests from CI test: recategorize llm based astream_incremental & add versions with mocks Mar 3, 2026
@planetf1 planetf1 requested review from ajbozarth and psschwei March 3, 2026 12:14
@jakelorocco jakelorocco left a comment


I like the addition of the mock tests. But can you please explain:

The tests in test_astream_incremental.py were flaking in CI because they depended on live LLMs streaming responses. Due to the non-deterministic nature of token streaming (chunk arrival times and sizes), exact string matching assertions would fail intermittently.

It looks like all the tests just match the values of chunks. I think these should be exact matches no matter the conditions since we aren't relying on the LLM to reproduce output; we are relying on us saving that non-deterministic output correctly?



Development

Successfully merging this pull request may close these issues.

test: test failed in CI, may be qualitative
