test: recategorize llm based astream_incremental & add versions with mocks #567

planetf1 wants to merge 2 commits into generative-computing:main from
Conversation
The PR description has been updated. Please fill out the template for your PR to be reviewed.

Merge Protections: your pull request matches the following merge protections and will not be merged until they are valid. 🟢 Enforce conventional commit: wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
Force-pushed 9d3ddf3 to 9c78504
Introduces `test_astream_mock.py` to test `ModelOutputThunk`'s async-queue incremental streaming logic deterministically, without relying on highly variable LLM backends.
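The pattern behind such a mock test can be sketched generically (the names below are illustrative, not Mellea's actual API): feed a fixed chunk sequence through an `asyncio.Queue` and assert that the consumer reassembles it exactly, since nothing in the test is timing- or network-dependent.

```python
import asyncio

SENTINEL = None  # marks end of the simulated token stream


async def produce(queue: asyncio.Queue, chunks: list[str]) -> None:
    # Stand-in for an LLM backend: emit a fixed, known chunk sequence.
    for chunk in chunks:
        await queue.put(chunk)
    await queue.put(SENTINEL)


async def consume(queue: asyncio.Queue) -> list[str]:
    # Stand-in for the incremental streamer: accumulate chunks and
    # record each partial value as it would be exposed to a caller.
    partials, buffer = [], ""
    while (chunk := await queue.get()) is not SENTINEL:
        buffer += chunk
        partials.append(buffer)
    return partials


async def main() -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    _, partials = await asyncio.gather(
        produce(queue, ["Hel", "lo, ", "world", "!"]),
        consume(queue),
    )
    return partials


if __name__ == "__main__":
    partials = asyncio.run(main())
    # Chunks are fixed, so assertions can be exact:
    assert partials == ["Hel", "Hello, ", "Hello, world", "Hello, world!"]
```

Because the producer is deterministic, exact string matching on every partial value is safe here, which is exactly what cannot be guaranteed against a live backend.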
Force-pushed 2bd3aa6 to 79e88a4
My results (macOS):
I like the addition of the mock tests. But can you please explain:

> The tests in test_astream_incremental.py were flaking in CI because they depended on live LLMs streaming responses. Due to the non-deterministic nature of token streaming (chunk arrival times and sizes), exact string-matching assertions would fail intermittently.

It looks like all the tests just match the values of chunks. I think these should be exact matches no matter the conditions, since we aren't relying on the LLM to reproduce output; we are relying on us saving that non-deterministic output correctly?
Misc PR
Fixes #562
Type of PR
Description
The tests in `test_astream_incremental.py` were flaking in CI because they depended on live LLMs streaming responses. Due to the non-deterministic nature of token streaming (chunk arrival times and sizes), exact string-matching assertions would fail intermittently. To address this:

- The tests in `test/core/test_astream_incremental.py` are now marked with `@pytest.mark.qualitative`. This follows the Mellea convention for tests dependent on varying live LLM interaction, ensuring they are skipped in CI but remain available for local execution during development.
- A new file, `test/core/test_astream_mock.py`, has been introduced. These tests use an asynchronous mock queue to deterministically test the incremental chunking logic of `ModelOutputThunk` without relying on a live LLM backend or network timing, guaranteeing they run reliably in all environments, including CI.

Testing
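As a sketch of the marker convention described in the PR (the `qualitative` marker name comes from this PR; the test body below is a hypothetical placeholder):

```python
import pytest


@pytest.mark.qualitative  # live-LLM test: deselected in CI, runnable locally
def test_stream_against_live_llm():
    # Hypothetical placeholder for a test that consumes a real streaming
    # backend; its output varies between runs, so CI would deselect it
    # with: pytest -m "not qualitative"
    ...
```

In practice the custom marker would also be registered (in `pytest.ini` or `pyproject.toml`) so pytest does not emit a `PytestUnknownMarkWarning` for it.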