Skip to content

feat: streaming support in m serve OpenAI API server#823

Merged
markstur merged 23 commits intogenerative-computing:mainfrom
markstur:issue_822
Apr 17, 2026
Merged

feat: streaming support in m serve OpenAI API server#823
markstur merged 23 commits intogenerative-computing:mainfrom
markstur:issue_822

Conversation

@markstur
Copy link
Copy Markdown
Contributor

@markstur markstur commented Apr 11, 2026

Misc PR

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

Add OpenAI API compatible support for streaming in m serve app.

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code as added
  • Ensure existing tests and github automation passes (a maintainer will kick off the github automation when the rest of the PR is populated)

@markstur markstur requested a review from a team as a code owner April 11, 2026 01:41
@github-actions github-actions bot added the enhancement New feature or request label Apr 11, 2026
@github-actions
Copy link
Copy Markdown
Contributor

The PR description has been updated. Please fill out the template for your PR to be reviewed.

@psschwei psschwei requested review from psschwei and removed request for akihikokuroda April 11, 2026 02:06
Copy link
Copy Markdown
Contributor

@planetf1 planetf1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some blocking findings

Comment thread mellea/helpers/openai_compatible_helpers.py Outdated
Comment thread mellea/helpers/openai_compatible_helpers.py Outdated
Comment thread mellea/helpers/openai_compatible_helpers.py Outdated
Comment thread cli/serve/app.py
Copy link
Copy Markdown
Contributor

@planetf1 planetf1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noticed a few things to tighten up.

Minor, but looking at the tests - The TestStreamingEndpoint tests (e.g. line 166) are marked @pytest.mark.asyncio and declared async def but contain no await — TestClient is synchronous and doesn't need the marker. TestStreamingHelpers is fine.

Suspect it will still work but it implies the wrong behaviour?

Comment thread mellea/helpers/openai_compatible_helpers.py Outdated
Comment thread cli/serve/models.py Outdated
Comment thread docs/examples/m_serve/README.md
Comment thread docs/examples/m_serve/m_serve_example_streaming.py Outdated
Comment thread docs/examples/m_serve/README.md
Comment thread mellea/helpers/openai_compatible_helpers.py Outdated
Comment thread mellea/helpers/openai_compatible_helpers.py Outdated
@markstur
Copy link
Copy Markdown
Contributor Author

noticed a few things to tighten up.

Minor, but looking at the tests - The TestStreamingEndpoint tests (e.g. line 166) are marked @pytest.mark.asyncio and declared async def but contain no await — TestClient is synchronous and doesn't need the marker. TestStreamingHelpers is fine.

Suspect it will still work but it implies the wrong behaviour?

fixed

@markstur
Copy link
Copy Markdown
Contributor Author

thanks for the review! Addressed all the comments. See replies if there was some question, but I think they are covered.

@markstur markstur requested a review from planetf1 April 15, 2026 06:18
@markstur markstur enabled auto-merge April 15, 2026 06:18
Copy link
Copy Markdown
Member

@psschwei psschwei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

couple of things flagged by claude

Comment thread cli/serve/streaming.py Outdated
Comment thread mellea/helpers/openai_compatible_helpers.py Outdated
Comment thread cli/serve/models.py Outdated
Comment thread cli/serve/streaming.py Outdated
@markstur markstur requested a review from psschwei April 15, 2026 17:59
Copy link
Copy Markdown
Member

@psschwei psschwei left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
Thanks @markstur !
I think since @planetf1 had requested changes, he'll also need to approve (?)

Copy link
Copy Markdown
Contributor

@planetf1 planetf1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR — streaming support is well-structured and the test coverage is thorough. Two things worth addressing before merge:

Comment thread cli/serve/streaming.py Outdated
Comment thread cli/serve/streaming.py
Comment thread cli/serve/models.py
@planetf1
Copy link
Copy Markdown
Contributor

Also opened #873 as I noted example is importing from the cli package, which I would suggest is inappropriate - it's not new to this PR though.

@markstur
Copy link
Copy Markdown
Contributor Author

Thanks for the review! Even the little things are really good for me to look closer at.
Addressed all the comments and rebased.

@markstur markstur requested a review from planetf1 April 16, 2026 18:30
@markstur markstur disabled auto-merge April 16, 2026 19:11
@markstur
Copy link
Copy Markdown
Contributor Author

I guess now that another PR merged, I have to get used to saying Assisted-by: IBM Bob
I guess I can do that on squash/merge.

markstur added 23 commits April 16, 2026 16:17
Fixes: generative-computing#822

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Added streaming support w/ setting system_fingerprint. Make it consistent.

We are currently just setting it to None but now it is consistent for
future use.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Doesn't belong in library. This is for cli.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
astream() returns deltas (only new fragments), not accumulated text.
Update docstring.
Fix unused previous_length in streaming.py. Rename vars for clarity.
Fix streaming tests to be consistent with the non-accumulating behavoir.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Adds missing yield of the [DONE] that clients will expect.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Since stream defaults to False, a regression was introduced where
stream=False is now passed to backends where it used to be default.
Fix is to only forward stream=True and not add stream=False.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Stuff was marked asyncio and async def when it didn't need to be.
Moved the usage tests a little for readability.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
In m serve, usage was included for backward compatibility but this
is a new feature so that's not an issue. Instead the OpenAI spec is
what we want to follow.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Use content=None instead of content="" to be more correct with OpenAI API.
Remove unneeded check for "" in the test.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Test-first for...
Bug where we get into a loop without checking for this first.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Fix bug where we get into loop despite already having computed chunk.

Update test which was written to test existing bug -- use wording that
makes it current and not referencing the non-existent bug. Make the
asserts more precise with expected chunks.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
…vior

The default is now to only include usage when it is asked for when streaming.
This is consistent with the OpenAI API spec.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Following OpenAI API spec and better for validation.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Some validation tests including how coercion from 1 or "true" to True
is handled.

Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Signed-off-by: Mark Sturdevant <mark.sturdevant@ibm.com>
Copy link
Copy Markdown
Contributor

@planetf1 planetf1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All looks good! Thanks for addressing (and followup)

@markstur markstur added this pull request to the merge queue Apr 17, 2026
Merged via the queue into generative-computing:main with commit ec8cc18 Apr 17, 2026
4 checks passed
@markstur markstur deleted the issue_822 branch April 17, 2026 19:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

m serve OpenAI API streaming support

4 participants