feat(cherry-pick): fixes for 0.3.1 release #3991
fix(context): prevent provider data leak between streaming requests (llamastack#3924)

## Summary
- `preserve_contexts_async_generator` left `PROVIDER_DATA_VAR` (and other context vars) populated after a streaming generator completed on HEAD~1, so the asyncio context for request N+1 started with request N's provider payload.
- FastAPI dependencies and middleware execute before `request_provider_data_context` rebinds the header data, meaning auth/logging hooks could observe a prior tenant's credentials or treat them as authenticated. Traces and any background work that inspects the context outside the `with` block leak as well; this is a real security regression, not just a CLI artifact.
- The wrapper now restores each tracked `ContextVar` to the value it held before the iteration (falling back to clearing when necessary) after every yield and when the generator terminates, so provider data is wiped while callers that set their own defaults keep them.

## Test Plan
- `uv run pytest tests/unit/core/test_provider_data_context.py -q`
- `uv run pytest tests/unit/distribution/test_context.py -q`

Both suites fail on HEAD~1 and pass with this change.
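To illustrate the restore-after-yield pattern this fix describes, here is a minimal sketch. It is not the actual `preserve_contexts_async_generator` implementation; the helper name and the token-free restore are simplifications:

```python
import contextvars
from typing import AsyncGenerator, Sequence

async def preserve_contexts_sketch(
    gen: AsyncGenerator,
    tracked: Sequence[contextvars.ContextVar],
) -> AsyncGenerator:
    # Snapshot what each var held before iteration began.
    snapshot = {}
    for var in tracked:
        try:
            snapshot[var] = var.get()
        except LookupError:
            pass  # var was unset; nothing to restore to

    def restore() -> None:
        for var in tracked:
            if var in snapshot:
                var.set(snapshot[var])
            # ContextVar has no public "unset"; the real fix falls back
            # to clearing stale values when there is no prior value.

    try:
        async for item in gen:
            yield item
            restore()  # wipe request N's provider data after each yield
    finally:
        restore()  # and again when the generator terminates
```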
fix(inference): enable routing of models with provider_data alone (llamastack#3928)

This PR enables routing of fully qualified model IDs of the form `provider_id/model_id` even when the models are not registered with the Stack.

Here's the situation: assume a remote inference provider which works only when users provide their own API keys via the `X-LlamaStack-Provider-Data` header. By definition, we cannot list models and hence update our routing registry. But because we _require_ a provider ID in the models now, we can identify which provider to route to and let that provider decide. Note that we still try to look up our registry, since it may have a pre-registered alias; we just don't outright fail when the lookup misses.

Also updated the inference router so that responses carry the _exact_ model that the request had. Added an integration test.

Closes llamastack#3929

Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
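A rough sketch of that lookup order; `registry`, `entry.provider_id`, `entry.provider_resource_id`, and `get_provider_impl` are illustrative stand-ins for the router's actual internals:

```python
def resolve_model_route(model_id: str, registry: dict, get_provider_impl):
    # A pre-registered alias in the registry wins, as before.
    entry = registry.get(model_id)
    if entry is not None:
        return get_provider_impl(entry.provider_id), entry.provider_resource_id
    # Otherwise accept the fully qualified "provider_id/model_id" form
    # and let the provider decide whether the model exists.
    if "/" in model_id:
        provider_id, provider_model = model_id.split("/", maxsplit=1)
        return get_provider_impl(provider_id), provider_model
    raise ValueError(f"unregistered model without provider prefix: {model_id}")
```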
fix: Avoid BadRequestError due to invalid max_tokens (llamastack#3667)

This patch ensures that when max tokens is not defined, it is set to `None` instead of `0` when calling `openai_chat_completion`. This way, providers (like Gemini) that cannot handle `max_tokens = 0` will not fail.

Issue: llamastack#3666
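A self-contained sketch of the normalization; the function name is made up for illustration:

```python
def normalize_max_tokens(max_tokens: int | None) -> int | None:
    """Map an unset/zero max_tokens to None so providers like Gemini,
    which reject max_tokens=0, receive no limit at all."""
    return max_tokens if max_tokens else None

assert normalize_max_tokens(0) is None
assert normalize_max_tokens(None) is None
assert normalize_max_tokens(256) == 256
```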
fix(responses): sync conversation before yielding terminal events in streaming (llamastack#3888)

Move conversation sync logic before the yield to ensure it executes even when streaming consumers break early after receiving the response.completed event.

## Test Plan
```
OLLAMA_URL=http://localhost:11434 \
pytest -sv tests/integration/responses/ \
  --stack-config server:ci-tests \
  --text-model ollama/llama3.2:3b-instruct-fp16 \
  --inference-mode live \
  -k conversation_multi
```
This test now passes.
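The ordering matters because a `yield` hands control to the consumer, which may never return it. A hypothetical sketch (the event shape and `sync_conversation` helper are assumptions, not the actual responses implementation):

```python
async def stream_with_sync(events, sync_conversation):
    async for event in events:
        if event.type == "response.completed":
            # Sync BEFORE yielding the terminal event: a consumer that
            # breaks out of its loop as soon as it sees this event
            # would otherwise skip any code placed after the yield.
            await sync_conversation(event.response)
        yield event
```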
fix(logging): ensure logs go to stderr, loggers obey levels (llamastack#3885)

Important fix to the logging system: logs now go to stderr, and loggers obey their configured levels.
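A minimal sketch of the intended behavior using the standard `logging` module; the handler setup and logger name here are assumptions, not the patch itself:

```python
import logging
import sys

# Send log records to stderr so stdout stays clean for program output.
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

logger = logging.getLogger("llama_stack")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.info("visible: at or above the configured level, on stderr")
logger.debug("suppressed: below the configured level")
```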
ci: support release branches and match client branch (llamastack#3990)

- Update workflows to trigger on release-X.Y.x-maint branches
- When a PR targets a release branch, fetch the matching branch from llama-stack-client-python
- Fall back to main if a matching client branch doesn't exist
- Updated workflows:
  - integration-tests.yml
  - integration-auth-tests.yml
  - integration-sql-store-tests.yml
  - integration-vector-io-tests.yml
  - unit-tests.yml
  - backward-compat.yml
  - pre-commit.yml
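The workflows implement this in YAML and shell; purely as an illustration of the branch-matching rule, a hypothetical Python equivalent:

```python
import re
import subprocess

CLIENT_REPO = "https://github.com/llamastack/llama-stack-client-python"

def client_branch_for(pr_base_branch: str) -> str:
    """Pick the llama-stack-client-python branch to test against:
    the matching release branch if it exists, otherwise main."""
    if re.fullmatch(r"release-\d+\.\d+\.x(-maint)?", pr_base_branch):
        probe = subprocess.run(
            ["git", "ls-remote", "--exit-code", "--heads",
             CLIENT_REPO, pr_base_branch],
            capture_output=True,
        )
        if probe.returncode == 0:  # branch exists on the client repo
            return pr_base_branch
    return "main"
```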
ehhuang approved these changes on Oct 30, 2025.
ashwinb added a commit that referenced this pull request on Oct 31, 2025:
## Summary

Cherry-picks 5 critical fixes from main to the release-0.3.x branch for the v0.3.1 release, plus CI workflow updates.

**Note**: This recreates the cherry-picks from the closed PR #3991, now targeting the renamed `release-0.3.x` branch (previously `release-0.3.x-maint`).

## Commits

1. **2c56a8560** - fix(context): prevent provider data leak between streaming requests (#3924)
   - **CRITICAL SECURITY FIX**: Prevents provider credentials from leaking between requests
   - Fixed import path for 0.3.0 compatibility
2. **ddd32b187** - fix(inference): enable routing of models with provider_data alone (#3928)
   - Enables routing for fully qualified model IDs with provider_data
   - Resolved merge conflicts, adapted for 0.3.0 structure
3. **f7c2973aa** - fix: Avoid BadRequestError due to invalid max_tokens (#3667)
   - Fixes failures with Gemini and other providers that reject max_tokens=0
   - Non-breaking API change
4. **d7f9da616** - fix(responses): sync conversation before yielding terminal events in streaming (#3888)
   - Ensures conversation sync executes even when streaming consumers break early
5. **0ffa8658b** - fix(logging): ensure logs go to stderr, loggers obey levels (#3885)
   - Fixes logging infrastructure
6. **75b49cb3c** - ci: support release branches and match client branch (#3990)
   - Updates CI workflows to support release-X.Y.x branches
   - Matches client branch from llama-stack-client-python for release testing
   - Fixes artifact name collisions

## Adaptations for 0.3.0

- Fixed import paths: `llama_stack.core.telemetry.tracing` → `llama_stack.providers.utils.telemetry.tracing`
- Fixed import paths: `llama_stack.core.telemetry.telemetry` → `llama_stack.apis.telemetry`
- Changed `self.telemetry_enabled` → `self.telemetry` (0.3.0 attribute name)
- Removed `rerank()` method that doesn't exist in 0.3.0

## Testing

All imports verified and tests should pass once CI is set up.
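The import-path adaptations above were applied by rewriting the imports on the release branch. As a sketch only, a shim like the following is the common alternative when one codebase must span both layouts (hypothetical; not what this PR does):

```python
# Illustrative compatibility shim: prefer the post-refactor path on
# main, fall back to the path that exists on the 0.3.x branch.
try:
    from llama_stack.core.telemetry import tracing
except ImportError:
    from llama_stack.providers.utils.telemetry import tracing
```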
## Summary

Cherry-picks 5 critical fixes from main to the release-0.3.x-maint branch for the v0.3.1 release, plus CI workflow updates.

## Commits
- 2c56a85 - fix(context): prevent provider data leak between streaming requests (#3924)
- ddd32b1 - fix(inference): enable routing of models with provider_data alone (#3928)
- f7c2973 - fix: Avoid BadRequestError due to invalid max_tokens (#3667)
- d7f9da6 - fix(responses): sync conversation before yielding terminal events in streaming (#3888)
- 0ffa865 - fix(logging): ensure logs go to stderr, loggers obey levels (#3885)
- 90234d6 - ci: support release branches and match client branch (#3990)
## Adaptations for 0.3.0

- Fixed import paths: `llama_stack.core.telemetry.tracing` → `llama_stack.providers.utils.telemetry.tracing`
- Fixed import paths: `llama_stack.core.telemetry.telemetry` → `llama_stack.apis.telemetry`
- Changed `self.telemetry_enabled` → `self.telemetry` (0.3.0 attribute name)
- Removed `rerank()` method that doesn't exist in 0.3.0

## Testing
All imports verified and tests should pass once CI is set up.