
Conversation

@ashwinb ashwinb commented Oct 30, 2025

Summary

Cherry-picks 5 critical fixes from main to the release-0.3.x-maint branch for the v0.3.1 release, plus CI workflow updates.

Commits

  1. 2c56a85 - fix(context): prevent provider data leak between streaming requests (#3924)

    • CRITICAL SECURITY FIX: Prevents provider credentials from leaking between requests
    • Fixed import path for 0.3.0 compatibility
  2. ddd32b1 - fix(inference): enable routing of models with provider_data alone (#3928)

    • Enables routing for fully qualified model IDs with provider_data
    • Resolved merge conflicts, adapted for 0.3.0 structure
  3. f7c2973 - fix: Avoid BadRequestError due to invalid max_tokens (#3667)

    • Fixes failures with Gemini and other providers that reject max_tokens=0
    • Non-breaking API change
  4. d7f9da6 - fix(responses): sync conversation before yielding terminal events in streaming (#3888)

    • Ensures conversation sync executes even when streaming consumers break early
  5. 0ffa865 - fix(logging): ensure logs go to stderr, loggers obey levels (#3885)

    • Fixes logging infrastructure
  6. 90234d6 - ci: support release branches and match client branch (#3990)

    • Updates CI workflows to support release-X.Y.x-maint branches
    • Matches client branch from llama-stack-client-python for release testing
    • Fixes artifact name collisions

Adaptations for 0.3.0

  • Fixed import paths: llama_stack.core.telemetry.tracing → llama_stack.providers.utils.telemetry.tracing
  • Fixed import paths: llama_stack.core.telemetry.telemetry → llama_stack.apis.telemetry
  • Changed self.telemetry_enabled → self.telemetry (0.3.0 attribute name)
  • Removed rerank() method that doesn't exist in 0.3.0

Testing

All imports verified; tests should pass once CI is set up.

ashwinb and others added 5 commits October 30, 2025 14:14
fix(context): prevent provider data leak between streaming requests (llamastack#3924)

## Summary

- On HEAD~1, `preserve_contexts_async_generator` left `PROVIDER_DATA_VAR`
(and other context vars) populated after a streaming generator completed,
so the asyncio context for request N+1 started with request N's provider
payload.
- FastAPI dependencies and middleware execute before
`request_provider_data_context` rebinds the header data, meaning
auth/logging hooks could observe a prior tenant's credentials or treat
them as authenticated. Traces and any background work that inspects the
context outside the `with` block leak as well—this is a real security
regression, not just a CLI artifact.
- The wrapper now restores each tracked `ContextVar` to the value it
held before the iteration (falling back to clearing when necessary)
after every yield and when the generator terminates, so provider data is
wiped while callers that set their own defaults keep them (see the
sketch below).
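
A minimal Python sketch of that restore-on-exit behavior (simplified relative to the actual wrapper, which also re-applies captured values each time the generator resumes; names here are illustrative):

```python
from contextvars import ContextVar
from typing import AsyncGenerator, Sequence

async def preserve_contexts(
    gen: AsyncGenerator, context_vars: Sequence[ContextVar]
) -> AsyncGenerator:
    # Snapshot the values the caller had before iteration began.
    initial = {cv: cv.get(None) for cv in context_vars}
    try:
        async for item in gen:
            yield item
    finally:
        # Runs on normal exhaustion, early break, or error: restore each
        # var to its pre-iteration value so request N's provider data can
        # never be observed by request N+1.
        for cv, value in initial.items():
            cv.set(value)
```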

## Test Plan

- `uv run pytest tests/unit/core/test_provider_data_context.py -q`
- `uv run pytest tests/unit/distribution/test_context.py -q`

Both suites fail on HEAD~1 and pass with this change.
fix(inference): enable routing of models with provider_data alone (llamastack#3928)

This PR enables routing of fully qualified model IDs of the form
`provider_id/model_id` even when the models are not registered with the
Stack.

Here's the situation: assume a remote inference provider which works
only when users provide their own API keys via
`X-LlamaStack-Provider-Data` header. By definition, we cannot list
models and hence update our routing registry. But because we _require_ a
provider ID in the models now, we can identify which provider to route
to and let that provider decide.

Note that we still try to look up our registry since it may have a
pre-registered alias; we just don't outright fail when the lookup misses
(see the sketch below).
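
A hypothetical sketch of that fallback (the `registry` and `providers` names are illustrative, not the actual router API):

```python
def resolve_provider(model_id: str, registry: dict, providers: dict):
    # Prefer a pre-registered alias from the routing registry.
    if model_id in registry:
        return registry[model_id]
    # Otherwise a fully qualified "provider_id/model_id" identifies the
    # provider directly; route there and let the provider validate the
    # model (e.g. with per-request keys from X-LlamaStack-Provider-Data).
    provider_id, sep, _ = model_id.partition("/")
    if sep and provider_id in providers:
        return providers[provider_id]
    raise ValueError(f"unknown model: {model_id}")
```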

Also, updated the inference router so that responses carry the _exact_
model ID that the request used.

Added an integration test

Closes llamastack#3929

---------

Co-authored-by: ehhuang <ehhuang@users.noreply.github.com>
This patch ensures that if max_tokens is not defined, it is set to None
instead of 0 when calling openai_chat_completion. This way, providers
(like Gemini) that cannot handle `max_tokens = 0` will not fail.
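
A one-line sketch of the normalization (the function name is assumed for illustration):

```python
def normalize_max_tokens(max_tokens: int | None) -> int | None:
    # Treat 0/None as "not set" and forward None, so providers such as
    # Gemini, which reject max_tokens=0, apply their own default instead.
    return max_tokens or None
```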

Issue: llamastack#3666
fix(responses): sync conversation before yielding terminal events in streaming (llamastack#3888)

Move the conversation sync logic before the yield to ensure it executes
even when streaming consumers break early after receiving the
response.completed event.
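
A minimal sketch of the ordering fix (names hypothetical):

```python
async def stream_events(events, sync_conversation):
    async for event in events:
        if event.type == "response.completed":
            # Sync the conversation *before* yielding the terminal event:
            # a consumer that breaks right after receiving it can no
            # longer skip the sync.
            await sync_conversation(event.response)
        yield event
```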

## Test Plan

```
OLLAMA_URL=http://localhost:11434 \
  pytest -sv tests/integration/responses/ \
  --stack-config server:ci-tests \
  --text-model ollama/llama3.2:3b-instruct-fp16 \
  --inference-mode live \
  -k conversation_multi
```

This test now passes.
@meta-cla bot added the CLA Signed label Oct 30, 2025
@ashwinb ashwinb changed the title Cherry-pick critical fixes for v0.3.1 release release-0.3.1: cherry-pick some fixes Oct 30, 2025
@ashwinb ashwinb changed the title release-0.3.1: cherry-pick some fixes release(0.3.1): cherry-pick some fixes Oct 30, 2025
@ashwinb ashwinb changed the title release(0.3.1): cherry-pick some fixes feat(cherry-pick): fixes for 0.3.1 release Oct 30, 2025
- Update workflows to trigger on release-X.Y.x-maint branches
- When a PR targets a release branch, fetch the matching branch from
llama-stack-client-python
- Falls back to main if the matching client branch doesn't exist (see
the sketch after this list)
- Updated workflows:
  - integration-tests.yml
  - integration-auth-tests.yml
  - integration-sql-store-tests.yml
  - integration-vector-io-tests.yml
  - unit-tests.yml
  - backward-compat.yml
  - pre-commit.yml
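
A hypothetical sketch of the branch-matching logic (the workflows implement this in YAML/shell; this Python version is illustrative only):

```python
import re
import subprocess

CLIENT_REPO = "https://github.com/llamastack/llama-stack-client-python"

def client_branch_for(base_ref: str) -> str:
    # Only release-X.Y.x(-maint) branches get a matching client branch.
    if not re.fullmatch(r"release-\d+\.\d+\.x(-maint)?", base_ref):
        return "main"
    # Check whether the client repo has a branch of the same name;
    # fall back to main when it doesn't.
    heads = subprocess.run(
        ["git", "ls-remote", "--heads", CLIENT_REPO, base_ref],
        capture_output=True, text=True, check=True,
    )
    return base_ref if heads.stdout.strip() else "main"
```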
@ashwinb ashwinb deleted the branch llamastack:release-0.3.x-maint October 31, 2025 03:50
@ashwinb ashwinb closed this Oct 31, 2025
@ashwinb ashwinb deleted the cherry-picks branch October 31, 2025 03:59
ashwinb added a commit that referenced this pull request Oct 31, 2025
## Summary

Cherry-picks 5 critical fixes from main to the release-0.3.x branch for
the v0.3.1 release, plus CI workflow updates.

**Note**: This recreates the cherry-picks from the closed PR #3991, now
targeting the renamed `release-0.3.x` branch (previously
`release-0.3.x-maint`).

## Commits

1. **2c56a8560** - fix(context): prevent provider data leak between
streaming requests (#3924)
- **CRITICAL SECURITY FIX**: Prevents provider credentials from leaking
between requests
   - Fixed import path for 0.3.0 compatibility

2. **ddd32b187** - fix(inference): enable routing of models with
provider_data alone (#3928)
   - Enables routing for fully qualified model IDs with provider_data
   - Resolved merge conflicts, adapted for 0.3.0 structure

3. **f7c2973aa** - fix: Avoid BadRequestError due to invalid max_tokens
(#3667)
- Fixes failures with Gemini and other providers that reject
max_tokens=0
   - Non-breaking API change

4. **d7f9da616** - fix(responses): sync conversation before yielding
terminal events in streaming (#3888)
- Ensures conversation sync executes even when streaming consumers break
early

5. **0ffa8658b** - fix(logging): ensure logs go to stderr, loggers obey
levels (#3885)
   - Fixes logging infrastructure

6. **75b49cb3c** - ci: support release branches and match client branch
(#3990)
   - Updates CI workflows to support release-X.Y.x branches
- Matches client branch from llama-stack-client-python for release
testing
   - Fixes artifact name collisions

## Adaptations for 0.3.0

- Fixed import paths: `llama_stack.core.telemetry.tracing` →
`llama_stack.providers.utils.telemetry.tracing`
- Fixed import paths: `llama_stack.core.telemetry.telemetry` →
`llama_stack.apis.telemetry`
- Changed `self.telemetry_enabled` → `self.telemetry` (0.3.0 attribute
name)
- Removed `rerank()` method that doesn't exist in 0.3.0

## Testing

All imports verified; tests should pass once CI is set up.