Description
🚀 Describe the new functionality needed
While implementing the Bedrock provider (#3748), I found that streaming requests don't collect token usage metrics by default. I fixed it for Bedrock by adding `stream_options = {"include_usage": True}` when telemetry is active. As @mattf pointed out, this logic belongs in the `OpenAIMixin` base class so that all OpenAI-compatible providers get streaming metrics automatically, not just Bedrock. Right now the code lives in `BedrockInferenceAdapter.openai_chat_completion()`:
```python
# Enable streaming usage metrics when telemetry is active
if params.stream and get_current_span() is not None:
    if params.stream_options is None:
        params.stream_options = {"include_usage": True}
    elif "include_usage" not in params.stream_options:
        params.stream_options = {**params.stream_options, "include_usage": True}
```
This should move into `OpenAIMixin.openai_chat_completion()` in `src/llama_stack/providers/utils/inference/openai_mixin.py` so other providers get it for free.
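For concreteness, here is a rough sketch of what the hoisted check might look like in the mixin. The method signature, the `_do_openai_chat_completion()` downstream call, and the `get_current_span` import path are assumptions for illustration, not the actual mixin code:

```python
# Sketch only: signature, downstream call, and import path are assumptions.
from llama_stack.providers.utils.telemetry.tracing import get_current_span  # assumed path; reuse whatever Bedrock already imports


class OpenAIMixin:
    async def openai_chat_completion(self, params):
        # Enable streaming usage metrics whenever telemetry is active, so every
        # OpenAI-compatible provider built on this mixin reports token usage for
        # streaming requests without re-implementing the check per adapter.
        if params.stream and get_current_span() is not None:
            if params.stream_options is None:
                params.stream_options = {"include_usage": True}
            elif "include_usage" not in params.stream_options:
                params.stream_options = {**params.stream_options, "include_usage": True}
        # Hypothetical downstream call standing in for the mixin's existing request path.
        return await self._do_openai_chat_completion(params)
```

With this in place, `BedrockInferenceAdapter.openai_chat_completion()` could drop its copy of the check entirely.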
💡 Why is this needed? What if we don't build it?
Without this, streaming requests have a blind spot in telemetry: we can track tokens for non-streaming requests but not for streaming ones.
This makes it hard to:
- Monitor production costs accurately (streaming is common in chat apps)
- Debug performance issues (can't see if a streaming request is token-heavy)
- Set up proper rate limiting based on actual usage

Separately, the `include_usage` parameter isn't obvious from the OpenAI docs and is easy to miss.
If we don't standardize this, every new provider implementer has to rediscover it themselves, and it creates inconsistency: some providers would have streaming metrics, others wouldn't. Since we already check `get_current_span() is not None` to detect whether telemetry is enabled, there's no performance cost when telemetry is off.
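For reference, with `include_usage` set, OpenAI-compatible streams emit one final chunk whose `choices` list is empty and whose `usage` field carries the token counts. A minimal consumer sketch using the `openai` client, with the model id and endpoint as placeholders:

```python
from openai import OpenAI

client = OpenAI()  # endpoint/credentials are placeholders for illustration

stream = client.chat.completions.create(
    model="my-model",  # placeholder model id
    messages=[{"role": "user", "content": "hello"}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
    elif chunk.usage is not None:
        # Final chunk: no choices, just the aggregate token counts that
        # telemetry would record for the streaming request.
        print(f"\nprompt={chunk.usage.prompt_tokens} "
              f"completion={chunk.usage.completion_tokens} "
              f"total={chunk.usage.total_tokens}")
```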
Other thoughts
Thanks @mattf for pointing this out.