Conversation

rclayton-godaddy commented Nov 13, 2025

Streaming results may not include usage data for providers like LiteLLM, causing generation spans (onSpanEnd) to lack usage results. Adding a simple check that populates usage data from the last stream event fixes the issue.

No usage data is on the response unless it's supplied via streaming chunks (LiteLLM doesn't do this):

[screenshot: response object with an empty usage field]

However, the last event has the usage data:

[screenshot: response_done stream event containing the usage totals]

This PR grabs the usage data from the response_done event on the stream, appending it to the response.

Note: there is a bit of friction in the model. The openaiChatCompletionStreaming.ts component converts the OAI usage fields from snake_case to camelCase. However, the response type requires snake_case. I chose not to do anything fancy, preferring a direct mapping.
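
For illustration, a minimal sketch of that direct mapping; the helper and type names (toSnakeCaseUsage, CamelUsage, SnakeUsage) are hypothetical, but the camelCase field names match the snippet later in this thread:

```ts
// Sketch only: convert the SDK's camelCase usage shape back to the
// snake_case shape the Chat Completions response type expects.
type CamelUsage = {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
};

type SnakeUsage = {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
};

function toSnakeCaseUsage(usage: CamelUsage): SnakeUsage {
  return {
    prompt_tokens: usage.inputTokens,
    completion_tokens: usage.outputTokens,
    total_tokens: usage.totalTokens,
  };
}
```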

Refer to: #638 (comment)

changeset-bot commented Nov 13, 2025

🦋 Changeset detected

Latest commit: 83378e1

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 2 packages:

| Name                  | Type  |
| --------------------- | ----- |
| @openai/agents-openai | Patch |
| @openai/agents        | Patch |


rclayton-godaddy changed the title from "Add usage data from stream on response" to "fix: Add usage data from stream on response" on Nov 13, 2025
seratch changed the title from "fix: Add usage data from stream on response" to "fix: #638 Add usage data from stream on response" on Nov 13, 2025
seratch added this to the 0.3.x milestone on Nov 13, 2025
rclayton-godaddy (Author) commented Nov 13, 2025

@seratch another issue I encountered with the OpenAIChatCompletionsModel is that it doesn't provide the essential SpanData you would want with a Generation span; the response object is never updated. My change adds the usage, but then I realized I'm not getting back choices. I fixed this locally by casting the output as any, which at least allows me to report the output on the span. However, this is obviously suboptimal. The choices model comes from OAI Chat Completions, but convertChatCompletionsStreamToResponses returns AgentOutputItem.

The question is: should Generation SpanData live in the OAI Agent SDK model, or in the underlying model implementation?

```ts
const response = {
  id: FAKE_ID,
  created: Math.floor(Date.now() / 1000),
  model: this.#model,
  object: 'chat.completion',
  choices: [],
  usage: {
    prompt_tokens: 0,
    completion_tokens: 0,
    total_tokens: 0,
  },
};

for await (const event of convertChatCompletionsStreamToResponses(response, stream)) {
  // Backfill the output and usage from the final stream event if no
  // usage chunks were emitted during streaming.
  if (event.type === 'response_done' && response.usage?.total_tokens === 0) {
    response.choices = event.response.output;
    response.usage = {
      prompt_tokens: event.response.usage.inputTokens,
      completion_tokens: event.response.usage.outputTokens,
      total_tokens: event.response.usage.totalTokens,
      prompt_tokens_details: event.response.usage.inputTokensDetails,
      completion_tokens_details: event.response.usage.outputTokensDetails,
    };
  }
  yield event;
}

// Attach the (now populated) response to the generation span.
if (span && response && request.tracing === true) {
  span.spanData.output = [response];
}
```

seratch (Member) commented Nov 14, 2025

@rclayton-godaddy I've verified the changes in this PR are sufficient for Chat Completions use cases.

I am not sure about the question in your last comment, but with the changes in this PR, the span data includes usage data as expected:

```json
{
  "created": 1763096420,
  "id": "FAKE_ID",
  "usage": {
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0
    },
    "completion_tokens": 100,
    "total_tokens": 124,
    "prompt_tokens": 24
  },
  "model": "gpt-4.1",
  "object": "chat.completion",
  "choices": []
}
```

The tracing dashboard is optimized for the Responses API, so when you use the Chat Completions API, the data is surfaced as raw JSON like the above.

I'm fine with merging this PR, but if there's anything to clarify, let me know.

seratch changed the title from "fix: #638 Add usage data from stream on response" to "fix: #638 Add usage data from stream on response (Chat Completions)" on Nov 14, 2025
rclayton-godaddy (Author) commented
@seratch If you look at the result you posted, `choices` has no values, but it should contain the output from the model. The consequence is that "span end" doesn't receive the output, so implementers of TracingProvider have to build workarounds to associate the outputs with the span.
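
For context, a hypothetical sketch of the kind of workaround this forces on tracing implementers; the interfaces below are illustrative stand-ins, not the SDK's actual types:

```ts
// Hypothetical shapes for illustration only.
interface GenerationSpanData {
  type: 'generation';
  usage?: Record<string, number>;
  output?: unknown[];
}

interface Span {
  spanData: GenerationSpanData;
}

// A hook like this can read usage from the span, but because `choices`
// (the model output) is never copied onto the response, `spanData.output`
// arrives empty and the output has to be correlated from elsewhere
// (e.g. captured stream events keyed by span id).
function onSpanEnd(span: Span): void {
  if (span.spanData.type === 'generation') {
    const usage = span.spanData.usage; // populated by this PR's fix
    const output = span.spanData.output; // still empty for Chat Completions streams
    console.log({ usage, outputLength: output?.length ?? 0 });
  }
}
```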

seratch (Member) commented Nov 14, 2025

Ah, I see. I got the point now.

rclayton-godaddy (Author) commented
I would rather have this PR merged; I can submit another PR that passes `choices` back from the streaming provider so it's available on the span data.

seratch (Member) commented Nov 14, 2025

Yeah, I agree. We can improve it separately!
