fix: #638 Add usage data from stream on response (Chat Completions) #652
Conversation
Streaming results may not include usage data for providers like LiteLLM, causing generation spans (onSpanEnd) to not have usage results. Adding a simple check to provide usage data from the last stream event fixes the issue.
🦋 Changeset detected. Latest commit: 83378e1. The changes in this PR will be included in the next version bump. This PR includes changesets to release 2 packages.
@seratch Another issue I encountered. The question is: should Generation SpanData be in the OAI Agent SDK model, or in the underlying model implementation's?

```ts
const response = {
  id: FAKE_ID,
  created: Math.floor(Date.now() / 1000),
  model: this.#model,
  object: 'chat.completion',
  choices: [],
  usage: {
    prompt_tokens: 0,
    completion_tokens: 0,
    total_tokens: 0,
  },
};

for await (const event of convertChatCompletionsStreamToResponses(response, stream)) {
  if (
    event.type === 'response_done' &&
    response.usage?.total_tokens === 0
  ) {
    response.choices = event.response.output;
    response.usage = {
      prompt_tokens: event.response.usage.inputTokens,
      completion_tokens: event.response.usage.outputTokens,
      total_tokens: event.response.usage.totalTokens,
      prompt_tokens_details: event.response.usage.inputTokensDetails,
      completion_tokens_details: event.response.usage.outputTokensDetails,
    };
  }
  yield event;
}

if (span && response && request.tracing === true) {
  span.spanData.output = [response];
}
```
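For readers following along: the backfill matters because tracing processors read usage off the generation span's output when the span closes. Below is a rough sketch of that consumer side, assuming a hypothetical processor object with an `onSpanEnd` hook and a span `output` shaped like the `chat.completion` object above; the type and names here are illustrative, not the SDK's exact API.

```ts
// Illustrative only: a hand-written span shape for this example;
// the real SDK span types are richer than this.
type GenerationSpanLike = {
  spanData: {
    output?: Array<{
      usage?: {
        prompt_tokens: number;
        completion_tokens: number;
        total_tokens: number;
      };
    }>;
  };
};

// Hypothetical processor hook: without the backfill above, providers
// like LiteLLM would leave usage undefined or all zeros here.
const exampleProcessor = {
  onSpanEnd(span: GenerationSpanLike) {
    const usage = span.spanData.output?.[0]?.usage;
    console.log('generation usage:', usage);
  },
};
```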
@rclayton-godaddy I've verified the changes in this PR are sufficient for Chat Completions use cases. I am not sure about the question in your last comment, but with the changes in this PR, the span data includes usage data as expected:

```json
{
  "created": 1763096420,
  "id": "FAKE_ID",
  "usage": {
    "prompt_tokens_details": {
      "cached_tokens": 0
    },
    "completion_tokens_details": {
      "reasoning_tokens": 0
    },
    "completion_tokens": 100,
    "total_tokens": 124,
    "prompt_tokens": 24
  },
  "model": "gpt-4.1",
  "object": "chat.completion",
  "choices": []
}
```

The tracing dashboard is optimized for the Responses API, so when you use the Chat Completions API, the data shows up like the JSON above. I am fine to merge this PR, but if you have anything to clarify, let me know.
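For reference, the `usage` block in that span output follows the standard Chat Completions usage shape. A hand-written sketch of it is below; the `ChatCompletionsUsage` name is made up for this example (the official openai Node package exposes an equivalent `CompletionUsage` type).

```ts
// Sketch of the Chat Completions usage object shown in the JSON above;
// the interface name is illustrative, not an SDK export.
interface ChatCompletionsUsage {
  prompt_tokens: number;
  completion_tokens: number;
  total_tokens: number;
  prompt_tokens_details?: {
    cached_tokens?: number;
  };
  completion_tokens_details?: {
    reasoning_tokens?: number;
  };
}
```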
@seratch If you look at the result you posted, `choices` is empty.
Ah, I see. I got the point now. |
I would rather have this PR merged; I can submit another PR for passing `choices` back from the streaming provider, so it's available on the span data.
Yeah, I agree. We can improve it separately! |
Streaming results may not include usage data for providers like LiteLLM, causing "generation spans" (`onSpanEnd`) to not have usage results. Adding a simple check to provide usage data from the last stream event fixes the issue.

No usage data is on the response unless it is supplied via streaming chunks (LiteLLM doesn't do this):

However, the last event has the usage data:

This PR grabs the usage data from the `response_done` event on the stream, appending it to the response.

Note: there is a bit of friction in the model. The `openaiChatCompletionStreaming.ts` component converts the OAI usage from snake_case fields to camelCase. However, the response type requires snake_case. I chose not to do anything fancy, preferring a direct mapping.

Refer to: #638 (comment)
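To make that casing friction concrete, here is the conversion in isolation: the converted stream events expose usage in camelCase, while the `chat.completion` response object wants snake_case. A minimal sketch, with the field names taken from the diff above and the surrounding types written by hand for illustration:

```ts
// Usage shape produced by the converted Responses-style stream events
// (camelCase); written out by hand for this example.
type StreamUsage = {
  inputTokens: number;
  outputTokens: number;
  totalTokens: number;
  inputTokensDetails?: Record<string, number>;
  outputTokensDetails?: Record<string, number>;
};

// Direct field-by-field mapping onto the snake_case usage fields that
// the chat.completion response object expects; no extra conversion logic.
function toChatCompletionUsage(usage: StreamUsage) {
  return {
    prompt_tokens: usage.inputTokens,
    completion_tokens: usage.outputTokens,
    total_tokens: usage.totalTokens,
    prompt_tokens_details: usage.inputTokensDetails,
    completion_tokens_details: usage.outputTokensDetails,
  };
}
```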