Description
In streaming mode, I can read ["usage"] from update.contents to get real-time, per-tool-call token usage for the inner chat/agent run.
In non-streaming mode, however, there is no clean way to get a usage event for each tool call. The only way to do it right now is with a monkey-patch hook like the one below.
Is there any plan to support this through a public API?
Code Sample
from typing import Any


def attach_per_call_usage_hook(
    client: Any,
    tracker: UsageTrackingMiddleware,
) -> None:
    """Monkey-patch a chat client to fire usage callbacks after each model call.

    This wraps ``client._inner_get_response()`` so that for **non-streaming**
    calls, after the underlying API returns a ``ChatResponse``, the tracker's
    ``_handle_usage`` is called immediately with ``dict(response.usage_details)``.

    Also sets ``tracker._per_call_hook_active = True`` so the middleware's
    ``process()`` avoids firing duplicate callbacks for the final response.

    In the agent-framework MRO::

        ChatMiddlewareLayer                    [once]
          → FunctionInvocationLayer            [tool loop]
            → ChatTelemetryLayer               [per-call]  ← same level
              → BaseChatClient.get_response()
                → _inner_get_response()        ← we wrap here

    This gives us the same per-iteration accuracy as OpenTelemetry spans
    without requiring otel to be enabled.

    Streaming calls are left unmodified; use ``ResponseStream`` hooks instead.
    """
    tracker._per_call_hook_active = True
    callback = tracker._handle_usage
    original = client._inner_get_response

    def _wrapped(*, messages: Any, stream: bool, options: Any, **kwargs: Any) -> Any:
        result = original(messages=messages, stream=stream, options=options, **kwargs)
        if stream:
            return result  # Streaming handled by ResponseStream hooks

        # Wrap the awaitable to intercept the response
        async def _intercept() -> Any:
            response = await result
            if response and hasattr(response, "usage_details") and response.usage_details:
                callback(dict(response.usage_details))
            return response

        return _intercept()

    client._inner_get_response = _wrapped
Language/SDK
Python
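To make the request concrete, here is a minimal, self-contained sketch of the wrapping technique from the code sample, run against stand-in objects rather than the real library. `FakeResponse`, `FakeClient`, `wrap_with_usage_callback`, `simulate_tool_loop`, and `run_demo` are all hypothetical names invented for illustration, and the usage keys are placeholder values; only the `_inner_get_response` attribute name and its keyword arguments are taken from the sample.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class FakeResponse:
    """Hypothetical stand-in for a response object exposing usage_details."""
    text: str
    usage_details: dict[str, int]


class FakeClient:
    """Hypothetical stand-in for a chat client (not the agent-framework class)."""

    def __init__(self) -> None:
        self.calls = 0

    async def _inner_get_response(
        self, *, messages: Any, stream: bool, options: Any, **kwargs: Any
    ) -> FakeResponse:
        # Each model call reports its own usage; the numbers are made up.
        self.calls += 1
        usage = {"input_token_count": 10 * self.calls, "output_token_count": 5}
        return FakeResponse(text=f"reply {self.calls}", usage_details=usage)


def wrap_with_usage_callback(
    client: Any, callback: Callable[[dict[str, Any]], None]
) -> None:
    """Replace client._inner_get_response with a wrapper that reports usage."""
    original = client._inner_get_response

    def _wrapped(*, messages: Any, stream: bool, options: Any, **kwargs: Any) -> Any:
        result = original(messages=messages, stream=stream, options=options, **kwargs)
        if stream:
            return result  # Streaming results pass through untouched

        async def _intercept() -> Any:
            response = await result
            usage = getattr(response, "usage_details", None)
            if usage:
                callback(dict(usage))  # Fire once per underlying model call
            return response

        return _intercept()

    client._inner_get_response = _wrapped


async def simulate_tool_loop(client: Any, iterations: int) -> None:
    """Mimic a tool loop that makes several model calls within one run."""
    for _ in range(iterations):
        await client._inner_get_response(messages=[], stream=False, options=None)


def run_demo() -> list[dict[str, Any]]:
    seen: list[dict[str, Any]] = []
    client = FakeClient()
    wrap_with_usage_callback(client, seen.append)
    asyncio.run(simulate_tool_loop(client, 3))
    return seen


if __name__ == "__main__":
    for usage in run_demo():
        print(usage)
```

The callback fires once for every call to `_inner_get_response`, so a three-iteration loop yields three usage records instead of a single aggregate. This is the behavior the feature request asks to have exposed through a supported hook rather than a private attribute.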
Status
In Review