
Extra details returned by the model are not tracked by Usage class #1579

@ThachNgocTran

Description

After an LLM finishes a call (request), it also returns some statistics, for example the number of prompt tokens and completion tokens. These are tracked by the Usage class (pydantic-ai/pydantic_ai_slim/pydantic_ai/usage.py (link)).

According to the class's documentation (link), as of April 2025 the details attribute should contain "any extra details returned by the model." But this is not the case with the latest version of Pydantic AI (0.1.3).
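
For context, here are the relevant fields of Usage, paraphrased from usage.py as I read it in v0.1.3 (a sketch, not the exact source; see the file for the real definition). Note that details appears to be typed dict[str, int], so if I read it correctly, storing float-valued timings such as prompt_ms would also require a type change:

# Paraphrased sketch of pydantic_ai/usage.py (v0.1.3), not the exact source.
from dataclasses import dataclass

@dataclass
class Usage:
    requests: int = 0
    request_tokens: int | None = None      # filled from prompt_tokens
    response_tokens: int | None = None     # filled from completion_tokens
    total_tokens: int | None = None
    details: dict[str, int] | None = None  # documented as "any extra details returned by the model"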

I use llama-server (part of llama.cpp) as the backend to host an LLM (in the form of a GGUF file). Using tcpflow (link) to capture the communication between server and client, I can see the last message sent from the server, as follows:

{
    "choices":[
        {
            "finish_reason":"stop",
            "index":0,
            "delta":{
                
            }
        }
    ],
    "created":1745457407,
    "id":"chatcmpl-G7Hmg3VGIPYO6hFk6nw7b4VtC4vqyliz",
    "model":"Qwen2.5-7B-Instruct-1M-q4_k_m-Finetuned",
    "system_fingerprint":"b5127-e959d32b",
    "object":"chat.completion.chunk",
    "usage":{
        "completion_tokens":32,
        "prompt_tokens":52,
        "total_tokens":84
    },
    "timings":{
        "prompt_n":17,
        "prompt_ms":2470.602,
        "prompt_per_token_ms":145.3295294117647,
        "prompt_per_second":6.880914044431277,
        "predicted_n":32,
        "predicted_ms":6924.775,
        "predicted_per_token_ms":216.39921875,
        "predicted_per_second":4.621088771837353
    }
}
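
For illustration, this is how the non-standard part of that chunk can be separated from the standard OpenAI-compatible fields (raw_last_chunk is a hypothetical variable holding the JSON above):

import json

# raw_last_chunk: hypothetical string containing the final chunk shown above
chunk = json.loads(raw_last_chunk)

standard_keys = {"choices", "created", "id", "model",
                 "system_fingerprint", "object", "usage"}
extra = {k: v for k, v in chunk.items() if k not in standard_keys}
print(extra)
# {'timings': {'prompt_n': 17, 'prompt_ms': 2470.602, ...}}
# -> this is what I would expect to end up in Usage.details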

The completion_tokens and prompt_tokens are captured correctly by the Usage class (as response_tokens and request_tokens respectively). But everything about processing time, e.g. prompt_ms and prompt_per_token_ms, is missing from the details field of the Usage class. Unless I am mistaken, that field should contain any extra details returned by the model.

Expectation

The details field of the Usage class should contain any extra details returned by the model, e.g. the timings block or prompt_per_token_ms.
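
Concretely, something like this is what I would expect print(result.usage()) to show (a hypothetical repr based on the chunk above; the exact formatting will differ):

Usage(
    requests=1,
    request_tokens=52,
    response_tokens=32,
    total_tokens=84,
    details={'prompt_ms': 2470.602, 'predicted_ms': 6924.775, ...},  # expected, currently missing
)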

Example Code

import os

from httpx import AsyncClient

from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

...
# LLM_URL points at the llama-server host
agent = Agent(
    OpenAIModel(
        "model_name",
        provider=OpenAIProvider(
            api_key=os.environ["LLM_API_KEY"],
            base_url=f"{LLM_URL}:8081/v1",
            http_client=AsyncClient(headers={"Connection": "close"}),
        ),
    ),
    retries=3,
    deps_type=str,
)
...
async with agent.run_stream(
    latest_user_message,
    message_history=message_history,
    deps=system_prompt,
) as result:
    async for chunk in result.stream_text(delta=True):
        writer(chunk)

print(result.usage())  # details never contains the timings block

Python, Pydantic AI & LLM client version

+ Windows 11, WSL2, Ubuntu 24.04
+ Pydantic AI v0.1.3
+ Python v3.12.7
+ llama-cli 5117
+ Langchain Core 0.3.49
