
Streaming usage accounting returns zeros when tools/function calling are enabled. #9927

@fkfc

Description


LocalAI version:

LocalAI v4.2.6 (6a48157) using localai/localai:latest-gpu-hipblas, rocm-llama-cpp fa2699a2b168

Environment, CPU architecture, OS, and Version:

6.17.0-23-generic #23-Ubuntu SMP PREEMPT_DYNAMIC Sat Apr 11 23:29:57 UTC 2026 x86_64 GNU/Linux

Describe the bug

When using the OpenAI-compatible /v1/chat/completions endpoint with:

  • stream=true
  • stream_options.include_usage=true
  • a tools array (function calling)

the final streamed usage chunk returns all token counts as zero:

"usage":{
  "prompt_tokens":0,
  "completion_tokens":0,
  "total_tokens":0
}

This only happens when tools/function calling are included in the request.

Without tools, usage accounting works correctly.

Non-streaming requests also return correct usage metrics.

This issue breaks clients that depend on streamed usage accounting, such as OpenCode's automatic context compaction and token tracking.
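A minimal shell sketch of how a client might read the token count from the end of the stream and detect this symptom. The function names `last_total_tokens` and `usage_is_zero` are hypothetical (not part of LocalAI or OpenCode), and the grep/sed extraction assumes each chunk is a single-line `data: {...}` JSON object, as in the outputs shown in this report; a real client should use a proper JSON parser.

```shell
#!/bin/sh
# Hypothetical helper (not part of LocalAI or OpenCode).
# Reads an OpenAI-style SSE stream on stdin and prints the total_tokens
# value from the last "data: {...}" chunk that carries a "usage" object.
last_total_tokens() {
  grep '^data: {' \
    | grep '"usage"' \
    | tail -n 1 \
    | sed -n 's/.*"total_tokens":\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: succeeds (exit status 0) when the final usage chunk
# reports total_tokens of 0, which is the symptom described in this issue.
# Fails if the usage is non-zero or if no usage chunk is present at all.
usage_is_zero() {
  total=$(last_total_tokens)
  [ "$total" = "0" ]
}
```

For example, with the reproduction request saved to a file (here assumed to be `request.json`), `curl -sN http://servidor:8080/v1/chat/completions -H "Content-Type: application/json" -d @request.json | usage_is_zero && echo "usage is zero"` would print the message when the bug is present. `-N` disables curl's output buffering so chunks are processed as they arrive.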

The issue started after updating LocalAI and llama.cpp.

To Reproduce

  
curl http://servidor:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model":"Qwen3.6-35B-A3B-GGUF",
    "messages":[
      {
        "role":"user",
        "content":[
          {
            "type":"text",
            "text":"answer only the word test"
          }
        ]
      }
    ],
    "tools":[
      {
        "type":"function",
        "function":{
          "name":"test",
          "description":"test",
          "parameters":{
            "type":"object",
            "properties":{}
          }
        }
      }
    ],
    "stream":true,
    "stream_options":{
      "include_usage":true
    }
  }'

The stream ends with these lines:

data: {"id":"8a7e6695-7ed2-472e-af3b-fba4ce8e58ac","created":1779377922,"model":"Qwen3.6-35B-A3B-GGUF","object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

data: [DONE]

Expected behavior

The actual number of tokens used, as returned when the request does NOT include tools (the prompt below is Portuguese for "answer only with the word test"):

curl http://servidor:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3.6-35B-A3B-GGUF",
    "messages": [{"role": "user", "content": "responda apenas com a palavra teste"}],
    "stream": true,
    "stream_options": {
      "include_usage": true
    }
  }'

The stream ends with:

data: {"id":"d78e7313-c4b8-4b25-ad46-9c4cc44640a0","created":1779378028,"model":"Qwen3.6-35B-A3B-GGUF","object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":18,"completion_tokens":213,"total_tokens":231}}

data: [DONE]
