LocalAI version:
LocalAI v4.2.6 (6a48157) using localai/localai:latest-gpu-hipblas, rocm-llama-cpp fa2699a2b168
Environment, CPU architecture, OS, and Version:
6.17.0-23-generic #23-Ubuntu SMP PREEMPT_DYNAMIC Sat Apr 11 23:29:57 UTC 2026 x86_64 GNU/Linux
Describe the bug
When using the OpenAI-compatible /v1/chat/completions endpoint with:
- stream=true
- stream_options.include_usage=true
- tools
the final streamed usage chunk returns all token counts as zero:
"usage":{
"prompt_tokens":0,
"completion_tokens":0,
"total_tokens":0
}
This happens only when tools (function calling) are included in the request.
Without tools, streamed usage accounting works correctly.
Non-streaming requests also return correct usage metrics.
This breaks clients that depend on streamed usage accounting, such as OpenCode's automatic context compaction and tracking.
The issue started after updating LocalAI and llama.cpp.
To Reproduce
curl http://servidor:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model":"Qwen3.6-35B-A3B-GGUF",
"messages":[
{
"role":"user",
"content":[
{
"type":"text",
"text":"answer only the word test"
}
]
}
],
"tools":[
{
"type":"function",
"function":{
"name":"test",
"description":"test",
"parameters":{
"type":"object",
"properties":{}
}
}
}
],
"stream":true,
"stream_options":{
"include_usage":true
}
}'
The stream ends with these lines:
data: {"id":"8a7e6695-7ed2-472e-af3b-fba4ce8e58ac","created":1779377922,"model":"Qwen3.6-35B-A3B-GGUF","object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}
data: [DONE]
Expected behavior
The final usage chunk should report the actual token counts, as it does when tools are NOT included in the request:
curl http://servidor:8080/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "Qwen3.6-35B-A3B-GGUF",
"messages": [{"role": "user", "content": "responda apenas com a palavra teste"}],
"stream": true,
"stream_options": {
"include_usage": true
}
}'
The response ends with:
data: {"id":"d78e7313-c4b8-4b25-ad46-9c4cc44640a0","created":1779378028,"model":"Qwen3.6-35B-A3B-GGUF","object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":18,"completion_tokens":213,"total_tokens":231}}
data: [DONE]
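To make the difference between the two responses easy to check automatically, here is a minimal Python sketch that scans the streamed `data:` lines, picks out the last chunk carrying a `usage` object, and flags it when every count is zero. The helper names `final_usage` and `usage_is_zeroed` are hypothetical and not part of LocalAI or any client; the sample lines are shortened copies of the two outputs above (the `id`, `created`, and `model` fields are dropped for brevity).

```python
import json

# Shortened copies of the final lines from the two runs above.
WITH_TOOLS = [
    'data: {"object":"chat.completion.chunk","choices":[],'
    '"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}',
    "data: [DONE]",
]
WITHOUT_TOOLS = [
    'data: {"object":"chat.completion.chunk","choices":[],'
    '"usage":{"prompt_tokens":18,"completion_tokens":213,"total_tokens":231}}',
    "data: [DONE]",
]

USAGE_KEYS = ("prompt_tokens", "completion_tokens", "total_tokens")


def final_usage(sse_lines):
    """Return the usage dict of the last chunk that carries one, or None."""
    usage = None
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        try:
            chunk = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore anything that is not a JSON chunk
        if isinstance(chunk, dict) and chunk.get("usage"):
            usage = chunk["usage"]
    return usage


def usage_is_zeroed(usage):
    """True when a usage object is present but every count is zero."""
    return usage is not None and all(usage.get(k, 0) == 0 for k in USAGE_KEYS)


if __name__ == "__main__":
    print("with tools, zeroed:", usage_is_zeroed(final_usage(WITH_TOOLS)))
    print("without tools, zeroed:", usage_is_zeroed(final_usage(WITHOUT_TOOLS)))
```

Feeding the saved output of each curl command through `final_usage` gives a quick regression check: the request with tools currently yields a zeroed usage object, while the request without tools does not.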