Streaming usage accounting returns zeros when tools/function calling are enabled. #9927

**LocalAI version:**

LocalAI v4.2.6 (`6a48157a804292a34d33ce98c5b33f01aec215d2`) using the `localai/localai:latest-gpu-hipblas` image, backend `rocm-llama-cpp` `fa2699a2b168`


**Environment, CPU architecture, OS, and Version:** 

Ubuntu, x86_64, kernel `6.17.0-23-generic #23-Ubuntu SMP PREEMPT_DYNAMIC Sat Apr 11 23:29:57 UTC 2026 x86_64 GNU/Linux`


**Describe the bug**

When using the OpenAI-compatible `/v1/chat/completions` endpoint with:

- `stream: true`
- `stream_options.include_usage: true`
- `tools`

the final streamed usage chunk returns all token counts as zero:

```
"usage":{
  "prompt_tokens":0,
  "completion_tokens":0,
  "total_tokens":0
}
```

This only happens when `tools` are included in the request. Without `tools`, streamed usage accounting works correctly.

Non-streaming requests also return correct usage.

This breaks clients that depend on streamed usage accounting, such as OpenCode, which relies on it for automatic context tracking and compaction.

The problem started after updating LocalAI and llama.cpp.


**To Reproduce**
```
curl http://servidor:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model":"Qwen3.6-35B-A3B-GGUF",
    "messages":[
      {
        "role":"user",
        "content":[
          {
            "type":"text",
            "text":"answer only the word test"
          }
        ]
      }
    ],
    "tools":[
      {
        "type":"function",
        "function":{
          "name":"test",
          "description":"test",
          "parameters":{
            "type":"object",
            "properties":{}
          }
        }
      }
    ],
    "stream":true,
    "stream_options":{
      "include_usage":true
    }
  }'
```
The stream ends with these lines:
```
data: {"id":"8a7e6695-7ed2-472e-af3b-fba4ce8e58ac","created":1779377922,"model":"Qwen3.6-35B-A3B-GGUF","object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}}

data: [DONE]
```
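To check a captured stream for this symptom without reading it by eye, the usage object can be pulled out of the last JSON `data:` chunk. The sketch below is not part of LocalAI; the function names `last_usage` and `usage_is_zero` are hypothetical. It assumes each chunk sits on a single `data:` line and that the usage object is flat (no nested objects), as in the output above. It uses `grep -o`, which is not strictly POSIX but is supported by GNU and BSD grep.

```shell
#!/bin/sh
# Hypothetical helper: print the "usage" object from the last JSON chunk
# of an SSE stream read on stdin. Assumes one chunk per "data:" line and a
# flat usage object (no nested braces), as in the output shown in this issue.
last_usage() {
  grep '^data: {' | tail -n 1 | grep -o '"usage":{[^}]*}'
}

# Hypothetical helper: exit 0 if every token count in the final usage object
# is zero (the symptom reported here), non-zero otherwise.
usage_is_zero() {
  last_usage | grep -q '"prompt_tokens":0,"completion_tokens":0,"total_tokens":0}'
}

# Example usage (request.json is a hypothetical file holding the payload above):
#   curl -sN http://servidor:8080/v1/chat/completions \
#     -H "Content-Type: application/json" -d @request.json | last_usage
```

Matching on the raw text keeps the check dependency-free; a JSON-aware tool such as `jq` would be more robust if the usage object ever gains nested fields.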


**Expected behavior**

The final chunk should report the actual token counts, as it does when the request does NOT include `tools` (the prompt below is Portuguese for "answer only with the word test"):
```
curl http://servidor:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen3.6-35B-A3B-GGUF",
    "messages": [{"role": "user", "content": "responda apenas com a palavra teste"}],
    "stream": true,
    "stream_options": {
      "include_usage": true
    }
  }'
```
The stream ends with:
```
data: {"id":"d78e7313-c4b8-4b25-ad46-9c4cc44640a0","created":1779378028,"model":"Qwen3.6-35B-A3B-GGUF","object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":18,"completion_tokens":213,"total_tokens":231}}

data: [DONE]
```
