
Regression: llama-server web interface doesn't show speed and progress during token generation since b6399 #15865

@lksj92hs

Description

Name and Version

version: 6399 (61bdfd5)
built with clang version 19.1.5 for x86_64-pc-windows-msvc

Operating systems

Windows and macOS (see note below)

Which llama.cpp modules do you know to be affected?

llama-server

Command line

.\llama-server.exe -m unsloth_Qwen3-0.6B-GGUF_Qwen3-0.6B-Q4_K_M.gguf -dev none

Problem description & steps to reproduce

  1. Start llama-server with any model
  2. Open web browser and go to http://127.0.0.1:8080
  3. Go to Settings -> Advanced and turn on the "Show tokens per second" toggle
  4. Click Save
  5. Type any prompt and see the generation

Before b6399 (61bdfd5), the web interface (http://localhost:8080) immediately showed a "Speed: xx t/s" label with the current speed, updated continuously during generation and streaming; hovering over it also showed details such as prompt-processing tokens, generated tokens, times, and speeds.

Since b6399, the Speed label appears only after token generation has fully completed and is hidden while streaming. This is especially annoying when generation is slow or the response contains a long thinking section.

NOTE: Tested with both Firefox and Chrome, as well as on Windows and macOS, so the bug is not OS- or browser-specific.
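For illustration only (this is not the actual llama-server webui code): the live "Speed: xx t/s" figure the old UI showed can be derived entirely client-side from the token events the stream delivers, without waiting for the server's final timings. A minimal sketch, assuming the client is notified once per streamed token:

```javascript
// Hypothetical client-side speed meter: counts streamed tokens and
// reports a running tokens-per-second rate while generation is live.
// The `now` parameter is injectable for testing; it defaults to wall time.
function makeSpeedMeter(now = () => Date.now()) {
  let start = null; // timestamp (ms) of the first token, or null
  let tokens = 0;   // tokens received so far

  return {
    // Call once for each token chunk received from the stream.
    onToken() {
      if (start === null) start = now();
      tokens += 1;
    },
    // Current generation speed in tokens/second,
    // or null before any token has arrived.
    speed() {
      if (start === null) return null;
      const elapsedMs = now() - start;
      return elapsedMs > 0 ? (tokens * 1000) / elapsedMs : null;
    },
  };
}
```

A UI built this way can refresh the label on every chunk, which is the pre-b6399 behavior the report describes; relying solely on the server's end-of-generation timings would instead produce the post-b6399 behavior.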

First Bad Commit

b6399 (61bdfd5)

Relevant log output
