Open
Labels
bug (Something isn't working)
Description
Name and Version
version: 6821 (a2e0088)
Operating systems
Windows
Which llama.cpp modules do you know to be affected?
llama-server
Command line
llama-server --offline -m <some-model.gguf>
Problem description & steps to reproduce
Starting from b6821, the webUI no longer correctly shows which model generated a response when llama-server runs a model offline from a local GGUF file. The last release in which this worked correctly was b6818.
Steps to reproduce:
- Run llama-server using a local GGUF file:
    llama-server --offline -m .\gemma-3-4b-it-Q6_K.gguf
- Launch the webUI. In the settings page, ensure that the checkbox for 'Show model information' under the 'General' tab is checked.
- Start a conversation.
- Close the server and restart it using a different local GGUF file:
    llama-server --offline -m .\Qwen3-4B-Instruct-2507-Q6_K.gguf
- Launch the webUI and either continue the previous conversation, regenerate the most recent response, or start a new conversation.
Expected behavior:
The webUI should show the first message as generated by gemma-3-4b-it-Q6_K.gguf and the second message as generated by Qwen3-4B-Instruct-2507-Q6_K.gguf.
However, since b6821, all responses are instead shown as being generated by whichever local model is loaded.
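To make the expected behavior concrete, here is a minimal sketch of the difference between labeling a message with the model recorded at generation time and labeling it with whatever model is currently loaded. This is not the webUI's actual code: `ChatMessage`, `makeAssistantMessage`, and `modelLabel` are hypothetical names made up for illustration, and I haven't checked how the real implementation stores this.

```typescript
// Hypothetical types and helpers; none of these names come from the llama.cpp webUI source.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
  // Model name captured at the moment the response was generated (assumed field).
  model?: string;
}

// Record the generating model on the message itself when it is created.
function makeAssistantMessage(content: string, loadedModel: string): ChatMessage {
  return { role: "assistant", content, model: loadedModel };
}

// Prefer the per-message model; fall back to the currently loaded model
// only when nothing was recorded for that message.
function modelLabel(message: ChatMessage, currentlyLoadedModel: string): string {
  return message.model ?? currentlyLoadedModel;
}

// First response, generated while the gemma model is loaded.
const first = makeAssistantMessage("Hello", "gemma-3-4b-it-Q6_K.gguf");

// Server restarted with a different model; second response generated.
const loaded = "Qwen3-4B-Instruct-2507-Q6_K.gguf";
const second = makeAssistantMessage("Hi again", loaded);

console.log(modelLabel(first, loaded));  // gemma-3-4b-it-Q6_K.gguf
console.log(modelLabel(second, loaded)); // Qwen3-4B-Instruct-2507-Q6_K.gguf
```

The behavior seen since b6821 is what you would get if the label were always taken from `currentlyLoadedModel` and the per-message value were ignored or missing.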
First Bad Commit
Seems to have been introduced by commit 9b9201f
Relevant log output
ServeurpersoCom