Describe the bug
In our Discord, people report that the second response is slow or that the response comes back empty. I came across a model that reproduces all of these problems in 0.4.6:
- Attempting to regenerate the response may hang on the "Generating response" progress bar (the thread and log for this issue are in the attached zip archive)
- During the conversation, the speed decreases with each new response
- Sometimes an empty message is returned instead of a response
- Sometimes the response repeats indefinitely without stopping
The model: stable-code-3b.Q8_0.gguf
model.json is the default for 0.4.6
It is clear that there is some incompatibility with the default Jan settings here, but it would be nice to somehow handle this or inform the user that something is wrong.
I hope this example will help diagnose and fix or somehow handle the described problems.
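To illustrate the kind of handling I mean, here is a minimal sketch of a check that could run on the generated text and flag an empty or looping response so the user can be informed. This is not Jan's code: `diagnose_response` and its thresholds are hypothetical names and values chosen only for illustration.

```python
def diagnose_response(text: str, min_repeat_len: int = 20, min_repeats: int = 3) -> str:
    """Classify generated text as 'empty', 'looping', or 'ok'.

    Hypothetical helper, not part of Jan. 'looping' means the text ends with
    the same chunk (at least min_repeat_len characters long) repeated
    back-to-back min_repeats times.
    """
    stripped = text.strip()
    if not stripped:
        return "empty"

    # Try every chunk size that could fit min_repeats times into the text.
    max_len = len(stripped) // min_repeats
    for size in range(min_repeat_len, max_len + 1):
        chunk = stripped[-size:]
        if stripped.endswith(chunk * min_repeats):
            return "looping"
    return "ok"


# Example inputs: a blank reply, a normal reply, and a reply stuck on one line.
blank = "   \n"
normal = "Here is a short answer that does not repeat itself at all."
stuck = "intro " + "def foo(): return 1\n" * 5

print(diagnose_response(blank))
print(diagnose_response(normal))
print(diagnose_response(stuck))
```

The scan is quadratic in the worst case, so a real implementation would probably only look at the last few hundred characters of the stream.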
Steps to reproduce
Steps to reproduce the behavior:
Download my thread.json and, in this thread, try to generate a response with the model from the link above. The result is unpredictable, but in 9 out of 10 cases it is an answer irrelevant to the query: one or several paragraphs of text made up of words or code.
Expected behavior
This model needs a special prompt_template and probably a special retrieval_template. Under the current conditions, therefore, a response that makes little sense is expected. However, there should at least be some response, generated at a stable speed and without the "Generating response" progress bar freezing.
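To show why the template matters, here is a minimal sketch of how a prompt_template with placeholders turns into the text the model actually sees. The `{system_message}` and `{prompt}` placeholder names, both template strings, and `render_prompt` itself are assumptions for illustration. They are not the actual 0.4.6 default and not a verified template for stable-code-3b.

```python
import re


def render_prompt(template: str, system_message: str, prompt: str) -> str:
    """Fill {system_message} and {prompt} in a template in a single pass.

    Hypothetical helper, not Jan's implementation. A single regex pass avoids
    re-substituting placeholders that happen to appear inside the user's text.
    """
    values = {"system_message": system_message, "prompt": prompt}
    return re.sub(
        r"\{(system_message|prompt)\}",
        lambda match: values[match.group(1)],
        template,
    )


# Illustrative templates only (assumed, not taken from any model.json).
chat_style = "{system_message}\n### Instruction: {prompt}\n### Response:"
completion_style = "{prompt}"

user_text = "def fibonacci(n):"

# A chat-style wrapper adds instruction markers the model may never have seen.
print(render_prompt(chat_style, "You are a helpful assistant.", user_text))

# A bare template passes the code through unchanged for plain completion.
print(render_prompt(completion_style, "", user_text))
```

If the model was trained for plain code completion, the extra markers in the first form could plausibly explain the irrelevant output, although I have not verified that this is the cause.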
Environment details
Operating System: Windows 11 Pro x64 23H2 (build 22631.3085)
Jan Version: 0.4.6
Processor: AMD Ryzen 5 5600
RAM: 64GB
Any additional relevant hardware specifics: RTX 3060 12GB, driver 551.23, Resizable BAR on, CUDA system fallback policy disabled
Logs
jan_1707226938.zip