To properly run Llama 3 models, you need to set the stop token `<|eot_id|>`.
This is currently not configurable when running Jan in API server mode.
The model is automatically loaded by llama.cpp with
This causes the model not to stop generating when it should: it emits `<|eot_id|>assistant\n\n` in its output and keeps generating several responses/turns.
Of course, a fix for llama.cpp is already in the works and will surely make its way into Nitro Cortex.
Still, having this configurable would be nice.
Right, so I found that you can actually specify the stop token in the API call, below the messages array: `"stop": ["<|eot_id|>"]` (see the example below).
Too bad not all apps that use RESTful (OpenAI-compatible) API calls allow this to be set.
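For reference, a minimal chat/completions request body with the stop sequence set might look like the sketch below. The model ID is just a placeholder; use whatever your local API server actually exposes.

```json
{
  "model": "llama3-8b-instruct",
  "messages": [
    { "role": "user", "content": "Hello!" }
  ],
  "stop": ["<|eot_id|>"]
}
```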
Yes, it is quite confusing right now. We will work on the API server so it communicates directly with model.json; that way the stop token can be set by default there, and the chat/completions request settings can still override it.
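As a rough sketch of what that could look like, assuming the stop sequence sits alongside the other inference parameters in model.json (the field names here are illustrative, not a confirmed schema):

```json
{
  "id": "llama3-8b-instruct",
  "parameters": {
    "stop": ["<|eot_id|>"]
  }
}
```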