Ollama-based models are not providing full tokens (stops around 150 or so) on a self-hosted Docker (Dify) environment #2461
Comments
🤖 Hey there, @ashmanpan! 👋 I'm Dosu, your friendly neighborhood bot here to help you tackle bugs, answer your questions, and guide you on contributing while we wait for a human maintainer to swing by. I'm diving into your issue with the Ollama models and will get back to you with a full answer shortly!
Details and a snapshot are available here: https://discord.com/channels/1082486657678311454/1206116751801327616
🤖 Based on the information you've provided, the issue you're experiencing might be related to the 'num_predict' parameter in the Ollama model configuration. This parameter controls the maximum number of tokens to predict when generating text. Even though you've set the maximum token limit to 4096, the 'num_predict' parameter might be limiting the number of tokens generated by the model. In the Dify version 0.5.4 codebase, the default 'num_predict' value for Ollama-based models is set to 128. Here's the relevant code snippet:

    ParameterRule(
        name='num_predict',
        use_template='max_tokens',
        label=I18nObject(en_US="Num Predict"),
        type=ParameterType.INT,
        help=I18nObject(en_US="Maximum number of tokens to predict when generating text. "
                              "(Default: 128, -1 = infinite generation, -2 = fill context)"),
        default=128,
        min=-2,
        max=int(credentials.get('max_tokens', 4096)),
    ),

You can find this in the api/core/model_runtime/model_providers/ollama/llm/llm.py file. If you want the model to generate more tokens, you might need to adjust the 'num_predict' parameter to a higher value or set it to -1 for infinite generation.

Unfortunately, I can't view the additional details and snapshots you've provided at the link, as I'm unable to access URLs. If you could provide more information or any error messages directly in this thread, it would help in further diagnosing the issue.

Please note that you're using Dify version 0.5.4, and the latest version is 0.5.5. It might be worth updating to the latest version to see if the issue persists, as it could have been addressed in the latest release.

I hope this helps! If you have any more questions or need further clarification, feel free to ask.
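For context, here is a minimal sketch of how num_predict behaves when calling Ollama's API directly, outside Dify. It assumes Ollama is reachable on its default port and uses the requests library; the model name and prompt are placeholders.

```python
import requests

# Sketch: call the local Ollama server directly with an explicit num_predict.
# Setting it to -1 removes the output-length cap (the default of 128 is what
# produces the ~150-token cut-off described in this issue).
reply = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama2",  # placeholder: use whichever model is configured in Dify
        "prompt": "Explain in detail how BGP route reflectors work.",
        "stream": False,
        "options": {"num_predict": -1},
    },
    timeout=600,
)
print(reply.json()["response"])
```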
You can try adjusting the num_predict parameter.
I have already deployed Dify in Docker containers. Which specific container has this file, and how can I edit it? What is the way to change this? Another question: why such a low default? This is a local LLM; no one is going to be charged for a large number of tokens, so why?
Issue resolved. I will close this soon after some testing.
…On Mon, Feb 19, 2024 at 10:47 AM takatost wrote:
Screenshot (view on web): https://github.com/langgenius/dify/assets/5485478/95059438-9120-4a49-adac-472f83fe672f
This number is set according to the default values provided in the Ollama API documentation; you can change this value here.
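For anyone who prefers to change the shipped default in a self-hosted deployment rather than adjust the value per app in the UI, here is a sketch of what the edited rule could look like. The -1 default is an assumption for illustration only, not the value Dify ships with.

```python
# Fragment of the existing rule list in
# api/core/model_runtime/model_providers/ollama/llm/llm.py (not a standalone script).
# Only the default value changes; -1 means "infinite generation" per the help text.
ParameterRule(
    name='num_predict',
    use_template='max_tokens',
    label=I18nObject(en_US="Num Predict"),
    type=ParameterType.INT,
    help=I18nObject(en_US="Maximum number of tokens to predict when generating text. "
                          "(Default: 128, -1 = infinite generation, -2 = fill context)"),
    default=-1,  # changed from 128 for illustration; pick any cap that suits your use case
    min=-2,
    max=int(credentials.get('max_tokens', 4096)),
),
```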
When we set a limit on num_predict, the sentences get cut off midway and the model does not complete the sentence. How can we ensure that it completes the sentence while staying within the maximum token limit?
Did you figure this out? I have the same problem: output gets cut off mid-sentence when num_predict is set to 64.
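One pragmatic workaround for the mid-sentence cut-off, sketched below, is to post-process the capped output and trim it back to the last complete sentence. This is not a Dify or Ollama feature; the helper function and regex are illustrative only.

```python
import re

def trim_to_last_sentence(text: str) -> str:
    """Drop any trailing partial sentence from a length-capped completion."""
    # Find the last sentence-ending punctuation (., !, ?) followed by whitespace
    # or end of string, and cut the text there.
    matches = list(re.finditer(r"[.!?](?=\s|$)", text))
    if not matches:
        return text  # no complete sentence found; return unchanged
    return text[: matches[-1].end()]

# Example: output truncated by a small num_predict limit.
capped = "BGP route reflectors reduce the full-mesh requirement. They work by re-adv"
print(trim_to_last_sentence(capped))
# -> "BGP route reflectors reduce the full-mesh requirement."
```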
Self Checks
Dify version
0.5.4
Cloud or Self Hosted
Self Hosted (Docker)
Steps to reproduce
✔️ Expected Behavior
An Ollama-based model should also give a full, long response like ChatGPT, until it reaches the max token limit of 4096.
❌ Actual Behavior
The Ollama-based model used through Dify stops mid-response after roughly 150 tokens.
I tried the same Ollama model in the terminal with the same questions and I can see the full response from the model, so it is not an Ollama issue.
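To reproduce that terminal check programmatically, a small sketch like the following can be used; it assumes the ollama CLI is on the PATH, and the model name and prompt are placeholders.

```python
import subprocess

# Sketch: run the same model through the Ollama CLI, bypassing Dify entirely.
# If the output here runs well past ~150 tokens, the truncation is coming from
# Dify's num_predict default rather than from Ollama itself.
prompt = "Write a detailed explanation of how OSPF areas work."
result = subprocess.run(
    ["ollama", "run", "llama2", prompt],  # placeholder model name
    capture_output=True,
    text=True,
    check=True,
)
print(len(result.stdout.split()), "words generated")
print(result.stdout)
```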