
Ollama based models are "not" providing full tokens (stops around 150 or so) on self-hosted docker(dify) based environment #2461

Closed

ashmanpan opened this issue Feb 18, 2024 · 12 comments

Labels: 🐞 bug Something isn't working

Comments

@ashmanpan

Self Checks

Dify version

0.5.4

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

  1. Run models on Ollama locally on a Mac M3.
  2. Create an app using an Ollama-based model (I used llama2).
  3. Use the two-model debugging view, with chat-gpt-4 as the second model for comparison.
  4. Set max tokens on both models to 4096.
  5. Ask questions that require relatively long answers. You will see ChatGPT give the full details in its answer, but the local Ollama-based model stops after 150 or so tokens.

✔️ Expected Behavior

The Ollama-based model should also give a full, long response like ChatGPT, until it reaches the max token limit of 4096.

❌ Actual Behavior

The Ollama-based model used through Dify stops partway through the response after 150 or so tokens.

I tried the same Ollama model in the terminal with the same questions and got the full response, so it is not an Ollama issue.
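
For anyone who wants to script that terminal check, here is a minimal sketch of the equivalent call against the Ollama HTTP API (assuming Ollama is listening on its default port 11434 and the llama2 tag is pulled; the prompt is only a placeholder):

import json
import urllib.request

# Minimal sketch: call Ollama directly, outside Dify, and count generated tokens.
# Assumes a local Ollama server on the default port with the llama2 tag pulled.
payload = {
    "model": "llama2",
    "prompt": "Explain the history of the internet in detail.",  # placeholder question
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read())

print(body["response"])
print("tokens generated:", body.get("eval_count"))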

@ashmanpan ashmanpan added the 🐞 bug Something isn't working label Feb 18, 2024

dosubot bot commented Feb 18, 2024

🤖

Hey there, @ashmanpan! 👋 I'm Dosu, your friendly neighborhood bot here to help you tackle bugs, answer your questions, and guide you on contributing while we wait for a human maintainer to swing by. I'm diving into your issue with the Ollama models and will get back to you with a full answer shortly!

@ashmanpan
Author

Details and a snapshot are available here - https://discord.com/channels/1082486657678311454/1206116751801327616

@ashmanpan ashmanpan changed the title Ollama based models are "not" providing full tokens (stops around 150 or so) on self-hosted docker(dify) based envirnment Ollama based models are "not" providing full tokens (stops around 150 or so) on self-hosted docker(dify) based environment Feb 18, 2024

dosubot bot commented Feb 18, 2024

🤖

Based on the information you've provided, it seems like the issue you're experiencing might be related to the 'num_predict' parameter in the Ollama model configuration. This parameter controls the maximum number of tokens to predict when generating text. Even though you've set the maximum token limit to 4096, the 'num_predict' parameter might be limiting the number of tokens generated by the model.

In the Dify version 0.5.4 codebase, the default 'num_predict' parameter value for Ollama based models is set to 128. Here's the relevant code snippet:

ParameterRule(
    name='num_predict',
    use_template='max_tokens',
    label=I18nObject(en_US="Num Predict"),
    type=ParameterType.INT,
    help=I18nObject(en_US="Maximum number of tokens to predict when generating text. "
                          "(Default: 128, -1 = infinite generation, -2 = fill context)"),
    default=128,
    min=-2,
    max=int(credentials.get('max_tokens', 4096)),
),

You can find this in the api/core/model_runtime/model_providers/ollama/llm/llm.py file.

If you want the model to generate more tokens, you might need to adjust the 'num_predict' parameter to a higher value or set it to -1 for infinite generation.
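
For illustration, here is a minimal sketch of what that option looks like when passed directly to the Ollama API (this is not Dify's internal request code; it assumes a local Ollama server on the default port 11434 with the llama2 tag pulled, and the prompt is a placeholder):

import json
import urllib.request

def generate(num_predict: int) -> dict:
    # Sketch only: sends one non-streaming request to a local Ollama server.
    payload = {
        "model": "llama2",
        "prompt": "Explain TCP congestion control in detail.",  # placeholder question
        "stream": False,
        "options": {"num_predict": num_predict},
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

for cap in (128, -1):  # 128 mirrors the default above; -1 removes the cap
    body = generate(cap)
    print(f"num_predict={cap}: {body.get('eval_count')} tokens generated")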

Unfortunately, I can't view the additional details and snapshots you've provided at the link as I'm unable to access URLs. If you could provide more information or any error messages you're encountering directly in this thread, it would be helpful in further diagnosing the issue.

Please note that you're using Dify version 0.5.4, and the latest version is 0.5.5. It might be worth updating to the latest version to see if the issue persists, as it could have been addressed in the latest release.

I hope this helps! If you have any more questions or need further clarification, feel free to ask.


About Dosu

This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

@takatost
Collaborator

You can try adjusting the Num Predict model parameter to an appropriate token size for the text you want generated.

@ashmanpan
Author

ashmanpan commented Feb 18, 2024

I have already deployed Dify on Docker containers. Which specific container has this file, and how can I edit it, or what is the way to change this?

Another question: why such a low number as the default? This is a local LLM, so nobody is going to be charged for a large number of tokens, so why?

@ashmanpan
Author

ashmanpan commented Feb 18, 2024

[4 screenshots attached]

@ashmanpan
Author

The snapshot below is after upgrading to 0.5.5.

[screenshot attached]

@ashmanpan
Author

ashmanpan commented Feb 18, 2024

[screenshot attached]

The first two models are running locally on Ollama - you can clearly see the difference.

@takatost
Collaborator

[screenshot attached]

This number is set according to the default values provided in the Ollama API documentation; you can change this value here.

@ashmanpan
Author

ashmanpan commented Feb 19, 2024 via email

@akhilmadhumenon

When we set a limit on num_predict, the sentences get cut off midway and the model does not complete them. How can we ensure that it completes the sentence while staying within the maximum token limit?

@sivertheisholt

When we set a limit on num_predict, the sentences get cut off midway and the model does not complete them. How can we ensure that it completes the sentence while staying within the maximum token limit?

Did you figure this out? I have the same problem: responses are cut off mid-sentence when num_predict is set to 64.
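
One possible workaround, sketched below under the assumption of a local Ollama server on the default port with the llama2 tag (this is not a built-in Dify or Ollama feature, and the prompt is a placeholder): treat num_predict as a hard cap, detect when the capped reply stops mid-sentence, and send a single follow-up request asking the model to finish it.

import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default endpoint

def generate(prompt: str, num_predict: int) -> str:
    # Sketch only: one non-streaming request with a hard cap on generated tokens.
    payload = {
        "model": "llama2",
        "prompt": prompt,
        "stream": False,
        "options": {"num_predict": num_predict},
    }
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

question = "Summarize how DNS resolution works."  # placeholder question
text = generate(question, 64)
if not text.rstrip().endswith((".", "!", "?")):
    # The capped reply ended mid-sentence: ask once more for just the ending.
    follow_up = (
        f"{question}\n\nPartial answer so far:\n{text}\n\n"
        "Continue the partial answer and finish the unfinished sentence."
    )
    text += generate(follow_up, 64)
print(text)

Note that this still caps each individual request, so the combined output can exceed 64 tokens; whether that is acceptable depends on why the limit was set in the first place.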
