30 seconds, sometimes even 1 minute before it can copy the text produced by the AI. #72
Comments
I've never had this happen to me, so it might be tricky to debug. Can you give some insight? Does your machine have enough memory to run the models you are running?
16 GB DDR3 RAM, an AVX-only CPU, and an RTX 3060 with 12 GB VRAM. In practice the models respond very fast; for example, dolphin-starcoder2 has 15B parameters and runs quickly without problems. My guess is that it is when the model is being unloaded from memory that oterm waits for the process to finish. It only happens in some cases where the responses are very long (even when the responses themselves arrive fast). One more detail: if ollama has been idle for 5-10 minutes, the first response takes about 30 seconds (probably while the model loads into GPU VRAM).
If you are referring to the delay that happens when you load a new model, then this is normal. Let me know if this is the case, so that I can close the ticket.
It's probably related to the fact that the model takes a while to load into VRAM (this is normal and probably unavoidable), and it also takes a while to unload from VRAM (this is also normal, I assume).
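If the delay really is the model being evicted from and reloaded into VRAM, Ollama's `keep_alive` request parameter controls how long a model stays resident after a reply. A minimal sketch of such a request payload, assuming the default local endpoint (the model name and duration here are just illustrative values taken from this thread):

```python
import json

# Sketch (assumption): sending keep_alive with a generate request asks
# Ollama to keep the model loaded in VRAM for that long after replying,
# which should avoid the repeated load/unload delay described above.
payload = {
    "model": "dolphin-starcoder2",  # model mentioned in the report above
    "prompt": "hello",
    "keep_alive": "30m",            # keep the model resident for 30 minutes
}
body = json.dumps(payload)
# The request would be POSTed (e.g. with urllib or curl) to the default
# Ollama endpoint: http://localhost:11434/api/generate
print(body)
```

The same effect can reportedly be achieved server-wide via the `OLLAMA_KEEP_ALIVE` environment variable, which may be simpler than changing the client.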
I am afraid I can't reproduce it. Granted, I have a pretty beefy M2 with 96 GB available.
I've unsuccessfully attempted to reproduce this behavior using KDE Plasma 6 Konsole with a 16 GB VRAM / 64 GB DRAM configuration. Neither fast GPU inference nor slow CPU inference seems to make much of a difference for me. I have, however, noticed that if I click fast enough, it does indeed copy incomplete replies when the mouse click registers in between renderings of the Ollama stream output. Could it be related to your terminal emulator software (or are you using a native console)?
Will close this as there is no more feedback and it is not reproducible. |
When the bot completes a response, especially a long one, at least 30 seconds (sometimes a full minute) pass before I can copy the text to the clipboard by clicking on it. Scrolling is also very slow. It seems related to memory being released, or something similar, after the AI has finished a response, but I don't understand why that should prevent interacting with the text or even scrolling it, given that the AI's part of the operation is already complete. I was wondering whether the two things could be made independent, or at least arranged so that once the AI has finished writing, oterm immediately becomes available for other user operations.