I tried to load NeuralReyna, a relatively small model, but I still got an out-of-memory error.
Should ALL models be chunked, even ones smaller than 2 GB?
Somewhat off-topic, but perhaps useful for others: I tried to do this and chunk NeuralReyna. Interestingly, it refused to be chunked into very small (100 MB) parts:
./gguf-split --split-max-size 100M ./neuralreyna-mini-1.8b-v0.3.q5_k_m.gguf neuralreyna-mini-1.8b-v0.3.q5_k_m
error: one of splits have 0 tensors. Maybe size or tensors limit is too small
Even 200 MB was too small. Luckily, 250 MB worked.
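For anyone scripting this: gguf-split writes its output shards with zero-padded five-digit indices, `<prefix>-00001-of-0000N.gguf`. A small helper (hypothetical, not part of llama.cpp) that predicts those names, e.g. for building a download URL list:

```python
def shard_names(prefix: str, n_shards: int) -> list[str]:
    """Predict the file names gguf-split produces for a given output prefix.

    Shards are numbered 1..N with zero-padded 5-digit indices, e.g.
    neuralreyna-mini-1.8b-v0.3.q5_k_m-00001-of-00006.gguf
    """
    return [
        f"{prefix}-{i:05d}-of-{n_shards:05d}.gguf"
        for i in range(1, n_shards + 1)
    ]
```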
If you get an OOM error from ggml, it means the browser won't give you more RAM. You're probably also loading other models or instances of wllama at the same time.
A chunked model won't help in this case, since you've already used up all the available RAM.
I noticed a significant benefit from splitting the models, mostly due to Safari's cache size constraints.
Mobile Safari has a cache limit of 300 MB, while Desktop Safari's limit is under 1 GB.
If the model size exceeds the limit, the user has to re-download the model after refreshing the page.
Besides that, as mentioned in the README, splitting helps reduce the time required to download the model.
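A quick back-of-the-envelope check of the numbers above. The 300 MB and sub-1 GB cache limits are quoted from this thread; the ~1.3 GB model size is an assumption for a 1.8B q5_k_m file, not a measured value:

```python
import math

# Safari cache limits quoted in the discussion above (approximate).
MOBILE_SAFARI_CACHE_MB = 300
DESKTOP_SAFARI_CACHE_MB = 1000  # "less than 1 GB"

# Assumed size of a 1.8B q5_k_m GGUF file; adjust to the real file size.
MODEL_SIZE_MB = 1300
SHARD_SIZE_MB = 250  # the smallest --split-max-size that worked above


def n_shards(model_mb: int, shard_mb: int) -> int:
    """Number of shards gguf-split would produce at a given max shard size."""
    return math.ceil(model_mb / shard_mb)


# Each 250 MB shard stays under the Mobile Safari cache limit,
# while the unsplit ~1.3 GB file would exceed even the desktop limit.
assert SHARD_SIZE_MB <= MOBILE_SAFARI_CACHE_MB
assert MODEL_SIZE_MB > DESKTOP_SAFARI_CACHE_MB
print(n_shards(MODEL_SIZE_MB, SHARD_SIZE_MB))  # 6
```

So under these assumptions the model downloads as six cacheable chunks instead of one file that Safari silently evicts.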