This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

65B model does not run #65

Closed
ghost opened this issue Mar 23, 2023 · 3 comments

Comments

@ghost

ghost commented Mar 23, 2023

Doing my part.

With llama-rs as of 69c9229, plus the patches from #59: https://github.com/jempabroni/llama-rs.

This is not a RAM issue; llama.cpp runs all models fine, including 65B.

(screenshot attached)

7B runs:

[pem@jabroni llama-rs]$ cargo run --release -- -m ../llama.cpp/models/7B/ggml-model-q4_0.bin -p "My name is Inigo Montoya. You killed my father. Prepare"
...
[2023-03-23T13:03:04Z INFO  llama_cli] Loaded tensor 288/291
[2023-03-23T13:03:04Z INFO  llama_cli] Loading of '../llama.cpp/models/7B/ggml-model-q4_0.bin' complete
[2023-03-23T13:03:04Z INFO  llama_cli] Model size = 4017.27 MB / num tensors = 291
[2023-03-23T13:03:04Z INFO  llama_cli] Model fully loaded!
My name is Inigo Montoya. You killed my father. Prepare to die, "Django"
In 1850s Georgia the Confederate South was in its infancy and undergoing drastic changes from a culture of slavery towards one where freedom for all people became paramount. The Civil War ensued as many^C

13B runs:

[pem@jabroni llama-rs]$ cargo run --release -- -m ../llama.cpp/models/13B/ggml-model-q4_0.bin -p "My name is Inigo Montoya. You killed my father. Prepare"
...
[2023-03-23T13:10:48Z INFO  llama_cli] Loaded tensor 360/363
[2023-03-23T13:10:48Z INFO  llama_cli] Loading of '../llama.cpp/models/13B/ggml-model-q4_0.bin.1' complete
[2023-03-23T13:10:48Z INFO  llama_cli] Model size = 3880.49 MB / num tensors = 363
[2023-03-23T13:10:48Z INFO  llama_cli] Model fully loaded!
My name is Inigo Montoya. You killed my father. Prepare para morir!
Escrito por: Sally Dixon, Michael McBain.
Producido por: Aaron Herbertsen, Kevin Loader for Channel^C

30B runs:

[pem@jabroni llama-rs]$ cargo run --release -- -m ../llama.cpp/models/30B/ggml-model-q4_0.bin -p "My name is Inigo Montoya. You killed my father. Prepare"
[2023-03-23T13:18:53Z INFO  llama_cli] Loaded tensor 536/543
[2023-03-23T13:18:53Z INFO  llama_cli] Loading of '../llama.cpp/models/30B/ggml-model-q4_0.bin.3' complete
[2023-03-23T13:18:53Z INFO  llama_cli] Model size = 4850.14 MB / num tensors = 543
[2023-03-23T13:18:53Z INFO  llama_cli] Model fully loaded!
My name is Inigo Montoya. You killed my father. Prepare para morir!
Sorry, it's been awhile since I watched The Princess Bride...but that quote kept popping up in my head while reading about the latest FDA crackdown on pharmaceutical ad spending targeting consumers: this time it was for J&J and Bayer with their Xarelto (direct-acting oral anticoagulant) blood thinner drug campaign -- which also included some slick print advertising (below). And, as usual these days, the company used social media to drive traffic online where they could "tell you more" about how taking their product can help prevent stroke among certain people who have atrial fibrillation.^C

65B does not:

[pem@jabroni llama-rs]$ cargo run --release -- -m ../llama.cpp/models/65B/ggml-model-q4_0.bin -p "My name is Inigo Montoya. You killed my father. Prepare"
...
[2023-03-23T13:02:10Z INFO  llama_cli] Loaded tensor 720/723
[2023-03-23T13:02:10Z INFO  llama_cli] Loading of '../llama.cpp/models/65B/ggml-model-q4_0.bin.7' complete
[2023-03-23T13:02:10Z INFO  llama_cli] Model size = 4869.09 MB / num tensors = 723
[2023-03-23T13:02:10Z INFO  llama_cli] Model fully loaded!
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 536930528, available 536870912)
thread 'main' panicked at 'Should not be null', llama-rs/src/ggml.rs:41:36
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
@mguinhos

mguinhos commented Mar 23, 2023

Found a possible source of the problem:

let mut buf_size = 512 * 1024 * 1024; // <- this hardcoded value here = 536870912
if session.mem_per_token > 0 && session.mem_per_token * n > buf_size {
    // add 10% to account for ggml object overhead
    buf_size = (1.1f64 * session.mem_per_token as f64 * n as f64) as usize;
};
let ctx0 = ggml::Context::init(buf_size);
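In the snippet above, the failing run needed 536930528 bytes but the pool was exactly the 512 MiB default (536870912 bytes), so the 10% overhead branch evidently never kicked in for this eval. A minimal, hypothetical sketch of one way the sizing logic could be restructured so the overhead margin is always applied (this is not the actual patch that landed in #66; `eval_buf_size` and its constants are illustrative names):

```rust
// Hypothetical sketch: compute the eval buffer size so the 10% ggml
// object overhead is always included once a per-token estimate exists,
// and never shrink below the 512 MiB default.
fn eval_buf_size(mem_per_token: usize, n: usize) -> usize {
    const DEFAULT: usize = 512 * 1024 * 1024; // 536_870_912 bytes

    if mem_per_token == 0 {
        // First eval call: no per-token estimate available yet,
        // so fall back to the default pool size.
        return DEFAULT;
    }

    // Add 10% to account for ggml object overhead, then clamp to
    // at least the default so small estimates don't undersize the pool.
    let padded = (1.1f64 * mem_per_token as f64 * n as f64) as usize;
    padded.max(DEFAULT)
}
```

With this shape, a context created via `ggml::Context::init(eval_buf_size(session.mem_per_token, n))` would get headroom on every call rather than only when the raw estimate already exceeds the default; whether that matches the fix actually merged is not shown in this thread.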

@mguinhos

mguinhos commented Mar 23, 2023

Waiting for the 65B model to finish quantizing so I can attempt a fix.

For now, you should join the Discord server; the link is in the README (the chat badge).

@ghost ghost mentioned this issue Mar 23, 2023
@setzer22
Collaborator

setzer22 commented Mar 23, 2023

Hi! 👋 Thanks for reporting and fixing this. With the changed value, it appears 65B loads fine now after #66. But please reopen if you still have issues.
