This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

65B model does not run #65

Closed
ghost opened this issue Mar 23, 2023 · 3 comments

Comments

@ghost

ghost commented Mar 23, 2023

Doing my part.

With llama-rs as of 69c9229, plus the patches from #59: https://github.com/jempabroni/llama-rs.

This is not a RAM issue; llama.cpp runs all models fine, including 65B.

(screenshot attached)

7B runs:

[pem@jabroni llama-rs]$ cargo run --release -- -m ../llama.cpp/models/7B/ggml-model-q4_0.bin -p "My name is Inigo Montoya. You killed my father. Prepare"
...
[2023-03-23T13:03:04Z INFO  llama_cli] Loaded tensor 288/291
[2023-03-23T13:03:04Z INFO  llama_cli] Loading of '../llama.cpp/models/7B/ggml-model-q4_0.bin' complete
[2023-03-23T13:03:04Z INFO  llama_cli] Model size = 4017.27 MB / num tensors = 291
[2023-03-23T13:03:04Z INFO  llama_cli] Model fully loaded!
My name is Inigo Montoya. You killed my father. Prepare to die, "Django"
In 1850s Georgia the Confederate South was in its infancy and undergoing drastic changes from a culture of slavery towards one where freedom for all people became paramount. The Civil War ensued as many^C

13B runs:

[pem@jabroni llama-rs]$ cargo run --release -- -m ../llama.cpp/models/13B/ggml-model-q4_0.bin -p "My name is Inigo Montoya. You killed my father. Prepare"
...
[2023-03-23T13:10:48Z INFO  llama_cli] Loaded tensor 360/363
[2023-03-23T13:10:48Z INFO  llama_cli] Loading of '../llama.cpp/models/13B/ggml-model-q4_0.bin.1' complete
[2023-03-23T13:10:48Z INFO  llama_cli] Model size = 3880.49 MB / num tensors = 363
[2023-03-23T13:10:48Z INFO  llama_cli] Model fully loaded!
My name is Inigo Montoya. You killed my father. Prepare para morir!
Escrito por: Sally Dixon, Michael McBain.
Producido por: Aaron Herbertsen, Kevin Loader for Channel^C

30B runs:

[pem@jabroni llama-rs]$ cargo run --release -- -m ../llama.cpp/models/30B/ggml-model-q4_0.bin -p "My name is Inigo Montoya. You killed my father. Prepare"
[2023-03-23T13:18:53Z INFO  llama_cli] Loaded tensor 536/543
[2023-03-23T13:18:53Z INFO  llama_cli] Loading of '../llama.cpp/models/30B/ggml-model-q4_0.bin.3' complete
[2023-03-23T13:18:53Z INFO  llama_cli] Model size = 4850.14 MB / num tensors = 543
[2023-03-23T13:18:53Z INFO  llama_cli] Model fully loaded!
My name is Inigo Montoya. You killed my father. Prepare para morir!
Sorry, it's been awhile since I watched The Princess Bride...but that quote kept popping up in my head while reading about the latest FDA crackdown on pharmaceutical ad spending targeting consumers: this time it was for J&J and Bayer with their Xarelto (direct-acting oral anticoagulant) blood thinner drug campaign -- which also included some slick print advertising (below). And, as usual these days, the company used social media to drive traffic online where they could "tell you more" about how taking their product can help prevent stroke among certain people who have atrial fibrillation.^C

65B does not:

[pem@jabroni llama-rs]$ cargo run --release -- -m ../llama.cpp/models/65B/ggml-model-q4_0.bin -p "My name is Inigo Montoya. You killed my father. Prepare"
...
[2023-03-23T13:02:10Z INFO  llama_cli] Loaded tensor 720/723
[2023-03-23T13:02:10Z INFO  llama_cli] Loading of '../llama.cpp/models/65B/ggml-model-q4_0.bin.7' complete
[2023-03-23T13:02:10Z INFO  llama_cli] Model size = 4869.09 MB / num tensors = 723
[2023-03-23T13:02:10Z INFO  llama_cli] Model fully loaded!
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 536930528, available 536870912)
thread 'main' panicked at 'Should not be null', llama-rs/src/ggml.rs:41:36
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
@mguinhos

mguinhos commented Mar 23, 2023

Found a possible source of the problem:

let mut buf_size = 512 * 1024 * 1024; // <- this hardcoded value here = 536870912
if session.mem_per_token > 0 && session.mem_per_token * n > buf_size {
    // add 10% to account for ggml object overhead
    buf_size = (1.1f64 * session.mem_per_token as f64 * n as f64) as usize;
};
let ctx0 = ggml::Context::init(buf_size);
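In the snippet above, the failing run needed 536930528 bytes but the pool was exactly the 512 MiB default (536870912 bytes), so the 10% overhead branch evidently never kicked in for this eval. A minimal, hypothetical sketch of one way the sizing logic could be restructured so the overhead margin is always applied (this is not the actual patch that landed in #66; `eval_buf_size` and its constants are illustrative names):

```rust
// Hypothetical sketch: compute the eval buffer size so the 10% ggml
// object overhead is always included once a per-token estimate exists,
// and never shrink below the 512 MiB default.
fn eval_buf_size(mem_per_token: usize, n: usize) -> usize {
    const DEFAULT: usize = 512 * 1024 * 1024; // 536_870_912 bytes

    if mem_per_token == 0 {
        // First eval call: no per-token estimate available yet,
        // so fall back to the default pool size.
        return DEFAULT;
    }

    // Add 10% to account for ggml object overhead, then clamp to
    // at least the default so small estimates don't undersize the pool.
    let padded = (1.1f64 * mem_per_token as f64 * n as f64) as usize;
    padded.max(DEFAULT)
}
```

With this shape, a context created via `ggml::Context::init(eval_buf_size(session.mem_per_token, n))` would get headroom on every call rather than only when the raw estimate already exceeds the default; whether that matches the fix actually merged is not shown in this thread.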

@mguinhos

mguinhos commented Mar 23, 2023

Waiting for the 65B model to finish quantizing so I can attempt a fix.

For now, you should join the Discord server; the link is in the README (the chat badge).

@ghost ghost mentioned this issue Mar 23, 2023
@setzer22
Collaborator

setzer22 commented Mar 23, 2023

Hi! 👋 Thanks for reporting and fixing this. With the changed value, it appears 65B loads fine now after #66. But please reopen if you still have issues.
