This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

llama-cli: Could not load model: InvalidMagic { path: ... } #59

Closed
mguinhos opened this issue Mar 22, 2023 · 11 comments

Comments

@mguinhos

mguinhos commented Mar 22, 2023

The model runs successfully on llama.cpp but not in llama-rs.

Command:

cargo run --release -- -m C:\Users\Usuário\Downloads\LLaMA\7B\ggml-model-q4_0.bin -p "Tell me how cool the Rust programming language is:"
PS C:\Users\Usuário\Desktop\llama-rs> cargo run --release -- -m C:\Users\Usuário\Downloads\LLaMA\7B\ggml-model-q4_0.bin -p "Tell me how cool the Rust programming language is:"
    Finished release [optimized] target(s) in 2.83s
     Running `target\release\llama-cli.exe -m C:\Users\Usuário\Downloads\LLaMA\7B\ggml-model-q4_0.bin -p "Tell me how cool the Rust programming language is:"`
thread 'main' panicked at 'Could not load model: InvalidMagic { path: "C:\\Users\\Usuário\\Downloads\\LLaMA\\7B\\ggml-model-q4_0.bin" }', llama-cli\src\main.rs:147:10
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: process didn't exit successfully: `target\release\llama-cli.exe -m C:\Users\Usuário\Downloads\LLaMA\7B\ggml-model-q4_0.bin -p "Tell me how cool the Rust programming language is:"` (exit code: 101)
@philpax
Collaborator

philpax commented Mar 22, 2023

ggerganov/llama.cpp#252 changed the model format, and we're not compatible with it yet. Thanks for spotting this - we'll need to expedite the fix.

In the meantime, you can re-quantize the model with a version of llama.cpp that predates that, or find a quantized model floating around the internet from before then.
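Roughly, the older-tree route looks like this (a sketch only; the script names and arguments are as I recall from the llama.cpp README at the time, and the commit placeholder needs to be filled in with any commit that predates that PR):

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout <commit-before-252>
make
python3 convert-pth-to-ggml.py /path/to/LLaMA/7B/ 1
./quantize /path/to/LLaMA/7B/ggml-model-f16.bin /path/to/LLaMA/7B/ggml-model-q4_0.bin 2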

@mguinhos
Author

Hmm... thanks, I will try to re-quantize the model with a previous version!

@mguinhos
Author

> ggerganov/llama.cpp#252 changed the model format, and we're not compatible with it yet. Thanks for spotting this - we'll need to expedite the fix.
>
> In the meantime, you can re-quantize the model with a version of llama.cpp that predates that, or find a quantized model floating around the internet from before then.

Got it! Tried with the previous alpaca version that I had!

@philpax
Collaborator

philpax commented Mar 22, 2023

Great! We'll leave this issue open as a reminder that we'll need to update to handle the new format.

@mguinhos
Author

Changing the code a bit is sufficient to handle the new versioned file format:

  #[error("file is pre-versioned, generate another please! at {path:?}")]
  PreVersioned { path: PathBuf },
  #[error("invalid magic number for {path:?}")]
  InvalidMagic { path: PathBuf },
  #[error("invalid version number for {path:?}")]
  InvalidVersion { path: PathBuf },

...

// Verify magic
{
    let magic = read_i32(&mut reader)?;
    if magic == 0x67676d6c {
        return Err(LoadError::PreVersioned {
            path: main_path.to_owned(),
        });
    }
    
    if magic != 0x67676d66 {
        return Err(LoadError::InvalidMagic {
            path: main_path.to_owned(),
        });
    }
}

// Verify the version
{
    let format_version = read_i32(&mut reader)?;
    if format_version != 1 {
        return Err(LoadError::InvalidVersion {
            path: main_path.to_owned(),
        });
    }
}

...

// Load vocabulary
let mut vocab = Vocabulary::default();
for i in 0..hparams.n_vocab {
    let len = read_i32(&mut reader)?;
    if let Ok(word) = read_string(&mut reader, len as usize) {
        vocab.mapping.push(word);
    } else {
        load_progress_callback(LoadProgress::BadToken {
            index: i.try_into()?,
        });
        vocab.mapping.push("�".to_string());
    }

    // The versioned format stores a score after each token string.
    let score: f32 = read_i32(&mut reader)? as f32;
    vocab.score.push(score);
}

It works without issues.
But I don't know if it's sufficient; nothing panicked, and it did the inference.

@mguinhos
Author

mguinhos commented Mar 22, 2023

I think the change in the binary format was just the addition of the version number and of a per-token score in the vocabulary, but I am not sure.
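
Concretely, here is how I read the two layouts (a sketch only; the constant names below are mine, not from llama.cpp or llama-rs):

// Old (unversioned) file:
//   magic: i32 = 0x67676d6c ("ggml")
//   hparams, vocab entries (len: i32 + word bytes), tensors...
//
// New (versioned) file:
//   magic: i32 = 0x67676d66 ("ggmf")
//   format_version: i32 = 1
//   hparams, vocab entries (len: i32 + word bytes + score: f32), tensors...

pub const MAGIC_UNVERSIONED: i32 = 0x67676d6c;
pub const MAGIC_VERSIONED: i32 = 0x67676d66;
pub const FORMAT_VERSION: i32 = 1;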

@ghost

ghost commented Mar 23, 2023

After fixing this bug, and adding score: Vec<f32> to the Vocabulary struct, the 7B model works, but 65B does not. It crashes with an allocation error in the ggml library.
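
For reference, the struct change is roughly this (a sketch, not the exact llama-rs definition):

// Sketch only; the actual Vocabulary in llama-rs may carry more fields.
#[derive(Default)]
pub struct Vocabulary {
    /// Token id -> token string.
    pub mapping: Vec<String>,
    /// Per-token score introduced by the versioned model format.
    pub score: Vec<f32>,
}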

@mguinhos
Author

Related pull request: #61

@RoyVorster
Contributor

RoyVorster commented Mar 23, 2023

@mguinhos thanks for the reference. I hadn't even seen the issue. Feel free to modify the PR. I was just running llama-rs for the first time, ran into this issue, and figured it'd be best to share the small fixes.

@RoyVorster
Contributor

Can probably close this issue now?

@philpax philpax closed this as completed Mar 24, 2023
@vv9k

vv9k commented Apr 5, 2023

I'm using the current main branch of llama-rs. I got the 7B model, used the Python script to convert it, then quantized it with the latest commit of llama.cpp, and I'm getting this error:

thread 'main' panicked at 'Could not load model: InvalidMagic { path: "LLaMA/7B/ggml-model-q4_0.bin" }', llama-cli/src/main.rs:206:6

llama.cpp works with the same model:

❯ ./main -m LLaMA/7B/ggml-model-q4_0.bin -p "test"
main: seed = 1680698608
llama_model_load: loading model from 'LLaMA/7B/ggml-model-q4_0.bin' - please wait ...
llama_model_load: n_vocab = 32000
llama_model_load: n_ctx   = 512
llama_model_load: n_embd  = 4096
llama_model_load: n_mult  = 256
llama_model_load: n_head  = 32
llama_model_load: n_layer = 32
llama_model_load: n_rot   = 128
llama_model_load: f16     = 2
llama_model_load: n_ff    = 11008
llama_model_load: n_parts = 1
llama_model_load: type    = 1
llama_model_load: ggml map size = 4017.70 MB
llama_model_load: ggml ctx size =  81.25 KB
llama_model_load: mem required  = 5809.78 MB (+ 1026.00 MB per state)
llama_model_load: loading tensors from '/home/wojtek/special_downloads/LLaMA/7B/ggml-model-q4_0.bin'
llama_model_load: model size =  4017.27 MB / num tensors = 291
llama_init_from_file: kv self size  =  256.00 MB

system_info: n_threads = 12 / 12 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
sampling: temp = 0.800000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.100000
generate: n_ctx = 512, n_batch = 8, n_predict = 128, n_keep = 0


 test_suite = True

# The suite of tests to see running.
test_suite( 'TestSuite' ) [end of text]

llama_print_timings:        load time =  1165.30 ms
llama_print_timings:      sample time =    14.24 ms /    27 runs   (    0.53 ms per run)
llama_print_timings: prompt eval time =   763.14 ms /     2 tokens (  381.57 ms per token)
llama_print_timings:        eval time =  4288.87 ms /    26 runs   (  164.96 ms per run)
llama_print_timings:       total time =  5468.84 ms

EDIT:

I managed to get it to work by reconverting and requantizing the model with llama.cpp commit 5cb63e2, from before the format change. I'm using the current main of llama-rs, so it should work with the newer format, but I'm still getting the InvalidMagic error above.
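
In case it helps anyone else debug this, here is a small sketch for checking which magic a file actually carries (the two known values are the ones from the snippet earlier in this thread, read as a little-endian i32; anything else is presumably a newer llama.cpp format):

use std::fs::File;
use std::io::Read;

// Print the first four bytes of a model file as the loader would read them.
fn main() -> std::io::Result<()> {
    let path = std::env::args().nth(1).expect("usage: peek-magic <model path>");
    let mut buf = [0u8; 4];
    File::open(&path)?.read_exact(&mut buf)?;
    match u32::from_le_bytes(buf) {
        0x67676d6c => println!("{path}: old unversioned ggml format"),
        0x67676d66 => println!("{path}: versioned ggmf format (an i32 version follows)"),
        other => println!("{path}: unknown magic {other:#010x}"),
    }
    Ok(())
}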
