EXAONE Deep 2.4B unsupported? #12448

@0wwafa

Description

build: 4910 (d9a14523) with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: llama backend init
main: load the model and apply lora adapter, if any
llama_model_loader: loaded meta data with 35 key-value pairs and 273 tensors from /content/EXAONE-Deep-2.4B.q8_0.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = exaone
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = EXAONE Deep 2.4B
llama_model_loader: - kv   3:                           general.basename str              = EXAONE-Deep
llama_model_loader: - kv   4:                         general.size_label str              = 2.4B
llama_model_loader: - kv   5:                            general.license str              = other
llama_model_loader: - kv   6:                       general.license.name str              = exaone
llama_model_loader: - kv   7:                       general.license.link str              = LICENSE
llama_model_loader: - kv   8:                   general.base_model.count u32              = 1
llama_model_loader: - kv   9:                  general.base_model.0.name str              = EXAONE 3.5 2.4B Instruct
llama_model_loader: - kv  10:          general.base_model.0.organization str              = LGAI EXAONE
llama_model_loader: - kv  11:              general.base_model.0.repo_url str              = https://huggingface.co/LGAI-EXAONE/EX...
llama_model_loader: - kv  12:                               general.tags arr[str,4]       = ["lg-ai", "exaone", "exaone-deep", "t...
llama_model_loader: - kv  13:                          general.languages arr[str,2]       = ["en", "ko"]
llama_model_loader: - kv  14:                    exaone.embedding_length u32              = 2560
llama_model_loader: - kv  15:                exaone.attention.head_count u32              = 32
llama_model_loader: - kv  16:             exaone.attention.head_count_kv u32              = 8
llama_model_loader: - kv  17:                      exaone.context_length u32              = 32768
llama_model_loader: - kv  18:    exaone.attention.layer_norm_rms_epsilon f32              = 0.000010
llama_model_loader: - kv  19:                 exaone.feed_forward_length u32              = 7168
llama_model_loader: - kv  20:                         exaone.block_count u32              = 30
llama_model_loader: - kv  21:                      exaone.rope.freq_base f32              = 1000000.000000
llama_model_loader: - kv  22:                exaone.rope.dimension_count u32              = 80
llama_model_loader: - kv  23:                       tokenizer.ggml.model str              = gpt2
llama_model_loader: - kv  24:                         tokenizer.ggml.pre str              = exaone
llama_model_loader: - kv  25:                      tokenizer.ggml.tokens arr[str,102400]  = ["[PAD]", "[BOS]", "[EOS]", "[UNK]", ...
llama_model_loader: - kv  26:                  tokenizer.ggml.token_type arr[i32,102400]  = [3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, ...
llama_model_loader: - kv  27:                      tokenizer.ggml.merges arr[str,101782]  = ["t h", "Ġ a", "Ġ í", "i n", "Ġ t...
llama_model_loader: - kv  28:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  29:                tokenizer.ggml.eos_token_id u32              = 361
llama_model_loader: - kv  30:            tokenizer.ggml.unknown_token_id u32              = 3
llama_model_loader: - kv  31:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  32:                    tokenizer.chat_template str              = {% for message in messages %}{% if lo...
llama_model_loader: - kv  33:               general.quantization_version u32              = 2
llama_model_loader: - kv  34:                          general.file_type u32              = 7
llama_model_loader: - type  f32:   62 tensors
llama_model_loader: - type  f16:    1 tensors
llama_model_loader: - type q8_0:  210 tensors
print_info: file format = GGUF V3 (latest)
print_info: file type   = Q8_0
print_info: file size   = 2.61 GiB (9.32 BPW) 
load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
load: special tokens cache size = 362
load: token to piece cache size = 0.6622 MB
print_info: arch             = exaone
print_info: vocab_only       = 0
print_info: n_ctx_train      = 32768
print_info: n_embd           = 2560
print_info: n_layer          = 30
print_info: n_head           = 32
print_info: n_head_kv        = 8
print_info: n_rot            = 80
print_info: n_swa            = 0
print_info: n_swa_pattern    = 1
print_info: n_embd_head_k    = 80
print_info: n_embd_head_v    = 80
print_info: n_gqa            = 4
print_info: n_embd_k_gqa     = 640
print_info: n_embd_v_gqa     = 640
print_info: f_norm_eps       = 0.0e+00
print_info: f_norm_rms_eps   = 1.0e-05
print_info: f_clamp_kqv      = 0.0e+00
print_info: f_max_alibi_bias = 0.0e+00
print_info: f_logit_scale    = 0.0e+00
print_info: f_attn_scale     = 0.0e+00
print_info: n_ff             = 7168
print_info: n_expert         = 0
print_info: n_expert_used    = 0
print_info: causal attn      = 1
print_info: pooling type     = 0
print_info: rope type        = 2
print_info: rope scaling     = linear
print_info: freq_base_train  = 1000000.0
print_info: freq_scale_train = 1
print_info: n_ctx_orig_yarn  = 32768
print_info: rope_finetuned   = unknown
print_info: ssm_d_conv       = 0
print_info: ssm_d_inner      = 0
print_info: ssm_d_state      = 0
print_info: ssm_dt_rank      = 0
print_info: ssm_dt_b_c_rms   = 0
print_info: model type       = ?B
print_info: model params     = 2.41 B
print_info: general.name     = EXAONE Deep 2.4B
print_info: vocab type       = BPE
print_info: n_vocab          = 102400
print_info: n_merges         = 101782
print_info: BOS token        = 1 '[BOS]'
print_info: EOS token        = 361 '[|endofturn|]'
print_info: EOT token        = 42 '<|endoftext|>'
print_info: UNK token        = 3 '[UNK]'
print_info: PAD token        = 0 '[PAD]'
print_info: LF token         = 560 'Ċ'
print_info: EOG token        = 42 '<|endoftext|>'
print_info: EOG token        = 361 '[|endofturn|]'
print_info: max token length = 48
load_tensors: loading model tensors, this can take a while... (mmap = true)
llama_model_load: error loading model: missing tensor 'output.weight'
llama_model_load_from_file_impl: failed to load model
common_init_from_params: failed to load model '/content/EXAONE-Deep-2.4B.q8_0.gguf'
main: error: unable to load model
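
For reference, the exact command is not shown in the log above, but a minimal invocation that reproduces the load failure (flags are illustrative, not the original ones) would be something like:

    ./llama-cli -m /content/EXAONE-Deep-2.4B.q8_0.gguf -p "hello" -n 32

If the GGUF was converted locally rather than downloaded, the conversion step was presumably something along these lines (local paths are assumptions):

    python convert_hf_to_gguf.py ./EXAONE-Deep-2.4B --outtype q8_0 --outfile EXAONE-Deep-2.4B.q8_0.gguf

Either way, the load aborts during load_tensors with "missing tensor 'output.weight'" before any generation starts.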
