Unable to make imatrix (and likely quant) for nvidia's ChatQA-1.5 8B #7046

Closed
bartowski1182 opened this issue May 2, 2024 · 8 comments · Fixed by #7051

Comments

@bartowski1182
Contributor

bartowski1182 commented May 2, 2024

This model: https://huggingface.co/nvidia/ChatQA-1.5-8B

Conversion worked without issue, but when it's time to calculate the imatrix I see:

llama_model_load: error loading model: done_getting_tensors: wrong number of tensors; expected 323, got 291

Searching previous issues, slaren had suggested running gguf-dump.py for a case like this, so here's the output:

* Loading: /models/ChatQA-1.5-8B-GGUF/ChatQA-1.5-8B-fp16.gguf
* File is LITTLE endian, script is running on a LITTLE endian host.
* Dumping 24 key/value pair(s)
      1: UINT32     |        1 | GGUF.version = 3
      2: UINT64     |        1 | GGUF.tensor_count = 323
      3: UINT64     |        1 | GGUF.kv_count = 21
      4: STRING     |        1 | general.architecture = 'llama'
      5: STRING     |        1 | general.name = 'ChatQA-1.5-8B'
      6: UINT32     |        1 | llama.block_count = 32
      7: UINT32     |        1 | llama.context_length = 8192
      8: UINT32     |        1 | llama.embedding_length = 4096
      9: UINT32     |        1 | llama.feed_forward_length = 14336
     10: UINT32     |        1 | llama.attention.head_count = 32
     11: UINT32     |        1 | llama.attention.head_count_kv = 8
     12: FLOAT32    |        1 | llama.rope.freq_base = 500000.0
     13: FLOAT32    |        1 | llama.attention.layer_norm_rms_epsilon = 9.999999747378752e-06
     14: UINT32     |        1 | general.file_type = 1
     15: UINT32     |        1 | llama.vocab_size = 128256
     16: UINT32     |        1 | llama.rope.dimension_count = 128
     17: STRING     |        1 | tokenizer.ggml.model = 'gpt2'
     18: STRING     |        1 | tokenizer.ggml.pre = 'llama-bpe'
     19: [STRING]   |   128256 | tokenizer.ggml.tokens
     20: [INT32]    |   128256 | tokenizer.ggml.token_type
     21: [STRING]   |   280147 | tokenizer.ggml.merges
     22: UINT32     |        1 | tokenizer.ggml.bos_token_id = 128000
     23: UINT32     |        1 | tokenizer.ggml.eos_token_id = 128001
     24: STRING     |        1 | tokenizer.chat_template = '{% set loop_messages = messages %}{% for message in loop_mes'
* Dumping 323 tensor(s)
      1:  525336576 |  4096, 128256,     1,     1 | F16     | token_embd.weight
      2:       4096 |  4096,     1,     1,     1 | F32     | blk.0.attn_norm.weight
      3:   58720256 | 14336,  4096,     1,     1 | F16     | blk.0.ffn_down.weight
      4:   58720256 |  4096, 14336,     1,     1 | F16     | blk.0.ffn_gate.weight
      5:   58720256 |  4096, 14336,     1,     1 | F16     | blk.0.ffn_up.weight
      6:       4096 |  4096,     1,     1,     1 | F32     | blk.0.ffn_norm.weight
      7:    4194304 |  4096,  1024,     1,     1 | F16     | blk.0.attn_k.weight
      8:   16777216 |  4096,  4096,     1,     1 | F16     | blk.0.attn_output.weight
      9:   16777216 |  4096,  4096,     1,     1 | F16     | blk.0.attn_q.weight
     10:         64 |    64,     1,     1,     1 | F32     | blk.0.attn_rot_embd
     11:    4194304 |  4096,  1024,     1,     1 | F16     | blk.0.attn_v.weight
     12:       4096 |  4096,     1,     1,     1 | F32     | blk.1.attn_norm.weight
     13:   58720256 | 14336,  4096,     1,     1 | F16     | blk.1.ffn_down.weight
     14:   58720256 |  4096, 14336,     1,     1 | F16     | blk.1.ffn_gate.weight
     15:   58720256 |  4096, 14336,     1,     1 | F16     | blk.1.ffn_up.weight
     16:       4096 |  4096,     1,     1,     1 | F32     | blk.1.ffn_norm.weight
     17:    4194304 |  4096,  1024,     1,     1 | F16     | blk.1.attn_k.weight
     18:   16777216 |  4096,  4096,     1,     1 | F16     | blk.1.attn_output.weight
     19:   16777216 |  4096,  4096,     1,     1 | F16     | blk.1.attn_q.weight
     20:         64 |    64,     1,     1,     1 | F32     | blk.1.attn_rot_embd
     21:    4194304 |  4096,  1024,     1,     1 | F16     | blk.1.attn_v.weight
     22:       4096 |  4096,     1,     1,     1 | F32     | blk.10.attn_norm.weight
     23:   58720256 | 14336,  4096,     1,     1 | F16     | blk.10.ffn_down.weight
     24:   58720256 |  4096, 14336,     1,     1 | F16     | blk.10.ffn_gate.weight
     25:   58720256 |  4096, 14336,     1,     1 | F16     | blk.10.ffn_up.weight
     26:       4096 |  4096,     1,     1,     1 | F32     | blk.10.ffn_norm.weight
     27:    4194304 |  4096,  1024,     1,     1 | F16     | blk.10.attn_k.weight
     28:   16777216 |  4096,  4096,     1,     1 | F16     | blk.10.attn_output.weight
     29:   16777216 |  4096,  4096,     1,     1 | F16     | blk.10.attn_q.weight
     30:         64 |    64,     1,     1,     1 | F32     | blk.10.attn_rot_embd
     31:    4194304 |  4096,  1024,     1,     1 | F16     | blk.10.attn_v.weight
     32:       4096 |  4096,     1,     1,     1 | F32     | blk.11.attn_norm.weight
     33:   58720256 | 14336,  4096,     1,     1 | F16     | blk.11.ffn_down.weight
     34:   58720256 |  4096, 14336,     1,     1 | F16     | blk.11.ffn_gate.weight
     35:   58720256 |  4096, 14336,     1,     1 | F16     | blk.11.ffn_up.weight
     36:       4096 |  4096,     1,     1,     1 | F32     | blk.11.ffn_norm.weight
     37:    4194304 |  4096,  1024,     1,     1 | F16     | blk.11.attn_k.weight
     38:   16777216 |  4096,  4096,     1,     1 | F16     | blk.11.attn_output.weight
     39:   16777216 |  4096,  4096,     1,     1 | F16     | blk.11.attn_q.weight
     40:         64 |    64,     1,     1,     1 | F32     | blk.11.attn_rot_embd
     41:    4194304 |  4096,  1024,     1,     1 | F16     | blk.11.attn_v.weight
     42:       4096 |  4096,     1,     1,     1 | F32     | blk.12.attn_norm.weight
     43:   58720256 | 14336,  4096,     1,     1 | F16     | blk.12.ffn_down.weight
     44:   58720256 |  4096, 14336,     1,     1 | F16     | blk.12.ffn_gate.weight
     45:   58720256 |  4096, 14336,     1,     1 | F16     | blk.12.ffn_up.weight
     46:       4096 |  4096,     1,     1,     1 | F32     | blk.12.ffn_norm.weight
     47:    4194304 |  4096,  1024,     1,     1 | F16     | blk.12.attn_k.weight
     48:   16777216 |  4096,  4096,     1,     1 | F16     | blk.12.attn_output.weight
     49:   16777216 |  4096,  4096,     1,     1 | F16     | blk.12.attn_q.weight
     50:         64 |    64,     1,     1,     1 | F32     | blk.12.attn_rot_embd
     51:    4194304 |  4096,  1024,     1,     1 | F16     | blk.12.attn_v.weight
     52:       4096 |  4096,     1,     1,     1 | F32     | blk.13.attn_norm.weight
     53:   58720256 | 14336,  4096,     1,     1 | F16     | blk.13.ffn_down.weight
     54:   58720256 |  4096, 14336,     1,     1 | F16     | blk.13.ffn_gate.weight
     55:   58720256 |  4096, 14336,     1,     1 | F16     | blk.13.ffn_up.weight
     56:       4096 |  4096,     1,     1,     1 | F32     | blk.13.ffn_norm.weight
     57:    4194304 |  4096,  1024,     1,     1 | F16     | blk.13.attn_k.weight
     58:   16777216 |  4096,  4096,     1,     1 | F16     | blk.13.attn_output.weight
     59:   16777216 |  4096,  4096,     1,     1 | F16     | blk.13.attn_q.weight
     60:         64 |    64,     1,     1,     1 | F32     | blk.13.attn_rot_embd
     61:    4194304 |  4096,  1024,     1,     1 | F16     | blk.13.attn_v.weight
     62:       4096 |  4096,     1,     1,     1 | F32     | blk.14.attn_norm.weight
     63:   58720256 | 14336,  4096,     1,     1 | F16     | blk.14.ffn_down.weight
     64:   58720256 |  4096, 14336,     1,     1 | F16     | blk.14.ffn_gate.weight
     65:   58720256 |  4096, 14336,     1,     1 | F16     | blk.14.ffn_up.weight
     66:       4096 |  4096,     1,     1,     1 | F32     | blk.14.ffn_norm.weight
     67:    4194304 |  4096,  1024,     1,     1 | F16     | blk.14.attn_k.weight
     68:   16777216 |  4096,  4096,     1,     1 | F16     | blk.14.attn_output.weight
     69:   16777216 |  4096,  4096,     1,     1 | F16     | blk.14.attn_q.weight
     70:         64 |    64,     1,     1,     1 | F32     | blk.14.attn_rot_embd
     71:    4194304 |  4096,  1024,     1,     1 | F16     | blk.14.attn_v.weight
     72:       4096 |  4096,     1,     1,     1 | F32     | blk.15.attn_norm.weight
     73:   58720256 | 14336,  4096,     1,     1 | F16     | blk.15.ffn_down.weight
     74:   58720256 |  4096, 14336,     1,     1 | F16     | blk.15.ffn_gate.weight
     75:   58720256 |  4096, 14336,     1,     1 | F16     | blk.15.ffn_up.weight
     76:       4096 |  4096,     1,     1,     1 | F32     | blk.15.ffn_norm.weight
     77:    4194304 |  4096,  1024,     1,     1 | F16     | blk.15.attn_k.weight
     78:   16777216 |  4096,  4096,     1,     1 | F16     | blk.15.attn_output.weight
     79:   16777216 |  4096,  4096,     1,     1 | F16     | blk.15.attn_q.weight
     80:         64 |    64,     1,     1,     1 | F32     | blk.15.attn_rot_embd
     81:    4194304 |  4096,  1024,     1,     1 | F16     | blk.15.attn_v.weight
     82:       4096 |  4096,     1,     1,     1 | F32     | blk.16.attn_norm.weight
     83:   58720256 | 14336,  4096,     1,     1 | F16     | blk.16.ffn_down.weight
     84:   58720256 |  4096, 14336,     1,     1 | F16     | blk.16.ffn_gate.weight
     85:   58720256 |  4096, 14336,     1,     1 | F16     | blk.16.ffn_up.weight
     86:       4096 |  4096,     1,     1,     1 | F32     | blk.16.ffn_norm.weight
     87:    4194304 |  4096,  1024,     1,     1 | F16     | blk.16.attn_k.weight
     88:   16777216 |  4096,  4096,     1,     1 | F16     | blk.16.attn_output.weight
     89:   16777216 |  4096,  4096,     1,     1 | F16     | blk.16.attn_q.weight
     90:         64 |    64,     1,     1,     1 | F32     | blk.16.attn_rot_embd
     91:    4194304 |  4096,  1024,     1,     1 | F16     | blk.16.attn_v.weight
     92:       4096 |  4096,     1,     1,     1 | F32     | blk.17.attn_norm.weight
     93:   58720256 | 14336,  4096,     1,     1 | F16     | blk.17.ffn_down.weight
     94:   58720256 |  4096, 14336,     1,     1 | F16     | blk.17.ffn_gate.weight
     95:   58720256 |  4096, 14336,     1,     1 | F16     | blk.17.ffn_up.weight
     96:       4096 |  4096,     1,     1,     1 | F32     | blk.17.ffn_norm.weight
     97:    4194304 |  4096,  1024,     1,     1 | F16     | blk.17.attn_k.weight
     98:   16777216 |  4096,  4096,     1,     1 | F16     | blk.17.attn_output.weight
     99:   16777216 |  4096,  4096,     1,     1 | F16     | blk.17.attn_q.weight
    100:         64 |    64,     1,     1,     1 | F32     | blk.17.attn_rot_embd
    101:    4194304 |  4096,  1024,     1,     1 | F16     | blk.17.attn_v.weight
    102:       4096 |  4096,     1,     1,     1 | F32     | blk.18.attn_norm.weight
    103:   58720256 | 14336,  4096,     1,     1 | F16     | blk.18.ffn_down.weight
    104:   58720256 |  4096, 14336,     1,     1 | F16     | blk.18.ffn_gate.weight
    105:   58720256 |  4096, 14336,     1,     1 | F16     | blk.18.ffn_up.weight
    106:       4096 |  4096,     1,     1,     1 | F32     | blk.18.ffn_norm.weight
    107:    4194304 |  4096,  1024,     1,     1 | F16     | blk.18.attn_k.weight
    108:   16777216 |  4096,  4096,     1,     1 | F16     | blk.18.attn_output.weight
    109:   16777216 |  4096,  4096,     1,     1 | F16     | blk.18.attn_q.weight
    110:         64 |    64,     1,     1,     1 | F32     | blk.18.attn_rot_embd
    111:    4194304 |  4096,  1024,     1,     1 | F16     | blk.18.attn_v.weight
    112:       4096 |  4096,     1,     1,     1 | F32     | blk.19.attn_norm.weight
    113:   58720256 | 14336,  4096,     1,     1 | F16     | blk.19.ffn_down.weight
    114:   58720256 |  4096, 14336,     1,     1 | F16     | blk.19.ffn_gate.weight
    115:   58720256 |  4096, 14336,     1,     1 | F16     | blk.19.ffn_up.weight
    116:       4096 |  4096,     1,     1,     1 | F32     | blk.19.ffn_norm.weight
    117:    4194304 |  4096,  1024,     1,     1 | F16     | blk.19.attn_k.weight
    118:   16777216 |  4096,  4096,     1,     1 | F16     | blk.19.attn_output.weight
    119:   16777216 |  4096,  4096,     1,     1 | F16     | blk.19.attn_q.weight
    120:         64 |    64,     1,     1,     1 | F32     | blk.19.attn_rot_embd
    121:    4194304 |  4096,  1024,     1,     1 | F16     | blk.19.attn_v.weight
    122:       4096 |  4096,     1,     1,     1 | F32     | blk.2.attn_norm.weight
    123:   58720256 | 14336,  4096,     1,     1 | F16     | blk.2.ffn_down.weight
    124:   58720256 |  4096, 14336,     1,     1 | F16     | blk.2.ffn_gate.weight
    125:   58720256 |  4096, 14336,     1,     1 | F16     | blk.2.ffn_up.weight
    126:       4096 |  4096,     1,     1,     1 | F32     | blk.2.ffn_norm.weight
    127:    4194304 |  4096,  1024,     1,     1 | F16     | blk.2.attn_k.weight
    128:   16777216 |  4096,  4096,     1,     1 | F16     | blk.2.attn_output.weight
    129:   16777216 |  4096,  4096,     1,     1 | F16     | blk.2.attn_q.weight
    130:         64 |    64,     1,     1,     1 | F32     | blk.2.attn_rot_embd
    131:    4194304 |  4096,  1024,     1,     1 | F16     | blk.2.attn_v.weight
    132:       4096 |  4096,     1,     1,     1 | F32     | blk.20.attn_norm.weight
    133:   58720256 |  4096, 14336,     1,     1 | F16     | blk.20.ffn_gate.weight
    134:       4096 |  4096,     1,     1,     1 | F32     | blk.20.ffn_norm.weight
    135:    4194304 |  4096,  1024,     1,     1 | F16     | blk.20.attn_k.weight
    136:   16777216 |  4096,  4096,     1,     1 | F16     | blk.20.attn_output.weight
    137:   16777216 |  4096,  4096,     1,     1 | F16     | blk.20.attn_q.weight
    138:         64 |    64,     1,     1,     1 | F32     | blk.20.attn_rot_embd
    139:    4194304 |  4096,  1024,     1,     1 | F16     | blk.20.attn_v.weight
    140:       4096 |  4096,     1,     1,     1 | F32     | blk.3.attn_norm.weight
    141:   58720256 | 14336,  4096,     1,     1 | F16     | blk.3.ffn_down.weight
    142:   58720256 |  4096, 14336,     1,     1 | F16     | blk.3.ffn_gate.weight
    143:   58720256 |  4096, 14336,     1,     1 | F16     | blk.3.ffn_up.weight
    144:       4096 |  4096,     1,     1,     1 | F32     | blk.3.ffn_norm.weight
    145:    4194304 |  4096,  1024,     1,     1 | F16     | blk.3.attn_k.weight
    146:   16777216 |  4096,  4096,     1,     1 | F16     | blk.3.attn_output.weight
    147:   16777216 |  4096,  4096,     1,     1 | F16     | blk.3.attn_q.weight
    148:         64 |    64,     1,     1,     1 | F32     | blk.3.attn_rot_embd
    149:    4194304 |  4096,  1024,     1,     1 | F16     | blk.3.attn_v.weight
    150:       4096 |  4096,     1,     1,     1 | F32     | blk.4.attn_norm.weight
    151:   58720256 | 14336,  4096,     1,     1 | F16     | blk.4.ffn_down.weight
    152:   58720256 |  4096, 14336,     1,     1 | F16     | blk.4.ffn_gate.weight
    153:   58720256 |  4096, 14336,     1,     1 | F16     | blk.4.ffn_up.weight
    154:       4096 |  4096,     1,     1,     1 | F32     | blk.4.ffn_norm.weight
    155:    4194304 |  4096,  1024,     1,     1 | F16     | blk.4.attn_k.weight
    156:   16777216 |  4096,  4096,     1,     1 | F16     | blk.4.attn_output.weight
    157:   16777216 |  4096,  4096,     1,     1 | F16     | blk.4.attn_q.weight
    158:         64 |    64,     1,     1,     1 | F32     | blk.4.attn_rot_embd
    159:    4194304 |  4096,  1024,     1,     1 | F16     | blk.4.attn_v.weight
    160:       4096 |  4096,     1,     1,     1 | F32     | blk.5.attn_norm.weight
    161:   58720256 | 14336,  4096,     1,     1 | F16     | blk.5.ffn_down.weight
    162:   58720256 |  4096, 14336,     1,     1 | F16     | blk.5.ffn_gate.weight
    163:   58720256 |  4096, 14336,     1,     1 | F16     | blk.5.ffn_up.weight
    164:       4096 |  4096,     1,     1,     1 | F32     | blk.5.ffn_norm.weight
    165:    4194304 |  4096,  1024,     1,     1 | F16     | blk.5.attn_k.weight
    166:   16777216 |  4096,  4096,     1,     1 | F16     | blk.5.attn_output.weight
    167:   16777216 |  4096,  4096,     1,     1 | F16     | blk.5.attn_q.weight
    168:         64 |    64,     1,     1,     1 | F32     | blk.5.attn_rot_embd
    169:    4194304 |  4096,  1024,     1,     1 | F16     | blk.5.attn_v.weight
    170:       4096 |  4096,     1,     1,     1 | F32     | blk.6.attn_norm.weight
    171:   58720256 | 14336,  4096,     1,     1 | F16     | blk.6.ffn_down.weight
    172:   58720256 |  4096, 14336,     1,     1 | F16     | blk.6.ffn_gate.weight
    173:   58720256 |  4096, 14336,     1,     1 | F16     | blk.6.ffn_up.weight
    174:       4096 |  4096,     1,     1,     1 | F32     | blk.6.ffn_norm.weight
    175:    4194304 |  4096,  1024,     1,     1 | F16     | blk.6.attn_k.weight
    176:   16777216 |  4096,  4096,     1,     1 | F16     | blk.6.attn_output.weight
    177:   16777216 |  4096,  4096,     1,     1 | F16     | blk.6.attn_q.weight
    178:         64 |    64,     1,     1,     1 | F32     | blk.6.attn_rot_embd
    179:    4194304 |  4096,  1024,     1,     1 | F16     | blk.6.attn_v.weight
    180:       4096 |  4096,     1,     1,     1 | F32     | blk.7.attn_norm.weight
    181:   58720256 | 14336,  4096,     1,     1 | F16     | blk.7.ffn_down.weight
    182:   58720256 |  4096, 14336,     1,     1 | F16     | blk.7.ffn_gate.weight
    183:   58720256 |  4096, 14336,     1,     1 | F16     | blk.7.ffn_up.weight
    184:       4096 |  4096,     1,     1,     1 | F32     | blk.7.ffn_norm.weight
    185:    4194304 |  4096,  1024,     1,     1 | F16     | blk.7.attn_k.weight
    186:   16777216 |  4096,  4096,     1,     1 | F16     | blk.7.attn_output.weight
    187:   16777216 |  4096,  4096,     1,     1 | F16     | blk.7.attn_q.weight
    188:         64 |    64,     1,     1,     1 | F32     | blk.7.attn_rot_embd
    189:    4194304 |  4096,  1024,     1,     1 | F16     | blk.7.attn_v.weight
    190:       4096 |  4096,     1,     1,     1 | F32     | blk.8.attn_norm.weight
    191:   58720256 | 14336,  4096,     1,     1 | F16     | blk.8.ffn_down.weight
    192:   58720256 |  4096, 14336,     1,     1 | F16     | blk.8.ffn_gate.weight
    193:   58720256 |  4096, 14336,     1,     1 | F16     | blk.8.ffn_up.weight
    194:       4096 |  4096,     1,     1,     1 | F32     | blk.8.ffn_norm.weight
    195:    4194304 |  4096,  1024,     1,     1 | F16     | blk.8.attn_k.weight
    196:   16777216 |  4096,  4096,     1,     1 | F16     | blk.8.attn_output.weight
    197:   16777216 |  4096,  4096,     1,     1 | F16     | blk.8.attn_q.weight
    198:         64 |    64,     1,     1,     1 | F32     | blk.8.attn_rot_embd
    199:    4194304 |  4096,  1024,     1,     1 | F16     | blk.8.attn_v.weight
    200:       4096 |  4096,     1,     1,     1 | F32     | blk.9.attn_norm.weight
    201:   58720256 | 14336,  4096,     1,     1 | F16     | blk.9.ffn_down.weight
    202:   58720256 |  4096, 14336,     1,     1 | F16     | blk.9.ffn_gate.weight
    203:   58720256 |  4096, 14336,     1,     1 | F16     | blk.9.ffn_up.weight
    204:       4096 |  4096,     1,     1,     1 | F32     | blk.9.ffn_norm.weight
    205:    4194304 |  4096,  1024,     1,     1 | F16     | blk.9.attn_k.weight
    206:   16777216 |  4096,  4096,     1,     1 | F16     | blk.9.attn_output.weight
    207:   16777216 |  4096,  4096,     1,     1 | F16     | blk.9.attn_q.weight
    208:         64 |    64,     1,     1,     1 | F32     | blk.9.attn_rot_embd
    209:    4194304 |  4096,  1024,     1,     1 | F16     | blk.9.attn_v.weight
    210:  525336576 |  4096, 128256,     1,     1 | F16     | output.weight
    211:   58720256 | 14336,  4096,     1,     1 | F16     | blk.20.ffn_down.weight
    212:   58720256 |  4096, 14336,     1,     1 | F16     | blk.20.ffn_up.weight
    213:       4096 |  4096,     1,     1,     1 | F32     | blk.21.attn_norm.weight
    214:   58720256 | 14336,  4096,     1,     1 | F16     | blk.21.ffn_down.weight
    215:   58720256 |  4096, 14336,     1,     1 | F16     | blk.21.ffn_gate.weight
    216:   58720256 |  4096, 14336,     1,     1 | F16     | blk.21.ffn_up.weight
    217:       4096 |  4096,     1,     1,     1 | F32     | blk.21.ffn_norm.weight
    218:    4194304 |  4096,  1024,     1,     1 | F16     | blk.21.attn_k.weight
    219:   16777216 |  4096,  4096,     1,     1 | F16     | blk.21.attn_output.weight
    220:   16777216 |  4096,  4096,     1,     1 | F16     | blk.21.attn_q.weight
    221:         64 |    64,     1,     1,     1 | F32     | blk.21.attn_rot_embd
    222:    4194304 |  4096,  1024,     1,     1 | F16     | blk.21.attn_v.weight
    223:       4096 |  4096,     1,     1,     1 | F32     | blk.22.attn_norm.weight
    224:   58720256 | 14336,  4096,     1,     1 | F16     | blk.22.ffn_down.weight
    225:   58720256 |  4096, 14336,     1,     1 | F16     | blk.22.ffn_gate.weight
    226:   58720256 |  4096, 14336,     1,     1 | F16     | blk.22.ffn_up.weight
    227:       4096 |  4096,     1,     1,     1 | F32     | blk.22.ffn_norm.weight
    228:    4194304 |  4096,  1024,     1,     1 | F16     | blk.22.attn_k.weight
    229:   16777216 |  4096,  4096,     1,     1 | F16     | blk.22.attn_output.weight
    230:   16777216 |  4096,  4096,     1,     1 | F16     | blk.22.attn_q.weight
    231:         64 |    64,     1,     1,     1 | F32     | blk.22.attn_rot_embd
    232:    4194304 |  4096,  1024,     1,     1 | F16     | blk.22.attn_v.weight
    233:       4096 |  4096,     1,     1,     1 | F32     | blk.23.attn_norm.weight
    234:   58720256 | 14336,  4096,     1,     1 | F16     | blk.23.ffn_down.weight
    235:   58720256 |  4096, 14336,     1,     1 | F16     | blk.23.ffn_gate.weight
    236:   58720256 |  4096, 14336,     1,     1 | F16     | blk.23.ffn_up.weight
    237:       4096 |  4096,     1,     1,     1 | F32     | blk.23.ffn_norm.weight
    238:    4194304 |  4096,  1024,     1,     1 | F16     | blk.23.attn_k.weight
    239:   16777216 |  4096,  4096,     1,     1 | F16     | blk.23.attn_output.weight
    240:   16777216 |  4096,  4096,     1,     1 | F16     | blk.23.attn_q.weight
    241:         64 |    64,     1,     1,     1 | F32     | blk.23.attn_rot_embd
    242:    4194304 |  4096,  1024,     1,     1 | F16     | blk.23.attn_v.weight
    243:       4096 |  4096,     1,     1,     1 | F32     | blk.24.attn_norm.weight
    244:   58720256 | 14336,  4096,     1,     1 | F16     | blk.24.ffn_down.weight
    245:   58720256 |  4096, 14336,     1,     1 | F16     | blk.24.ffn_gate.weight
    246:   58720256 |  4096, 14336,     1,     1 | F16     | blk.24.ffn_up.weight
    247:       4096 |  4096,     1,     1,     1 | F32     | blk.24.ffn_norm.weight
    248:    4194304 |  4096,  1024,     1,     1 | F16     | blk.24.attn_k.weight
    249:   16777216 |  4096,  4096,     1,     1 | F16     | blk.24.attn_output.weight
    250:   16777216 |  4096,  4096,     1,     1 | F16     | blk.24.attn_q.weight
    251:         64 |    64,     1,     1,     1 | F32     | blk.24.attn_rot_embd
    252:    4194304 |  4096,  1024,     1,     1 | F16     | blk.24.attn_v.weight
    253:       4096 |  4096,     1,     1,     1 | F32     | blk.25.attn_norm.weight
    254:   58720256 | 14336,  4096,     1,     1 | F16     | blk.25.ffn_down.weight
    255:   58720256 |  4096, 14336,     1,     1 | F16     | blk.25.ffn_gate.weight
    256:   58720256 |  4096, 14336,     1,     1 | F16     | blk.25.ffn_up.weight
    257:       4096 |  4096,     1,     1,     1 | F32     | blk.25.ffn_norm.weight
    258:    4194304 |  4096,  1024,     1,     1 | F16     | blk.25.attn_k.weight
    259:   16777216 |  4096,  4096,     1,     1 | F16     | blk.25.attn_output.weight
    260:   16777216 |  4096,  4096,     1,     1 | F16     | blk.25.attn_q.weight
    261:         64 |    64,     1,     1,     1 | F32     | blk.25.attn_rot_embd
    262:    4194304 |  4096,  1024,     1,     1 | F16     | blk.25.attn_v.weight
    263:       4096 |  4096,     1,     1,     1 | F32     | blk.26.attn_norm.weight
    264:   58720256 | 14336,  4096,     1,     1 | F16     | blk.26.ffn_down.weight
    265:   58720256 |  4096, 14336,     1,     1 | F16     | blk.26.ffn_gate.weight
    266:   58720256 |  4096, 14336,     1,     1 | F16     | blk.26.ffn_up.weight
    267:       4096 |  4096,     1,     1,     1 | F32     | blk.26.ffn_norm.weight
    268:    4194304 |  4096,  1024,     1,     1 | F16     | blk.26.attn_k.weight
    269:   16777216 |  4096,  4096,     1,     1 | F16     | blk.26.attn_output.weight
    270:   16777216 |  4096,  4096,     1,     1 | F16     | blk.26.attn_q.weight
    271:         64 |    64,     1,     1,     1 | F32     | blk.26.attn_rot_embd
    272:    4194304 |  4096,  1024,     1,     1 | F16     | blk.26.attn_v.weight
    273:       4096 |  4096,     1,     1,     1 | F32     | blk.27.attn_norm.weight
    274:   58720256 | 14336,  4096,     1,     1 | F16     | blk.27.ffn_down.weight
    275:   58720256 |  4096, 14336,     1,     1 | F16     | blk.27.ffn_gate.weight
    276:   58720256 |  4096, 14336,     1,     1 | F16     | blk.27.ffn_up.weight
    277:       4096 |  4096,     1,     1,     1 | F32     | blk.27.ffn_norm.weight
    278:    4194304 |  4096,  1024,     1,     1 | F16     | blk.27.attn_k.weight
    279:   16777216 |  4096,  4096,     1,     1 | F16     | blk.27.attn_output.weight
    280:   16777216 |  4096,  4096,     1,     1 | F16     | blk.27.attn_q.weight
    281:         64 |    64,     1,     1,     1 | F32     | blk.27.attn_rot_embd
    282:    4194304 |  4096,  1024,     1,     1 | F16     | blk.27.attn_v.weight
    283:       4096 |  4096,     1,     1,     1 | F32     | blk.28.attn_norm.weight
    284:   58720256 | 14336,  4096,     1,     1 | F16     | blk.28.ffn_down.weight
    285:   58720256 |  4096, 14336,     1,     1 | F16     | blk.28.ffn_gate.weight
    286:   58720256 |  4096, 14336,     1,     1 | F16     | blk.28.ffn_up.weight
    287:       4096 |  4096,     1,     1,     1 | F32     | blk.28.ffn_norm.weight
    288:    4194304 |  4096,  1024,     1,     1 | F16     | blk.28.attn_k.weight
    289:   16777216 |  4096,  4096,     1,     1 | F16     | blk.28.attn_output.weight
    290:   16777216 |  4096,  4096,     1,     1 | F16     | blk.28.attn_q.weight
    291:         64 |    64,     1,     1,     1 | F32     | blk.28.attn_rot_embd
    292:    4194304 |  4096,  1024,     1,     1 | F16     | blk.28.attn_v.weight
    293:       4096 |  4096,     1,     1,     1 | F32     | blk.29.attn_norm.weight
    294:   58720256 | 14336,  4096,     1,     1 | F16     | blk.29.ffn_down.weight
    295:   58720256 |  4096, 14336,     1,     1 | F16     | blk.29.ffn_gate.weight
    296:   58720256 |  4096, 14336,     1,     1 | F16     | blk.29.ffn_up.weight
    297:       4096 |  4096,     1,     1,     1 | F32     | blk.29.ffn_norm.weight
    298:    4194304 |  4096,  1024,     1,     1 | F16     | blk.29.attn_k.weight
    299:   16777216 |  4096,  4096,     1,     1 | F16     | blk.29.attn_output.weight
    300:   16777216 |  4096,  4096,     1,     1 | F16     | blk.29.attn_q.weight
    301:         64 |    64,     1,     1,     1 | F32     | blk.29.attn_rot_embd
    302:    4194304 |  4096,  1024,     1,     1 | F16     | blk.29.attn_v.weight
    303:       4096 |  4096,     1,     1,     1 | F32     | blk.30.attn_norm.weight
    304:   58720256 | 14336,  4096,     1,     1 | F16     | blk.30.ffn_down.weight
    305:   58720256 |  4096, 14336,     1,     1 | F16     | blk.30.ffn_gate.weight
    306:   58720256 |  4096, 14336,     1,     1 | F16     | blk.30.ffn_up.weight
    307:       4096 |  4096,     1,     1,     1 | F32     | blk.30.ffn_norm.weight
    308:    4194304 |  4096,  1024,     1,     1 | F16     | blk.30.attn_k.weight
    309:   16777216 |  4096,  4096,     1,     1 | F16     | blk.30.attn_output.weight
    310:   16777216 |  4096,  4096,     1,     1 | F16     | blk.30.attn_q.weight
    311:         64 |    64,     1,     1,     1 | F32     | blk.30.attn_rot_embd
    312:    4194304 |  4096,  1024,     1,     1 | F16     | blk.30.attn_v.weight
    313:       4096 |  4096,     1,     1,     1 | F32     | blk.31.attn_norm.weight
    314:   58720256 | 14336,  4096,     1,     1 | F16     | blk.31.ffn_down.weight
    315:   58720256 |  4096, 14336,     1,     1 | F16     | blk.31.ffn_gate.weight
    316:   58720256 |  4096, 14336,     1,     1 | F16     | blk.31.ffn_up.weight
    317:       4096 |  4096,     1,     1,     1 | F32     | blk.31.ffn_norm.weight
    318:    4194304 |  4096,  1024,     1,     1 | F16     | blk.31.attn_k.weight
    319:   16777216 |  4096,  4096,     1,     1 | F16     | blk.31.attn_output.weight
    320:   16777216 |  4096,  4096,     1,     1 | F16     | blk.31.attn_q.weight
    321:         64 |    64,     1,     1,     1 | F32     | blk.31.attn_rot_embd
    322:    4194304 |  4096,  1024,     1,     1 | F16     | blk.31.attn_v.weight
    323:       4096 |  4096,     1,     1,     1 | F32     | output_norm.weight
@slaren
Collaborator

slaren commented May 2, 2024

The extra tensors are the attn_rot_embd in every layer. I think these are a RoPE lookup table that shouldn't be exported, but for some reason they have a mapping in gguf. Not sure why that is.
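
For context, these inv_freq buffers are just the standard RoPE inverse frequencies, fully determined by the head dimension and rope base, so they can be recomputed at load time rather than stored in the GGUF. A minimal sketch of that computation, using the values from the dump above (variable names are illustrative, this is not llama.cpp source):

import torch

# Standard RoPE inverse-frequency computation (illustrative sketch).
# The buffer is a pure function of the head dimension and rope base,
# so it can be recomputed at load time instead of being exported.
head_dim = 128          # llama.rope.dimension_count from the dump above
rope_base = 500000.0    # llama.rope.freq_base from the dump above

inv_freq = 1.0 / (rope_base ** (torch.arange(0, head_dim, 2, dtype=torch.float32) / head_dim))
print(inv_freq.shape)   # torch.Size([64]) -- matches the 64-element attn_rot_embd tensors above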

@bartowski1182
Contributor Author

@slaren thanks for the insight. Is there anything I can do as a temporary fix, even if it means hardcoding some values here and there?

Interestingly, if I use convert.py, it converts the model in a way that llama.cpp is happy with (aside from the missing pre-tokenizer).

@slaren
Collaborator

slaren commented May 2, 2024

This should work. I guess you also added the tokenizer hash as llama3?

--- a/convert-hf-to-gguf.py
+++ b/convert-hf-to-gguf.py
@@ -1426,8 +1426,9 @@ class LlamaModel(Model):
         n_experts = self.hparams.get("num_local_experts")
         experts = dict()
         for name, data_torch in self.get_tensors():
             # we don't need these
-            if name.endswith((".attention.masked_bias", ".attention.bias", ".attention.rotary_emb.inv_freq")):
+            if name.endswith((".attention.masked_bias", ".attention.bias", ".rotary_emb.inv_freq")):
                 continue

             old_dtype = data_torch.dtype

@bartowski1182
Contributor Author

Actually the tokenizer hash already lined up (after I fixed their tokenizer.json to match the latest version from Meta).
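
For anyone hitting the same pre-tokenizer warning: the hash check in convert-hf-to-gguf.py roughly works by tokenizing a fixed probe string, hashing the result, and mapping known digests to a pre-tokenizer name such as 'llama-bpe'. A rough sketch of the idea only; the probe text and hash table below are placeholders, not the actual script's values:

from hashlib import sha256
from transformers import AutoTokenizer

# Rough sketch of the pre-tokenizer hash idea (placeholder probe and mapping,
# not the real convert-hf-to-gguf.py values): tokenize a fixed probe string,
# hash the token IDs, and look the digest up in a table of known pre-tokenizers.
tok = AutoTokenizer.from_pretrained("nvidia/ChatQA-1.5-8B")
probe = "Hello world! 123"                                 # illustrative probe text
digest = sha256(str(tok.encode(probe)).encode()).hexdigest()

KNOWN_PRE_TOKENIZERS = {"<llama-3 digest>": "llama-bpe"}   # placeholder mapping
print(digest, KNOWN_PRE_TOKENIZERS.get(digest, "unknown pre-tokenizer"))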

@bartowski1182
Contributor Author

With that change, so far so good! imatrix started up without issue and I seem to have the right pre-tokenizer.

Any reason to replace .attention.rotary_emb.inv_freq with .rotary_emb.inv_freq vs just adding it to the list?

@slaren
Collaborator

slaren commented May 2, 2024

The name of the tensor is model.layers.x.self_attn.rotary_emb.inv_freq, not .attention.. Removing the wrong prefix from the check seemed like the easiest solution.
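
To make the mismatch concrete, here's a quick standalone check with the tensor name from this model (not part of the converter itself):

# Quick illustration of the suffix check, using names from the discussion above.
name = "model.layers.0.self_attn.rotary_emb.inv_freq"

old_suffixes = (".attention.masked_bias", ".attention.bias", ".attention.rotary_emb.inv_freq")
new_suffixes = (".attention.masked_bias", ".attention.bias", ".rotary_emb.inv_freq")

print(name.endswith(old_suffixes))  # False -> the buffer was not skipped and ended up in the GGUF
print(name.endswith(new_suffixes))  # True  -> the buffer is skipped during conversion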

@bartowski1182
Contributor Author

Everything looks good, it's generating properly and tokenizing correctly based on the results of ./tokenize

Thanks for your help @slaren. Would you like me to open a PR to make your proposed change, or would you rather get to it yourself?

@slaren
Collaborator

slaren commented May 2, 2024

A PR would be very welcome.
