
llama : support StableLM 2 1.6B #5052

Merged: 2 commits into ggerganov:master on Jan 22, 2024

Conversation

compilade (Collaborator) commented Jan 20, 2024

Stable LM 2 1.6B was recently released (see https://stability.ai/news/introducing-stable-lm-2). It's different enough from their older 3B model that it requires some changes in llama.cpp in order to work.

It's mostly the same model architecture as stablelm-3b-4e1t, but they added bias tensors for the Q, K, and V projections, so these are now also handled for the LLM_ARCH_STABLELM model type.

The tokenizer is also different from stablelm-3b-4e1t's; in StableLM 2 it is defined in the tiktoken format, in a very similar way to the Qwen models.
To avoid unnecessary code duplication, I added _set_vocab_qwen to the Model class so that both Qwen and StableLM 2 can build their vocab in the same way.

In doing so, I noticed a bug in the previous implementation: all special tokens were named [PAD{id}]. This is because, unlike in tokenizer.json, the special tokens of Qwen-style tokenizers are not a subset of the vocab, so they could not be found in the reverse_vocab and were always named like padding tokens. Combining the added_vocab with the vocab when building the reverse_vocab fixes this. (This is not necessarily relevant for _set_vocab_gpt2, because in tokenizer.json the vocab usually contains all tokens, including the special ones.)
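For illustration, a minimal sketch of the fix (the variable names mirror the description above and the values are made up; the actual code in convert-hf-to-gguf.py may differ):

```python
# Sketch of the reverse_vocab fix (illustrative values, not real data).
vocab = {"hello": 0, "world": 1}          # regular BPE vocab (token -> id)
added_vocab = {"<|endoftext|>": 100257}   # special tokens; NOT a subset of vocab
vocab_size = 100352                       # hypothetical total size

# Build the reverse map from BOTH dicts so that special-token ids resolve to
# their real names instead of falling through to the [PAD{id}] fallback.
reverse_vocab = {id_: tok for tok, id_ in {**vocab, **added_vocab}.items()}

tokens = [reverse_vocab.get(i, f"[PAD{i}]") for i in range(vocab_size)]
```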

In convert-hf-to-gguf.py, to decide which kind of tokenizer to look for when converting a StableLMModel, I used the vocab size rather than something like the number of layers, because Qwen-style tokenizers seem to have many more tokens than others, so it looked like a good enough heuristic for at least this specific case. A better way would perhaps be to check for the absence of tokenizer.json. EDIT: it is now implemented this way, with the tokenizer.json presence check. The behavior is the same as with the vocab size check; nothing in the actual conversion was changed, so the resulting converted models are identical to before.
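For illustration, a rough sketch of the heuristic as a hypothetical standalone helper (the real logic lives in StableLMModel.set_vocab and may differ):

```python
from pathlib import Path

def pick_stablelm_vocab_loader(dir_model: Path) -> str:
    """Hypothetical helper showing the tokenizer.json presence check."""
    if (dir_model / "tokenizer.json").is_file():
        # stablelm-3b-4e1t ships a tokenizer.json, so the GPT-2 style
        # vocab loader (_set_vocab_gpt2) applies.
        return "_set_vocab_gpt2"
    # StableLM 2 has no tokenizer.json; its tiktoken-based vocab is built
    # the same way as Qwen's (_set_vocab_qwen).
    return "_set_vocab_qwen"
```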

Oh, and since the tiktoken library is used when converting, I added it to the llama-python-extra package list in the nix package so that it's included when using a devShell like with nix develop .#default-extra.

Since I moved the code for Qwen's set_vocab, I recommend using git log -p --color-moved when reviewing this.

* convert : fix Qwen's set_vocab wrongly naming all special tokens [PAD{id}]

* convert : refactor Qwen's set_vocab to use it for StableLM 2 too

* nix : add tiktoken to llama-python-extra
brittlewis12 commented Jan 21, 2024

Great work! 🙌 It worked great for me; I was able to generate a full suite of k-quants plus 8_0 and fp16 on Hugging Face!

fp16 conversion output
Loading model: stablelm-2-zephyr-1_6b
gguf: This GGUF file is for Little Endian only
Set model parameters
Set model tokenizer
gguf: Adding 100000 merge(s).
gguf: Setting special token type bos to 100257
gguf: Setting special token type eos to 100257
gguf: Setting special token type unk to 100257
gguf: Setting chat_template to {% for message in messages %}
{% if message['role'] == 'user' %}
{{ '<|user|>
' + message['content'] + eos_token }}
{% elif message['role'] == 'system' %}
{{ '<|system|>
' + message['content'] + eos_token }}
{% elif message['role'] == 'assistant' %}
{{ '<|assistant|>
'  + message['content'] + eos_token }}
{% endif %}
{% if loop.last and add_generation_prompt %}
{{ '<|assistant|>' }}
{% endif %}
{% endfor %}
Exporting model to 'stablelm-2-zephyr-1_6b/stablelm-2-zephyr-1_6b.fp16.gguf'
gguf: loading model part 'model.safetensors'
output.weight, n_dims = 2, torch.float16 --> float16
token_embd.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.0.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.0.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.0.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.0.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.0.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.0.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.0.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.0.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.0.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.0.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.0.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.1.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.1.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.1.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.1.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.1.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.1.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.1.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.1.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.1.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.1.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.1.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.10.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.10.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.10.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.10.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.10.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.10.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.10.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.10.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.10.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.10.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.10.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.10.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.10.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.10.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.11.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.11.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.11.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.11.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.11.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.11.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.11.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.11.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.11.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.11.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.11.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.11.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.11.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.11.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.12.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.12.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.12.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.12.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.12.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.12.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.12.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.12.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.12.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.12.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.12.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.12.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.12.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.12.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.13.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.13.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.13.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.13.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.13.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.13.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.13.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.13.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.13.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.13.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.13.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.13.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.13.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.13.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.14.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.14.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.14.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.14.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.14.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.14.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.14.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.14.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.14.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.14.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.14.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.14.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.14.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.14.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.15.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.15.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.15.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.15.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.15.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.15.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.15.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.15.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.15.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.15.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.15.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.15.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.15.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.15.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.16.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.16.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.16.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.16.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.16.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.16.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.16.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.16.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.16.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.16.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.16.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.16.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.16.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.16.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.17.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.17.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.17.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.17.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.17.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.17.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.17.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.17.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.17.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.17.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.17.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.17.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.17.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.17.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.18.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.18.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.18.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.18.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.18.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.18.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.18.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.18.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.18.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.18.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.18.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.18.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.18.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.18.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.19.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.19.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.19.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.19.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.19.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.19.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.19.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.19.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.19.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.19.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.19.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.19.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.19.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.19.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.2.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.2.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.2.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.2.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.2.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.2.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.2.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.2.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.2.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.2.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.2.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.2.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.2.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.2.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.20.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.20.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.20.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.20.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.20.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.20.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.20.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.20.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.20.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.20.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.20.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.20.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.20.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.20.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.21.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.21.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.21.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.21.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.21.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.21.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.21.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.21.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.21.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.21.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.21.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.21.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.21.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.21.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.22.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.22.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.22.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.22.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.22.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.22.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.22.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.22.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.22.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.22.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.22.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.22.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.22.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.22.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.23.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.23.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.23.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.23.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.23.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.23.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.23.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.23.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.23.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.23.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.23.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.23.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.23.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.23.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.3.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.3.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.3.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.3.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.3.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.3.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.3.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.3.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.3.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.3.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.3.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.3.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.3.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.3.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.4.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.4.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.4.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.4.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.4.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.4.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.4.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.4.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.4.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.4.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.4.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.4.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.4.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.4.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.5.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.5.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.5.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.5.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.5.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.5.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.5.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.5.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.5.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.5.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.5.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.5.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.5.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.5.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.6.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.6.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.6.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.6.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.6.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.6.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.6.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.6.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.6.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.6.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.6.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.6.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.6.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.6.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.7.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.7.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.7.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.7.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.7.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.7.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.7.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.7.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.7.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.7.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.7.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.7.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.7.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.7.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.8.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.8.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.8.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.8.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.8.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.8.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.8.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.8.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.8.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.8.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.8.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.8.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.8.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.8.attn_v.weight, n_dims = 2, torch.float16 --> float16
blk.9.attn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.9.attn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.9.ffn_down.weight, n_dims = 2, torch.float16 --> float16
blk.9.ffn_gate.weight, n_dims = 2, torch.float16 --> float16
blk.9.ffn_up.weight, n_dims = 2, torch.float16 --> float16
blk.9.ffn_norm.bias, n_dims = 1, torch.float16 --> float32
blk.9.ffn_norm.weight, n_dims = 1, torch.float16 --> float32
blk.9.attn_k.bias, n_dims = 1, torch.float16 --> float32
blk.9.attn_k.weight, n_dims = 2, torch.float16 --> float16
blk.9.attn_output.weight, n_dims = 2, torch.float16 --> float16
blk.9.attn_q.bias, n_dims = 1, torch.float16 --> float32
blk.9.attn_q.weight, n_dims = 2, torch.float16 --> float16
blk.9.attn_v.bias, n_dims = 1, torch.float16 --> float32
blk.9.attn_v.weight, n_dims = 2, torch.float16 --> float16
output_norm.bias, n_dims = 1, torch.float16 --> float32
output_norm.weight, n_dims = 1, torch.float16 --> float32
Model successfully exported to 'stablelm-2-zephyr-1_6b/stablelm-2-zephyr-1_6b.fp16.gguf'

Ran the conversions on Colab.


Separately, does it make sense to add tiktoken to requirements/requirements-convert.txt in this case?

I'm not sure what the typical approach is for model-specific dependencies like this, but if this is a new requirement for model conversion, it seems like it should be declared there. Or maybe in a new file, like persimmon?

thanks again!

cebtenzzre (Collaborator) replied:
Separately, does it make sense to add tiktoken to requirements/requirements-convert.txt in this case?

I'm not sure what the typical approach is for model-specific dependencies like this, but if this is a new requirement for model conversion, it seems like it should be declared there. Or maybe in a new file, like persimmon?

I think dependencies should only be added to requirements.txt if they are unconditionally required - conditional requirements should simply throw a clear exception if they are needed but not found. And persimmon is only a separate file because it wasn't working when the convert scripts were merged; new code should go in convert-hf-to-gguf.py.
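For illustration, one way such a conditional requirement can be expressed is a lazy import that raises a clear error only on the code paths that need it (a sketch, not the existing convert script code):

```python
def require_tiktoken():
    """Import tiktoken lazily; only Qwen/StableLM 2 style vocabs need it."""
    try:
        import tiktoken
        return tiktoken
    except ImportError as e:
        raise ImportError(
            "Converting this model requires the `tiktoken` package. "
            "Run `pip install tiktoken` and try again."
        ) from e
```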

convert : use presence of tokenizer.json to determine StableLM tokenizer loader

It's a less arbitrary heuristic than the vocab size.
compilade (Collaborator, Author) replied:

I think dependencies should only be added to requirements.txt if they are unconditionally required - conditional requirements should simply throw a clear exception if they are needed but not found.

Agreed, and it already throws a helpful exception when tiktoken is not installed (thanks to transformers, which checks the imports):
ImportError: This modeling file requires the following packages that were not found in your environment: tiktoken. Run `pip install tiktoken`

And to be clear, I added the tiktoken package to the *-extra devShells because with Nix, the only way to add a package to a Python environment is to rebuild that environment with the new package, unlike with a venv, where a simple pip install tiktoken works when the error is encountered.

Running nix shell nixpkgs-unstable#python3Packages.tiktoken does not make it available to Python; a Python package has to be included when the Python environment is built (with python3.withPackages as in llama-python-extra).

I assume the *-extra devShells (which include the llama-python-extra Python environment) are for Nix users who want all possibly required dependencies for the convert scripts, or else they would be using the leaner llama-python environment. At least, that's how I use it.

ggerganov merged commit d6bd4d4 into ggerganov:master on Jan 22, 2024
40 of 44 checks passed
Green-Sky (Collaborator) commented Jan 22, 2024

I assume the *-extra devShells (which include the llama-python-extra Python environment) are for Nix users who want all possibly required dependencies for the convert scripts, or else they would be using the leaner llama-python environment. At least, that's how I use it.

Yes, that is why it exists. Originally transformers/torch also pulled in CUDA etc., so it was very heavy.

jordankanter pushed a commit to jordankanter/llama.cpp that referenced this pull request Feb 3, 2024
hodlen pushed a commit to hodlen/llama.cpp that referenced this pull request Apr 1, 2024