chore: Add model vocab support #7117

Closed · wants to merge 70 commits

Commits (70)
1a9cf92
feat: Add stablelm vocab to gguf update
teleprint-me May 7, 2024
1355c24
chore: Apply update to get_vocab_base_pre method
teleprint-me May 7, 2024
e71789e
feat: Add stablelm vocab
teleprint-me May 7, 2024
8490705
feat: Add generate vocab shell script
teleprint-me May 7, 2024
d8694af
refactor: Clean up and organize url and dir paths
teleprint-me May 8, 2024
9d2fcd0
tests: Add test for qwen tokenizer
teleprint-me May 8, 2024
b8f8a96
feat: Add qwen pattern and tokenizer
teleprint-me May 8, 2024
3ae6c17
chore: Add missing command-r gguf vocab
teleprint-me May 8, 2024
4155e86
feat: Add support for qwen tokenizer
teleprint-me May 8, 2024
cbfed5b
chore: Update generate-vocab.sh script
teleprint-me May 8, 2024
f7dda38
note: Time of check to time of use
teleprint-me May 8, 2024
670e1c3
fix: Attempt to remove potential TOCTOU
teleprint-me May 8, 2024
69efb59
fix: Apply proper paths for handling qwen
teleprint-me May 8, 2024
906c3f7
fix: Apply fix to generate-vocab.sh script
teleprint-me May 8, 2024
0478552
chore: Add tiktoken to convert requirements
teleprint-me May 8, 2024
ccafb87
chore: Add model vocab
teleprint-me May 8, 2024
a6c5d5d
Merge branch 'master' into add-stablelm-hash
teleprint-me May 8, 2024
ca8acea
chore: Group qwen models together
teleprint-me May 8, 2024
c05d2a2
chore: Fix enumeration for qwen, olmo, and dbrx
teleprint-me May 8, 2024
17f2243
patch: Apply patch to fix config and SPM retrieval
teleprint-me May 8, 2024
de3d9e3
patch: Apply fix for downloading related model files
teleprint-me May 8, 2024
bc924e0
Merge branch 'master' into add-stablelm-hash
teleprint-me May 8, 2024
fc0007e
Merge branch 'master' into add-stablelm-hash
teleprint-me May 13, 2024
932ab05
Remove qwen and fix mauled imports
teleprint-me May 13, 2024
58551d0
chore: Apply updates to vocab models
teleprint-me May 13, 2024
4067536
change default temperature of OAI compat API from 0 to 1 (#7226)
Kartoffelsaft May 13, 2024
cfeb962
convert.py: Outfile default name change and additional metadata suppo…
mofosyne May 13, 2024
eaa8457
llama : rename jina tokenizers to v2 (#7249)
JoanFM May 13, 2024
3fa36ac
[SYCL] rm wait() (#7233)
arthw May 13, 2024
89550bb
perplexity: add BF16 vs. FP16 results (#7150)
JohannesGaessler May 13, 2024
d8b6869
llava-cli: fix base64 prompt (#7248)
Adriankhl May 13, 2024
7d85ea8
llama : less KV padding when FA is off (#7257)
ggerganov May 13, 2024
3dfaa1f
convert-hf : support direct Q8_0 conversion (#7234)
compilade May 13, 2024
95390eb
docs: Fix typo and update description for --embeddings flag (#7026)
louixs May 14, 2024
c7b8254
Add left recursion check: quit early instead of going into an infinit…
nuchi May 14, 2024
a94019b
move ndk code to a new library (#6951)
eltonkola May 14, 2024
e30a369
llama : disable pipeline parallelism with nkvo (#7265)
slaren May 14, 2024
7a2f768
ggml : add RPC backend (#6829)
rgerganov May 14, 2024
04a7f32
Revert "move ndk code to a new library (#6951)" (#7282)
mofosyne May 14, 2024
58962a2
server: free sampling contexts on exit (#7264)
stevegrubb May 14, 2024
37e2593
ggml : optimize for ppc64le using VSX intrinsics (ggml/784)
penghongbo May 12, 2024
b95c202
ggml : expose SSE3 and SSSE3 for MSVC when AVX is available (whisper/…
przemoc May 8, 2024
da894f9
ggml : try fix ppc64 (whisper/0)
ggerganov May 12, 2024
48296bf
metal : tune soft_max number of threads (whisper/0)
ggerganov May 13, 2024
2022675
sync : ggml
ggerganov May 14, 2024
4bc6f6e
metal : support FA without mask + add asserts (#7278)
ggerganov May 14, 2024
02f4122
script : sync ggml-rpc
ggerganov May 14, 2024
53332ff
server bench: fix bench not waiting for model load (#7284)
JohannesGaessler May 15, 2024
79bc1ea
ggml : add `ggml_upscale_ext` (ggml/814)
balisujohn May 15, 2024
4aae3a5
sync : ggml
ggerganov May 15, 2024
f3e8fc1
embedding : free the batch after execution (#7297)
dm4 May 15, 2024
da26e4d
Add missing " (#7303)
AidanBeltonS May 15, 2024
6fb91c1
ggml : tag ggml_tensor::backend as deprecated (#7290)
slaren May 15, 2024
dda1347
Avoid unnecessarily disabling CUDA graphs (#7302)
agray3 May 15, 2024
d1e2b6e
ggml : use dynamic thread scheduling for matrix multiplication (#6915)
kunnis May 15, 2024
41b9e5c
readme : remove stray double quote (#7310)
danbev May 15, 2024
b953ca3
Add support for properly optimized Windows ARM64 builds with LLVM and…
max-krasnyansky May 16, 2024
ad34bee
ci: fix bin/Release path for windows-arm64 builds (#7317)
max-krasnyansky May 16, 2024
a8d948c
doc: add references to hugging face GGUF-my-repo quantisation web too…
Vaibhavs10 May 16, 2024
d0a9c31
grammar, json, llama: replace push on emplace if it possible (#7273)
GermanAizek May 16, 2024
c7a926f
convert : get general.name from model dir, not its parent (#5615)
cebtenzzre May 16, 2024
3d210da
rpc : add command line arg for specifying backend memory
rgerganov May 15, 2024
99d5b28
rpc : get available mem for the CPU backend
rgerganov May 15, 2024
657f980
Revert "server bench: fix bench not waiting for model load (#7284)" (…
phymbert May 16, 2024
cd0e3d5
[Server] Added --verbose option to README [no ci] (#7335)
reuank May 17, 2024
e7c7ae8
patch: Add pre-tokenizer metadata to phi-2
teleprint-me May 17, 2024
9a81faf
patch: Fix jina vocab generation
teleprint-me May 17, 2024
8aa4937
feat: Make number of experts configurable
teleprint-me May 17, 2024
a7e0042
chore: Update gguf vocabularies
teleprint-me May 17, 2024
9269594
Merge branch 'master' into add-stablelm-hash
teleprint-me May 17, 2024
29 changes: 20 additions & 9 deletions convert-hf-to-gguf-update.py
@@ -22,17 +22,16 @@
# TODO: generate tokenizer tests for llama.cpp
#

import json
import logging
import os
import pathlib
import re

import requests
import sys
import json

from hashlib import sha256
from enum import IntEnum, auto
from hashlib import sha256

import requests
from transformers import AutoTokenizer

logging.basicConfig(level=logging.DEBUG)
@@ -72,6 +71,12 @@ class TOKENIZER_TYPE(IntEnum):
{"name": "mpt", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/mosaicml/mpt-7b", },
{"name": "starcoder", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/bigcode/starcoder2-3b", },
{"name": "gpt-2", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/openai-community/gpt2", },
{"name": "phi", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/microsoft/phi-1", },
{"name": "stablelm", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b", },
{"name": "mistral-bpe", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2", },
{"name": "mistral-spm", "tokt": TOKENIZER_TYPE.SPM, "repo": "https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2", },
{"name": "mixtral-bpe", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1", },
{"name": "mixtral-spm", "tokt": TOKENIZER_TYPE.SPM, "repo": "https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1", },
{"name": "refact", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/smallcloudai/Refact-1_6-base", },
{"name": "command-r", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/CohereForAI/c4ai-command-r-v01", },
{"name": "qwen2", "tokt": TOKENIZER_TYPE.BPE, "repo": "https://huggingface.co/Qwen/Qwen1.5-7B", },
@@ -314,12 +319,18 @@ def get_vocab_base_pre(self, tokenizer) -> str:
logger.info(f"Tests for {name} written in ./models/ggml-vocab-{name}.gguf.*")

# generate commands for creating vocab files

logger.info("\nRun the following commands to generate the vocab files for testing:\n")
shscript = "#!/usr/bin/env bash\n\n"

for model in models:
name = model["name"]
tmpline = f"python3 convert-hf-to-gguf.py models/tokenizers/{name} --outfile models/ggml-vocab-{name}.gguf --vocab-only\n"
shscript += tmpline
logging.info(tmpline.strip())

print(f"python3 convert-hf-to-gguf.py models/tokenizers/{name}/ --outfile models/ggml-vocab-{name}.gguf --vocab-only") # noqa: NP100
with open("generate-vocab.sh", "w", encoding="utf-8") as f:
f.writelines(shscript)
logging.info(f"Wrote {len(shscript)} bytes to generate-vocab.sh")

logger.info("\n")
logging.info("Run the following command to generate the vocab files for testing:")
logging.info("Enable execution: chmod +x generate-vocab.sh")
logging.info("Execute with ./generate-vocab.sh")
29 changes: 26 additions & 3 deletions convert-hf-to-gguf.py
@@ -2,17 +2,27 @@

from __future__ import annotations

import logging
import argparse
import contextlib
import json
import logging
import os
import re
import sys
from enum import IntEnum
from pathlib import Path
from hashlib import sha256
from typing import TYPE_CHECKING, Any, Callable, ContextManager, Iterable, Iterator, Sequence, TypeVar, cast
from pathlib import Path
from typing import (
TYPE_CHECKING,
Any,
Callable,
ContextManager,
Iterable,
Iterator,
Sequence,
TypeVar,
cast,
)

import numpy as np
import torch
@@ -446,6 +456,18 @@ def get_vocab_base_pre(self, tokenizer) -> str:
if chkhsh == "3ce83efda5659b07b1ad37ca97ca5797ea4285d9b9ab0dc679e4a720c9da7454":
# ref: https://huggingface.co/openai-community/gpt2
res = "gpt-2"
if chkhsh == "fcace8b9cac38ce847670c970cd5892031a753a1ef381abd1d9af00f713da085":
# ref: https://huggingface.co/microsoft/phi-1
res = "phi"
if chkhsh == "32d85c31273f8019248f2559fed492d929ea28b17e51d81d3bb36fff23ca72b3":
# ref: https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b
res = "stablelm"
if chkhsh == "e750a9b14dfed9b73287639bd1ecda50c38fa6011138f2f609804c6dab9ed5c2":
# ref: https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2
res = "mistral-bpe"
if chkhsh == "e750a9b14dfed9b73287639bd1ecda50c38fa6011138f2f609804c6dab9ed5c2":
# ref: https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1
res = "mixtral-bpe"
if chkhsh == "6221ad2852e85ce96f791f476e0b390cf9b474c9e3d1362f53a24a06dc8220ff":
# ref: https://huggingface.co/smallcloudai/Refact-1_6-base
res = "refact"
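Each `chkhsh` above is a sha256 digest of the tokenizer's output on a fixed probe string, so an unrecognized tokenizer fails loudly instead of silently picking the wrong pre-tokenizer. A minimal sketch of how such a checksum could be computed — the probe text and the exact hashed representation are assumptions standing in for the update script's `tokenizer.encode(chktxt)`:

```python
from hashlib import sha256

def vocab_checksum(token_ids: list[int]) -> str:
    # Hash the string form of the token-id list; any change in the
    # tokenizer's splitting rules changes the ids, hence the digest.
    return sha256(str(token_ids).encode()).hexdigest()

# Hypothetical token ids standing in for tokenizer.encode(chktxt)
ids = [1, 15043, 3186]
print(vocab_checksum(ids))
```

Two tokenizers that split the probe string identically will collide — note the mistral-bpe and mixtral-bpe branches above share one hash, so only one label can ever be returned for it.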
@@ -1703,6 +1725,7 @@ def set_gguf_parameters(self):
n_head = self.find_hparam(["num_attention_heads", "n_head"])

self.gguf_writer.add_name("Phi2")
self.gguf_writer.add_tokenizer_pre("gpt-2")
self.gguf_writer.add_context_length(self.find_hparam(["n_positions", "max_position_embeddings"]))

self.gguf_writer.add_embedding_length(n_embd)
26 changes: 26 additions & 0 deletions generate-vocab.sh
@@ -0,0 +1,26 @@
#!/usr/bin/env bash

python3 convert-hf-to-gguf.py models/tokenizers/llama-spm --outfile models/ggml-vocab-llama-spm.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/llama-bpe --outfile models/ggml-vocab-llama-bpe.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/phi-3 --outfile models/ggml-vocab-phi-3.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/deepseek-llm --outfile models/ggml-vocab-deepseek-llm.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/deepseek-coder --outfile models/ggml-vocab-deepseek-coder.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/falcon --outfile models/ggml-vocab-falcon.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/bert-bge --outfile models/ggml-vocab-bert-bge.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/mpt --outfile models/ggml-vocab-mpt.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/starcoder --outfile models/ggml-vocab-starcoder.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/gpt-2 --outfile models/ggml-vocab-gpt-2.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/phi --outfile models/ggml-vocab-phi.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/stablelm --outfile models/ggml-vocab-stablelm.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/mistral-bpe --outfile models/ggml-vocab-mistral-bpe.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/mistral-spm --outfile models/ggml-vocab-mistral-spm.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/mixtral-bpe --outfile models/ggml-vocab-mixtral-bpe.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/mixtral-spm --outfile models/ggml-vocab-mixtral-spm.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/refact --outfile models/ggml-vocab-refact.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/command-r --outfile models/ggml-vocab-command-r.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/qwen2 --outfile models/ggml-vocab-qwen2.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/olmo --outfile models/ggml-vocab-olmo.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/dbrx --outfile models/ggml-vocab-dbrx.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/jina-v2-en --outfile models/ggml-vocab-jina-v2-en.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/jina-v2-es --outfile models/ggml-vocab-jina-v2-es.gguf --vocab-only
python3 convert-hf-to-gguf.py models/tokenizers/jina-v2-de --outfile models/ggml-vocab-jina-v2-de.gguf --vocab-only
3 changes: 1 addition & 2 deletions gguf-py/gguf/tensor_mapping.py
@@ -384,7 +384,7 @@ class TensorNameMap:

mapping: dict[str, tuple[MODEL_TENSOR, str]]

def __init__(self, arch: MODEL_ARCH, n_blocks: int):
def __init__(self, arch: MODEL_ARCH, n_blocks: int, n_experts: int = 60):
self.mapping = {}
for tensor, keys in self.mappings_cfg.items():
if tensor not in MODEL_TENSORS[arch]:
@@ -398,7 +398,6 @@ def __init__(self, arch: MODEL_ARCH, n_blocks: int):
if tensor not in MODEL_TENSORS[arch]:
continue
# TODO: make this configurable
n_experts = 60
for xid in range(n_experts):
tensor_name = TENSOR_NAMES[tensor].format(bid = bid, xid = xid)
self.mapping[tensor_name] = (tensor, tensor_name)
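The loop above expands one tensor name per expert from a format template, and the change makes `n_experts` a constructor argument instead of a hard-coded 60. A minimal sketch of that expansion, using a hypothetical template string in place of the real `TENSOR_NAMES` entry:

```python
def expand_expert_names(template: str, bid: int, n_experts: int) -> list[str]:
    # Format one name per expert id, mirroring the TensorNameMap loop.
    return [template.format(bid=bid, xid=xid) for xid in range(n_experts)]

names = expand_expert_names("blk.{bid}.ffn_gate.{xid}.weight", bid=0, n_experts=2)
print(names)  # → ['blk.0.ffn_gate.0.weight', 'blk.0.ffn_gate.1.weight']
```

Threading the count through the constructor means models with other expert counts (e.g. 8 for Mixtral-style MoE) map cleanly without editing library code.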
9 changes: 9 additions & 0 deletions llama.cpp
@@ -4458,6 +4458,9 @@ static void llm_load_vocab(
} else if (
tokenizer_pre == "command-r") {
vocab.type_pre = LLAMA_VOCAB_PRE_TYPE_COMMAND_R;
} else if (
tokenizer_pre == "qwen") {
vocab.type_pre = LLAMA_VOCAB_PRE_TYPE_QWEN;
} else if (
tokenizer_pre == "qwen2") {
vocab.type_pre = LLAMA_VOCAB_PRE_TYPE_QWEN2;
@@ -12354,6 +12357,12 @@ struct llm_tokenizer_bpe {
"'s|'t|'re|'ve|'m|'ll|'d| ?\\p{L}+| ?\\p{N}+| ?[^\\s\\p{L}\\p{N}]+|\\s+(?!\\S)",
});
break;
case LLAMA_VOCAB_PRE_TYPE_QWEN:
word_collection = unicode_regex_split(text, {
// original regex from tokenization_qwen.py
"(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\\r\\n\\p{L}\\p{N}]?\\p{L}+|\\p{N}| ?[^\\s\\p{L}\\p{N}]+[\\r\\n]*|\\s*[\\r\\n]+|\\s+(?!\\S)|\\s+",
});
break;
case LLAMA_VOCAB_PRE_TYPE_QWEN2:
word_collection = unicode_regex_split(text, {
// original regex from tokenizer.json
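The Qwen pattern above splits raw text into contractions, letter runs, single digits, punctuation runs, and whitespace before any BPE merges happen. A rough sketch of that splitting in Python — note this is an ASCII-only simplification, since the real pattern uses Unicode `\p{L}`/`\p{N}` classes and `\r\n` handling that stdlib `re` does not support:

```python
import re

# ASCII-only approximation of the Qwen pre-tokenizer split (assumption:
# the real regex uses \p{L}/\p{N} and special-cases \r\n sequences).
PAT = re.compile(
    r"(?i:'s|'t|'re|'ve|'m|'ll|'d)"  # contractions, case-insensitive
    r"| ?[A-Za-z]+"                  # optional leading space + letter run
    r"|[0-9]"                        # one digit at a time, as in \p{N}
    r"| ?[^\sA-Za-z0-9]+"            # optional space + punctuation run
    r"|\s+"                          # remaining whitespace
)

print(PAT.findall("I'm 25 years old!"))
# → ['I', "'m", ' ', '2', '5', ' years', ' old', '!']
```

Splitting digits one at a time (rather than as `\p{N}+` runs) is the visible difference from the QWEN2 pattern that follows it in the switch.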
7 changes: 4 additions & 3 deletions llama.h
@@ -81,9 +81,10 @@ extern "C" {
LLAMA_VOCAB_PRE_TYPE_GPT2 = 7,
LLAMA_VOCAB_PRE_TYPE_REFACT = 8,
LLAMA_VOCAB_PRE_TYPE_COMMAND_R = 9,
LLAMA_VOCAB_PRE_TYPE_QWEN2 = 10,
LLAMA_VOCAB_PRE_TYPE_OLMO = 11,
LLAMA_VOCAB_PRE_TYPE_DBRX = 12,
LLAMA_VOCAB_PRE_TYPE_QWEN = 10,
LLAMA_VOCAB_PRE_TYPE_QWEN2 = 11,
LLAMA_VOCAB_PRE_TYPE_OLMO = 12,
LLAMA_VOCAB_PRE_TYPE_DBRX = 13,
};

// note: these values should be synchronized with ggml_rope
Binary file modified models/ggml-vocab-bert-bge.gguf
Binary file modified models/ggml-vocab-command-r.gguf
Binary file modified models/ggml-vocab-deepseek-coder.gguf
Binary file modified models/ggml-vocab-deepseek-llm.gguf
Binary file modified models/ggml-vocab-falcon.gguf
Binary file modified models/ggml-vocab-gpt-2.gguf
Binary file modified models/ggml-vocab-llama-bpe.gguf
2 changes: 0 additions & 2 deletions models/ggml-vocab-llama-bpe.gguf.inp
@@ -104,5 +104,3 @@

🚀 (normal) 😶‍🌫️ (multiple emojis concatenated) ✅ 🦙🦙 3 33 333 3333 33333 333333 3333333 33333333 3.3 3..3 3...3 កាន់តែពិសេសអាច😁 ?我想在apple工作1314151天~ ------======= нещо на Български ''''''```````""""......!!!!!!?????? I've been 'told he's there, 'RE you sure? 'M not sure I'll make it, 'D you like some tea? We'Ve a'lL
__ggml_vocab_test__
Việt
__ggml_vocab_test__
1 change: 0 additions & 1 deletion models/ggml-vocab-llama-bpe.gguf.out
@@ -41,4 +41,3 @@
8765 8765 1644
8765 8765 8765
198 4815 15073 66597 8004 1602 2355 79772 11187 9468 248 222 320 8416 8 27623 114 102470 9468 234 104 31643 320 36773 100166 98634 8 26602 227 11410 99 247 9468 99 247 220 18 220 1644 220 8765 220 8765 18 220 8765 1644 220 8765 8765 220 8765 8765 18 220 8765 8765 1644 220 18 13 18 220 18 497 18 220 18 1131 18 220 21549 222 98629 241 45358 233 21549 237 45358 224 21549 244 21549 115 21549 253 45358 223 21549 253 21549 95 98629 227 76460 223 949 37046 101067 19000 23182 102301 9263 18136 16 36827 21909 56560 54337 19175 102118 13373 64571 34694 3114 112203 80112 3436 106451 14196 14196 74694 3089 3089 29249 17523 3001 27708 7801 358 3077 1027 364 83 820 568 596 1070 11 364 793 499 2771 30 364 44 539 2771 358 3358 1304 433 11 364 35 499 1093 1063 15600 30 1226 6 43712 264 64966 43
101798
Binary file modified models/ggml-vocab-llama-spm.gguf
Binary file modified models/ggml-vocab-mpt.gguf
Binary file modified models/ggml-vocab-phi-3.gguf
Binary file modified models/ggml-vocab-qwen2.gguf
Binary file modified models/ggml-vocab-refact.gguf
Binary file modified models/ggml-vocab-stablelm.gguf
Binary file modified models/ggml-vocab-starcoder.gguf
1 change: 1 addition & 0 deletions requirements/requirements-convert-hf-to-gguf-update.txt
@@ -1,2 +1,3 @@
-r ./requirements-convert.txt
torch~=2.1.1
tiktoken~=0.6.0