
gguf-py: Refactor and allow reading/modifying existing GGUF files #3981

Merged
merged 33 commits into ggerganov:master on Nov 11, 2023

Conversation

@KerfuffleV2 (Collaborator) commented Nov 7, 2023

This is a big one.

  1. Splits gguf Python package into logical modules.
  2. Makes the constant keys hierarchical (old constants left for compatibility).
  3. Some cleanups in the writer, especially when dealing with the endian struct format stuff.
  4. I realized that having the scripts add gguf-py/gguf to the import path for a local GGUF was silly, because it loads gguf.py specifically rather than gguf as a package. Fixed that.
  5. Finally, GGUF file reading support!

The GGUF file reading is done using numpy's memmap feature, and it supports endian-swapped views. You can also map the file read/write and use GGUFReader to get a writeable view of the GGUF data (obviously you need to be very careful when writing, and anything that would change the size of a field currently isn't possible). Still, being able to change a scalar value without rewriting the whole file is nice.
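
A minimal sketch of the idea (not the actual GGUFReader code; the helper name and the assumption that you already know the value's byte offset are for illustration only), showing how a numpy memmap gives a writable, optionally byte-swapped view of part of a file:

import numpy as np

def writable_u32_view(path: str, offset: int, swap_endian: bool = False) -> np.ndarray:
    # Map the whole file read/write; writes through the returned view land in the file.
    data = np.memmap(path, mode='r+')
    # Reinterpret 4 bytes at offset as a single uint32 without copying.
    view = data[offset:offset + 4].view(np.uint32)
    # Present a byte-swapped view when the file's byte order differs from the host's.
    return view.view(view.dtype.newbyteorder()) if swap_endian else view

For example, writable_u32_view('model.gguf', kv_offset)[0] = 2 patches a single scalar in place (kv_offset being a placeholder for the value's offset), which is essentially what the reader's writable fields make convenient.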

There is an example of dumping a GGUF file at the end of gguf_reader.py. Here is example output:

edit: This is now a separate script in gguf-py/scripts/gguf-dump.py

$ python gguf-py/gguf/gguf_reader.py /blah/yi-q4_k_m.gguf
* Loading: /blah/yi-q4_k_m.gguf

* Dumping 23 key/value pair(s)
      1: UINT32     |        1 | GGUF.version = 3
      2: UINT64     |        1 | GGUF.tensor_count = 543
      3: UINT64     |        1 | GGUF.kv_count = 20
      4: STRING     |        1 | general.architecture = 'llama'
      5: STRING     |        1 | general.name = 'LLaMA v2'
      6: UINT32     |        1 | llama.context_length = 4096
      7: UINT32     |        1 | llama.embedding_length = 7168
      8: UINT32     |        1 | llama.block_count = 60
      9: UINT32     |        1 | llama.feed_forward_length = 20480
     10: UINT32     |        1 | llama.rope.dimension_count = 128
     11: UINT32     |        1 | llama.attention.head_count = 56
     12: UINT32     |        1 | llama.attention.head_count_kv = 8
     13: FLOAT32    |        1 | llama.attention.layer_norm_rms_epsilon = 9.999999747378752e-06
     14: FLOAT32    |        1 | llama.rope.freq_base = 5000000.0
     15: UINT32     |        1 | general.file_type = 15
     16: STRING     |        1 | tokenizer.ggml.model = 'llama'
     17: [STRING]   |    64000 | tokenizer.ggml.tokens
     18: [FLOAT32]  |    64000 | tokenizer.ggml.scores
     19: [INT32]    |    64000 | tokenizer.ggml.token_type
     20: UINT32     |        1 | tokenizer.ggml.bos_token_id = 1
     21: UINT32     |        1 | tokenizer.ggml.eos_token_id = 2
     22: UINT32     |        1 | tokenizer.ggml.padding_token_id = 0
     23: UINT32     |        1 | general.quantization_version = 2

* Dumping 543 tensor(s)
      1:  458752000 |  7168, 64000,     1,     1 | Q4_K    | token_embd.weight
      2:   51380224 |  7168,  7168,     1,     1 | Q4_K    | blk.0.attn_q.weight
      3:    7340032 |  7168,  1024,     1,     1 | Q4_K    | blk.0.attn_k.weight
      4:    7340032 |  7168,  1024,     1,     1 | Q6_K    | blk.0.attn_v.weight
      5:   51380224 |  7168,  7168,     1,     1 | Q4_K    | blk.0.attn_output.weight
      6:  146800640 |  7168, 20480,     1,     1 | Q4_K    | blk.0.ffn_gate.weight
      7:  146800640 | 20480,  7168,     1,     1 | Q6_K    | blk.0.ffn_down.weight
      8:  146800640 |  7168, 20480,     1,     1 | Q4_K    | blk.0.ffn_up.weight
      9:       7168 |  7168,     1,     1,     1 | F32     | blk.0.attn_norm.weight
     10:       7168 |  7168,     1,     1,     1 | F32     | blk.0.ffn_norm.weight

edit: I wanted to change the BOS token in my Yi model to 2 instead of 1 (their tokenizer config actually says not to add BOS, but we don't respect that). Making that change is pretty easy now:

import gguf
# Open read/write ('r+') so the change is written back into the file in place.
reader = gguf.GGUFReader('/path/models/yi-q4_k_m.gguf', 'r+')
# The last part of a field holds its value; assigning to it patches the file directly.
reader.fields['tokenizer.ggml.bos_token_id'].parts[-1][0] = 2

edit: I added some examples demonstrating the reader stuff. With this pull checked out, you can run:

python gguf-py/examples/dump_gguf.py /path/model.gguf

to dump the metadata in a GGUF file.

There is also an example that allows changing simple metadata values in a GGUF file (i.e. integers, floats). Here's an example of changing the BOS token id to 1:

python gguf-py/examples/modify_gguf.py /path/model.gguf tokenizer.ggml.bos_token_id 1

Note: It will ask you to confirm before actually making any changes. That example only supports simple values; however, the GGUFReader API will let you change anything (although the GGUF format itself currently prevents changes where the length of a field would differ).
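
As a hedged sketch of what that kind of guarded scalar edit looks like on top of GGUFReader (the fields/parts usage mirrors the snippet earlier in this description; treat the helper itself as hypothetical, it is not the code in modify_gguf.py):

import gguf

def set_scalar(path: str, key: str, value: int) -> bool:
    reader = gguf.GGUFReader(path, 'r+')
    field = reader.fields.get(key)
    if field is None:
        return False
    # The last part of a simple field holds its single value; refuse to touch
    # anything longer, since the layout cannot change size in place.
    part = field.parts[-1]
    if len(part) != 1:
        return False
    part[0] = value
    return True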

@KerfuffleV2 added the enhancement (New feature or request) and script (Script related) labels on Nov 7, 2023
@KerfuffleV2 (Collaborator, Author) commented:

@cebtenzzre I'm not really sure how to resolve these conflicts without basically reverting then manually editing in your changes since I moved the file content and then changed it. Would you be okay with that?

Or is there a better way?

@cebtenzzre (Collaborator) commented:

I'm not really sure how to resolve these conflicts without basically reverting then manually editing in your changes since I moved the file content and then changed it. Would you be okay with that?

Yeah, that's basically the only way to do it.

cebtenzzre and others added 3 commits November 7, 2023 21:12
Move examples to an examples/ directory

Clean up examples

Add an example of modifying keys in a GGUF file

Update documentation with info on examples

Try to support people importing gguf/gguf.py directly
@KerfuffleV2 KerfuffleV2 changed the title gguf-py: Refactor and add file reading support gguf-py: Refactor and allow reading/modifying existing GGUF files Nov 8, 2023
@Galunid (Collaborator) commented Nov 8, 2023

Hey, we are going to have a conflict once #3838 gets merged. You'll need to revert your changes to the convert-*-hf-to-gguf.py scripts, since they'll be deleted. Instead you'll need to add that change to convert-hf-to-gguf.py to resolve it ;)

Thanks for GGUFReader! It was definitely needed #3838 (comment)

@KerfuffleV2 (Collaborator, Author) commented:

@Galunid

Hey, we are going to have a conflict once #3838 gets merged.

Not a problem, and thanks for the heads up! If you wanted to, you could just make that change in #3838; it isn't specific to the other GGUF changes. It's just something weird I noticed when I was looking at those scripts. Whichever way you prefer is fine.

@KerfuffleV2 (Collaborator, Author) commented:

@chenqiny Hi, not sure if you're interested but this pull is supposed to add the ability to read/modify GGUF files and transparently support working with GGUF files created on a different endian than the machine where the script is running. So you should be able to use this to open and change a little endian GGUF file on your big endian machine and vice versa.

I don't really have a good way to test that though since all my machines are LE and I'm not sure where to even find a BE GGUF model.

Comment on lines 8 to 10
from gguf import GGUFReader, GGUFValueType # noqa: E402

def dump_gguf(filename: str) -> None:
@cebtenzzre (Collaborator) commented Nov 9, 2023:

There should be two blank lines before and after a top-level function. Same with the other two examples.

Also, the examples should be marked executable - otherwise, the shebang lines don't do anything.

@KerfuffleV2 (Collaborator, Author) replied:

Thanks for all the suggestions (and especially the actual bugs you caught). I really appreciate the time you've spent helping improve this pull!

What are you using for formatting and would you be able to share your configuration? I'd be perfectly happy to turn on Python auto formatting if there's a standard for the Python code in this repo to follow.

@cebtenzzre (Collaborator) replied:

The linters I'm using are:

  • mypy for type checking (there is a mypy.ini in the repo and all new python scripts should pass mypy)
  • isort for import sorting (for this repo, basically isort **/*.py -l 120 --tc -m VERTICAL_HANGING_INDENT)
  • and flake8 for PEP 8 style checking.

My flake8 configuration is messy, but I've done pip install wemake-python-styleguide and then turned off everything I don't care about. This ridiculous command should reproduce the way I'm using flake8 for llama.cpp (most of this is hidden behind a shell alias):

flake8 **/*.py --max-line-length=120 --ignore=D,DAR,I,S,A003,E121,E123,E126,E127,E201,E202,E203,E211,E221,E222,E226,E241,E251,E261,E266,E272,E306,E402,E704,E731,E741,E800,F403,F811,N400,N801,N803,N806,N812,N813,P101,P103,P205,Q000,T001,U101,W503,W504,WPS102,WPS110,WPS111,WPS113,WPS114,WPS115,WPS117,WPS120,WPS122,WPS125,WPS201,WPS202,WPS203,WPS204,WPS210,WPS211,WPS212,WPS213,WPS214,WPS218,WPS220,WPS221,WPS222,WPS223,WPS224,WPS225,WPS226,WPS229,WPS230,WPS231,WPS232,WPS234,WPS235,WPS236,WPS237,WPS238,WPS300,WPS301,WPS302,WPS304,WPS305,WPS306,WPS316,WPS317,WPS318,WPS319,WPS320,WPS322,WPS323,WPS326,WPS331,WPS332,WPS336,WPS337,WPS347,WPS348,WPS352,WPS360,WPS361,WPS362,WPS400,WPS405,WPS407,WPS412,WPS414,WPS420,WPS421,WPS422,WPS427,WPS428,WPS429,WPS430,WPS431,WPS432,WPS433,WPS434,WPS435,WPS436,WPS437,WPS440,WPS441,WPS442,WPS450,WPS457,WPS458,WPS459,WPS460,WPS463,WPS464,WPS501,WPS504,WPS508,WPS509,WPS510,WPS513,WPS518,WPS526,WPS602,WPS604,WPS605,WPS606,WPS608,WPS609,WPS611,WPS613

There is a lot of subjectivity with flake8; even that command leaves some checks enabled that don't really matter IMO. Normally I leave E251 enabled, but the style in this repo seems to use spaces around '=' in keyword arguments.

@chenqiny (Contributor) commented Nov 9, 2023

@chenqiny Hi, not sure if you're interested but this pull is supposed to add the ability to read/modify GGUF files and transparently support working with GGUF files created on a different endian than the machine where the script is running. So you should be able to use this to open and change a little endian GGUF file on your big endian machine and vice versa.

I don't really have a good way to test that though since all my machines are LE and I'm not sure where to even find a BE GGUF model.

Sure. I will test it this weekend.

@Galunid (Collaborator) commented Nov 9, 2023

If you wanted to, you could just make that change in #3838, it isn't specific to the other GGUF changes. Just something weird I noticed when I was looking at those scripts. Whichever way you prefer is fine.

Sure, I included the path fix in #3838

KerfuffleV2 and others added 2 commits November 9, 2023 00:21
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>
@monatis (Collaborator) left a comment:

lgtm -- thanks for taking this! I think it's good to merge.

@KerfuffleV2 (Collaborator, Author) commented:

thanks for taking this! I think it's good to merge.

Thanks for taking the time to review it, and also many thanks to cebtenzzre, who found a bunch of bugs and cleaned up a lot of stuff! I'll go ahead and merge it later today unless anyone finds problems in the meantime.

@KerfuffleV2 (Collaborator, Author) commented Nov 10, 2023

Actually, I found one more thing I'm going to change (after testing that it doesn't break anything). convert.py's Q8_0 quantization code has:

quantized_dtype = np.dtype([('d', '<f2'), ('qs', 'i1', (32,))])

This forces the value to little endian. I'm pretty sure that means using convert.py to produce Q8_0 quantized files is broken on big endian machines. Not sure why I did that; there's no reason not to let it be the native byte order.
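
For illustration only, this is what the native-byte-order form of that dtype looks like (as it turned out later in this thread, Q8_0 on big endian needs more than this, so this is not the change that actually landed in convert.py):

import numpy as np

# Omitting the '<' prefix lets numpy use the host's native byte order,
# so the struct layout matches whichever machine runs the conversion.
quantized_dtype = np.dtype([('d', np.float16), ('qs', np.int8, (32,))])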

gguf-py: SpecialVocab: Always try available sources for special token ids

gguf-py: SpecialVocab: Try to load merges from merges.txt if not in tokenizer.json

gguf-py: SpecialVocab: Add 'add_bos_token' type bools to GGUF metadata
@KerfuffleV2 (Collaborator, Author) commented:

@cebtenzzre @monatis

I made a few extra changes since you approved, so I want to double check. I was looking at the code for big endian and Q8_0 and realized it can't really work unless the byteswapping happens before the quantization step. However, that currently happens in the gguf package and occurs too late. Q8_0 would have been broken on big endian systems, so I just made it so it doesn't even appear as an option in convert.py.

There are also some changes to gguf's SpecialVocab class. I was trying to convert a Yi model and it didn't find the special token ids at all. I realized this was because it would only look at config.json for bos_token_id and friends if tokenizer.json or tokenizer_config.json did not exist; however, those ids may only be defined in config.json. So I changed it to check both places. If an id was already found, it won't be overwritten from config.json, so this change should only have an effect when those special token ids wouldn't have been found previously. I'd call this a bugfix.
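
A minimal sketch of that lookup order, using a hypothetical standalone helper (this is not the actual SpecialVocab code, and the file handling is simplified):

import json
from pathlib import Path
from typing import Dict

def find_special_token_ids(model_dir: Path) -> Dict[str, int]:
    ids: Dict[str, int] = {}
    # Check tokenizer_config.json first, then fall back to config.json.
    # An id found in an earlier file is never overwritten by a later one.
    for name in ('tokenizer_config.json', 'config.json'):
        path = model_dir / name
        if not path.is_file():
            continue
        config = json.loads(path.read_text(encoding='utf-8'))
        for token in ('bos', 'eos', 'unk', 'pad'):
            value = config.get(f'{token}_token_id')
            if token not in ids and isinstance(value, int):
                ids[token] = value
    return ids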

I also made it possible to load merges from merges.txt. This only occurs if merges are requested (BPE vocab type) and they weren't found in tokenizer.json. So this is only a fallback as well.

The last thing I changed is possibly more controversial. I changed SpecialVocab to read the add_bos_token, etc. booleans in config.json and add them to the GGUF metadata as keys like tokenizer.ggml.add_bos_token (a boolean). This is because some models don't want you to add a BOS (or whatever) token, but we currently just always add BOS for SPM models. Right now, that metadata field is purely informational; actually handling it on the C++ side would have to be a different pull.

I also didn't add it to the official GGUF key constants (but I can, if people think it's a good idea). I think we should support at least including that information in the metadata; models that don't want something like a BOS token prepended to their prompt can suffer severely degraded quality when you do it anyway.
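
As a rough illustration of emitting those keys with the gguf writer (a sketch only: the output path and architecture are made up, and the header/tensor writing calls are omitted):

import gguf

writer = gguf.GGUFWriter('model-out.gguf', arch='llama')
# Record whether BOS/EOS should be added when tokenizing; for now these keys
# are informational and are not yet consumed on the C++ side.
writer.add_bool('tokenizer.ggml.add_bos_token', False)
writer.add_bool('tokenizer.ggml.add_eos_token', True)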

@KerfuffleV2 (Collaborator, Author) commented:

@monatis This is just waiting for your review, so if you don't have any issues with the latest changes, you can go ahead and merge it once you see this.

@monatis monatis merged commit 34b0a08 into ggerganov:master Nov 11, 2023
6 checks passed
@monatis (Collaborator) commented Nov 11, 2023

Thanks again @KerfuffleV2 for the good work on this PR and for patiently taking every review into account.

@chenqiny (Contributor) commented:

@KerfuffleV2 I got the following error.

chenqiny@datalake:/database/llama.cpp/gguf-py$ /home/chenqiny/.local/bin/gguf-convert-endian /database/models--meta-llama--Llama-2-7b/snapshots/365ffa8f1a6c455d3e2028ae658236b4b85ba824/ggml-model-f16-little.gguf big

* Loading: /database/models--meta-llama--Llama-2-7b/snapshots/365ffa8f1a6c455d3e2028ae658236b4b85ba824/ggml-model-f16-little.gguf
* Host is LITTLE endian, GGUF file seems to be LITTLE endian
Traceback (most recent call last):
  File "/home/chenqiny/.local/bin/gguf-convert-endian", line 8, in <module>
    sys.exit(gguf_convert_endian_entrypoint())
  File "/home/chenqiny/.local/lib/python3.8/site-packages/scripts/gguf-convert-endian.py", line 109, in main
    convert_byteorder(reader, args)
  File "/home/chenqiny/.local/lib/python3.8/site-packages/scripts/gguf-convert-endian.py", line 34, in convert_byteorder
    if file_endian == order:
UnboundLocalError: local variable 'order' referenced before assignment
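
For context, that error means the order variable is only assigned inside a branch these arguments never reached. A minimal sketch of the kind of up-front normalization that avoids it (the function name and accepted values are assumptions, not the actual fix):

def normalize_byteorder(arg: str) -> str:
    # Assign order unconditionally before it is ever compared, and reject
    # anything that is not a recognized byte order.
    order = arg.lower()
    if order not in ('big', 'little', 'native'):
        raise ValueError(f'Unknown byte order: {arg!r}')
    return order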

olexiyb pushed a commit to Sanctum-AI/llama.cpp that referenced this pull request Nov 23, 2023
gguf-py: Refactor and allow reading/modifying existing GGUF files (ggerganov#3981)

* gguf-py: Refactor and add file reading support

* Replay changes from ggerganov#3871

Credit to @cebtenzzre for that pull

* Various type annotation fixes.

* sort imports with isort (again)

* Fix missing return statement in add_tensor

* style cleanup with flake8

* fix NamedTuple and Enum usage

* Fix an issue with state init in GGUFReader

Move examples to an examples/ directory

Clean up examples

Add an example of modifying keys in a GGUF file

Update documentation with info on examples

Try to support people importing gguf/gguf.py directly

* Damagage is not a word.

* Clean up gguf-py/examples/modify_gguf.py whitespace

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* Update gguf-py/examples/modify_gguf.py formatting

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* Update gguf-py/gguf/gguf_reader.py type hint

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* Make examples executable, formatting changes

* Add more information to GGUFReader and examples comments

* Include a gguf Python package version bump

* Add convert-gguf-endian.py script

* cleanup

* gguf-py : bump minor version

* Reorganize scripts

* Make GGUFReader endian detection less arbitrary

* Add JSON dumping support to gguf-dump.py

Which I kind of regret now

* A few for gguf-dump.py cleanups

* Murder accidental tuple in gguf-py/scripts/gguf-dump.py

Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>

* cleanup

* constants : remove unneeded type annotations

* fix python 3.8 compat

* Set up gguf- scripts in pyproject.toml

* And include scripts/__init__.py, derp

* convert.py: We can't currently support Q8_0 on big endian.

* gguf-py: SpecialVocab: Always try available sources for special token ids

gguf-py: SpecialVocab: Try to load merges from merges.txt if not in tokenizer.json

gguf-py: SpecialVocab: Add 'add_bos_token' type bools to GGUF metadata
u

* cleanup

* Promote add_X_token to GGUF metadata for BOS and EOS

---------

Co-authored-by: Jared Van Bortel <jared@nomic.ai>
Co-authored-by: Jared Van Bortel <cebtenzzre@gmail.com>