
gguf-py : decouple adding metadata from writing in GGUFWriter #7827

Merged
merged 2 commits into master from compilade/gguf-py-decouple-writer-meta on Jun 9, 2024

Conversation

compilade
Collaborator

This is an attempt to simplify both #7499 and #6942 by slightly changing how GGUFWriter works internally.

@mofosyne, @christianazinn, I think this might be interesting for you.

Summary of changes

  • The only breaking change is that GGUFWriter.add_key and GGUFWriter.add_val have been replaced by GGUFWriter.add_key_value.
    • add_key should never have been used alone anyway; this was error-prone.
    • Only gguf-py/scripts/gguf-new-metadata.py used this, and it was easy enough to adapt.
  • use_temp_file is now opt-in instead of opt-out. (it now defaults to False)
    • It's rarely necessary, because lazy tensors keep memory use low, and it was otherwise enabled without anyone thinking about it. For example, gguf-new-metadata.py used the default use_temp_file value even though it already reads mmapped NumPy tensors from GGUFReader, so the temporary file was a redundant surprise.
  • GGUFWriter doesn't really need the output file name until it actually writes to it
  • GGUFWriter doesn't really need to eagerly prepare the data layout of the metadata
    • It's possible to choose what to include and in what order, by modifying the tensors and kv_data dicts of a GGUFWriter instance before writing to a file.
      • These are now dicts because they map tensor names to tensor info and keys to values, respectively; duplicate tensor names or keys are detected and should not happen anyway. Since Python 3.7, the iteration order of dict is guaranteed to match insertion order (and gguf-py already requires Python >=3.8 in its pyproject.toml).
      • This should help #6942 (Option to split during conversion) with integrating splitting into GGUFWriter.
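The decoupled pattern described above can be sketched with a toy writer. This is plain illustrative Python, not the actual gguf-py API; only the names add_key_value, kv_data, and tensors mirror this PR's interface, and the on-disk format here is made up for the example:

```python
from io import BytesIO
import struct

class ToyWriter:
    """Minimal stand-in for the decoupled GGUFWriter pattern:
    metadata is collected into insertion-ordered dicts, and nothing
    is laid out or written until write() is called."""

    def __init__(self):
        self.kv_data: dict[str, str] = {}    # key -> value
        self.tensors: dict[str, bytes] = {}  # tensor name -> raw data

    def add_key_value(self, key: str, value: str) -> None:
        assert key not in self.kv_data, f"duplicate key {key!r}"
        self.kv_data[key] = value

    def add_tensor(self, name: str, data: bytes) -> None:
        assert name not in self.tensors, f"duplicate tensor {name!r}"
        self.tensors[name] = data

    def write(self, out) -> None:
        # counts are only known to be final here, at write time
        out.write(struct.pack("<II", len(self.kv_data), len(self.tensors)))
        for k, v in self.kv_data.items():  # insertion order (Python >= 3.7)
            out.write(k.encode() + b"=" + v.encode() + b"\n")
        for name, data in self.tensors.items():
            out.write(name.encode() + b":" + data + b"\n")

w = ToyWriter()
w.add_key_value("general.name", "tinyllama")
w.add_tensor("tok_embd.weight", b"\x00\x01")
# because nothing has been written yet, metadata can still be
# edited or reordered by mutating the dicts directly
w.kv_data["general.description"] = "hello"
buf = BytesIO()
w.write(buf)
print(len(w.kv_data))  # 2
```

The point is the ordering of operations: callers populate the dicts in any order, possibly mutate them, and the header counts and layout are computed only once, at write time.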

Testing

I already listed checksums of many models in #7234, so I'll be testing some of these same models to ensure nothing changed.

  • Tinyllama https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0:
    • $ sha256sum tinyllama-lazy-decouple.*
      5af5abc9122fd3cd113b5bdd828e0f1c1737a432e3de0b80cd403a32659f07a6  tinyllama-lazy-decouple.bf16.gguf
      06d3a29d46ac6d2a128a4c357544f24e311a582eb9af18134db76fe3044111e8  tinyllama-lazy-decouple.q8_0.gguf
      $ sha256sum tinyllama-*-tempfile.q8_0.gguf
      06d3a29d46ac6d2a128a4c357544f24e311a582eb9af18134db76fe3044111e8  tinyllama-eager-decouple-tempfile.q8_0.gguf
      06d3a29d46ac6d2a128a4c357544f24e311a582eb9af18134db76fe3044111e8  tinyllama-lazy-decouple-tempfile.q8_0.gguf    
      Matches the older PR.
  • TinyMistral MoE https://huggingface.co/jtatman/TinyMistral-248m-v2.5-4x-Moe
    • $ sha256sum tinymistral-moe-lazy-decouple.*
      d6e6e1977ffa9cc365d425c107d7a79770b9425cab9db04d695e092fedd00d72  tinymistral-moe-lazy-decouple.bf16.gguf
      316c22ee97cd3717736ff7cd4ee6cabfdb811b389bc23c4d7cf071c0b514144b  tinymistral-moe-lazy-decouple.q8_0.gguf
      Matches the older PR.

Since gguf-new-metadata.py was also changed, why not also test it:

  • Tinyllama: round-trip with --general-description "Hello world", then removing it with --remove-metadata general.description
    • $ sha256sum tinyllama-lazy-decouple-*
      44fda7dd804d827a0e03f1bffc3d734f41426a29fe5c4359d133423a8430b809  tinyllama-lazy-decouple-description.q8_0.gguf
      06d3a29d46ac6d2a128a4c357544f24e311a582eb9af18134db76fe3044111e8  tinyllama-lazy-decouple-no-description.q8_0.gguf
      Matches the above

@github-actions github-actions bot added the python python script changes label Jun 8, 2024
@compilade compilade added the refactoring and Review Complexity : Medium labels Jun 8, 2024
@christianazinn
Contributor

christianazinn commented Jun 8, 2024

Thanks for notifying me here @compilade. Not sure how working on two "competing" PRs at once works - should I merge this branch into #6942 and add support, or can we merge that first and write GGUFSplitWriter support into this PR? #6942 is ready to merge whenever you give the green light, and I think I've resolved everything from your last review, so I'd lean towards the latter. The changes would be minimal anyway, just changing the override signatures slightly.

I think I already resolved most of the integration/split management reasonably well in GGUFSplitWriter (and don't really want to integrate splitting directly into GGUFWriter, primarily because GGUFWriter should only write a single file). Unfortunately, this PR doesn't allow for the last ideal change, which would be adding KV data/tensors only in their respective write_to_file functions rather than in an init_shards setup function: both self.tensors and the self.kv_data dict must be fully populated by the time write_header_to_file is called, so that the correct tensor-info and KV data counts are written (and there's no good way around this anyway).

That being said, as a general thing, I like these changes. add_key being essentially an alias for a particular call to add_val had confused me, and re: tempfile, good change; most users will never encounter or need it. One thing you didn't note that I might add is that delaying the naming of the GGUFWriter output file also delays the file's creation, which has some small benefits, e.g. not creating or overwriting a previous output file with a blank file if the script is stopped before it actually needs to write anything.
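The delayed-creation benefit mentioned above can be demonstrated with a small sketch (LazyOutput is a hypothetical stand-in, not the gguf-py class; only the pattern of passing the path at write time is from this PR):

```python
from pathlib import Path
import tempfile

class LazyOutput:
    """Eager pattern (old): the output file would be created in __init__,
    so an interrupted run leaves a blank or clobbered file behind.
    Lazy pattern (this PR): the path is only supplied when writing starts,
    so the first filesystem side effect happens in open()."""

    def __init__(self):
        self.fout = None

    def open(self, path) -> None:
        self.fout = open(path, "wb")  # file is created only here

    def write(self, data: bytes) -> None:
        assert self.fout is not None, "open() must be called before write()"
        self.fout.write(data)

tmpdir = tempfile.mkdtemp()
path = Path(tmpdir) / "model.gguf"
out = LazyOutput()
# arbitrarily long metadata preparation can run (and fail) here
# without ever touching `path`
assert not path.exists()
out.open(path)
out.write(b"GGUF")
out.fout.close()
print(path.exists())  # True
```

If the script is killed before open() is reached, no stale zero-byte output file is left on disk and no previous output is overwritten.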

Also, not very familiar with the inner workings of the GGUF specification, but does it matter which order the metadata is included? In theory, you could have two identical model GGUFs that only differ in metadata ordering, where they would appear different by SHA256 checksum but are, well, identical. Is that a possibility worth concern?

@compilade
Collaborator Author

Thanks for notifying me here @compilade. Not sure how working on two "competing" PRs at once works - should I merge this branch into #6942 and add support, or can we merge that first and write GGUFSplitWriter support into this PR?

@christianazinn I use git worktree in a bunch of folders :)

I'm fine with either PR being merged into master first. (merges across PRs can otherwise be weird unless well prepared)

#6942 is ready to merge whenever you give the green light, and I think I've resolved everything from your last review, so I'd lean towards the latter.

Yeah, I'll test #6942 tomorrow (it's night for me right now) and will likely approve after that. But I think it would be cleaner to refactor first to make the feature simpler, although it's not strictly necessary (the simplifications can come after too).

I think I already resolved most of the integration/split management nicely-ish in GGUFSplitWriter (and don't really want to integrate splitting directly into GGUFWriter for a couple reasons, primarily that GGUFWriter should only write a single file)

There's also another way to see GGUFWriter: as writing a single model. But I agree it's likely cleaner to keep one GGUFWriter per output file. We'll see.

Also, not very familiar with the inner workings of the GGUF specification, but does it matter which order the metadata is included?

The order of the metadata doesn't really matter, and it's currently governed by the insertion order from convert-hf-to-gguf.py. It was like this before this PR, and this behavior is kept in this PR too.

In theory, you could have two identical model GGUFs that only differ in metadata ordering, where they would appear different by SHA256 checksum but are, well, identical. Is that a possibility worth concern?

Hmm, you're right that this could sometimes be problematic, like when adding (for example) tokenizer.ggml.pre to a model which did not have it, with gguf-new-metadata.py. Might be worth it eventually to sort the keys, but it's out of scope for this PR.
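The key sorting floated above (explicitly out of scope for this PR) could look something like this. The kv_data contents here are illustrative, and rebuilding the dict in sorted order is a hypothetical normalization, not anything gguf-py does:

```python
# Hypothetical post-hoc normalization: rebuild kv_data in sorted-key
# order before writing, so two files with identical metadata get
# identical byte layouts (and hence identical SHA256 checksums),
# regardless of the order in which keys were inserted.
kv_data = {
    "tokenizer.ggml.model": "llama",
    "general.architecture": "llama",
    "tokenizer.ggml.pre": "default",  # e.g. added later by gguf-new-metadata.py
}
kv_sorted = dict(sorted(kv_data.items()))
print(list(kv_sorted))
# ['general.architecture', 'tokenizer.ggml.model', 'tokenizer.ggml.pre']
```

Since dicts preserve insertion order, constructing a new dict from the sorted items is all it takes; the writer would then iterate kv_sorted instead of kv_data.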

@mofosyne
Collaborator

mofosyne commented Jun 8, 2024

I like this PR moving towards decoupling the writing from the metadata. I was wishing for a way to avoid having to keep adding new functions for each metadata entry.

@compilade compilade added merge ready indicates that this may be ready to merge soon and is just holding out in case of objections and removed merge ready indicates that this may be ready to merge soon and is just holding out in case of objections labels Jun 8, 2024
Changing what happens when the output file is opened will be easier,
since this reduces the cases to consider.

* gguf-py : prevent GGUFWriter from writing all tensors multiple times

It was already checked with an assertion before, but using WriterState
should make the error message slightly less cryptic.
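The WriterState idea from the commit message above can be sketched as a small state machine. The state names and methods here are illustrative, not the exact gguf-py implementation; the point is that a second write attempt raises a clear error instead of tripping a bare assertion:

```python
from enum import Enum, auto

class WriterState(Enum):
    # illustrative states: the writer advances through them exactly once
    NO_FILE = auto()
    HEADER = auto()
    WEIGHTS = auto()

class StatefulWriter:
    def __init__(self):
        self.state = WriterState.NO_FILE

    def write_header(self) -> None:
        if self.state is not WriterState.NO_FILE:
            raise ValueError(f"expected NO_FILE state, got {self.state.name}")
        self.state = WriterState.HEADER

    def write_tensors(self) -> None:
        if self.state is not WriterState.HEADER:
            raise ValueError(f"expected HEADER state, got {self.state.name}")
        self.state = WriterState.WEIGHTS

w = StatefulWriter()
w.write_header()
w.write_tensors()
try:
    w.write_tensors()  # second call: descriptive error, not a cryptic assert
except ValueError as e:
    msg = str(e)
print(msg)  # expected HEADER state, got WEIGHTS
```

Compared with a plain `assert not written`, the raised message names both the expected and the actual state, which is what makes the failure less cryptic to diagnose.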
@compilade
Collaborator Author

Did some further tests with 32d11db.

(using https://huggingface.co/jtatman/TinyMistral-248m-v2.5-4x-Moe)

I've tested --vocab-only to make sure it still works

$ sha256sum tinymistral-moe-*defer*
316c22ee97cd3717736ff7cd4ee6cabfdb811b389bc23c4d7cf071c0b514144b  tinymistral-moe-lazy-defer.q8_0.gguf
ff2a7bec22e7691ab4c1b9533731f20a5a44218791a9d7d7225e977f22ad085d  tinymistral-moe-lazy-defer-vocab.q8_0.gguf
$ sha256sum tinymistral-moe-master-vocab.q8_0.gguf 
ff2a7bec22e7691ab4c1b9533731f20a5a44218791a9d7d7225e977f22ad085d  tinymistral-moe-master-vocab.q8_0.gguf

(the checksum with all tensors matches the one in this PR's main post)

I've also tested to see whether convert-legacy-llama.py still works, and it does!

$ sha256sum tinymistral-moe-*legacy*
a877bbc20528db4d7c9986629cc0edbe5abe7751da454b34e163cd9313210b64  tinymistral-moe-legacy.f16.gguf
aeb30a6b3e560cb0cb24ac2793906fda57e0d6688bafdf628d21af52cbecae59  tinymistral-moe-legacy-vocab.f16.gguf
a877bbc20528db4d7c9986629cc0edbe5abe7751da454b34e163cd9313210b64  tinymistral-moe-master-legacy.f16.gguf
aeb30a6b3e560cb0cb24ac2793906fda57e0d6688bafdf628d21af52cbecae59  tinymistral-moe-master-legacy-vocab.f16.gguf

@compilade compilade added the merge ready indicates that this may be ready to merge soon and is just holding out in case of objections label Jun 8, 2024
Collaborator

@mofosyne mofosyne left a comment


Sanity checking before merging. Looks alright.

@mofosyne mofosyne merged commit ed9f252 into master Jun 9, 2024
22 checks passed
christianazinn added a commit to christianazinn/llama.cpp that referenced this pull request Jun 9, 2024
@mofosyne mofosyne deleted the compilade/gguf-py-decouple-writer-meta branch June 9, 2024 04:09
mofosyne added a commit that referenced this pull request Jun 24, 2024
* support splits in convert.py

* Support split by size and dry run to write estimated shards/filesizes

* Move split functionality to new GGUFManager class

* fix improper function signature

* tentative push of convert-hf-to-gguf support

* resolve merge + SplitArguments for easier parsing

* Fix eager tensor memory leak and remove convert.py changes

Removed a memory leak caused by unexpected reference retention to eager tensors.

Also removed GGUFManager functionality in convert.py in favor of specializing for convert-hf-to-gguf.py.

* refactor SplitStrategy to be a deque

Instead of having SplitStrategy have a `data` field that is a deque, just have SplitStrategy be a subclass of deque itself.

* fix Q8 quantization

* remove unnecessary imports in gguf_manager

* fix final? merge issue

* fix gguf_writer placement and remove comments

* oops, actually fix gguf_writer placement

* reduce duplicated code from gguf_writer

* further simplify GGUFManager

* simplify even further and standardize with GGUFWriter

* reduce diffs with master

* form shards while adding tensors, SHA256 sums agree with master

* re-add type hint

Co-authored-by: compilade <git@compilade.net>

* GGUFWriter compatibility fix

Co-authored-by: compilade <git@compilade.net>

* Shard dataclass and un-negative dont_add_architecture

* type consistency in format_n_bytes_to_str

* move kv keys to constants.py

* make pathlib explicit

* base-1024 bytes to base-1000

* rename GGUFManager to GGUFWriterSplit

* Update gguf-py/gguf/constants.py

Co-authored-by: compilade <git@compilade.net>

* fix convert-hf-to-gguf.py permissions

* fix line endings

* Update gguf-py/gguf/gguf_writer_split.py

Co-authored-by: compilade <git@compilade.net>

* convert-hf : restore executable file permission

* examples/convert-legacy-llama.py: restore executable file permission

* reinstate original gguf package import and fix type annotation

* attempt to appease the linter

* attempt 2 to appease the linter

* attempt 3 to appease the linter

* comma consistency

* Update convert-hf-to-gguf.py

Co-authored-by: compilade <git@compilade.net>

* edit cmd line args

* use simplification from #7827

* kv/ti data are still wrong

* try to refactor kv data (still fails)

* fix ti data messiness

* tidy up

* fix linting

* actually make the linter happy

* cleanup round 1

* remove SplitStrategy, SplitArguments

* appease linter

* fix typing and clean up

* fix linting

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <git@compilade.net>

* progress bar, fix split logic

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <git@compilade.net>

* catch oversights

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <git@compilade.net>

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <git@compilade.net>

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <git@compilade.net>

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <git@compilade.net>

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <git@compilade.net>

* swap bar orders

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <git@compilade.net>

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <git@compilade.net>

* compatibility fix

* Update gguf-py/gguf/gguf_writer.py

Co-authored-by: compilade <git@compilade.net>

* Update convert-hf-to-gguf.py

Co-authored-by: compilade <git@compilade.net>

---------

Co-authored-by: Brian <mofosyne@gmail.com>
Co-authored-by: compilade <git@compilade.net>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jun 30, 2024
MagnusS0 pushed a commit to MagnusS0/llama.cpp-normistral-tokenizer that referenced this pull request Jul 1, 2024