Model metadata editor and hashing update #45

Closed
wants to merge 11 commits into from


space-nuko
Contributor

@space-nuko space-nuko commented Jan 19, 2023

Adds a metadata editor for LoRA models. Now people won't have to put the instructions or cover images in separate files where they can get lost. It can also save ratings, activation keywords and tags for a future model browser component.

Also updates the hashing methodology to be more resilient, like webui's new algorithm. This hash change should not affect old seeds, since the "legacy" hash is also calculated for each model upfront, unlike what webui is currently doing (only calculating/caching the hash if a model is loaded).

Some deficits:

  • Getting the hash for all models takes a very long time (about 15 minutes on an HDD with 1500 files), but I'm not sure what else to do when it comes to keeping backward compatibility for seeds. It's probably faster on an SSD, though, and the number of hashing threads is user-configurable
  • After the hashes are calculated they're cached, so future startups are instant. But if the directory structure of the LoRAs folder is changed and the models don't have an embedded hash, all the hashes have to be recalculated. This can be solved for future models by precalculating the hash with sd-scripts first
  • By its nature, the hashing technique requires the whole file to be serialized so the tensor region can be scanned, so in practice new models have to be serialized twice: once to compute the hash and once to save it to the metadata
    • I tried a method that hashes the torch.Tensors in memory, avoiding the extra serialization, but it was 80% slower than hashing the raw file's bytes. Again, it might be different if an SSD is used, but it's a difference between 15 minutes and 40 minutes, and the two types of hashes are incompatible with each other
  • Updating metadata is an expensive operation due to the nature of the safetensors format and the big files. Ideally it should be done when an author is editing their own models for distribution so it's only ever done once
  • Only one cover image is supported right now because gr.Gallery() doesn't support uploading; see gradio-app/gradio#1379 ("Allow Gallery to be used as in input component and upload multiple images")
  • Model ratings are saved to the model file, but of course ratings are subjective, not a property of the model itself. Maybe keeping a separate database for ratings would be better
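
To illustrate why updating metadata is expensive, here is a rough sketch assuming the .safetensors layout (8-byte little-endian header length, JSON header with a string-to-string `__metadata__` map, then the tensor bytes). The helper name `rewrite_metadata` and the demo keys are hypothetical and this is not the PR's actual code. Because the header sits at the front of the file and its length changes, every tensor byte behind it has to be copied again.

```python
import json
import os
import struct
import tempfile


def rewrite_metadata(path, updates):
    """Merge `updates` into __metadata__ by rewriting the whole file (hypothetical helper)."""
    with open(path, "rb") as src:
        (header_len,) = struct.unpack("<Q", src.read(8))
        header = json.loads(src.read(header_len))
        header.setdefault("__metadata__", {}).update(updates)
        new_header = json.dumps(header, separators=(",", ":")).encode("utf-8")
        # Write to a temp file in the same directory, then swap it in.
        fd, tmp_path = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
        with os.fdopen(fd, "wb") as dst:
            dst.write(struct.pack("<Q", len(new_header)))
            dst.write(new_header)
            # The expensive part: copy the entire tensor region unchanged.
            for chunk in iter(lambda: src.read(1024 * 1024), b""):
                dst.write(chunk)
    os.replace(tmp_path, path)


# Demo on a tiny stand-in file (not a valid model, just the same layout).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.safetensors")
    header = json.dumps({"__metadata__": {"ss_network_dim": "8"}}).encode("utf-8")
    payload = b"\x10\x20\x30\x40"
    with open(demo_path, "wb") as f:
        f.write(struct.pack("<Q", len(header)) + header + payload)

    rewrite_metadata(demo_path, {"ssmd_keywords": "example keyword"})

    with open(demo_path, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        result_metadata = json.loads(f.read(n))["__metadata__"]
        result_payload = f.read()
```

The tensor bytes come out identical, so a hash computed over the tensor region alone would survive this kind of edit, while the cost of the copy grows with the file size.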

@kohya-ss
Owner

Thank you for the great PRs!
The feature seems quite useful. How a model is meant to be used can't be known from the model alone, so the metadata editor will help many users.

However, I am a little worried about my ability to maintain it in the future.

In addition, I plan to add other networks (such as Hypernetwork or Custom Diffusion) and other features in the future. Creating another extension for these networks is an option, but a single extension would make it easier to control their order and weights.

Do you think it would be possible to implement the metadata editor in a separate file as much as possible? I think that would keep the file additional_networks.py simple (so that even I can understand it).

@space-nuko
Contributor Author

Yeah, probably a good idea. I can implement the hashing changes separately.

I think metadata editing is too far ahead of its time, since there's no model browser yet anyway.

@space-nuko space-nuko closed this Jan 19, 2023
@kohya-ss
Owner

Thank you for your consideration. I agree that it is too far ahead 😅. But I think it is definitely useful.
It would be nice to implement the metadata editor as a separate extension, but there is a compatibility issue for the metadata... I don't have a good idea yet, but having the metadata editor edit only "ssmd_*" items might work...

In addition, at this point, it might be nice to have an item such as "ss_training_comment" that lets the user specify an arbitrary string via a command-line argument when training the model. That would be helpful not only for users of the model but also for the person training it.
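
As a rough sketch of that idea, assuming an argparse-based training script: the flag name `--training_comment` and the helper `training_metadata` are hypothetical here, and only the key name "ss_training_comment" comes from the suggestion above.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Hypothetical flag: an arbitrary string to embed in the model's metadata.
    parser.add_argument(
        "--training_comment",
        type=str,
        default=None,
        help="arbitrary comment string to store in the model metadata",
    )
    return parser


def training_metadata(args):
    """Collect metadata entries; the comment is added only when one was given."""
    metadata = {}
    if args.training_comment is not None:
        metadata["ss_training_comment"] = args.training_comment
    return metadata


args = build_parser().parse_args(["--training_comment", "trigger word: example"])
comment_metadata = training_metadata(args)
```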

@space-nuko
Contributor Author

space-nuko commented Jan 20, 2023

I think the comment could be helpful but the model trainer won't know how it turns out beforehand, so I'm not sure it will be 100% useful

I was mainly trying to solve the issue of a notes.txt next to the model getting lost when the folder is reorganized, so you forget the activation keywords and other important info that the training metadata alone can't capture

For standardizing metadata, the items with ss_ should come from training data only (so sd-scripts) and something like ssmd_ could be for user-specified metadata
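
A small sketch of how an editor might enforce that split, assuming metadata is a flat string-to-string dict. The function names are hypothetical and the ssmd_ prefix is only the proposal from this thread, not an established standard.

```python
TRAINING_PREFIX = "ss_"   # written by the training scripts only
USER_PREFIX = "ssmd_"     # proposed prefix for user-specified metadata


def split_metadata(metadata):
    """Partition keys into training, user, and everything else."""
    training, user, other = {}, {}, {}
    for key, value in metadata.items():
        if key.startswith(USER_PREFIX):
            user[key] = value
        elif key.startswith(TRAINING_PREFIX):
            training[key] = value
        else:
            other[key] = value
    return training, user, other


def apply_user_edits(metadata, edits):
    """Return a copy with `edits` applied, refusing any key outside ssmd_."""
    rejected = [key for key in edits if not key.startswith(USER_PREFIX)]
    if rejected:
        raise ValueError(f"only {USER_PREFIX}* keys may be edited: {rejected}")
    merged = dict(metadata)
    merged.update(edits)
    return merged


example = {"ss_learning_rate": "1e-4", "ssmd_keywords": "example", "format": "pt"}
training, user, other = split_metadata(example)
edited = apply_user_edits(example, {"ssmd_rating": "4"})
```

With this rule an editor never touches the ss_ items, so training metadata stays trustworthy even after users annotate the file.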

Once there's a proper model browser where the cover images can be organized in an actually useful way, this kind of thing will probably gain more adoption than right now


2 participants