
Initial setup of lora support #2712

Closed · wants to merge 44 commits

Conversation

@felorhik commented Feb 18, 2023

I wanted to get LoRA support going, so I decided to take a crack at it.

It uses the same syntax as https://github.com/AUTOMATIC1111/stable-diffusion-webui, like so:

prompt: main prompt, <lora:name_of_lora:1>

Right now the weight does nothing; it is always 1.
Also, the LoRA must be trained in diffusers format: https://huggingface.co/docs/diffusers/training/lora

This also requires the base diffusers version to be raised from 0.11 to 0.13.

It should be able to support multiple LoRAs, but I have not tested that deeply. This is far from ready, but it might be useful for anyone wanting to experiment with LoRA in InvokeAI.
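As a minimal sketch (the regex and helper name here are illustrative, not the PR's actual parser), tags in this syntax could be pulled out of a prompt like so:

```python
import re

# Matches A1111-style tags such as <lora:my_lora:0.8>; the weight is optional.
LORA_TAG = re.compile(r"<lora:([^:>]+)(?::([\d.]+))?>")

def extract_lora_tags(prompt: str):
    """Return the prompt with tags stripped, plus (name, weight) pairs."""
    loras = [(name, float(weight or 1.0)) for name, weight in LORA_TAG.findall(prompt)]
    return LORA_TAG.sub("", prompt).strip(), loras

# extract_lora_tags("main prompt, <lora:name_of_lora:1>")
# -> ("main prompt,", [("name_of_lora", 1.0)])
```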

@blessedcoolant (Collaborator)

Finding a LoRA model online that is in the diffusers format is turning out to be harder than I expected. Most of them are shared as safetensors. I think I'll train my own LoRA, save it as diffusers, and then try this PR out.

@ChrGriffin

Getting LoRA support going is great, but as you point out, the vast majority are in safetensors format -- might be better to aim to support that up front?

@felorhik (Author)

I'm working on getting safetensors to work; however, there is some complexity to it.

https://github.com/cloneofsimo/lora uses a format compatible with diffusers and can be used to load safetensors-based LoRA files, provided some patching is done on the pipeline. There are guides in the repo that I have been experimenting with: https://github.com/cloneofsimo/lora/blob/71c8c1dba595d77d0eabdf9c278630168e5a8ce1/scripts/run_inference.ipynb

Most LoRAs have been trained with https://github.com/kohya-ss/sd-scripts, which uses its own format for the keys. I have been attempting to convert them and save in diffusers format, but no luck yet.

Given that, I am leaning more towards a conversion step for the kohya-scripts files rather than supporting them natively, though knowing which method was used to train may be difficult. I am still experimenting, but overall diffusers can make a 3 MB LoRA file vs. 150 MB with the kohya method, and I have not seen a difference in quality either.

For training as it currently stands, I have had success loading LoRAs made with the following diffusers training scripts:
https://github.com/huggingface/diffusers/blob/b2c1e0d6d4ffbd93fc0c381e5b9cdf316ca4f99f/examples/dreambooth/train_dreambooth_lora.py
https://github.com/huggingface/diffusers/blob/b2c1e0d6d4ffbd93fc0c381e5b9cdf316ca4f99f/examples/text_to_image/train_text_to_image_lora.py
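Since telling the training method apart is difficult, one hedged heuristic is to inspect the key names inside the file: kohya-ss sd-scripts files use lora_unet_* / lora_te_* key prefixes, while diffusers-style files do not (a sketch only; conventions other than kohya's vary by tool):

```python
from safetensors.torch import load_file

def looks_like_kohya(path: str) -> bool:
    """kohya-ss sd-scripts keys look like lora_unet_..._lora_down.weight,
    lora_te_..., plus per-module .alpha entries."""
    keys = load_file(path).keys()
    return any(k.startswith(("lora_unet_", "lora_te_")) for k in keys)
```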

@blessedcoolant (Collaborator)

Is there a script somewhere that allows us to convert LoRA models from safetensors (or whatever format) to diffusers? If there is one, maybe we could integrate it and load through that?

@felorhik (Author)

Added an adjustment which should be able to load safetensors files made by https://github.com/cloneofsimo/lora, though I am still testing / debugging it.

pip install git+https://github.com/cloneofsimo/lora.git is needed as a dependency for it, though.

@ChrGriffin commented Feb 19, 2023

I have no idea if any of this information will be useful to you, but maybe it'll at least inspire some fresh Google terms!

I've been using the HuggingFace Diffusers package, specifically the LoRA Dreambooth example (the same one you linked) with basically zero changes, to train my LoRAs. Unfortunately, the package produces a .bin file that is not compatible with A1111 (changing the extension isn't enough). A little Googling did bring me to this thread with the script by ignacfetser at the bottom.

It's hardly the most... uh... structurally sound solution, but I can confirm that, for now at least, it does work.

Of course, this isn't the exact issue you're encountering, but I figured I'd drop it here in case any of the information is helpful.

@felorhik (Author)

@ChrGriffin It did help; it gave me a good direction for a conversion script.

The script saves in diffusers format, and the result can be loaded after running:

python ./scripts/convert_lora.py --lora_file=path/to/lora_file_name.safetensors

Although I have not tested it yet, it should work with ckpt too:

python ./scripts/convert_lora.py --lora_file=path/to/lora_file_name.ckpt

It should save to ./models/lora/lora_file_name and be usable via <lora:lora_file_name:1>.

I'm not seeing great results out of it yet, but it is loading into the pipeline at least.

@blessedcoolant (Collaborator)

If the conversion time is short, I think we can effectively do a one-time conversion of the safetensors/ckpt model to diffusers. That might be more ideal because, in the long run, we want to move to diffusers. That way, we can avoid installing the original repo as a dependency and have it work fully through the diffusers pipeline.
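Such a one-time conversion could be memoized by checking the output directory before converting. A rough sketch, where convert_lora_to_diffusers stands in for whatever converter ends up being used (a hypothetical name, not an existing script):

```python
from pathlib import Path

def ensure_diffusers_lora(src: Path, cache_dir: Path) -> Path:
    """Convert once, then reuse the cached diffusers copy on later runs."""
    dst = cache_dir / src.stem               # e.g. models/lora/<lora_file_name>
    if not dst.exists():
        convert_lora_to_diffusers(src, dst)  # hypothetical converter
    return dst
```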

@neecapp commented Feb 19, 2023

huggingface/diffusers#2403 may be of some interest -- though you've already figured out the key mapping part.

I tried a variant of that in place of the cloneofsimo monkey patch, with better results (no errors for missing alphas). Still not perfect; it seems like either the math is off somewhere or the LoRA I am testing with is junk...

Using the conversion script does not work for me, likely due to missing text-model encoder layers (lora_te_text_model_encoder_layers). There's also nothing updating the text encoder in LoraManager for diffusers.

Edit: Conversion time from .safetensors to diffusers is fast, so I see no QOL impact from a one-time conversion.
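For reference, the merge these patches perform reduces to W' = W + scale * (alpha / rank) * up @ down, where a missing alpha conventionally defaults to the rank (making the ratio 1). A minimal sketch with illustrative tensor names:

```python
import torch

def merge_lora_pair(weight, up, down, alpha=None, scale=1.0):
    """Fold one LoRA pair into a base weight.

    weight: (out_features, in_features) base tensor
    up:     (out_features, rank)
    down:   (rank, in_features)
    """
    rank = down.shape[0]
    if alpha is None:
        alpha = rank  # an absent alpha is treated as equal to the rank
    return weight + scale * (alpha / rank) * (up @ down)
```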

@blessedcoolant (Collaborator) commented Feb 19, 2023

Okay. I managed to load a diffusers version of the LoRA model, but it doesn't seem to be working.

@damian0815 (Contributor) commented Feb 19, 2023

Re: the prompt syntax -- is this the way LoRAs are going to be activated, as a prompt term? If so, I'd suggest a more explicit syntax in line with the rest of Invoke, something like withLora(lora_name [, optional weight]), so e.g. a cat running in the forest withLora(tiger, 0.5) to apply the tiger.lora (or whatever) model at 50% strength.

@jordanramstad where is the prompt parser logic happening in your code? I didn't see it somewhere obvious.

@ChrGriffin

I believe the reasoning for the syntax is that it mimics A1111's syntax for loading LoRAs, making transitioning back and forth easier.

@felorhik (Author) commented Feb 20, 2023

Success!

@neecapp the PR you linked really helped.

It now supports safetensors files made in other formats; it will load and merge them into the current model when it runs, rather than trying to convert them or force them to load in diffusers format.

Converting may still be done, but this approach allows the text encoder to be supported as well.

EDIT: I got a little excited with it working -- it re-applies the weights on each execution without clearing them, leading to gradual burning. I'm working on resetting the weights after each run to get around that.
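One hedged way to stop that burn-in (the class and method names are illustrative, not the PR's code) is to snapshot each base tensor before the first merge and restore it after the run:

```python
import torch

class LoraWeightGuard:
    """Remembers base weights so merged LoRA deltas can be undone per run."""

    def __init__(self):
        self._originals: dict[str, torch.Tensor] = {}

    def merge(self, module: torch.nn.Module, name: str, delta: torch.Tensor):
        param = dict(module.named_parameters())[name]
        if name not in self._originals:            # snapshot once, pre-merge
            self._originals[name] = param.detach().clone()
        with torch.no_grad():
            param += delta

    def restore(self, module: torch.nn.Module):
        params = dict(module.named_parameters())
        with torch.no_grad():
            for name, original in self._originals.items():
                params[name].copy_(original)
        self._originals.clear()
```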

@neecapp commented Feb 20, 2023

I'll try to look at it again if I have some time, but it looks like you'll load the LoRA repeatedly on every successive run, which you likely do not want.

I'm on mobile at the moment, but it may be better to hijack text_encoder.forward and unet.forward to apply weights in a callback to a set of functions controlled by lora_manager, so that the prompt multiplier can be changed dynamically and the LoRA is loaded at most once (depending on the prompts).
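A sketch of that hijack idea for a single linear layer (the get_scale callback standing in for whatever lora_manager would expose is illustrative): the LoRA term is added at call time, so the base weights are never modified and the multiplier can change between prompts:

```python
import torch
import torch.nn.functional as F

def hijack_linear(linear: torch.nn.Linear, down: torch.Tensor,
                  up: torch.Tensor, get_scale):
    """Wrap forward so y = base(x) + scale * up(down(x)), applied at call time."""
    base_forward = linear.forward

    def forward(x):
        # down: (rank, in_features), up: (out_features, rank)
        return base_forward(x) + get_scale() * F.linear(F.linear(x, down), up)

    linear.forward = forward
```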

@felorhik (Author)

The latest commit will break what was working before.

I have started to adjust it to apply layers with the weight data, but I'm running into issues with matching tensors.
I don't think I'll solve it tonight, so I have committed it in case anyone else wants to take a look.

@neecapp commented Feb 23, 2023

Never mind, it looks like they want people to just use peft.

@felorhik (Author)

@neecapp @damian0815

I've added a peft setup. It does not work yet, but with a LoRA trained with peft (https://github.com/huggingface/peft) it will try to use it.

The issue is something related to sending tensors to the right device, but the error just dumps out the entire model, making it hard to diagnose.

It should only take effect with withLora(lora_name,1) when the folder contains a lora_config.json and a lora.pt file; otherwise it will use diffusers. The config is a little different than standard peft, since their training setup puts the instance prompt in front of the file name, though I don't think that is necessary, and it makes it hard to scan the directory properly.

On another note, this makes three variations of LoRA: diffusers, peft, and "legacy" (kohya scripts). While support for all is nice, it does feel like we should focus on one; the code is getting kind of messy supporting the different types at the moment.
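On the device issue, a common hedged fix (names are illustrative) is to map the saved state dict straight onto the pipeline's device when loading, so no tensor is left on the CPU when the weights are applied:

```python
import torch

def load_lora_state(path: str, device: torch.device):
    """Load lora.pt directly onto the target device to avoid mismatch errors."""
    state = torch.load(path, map_location=device)
    return {k: v.to(device) for k, v in state.items()}
```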

@neecapp commented Feb 26, 2023

I've been reading when I can. To be honest, as much as people may want the "legacy" variant to go away, that format covers 99.99% of all existing LoRA models. I don't think I've even seen a diffusers LoRA outside of engineering repos -- with only unet support, they aren't very good. I doubt the kohya variant is going anywhere anytime soon.

When the legacy format dominates and "just works" with A1111, most people will not think of it as legacy; they will see it as Invoke not having good LoRA support.

I haven't had much availability to think about a proper design for all of this, but there needs to be a good, low-friction way to support kohya LoRAs. In all reality, peft isn't a solution to the problem -- at least not yet.

Probably an unpopular opinion, but I see supporting the legacy format as the immediate need, with anything else as secondary; that can be done with follow-on PRs as PEFT et al. mature.

All of this is up to the Invoke team, of course. Apart from this PR, I'm not really active on here.

Edit: Source of the "use peft for lora" comment: huggingface/transformers#21770

@ChrGriffin commented Feb 26, 2023

That may be an unpopular opinion among the devs, but as a user, I'm here saying I don't care -- at all -- about supporting the diffusers format of LoRAs. Effectively none of the LoRAs available to "average users" are in the diffusers format. For example, take a quick browse of the LoRAs available on Civitai and you'll find that all of them are in safetensors format. It's less easily searchable, but similarly, you'll find that most or all LoRAs on HuggingFace are in safetensors format. And finally, Automatic1111, the de facto Stable Diffusion webui, uses safetensors-format LoRAs.

Maybe it's "legacy", but it's what the Stable Diffusion community overwhelmingly uses. Choosing not to support safetensors LoRAs is effectively choosing not to support LoRAs at all.

@simonfuhrmann

With all the work that has already been done here, and all the knowledge gathered, what is the state of things? I am really looking forward to using LoRAs with Invoke, as I personally much prefer the Invoke UI over A1111.

FWIW, I do agree with the general sentiment here that supporting the de facto standard format is a lot more valuable; efforts to support diffusers can be done in follow-up changes.

@felorhik (Author) commented Mar 9, 2023

Just wanted to post an update.

With the talk of a code freeze for the implementation of nodes, I have paused on doing much here.

There also appears to be another evolution of LoRA worth keeping an eye on: https://github.com/KohakuBlueleaf/LyCORIS
Right now there are various implementations, but the kohya method seems to be the standard; even LyCORIS utilizes it as a base. I am going to keep an eye on things for the time being and will make revisions here once things settle a little.

@ChrGriffin

It's disappointing to see Invoke lagging so far behind other solutions in the SD space, but I do understand the perspective of waiting for a "settled" solution before implementing anything. I wonder if a system for user-created extensions, like A1111 has, would ease these issues: users could then develop or install extensions for whichever LoRA implementation they use, and the core Invoke codebase wouldn't be polluted with three or four different ways of loading LoRAs.

@knoopx commented Mar 18, 2023

I see some confusion here; some clarifications:

  • Not sure what you are referring to as "legacy" here. Everything LoRA-related is bleeding edge, and if anything you could call it unadopted. CompVis torch state dumps (ckpts) are not legacy either; diffusers is just an alternative implementation.
  • safetensors is a serialization format; it has nothing to do with LoRA and can be used regardless of what strategy was used to train.
  • Different LoRA sizes depend on the rank (how many effective parameters are trained). Huge LoRAs (> 256 MB) usually come from people lacking understanding, as high rank barely offers any improvement in quality (see the sketch after this list).
  • https://github.com/cloneofsimo/lora/ was the first implementation, based on diffusers. It spits out two files, one for the unet and one for the text encoder. It is also used by the popular (and buggy) webui extension.
  • https://github.com/kohya-ss/sd-scripts took cloneofsimo's work (and others') and evolved it. It trains both the unet and the text encoder and spits out a single file with combined weights. It is probably the most popular implementation, given that it is the most flexible and portable.
  • https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth_lora.py is the diffusers example; nobody uses it, and it trains only the unet.
  • https://github.com/huggingface/peft is a huggingface library meant to be universal. It is great, but nobody uses it for stable diffusion. It is HF-style, messes up key names, and doesn't have a proper "specification"; how it is serialized is a user choice.
  • https://github.com/KohakuBlueleaf/LoCon is an enhanced method that trains additional layers; some people are already using it. kohya experimented with it in the past and did not see any advantage.
  • https://github.com/KohakuBlueleaf/LyCORIS is too new and not yet adopted; it needs to be validated.
  • The webui supports kohya-ss and diffusers-based LoRA natively; the extension mentioned above acts as a platform to support additional "hypernetworks", including LoCon.
  • On-the-fly conversion of formats is feasible, even for whole checkpoints (as long as you can fit them in RAM, it is fast).
  • LoRA is probably the future for running/training models on consumer-grade hardware.
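To make the rank-vs-size point concrete, a back-of-the-envelope sketch (the layer count and dimensions are illustrative, not an exact SD 1.x census):

```python
def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """A LoRA pair replaces an (out x in) delta with (out x rank) + (rank x in)."""
    return rank * (in_features + out_features)

# Roughly 100 adapted 768x768 projections at fp16 (2 bytes per parameter):
for rank in (4, 128):
    size_mb = 100 * lora_params(768, 768, rank) * 2 / 1e6
    print(f"rank {rank:>3}: ~{size_mb:.0f} MB")   # rank 4: ~1 MB, rank 128: ~39 MB
```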

@Void2258 commented Mar 19, 2023

Quick comment on the syntax: just to be consistent with the rest of Invoke, the weight should probably go outside, i.e. <lora:name_of_lora>1 (or even (<lora:name_of_lora>)1, which is how textual inversions currently work, though I hope that changes since it's a bit heavy on the delimiters) if you go this way and not with the withLora version. This would still be close enough to A1111 not to be a huge hassle when swapping back and forth. (If this was already settled, please ignore me; this is a lot to read through.)

Regarding the safetensors format: if there is a ton of material in that format, it needs support. That's why .doc is still supported even though it's been over a decade since it was technically superseded by .docx. For many people, failing to support a very common format is equivalent to not supporting the feature at all.

@lstein (Collaborator) commented Mar 28, 2023

It's going to be a while before main is ready to receive LoRA support. Do you have any interest in putting just the kohya support into the legacy 2.3 branch?

@lstein (Collaborator) commented Mar 29, 2023

I just did an experimental merge of this PR into the 2.3 branch, and it went in cleanly. I will test the code out tomorrow, but if it works as advertised, I propose that we go with it in that branch until the nodes refactor stabilizes.

@lzardy commented Mar 30, 2023

Hello, just passing by to say that I think this PR is an example of what could be part of a plugin system.
Considering the high likelihood that we will see new formats of models, inversions, dreambooth, and now LoRA, I believe you will be well rewarded for implementing some kind of API/hooks at the very least.
This kind of thing is for the long term and is best done sooner rather than later.
Regards, lizard man.

@lstein (Collaborator) commented Mar 30, 2023

@felorhik @neecapp After some minor fixes to the way the LoRA paths are assigned in legacy_lora_manager, I was able to get a kohya LoRA model to load. Unfortunately, when I try to generate an image, I get a freeze before the first denoising step executes. Was the "legacy" support ever working in this PR? I'd be grateful for any help with this. As noted above, the 3.0 main release is still a few weeks away, and it would be nice to have a way to run LoRA files in the interim.

I've made a new PR rebased against 2.3: #3072

@felorhik (Author)

Thank you @lstein

Given your work, I am going to close this PR. I have been working mostly with LyCORIS now, so I may open another PR for that in the future.

@felorhik closed this Mar 31, 2023