Train Textual Inversion in alternative models of Stable Diffusion #153

Open · fdagostino opened this issue Jun 12, 2023 · 3 comments
@fdagostino

Hi @rinongal, how are you?
I'm trying to train a person's face with Textual Inversion on two models: the standard 1.5 base model and Deliberate V2 (a DreamBooth fine-tuned model based on 1.5, very photorealistic).

When training on the 1.5 model it converges successfully, but on Deliberate it doesn't converge, despite testing all kinds of configurations (different learning rates, etc.), including going up to 10k steps (the only setting I've kept fixed is using two vectors to represent the token).
Someone told me it could be related to the EMA weights in the model, but that doesn't make much sense to me: during training we are only moving the embedding vectors around, trying to find a position that represents the face, and I don't see how the EMA weights would prevent us from finding that position.

Do you have any idea/insight/intuition on why I can find a vector that represents the face in the base model but can't find one in the other, given that the latter was fine-tuned from the base model and is quite photorealistic?

I want to dig deeper into this; any direction on how to debug it is much appreciated!

Thanks,
Fran

@rinongal
Owner

Hi @fdagostino ,

First of all, a possible workaround may be to train the face in the V1.5 model, and then initialize your DeliberateV2 training using this learned face embedding. The embeddings tend to transfer reasonably well between fine-tuned models, so it might serve as a good initialization. I haven't actively tried doing this, but these sorts of tricks typically work with GANs.
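Roughly, the transfer could look something like the sketch below. The paths are placeholders, and the 'string_to_token' / 'string_to_param' keys are what the embedding manager usually saves — verify them against your own checkpoint before relying on them:

```python
import torch

# Load the embedding file saved by the V1.5 Textual Inversion run
# (hypothetical path; adjust to your own log directory).
learned = torch.load("logs/v1-5-face-run/checkpoints/embeddings_gs-5000.pt",
                     map_location="cpu")
print(learned.keys())  # expected: 'string_to_token', 'string_to_param'

# The learned vectors for the placeholder token, e.g. shape (2, 768)
# if you trained with two vectors per token.
face_vectors = learned["string_to_param"]["*"]
print(face_vectors.shape)

# Re-save under a new name and point the Deliberate V2 run at this file as its
# initial embedding checkpoint, instead of initializing from a coarse class word.
torch.save(learned, "deliberate_v2_init_embeddings.pt")
```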

On the broader question - I'm actually not sure why this would happen. I wouldn't expect EMA weights to have a large impact on embedding tuning. Two possible things that do come to mind:

  1. If the weights are saved / loaded with a different precision than the baseline model, this may have an impact (a quick check is sketched below the list).
  2. If Deliberate V2 is DreamBooth trained, does it come with its own keyword? This might conflict with the inversion process (e.g. the DB keyword might take attention away from the new TI word). Do your training prompts include this keyword? Have you tried removing / adding it to the prompts?
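For point 1, a quick sanity check is to load both checkpoints and compare the tensor dtypes — a fine-tune saved in fp16 while the baseline is fp32 would show up immediately (just a sketch; the filenames below are placeholders):

```python
import torch

def checkpoint_dtypes(path):
    # SD-style .ckpt files are pickled dicts, usually with a 'state_dict' entry.
    ckpt = torch.load(path, map_location="cpu")
    state_dict = ckpt.get("state_dict", ckpt)
    return {t.dtype for t in state_dict.values() if torch.is_tensor(t)}

print("SD 1.5:        ", checkpoint_dtypes("v1-5-pruned-emaonly.ckpt"))
print("Deliberate V2: ", checkpoint_dtypes("deliberate-v2.ckpt"))
```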

@fdagostino
Author

Thanks @rinongal!
Will try to initialize with the learned embedding.
I don't really know how the model was trained, only that it is a fine-tuned version of the 1.5 model.

@RahulSajnani

How do I find placeholder characters that do not map to multiple embeddings with OpenCLIP (SD v2)? I have changed the code to work with my needs, but no matter which placeholder character I try, it always gets split into multiple embeddings. Can you help me @rinongal?
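One way to check this, assuming the stock open_clip tokenizer used for SD 2.x (the candidate strings below are just examples): open_clip.tokenize pads to the context length with zeros and adds start/end tokens, so a string maps to a single embedding exactly when the tokenized result has three non-zero entries.

```python
import open_clip

def is_single_token(word: str) -> bool:
    # open_clip.tokenize returns a (1, 77) tensor padded with zeros;
    # a single-token word gives start token + word token + end token = 3 non-zeros.
    tokens = open_clip.tokenize([word])[0]
    return int((tokens != 0).sum()) == 3

for candidate in ["*", "sks", "my_face"]:
    print(candidate, is_single_token(candidate))
```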
