
Text encoder of SD 1.5 model is not trained, which is not supposed to happen #855

Open
FurkanGozukara opened this issue Oct 4, 2023 · 52 comments

Comments

@FurkanGozukara

FurkanGozukara commented Oct 4, 2023

Here is the executed command:

accelerate launch --num_cpu_threads_per_process=2 "./train_db.py" --pretrained_model_name_or_path="/workspace/stable-diffusion-webui/models/Stable-diffusion/Realistic_Vision_V5.1.safetensors" --train_data_dir="/workspace/stable-diffusion-webui/models/Stable-diffusion/img" --reg_data_dir="/workspace/stable-diffusion-webui/models/Stable-diffusion/reg" --resolution="768,768" --output_dir="/workspace/stable-diffusion-webui/models/Stable-diffusion/model" --logging_dir="/workspace/stable-diffusion-webui/models/Stable-diffusion/log" --save_model_as=safetensors --full_bf16 --output_name="me_1e7" --lr_scheduler_num_cycles="4" --max_data_loader_n_workers="0" --learning_rate="1e-07" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="4160" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --cache_latents --cache_latents_to_disk --optimizer_type="Adafactor" --optimizer_args scale_parameter=False relative_step=False warmup_init=False weight_decay=0.01 --max_data_loader_n_workers="0" --bucket_reso_steps=64 --xformers --bucket_no_upscale --noise_offset=0.0

When the text encoder is not trained, the script is supposed to print "Text Encoder is not trained."

This message is not printed either:

    train_text_encoder = args.stop_text_encoder_training is None or args.stop_text_encoder_training >= 0
    unet.requires_grad_(True)  # 念のため追加
    text_encoder.requires_grad_(train_text_encoder)
    if not train_text_encoder:
        accelerator.print("Text Encoder is not trained.")
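(Per this check, the Text Encoder is trained unless --stop_text_encoder_training is set to a negative value.)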

So how do I know the text encoder was not trained? Because I extracted a LoRA and the extractor says the text encoder is the same.

I did 30 training runs and so many of them were wasted because of this bug :/
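For reference, the "text encoder is same" message comes from the LoRA extraction script; a call along these lines reproduces it (a rough sketch only: flag names as I recall them from networks/extract_lora_from_models.py, paths and dim are placeholders, so double-check against your copy of the script):

    python networks/extract_lora_from_models.py \
      --model_org /path/to/Realistic_Vision_V5.1.safetensors \
      --model_tuned /path/to/me_1e7.safetensors \
      --save_to /path/to/extracted_lora.safetensors \
      --dim 128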


@kohya-ss

@FurkanGozukara
Author

I manually set train text encoder to true and added --stop_text_encoder_training 999999.

But the LoRA extractor still says the text encoder is the same.

@kohya-ss
Owner

kohya-ss commented Oct 4, 2023

I could reproduce the issue with the same and some other settings.

I also trained with the previous version, tag v0.6.6, and the Text Encoder is trained there. train_db.py is almost identical in both versions, so I think the most likely cause is one or some of the dependent libraries.
I will check it soon. However, since it means that there is probably nothing wrong with train_db.py, it may take some time to find the cause.

@FurkanGozukara
Author

> I could reproduce the issue with the same and some other settings. […]

Thank you so much, looking forward to the solution. I am pretty sure one of transformers, diffusers, or accelerate is broken.

You are doing an incredible job.

@kohya-ss
Owner

kohya-ss commented Oct 4, 2023

> Thank you so much, looking forward to the solution. I am pretty sure one of transformers, diffusers, or accelerate is broken.

I hope so too, but if there is something wrong with my script, I apologize.

@FurkanGozukara
Author

> I hope so too, but if there is something wrong with my script, I apologize.

SDXL text encoder is also not trained


@FurkanGozukara
Author

FurkanGozukara commented Oct 4, 2023

Sadly no version of SDXL is training the text encoder :(

I couldn't find a working version with bmaltais/kohya_ss.

Edit: a 3-month-old sdxl branch is working for some reason.

@bluvoll

bluvoll commented Oct 5, 2023

> Sadly no version of SDXL is training the text encoder :( […]

Just add --train_text_encoder as an extra parameter and it will train the TE; I think this is the intended behavior. As for extracting the LoRA from the DreamBooth model: if the TE has been trained enough to be different, it will be extracted, but you can force the extraction by changing the value here (Kohya GUI, but you can specify it on the command line too, no worries):
[screenshot]

You'll then get this and it will be extracted as expected

[screenshot]
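(This value presumably corresponds to the MIN_DIFF threshold in extract_lora_from_models.py; see the snippet quoted later in this thread.)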

@FurkanGozukara
Author

FurkanGozukara commented Oct 5, 2023

> Just add --train_text_encoder as an extra parameter and it will train the TE […] you can force the extraction by changing the value here […]

I tested with 0.01 and 0.004, both say the text encoder is the same.

Learning rate 1e-5, 4160 steps.

Still the same.

When I make it 0.0001 it shows a very tiny difference, but this seems wrong to me.

@FurkanGozukara
Author

I will test with adding the --train_text_encoder option too, thank you.

@FurkanGozukara
Author

By the way, the difference for Stable Diffusion 1.5 is also very small; any ideas?

It is 0.0009 with 4160 steps at 1e-6 LR.

I am using Adafactor.

@FurkanGozukara
Author

I am testing Realistic Vision 2 on the ShivamShrirao DreamBooth Colab.

I wonder how much text encoder difference it will have.

Very low LR, 4e-7, 2080 steps.

@kohya-ss
Owner

kohya-ss commented Oct 5, 2023

I have tested with my dataset, the AdamW 8bit optimizer, and various learning rates. I found:

  • Mixed precision fp16 or none seems to make a difference in the Text Encoder weights.
  • A higher learning rate like 1e-4 also seems to make a difference.

So I believe the scripts and the libraries are fine. However, I don't know why the same settings as before would produce different training results for the Text Encoder.

I wrote another script to compare Text Encoder weights. You will find that embeddings.token_embedding and some norm weights and biases have a larger difference than the attention layers. The LoRA extraction script only takes care of the attn layers, so it determines that the two Text Encoders are the same.

import argparse
import torch
from safetensors.torch import load_file

parser = argparse.ArgumentParser()
parser.add_argument("model1", help="path to model1")
parser.add_argument("model2", help="path to model2")
parser.add_argument("--rtol", type=float, default=1e-8, help="relative tolerance")
parser.add_argument("--atol", type=float, default=1e-6, help="absolute tolerance")
parser.add_argument("--bf16", action="store_true", help="use bf16 instead of fp32")
args = parser.parse_args()

model1_path = args.model1
model2_path = args.model2

# Load safetensors or checkpoint from each model path
print("loading models...")
if model1_path.endswith(".safetensors"):
    model1_sd = load_file(model1_path)
else:
    model1_sd = torch.load(model1_path)
if model2_path.endswith(".safetensors"):
    model2_sd = load_file(model2_path)
else:
    model2_sd = torch.load(model2_path)

if "state_dict" in model1_sd:
    model1_sd = model1_sd["state_dict"]
if "state_dict" in model2_sd:
    model2_sd = model2_sd["state_dict"]

# Compare the weights of each model
prefix_to_compare = "cond_stage_model"
print("comparing weights...")
print(f"key,\tall_close,\tmax_diff,\tmean_diff,\tmax_value1,\tmin_value1")
for key in model1_sd.keys():
    if key.startswith(prefix_to_compare):
        if key not in model2_sd:
            print(f"*** Key {key} not found in model2")
            continue
        if model1_sd[key].dtype == torch.long:
            # doesn't compare position ids
            # diff = torch.sum(model1_sd[key] != model2_sd[key])
            # print(f"*** {key}: long, {diff} different values")
            continue
        model1_value = model1_sd[key]
        model2_value = model2_sd[key]
        if args.bf16:
            model1_value = model1_value.to(torch.bfloat16)
            model2_value = model2_value.to(torch.bfloat16)
        model1_value = model1_value.to(torch.float32)
        model2_value = model2_value.to(torch.float32)

        all_close = torch.allclose(model1_value, model2_value, rtol=args.rtol, atol=args.atol)
        # compute stats on the converted float32 values so fp16/bf16 checkpoints don't hit half-precision op limits
        diff = torch.abs(model1_value - model2_value)
        print(
            f"{key},\t{all_close},\t{torch.max(diff)},\t{torch.mean(diff)},\t{torch.max(model1_value)},\t{torch.min(model1_value)}"
        )
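A minimal usage example, assuming the script above is saved as compare_te.py (paths are placeholders):

    python compare_te.py /path/to/model_before.safetensors /path/to/model_after.safetensors --atol 1e-6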

@FurkanGozukara
Author

@kohya-ss thank you so much

Can we say that setting a higher text encoder learning rate would be more beneficial in this case?

Can we already set a different LR for the text encoder when doing SD 1.5 or SDXL training?

@bluvoll

bluvoll commented Oct 5, 2023

> Can we already set a different LR for the text encoder when doing SD 1.5 or SDXL training?

AFAIK it doesn't have a way to specify an LR for the TE.

@AIEXAAA

AIEXAAA commented Oct 5, 2023

I may have found the problem, which can be divided into two parts:

1. The initial loss values of SD 1.5 training are different, which is related to line 1047 in library/model_util.py. If we change

   # logging.set_verbosity_error()  # don't show annoying warning
   # text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device)
   # logging.set_verbosity_warning()
   # print(f"config: {text_model.config}")
   cfg = CLIPTextConfig(
       vocab_size=49408,
       hidden_size=768,
       intermediate_size=3072,
       num_hidden_layers=12,
       num_attention_heads=12,
       max_position_embeddings=77,
       hidden_act="quick_gelu",
       layer_norm_eps=1e-05,
       dropout=0.0,
       attention_dropout=0.0,
       initializer_range=0.02,
       initializer_factor=1.0,
       pad_token_id=1,
       bos_token_id=0,
       eos_token_id=2,
       model_type="clip_text_model",
       projection_dim=768,
       torch_dtype="float32",
   )
   text_model = CLIPTextModel._from_config(cfg)

back to

    logging.set_verbosity_error()  # don't show annoying warning
    text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device)
    logging.set_verbosity_warning()
    print(f"config: {text_model.config}")
    # cfg = CLIPTextConfig(
    #     vocab_size=49408,
    #     hidden_size=768,
    #     intermediate_size=3072,
    #     num_hidden_layers=12,
    #     num_attention_heads=12,
    #     max_position_embeddings=77,
    #     hidden_act="quick_gelu",
    #     layer_norm_eps=1e-05,
    #     dropout=0.0,
    #     attention_dropout=0.0,
    #     initializer_range=0.02,
    #     initializer_factor=1.0,
    #     pad_token_id=1,
    #     bos_token_id=0,
    #     eos_token_id=2,
    #     model_type="clip_text_model",
    #     projection_dim=768,
    #     torch_dtype="float32",
    # )
    # text_model = CLIPTextModel._from_config(cfg)

then the initial values will be the same.

2. The training process of SD 1.5 is different, which is related to line 228 in train_network.py. If we delete the following two lines, the training process will be the same:

     if torch.__version__ >= "2.0.0":  # PyTorch 2.0.0 以上対応のxformersなら以下が使える
         vae.set_use_memory_efficient_attention_xformers(args.xformers)

@bluvoll

bluvoll commented Oct 5, 2023

> I tested with 0.01 and 0.004, both say the text encoder is the same. […] When I make it 0.0001 it shows a very tiny difference, but this seems wrong to me.

I had to use an LR of 0.000015 for it to show differences in about 8k steps, so it's very slow, but the extracted LoRA had a working TE and behaved as expected.

@timoshishi

> Edit: a 3-month-old sdxl branch is working for some reason.

Can you provide the commit hash for the working branch?

@FurkanGozukara
Author

> Can you provide the commit hash for the working branch?

I think I was mistaken, but I'm not sure. I will do more research.

This is the branch: https://github.com/bmaltais/kohya_ss/tree/sdxl-dev

@kohya-ss
Owner

kohya-ss commented Oct 5, 2023

> Can we say that setting a higher text encoder learning rate would be more beneficial in this case?

I don't think so. I think the learning rate for Text Encoder should be lower than the learning rate for U-Net in general.

> Can we already set a different LR for the text encoder when doing SD 1.5 or SDXL training?

Unfortunately, it is impossible for SD 1.5. For SDXL, we can use the --block_lr option. It specifies 23 learning rate values, one for each U-Net block, like --block_lr 1e-4,2e-4,3e-4,4e-4,5e-4,6e-4,7e-4,8e-4,9e-4,0e-4,1e-5,2e-5,3e-5,4e-5,5e-5,6e-5,7e-5,8e-5,9e-5,0e-4,1e-4,2e-4,3e-4.

So if we set this option, the default learning rate is used for the Text Encoder.
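For illustration, a rough sketch of how this might look on the command line (assuming the SDXL training script sdxl_train.py; the model path is a placeholder and the 23 block values just reuse the example above):

    accelerate launch sdxl_train.py \
      --pretrained_model_name_or_path=/path/to/sdxl_base.safetensors \
      --learning_rate=1e-6 \
      --train_text_encoder \
      --block_lr=1e-4,2e-4,3e-4,4e-4,5e-4,6e-4,7e-4,8e-4,9e-4,0e-4,1e-5,2e-5,3e-5,4e-5,5e-5,6e-5,7e-5,8e-5,9e-5,0e-4,1e-4,2e-4,3e-4

With this, the 23 values drive the U-Net blocks and --learning_rate effectively applies only to the Text Encoders.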

@FurkanGozukara
Author

FurkanGozukara commented Oct 5, 2023

@kohya-ss thank you

My text encoder enabled training for SDXL is about to be completed, with

--train_text_encoder

With this option it is using exactly the same VRAM; is this expected?

But it is about 32% slower.

One more question:

The DreamBooth extension of Automatic1111 had a "use EMA during training" option; this was significantly increasing VRAM usage but also quality.

You don't have that feature?

@IdiotSandwichTheThird

> The DreamBooth extension of Automatic1111 had a "use EMA during training" option […] You don't have that feature?

It sounds like you'd enjoy this repo more for training; it has adjustable LRs for text/unet, EMA, masked training, etc.: https://github.com/Nerogar/OneTrainer

@FurkanGozukara
Author

> It sounds like you'd enjoy this repo more for training; it has adjustable LRs for text/unet, EMA, masked training, etc.: https://github.com/Nerogar/OneTrainer

Thanks, I should experiment and compare.

@FurkanGozukara
Author

@kohya-ss any way to set the LR for the text encoder?

It gets cooked super fast :D

https://twitter.com/GozukaraFurkan/status/1710416135747748150

@IdiotSandwichTheThird

IdiotSandwichTheThird commented Oct 7, 2023

@kohya-ss
Not sure if you've noticed, but I just tried extracting a LoRA from 2 models which I know for sure have differently trained text encoders, and I still got the above "Text Encoder is same" message.
I can furthermore confirm the text encoders are different, because each will produce a different image when loaded in ComfyUI, see: https://i.imgur.com/xoQpxWo.png

Therefore I think the most likely issue lies simply with extract_lora_from_models.py erroneously thinking the two models are the same.

Edit:
More testing; I have edited extract_lora_from_models.py to always treat the text encoder as different:

    # Text Encoder might be same
    #if not text_encoder_different and torch.max(torch.abs(diff)) > MIN_DIFF:
    text_encoder_different = True
    print(f"Forcing use of text encoder. {torch.max(torch.abs(diff))} > {MIN_DIFF}")

The resulting LoRA works way better than before: https://i.imgur.com/VChzcw6.jpeg
Left is with the TE extraction skipped, right is with the above modification. The right image is way closer to the style of the trained model.

@kohya-ss
Owner

kohya-ss commented Oct 7, 2023

> With this option it is using exactly the same VRAM; is this expected? But it is about 32% slower.

The --train_text_encoder option should increase VRAM usage. But I have little experience with training Text Encoders; the result needs to be checked.

> The DreamBooth extension of Automatic1111 had a "use EMA during training" option; this was significantly increasing VRAM usage but also quality.
>
> You don't have that feature?

Unfortunately, there is no EMA feature currently. I would like to support it, but I think other tasks have higher priority. Of course you can use another trainer :)

@kohya-ss
Owner

kohya-ss commented Oct 7, 2023

> @kohya-ss any way to set the LR for the text encoder?
>
> It gets cooked super fast :D

As I mentioned on X, we can use the --block_lr option to set LRs for each U-Net block. The default learning rate is then used for the Text Encoder.

@kohya-ss
Owner

kohya-ss commented Oct 7, 2023

> More testing; I have edited extract_lora_from_models.py to always treat the text encoder as different.

I previously modified the script to increase MIN_DIFF, but it seems to be too large. I will add an option to set MIN_DIFF soon.

@FurkanGozukara
Author

@kohya-ss

I used --block_lr and it works; the text encoder is not cooked anymore. Here are some comparisons:

https://twitter.com/GozukaraFurkan/status/1710580153665925179

https://twitter.com/GozukaraFurkan/status/1710582243742142532

https://twitter.com/GozukaraFurkan/status/1710609957626810825

@kohya-ss
Owner

kohya-ss commented Oct 9, 2023

> I used --block_lr and it works; the text encoder is not cooked anymore. Here are some comparisons:

That's nice! I don't know the prompt for the images, but I feel the right image might represent the prompt well, for example the style and the background.

@mykeehu

mykeehu commented Oct 14, 2023

I found it difficult to follow the dialogue because other topics are mixed in. Has the Text Encoder problem been fixed under SD 1.5 or not?

@AIEXAAA

AIEXAAA commented Oct 16, 2023

> I found it difficult to follow the dialogue because other topics are mixed in. Has the Text Encoder problem been fixed under SD 1.5 or not?

The issue I tested still exists in the new version, and the trained LoRA cannot be used. You can try the modifications I mentioned earlier; they may be useful to you.

@mykeehu

mykeehu commented Oct 20, 2023

> The issue I tested still exists in the new version, and the trained LoRA cannot be used. You can try the modifications I mentioned earlier; they may be useful to you.

I compared the config, and there was only one line of difference: torch_dtype="float32" instead of torch_dtype=null.
I guess the part in train_network.py is because of torch 2.0, and that's why it was changed from null to float32 in the config. I don't have any other idea, because I guess the two are related.

I'm now using version 21.8.4 of the GUI, which @FurkanGozukara claims still trained SD 1.5 well (and I did make good LoRAs with it), and it already had the parameters you describe, so it's more likely that the bug is elsewhere.

@AIEXAAA

AIEXAAA commented Oct 20, 2023

> I compared the config, and there was only one line of difference: torch_dtype="float32" instead of torch_dtype=null. […] it's more likely that the bug is elsewhere.

I’m not sure where the problem lies, but you might be right.

For me, the so-called correctness means reproducing the SD 1.5 training results from before SDXL was introduced. I found that when the author does not reference "openai/clip-vit-large-patch14", the initial training loss is different. And when the author later introduces

    if torch.__version__ >= "2.0.0":  # PyTorch 2.0.0 以上対応のxformersなら以下が使える
        vae.set_use_memory_efficient_attention_xformers(args.xformers)

the trained SD 1.5 LoRA is completely damaged.

As for what you said about torch_dtype="float32": at that point we have already abandoned the reference to "openai/clip-vit-large-patch14", and the training results are already different from before.

@FurkanGozukara
Author

I am not sure, but SDXL training is far superior atm.

Here you can see my pictures, I shared 180+: https://civitai.com/user/SECourses

Best config: https://www.patreon.com/posts/89213064

Quick tutorial: https://www.youtube.com/watch?v=EEV8RPohsbw

@mykeehu

mykeehu commented Oct 20, 2023

@AIEXAAA I looked at this link and matched the parameters against the ones in the .py file, and only that one line is different. Since torch 2.x has been made the default in the new kohya versions, I assume there is a correlation.
https://huggingface.co/openai/clip-vit-large-patch14/blob/main/config.json

@FurkanGozukara thanks, but I want to train SD 1.5, not SDXL.

@FurkanGozukara
Author

> @FurkanGozukara thanks, but I want to train SD 1.5, not SDXL.

For SD 1.5 I am still researching.

My older tutorial still works great though, since it has EMA support too:

https://youtu.be/g0wXIcRhkJk

@AIEXAAA

AIEXAAA commented Oct 20, 2023

> @AIEXAAA I looked at this link and matched the parameters against the ones in the .py file, and only that one line is different. […]

I think I roughly understand what you're saying: when

    print(f"config: {text_model.config}")

is displayed, the value is consistent with the author's default cfg, but the problem still results in different outcomes.

As for the PyTorch issue: even if I update to 2.0 or 2.0.1, or even update this training program to the latest version, as long as I modify it in the way I mentioned earlier, the results of SD 1.5 LoRA training are consistent with before SDXL was introduced. Therefore, it's hard to assert that it is related to PyTorch 2.0.

@kohya-ss
Owner

I think this issue is already solved, but #890 seems to exist. I will work on #890.

@mykeehu

mykeehu commented Oct 22, 2023

I'm glad you found the source of the problem. Looking forward to the fix! :)

@WarAnakin

Hello everyone,
I am pretty glad to see someone finally was able to identify this issue.
I am the creator and founder of Team Crystal Clear. Some of you might be familiar with the name, to others it may be new.
This is an issue I myself brought up in August when I first trained Crystal Clear XL. Unfortunately, at the time, everyone I told that I was unable to properly train on kohya due to faulty text encoders dismissed it and said it was an issue on my end. So, presented with no other choice, my team and I got to work and fixed the issue so we could properly train Crystal Clear XL. It's not in my nature to make broken releases available, given that most of the work we do are commissions for game developers, the automotive industry, Stable Diffusion service providers, the brands and apparel industry, Instagram and OnlyFans influencers and models, and many other businesses.
This means that pretty much every checkpoint other than CCXL was trained on broken text encoders since August.
Now, I don't have the time to go through all these comments to know if the issue is fixed or not, but I'm looking forward to seeing how the future changes compare to the ones we made. And as for kohya, you might not remember, but I did bring this up to you on the civitai Discord back in August.

@FurkanGozukara
Author

> Hello everyone, I am pretty glad to see someone finally was able to identify this issue. […]

I am also doing training for companies. So far I am only using U-Net training. Results are great, but with text encoder training I am hoping we will get even better results.

@WarAnakin

> I am also doing training for companies. So far I am only using U-Net training. Results are great, but with text encoder training I am hoping we will get even better results.

It's great to meet you, Furkan. I've always found the research you do and the dedication you have towards Stable Diffusion nothing short of outstanding. You are a wonderful content maker and I fully support and recommend your work.

@BrennenRB

I have been struggling with the faulty text encoder for the last few weeks, and was hoping that it would be fixed with the November 11 [v21.1.1] update, but that does not seem to be the case. I am still getting "Text encoder is same. Extract U-Net only." when extracting LoRAs. Is anyone else having this problem? Found workarounds? Know when it will be fixed?

@FurkanGozukara
Author

> I am still getting "Text encoder is same. Extract U-Net only." when extracting LoRAs. […]

I am using the bmaltais GUI dev2 branch and SDXL training is working great.

@mykeehu

mykeehu commented Nov 14, 2023

The SD 1.5 TE is still not good for LoRA training. Yesterday I tried the same training under the 21.8.4 GUI and 22.1.1 (with the updated kohya script) and got completely different results: in the latest version it was overcooked by the third epoch, while in 21.8.4 I got a perfect LoRA.

@BrennenRB

> I am using the bmaltais GUI dev2 branch and SDXL training is working great.

Are you using Dreambooth or Finetune in the dev2 branch?

@suede299

I trained a LoHa on SDXL with the last two updates and tried various parameters, but always had a hard time getting satisfactory results. It could be due to a couple of things:

  1. An optimizer that can automatically determine the learning rate, with no way to specify different learning rates for the TE and U-Net.
  2. train_network.py cannot set "Stop text encoder training".
  3. "LoRA network weights" does not load LyCORIS correctly.
  4. SDXL's two text encoders are not separated.

I wanted to train the missing character expressions from the SDXL base model into the same LoRA, so I chose LoHa, but couldn't actually get anything that worked at all.

@DarkAlchy

error: unrecognized arguments: --train_text_encoder

Apparently Kohya has removed this for 1.5 training, and when the model from DreamBooth is only 2GB you know it does not have the TE, given that the model it was trained from is 4.7GB.

@FurkanGozukara
Author

SD 1.5 trains the TE by default.

@DarkAlchy

It didn't for me, but I haven't used 1.5 since 2.0 was released; I just had to use it to help LyCORIS test something.

@kohya-ss
Owner

@DarkAlchy train_db.py trains the Text Encoder by default, for SD 1.5 and 2.0/2.1, as FurkanGozukara said.

If you do not want to train the Text Encoder, please add the option --stop_text_encoder_training=-1.
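For example, a sketch based on the command at the top of this issue (paths are placeholders and other options are omitted):

    accelerate launch train_db.py \
      --pretrained_model_name_or_path=/path/to/base_model.safetensors \
      --train_data_dir=/path/to/img \
      --output_dir=/path/to/model \
      --stop_text_encoder_training=-1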

@DarkAlchy

> If you do not want to train the Text Encoder, please add the option --stop_text_encoder_training=-1.

=-1? Alright.
