
Text encoder of SD 1.5 model is not trained, which is not supposed to happen #855

Open
FurkanGozukara opened this issue Oct 4, 2023 · 52 comments

Comments

@FurkanGozukara

FurkanGozukara commented Oct 4, 2023

Here is the executed command:

accelerate launch --num_cpu_threads_per_process=2 "./train_db.py" --pretrained_model_name_or_path="/workspace/stable-diffusion-webui/models/Stable-diffusion/Realistic_Vision_V5.1.safetensors" --train_data_dir="/workspace/stable-diffusion-webui/models/Stable-diffusion/img" --reg_data_dir="/workspace/stable-diffusion-webui/models/Stable-diffusion/reg" --resolution="768,768" --output_dir="/workspace/stable-diffusion-webui/models/Stable-diffusion/model" --logging_dir="/workspace/stable-diffusion-webui/models/Stable-diffusion/log" --save_model_as=safetensors --full_bf16 --output_name="me_1e7" --lr_scheduler_num_cycles="4" --max_data_loader_n_workers="0" --learning_rate="1e-07" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="4160" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --cache_latents --cache_latents_to_disk --optimizer_type="Adafactor" --optimizer_args scale_parameter=False relative_step=False warmup_init=False weight_decay=0.01 --max_data_loader_n_workers="0" --bucket_reso_steps=64 --xformers --bucket_no_upscale --noise_offset=0.0

When the text encoder is not trained, the script is supposed to print "Text Encoder is not trained."

This message is not printed either:

    train_text_encoder = args.stop_text_encoder_training is None or args.stop_text_encoder_training >= 0
    unet.requires_grad_(True)  # 念のため追加
    text_encoder.requires_grad_(train_text_encoder)
    if not train_text_encoder:
        accelerator.print("Text Encoder is not trained.")
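(Per this check, the Text Encoder is trained unless --stop_text_encoder_training is set to a negative value.)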

So how do I know the text encoder was not trained? Because I extracted a LoRA and the extractor says the text encoder is the same.

I did 30 training runs and so many of them were wasted because of this bug :/
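For reference, the "text encoder is same" message comes from the LoRA extraction script; a call along these lines reproduces it (a rough sketch only: flag names as I recall them from networks/extract_lora_from_models.py, paths and dim are placeholders, so double-check against your copy of the script):

    python networks/extract_lora_from_models.py \
      --model_org /path/to/Realistic_Vision_V5.1.safetensors \
      --model_tuned /path/to/me_1e7.safetensors \
      --save_to /path/to/extracted_lora.safetensors \
      --dim 128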


@kohya-ss

@FurkanGozukara
Author

I manually set train text encoder to true and added --stop_text_encoder_training 999999.

But the LoRA extractor still says the text encoder is the same.

@kohya-ss
Owner

kohya-ss commented Oct 4, 2023

I could reproduce the issue with the same and some other settings.

I also trained with the previous version, tag v0.6.6, and the Text Encoder is trained there. train_db.py is almost identical in both versions, so I think the most likely cause is one or some of the dependent libraries.
I will check it soon. However, since it means that there is probably nothing wrong with train_db.py, it may take some time to find the cause.

@FurkanGozukara
Author

> I could reproduce the issue with the same and some other settings. […]

Thank you so much, looking forward to the solution. I am pretty sure one of transformers, diffusers, or accelerate is broken.

You are doing an incredible job.

@kohya-ss
Owner

kohya-ss commented Oct 4, 2023

> Thank you so much, looking forward to the solution. I am pretty sure one of transformers, diffusers, or accelerate is broken.

I hope so too, but if there is something wrong with my script, I apologize.

@FurkanGozukara
Author

> I hope so too, but if there is something wrong with my script, I apologize.

SDXL text encoder is also not trained


@FurkanGozukara
Author

FurkanGozukara commented Oct 4, 2023

Sadly no version of SDXL is training the text encoder :(

I couldn't find a working version with bmaltais/kohya_ss.

Edit: a 3-month-old sdxl branch is working for some reason.

@bluvoll

bluvoll commented Oct 5, 2023

> Sadly no version of SDXL is training the text encoder :( […]

Just add --train_text_encoder as an extra parameter and it will train the TE; I think this is the intended behavior. As for extracting the LoRA from the DreamBooth model: if the TE has been trained enough to be different, it will be extracted, but you can force the extraction by changing the value here (Kohya GUI, but you can specify it on the command line too, no worries):
[screenshot]

You'll then get this and it will be extracted as expected

[screenshot]
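(This value presumably corresponds to the MIN_DIFF threshold in extract_lora_from_models.py; see the snippet quoted later in this thread.)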

@FurkanGozukara
Author

FurkanGozukara commented Oct 5, 2023

> Just add --train_text_encoder as an extra parameter and it will train the TE […] you can force the extraction by changing the value here […]

I tested with 0.01 and 0.004, both say the text encoder is the same.

Learning rate 1e-5, 4160 steps.

Still the same.

When I make it 0.0001 it shows a very tiny difference, but this seems wrong to me.

@FurkanGozukara
Author

I will test with adding the --train_text_encoder option too, thank you.

@FurkanGozukara
Author

By the way, the difference for Stable Diffusion 1.5 is also very small; any ideas?

It is 0.0009 with 4160 steps at 1e-6 LR.

I am using Adafactor.

@FurkanGozukara
Author

I am testing Realistic Vision 2 on the ShivamShrirao DreamBooth Colab.

I wonder how much text encoder difference it will have.

Very low LR, 4e-7, 2080 steps.

@kohya-ss
Owner

kohya-ss commented Oct 5, 2023

I have tested with my dataset, the AdamW 8bit optimizer, and various learning rates. I found:

  • Mixed precision fp16 or none seems to make a difference in the Text Encoder weights.
  • A higher learning rate like 1e-4 also seems to make a difference.

So I believe the scripts and the libraries are fine. However, I don't know why the same settings as before would produce different training results for the Text Encoder.

I wrote another script to compare Text Encoder weights. You will find that embeddings.token_embedding and some norm weights and biases have a larger difference than the attention layers. The LoRA extraction script only takes care of the attn layers, so it determines that the two Text Encoders are the same.

import argparse
import torch
from safetensors.torch import load_file

parser = argparse.ArgumentParser()
parser.add_argument("model1", help="path to model1")
parser.add_argument("model2", help="path to model2")
parser.add_argument("--rtol", type=float, default=1e-8, help="relative tolerance")
parser.add_argument("--atol", type=float, default=1e-6, help="absolute tolerance")
parser.add_argument("--bf16", action="store_true", help="use bf16 instead of fp32")
args = parser.parse_args()

model1_path = args.model1
model2_path = args.model2

# Load safetensors or checkpoint from each model path
print("loading models...")
if model1_path.endswith(".safetensors"):
    model1_sd = load_file(model1_path)
else:
    model1_sd = torch.load(model1_path)
if model2_path.endswith(".safetensors"):
    model2_sd = load_file(model2_path)
else:
    model2_sd = torch.load(model2_path)

if "state_dict" in model1_sd:
    model1_sd = model1_sd["state_dict"]
if "state_dict" in model2_sd:
    model2_sd = model2_sd["state_dict"]

# Compare the weights of each model
prefix_to_compare = "cond_stage_model"
print("comparing weights...")
print(f"key,\tall_close,\tmax_diff,\tmean_diff,\tmax_value1,\tmin_value1")
for key in model1_sd.keys():
    if key.startswith(prefix_to_compare):
        if key not in model2_sd:
            print(f"*** Key {key} not found in model2")
            continue
        if model1_sd[key].dtype == torch.long:
            # doesn't compare position ids
            # diff = torch.sum(model1_sd[key] != model2_sd[key])
            # print(f"*** {key}: long, {diff} different values")
            continue
        model1_value = model1_sd[key]
        model2_value = model2_sd[key]
        if args.bf16:
            model1_value = model1_value.to(torch.bfloat16)
            model2_value = model2_value.to(torch.bfloat16)
        model1_value = model1_value.to(torch.float32)
        model2_value = model2_value.to(torch.float32)

        all_close = torch.allclose(model1_value, model2_value, rtol=args.rtol, atol=args.atol)
        # compute stats on the converted float32 values so fp16/bf16 checkpoints don't hit half-precision op limits
        diff = torch.abs(model1_value - model2_value)
        print(
            f"{key},\t{all_close},\t{torch.max(diff)},\t{torch.mean(diff)},\t{torch.max(model1_value)},\t{torch.min(model1_value)}"
        )
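A minimal usage example, assuming the script above is saved as compare_te.py (paths are placeholders):

    python compare_te.py /path/to/model_before.safetensors /path/to/model_after.safetensors --atol 1e-6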

@FurkanGozukara
Author

@kohya-ss thank you so much

Can we say that setting a higher text encoder learning rate would be more beneficial in this case?

Can we already set a different LR for the text encoder when doing SD 1.5 or SDXL training?

@bluvoll

bluvoll commented Oct 5, 2023

> Can we already set a different LR for the text encoder when doing SD 1.5 or SDXL training?

AFAIK it doesn't have a way to specify an LR for the TE.

@AIEXAAA

AIEXAAA commented Oct 5, 2023

I may have found the problem, which can be divided into two parts:

1. The initial loss values of SD 1.5 training are different, which is related to line 1047 in library/model_util.py. If we change

   # logging.set_verbosity_error()  # don't show annoying warning
   # text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device)
   # logging.set_verbosity_warning()
   # print(f"config: {text_model.config}")
   cfg = CLIPTextConfig(
       vocab_size=49408,
       hidden_size=768,
       intermediate_size=3072,
       num_hidden_layers=12,
       num_attention_heads=12,
       max_position_embeddings=77,
       hidden_act="quick_gelu",
       layer_norm_eps=1e-05,
       dropout=0.0,
       attention_dropout=0.0,
       initializer_range=0.02,
       initializer_factor=1.0,
       pad_token_id=1,
       bos_token_id=0,
       eos_token_id=2,
       model_type="clip_text_model",
       projection_dim=768,
       torch_dtype="float32",
   )
   text_model = CLIPTextModel._from_config(cfg)

back to

    logging.set_verbosity_error()  # don't show annoying warning
    text_model = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14").to(device)
    logging.set_verbosity_warning()
    print(f"config: {text_model.config}")
    # cfg = CLIPTextConfig(
    #     vocab_size=49408,
    #     hidden_size=768,
    #     intermediate_size=3072,
    #     num_hidden_layers=12,
    #     num_attention_heads=12,
    #     max_position_embeddings=77,
    #     hidden_act="quick_gelu",
    #     layer_norm_eps=1e-05,
    #     dropout=0.0,
    #     attention_dropout=0.0,
    #     initializer_range=0.02,
    #     initializer_factor=1.0,
    #     pad_token_id=1,
    #     bos_token_id=0,
    #     eos_token_id=2,
    #     model_type="clip_text_model",
    #     projection_dim=768,
    #     torch_dtype="float32",
    # )
    # text_model = CLIPTextModel._from_config(cfg)

then the initial values will be the same.

2. The training process of SD 1.5 is different, which is related to line 228 in train_network.py. If we delete the following two lines, the training process will be the same:

     if torch.__version__ >= "2.0.0":  # PyTorch 2.0.0 以上対応のxformersなら以下が使える
         vae.set_use_memory_efficient_attention_xformers(args.xformers)

@bluvoll

bluvoll commented Oct 5, 2023

> I tested with 0.01 and 0.004, both say the text encoder is the same. […] When I make it 0.0001 it shows a very tiny difference, but this seems wrong to me.

I had to use an LR of 0.000015 for it to show differences in about 8k steps, so it's very slow, but the extracted LoRA had a working TE and behaved as expected.

@timoshishi

> Edit: a 3-month-old sdxl branch is working for some reason.

Can you provide the commit hash for the working branch?

@FurkanGozukara
Author

> Can you provide the commit hash for the working branch?

I think I was mistaken, but I'm not sure. I will do more research.

This is the branch: https://github.com/bmaltais/kohya_ss/tree/sdxl-dev

@kohya-ss
Owner

kohya-ss commented Oct 5, 2023

> Can we say that setting a higher text encoder learning rate would be more beneficial in this case?

I don't think so. I think the learning rate for Text Encoder should be lower than the learning rate for U-Net in general.

> Can we already set a different LR for the text encoder when doing SD 1.5 or SDXL training?

Unfortunately, it is impossible for SD 1.5. For SDXL, we can use the --block_lr option. It specifies 23 learning rate values, one for each U-Net block, like --block_lr 1e-4,2e-4,3e-4,4e-4,5e-4,6e-4,7e-4,8e-4,9e-4,0e-4,1e-5,2e-5,3e-5,4e-5,5e-5,6e-5,7e-5,8e-5,9e-5,0e-4,1e-4,2e-4,3e-4.

So if we set this option, the default learning rate is used for the Text Encoder.
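For illustration, a rough sketch of how this might look on the command line (assuming the SDXL training script sdxl_train.py; the model path is a placeholder and the 23 block values just reuse the example above):

    accelerate launch sdxl_train.py \
      --pretrained_model_name_or_path=/path/to/sdxl_base.safetensors \
      --learning_rate=1e-6 \
      --train_text_encoder \
      --block_lr=1e-4,2e-4,3e-4,4e-4,5e-4,6e-4,7e-4,8e-4,9e-4,0e-4,1e-5,2e-5,3e-5,4e-5,5e-5,6e-5,7e-5,8e-5,9e-5,0e-4,1e-4,2e-4,3e-4

With this, the 23 values drive the U-Net blocks and --learning_rate effectively applies only to the Text Encoders.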

@FurkanGozukara
Author

FurkanGozukara commented Oct 5, 2023

@kohya-ss thank you

My text encoder enabled training for SDXL is about to be completed, with

--train_text_encoder

With this option it is using exactly the same VRAM; is this expected?

But it is about 32% slower.

One more question:

The DreamBooth extension of Automatic1111 had a "use EMA during training" option; this was significantly increasing VRAM usage but also quality.

You don't have that feature?

@IdiotSandwichTheThird

> The DreamBooth extension of Automatic1111 had a "use EMA during training" option […] You don't have that feature?

It sounds like you'd enjoy this repo more for training; it has adjustable LRs for text/unet, EMA, masked training, etc.: https://github.com/Nerogar/OneTrainer

@FurkanGozukara
Author

> It sounds like you'd enjoy this repo more for training; it has adjustable LRs for text/unet, EMA, masked training, etc.: https://github.com/Nerogar/OneTrainer

Thanks, I should experiment and compare.

@FurkanGozukara
Author

@kohya-ss any way to set the LR for the text encoder?

It gets cooked super fast :D

https://twitter.com/GozukaraFurkan/status/1710416135747748150

@IdiotSandwichTheThird

IdiotSandwichTheThird commented Oct 7, 2023

@kohya-ss
Not sure if you've noticed, but I just tried extracting a LoRA from 2 models which I know for sure have differently trained text encoders, and I still got the above "Text Encoder is same" message.
I can furthermore confirm the text encoders are different, because each will produce a different image when loaded in ComfyUI, see: https://i.imgur.com/xoQpxWo.png

Therefore I think the most likely issue lies simply with extract_lora_from_models.py erroneously thinking the two models are the same.

Edit:
More testing; I have edited extract_lora_from_models.py to always treat the text encoder as different:

    # Text Encoder might be same
    #if not text_encoder_different and torch.max(torch.abs(diff)) > MIN_DIFF:
    text_encoder_different = True
    print(f"Forcing use of text encoder. {torch.max(torch.abs(diff))} > {MIN_DIFF}")

The resulting LoRA works way better than before: https://i.imgur.com/VChzcw6.jpeg
Left is with the TE extraction skipped, right is with the above modification. The right image is way closer to the style of the trained model.

@kohya-ss
Owner

kohya-ss commented Oct 7, 2023

> With this option it is using exactly the same VRAM; is this expected? But it is about 32% slower.

The --train_text_encoder option should increase VRAM usage. But I have little experience with training Text Encoders; the result needs to be checked.

> The DreamBooth extension of Automatic1111 had a "use EMA during training" option; this was significantly increasing VRAM usage but also quality.
>
> You don't have that feature?

Unfortunately, there is no EMA feature currently. I would like to support it, but I think other tasks have higher priority. Of course you can use another trainer :)

@kohya-ss
Owner

kohya-ss commented Oct 7, 2023

> @kohya-ss any way to set the LR for the text encoder?
>
> It gets cooked super fast :D

As I mentioned on X, we can use the --block_lr option to set LRs for each U-Net block. The default learning rate is then used for the Text Encoder.

@kohya-ss
Owner

kohya-ss commented Oct 7, 2023

> More testing; I have edited extract_lora_from_models.py to always treat the text encoder as different.

I previously modified the script to increase MIN_DIFF, but it seems to be too large. I will add an option to set MIN_DIFF soon.

@FurkanGozukara
Author

@kohya-ss

I used --block_lr and it works; the text encoder is not cooked anymore. Here are some comparisons:

https://twitter.com/GozukaraFurkan/status/1710580153665925179

https://twitter.com/GozukaraFurkan/status/1710582243742142532

https://twitter.com/GozukaraFurkan/status/1710609957626810825

@kohya-ss
Owner

kohya-ss commented Oct 9, 2023

> I used --block_lr and it works; the text encoder is not cooked anymore. Here are some comparisons:

That's nice! I don't know the prompt for the images, but I feel the right image might represent the prompt well, for example the style and the background.

@mykeehu

mykeehu commented Oct 14, 2023

I found it difficult to follow the dialogue because other topics are mixed in. Has the Text Encoder problem been fixed under SD 1.5 or not?

@AIEXAAA

AIEXAAA commented Oct 16, 2023

> I found it difficult to follow the dialogue because other topics are mixed in. Has the Text Encoder problem been fixed under SD 1.5 or not?

The issue I tested still exists in the new version, and the trained LoRA cannot be used. You can try the modifications I mentioned earlier; they may be useful to you.

@mykeehu

mykeehu commented Oct 20, 2023

> The issue I tested still exists in the new version, and the trained LoRA cannot be used. You can try the modifications I mentioned earlier; they may be useful to you.

I compared the config, and there was only one line of difference: torch_dtype="float32" instead of torch_dtype=null.
I guess the part in train_network.py is because of torch 2.0, and that's why it was changed from null to float32 in the config. I don't have any other idea, because I guess the two are related.

I'm now using version 21.8.4 of the GUI, which @FurkanGozukara claims still trained SD 1.5 well (and I did make good LoRAs with it), and it already had the parameters you describe, so it's more likely that the bug is elsewhere.

@AIEXAAA

AIEXAAA commented Oct 20, 2023

> I compared the config, and there was only one line of difference: torch_dtype="float32" instead of torch_dtype=null. […] it's more likely that the bug is elsewhere.

I’m not sure where the problem lies, but you might be right.

For me, the so-called correctness means reproducing the SD 1.5 training results from before SDXL was introduced. I found that when the author does not reference "openai/clip-vit-large-patch14", the initial training loss is different. And when the author later introduces

    if torch.__version__ >= "2.0.0":  # PyTorch 2.0.0 以上対応のxformersなら以下が使える
        vae.set_use_memory_efficient_attention_xformers(args.xformers)

the trained SD 1.5 LoRA is completely damaged.

As for what you said about torch_dtype="float32": at that point we have already abandoned the reference to "openai/clip-vit-large-patch14", and the training results are already different from before.

@FurkanGozukara
Author

I am not sure, but SDXL training is far superior atm.

Here you can see my pictures, I shared 180+: https://civitai.com/user/SECourses

Best config: https://www.patreon.com/posts/89213064

Quick tutorial: https://www.youtube.com/watch?v=EEV8RPohsbw

@mykeehu

mykeehu commented Oct 20, 2023

@AIEXAAA I looked at this link and matched the parameters against the ones in the .py file, and only that one line is different. Since torch 2.x has been made the default in the new kohya versions, I assume there is a correlation.
https://huggingface.co/openai/clip-vit-large-patch14/blob/main/config.json

@FurkanGozukara thanks, but I want to train SD 1.5, not SDXL.

@FurkanGozukara
Author

> @FurkanGozukara thanks, but I want to train SD 1.5, not SDXL.

For SD 1.5 I am still researching.

My older tutorial still works great though, since it has EMA support too:

https://youtu.be/g0wXIcRhkJk

@AIEXAAA

AIEXAAA commented Oct 20, 2023

> @AIEXAAA I looked at this link and matched the parameters against the ones in the .py file, and only that one line is different. […]

I think I roughly understand what you're saying: when

    print(f"config: {text_model.config}")

is displayed, the value is consistent with the author's default cfg, but the problem still results in different outcomes.

As for the PyTorch issue: even if I update to 2.0 or 2.0.1, or even update this training program to the latest version, as long as I modify it in the way I mentioned earlier, the results of SD 1.5 LoRA training are consistent with before SDXL was introduced. Therefore, it's hard to assert that it is related to PyTorch 2.0.

@kohya-ss
Owner

I think this issue is already solved, but #890 seems to exist. I will work on #890.

@mykeehu

mykeehu commented Oct 22, 2023

I'm glad you found the source of the problem. Looking forward to the fix! :)

@WarAnakin

Hello everyone,
I am pretty glad to see someone finally was able to identify this issue.
I am the creator and founder of Team Crystal Clear. Some of you might be familiar with the name, to others it may be new.
This is an issue I myself brought up in August when I first trained Crystal Clear XL. Unfortunately, at the time, everyone I told that I was unable to properly train on kohya due to faulty text encoders dismissed it and said it was an issue on my end. So, presented with no other choice, my team and I got to work and fixed the issue so we could properly train Crystal Clear XL. It's not in my nature to make broken releases available, given that most of the work we do are commissions for game developers, the automotive industry, Stable Diffusion service providers, the brands and apparel industry, Instagram and OnlyFans influencers and models, and many other businesses.
This means that pretty much every checkpoint other than CCXL was trained on broken text encoders since August.
Now, I don't have the time to go through all these comments to know if the issue is fixed or not, but I'm looking forward to seeing how the future changes compare to the ones we made. And as for kohya, you might not remember, but I did bring this up to you on the civitai Discord back in August.

@FurkanGozukara
Author

> Hello everyone, I am pretty glad to see someone finally was able to identify this issue. […]

I am also doing training for companies. So far I am only using U-Net training. Results are great, but with text encoder training I am hoping we will get even better results.

@WarAnakin

> I am also doing training for companies. So far I am only using U-Net training. Results are great, but with text encoder training I am hoping we will get even better results.

It's great to meet you, Furkan. I've always found the research you do and the dedication you have towards Stable Diffusion nothing short of outstanding. You are a wonderful content maker and I fully support and recommend your work.

@BrennenRB

I have been struggling with the faulty text encoder for the last few weeks, and was hoping that it would be fixed with the November 11 [v21.1.1] update, but that does not seem to be the case. I am still getting "Text encoder is same. Extract U-Net only." when extracting LoRAs. Is anyone else having this problem? Found workarounds? Know when it will be fixed?

@FurkanGozukara
Author

> I am still getting "Text encoder is same. Extract U-Net only." when extracting LoRAs. […]

I am using the bmaltais GUI dev2 branch and SDXL training is working great.

@mykeehu

mykeehu commented Nov 14, 2023

The SD 1.5 TE is still not good for LoRA training. Yesterday I tried the same training under the 21.8.4 GUI and 22.1.1 (with the updated kohya script) and got completely different results: in the latest version it was overcooked by the third epoch, while in 21.8.4 I got a perfect LoRA.

@BrennenRB

> I am using the bmaltais GUI dev2 branch and SDXL training is working great.

Are you using Dreambooth or Finetune in the dev2 branch?

@suede299

I trained a LoHa on SDXL with the last two updates and tried various parameters, but always had a hard time getting satisfactory results. It could be due to a couple of things:

  1. An optimizer that can automatically determine the learning rate, with no way to specify different learning rates for the TE and U-Net.
  2. train_network.py cannot set "Stop text encoder training".
  3. "LoRA network weights" does not load LyCORIS correctly.
  4. SDXL's two text encoders are not separated.

I wanted to train the missing character expressions from the SDXL base model into the same LoRA, so I chose LoHa, but couldn't actually get anything that worked at all.

@DarkAlchy

error: unrecognized arguments: --train_text_encoder

Apparently Kohya has removed this for 1.5 training, and when the model from DreamBooth is only 2GB you know it does not have the TE, given that the model it was trained from is 4.7GB.

@FurkanGozukara
Author

SD 1.5 trains the TE by default.

@DarkAlchy

It didn't for me, but I haven't used 1.5 since 2.0 was released; I just had to use it to help LyCORIS test something.

@kohya-ss
Owner

@DarkAlchy train_db.py trains the Text Encoder by default, for SD 1.5 and 2.0/2.1, as FurkanGozukara said.

If you do not want to train the Text Encoder, please add the option --stop_text_encoder_training=-1.
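For example, a sketch based on the command at the top of this issue (paths are placeholders and other options are omitted):

    accelerate launch train_db.py \
      --pretrained_model_name_or_path=/path/to/base_model.safetensors \
      --train_data_dir=/path/to/img \
      --output_dir=/path/to/model \
      --stop_text_encoder_training=-1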

@DarkAlchy

> If you do not want to train the Text Encoder, please add the option --stop_text_encoder_training=-1.

=-1? Alright.
