Better implementation for te autocast #895

Merged
merged 7 commits into kohya-ss:dev on Oct 28, 2023

Conversation

KohakuBlueleaf
Contributor

When we disable training for the TE, we do not prepare it, so we need to explicitly convert it to the target dtype (otherwise it stays in fp32, which may not be the expected behavior).

So basically I do two things here:
1. Explicitly convert the TE to the target dtype/device when it is not being trained.
2. Explicitly add autocast for the TE, since sometimes it may not be prepared (or will definitely not be prepared, e.g. a cached TE).
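
Roughly, in hypothetical code (names like `text_encoder`, `weight_dtype`, `device`, `train_text_encoder`, and `input_ids` are placeholders, not the exact variables in the scripts):

```python
import torch

# Placeholder names; a sketch of the idea, not the actual PR code.
# (1) When the TE is not trained it never goes through accelerator.prepare(),
#     so move/cast it to the training dtype and device by hand.
if not train_text_encoder:
    text_encoder.to(device, dtype=weight_dtype)
    text_encoder.requires_grad_(False)
    text_encoder.eval()

# (2) Wrap the TE forward pass in autocast so every op runs in a supported dtype,
#     even when the TE was not prepared (e.g. when its outputs are being cached).
with torch.autocast(device_type="cuda", dtype=weight_dtype,
                    enabled=weight_dtype != torch.float32):
    encoder_hidden_states = text_encoder(input_ids)[0]
```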

@kohya-ss
Owner

Thank you for this! It seems to be very good. I will review and merge it soon.

Perhaps, when we cache Text Encoder outputs, another option might be to change the dtype of the Text Encoders to fp16/bf16 in advance...

@KohakuBlueleaf
Contributor Author

> Thank you for this! It seems to be very good. I will review and merge it soon.
>
> Perhaps, when we cache Text Encoder outputs, another option might be to change the dtype of the Text Encoders to fp16/bf16 in advance...

Yeah, I added `te.to(weight_dtype)` as well.

@KohakuBlueleaf
Contributor Author

@kohya-ss Oh, I see what you mean.
Basically, it is better to cache the TE outputs in weight_dtype?
I also found that sdxl_train.py caches the TE outputs in fp32.
I think this is bad (the cached context is actually huge, and fp32 slows down the caching process).
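
For reference, a minimal sketch of caching the outputs in weight_dtype instead of fp32 (placeholder names again, not the actual sdxl_train.py code):

```python
import torch

# Run the TE under autocast and store the cached outputs in fp16/bf16
# instead of fp32, which roughly halves the cache size.
with torch.no_grad(), torch.autocast(device_type="cuda", dtype=weight_dtype):
    hidden_states = text_encoder(input_ids)[0]
cached_hidden_states = hidden_states.to("cpu", dtype=weight_dtype)
```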

@KohakuBlueleaf
Contributor Author

@kohya-ss I added autocast TE caching to sdxl_train.py, and also added separate LR settings for the TE in sdxl_train.py, which was requested by @Linaqruf.
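
A minimal sketch of the separate LR setting via optimizer parameter groups, assuming a hypothetical `learning_rate_te` argument (the real flag and variable names may differ):

```python
import torch

# Hypothetical names for illustration: unet, text_encoder1, text_encoder2,
# args.learning_rate, args.learning_rate_te.
params_to_optimize = [
    {"params": list(unet.parameters()), "lr": args.learning_rate},
    {"params": list(text_encoder1.parameters()), "lr": args.learning_rate_te},
    {"params": list(text_encoder2.parameters()), "lr": args.learning_rate_te},
]
optimizer = torch.optim.AdamW(params_to_optimize)
```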

@kohya-ss
Owner

> Basically, it is better to cache the TE outputs in weight_dtype?

I think so. In my understanding, it is the same as applying autocast.
(However, we convert the TEs back to float32 and move them to the CPU for sampling images.)

> @kohya-ss I added autocast TE caching to sdxl_train.py, and also added separate LR settings for the TE in sdxl_train.py, which was requested by @Linaqruf.

This is really nice!

@KohakuBlueleaf
Contributor Author

> Basically, it is better to cache the TE outputs in weight_dtype?

> I think so. In my understanding, it is the same as applying autocast. (However, we convert the TEs back to float32 and move them to the CPU for sampling images.)

> @kohya-ss I added autocast TE caching to sdxl_train.py, and also added separate LR settings for the TE in sdxl_train.py, which was requested by @Linaqruf.

> This is really nice!

autocast and changing the dtype are actually different.
autocast casts the tensors for each operation to the target dtype (if the backend supports it for that op) and then runs the computation.
So even if you convert the weights directly, you still need autocast to make sure every operation runs correctly.
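
A small standalone illustration of the difference (not code from this PR):

```python
import copy
import torch

layer = torch.nn.Linear(8, 8).cuda()
x = torch.randn(1, 8, device="cuda")

# (a) Converting the weights directly: parameters are stored in fp16 and every
#     op runs in fp16, including ones that are numerically fragile in half precision.
layer_fp16 = copy.deepcopy(layer).half()
out_a = layer_fp16(x.half())

# (b) autocast: the stored weights are untouched; each op is executed in the dtype
#     the backend picks for it (matmuls in fp16, some reductions kept in fp32).
with torch.autocast(device_type="cuda", dtype=torch.float16):
    out_b = layer(x)

# Even if the weights are already converted to fp16, wrapping the forward pass
# in autocast helps ensure every operation still runs in a supported dtype.
```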

@kohya-ss
Owner

> autocast and changing the dtype are actually different.
> autocast casts the tensors for each operation to the target dtype (if the backend supports it for that op) and then runs the computation.
> So even if you convert the weights directly, you still need autocast to make sure every operation runs correctly.

Hmm, thank you for the clarification. When generating images, we call the model converted to float16 or bfloat16 directly, so I thought there would be no difference, but it is better to use autocast.

@Linaqruf
Contributor

> @kohya-ss I added autocast TE caching to sdxl_train.py, and also added separate LR settings for the TE in sdxl_train.py, which was requested by @Linaqruf.

Nice 😆

@kohya-ss
Owner

When I apply this PR and train with sdxl_train.py including the Text Encoder, it seems that neither the U-Net nor the Text Encoder is trained. When training only the U-Net, there is no problem.

I will investigate further, but I think that perhaps multiple models may not work when specified like `{"params": ..., "lr": ...}`...

@KohakuBlueleaf
Contributor Author

> When I apply this PR and train with sdxl_train.py including the Text Encoder, it seems that neither the U-Net nor the Text Encoder is trained. When training only the U-Net, there is no problem.
>
> I will investigate further, but I think that perhaps multiple models may not work when specified like `{"params": ..., "lr": ...}`...

OK, that makes sense.
I will check that as well!

@KohakuBlueleaf
Contributor Author

@kohya-ss It is weird.
I checked my last training run with this fork, and it is definitely learning things.

However, it cannot run in full fp16 (full fp16 does not work, full bf16 works); it fails with:

`AssertionError: No inf checks were recorded for this optimizer.`

@araleza

araleza commented Oct 25, 2023

Hi, I think you may need to apply the fix to the Text Encoder learning-rate code in the `--block_lr` section that I found here:

#890 (comment)

@KohakuBlueleaf
Contributor Author

@araleza @kohya-ss This is weird to me.
In my implementation, both the TE and the U-Net only put their parameter generators into the params dict.
But when I tried to train it, it definitely learns things. (Expected: it should learn nothing, since the parameter generators get exhausted once they are consumed.)
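
A toy example of why handing a raw `parameters()` generator around is fragile (purely illustrative, not the PR code):

```python
import torch.nn as nn

model = nn.Linear(4, 4)
gen = model.parameters()                # a generator, not a list

n_params = sum(p.numel() for p in gen)  # iterating it once (e.g. to count/log params)...
leftover = list(gen)                    # ...leaves nothing for the optimizer: []

# Materializing the parameters avoids the problem:
param_group = {"params": list(model.parameters()), "lr": 1e-4}
```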

@KohakuBlueleaf
Contributor Author

But I made a fix for it anyway.

@KohakuBlueleaf
Contributor Author

@kohya-ss I also added a manual timeout setting for DDP, because with multi-GPU training on some large datasets it is very likely to exceed the default timeout (30 min) while caching the latents or the Text Encoder outputs.
(For example, the dataset I'm using needs literally 3 days to cache the latents.)
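
A sketch of raising the timeout through accelerate, assuming a hypothetical `--ddp_timeout` argument given in minutes:

```python
from datetime import timedelta
from accelerate import Accelerator, InitProcessGroupKwargs

# Raise the process-group timeout above the 30-minute default so that long
# latent/TE caching on large datasets does not kill the multi-GPU run.
# args.ddp_timeout is a placeholder flag name for this sketch.
ipg_kwargs = InitProcessGroupKwargs(timeout=timedelta(minutes=args.ddp_timeout))
accelerator = Accelerator(kwargs_handlers=[ipg_kwargs])
```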

@FurkanGozukara

When can we expect this to be merged? Thank you so much.

@kohya-ss
Owner

@KohakuBlueleaf
Thanks for the various updates! The script seems to be working fine.

I would like to add some features after the merge, such as specifying independent learning rates for Text Encoder 1 and 2, excluding models with a learning rate of 0 from the optimizer parameters, etc.

kohya-ss merged commit 1cefb2a into kohya-ss:dev on Oct 28, 2023
1 check passed
@FurkanGozukara

> @KohakuBlueleaf Thanks for the various updates! The script seems to be working fine.
>
> I would like to add some features after the merge, such as specifying independent learning rates for Text Encoder 1 and 2, excluding models with a learning rate of 0 from the optimizer parameters, etc.

So this will support both SD 1.5 and SDXL?

wkpark pushed a commit to wkpark/sd-scripts that referenced this pull request Feb 27, 2024