During training, loss=nan and a broken LoRA is generated #201

Open
acncagua opened this issue Feb 17, 2023 · 3 comments

Comments


Training a LoRA with the following parameters results in loss=nan, and the resulting LoRA file is broken.
Is there anything I can do to fix this?
xformers and the other recommended options are enabled.

accelerate launch --num_cpu_threads_per_process 1 train_network.py ^
--pretrained_model_name_or_path=J:\stable-diffusion-webui\models\Stable-diffusion\zmodels_0_marge_source\NAIbasil.safetensors ^
--train_data_dir=J:\sd-scripts\training ^
--output_dir=J:\sd-scripts\output ^
--reg_data_dir=J:\sd-scripts\seisoku ^
--resolution=512,512 ^
--train_batch_size=6 ^
--unet_lr=5e-5 ^
--text_encoder_lr=5e-3 ^
--max_train_epochs=10 ^
--save_every_n_epochs=1 ^
--save_model_as=safetensors ^
--clip_skip=2 ^
--seed=42 ^
--color_aug ^
--min_bucket_reso=320 ^
--max_bucket_reso=1024 ^
--network_module=networks.lora ^
--lr_scheduler=cosine_with_restarts ^
--lr_warmup_steps=500 ^
--keep_tokens=2 ^
--shuffle_caption ^
--network_dim=128 ^
--network_alpha=64 ^
--enable_bucket ^
--mixed_precision=fp16 ^
--xformers ^
--use_8bit_adam ^
--lr_scheduler_num_cycles=4 ^
--caption_extension=.txt ^
--persistent_data_loader_workers ^
--bucket_no_upscale ^
--caption_dropout_rate=0.05

[screenshot attached: 2023-02-17 (2)]


abhiishekpal commented Feb 19, 2023

@acncagua Did you face the issue even with a much lower learning rate?
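For background on why the learning rate matters here (this explanation is not from the thread itself): with --mixed_precision=fp16, values are kept in IEEE half precision, whose largest finite value is 65504. A learning rate that is too high can push activations or gradients past that limit; they overflow to inf, and any subsequent inf - inf produces NaN that propagates into the loss. A minimal pure-Python sketch of the failure mode (the fp16_overflow helper is hypothetical, for illustration only):

```python
import math

# IEEE half precision (fp16) has a largest finite value of 65504;
# anything bigger overflows to inf. Python floats are 64-bit, so we
# simulate that clamp-to-inf behaviour explicitly.
FP16_MAX = 65504.0

def fp16_overflow(value):
    """Hypothetical helper: simplified fp16 overflow behaviour."""
    return math.inf if abs(value) > FP16_MAX else value

grad = fp16_overflow(60000.0 * 2.0)  # 120000 exceeds fp16 range -> inf
print(grad)                          # inf
loss = grad - grad                   # inf - inf is undefined -> nan
print(math.isnan(loss))              # True
```

Once a single NaN enters the computation it spreads through every later operation, which is why the whole training loss collapses to nan rather than a single bad step.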

@FlyHighest

I solved this issue by disabling xformers during training.
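For anyone hitting the same NaNs, this workaround amounts to rerunning the original command with the --xformers flag dropped; the other flags stay unchanged. A sketch (abridged; the paths are from the original post, and the cmd.exe `^` line continuations are an assumption):

```shell
accelerate launch --num_cpu_threads_per_process 1 train_network.py ^
  --pretrained_model_name_or_path=J:\stable-diffusion-webui\models\Stable-diffusion\zmodels_0_marge_source\NAIbasil.safetensors ^
  --train_data_dir=J:\sd-scripts\training ^
  --output_dir=J:\sd-scripts\output ^
  --mixed_precision=fp16 ^
  --network_module=networks.lora ^
  --network_dim=128 --network_alpha=64 ^
  --use_8bit_adam
```

If NaNs persist, the earlier suggestion of lowering --unet_lr and --text_encoder_lr is the next thing to try.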

@zx96-001

Where can I find NAIbasil.safetensors? I wanted to check it out since it's mentioned in the thread. Sorry, this comment isn't really related to your problem ^^


4 participants