
Support for Prodigy(Dadapt variety for Dylora) #585

Merged: 16 commits merged into kohya-ss:dev on Jun 15, 2023

Conversation

@sdbds (Contributor) commented Jun 12, 2023

@kohya-ss (Owner) commented:

Thanks, looks very good! I will check it out when I have time!

@kohya-ss kohya-ss changed the base branch from main to dev June 15, 2023 12:12
@kohya-ss kohya-ss merged commit e97d67a into kohya-ss:dev Jun 15, 2023
@jimtalksdata commented Jul 11, 2023

Works well for LORAs on SDXL. Convergence to the optimal LR can be a bit slow (1000 steps) compared to DAdapt, or maybe it's just SDXL being big. Does not blow up compared to DAdapt though. Needs more testing.

Question: which version was implemented, Prodigy (2 in the paper) or Resetting (3)?

@sdbds (Contributor, Author) commented Jul 11, 2023

Works well for LORAs on SDXL. Convergence to the optimal LR can be a bit slow (1000 steps) compared to DAdapt, or maybe it's just SDXL being big. Does not blow up compared to DAdapt though. Needs more testing.

Question: which version was implemented, Prodigy (2 in the paper) or Resetting (3)?

I would suggest modifying the value of d0 to accommodate SDXL (5e-7) as well as DyLoRA (5e-4); these models need a different initial learning rate than the default.
You can read about my experience here:
https://civitai.com/articles/1022/sdxl-trainingbdsqlsz-lora-training-advanced-tutorial2best-optimizerprodigy-is-all-you-need

@FurkanGozukara commented:

Works well for LORAs on SDXL. Convergence to the optimal LR can be a bit slow (1000 steps) compared to DAdapt, or maybe it's just SDXL being big. Does not blow up compared to DAdapt though. Needs more testing.

Question: which version was implemented, Prodigy (2 in the paper) or Resetting (3)?

How do you use the generated safetensors file? Can you use it with the diffusers pipeline?

@jimtalksdata commented Jul 11, 2023

Works well for LORAs on SDXL. Convergence to the optimal LR can be a bit slow (1000 steps) compared to DAdapt, or maybe it's just SDXL being big. Does not blow up compared to DAdapt though. Needs more testing.
Question: which version was implemented, Prodigy (2 in the paper) or Resetting (3)?

I would suggest modifying the value of d0 to accommodate SDXL(5e-7) as well as dylora(5e-4), which are models that require a larger initial learning rate. you can see more experience in here https://civitai.com/articles/1022/sdxl-trainingbdsqlsz-lora-training-advanced-tutorial2best-optimizerprodigy-is-all-you-need

Thanks for the tutorial, good stuff. It could use a straightforward way to set d0 (the initial LR), though, since otherwise the algorithm will just "waste time" at 1e-6 in the beginning.

Edit: never mind, I see; just add d0=(number) and d_coef=(number) to the optimizer args.
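
For anyone who lands here later, a minimal sketch of what that looks like with this repo's --optimizer_args (the d0 and d_coef values below are placeholders, not recommendations):

--optimizer_type="Prodigy" --learning_rate=1.0 --optimizer_args d0=5e-7 d_coef=2

With Prodigy the base learning_rate is typically left at 1.0, since the optimizer scales it by its own estimate of d.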

@jimtalksdata commented:

Works well for LORAs on SDXL. Convergence to the optimal LR can be a bit slow (1000 steps) compared to DAdapt, or maybe it's just SDXL being big. Does not blow up compared to DAdapt though. Needs more testing.
Question: which version was implemented, Prodigy (2 in the paper) or Resetting (3)?

how do you use the generated safetensors file? can you use with diffusers pipeline?

The LoRA pipeline works with ComfyUI at the moment. I don't know about other implementations.

@sdbds (Contributor, Author) commented Jul 11, 2023

Works well for LORAs on SDXL. Convergence to the optimal LR can be a bit slow (1000 steps) compared to DAdapt, or maybe it's just SDXL being big. Does not blow up compared to DAdapt though. Needs more testing.
Question: which version was implemented, Prodigy (2 in the paper) or Resetting (3)?

I would suggest modifying the value of d0 to accommodate SDXL(5e-7) as well as dylora(5e-4), which are models that require a larger initial learning rate. you can see more experience in here https://civitai.com/articles/1022/sdxl-trainingbdsqlsz-lora-training-advanced-tutorial2best-optimizerprodigy-is-all-you-need

Thanks for the tutorial, good stuff. Could use a straightforward way to set d0 (the initial LR) if I know that the algorithm will just "waste time" at 1e-6 in the beginning.

Works well for LORAs on SDXL. Convergence to the optimal LR can be a bit slow (1000 steps) compared to DAdapt, or maybe it's just SDXL being big. Does not blow up compared to DAdapt though. Needs more testing.

Question: which version was implemented, Prodigy (2 in the paper) or Resetting (3)?

The official implementation only has 2 (Prodigy); there is no resetting code.
I wrote 3 myself, but I haven't tested it yet:

# Algorithm 3: D-Adaptation with Resetting (untested sketch)
import numpy as np

def dadapt_with_resetting(grad_f, x0, d0, G, n):
    """grad_f: gradient oracle, x0: initial point, d0 > 0: initial distance
    estimate, G: gradient-norm bound, n: number of iterations."""
    d = d0                           # current distance estimate
    x = np.asarray(x0, dtype=float)  # current point
    x0_r = x.copy()                  # anchor point of the current reset
    s_r = np.zeros_like(x)           # gradient sum within the current reset
    weighted_gsq = 0.0               # running sum of gamma_i * ||g_i||^2 within the reset
    r = 0                            # reset counter
    k = 0                            # iteration counter within the current reset
    history = [x.copy()]             # all points visited, for the final average

    for j in range(n):
        # Gradient at the current point
        g = grad_f(x)
        # Update the gradient sum for this reset
        s_r = s_r + g
        # Step size from the D-Adaptation formula
        gamma = np.sqrt(d / (G**2 + np.sum(s_r**2)))
        # Dual-averaging style step from the reset anchor
        x = x0_r - gamma * s_r
        history.append(x.copy())
        # Accumulate gamma_i * ||g_i||^2 for the distance estimate
        weighted_gsq += gamma * np.sum(g**2)
        # New distance estimate
        s_norm = np.linalg.norm(s_r)
        d_hat = (gamma * s_norm**2 - weighted_gsq) / (2 * s_norm)
        k += 1
        # Reset once the distance estimate has at least doubled
        if d_hat > 2 * d:
            d = d_hat
            # Restart from the current point with fresh accumulators
            x0_r = x.copy()
            s_r = np.zeros_like(x)
            weighted_gsq = 0.0
            k = 0
            r += 1

    # Return the average of all points visited
    return np.mean(history, axis=0)

@FurkanGozukara commented:

Works well for LORAs on SDXL. Convergence to the optimal LR can be a bit slow (1000 steps) compared to DAdapt, or maybe it's just SDXL being big. Does not blow up compared to DAdapt though. Needs more testing.
Question: which version was implemented, Prodigy (2 in the paper) or Resetting (3)?

how do you use the generated safetensors file? can you use with diffusers pipeline?

The LORA pipeline works with ComfyUI at the moment. Don't know about other impl.

Can you share an example JSON file please?

@jimtalksdata commented Jul 11, 2023

Something like this? This uses the "new" refiner workflow.

I assume you can also train a LoRA for the refiner, but I'm unsure of the purpose, how to use it, or what sort of training set you would use. So this is also likely not the correct final workflow.

https://pastebin.com/j6LnygzJ

@FurkanGozukara commented:

Works well for LORAs on SDXL. Convergence to the optimal LR can be a bit slow (1000 steps) compared to DAdapt, or maybe it's just SDXL being big. Does not blow up compared to DAdapt though. Needs more testing.
Question: which version was implemented, Prodigy (2 in the paper) or Resetting (3)?

I would suggest modifying the value of d0 to accommodate SDXL(5e-7) as well as dylora(5e-4), which are models that require a larger initial learning rate. you can see more experience in here https://civitai.com/articles/1022/sdxl-trainingbdsqlsz-lora-training-advanced-tutorial2best-optimizerprodigy-is-all-you-need

Hello, so many parameters are missing here.

Can you share a full command, like the one below?

I tried it like this and it didn't work:

accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train_network.py" --enable_bucket --pretrained_model_name_or_path="F:/0 models/sd_xl_base_0.9.safetensors" --train_data_dir="F:\sdxl_lora\img" --reg_data_dir="F:\sdxl_lora\reg" --resolution="1024,1024" --output_dir="F:\sdxl_lora\model" --logging_dir="F:\sdxl_lora\log" --network_alpha="1" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=0.0004 --unet_lr=0.0004 --network_dim=256 --output_name="test10" --lr_scheduler_num_cycles="8" --no_half_vae --learning_rate="0.0004" --lr_scheduler="constant" --train_batch_size="1" --max_train_steps="5200" --save_every_n_epochs="1" --mixed_precision="bf16" --save_precision="bf16" --cache_latents --cache_latents_to_disk --optimizer_type="Prodigy" --optimizer_args scale_parameter=False relative_step=False warmup_init=False scale_v_pred_loss_like_noise_pred=False --max_data_loader_n_workers="0" --bucket_reso_steps=64 --gradient_checkpointing --xformers --bucket_no_upscale
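
For what it's worth, the optimizer_args in that command (scale_parameter, relative_step, warmup_init) are arguments of other optimizers such as Adafactor, not Prodigy. A Prodigy-style run would instead pass Prodigy's own constructor arguments, roughly like the fragment below (argument names from the prodigyopt package; the values are only illustrative, with d0=5e-7 taken from the SDXL suggestion above, not a tested recommendation):

--optimizer_type="Prodigy" --learning_rate=1.0 --unet_lr=1.0 --text_encoder_lr=1.0 --optimizer_args weight_decay=0.01 decouple=True use_bias_correction=True safeguard_warmup=True d0=5e-7 d_coef=2

The rest of the command (paths, resolution, precision, bucketing, etc.) can stay as it is.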

5 participants