
Question for Finetuning #36

Closed
xyIsHere opened this issue Jun 30, 2023 · 17 comments

@xyIsHere

Dear author,

I tried to reproduce your work and currently want to validate the results generated by the fine-tuned SD model. I used a single A100 GPU for fine-tuning (currently at around epoch 130) and tested with the script "sr_val_ddpm_text_T_vqganfin_old.py", resetting the ckpt path and changing dec_w to 0.0. The test results show almost no difference from the input, but the validation results in the training log look pretty good. Do you have any idea about this issue?

Thanks a lot!

@IceClear
Owner

Hi. Without the training settings and figures for comparison, it is hard to tell what the problem is. Maybe the test data distribution is very different from the validation one.

@xyIsHere
Author

Thanks for your very quick response. I got results similar to those in issue #26.

The following zip file contains the training config. I just followed your suggestion and reset the model path.
2023-06-25T16-19-48-project.zip

And here are the results (right: result; left: input). The bottom one is a sample from the validation set, which was also used for training. The top one does show some difference from the input, but the result is still very far from that of the model you provided.
image
image

Thanks!

@IceClear
Owner

Hi.
First, the input is different: my result is generated from a resized 128x128 image, while it seems you directly used the original 720x720 image.
Second, my results are generated with CFW weight = 0.5.
Third, my model is trained on 8 V100 GPUs, so the total training batch size, i.e., 48x4, should be much larger than yours, I guess.
From my experience, you can train longer for better results.

@xyIsHere
Author

xyIsHere commented Jul 3, 2023

Thanks! By the way, how long did the fine-tuning stage take you? Maybe just around 24 hours?

@IceClear
Owner

IceClear commented Jul 3, 2023

For several days. The longer the better. One week should be enough.

@xyIsHere
Author

xyIsHere commented Jul 3, 2023

Thanks a lot for your help. Actually, I have not trained the CFW yet; currently I just want to make sure my fine-tuning results are reasonable, so I don't think either the resolution or the CFW weight is the reason.
image
As shown in the figure above, I use the same image as input and test with different fine-tuned models: the stablesr_000117.ckpt that you provided, and a model trained on 4 A100 cards with a batch size of 12 and accumulate_grad_batches of 4 (I fine-tuned for about 24 hours to get epoch_000131.ckpt).

The command that I used is:

python scripts/sr_val_ddpm_text_T_vqganfin_old.py --config configs/stableSRNew/v2-finetune-test.yaml --ckpt ./pretrained_models/stablesr_000117.ckpt --vqgan_ckpt ./pretrained_models/vqgan_cfw_00011.ckpt --init-img ./inputs/test_example --outdir out_landscape/ --ddpm_steps 200 --dec_w 0.0 --suffix 'stablesr117'

I set dec_w to 0.0, so the result is achieved by the fine-tuned model alone, without CFW. Do you have any other suggestions for debugging? Or do you think I just need to train for more days?

@xyIsHere
Author

xyIsHere commented Jul 3, 2023

So the provided model (stablesr_000117.ckpt) was fine-tuned for around a week? I did not expect a 117-epoch model to take that much time.

@IceClear
Owner

IceClear commented Jul 3, 2023

An A100 is more than 2x faster than a V100, and I do not remember the exact training time of the 512 model.
It is hard to say whether there is a problem.
You may check the performance of different epochs on real images; the performance may vary across epochs.
From my experience, training longer does improve the performance.
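
For reference, a minimal sketch of automating that per-epoch check by looping the test command from earlier in this thread over saved checkpoints (the checkpoint directory, output layout, and glob pattern are assumptions):

```python
import subprocess
from pathlib import Path

# Hypothetical checkpoint directory; adjust to wherever the training run
# saves its epoch_*.ckpt files.
for ckpt in sorted(Path("logs/checkpoints").glob("epoch_*.ckpt")):
    # Run the same inference script once per checkpoint so the outputs of
    # different epochs can be compared on the same real test images.
    subprocess.run([
        "python", "scripts/sr_val_ddpm_text_T_vqganfin_old.py",
        "--config", "configs/stableSRNew/v2-finetune-test.yaml",
        "--ckpt", str(ckpt),
        "--vqgan_ckpt", "./pretrained_models/vqgan_cfw_00011.ckpt",
        "--init-img", "./inputs/test_example",
        "--outdir", f"out_epochs/{ckpt.stem}",
        "--ddpm_steps", "200",
        "--dec_w", "0.0",
        "--suffix", ckpt.stem,
    ], check=True)
```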

@xyIsHere
Author

xyIsHere commented Jul 3, 2023

Thank you so much. I will keep running the experiments and let you know if there is any update.

@xyIsHere
Author

xyIsHere commented Jul 4, 2023

Dear author,
Could you also show me how to use the thop package to print the params and FLOPs of StableSR? I tried to do this with the test script (vqganfin_old.py) but have not succeeded yet. Thanks!

@xyIsHere
Author

xyIsHere commented Jul 4, 2023

I finally solved this problem.
image
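
For anyone hitting the same wall, a minimal sketch of the thop call (the module and input shape below are placeholders, not StableSR's actual entry point, which also needs its conditioning inputs):

```python
import torch
from thop import profile

# Placeholder module; in practice this would be the loaded StableSR network,
# with dummy inputs matching its expected shapes.
model = torch.nn.Conv2d(4, 4, kernel_size=3, padding=1)
dummy_input = torch.randn(1, 4, 64, 64)  # latent-sized dummy tensor

# thop.profile returns (MACs, params) for one forward pass.
macs, params = profile(model, inputs=(dummy_input,))
print(f"MACs: {macs / 1e9:.3f} G | Params: {params / 1e6:.3f} M")
```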

@xyIsHere
Author

xyIsHere commented Jul 5, 2023

Dear author,
Here the training log is attached. I'm wondering whether there is any difference from yours?
train_reproduce_4card_bs12.log

@BobbyZ04

BobbyZ04 commented Jul 6, 2023

Hi, may I ask how you generated the latents for the second-stage training? They are supposed to be 4D, right? I got an error saying the dimensions are incorrect, like this:
image

The latent shape I checked is:
image

This is where I generated them:
image

Thank you
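
As a point of reference, a minimal sketch of the 4D shape contract, using a diffusers KL autoencoder as a stand-in (StableSR uses its own VQGAN/CFW autoencoder, so the model and checkpoint here are assumptions; the key point is keeping the batch dimension so the saved latent stays (B, C, H, W)):

```python
import torch
from diffusers import AutoencoderKL

# Stand-in autoencoder for illustration only; not StableSR's first-stage model.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

with torch.no_grad():
    img = torch.randn(1, 3, 512, 512)              # (B, C, H, W) input image
    latent = vae.encode(img).latent_dist.sample()  # -> (1, 4, 64, 64), still 4D

# Save without squeezing; dropping the batch dimension yields a 3D tensor
# and the kind of dimension error shown above.
torch.save(latent.cpu(), "sample_latent.pt")
```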

@BobbyZ04

BobbyZ04 commented Jul 6, 2023

Thanks a lot!

@xyIsHere
Author

xyIsHere commented Jul 6, 2023


I have currently only conducted the fine-tuning experiments and have not trained the CFW yet, since my fine-tuning result is not good enough to train the CFW. How about your fine-tuning results? For training the CFW, issue #28 might help you.

@BobbyZ04

BobbyZ04 commented Jul 6, 2023


I think they make sense, but yeah, they are different from the author's results. Maybe you can try fixing the seeds for inference to check whether the fine-tuning is successful? https://huggingface.co/docs/diffusers/using-diffusers/reproducibility
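
For reference, a minimal sketch of fixing seeds along the lines of that guide; plain PyTorch sampling scripts can seed the global RNGs, while diffusers pipelines take an explicit Generator (the pipeline call below is a hypothetical placeholder):

```python
import torch

seed = 42

# Plain PyTorch scripts: seed the global RNGs before sampling.
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)

# diffusers pipelines: pass an explicit Generator for per-call reproducibility.
generator = torch.Generator(device="cpu").manual_seed(seed)
# image = pipe(prompt, generator=generator).images[0]  # hypothetical call
```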

@xyIsHere
Author

xyIsHere commented Jul 7, 2023


I'm wondering whether it would be possible to share one example that you generated using only the fine-tuned model? Thanks a lot!
