About training #7
Do you mean that training SwinIR (middle size, dim=180) sometimes has a doubled loss? I don't think it is a problem. When some images in a training batch are hard to reconstruct, the loss of that batch will be large. If you look at the PSNR on the validation set, the model converges smoothly. In our implementation, all settings are similar to those of CNN-based SR models, and we did not use any special tricks. Detailed settings can be found in the supplementary material. The training code will be released in KAIR in a few days.
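For reference, the validation metric mentioned above can be computed as follows. This is a minimal sketch in PyTorch; the function name and the assumed [0, 1] data range are illustrative, not taken from the KAIR implementation:

```python
import torch

def psnr(sr: torch.Tensor, hr: torch.Tensor, max_val: float = 1.0) -> float:
    """Peak signal-to-noise ratio between a super-resolved image and ground truth.

    Both tensors are expected to have identical shapes and values in [0, max_val].
    """
    mse = torch.mean((sr - hr) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    # PSNR = 10 * log10(max_val^2 / MSE) = 20 * log10(max_val) - 10 * log10(MSE)
    return (20 * torch.log10(torch.tensor(max_val)) - 10 * torch.log10(mse)).item()
```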
I don't mean that the loss of a single batch suddenly doubled; rather, the average loss over 100 batches has doubled, and the PSNR also drops by close to 1 dB, recovering after a few epochs.
Thank you, I will check my code and explore whether the slight parameter difference has a big impact.
It depends on what you want. If you only care about PSNR, training with a pixel loss is enough (first stage). PSNR is a good quantitative metric for comparing different methods, but a pixel loss alone often does not give good visual quality. If you want better visual quality, you should fine-tune the model from the first stage using a combination of pixel loss, perceptual loss, and GAN loss, although this will decrease the PSNR. By the way, I might know why you suffer from a sudden large drop in PSNR. GAN training is very unstable. Generally you should fine-tune the model from the first stage instead of training it from scratch. The EMA strategy can also help stabilize convergence. Note that PSNR is not a good metric when you are training towards good visual quality.
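For readers unfamiliar with the EMA strategy mentioned above, here is a minimal sketch of how an exponential moving average of model weights is typically maintained during training. The class name and the decay value of 0.999 are illustrative assumptions, not the KAIR implementation:

```python
import copy
import torch

class ModelEMA:
    """Keeps an exponential moving average (EMA) of a model's parameters.

    The EMA copy is used for evaluation; the raw model keeps training as usual.
    Note: buffers (e.g., BatchNorm running stats) are only copied at init here.
    """
    def __init__(self, model: torch.nn.Module, decay: float = 0.999):
        self.decay = decay
        self.ema = copy.deepcopy(model).eval()  # shadow copy, never trained directly
        for p in self.ema.parameters():
            p.requires_grad_(False)

    @torch.no_grad()
    def update(self, model: torch.nn.Module):
        # ema_param <- decay * ema_param + (1 - decay) * current_param
        for ema_p, p in zip(self.ema.parameters(), model.parameters()):
            ema_p.mul_(self.decay).add_(p, alpha=1 - self.decay)
```

Typical usage: call `ema.update(model)` after each optimizer step, and validate with the `ema.ema` copy instead of the raw model; the averaged weights smooth out the step-to-step noise that makes GAN fine-tuning unstable.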
Hi, @JingyunLiang Thanks for the loss plot. May I ask, based on your experience, whether it is normal for a transformer framework that the training loss oscillates severely? I am currently training a transformer and the loss just seems to fluctuate repeatedly, with no trend of convergence. Do you think this is a normal phenomenon for most transformers? Thanks very much.
I don't think so. There is no such problem, as you can see in Fig. 3(f) of the paper. By the way, our training code will be released in 1-2 days. Please use that for training. Thank you.
Yes, I see. Fig. 3(f) is the PSNR plot. Just as shown in your earlier reply, the PSNR is stable, but the L1 loss oscillates. I am confused about the fluctuation of the L1 loss. Thanks a lot. @JingyunLiang
Thanks so much for your explanation! In this case, have you tried increasing the batch size to reduce this training-loss fluctuation? Does a large batch size alleviate this instability?
No, I always use batch_size=32 (so we only need 500k iterations). You can try it later with our training code.
I see. Thank you very much. I think 32 is large enough for an image-to-image task, given the huge memory cost of the transformer.
@z625715875 @shengkelong @hcleung3325 We have released the SwinIR training code at KAIR. We have also added an interactive online Colab demo for real-world image SR.
Feel free to reopen this issue if you have more questions.
Thank you for your work. I tried to train SwinIR, but during training I found that although training the small SwinIR is smooth, the loss often suddenly doubles when dim is changed to 180. Because of memory constraints, I use batch_size=16 and lr=1e-4. Are there any special tricks to make the training stable?
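One common way to recover a larger effective batch size under such a memory limit is gradient accumulation, which emulates batch_size=32 with batch_size=16 by stepping the optimizer every two batches. A minimal sketch, assuming a standard L1-loss training loop; the function and variable names are hypothetical, not from the SwinIR/KAIR code:

```python
import torch
import torch.nn as nn

def train_epoch(model: nn.Module, dataloader, optimizer, accum_steps: int = 2):
    """Gradient accumulation: effective batch size = accum_steps * loader batch size."""
    l1 = nn.L1Loss()
    optimizer.zero_grad()
    for i, (lq, gt) in enumerate(dataloader):  # low-quality input, ground truth
        # Scale the loss so the accumulated gradients average over micro-batches.
        loss = l1(model(lq), gt) / accum_steps
        loss.backward()  # gradients accumulate across iterations
        if (i + 1) % accum_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```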