loss #26

Closed

ldlshizhu opened this issue Apr 5, 2021 · 4 comments

Comments

@ldlshizhu

Hello, thank you for your contribution.
I changed the batch size to 1 (in training.yml) and used two GPUs, but the loss was about 1200 in the first two epochs and turned into INF in the third epoch. Could you please tell me why? (I am running deblurring.)
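For reference, a minimal sketch of double-checking what training.yml actually resolves to before launching training. The file path and key names used here (OPTIM, BATCH_SIZE, LR_INITIAL) are assumptions based on this discussion, not necessarily the exact layout of the repository's config:

```python
# Sketch only: verify the values picked up from the training config.
# The path and key names below are assumptions and may need to be
# adjusted to match the actual training.yml in this repository.
import yaml

with open("Deblurring/training.yml", "r") as f:
    cfg = yaml.safe_load(f)

optim_cfg = cfg.get("OPTIM", {})
print("Batch size :", optim_cfg.get("BATCH_SIZE"))
print("Initial LR :", optim_cfg.get("LR_INITIAL"))
```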

@swz30
Owner

swz30 commented Apr 5, 2021

Hi @ldlshizhu

There is no need to use two GPUs for a batch size of 1.

I would suggest trying a lower initial learning rate, e.g., 1.5e-4.
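As a rough illustration of that change (a generic PyTorch sketch, not the repository's exact training script), the initial learning rate would be lowered where the optimizer and scheduler are built:

```python
# Generic sketch of the suggested change; the scheduler settings
# (epoch count, minimum LR) are assumptions, not the repo's defaults.
import torch

model = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)  # placeholder model

optimizer = torch.optim.Adam(model.parameters(), lr=1.5e-4)  # lower initial LR
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=3000, eta_min=1e-6
)
```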

@swz30 swz30 closed this as completed Apr 6, 2021
@swz30 swz30 reopened this Apr 6, 2021
@swz30 swz30 closed this as completed Apr 6, 2021
@ldlshizhu
Author

Hello, thank you for your patience!
Your suggestion was very helpful. After changing the learning rate to 1.5e-4, the loss was about 100 in the first epoch and dropped to 70 by the 69th epoch, but it suddenly jumped to 800,000 in the 70th epoch! I don't know what happened…

@swz30 swz30 reopened this Apr 7, 2021
@swz30
Owner

swz30 commented Apr 7, 2021

Hi @ldlshizhu

If you run the code with the default settings, there should not be an issue.

Try an even lower learning rate, e.g., 1e-4.
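Independent of the learning rate, a quick way to localize the kind of blow-up reported above is to guard each training step against a non-finite loss and clip gradients. This is a generic PyTorch diagnostic sketch, not code from this repository:

```python
# Generic diagnostic sketch, not part of this repository.
# Skips the update when the loss is NaN/Inf and clips gradient norms;
# the clipping threshold here is arbitrary.
import torch

def safe_step(model, optimizer, loss):
    if not torch.isfinite(loss):
        print("Non-finite loss encountered, skipping update:", loss.item())
        optimizer.zero_grad()
        return False
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
    return True
```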

@ldlshizhu
Author

Thank you very much! I'll try again!
