
SILog NaN for higher batch size #54

Open
ariqshadi opened this issue Jul 26, 2023 · 1 comment

Comments

@ariqshadi

Hi, in my case I want to train on multiple GPUs (4 GPUs) with a batch size of 8, and that works well.
But I noticed that each GPU was only using about half of its capacity, so I tried increasing the batch size to 12.
The same error then reappears every time: not OOM, but "SILog is NAN, stopping training".
Do you know why this is happening, or has anyone encountered a similar problem?

@ariqshadi
Author

I've found the root cause of this problem: in some of the label images, every pixel is 0, so the log loss has no valid depth values to work with. Eliminating these labels fixed the issue.
tl;dr
caused by an uncleaned dataset
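To illustrate the failure mode, here is a minimal sketch of a typical SILog implementation (the function name, the `lam` weight, and the `eps` guard are assumptions, not the repository's actual code). When a ground-truth depth map is all zeros, the valid-pixel mask is empty, the reduction over it produces NaN, and that NaN then halts training:

```python
import math
import numpy as np

def silog_loss(pred, gt, lam=0.85, eps=1e-7):
    """Scale-invariant log loss over valid (gt > 0) pixels only.

    If the label image has no valid pixels, there is nothing to
    reduce over and the result is NaN, which poisons the batch loss.
    """
    mask = gt > 0
    if not mask.any():           # all-zero label image -> empty mask
        return float("nan")      # this is what triggers the NaN stop
    g = np.log(pred[mask] + eps) - np.log(gt[mask] + eps)
    return float(np.sqrt(np.mean(g ** 2) - lam * np.mean(g) ** 2) * 10.0)

# An all-zero label reproduces the failure; a clean label does not:
pred = np.full((4, 4), 2.0)
print(math.isnan(silog_loss(pred, np.zeros((4, 4)))))   # True
print(math.isnan(silog_loss(pred, np.full((4, 4), 2.0))))  # False
```

This also explains why a larger batch size made the error more frequent: each batch samples more images, so the chance of drawing one of the all-zero labels rises with batch size. Filtering those labels out of the dataset before training, as described above, avoids the problem.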
