Add check for 0 to 1 inclusive for elements of target tensor in BCE loss #97814
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/97814
Note: links to docs will display an error until the docs builds have been completed.
❗ 1 active SEV. If your PR is affected, please view it below.
✅ No failures as of commit aebb9ac.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Thanks for the fix!
Actually, I see several nn tests failing - it looks like some of our tests pass target values > 1 through binary_cross_entropy. cc @mikaylagawarecki? (Let me know if I should cc someone else.)
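To make that failure mode concrete, here is a minimal sketch; the exact error message is an assumption about the wording of the check this PR adds:

```python
import torch
import torch.nn.functional as F

pred = torch.rand(3)                    # predictions already in [0, 1]
target = torch.tensor([0.0, 0.5, 2.0])  # 2.0 falls outside [0, 1]

# With the range check this PR adds, the call below should raise a
# RuntimeError instead of silently returning a meaningless loss value.
try:
    F.binary_cross_entropy(pred, target)
except RuntimeError as e:
    print(e)
```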
@bdhirsh thanks for the review! I was also wondering about those tests but haven't had time to circle back (and wasn't sure who to ping 🙂). Looking forward to hearing thoughts on how/if we want to handle those!
Hey @kiersten-stokes, thanks for sending this fix!
The test failures arise from tests where target was erroneously not constrained to [0, 1]. For most of the target outputs, we ensure this using torch.randn(...).gt(0).double(), but it was missed in these two tests (here and here). Fixing those should resolve the errors; see the sketch of the pattern below.
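As an illustration (the shape is arbitrary), the valid-target pattern from the tests looks like this:

```python
import torch

# Threshold Gaussian noise at zero to get a tensor of exactly 0s and 1s,
# which is always a valid binary_cross_entropy target.
target = torch.randn(15, 10).gt(0).double()
assert target.min() >= 0 and target.max() <= 1
```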
We should also add the check on CUDA; a hedged sketch of what it enforces follows below.
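For context, a hypothetical Python-level analogue of the validation (the real check lives in the kernels and must run for CUDA tensors too; the function name and message here are illustrative):

```python
import torch

def check_bce_target(target: torch.Tensor) -> None:
    # Hypothetical mirror of the range check: every element of target must
    # lie in [0, 1] inclusive, regardless of device (CPU or CUDA).
    if torch.any((target < 0) | (target > 1)):
        raise RuntimeError("all elements of target should be between 0 and 1")
```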
@mikaylagawarecki I really appreciate you pointing me to the relevant tests! It seems embarrassingly simple now that I see it - I guess I'm still learning how to comb through the logs efficiently 🙈
Done! Will stand by for CI failures (apologies, I have no way to run CUDA-based tests at the moment) and circle back should there be any failures similar to the above!
@kiersten-stokes Of course, happy to help! Looking at the logs, it looks like I might have missed another one where target is not correctly constrained: https://github.com/pytorch/pytorch/blob/master/test/test_nn.py#L8923. It's hard to tell if there are more because CI jobs terminate when a test fails :( But you could try running the tests on CPU. Let me know if you have any questions about local testing!
Thanks!
@pytorchbot merge
Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@mikaylagawarecki thanks again for your help!
Merge failed. Reason: 1 job failed: trunk / linux-focal-rocm5.4.2-py3.8 / test (default, 1, 3, linux.rocm.gpu). Details for Dev Infra team: raised by workflow job.
50f589c to a5e64ba (Compare)
@pytorchbot merge
Merge started: your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Add check for 0 to 1 inclusive for elements of target tensor in BCE loss (#97814)
TODO for @mikaylagawarecki: add BC-breaking description
Fixes #87373
Pull Request resolved: #97814
Approved by: https://github.com/mikaylagawarecki
Fixes #87373
cc @albanD @mruberry @jbschlosser @walterddr @mikaylagawarecki