Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add check for 0 to 1 inclusive for elements of target tensor in BCE loss #97814

Closed
wants to merge 5 commits into from

Conversation

kiersten-stokes
Copy link
Contributor

@kiersten-stokes kiersten-stokes commented Mar 28, 2023

@pytorch-bot
Copy link

pytorch-bot bot commented Mar 28, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/97814

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEVs

There are 1 currently active SEVs. If your PR is affected, please view them below:

✅ No Failures

As of commit aebb9ac:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

Copy link
Contributor

@bdhirsh bdhirsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix!

Copy link
Contributor

@bdhirsh bdhirsh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, I see several nn tests failing - it looks like some of our tests pass > 1 target values through binary_cross_entropy. cc @mikaylagawarecki ? (let me know if I should cc someone else).

@kiersten-stokes
Copy link
Contributor Author

actually, I see several nn tests failing - it looks like some of our tests pass > 1 target values through binary_cross_entropy. cc @mikaylagawarecki ? (let me know if I should cc someone else).

@bdhirsh thanks for the review! I was also wondering about those tests but haven't had time to circle back (and wasn't sure who to ping 🙂 ). Looking forward to hearing thoughts on how/if we want to handle those!

@mikaylagawarecki mikaylagawarecki added the topic: bc breaking topic category label Mar 31, 2023
Copy link
Contributor

@mikaylagawarecki mikaylagawarecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @kiersten-stokes, thanks for sending this fix!

The tests failures arise because of tests where target was erroneously not constrained in [0, 1]. For most of the target outputs, we ensure this using torch.randn(...).gt(0).double() but this was missed for these two tests here and here). Fixing these should resolve the errors.

Additionally, we should also add the check on CUDA

@mikaylagawarecki mikaylagawarecki added module: nn Related to torch.nn triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module labels Mar 31, 2023
@mikaylagawarecki mikaylagawarecki self-assigned this Mar 31, 2023
@kiersten-stokes
Copy link
Contributor Author

The tests failures arise because of tests where target was erroneously not constrained in [0, 1]. For most of the target outputs, we ensure this using torch.randn(...).gt(0).double() but this was missed for these two tests here and here). Fixing these should resolve the errors.

@mikaylagawarecki I really appreciate you pointing me to the relevant tests! It seems embarrassingly simple now that I see it - I guess I'm still learning how to comb my way through the logs efficiently 🙈

Additionally, we should also add the check on CUDA

Done! Will stand by for CI failures (apologies, I have no way to run CUDA-based tests at the moment) and circle back should there be any failures similar to the above!

@mikaylagawarecki
Copy link
Contributor

@kiersten-stokes Of course, happy to help!

Looking at the logs, it looks like I might have missed another one where target is not correctly constrained: https://github.com/pytorch/pytorch/blob/master/test/test_nn.py#L8923

It's hard to tell if there are more because CI jobs terminate when a test fail :( But you could try running tests on CPU, python test/test_nn.py, you can also try to filter tests by substring with the -k option e.g. python test/test_nn.py -k cross_entropy but depending on the test naming, that might not be a catch-all.

Let me know if you have any questions about local testing!

Copy link
Contributor

@mikaylagawarecki mikaylagawarecki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@mikaylagawarecki mikaylagawarecki added the release notes: nn release notes category label Apr 4, 2023
@mikaylagawarecki
Copy link
Contributor

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Apr 4, 2023
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@kiersten-stokes
Copy link
Contributor Author

Thanks!

@mikaylagawarecki thanks again for your help!

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-focal-rocm5.4.2-py3.8 / test (default, 1, 3, linux.rocm.gpu)

Details for Dev Infra team Raised by workflow job

@mikaylagawarecki
Copy link
Contributor

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/trunk Trigger trunk jobs on your pull request Merged module: nn Related to torch.nn open source release notes: nn release notes category triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Check on target being between 0-1 for binary cross entropy loss
5 participants