Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reland: Make torch.empty* deterministic by filling with NaN or max int #104995

Closed

Conversation

kurtamohler
Copy link
Collaborator

@kurtamohler kurtamohler commented Jul 11, 2023

Relands #101849 after #104302 reverted it.

torchrec PR pytorch/torchrec#1269 fixes the torchrec failure that caused #101849 to be reverted

Part of #82004

cc @mruberry

@kurtamohler kurtamohler added module: determinism ciflow/trunk Trigger trunk jobs on your pull request ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR labels Jul 11, 2023
@kurtamohler kurtamohler requested a review from albanD July 11, 2023 18:47
@pytorch-bot
Copy link

pytorch-bot bot commented Jul 11, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/104995

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit b214d41:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/mps Run MPS tests (subset of trunk) release notes: mps Release notes category labels Jul 11, 2023
Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pushing the fix on the torchrec side!

@kurtamohler
Copy link
Collaborator Author

Hey @albanD, if possible, would you be able to import this PR and run the internal torchrec tests, just to make sure they pass before we merge it?

@albanD
Copy link
Collaborator

albanD commented Jul 11, 2023

Sure, I'll do a rebase because there was a revert in trunk responsible for the current CI failure.

@albanD
Copy link
Collaborator

albanD commented Jul 11, 2023

@pytorchbot -r

@pytorch-bot
Copy link

pytorch-bot bot commented Jul 11, 2023

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: unrecognized arguments: -r

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci} ...

Try @pytorchbot --help for more info.

@albanD
Copy link
Collaborator

albanD commented Jul 11, 2023

@pytorchbot rebase

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Tried to rebase and push PR #104995, but it was already up to date. Try rebasing against main by issuing:
@pytorchbot rebase -b main

@albanD
Copy link
Collaborator

albanD commented Jul 11, 2023

@pytorchbot rebase -b main

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased empty-deterministic-reland-0 onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout empty-deterministic-reland-0 && git pull --rebase)

@facebook-github-bot
Copy link
Contributor

@albanD has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@albanD
Copy link
Collaborator

albanD commented Jul 12, 2023

There seems to be other issues where code expects empty() to return 0s :(
Working on fixing these call sites!

@kurtamohler
Copy link
Collaborator Author

Darn, that's unfortunate. Thanks for the help @albanD, let me know if there's anything I can do

@albanD
Copy link
Collaborator

albanD commented Jul 12, 2023

First change seems to fix some. Waiting on CI to run again.

@albanD
Copy link
Collaborator

albanD commented Jul 13, 2023

@pytorchbot merge -r

Ok I landed internal fixes. This should be green internally now!

@pytorchmergebot
Copy link
Collaborator

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot
Copy link
Collaborator

Successfully rebased empty-deterministic-reland-0 onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout empty-deterministic-reland-0 && git pull --rebase)

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@kurtamohler
Copy link
Collaborator Author

Awesome, thank you @albanD!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/mps Run MPS tests (subset of trunk) ciflow/periodic Trigger jobs ran periodically on master (periodic.yml) on the PR ciflow/trunk Trigger trunk jobs on your pull request Merged module: determinism open source release notes: mps Release notes category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants