@desertfire (Contributor) commented Feb 8, 2023

Stack from ghstack (oldest at bottom):

Summary: Setting torch.backends.cudnn.deterministic to True does not appear
to be enough to eliminate non-determinism when testing benchmarks with
--accuracy, so let's turn off cuDNN completely. With this change,
mobilenet_v3_large no longer fails randomly in my local environment. Also
take this chance to clean up the CI skip lists.
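
A minimal sketch of the idea described above, not the actual change made in this PR. It assumes the backend object exposes a writable `enabled` attribute, as `torch.backends.cudnn` does. The names `cudnn_disabled` and `fake_cudnn` are hypothetical and exist only for illustration; a stand-in object is used so the example runs without PyTorch or a GPU.

```python
from contextlib import contextmanager
from types import SimpleNamespace


@contextmanager
def cudnn_disabled(cudnn_backend):
    """Temporarily set `enabled = False` on a cuDNN-style backend object.

    `cudnn_backend` is expected to be `torch.backends.cudnn`, but any object
    with an `enabled` attribute works (hypothetical helper, not a PyTorch API).
    """
    previous = cudnn_backend.enabled
    cudnn_backend.enabled = False
    try:
        yield cudnn_backend
    finally:
        # Restore the caller's setting even if the accuracy check raises.
        cudnn_backend.enabled = previous


# Stand-in for torch.backends.cudnn so the sketch runs anywhere.
fake_cudnn = SimpleNamespace(enabled=True, deterministic=True)

with cudnn_disabled(fake_cudnn):
    # An accuracy comparison would run here with cuDNN switched off.
    inside = fake_cudnn.enabled

after = fake_cudnn.enabled
print(inside, after)
```

With PyTorch installed, the same helper could be called as `with cudnn_disabled(torch.backends.cudnn): ...`. Restoring the previous value keeps the setting scoped to the accuracy run, so other runs in the same process are unaffected.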

cc @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx

@pytorch-bot (bot) commented Feb 8, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/94363

Note: Links to docs will display an error until the docs builds have been completed.

❗ 2 Active SEVs

There are 2 currently active SEVs. If your PR is affected, please view them below:

❌ 3 Failures

As of commit e2cd8a6:

NEW FAILURES - The following jobs have failed:

BROKEN TRUNK - The following jobs failed but were present on the merge base 76ed1a8:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

desertfire added a commit that referenced this pull request Feb 8, 2023
ghstack-source-id: c2c353c
Pull Request resolved: #94363
@desertfire added the ciflow/periodic label Feb 8, 2023
@desertfire added the topic: not user facing label Feb 8, 2023
@ezyang (Contributor) left a comment

OK, but cc @kurtamohler, it really should be deterministic...

@desertfire (Contributor Author)

Sigh, there is still inconsistency between local-run results and CI results.

@desertfire (Contributor Author)

@pytorchbot merge

@pytorch-bot (bot) added the ciflow/trunk label Feb 8, 2023
@pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

@huydhn (Contributor) commented Feb 9, 2023

@pytorchbot revert -m 'This change fails in trunk https://hud.pytorch.org/pytorch/pytorch/commit/7bfc59993d25c444eccb6cd77e85e4dd0a348b7e running out of memory. Mark this as weird because it was green in PR' -c weird

@huydhn reopened this Feb 9, 2023
@pytorchmergebot (Collaborator)

@pytorchbot successfully started a revert job. Check the current status here.

@pytorchmergebot (Collaborator)

@desertfire your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Feb 9, 2023
…cy (#94363)"

This reverts commit 7bfc599.

Reverted #94363 on behalf of https://github.com/huydhn due to This change fails in trunk https://hud.pytorch.org/pytorch/pytorch/commit/7bfc59993d25c444eccb6cd77e85e4dd0a348b7e running out of memory.  Mark this as weird because it was green in PR
@desertfire (Contributor Author)

@pytorchbot rebase

@pytorchmergebot (Collaborator)

@pytorchbot successfully started a rebase job. Check the current status here

@pytorchmergebot (Collaborator)

Rebase failed due to

Aborting rebase because rebasing the branch resulted in the same sha as the target branch.
This usually happens because the PR has already been merged.  Please rebase locally and push.

Raised by https://github.com/pytorch/pytorch/actions/runs/4144664499

@desertfire removed the ciflow/trunk and ciflow/periodic labels Feb 10, 2023
@desertfire closed this Feb 10, 2023
desertfire added a commit that referenced this pull request Feb 12, 2023
Summary: This is the reland of #94363.

[ghstack-poisoned]
desertfire added a commit that referenced this pull request Feb 12, 2023
Summary: This is the reland of #94363.

ghstack-source-id: c859ff1
Pull Request resolved: #94712
pytorchmergebot pushed a commit that referenced this pull request Feb 13, 2023
…to false when testing accuracy"

Summary: This is the reland of #94363.

cc mlazos soumith voznesenskym yanboliang penguinwu anijain2305 EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng Xia-Weiwen wenzhe-nrv jiayisunx

[ghstack-poisoned]
desertfire added a commit that referenced this pull request Feb 13, 2023
Summary: This is the reland of #94363.

ghstack-source-id: 2837cc4
Pull Request resolved: #94712
pytorchmergebot pushed a commit that referenced this pull request Feb 14, 2023
Summary: This is the reland of #94363.

ghstack-source-id: f871094
Pull Request resolved: #94712
desertfire added a commit that referenced this pull request Feb 15, 2023
Summary: This is the reland of #94363.

ghstack-source-id: 0308943
Pull Request resolved: #94712
pytorchmergebot pushed a commit that referenced this pull request Feb 16, 2023
Summary: This is the reland of #94363.

ghstack-source-id: 4c1d48a
Pull Request resolved: #94712
pytorchmergebot pushed a commit that referenced this pull request Feb 16, 2023
Summary: This is the reland of #94363.

ghstack-source-id: aa4d986
Pull Request resolved: #94712
desertfire added a commit that referenced this pull request Feb 17, 2023
Summary: This is the reland of #94363.

ghstack-source-id: 218ebd8
Pull Request resolved: #94712
@facebook-github-bot deleted the gh/desertfire/65/head branch June 8, 2023 16:11