kwen2501
Contributor

@kwen2501 kwen2501 commented Jan 18, 2024

pytorch-bot bot commented Jan 18, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/117804

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 New Failures, 1 Unrelated Failure

As of commit 3cc481e with merge base c317bf2:

NEW FAILURES - The following jobs have failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot added the "release notes: distributed (c10d)" label on Jan 18, 2024
github-actions added the "oncall: distributed" label on Jan 18, 2024
kwen2501 marked this pull request as ready for review on January 19, 2024 at 02:55
@kwen2501
Contributor Author

@pytorchbot merge

pytorch-bot added the "ciflow/trunk" label on Jan 19, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

kwen2501 force-pushed the remove_dev_sync_from_barrier branch from 81b2693 to 27d5884 on January 19, 2024 at 04:41
@kwen2501
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Failing merge rule: Core Maintainers

@awgu
Collaborator

awgu commented Jan 19, 2024

I think this change might be BC breaking. Should we call this out?

I have previously used dist.barrier() for timing code where I relied on it doing a device sync, not just a stream sync.
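
For timing code of this kind, one option that does not depend on what dist.barrier() synchronizes is to request the device sync explicitly. The following is a minimal sketch under stated assumptions, not code from this PR: `timed_region` and its `device_sync` parameter are hypothetical names, and the commented usage assumes a CUDA job where `torch.cuda.synchronize` is available.

```python
import time


def timed_region(fn, device_sync=lambda: None):
    """Return (result, seconds) for fn(), draining device work around it.

    device_sync is a hypothetical hook: pass torch.cuda.synchronize on a CUDA
    machine so the timer does not stop while kernels are still running.
    """
    device_sync()                      # finish work queued before the region
    start = time.perf_counter()
    result = fn()
    device_sync()                      # finish work launched inside the region
    return result, time.perf_counter() - start


# Assumed usage in a distributed job (not executed here):
#   import torch
#   import torch.distributed as dist
#   dist.barrier()                     # line the ranks up before timing
#   out, secs = timed_region(train_step, torch.cuda.synchronize)
```

Passing the sync as a callable keeps the helper usable on CPU-only runs, where the default no-op applies.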

@kwen2501
Contributor Author

@awgu Good idea. I added a note to the documentation. Thanks!

@kwen2501
Contributor Author

@pytorchbot merge -f "Previous CI trunk failure does not seem related"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

@clee2000
Contributor

clee2000 commented Jan 22, 2024

@pytorchbot revert -m "sorry the docs test failure is real, I think it wants the lines after the .. note to be indented https://hud.pytorch.org/pytorch/pytorch/commit/0f6bbb1c070c3a9713893659377e20e147c125f6 https://github.com/pytorch/pytorch/actions/runs/7616827874/job/20745016788. There are also some libtorch builds failing. This looks to be some combination of ignoredsignal (docs build) and nosignal (Dr CI classification for docs_test was wrong, pending libtorch jobs)" -c nosignal
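
The indentation rule suggested in the revert message can be illustrated with a small docstring. This is a sketch only: `barrier_example` and `note_body_is_indented` are hypothetical names, and the note wording is illustrative rather than the text added in this PR. In reStructuredText, the body of a directive such as `.. note::` is indented relative to the directive line.

```python
import inspect


def barrier_example():
    """Hypothetical function used only to show note formatting.

    .. note::
        The body of the note is indented under the directive,
        so the docs build treats these lines as part of the note.
    """


def note_body_is_indented(docstring):
    """Return True if the first nonblank line after '.. note::' is indented deeper."""
    lines = inspect.cleandoc(docstring).splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith(".. note::"):
            base = len(line) - len(line.lstrip())
            following = [l for l in lines[i + 1:] if l.strip()]
            if not following:
                return False
            first = following[0]
            return (len(first) - len(first.lstrip())) > base
    return False
```

A check like this is only a rough local approximation of what a docs build enforces; the real docs job remains the authority.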

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.

pytorchmergebot added a commit that referenced this pull request Jan 22, 2024
…)"

This reverts commit 0f6bbb1.

Reverted #117804 on behalf of https://github.com/clee2000 due to sorry the docs test failure is real, I think it wants the lines after the .. note to be indented https://github.com/pytorch/pytorch/actions/runs/7616827874/job/20745016788. Marking as nosignal due to bad Dr. CI categorization (comment)
@pytorchmergebot
Collaborator

@kwen2501 your PR has been successfully reverted.

@kwen2501
Contributor Author

@pytorchbot merge -f "Failures were from main and unrelated"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

@malfet
Contributor

malfet commented Jan 26, 2024

@kwen2501 why did this PR update the PocketFFT submodule?

malfet added a commit that referenced this pull request Jan 26, 2024
Accidentally downgraded by force merge of #117804
pytorchmergebot pushed a commit that referenced this pull request Jan 26, 2024
Accidentally downgraded by force merge of #117804

Pull Request resolved: #118348
Approved by: https://github.com/kit1980
wconstab pushed a commit that referenced this pull request Jan 26, 2024
wconstab pushed a commit that referenced this pull request Jan 26, 2024
wconstab pushed a commit that referenced this pull request Jan 26, 2024
@github-actions github-actions bot deleted the remove_dev_sync_from_barrier branch February 25, 2024 01:51
Labels: ciflow/trunk, Merged, oncall: distributed, release notes: distributed (c10d), Reverted