
Conversation

masnesral (Contributor) commented Jul 7, 2025

Stack from ghstack (oldest at bottom):

Summary: There's some evidence that some very long compile times are actually attributable to the sync. This should make it easier to say for sure.
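
For context, a rough sketch of the kind of instrumentation being added, assuming the sync in question is a torch.cuda.synchronize() on the compile path and that dynamo_timed from torch._dynamo.utils is what emits the trace event; the event name and call site below are illustrative assumptions, not the actual diff:

```python
# Hypothetical sketch only, not the change in this PR.
# Wrapping the synchronize in dynamo_timed gives the wait its own span in the
# compile-time (perfetto/chromium) trace, so slow compiles can be attributed to it.
import torch
from torch._dynamo.utils import dynamo_timed

def synchronize_for_timing() -> None:
    # "cuda_synchronize" is an assumed event name, chosen for illustration.
    with dynamo_timed("cuda_synchronize", log_pt2_compile_event=True):
        torch.cuda.synchronize()
```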

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

…hers

Summary: There's some evidence that some very long compile times are actually attributable to the sync. This should make it easier to say for sure.

[ghstack-poisoned]

pytorch-bot bot commented Jul 7, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157747

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit a8098df with merge base 28aae93:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

masnesral added a commit that referenced this pull request Jul 7, 2025
…hers

Summary: There's some evidence that some very long compile times are actually attributable to the sync. This should make it easier to say for sure.

ghstack-source-id: 468057e
Pull Request resolved: #157747
masnesral added the topic: not user facing (topic category) label Jul 7, 2025
masnesral requested review from aorenste and jamesjwu July 7, 2025 23:43
masnesral marked this pull request as ready for review July 7, 2025 23:43
aorenste (Contributor) commented Jul 7, 2025

This is already recorded in ODS - do we want it here too? (One advantage of being here is that it'll be in the perfetto trace)

masnesral (Contributor, Author):

> This is already recorded in ODS - do we want it here too? (One advantage of being here is that it'll be in the perfetto trace)

I was only shooting for perfetto. But I didn't know it was already recorded in ODS. Where is that happening? Is there some underlying counter that captures all cuda.synchronize?

aorenste (Contributor) commented Jul 8, 2025

> This is already recorded in ODS - do we want it here too? (One advantage of being here is that it'll be in the perfetto trace)
>
> I was only shooting for perfetto. But I didn't know it was already recorded in ODS. Where is that happening? Is there some underlying counter that captures all cuda.synchronize?

It's in c10/cuda/CUDAFunctions.cpp::device_synchronize():

STATIC_SCOPED_WAIT_COUNTER(pytorch.wait_counter.cuda_device_synchronize);

All STATIC_SCOPED_WAIT_COUNTERs are automatically logged to ODS.
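
For reference, a minimal Python-side sketch of the same pattern, assuming torch._C._WaitCounter exposes the scoped counter to Python and feeds the same WaitCounter backend as the C++ macro above; the counter name here is made up for illustration:

```python
# Hedged sketch: a Python-level scoped wait counter. Assumes torch._C._WaitCounter
# wires into the same backend that logs STATIC_SCOPED_WAIT_COUNTER events to ODS.
import time
import torch

def do_blocking_work() -> None:
    # Hypothetical counter name, following the pytorch.wait_counter.* convention.
    with torch._C._WaitCounter("pytorch.wait_counter.example_blocking_wait").guard():
        time.sleep(0.1)  # stand-in for whatever blocking call is being measured
```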

masnesral (Contributor, Author):

@aorenste, ah thanks. This PR is in response to https://fburl.com/workplace/it2q0qzs. I thought it would be easier to see how much overhead there is in this specific cuda synchronize.

masnesral added the ciflow/trunk (Trigger trunk jobs on your pull request) label Jul 8, 2025
masnesral (Contributor, Author):

@pytorchbot merge

pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

pytorchmergebot (Collaborator):

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see the pytorch-bot wiki.

masnesral (Contributor, Author):

@pytorchbot merge

pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

pytorchmergebot (Collaborator):

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see the pytorch-bot wiki.

masnesral (Contributor, Author):

@pytorchbot rebase

pytorchmergebot (Collaborator):

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

[ghstack-poisoned]
pytorchmergebot (Collaborator):

Successfully rebased gh/masnesral/216/orig onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via ghstack checkout https://github.com/pytorch/pytorch/pull/157747)

pytorchmergebot pushed a commit that referenced this pull request Jul 9, 2025
…hers

Summary: There's some evidence that some very long compile times are actually attributable to the sync. This should make it easier to say for sure.

ghstack-source-id: 141a091
Pull Request resolved: #157747
masnesral (Contributor, Author):

@pytorchbot merge

pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

github-actions bot deleted the gh/masnesral/216/head branch August 10, 2025 02:21