c00w
Contributor

@c00w c00w commented Nov 22, 2024

[ghstack-poisoned]

pytorch-bot bot commented Nov 22, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/141307

Note: Links to docs will display an error until the docs builds have been completed.

❌ 3 New Failures

As of commit 420ac2d with merge base da94ab0 (image):

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

c00w added a commit that referenced this pull request Nov 22, 2024
This adds a basic waitcounter to help show if we're spending a lot of
time doing gets and sets to remote caches

ghstack-source-id: c93d1b0
Pull Request resolved: #141307
@c00w c00w requested review from masnesral and oulgen November 22, 2024 00:14
raise
self._log_sample(sample)
return result
with _WaitCounter("pytorch.remote_cache.get"):
Contributor

From a convo on another diff: are you instead planning to add this WaitCounter functionality to dynamo_timed? If so, can you sync w/ me on that?

Also: logging here will conflate different use cases, which may or may not be what we want. If we want to distinguish the inductor remote cache from the autotune remote cache, then we'd want to log "lower down".
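
To illustrate the "lower down" option from the comment above, here is a minimal sketch that does not use PyTorch at all. `wait_counter`, `WAIT_SECONDS`, the `*Sketch` classes, the `put` method, and the per-subclass counter names are all hypothetical stand-ins; only the `pytorch.remote_cache.get` name comes from the diff. Logging in the base class yields one shared counter, while overriding the prefix in each subclass keeps the use cases separate.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, Optional

# Stand-in for a real counter backend: total seconds spent per counter name.
WAIT_SECONDS: Dict[str, float] = defaultdict(float)


@contextmanager
def wait_counter(name: str) -> Iterator[None]:
    """Hypothetical helper: add the time spent in the block to WAIT_SECONDS[name]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the body raises, so failed requests are still counted.
        WAIT_SECONDS[name] += time.perf_counter() - start


class RemoteCacheSketch:
    """Toy cache; a local dict plays the role of the remote backend."""

    # Subclasses override this prefix to log "lower down" under their own name.
    counter_prefix = "pytorch.remote_cache"

    def __init__(self) -> None:
        self._store: Dict[str, bytes] = {}

    def get(self, key: str) -> Optional[bytes]:
        with wait_counter(f"{self.counter_prefix}.get"):
            return self._store.get(key)

    def put(self, key: str, value: bytes) -> None:
        with wait_counter(f"{self.counter_prefix}.put"):
            self._store[key] = value


class InductorCacheSketch(RemoteCacheSketch):
    counter_prefix = "pytorch.remote_cache.inductor"  # hypothetical name


class AutotuneCacheSketch(RemoteCacheSketch):
    counter_prefix = "pytorch.remote_cache.autotune"  # hypothetical name
```

With this layout, a dashboard could sum the per-subclass counters to recover the conflated total, but the reverse split is not possible once everything is logged under one name.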

Contributor Author

Sam and I talked directly; recording the conclusions here for others to see :).

We currently have two parallel collection systems: dynamo_timed (which writes to a bunch of different places) and WaitCounter (which is purely for counters).

Here we are purely trying to measure "remote network activity in cache", whether it comes from inductor or from anything else doing remote access. This is being done with WaitCounters because that is where people want to consume the data.

Not blocking this PR, but there is interest in writing to WaitCounters from dynamo_timed. Unless it breaks stuff (i.e. dynamo_timed has issues with events across multiple processes), we should probably have a span + a WaitCounter for most spots within the compiler.

I am going to cut a PR for the base infrastructure (hopefully today; if not, after Thanksgiving). Then we'll work on consolidating the logging within compile.
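
As a rough picture of the "span + a WaitCounter" idea, the sketch below wraps a region once and records both a span (name, start, end) and an accumulated counter. It is not the dynamo_timed or WaitCounter implementation; `timed_region`, `SPANS`, `COUNTERS`, and the region name are hypothetical and exist only for illustration.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, List, Tuple

# Stand-ins for the two collection systems (both hypothetical).
SPANS: List[Tuple[str, float, float]] = []        # one (name, start, end) per region
COUNTERS: Dict[str, float] = defaultdict(float)   # total seconds per name


@contextmanager
def timed_region(name: str) -> Iterator[None]:
    """Record one span and bump one counter for the wrapped block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Record in both systems even when the block raises.
        end = time.perf_counter()
        SPANS.append((name, start, end))
        COUNTERS[name] += end - start


# Usage: a single call site feeds both systems.
with timed_region("compile.example_phase"):
    sum(range(1000))
```

Keeping both writes behind one helper means a call site cannot update one system and forget the other, which is the consolidation the comment is aiming for.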

@c00w c00w added the topic: not user facing topic category label Nov 22, 2024
[ghstack-poisoned]
c00w added a commit that referenced this pull request Nov 22, 2024
ghstack-source-id: 80f4d09
Pull Request resolved: #141307
[ghstack-poisoned]
c00w added a commit that referenced this pull request Nov 22, 2024
ghstack-source-id: d51fa29
Pull Request resolved: #141307
@c00w c00w requested a review from masnesral November 23, 2024 00:03
@c00w
Contributor Author

c00w commented Nov 23, 2024

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 23, 2024
@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

c00w added a commit that referenced this pull request Dec 2, 2024
ghstack-source-id: d51fa29
Pull Request resolved: #141307
@c00w
Contributor Author

c00w commented Dec 2, 2024

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Failing merge rule: Core Maintainers

[ghstack-poisoned]
c00w added a commit that referenced this pull request Dec 2, 2024
ghstack-source-id: 07acb7b
Pull Request resolved: #141307
@c00w
Contributor Author

c00w commented Dec 2, 2024

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

@pytorchmergebot
Collaborator

Merge failed

Reason: 1 mandatory check(s) failed. The first few are:

Failing merge rule: Core Maintainers

@c00w
Contributor Author

c00w commented Dec 2, 2024

@pytorchbot merge -f "msvcc failure, unrelated to this code AFAICT".

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

pobin6 pushed a commit to pobin6/pytorch that referenced this pull request Dec 5, 2024
Pull Request resolved: pytorch#141307
Approved by: https://github.com/masnesral
@github-actions github-actions bot deleted the gh/c00w/13/head branch January 2, 2025 02:04