
[optim][adagrad] group tensors in foreach to maximize perf #92362

Closed
wants to merge 1 commit

Conversation

janeyx99 (Contributor)

another one
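
For context, this PR targets Adagrad's multi-tensor ("foreach") implementation, the path selected with the optimizer's foreach flag, and groups the tensors that path receives by device and dtype before dispatching to the fused kernels. A minimal usage sketch of that code path (the model and hyperparameters below are arbitrary, not taken from this PR):

    import torch

    model = torch.nn.Linear(10, 2)
    # foreach=True selects the multi-tensor implementation whose internals this
    # PR reorganizes by grouping tensors per device/dtype.
    opt = torch.optim.Adagrad(model.parameters(), lr=0.01, foreach=True)

    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    opt.step()
    opt.zero_grad()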

@janeyx99 janeyx99 added the topic: performance topic category label Jan 18, 2023
@janeyx99 janeyx99 requested a review from albanD as a code owner January 18, 2023 01:09
pytorch-bot bot commented Jan 18, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/92362

Note: Links to docs will display an error until the docs builds have been completed.

❌ 2 Failures

As of commit 67a1880:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: nn release notes category label Jan 18, 2023
@janeyx99 janeyx99 added the ciflow/trunk Trigger trunk jobs on your pull request label Jan 18, 2023
if maximize:
    device_grads = torch._foreach_neg(device_grads)

# Recomputed per device/dtype group: does this group contain any sparse grads?
device_has_sparse_grad = any(grad.is_sparse for grad in device_grads)
Collaborator

You now unconditionally recompute this even if it was already passed in? That doesn't sound right?

Contributor Author

Hm, my thinking was that even if the whole batch may contain sparse grads, groups may not, so we can still use foreach to optimize these subgroups.

Is it more common for all grads to either be sparse or unsparse? If so, there may be other places I need to patch 😬

Collaborator


I think it is fine if we say that we need this info. I'm just more worried about the ignored arg ;)

Contributor Author


I'm guessing whatever resolution we have here should be applied to https://github.com/pytorch/pytorch/pull/92338/files#r1072880613 as well?

Collaborator


Yep. Can be done later.

Contributor Author


Yep, okay, my initial proposal is that we deprecate has_sparse_grad across the codebase.
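
To make the pattern discussed in this thread concrete, here is a minimal sketch of grouping tensors by (device, dtype) and checking for sparse grads per group. It is illustrative only, not the actual torch/optim code: the helper name group_by_device_and_dtype is made up, the update omits lr_decay, weight_decay, and step accounting, and the sparse branch simply densifies to keep the sketch short.

    from collections import defaultdict
    import torch

    def group_by_device_and_dtype(params, grads, state_sums):
        # Bucket per-parameter tensors by (device, dtype); the _foreach_* kernels
        # require every tensor in a single call to share device and dtype.
        buckets = defaultdict(lambda: ([], [], []))
        for p, g, s in zip(params, grads, state_sums):
            key = (p.device, p.dtype)
            buckets[key][0].append(p)
            buckets[key][1].append(g)
            buckets[key][2].append(s)
        return buckets

    @torch.no_grad()
    def adagrad_step(params, grads, state_sums, lr=0.01, eps=1e-10):
        for (device, dtype), (ps, gs, ss) in group_by_device_and_dtype(params, grads, state_sums).items():
            # Recompute the sparse-grad flag per bucket: even if the overall batch
            # contains sparse grads, this particular bucket may be entirely dense.
            if any(g.is_sparse for g in gs):
                # Per-tensor fallback for buckets that do contain sparse grads
                # (densified here only for brevity).
                for p, g, s in zip(ps, gs, ss):
                    g = g.to_dense() if g.is_sparse else g
                    s.addcmul_(g, g, value=1.0)
                    p.addcdiv_(g, s.sqrt().add_(eps), value=-lr)
            else:
                # Fully dense bucket: take the fused multi-tensor fast path.
                torch._foreach_addcmul_(ss, gs, gs, value=1.0)
                std = torch._foreach_sqrt(ss)
                torch._foreach_add_(std, eps)
                torch._foreach_addcdiv_(ps, gs, std, value=-lr)

With this structure, a fully dense bucket still gets the foreach kernels even when some other parameter in the same list produces a sparse grad, which is the point made above about optimizing subgroups.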

janeyx99 (Contributor, Author)

@pytorchbot merge

pytorchmergebot (Collaborator)

Merge failed

Reason: Not merging any PRs at the moment because there is a merge-blocking https://github.com/pytorch/pytorch/labels/ci:%20sev issue open at #92626.

janeyx99 (Contributor, Author)

@pytorchbot merge

pytorchmergebot (Collaborator)

Merge started

Your change will be merged once all checks pass (ETA 0-4 hours).

pytorchmergebot (Collaborator)

Merge failed

Reason: 1 job has failed: trunk / linux-bionic-cuda11.6-py3.10-gcc7-sm86 / test (default, 3, 4, linux.g5.4xlarge.nvidia.gpu)

janeyx99 (Contributor, Author)

@pytorchbot merge -f "irrelevant failures"

pytorchmergebot (Collaborator)

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes).

Labels: ciflow/trunk, Merged, release notes: nn, topic: performance

3 participants