
Fix generator exhaustion in SparseAdam #47724

Closed
wants to merge 2 commits

Conversation

mariosasko
Contributor

Fixes #47594
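For context, a minimal sketch of the failure mode reported in #47594, assuming the pre-fix behavior described there (the model below is just a placeholder to obtain a parameter generator):

```python
import torch

# Any module works; we only need .parameters(), which returns a generator.
model = torch.nn.Linear(4, 2)

# On affected versions, SparseAdam.__init__ iterated over `params` to check
# for sparse tensors, exhausting the generator before Optimizer.__init__ ran,
# so construction failed with "optimizer got an empty parameter list".
opt = torch.optim.SparseAdam(model.parameters(), lr=1e-3)
```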

@dr-ci

dr-ci bot commented Nov 11, 2020

💊 CI failures summary and remediations

As of commit 2a6b375 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



@ailzhang added the module: optimizer (Related to torch.optim) and triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) labels Nov 11, 2020
@mariosasko
Contributor Author

mariosasko commented Nov 12, 2020

@heitorschueroff Can you please check this PR and merge it if everything is fine?

EDIT:
@albanD Could you please take a look or cc someone else before this PR becomes stale?

@@ -32,8 +33,12 @@ def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
if not 0.0 <= betas[1] < 1.0:
raise ValueError("Invalid beta parameter at index 1: {}".format(betas[1]))

# if params are in the form of a generator, the next for-loop exhausts it,
# so the copy is passed to the loop
params, params_copy = itertools.tee(params)
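As a quick illustration of what `itertools.tee` does here (a standalone toy example, not PyTorch code): it splits one exhaustible iterator into two independent ones, so the validation loop can consume one copy while the other is handed to `Optimizer.__init__`.

```python
import itertools

gen = (i * i for i in range(3))   # a one-shot generator
a, b = itertools.tee(gen)

print(list(a))  # [0, 1, 4] -- consuming one copy...
print(list(b))  # [0, 1, 4] -- ...does not exhaust the other
```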
Contributor


Is there an advantage to doing this copy as opposed to changing params to a list? Like params = list(params)

Collaborator


I think we usually simply do list(params) yes.

Contributor Author


There is no difference in behavior, but using list instead of itertools.tee seems more readable and easier to understand, IMO.
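For readers following along, here is a rough sketch of the constructor shape after switching to `list(...)`. The class name and the abbreviated validation are hypothetical; this paraphrases the approach settled on in the thread rather than the actual torch.optim.SparseAdam source.

```python
import torch
from torch.optim.optimizer import Optimizer

class SparseAdamLike(Optimizer):
    # Paraphrased constructor shape after the fix; not the real file.
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        # ... lr / eps / betas validation as in the hunk above ...

        # Materialize the iterable once so the sparse-param check below
        # does not exhaust a generator before Optimizer.__init__ sees it.
        # itertools.tee(params) would also work, but list() was judged
        # more readable in review.
        params = list(params)

        sparse_params = [
            index
            for index, param in enumerate(params)
            if isinstance(param, torch.Tensor) and param.is_sparse
        ]
        if sparse_params:
            raise ValueError(
                f"Sparse params at indices {sparse_params}: "
                "SparseAdam requires dense parameter tensors"
            )

        defaults = dict(lr=lr, betas=betas, eps=eps)
        super().__init__(params, defaults)
```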

@codecov

codecov bot commented Dec 3, 2020

Codecov Report

Merging #47724 (2a6b375) into master (de46369) will increase coverage by 6.48%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #47724      +/-   ##
==========================================
+ Coverage   74.34%   80.83%   +6.48%     
==========================================
  Files        1856     1860       +4     
  Lines      200606   200737     +131     
==========================================
+ Hits       149143   162258   +13115     
+ Misses      51463    38479   -12984     

Collaborator

@albanD left a comment


Thanks for the update!


Contributor

@facebook-github-bot left a comment


@albanD has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@albanD merged this pull request in f2c3efd.


janeyx99 added a commit that referenced this pull request Nov 28, 2023
…sparse params"

Context:

#47724 fixed the problem that SparseAdam could not handle generators by using the `list(...)` construct. However, this meant that SparseAdam deviated from other optimizers in that it could _accept_ a raw Tensor/Parameter rather than requiring a container of them. This is not really a big deal.

So why this PR?

I do think this PR is cleaner. It uses the fact that the Optimizer parent class already containerizes parameters into parameter groups, so we can reuse that here by calling `super().__init__` first and then filtering the param_groups afterward. This change would also make SparseAdam consistent with the rest of our optimizers in that only containerized params are accepted, which technically is BC breaking but would be minor, I believe.




[ghstack-poisoned]
pytorchmergebot pushed a commit that referenced this pull request Nov 29, 2023
Context:

#47724 fixed the problem that SparseAdam could not handle generators by using the `list(...)` construct. However, this meant that SparseAdam deviated from other optimizers in that it could _accept_ a raw Tensor/Parameter rather than requiring a container of them. This is not really a big deal.

So why this PR?

I do think this PR is cleaner. It uses the fact that the Optimizer parent class already containerizes parameters into parameter groups, so we can reuse that here by calling `super().__init__` first and then filtering the param_groups afterward. This change would also make SparseAdam consistent with the rest of our optimizers in that only containerized params are accepted, which technically is BC breaking, so I've added a deprecation warning that we should remove in May 2024.

(But is it really BC breaking when we've said in the docs that params should be an iterable this whole time? Maybe this is just a bug fix....😛)

Pull Request resolved: #114425
Approved by: https://github.com/drisspg
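A rough sketch of the refactor that commit message describes, under the stated assumptions: the class name is hypothetical, the error message is abbreviated, and the deprecation warning for passing a bare Tensor is omitted; the real #114425 diff differs in detail.

```python
import torch
from torch.optim.optimizer import Optimizer

class SparseAdamRefactorSketch(Optimizer):
    # Sketch only: let the Optimizer base class containerize `params` into
    # param_groups first, then validate the groups, instead of pre-scanning
    # the raw iterable in this subclass.
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        defaults = dict(lr=lr, betas=betas, eps=eps)
        super().__init__(params, defaults)

        # Walk the already-built param_groups; no generator handling is
        # needed here because the base class has materialized the params.
        sparse_params = []
        for group_index, group in enumerate(self.param_groups):
            for param_index, p in enumerate(group["params"]):
                if p.is_sparse:
                    sparse_params.append([group_index, param_index])
        if sparse_params:
            raise ValueError(
                f"Sparse params at group/param indices {sparse_params}: "
                "SparseAdam requires dense parameter tensors"
            )
```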
Labels
cla signed, Merged, module: optimizer (Related to torch.optim), open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
Development

Successfully merging this pull request may close these issues.

SparseAdam exhausts generator params before initializing Optimizer
6 participants