
Bug with state initialization #4

Closed
tmabraham opened this issue Mar 31, 2021 · 3 comments · Fixed by #5


@tmabraham

I think there may be a bug with state initialization in the optimizer. Specifically, because the gradients are on the GPU while the optimizer states are initialized on the CPU, an error is raised about tensors being on two different devices. I looked through the code, compared it to other PyTorch optimizers, and noticed a couple of things that could be causing this issue.

Typically, when the states are initialized, torch.zeros_like is called with memory_format=torch.preserve_format so that the new tensor matches the device and format of the input tensor, which is usually a model parameter. In this case, though, the initialization happens in the __init__ function, where the model parameters might not be on the GPU yet. That is why PyTorch optimizers usually put the initialization code inside the step itself, guarded by a check for len(state) == 0, as in the sketch below.
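
For reference, a minimal sketch of that lazy-initialization pattern (illustrative only, not this repository's optimizer; the momentum buffer and update rule are placeholder assumptions):

```python
import torch
from torch.optim import Optimizer


class LazyInitOptimizer(Optimizer):
    """Toy optimizer that defers state creation until step()."""

    def __init__(self, params, lr=1e-2, momentum=0.9):
        # No state tensors are allocated here, so the parameters'
        # device at construction time does not matter.
        super().__init__(params, dict(lr=lr, momentum=momentum))

    @torch.no_grad()
    def step(self, closure=None):
        loss = None
        if closure is not None:
            with torch.enable_grad():
                loss = closure()
        for group in self.param_groups:
            for p in group["params"]:
                if p.grad is None:
                    continue
                state = self.state[p]
                if len(state) == 0:
                    # Lazy init: by the time step() runs, p is on its
                    # final device, and zeros_like inherits that device
                    # and memory format.
                    state["momentum_buffer"] = torch.zeros_like(
                        p, memory_format=torch.preserve_format
                    )
                buf = state["momentum_buffer"]
                buf.mul_(group["momentum"]).add_(p.grad)
                p.add_(buf, alpha=-group["lr"])
        return loss
```

Because nothing is allocated in __init__, the optimizer can be constructed before or after model.cuda() and the state will always land on the same device as the gradients.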

I changed the optimizer code to follow this pattern and it now runs without issue. I should point out that I am using fastai, so it is possible this is a fastai-specific issue, but it seems like it could affect other users as well.

@adefazio
Contributor

I've had another user run into this issue as well. I've created the "inline" branch, which has an implementation that initializes the optimizer state within the step instead. I'm considering merging that branch; it sounds like a good idea.

@adefazio
Contributor

Try this pull request and let me know if it works for you: #5

@tmabraham
Author

@adefazio yes, this version of the optimizer code works for me! thanks!
