The counting of 'NFE-F' does not work in the case of multiple GPUs #53

Closed
HanshuYAN opened this issue May 22, 2019 · 2 comments


HanshuYAN commented May 22, 2019

Hi, I simply modified the code in odenet_mnist.py by adding `model = nn.DataParallel(model, devices)` (a sketch of this change follows below).

Then the code runs on multiple GPUs, but the NFE-F count stays at 0 and never changes. Do you know why?

[screenshot of the training log attached]
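For context, a minimal sketch of the modification described above. The model definition and the `devices` list here are placeholders, not the actual code from odenet_mnist.py:

```python
import torch
import torch.nn as nn

# Stand-in for the network assembled in odenet_mnist.py (the real script
# builds downsampling layers, the ODE block, and a classifier).
model = nn.Sequential(nn.Conv2d(1, 64, 3, 1), nn.ReLU())

devices = [0, 1]  # assumed GPU ids; adjust to the available hardware
model = model.to(f"cuda:{devices[0]}")

# Scatter each input batch across the listed GPUs on every forward pass.
model = nn.DataParallel(model, device_ids=devices)

out = model(torch.randn(8, 1, 28, 28))  # inputs are scattered automatically
```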

rtqichen (Owner) commented May 26, 2019

This is because the replicated models don't support mutation. See the first warning in https://pytorch.org/docs/stable/nn.html?highlight=nn%20dataparallel#dataparallel-layers-multi-gpu-distributed.

Changing this line, https://github.com/rtqichen/torchdiffeq/blob/master/examples/odenet_mnist.py#L102, to `self.register_buffer("nfe", torch.tensor(0.))` will make it work with `nn.DataParallel`, but it'll only count the number of evaluations on GPU 0.
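For illustration, a minimal sketch of that change, assuming a stripped-down `ODEfunc` (the real one in odenet_mnist.py uses norm/ReLU/conv blocks; the single conv here is a placeholder):

```python
import torch
import torch.nn as nn

class ODEfunc(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, 3, padding=1)  # placeholder layer

        # A registered buffer is broadcast to every replica by
        # nn.DataParallel. Only the replica on the source GPU shares
        # storage with the original module, so only its in-place updates
        # survive; that is why the counter reflects GPU 0 only.
        self.register_buffer("nfe", torch.tensor(0.))

    def forward(self, t, x):
        self.nfe += 1  # in-place increment of the buffer tensor
        return self.conv(x)
```

One side effect to be aware of: a registered buffer can't be reset by plain assignment of an int (PyTorch requires a tensor value for a buffer name), so a reset like `odefunc.nfe = 0` in the training loop would need to become an in-place op such as `odefunc.nfe.zero_()`.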

HanshuYAN (Author) commented

Thx~
