The counting of 'NFE-F' does not work in the case of multiple GPUs #53

Closed
HanshuYAN opened this issue May 22, 2019 · 2 comments


HanshuYAN commented May 22, 2019

Hi, I simply modified the code in odenet_mnist.py by adding `model = nn.DataParallel(model, devices)` (a sketch of this change follows below).

Then the code runs on multiple GPUs, but the NFE-F count stays at 0 and never changes. Do you know why?

[screenshot of the training log attached]
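For context, a minimal sketch of the modification described above. The model definition and the `devices` list here are placeholders, not the actual code from odenet_mnist.py:

```python
import torch
import torch.nn as nn

# Stand-in for the network assembled in odenet_mnist.py (the real script
# builds downsampling layers, the ODE block, and a classifier).
model = nn.Sequential(nn.Conv2d(1, 64, 3, 1), nn.ReLU())

devices = [0, 1]  # assumed GPU ids; adjust to the available hardware
model = model.to(f"cuda:{devices[0]}")

# Scatter each input batch across the listed GPUs on every forward pass.
model = nn.DataParallel(model, device_ids=devices)

out = model(torch.randn(8, 1, 28, 28))  # inputs are scattered automatically
```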

rtqichen (Owner) commented May 26, 2019

This is because the replicated models don't support mutation. See the first warning in https://pytorch.org/docs/stable/nn.html?highlight=nn%20dataparallel#dataparallel-layers-multi-gpu-distributed.

Changing this line, https://github.com/rtqichen/torchdiffeq/blob/master/examples/odenet_mnist.py#L102, to `self.register_buffer("nfe", torch.tensor(0.))` will make it work with `nn.DataParallel`, but it'll only count the number of evaluations on GPU 0.
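For illustration, a minimal sketch of that change, assuming a stripped-down `ODEfunc` (the real one in odenet_mnist.py uses norm/ReLU/conv blocks; the single conv here is a placeholder):

```python
import torch
import torch.nn as nn

class ODEfunc(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.conv = nn.Conv2d(dim, dim, 3, padding=1)  # placeholder layer

        # A registered buffer is broadcast to every replica by
        # nn.DataParallel. Only the replica on the source GPU shares
        # storage with the original module, so only its in-place updates
        # survive; that is why the counter reflects GPU 0 only.
        self.register_buffer("nfe", torch.tensor(0.))

    def forward(self, t, x):
        self.nfe += 1  # in-place increment of the buffer tensor
        return self.conv(x)
```

One side effect to be aware of: a registered buffer can't be reset by plain assignment of an int (PyTorch requires a tensor value for a buffer name), so a reset like `odefunc.nfe = 0` in the training loop would need to become an in-place op such as `odefunc.nfe.zero_()`.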

HanshuYAN (Author) commented

Thx~
