Fix bug of device and dtype of WgtScaleBatchNorm.std #116

Merged
merged 2 commits into lava-nc:main on Oct 24, 2022

Conversation

Contributor

@fangwei123456 fangwei123456 commented Oct 15, 2022

Issue Number: 115

Objective of pull request: fix the device and dtype of the tensor returned by WgtScaleBatchNorm.std so that it works on CUDA devices.

Pull request checklist

Your PR fulfills the following requirements:

Pull request type

Please check your PR type:

  • [x] Bugfix
  • [ ] Feature
  • [ ] Code style update (formatting, renaming)
  • [ ] Refactoring (no functional changes, no api changes)
  • [ ] Build related changes
  • [ ] Documentation changes
  • [ ] Other (please describe):

What is the current behavior?

Run the following code:

from lava.lib.dl import slayer
import torch

# CUBA neuron configured to use WgtScaleBatchNorm as its normalization
net = slayer.neuron.cuba.Neuron(
    threshold=1.,
    current_decay=1.,
    voltage_decay=0.,
    scale=1 << 6,
    norm=slayer.neuron.norm.WgtScaleBatchNorm
)
device = 'cuda:0'
net.to(device)
with torch.no_grad():
    x = torch.rand([4, 4, 4], device=device)
    net(x)

We get the following error:

Traceback (most recent call last):
  File "/home/wfang/spikingjelly_dev/spikingjelly/test4.py", line 15, in <module>
    net(x)
  File "/home/wfang/anaconda3/envs/lava-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wfang/anaconda3/envs/lava-env/lib/python3.10/site-packages/lava/lib/dl/slayer/neuron/cuba.py", line 439, in forward
    _, voltage = self.dynamics(input)
  File "/home/wfang/anaconda3/envs/lava-env/lib/python3.10/site-packages/lava/lib/dl/slayer/neuron/cuba.py", line 365, in dynamics
    current = self.norm(current)
  File "/home/wfang/anaconda3/envs/lava-env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/wfang/anaconda3/envs/lava-env/lib/python3.10/site-packages/lava/lib/dl/slayer/neuron/norm.py", line 209, in forward
    std = self.std(var)
  File "/home/wfang/anaconda3/envs/lava-env/lib/python3.10/site-packages/lava/lib/dl/slayer/neuron/norm.py", line 170, in std
    return torch.ones(1) << torch.ceil(torch.log2(std)).clamp(
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

What is the new behavior?

We can run the code without any error.
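
For example, continuing from the reproducer above (net, device, and x are the objects defined there), a quick sanity check could be:

with torch.no_grad():
    x = torch.rand([4, 4, 4], device=device)
    y = net(x)
    # With the fix, the forward pass completes and the output stays on the GPU.
    print(y.shape, y.device)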

Does this introduce a breaking change?

  • [ ] Yes
  • [x] No

Supplemental information

The error RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! can be fixed by changing lava/lib/dl/slayer/neuron/norm.py, line 170 (in WgtScaleBatchNorm.std) from

return torch.ones(1) << torch.ceil(torch.log2(std)).clamp(

to

return torch.ones(1, device=std.device) << torch.ceil(torch.log2(std)).clamp(

However, this raises a new error:

RuntimeError: "lshift_cuda" not implemented for 'Float'

We can solve this error by casting both torch.ones(1, device=std.device) and torch.ceil(torch.log2(std)).clamp( ... to torch.int.
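
For illustration, a minimal sketch of that integer workaround (the clamp bounds 0 and 15 and the name std_pow2 are placeholders, not the values used in lava-dl):

# Hypothetical integer-shift variant: both operands of << must be integer
# tensors on the same device, and the result has to be cast back to float
# before it is used in the later division.
exponent = torch.ceil(torch.log2(std)).clamp(0, 15).to(torch.int)
std_pow2 = (torch.ones(1, dtype=torch.int, device=std.device) << exponent).to(std.dtype)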

However, considering that the returned std is later used in a float computation ( ... / std.view(1, -1)), I think computing the value directly in float is better than using << with torch.int.
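
As an illustration of that approach, here is a minimal sketch of a device-safe, float-only version of the computation (the eps term, the clamp bounds, and the function name are placeholders and may differ from the actual lava-dl implementation):

import torch

def power_of_two_std(var: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    # Standard deviation from the batch variance.
    std = torch.sqrt(var + eps)
    # Round the exponent up to the next integer and clamp it (placeholder bounds).
    exponent = torch.ceil(torch.log2(std)).clamp(0, 15)
    # 2 ** exponent computed in float keeps the result on var.device with a
    # float dtype, so no integer left shift is needed.
    return torch.pow(2.0, exponent)

The later float division by std.view(1, -1) then works without any device or dtype conversion.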

Contributor

@bamsumit bamsumit left a comment


Thanks @fangwei123456 for identifying the issue and fixing the problem.

@bamsumit bamsumit linked an issue Oct 17, 2022 that may be closed by this pull request
@bamsumit bamsumit merged commit b0e2866 into lava-nc:main Oct 24, 2022
@tim-shea tim-shea added this to the Release v0.3.1 milestone Oct 25, 2022
Development

Successfully merging this pull request may close these issues.

Bug of device and dtype of WgtScaleBatchNorm.std