Unet3+ with resnet34 (and higher) crash #46
Comments
Right, ResNet50 and up have a different internal representation of their feature maps... Sorry for missing that in the first implementation. Should be an easy fix, will look into it 👍
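For context, the difference comes down to the block type: ResNet18/34 are built from BasicBlock (expansion = 1), while ResNet50 and up use Bottleneck (expansion = 4), so every encoder stage emits four times as many channels. A quick illustration with torchvision (not the repo's code):

```python
from torchvision.models.resnet import BasicBlock, Bottleneck

# All torchvision ResNets share the same per-stage base widths; the block's
# `expansion` class attribute scales the actual output channels of each stage.
base_widths = [64, 128, 256, 512]

# ResNet18/34 (BasicBlock): stages emit 64, 128, 256, 512 channels.
print([w * BasicBlock.expansion for w in base_widths])  # [64, 128, 256, 512]

# ResNet50/101/152 (Bottleneck): stages emit 256, 512, 1024, 2048 channels.
print([w * Bottleneck.expansion for w in base_widths])  # [256, 512, 1024, 2048]
```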
I have investigated potential error sources:
Interesting, could be related to the multi-GPU then. Could you try what happens when you remove this line: https://github.com/initze/thaw-slump-segmentation/blob/master/train.py#L75?
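Presumably that line is the multi-GPU wrap (the traceback below goes through torch/nn/parallel/data_parallel.py). A hypothetical sketch of the experiment being suggested, with build_model standing in for the repo's actual model setup:

```python
import torch
import torch.nn as nn

model = build_model()  # hypothetical factory, not the repo's actual API

# Suspected train.py#L75: replicate the model across GPUs. Removing the
# wrap forces a single-replica run, which tells us whether DataParallel
# itself is involved in the crash.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)
model = model.to("cuda" if torch.cuda.is_available() else "cpu")
```

Note that nn.DataParallel only splits the batch dimension across replicas, so a channel-count mismatch would survive the removal.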
Unet3+ with a backbone larger than resnet34 (e.g. resnet50, resnet101, ...) causes the following error; resnet18 and resnet34 work OK.
version 0.8.0
```
File "train.py", line 153, in run
self.train_epoch(data_loader)
File "train.py", line 203, in train_epoch
y_hat = self.model(img)
File "/home/pd/initze/anaconda3/envs/aicore/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/pd/initze/anaconda3/envs/aicore/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
outputs = self.parallel_apply(replicas, inputs, kwargs)
File "/home/pd/initze/anaconda3/envs/aicore/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
File "/home/pd/initze/anaconda3/envs/aicore/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
output.reraise()
File "/home/pd/initze/anaconda3/envs/aicore/lib/python3.7/site-packages/torch/_utils.py", line 428, in reraise
raise self.exc_type(msg)
RuntimeError: Caught RuntimeError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/pd/initze/anaconda3/envs/aicore/lib/python3.7/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
output = module(*input, **kwargs)
File "/home/pd/initze/anaconda3/envs/aicore/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/isipd/projects/p_aicore_pf/initze/code/training/lib/models/unet3p/unet3p.py", line 208, in forward
h2_PT_hd4 = self.h2_PT_hd4_relu(self.h2_PT_hd4_bn(self.h2_PT_hd4_conv(self.h2_PT_hd4(h2))))
File "/home/pd/initze/anaconda3/envs/aicore/lib/python3.7/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/pd/initze/anaconda3/envs/aicore/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 423, in forward
return self._conv_forward(input, self.weight)
File "/home/pd/initze/anaconda3/envs/aicore/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 420, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [64, 64, 3, 3], expected input[2, 256, 64, 64] to have 64 channels, but got 256 channels instead
```
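For what it's worth, the failing op is the h2_PT_hd4 skip branch (maxpool → conv → bn → relu): its 3×3 conv has weight [64, 64, 3, 3], i.e. it was built for a 64-channel h2, while ResNet50's corresponding stage emits 256 channels (hence input[2, 256, 64, 64]). A minimal sketch of a possible fix, deriving the branch's input width from the chosen backbone instead of hard-coding it; all names and stage widths here are illustrative, not the repo's actual code:

```python
import torch
import torch.nn as nn

# Illustrative encoder stage widths (h1..h5): Bottleneck backbones
# (resnet50 and up) multiply the BasicBlock widths by expansion = 4.
STAGE_CHANNELS = {
    "resnet18": [64, 64, 128, 256, 512],
    "resnet34": [64, 64, 128, 256, 512],
    "resnet50": [64, 256, 512, 1024, 2048],
}

def make_h2_pt_hd4(backbone: str, cat_channels: int = 64) -> nn.Sequential:
    """Rebuild of the failing h2 -> hd4 branch: downsample h2 by 4,
    then project it to cat_channels."""
    in_ch = STAGE_CHANNELS[backbone][1]  # channels of encoder stage h2
    return nn.Sequential(
        nn.MaxPool2d(4, stride=4, ceil_mode=True),
        nn.Conv2d(in_ch, cat_channels, kernel_size=3, padding=1),
        nn.BatchNorm2d(cat_channels),
        nn.ReLU(inplace=True),
    )

# With the resnet50 widths the branch accepts the 256-channel h2 that
# crashed above, and still produces cat_channels feature maps.
h2 = torch.randn(2, 256, 256, 256)  # (N, C, H, W) of stage h2
print(make_h2_pt_hd4("resnet50")(h2).shape)  # torch.Size([2, 64, 64, 64])
```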