[Error] n_gpu model encountered errors #24

wujunnan0929 · 2021-09-29T10:18:19Z

When I use n_gpu=1, everything works.
When I use n_gpu=4 for training, the run fails with the error below:

Traceback (most recent call last):
  File "train.py", line 110, in <module>
    main(config)
  File "train.py", line 75, in main
    trainer.train()
  File "/home/xxx/workspace/Oracle/RIDE_IR/base/base_trainer.py", line 76, in train
    result = self._train_epoch(epoch)
  File "/home/xxx/workspace/Oracle/RIDE_IR/trainer/trainer.py", line 133, in _train_epoch
    "logits": self.real_model.backbone.logits
  File "/home/xxx/.conda/envs/pytorch_jhon/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1131, in __getattr__
    type(self).__name__, name))
AttributeError: 'ResNet_s' object has no attribute 'logits'
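
One possible reading of this traceback, assuming the trainer wraps the model in torch.nn.DataParallel when n_gpu > 1 (the thread does not confirm this): DataParallel runs forward on per-device replicas of the module, so an attribute assigned to self inside forward ends up on a replica and is not visible on the original module. The sketch below only mimics that behavior in plain Python. `Backbone` and `run_on_replica` are hypothetical stand-ins, not code from this repository or from PyTorch.

```python
import copy


class Backbone:
    """Hypothetical stand-in for a module that stores outputs on self in forward."""

    def forward(self, x):
        # Side effect: the attribute is created during the forward pass.
        self.logits = [x * 2]
        return self.logits[0]


def run_on_replica(module, x):
    """Mimic a data-parallel wrapper: forward runs on a shallow copy of the module."""
    replica = copy.copy(module)
    return replica.forward(x)


# Direct call (comparable to n_gpu=1): the attribute lands on the object itself.
single = Backbone()
single.forward(3)
print(hasattr(single, "logits"))

# Call through a replica (comparable to n_gpu>1): the original never gets the attribute.
wrapped = Backbone()
run_on_replica(wrapped, 3)
print(hasattr(wrapped, "logits"))
```

If this is what happens here, returning the logits from forward (instead of reading them from the module afterwards) would avoid depending on the side effect.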

To work around this, I added "self.logits = []" to the __init__ of ResNet_s in ride_resenet_cifa.py, but then a different error occurs:

Traceback (most recent call last):
  File "train.py", line 110, in <module>
    main(config)
  File "train.py", line 75, in main
    trainer.train()
  File "/home/xxx/workspace/Oracle/RIDE_IR/base/base_trainer.py", line 76, in train
    result = self._train_epoch(epoch)
  File "/home/xxx/workspace/Oracle/RIDE_IR/trainer/trainer.py", line 148, in _train_epoch
    loss.backward()
AttributeError: 'int' object has no attribute 'backward'
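
A plausible explanation for this second error, assuming the trainer starts from `loss = 0` and adds one term per entry in `logits` (an assumption, since the trainer code is not shown in this thread): with `self.logits = []` the loop body never runs, so `loss` stays a plain Python int, which has no `backward` method. The sketch below reproduces the pattern without PyTorch. `FakeLoss` and `accumulate_loss` are hypothetical names used only for illustration.

```python
class FakeLoss:
    """Hypothetical stand-in for a tensor-like loss that supports backward()."""

    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        other_value = other.value if isinstance(other, FakeLoss) else other
        return FakeLoss(self.value + other_value)

    def __radd__(self, other):
        # Lets `0 + FakeLoss(...)` produce a FakeLoss, like `0 + tensor` gives a tensor.
        return FakeLoss(other + self.value)

    def backward(self):
        return "backward called"


def accumulate_loss(per_expert_losses):
    """Sum loss terms starting from the int 0, as assumed for the trainer loop."""
    loss = 0
    for term in per_expert_losses:
        loss += term
    return loss


ok = accumulate_loss([FakeLoss(1.0), FakeLoss(2.0)])
print(type(ok).__name__, hasattr(ok, "backward"))

empty = accumulate_loss([])
print(type(empty).__name__, hasattr(empty, "backward"))
```

Under this assumption, the empty-list default only hides the missing attribute; it does not supply the logits the loss needs.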

Have you ever encountered errors like this, and could you offer any help? Thanks so much.
The versions of my torch-related conda packages are listed below:
ffmpeg        4.3      hf484d3e_0                    pytorch
pytorch       1.9.0    py3.7_cuda10.2_cudnn7.6.5_0   pytorch
torchaudio    0.9.0    py37                          pytorch
torchvision   0.10.0   py37_cu102                    pytorch
python        3.7.11   h12debd9_0                    defaults

frank-xwang · 2021-09-30T17:00:30Z

We have never encountered this error before. Can you try running with PyTorch 1.7? That is the version we used for our experiments.

frank-xwang · 2021-10-05T15:40:02Z

We are closing this issue. Please feel free to reopen it if you are still seeing this error.

frank-xwang closed this as completed Oct 5, 2021
