[Error] n_gpu model encountered errors #24

Closed
wujunnan0929 opened this issue Sep 29, 2021 · 2 comments

wujunnan0929 commented Sep 29, 2021

With n_gpu=1, everything works.
With n_gpu=4, training fails with the error below:

Traceback (most recent call last):
  File "train.py", line 110, in <module>
    main(config)
  File "train.py", line 75, in main
    trainer.train()
  File "/home/xxx/workspace/Oracle/RIDE_IR/base/base_trainer.py", line 76, in train
    result = self._train_epoch(epoch)
  File "/home/xxx/workspace/Oracle/RIDE_IR/trainer/trainer.py", line 133, in _train_epoch
    "logits": self.real_model.backbone.logits
  File "/home/xxx/.conda/envs/pytorch_jhon/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1131, in __getattr__
    type(self).__name__, name))
AttributeError: 'ResNet_s' object has no attribute 'logits'
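
A possible explanation, not confirmed anywhere in this thread: if the trainer wraps the model in torch.nn.DataParallel when n_gpu > 1, and the backbone creates self.logits as a side effect of forward(), then the attribute is only ever set on the per-GPU replicas and never on the original module that the trainer reads from. The sketch below is a torch-free toy that imitates this replicate-then-forward pattern; `Backbone` and `ToyDataParallel` are hypothetical names made up for illustration and are not the repository's classes.

```python
import copy


class Backbone:
    """Toy stand-in for a backbone that records logits during forward()."""

    def forward(self, x):
        # Side effect: the attribute is created on whichever object runs forward().
        self.logits = [x * 2]
        return self.logits[0]


class ToyDataParallel:
    """Hypothetical, torch-free imitation of replicate-then-forward."""

    def __init__(self, module, n_replicas):
        self.module = module
        self.n_replicas = n_replicas

    def __call__(self, x):
        # Each call works on fresh copies; the original never runs forward().
        replicas = [copy.copy(self.module) for _ in range(self.n_replicas)]
        return [replica.forward(x) for replica in replicas]


backbone = Backbone()
outputs = ToyDataParallel(backbone, n_replicas=4)(3)

# The replicas produced outputs, but the original object gained no attribute.
original_has_logits = hasattr(backbone, "logits")
print(outputs, original_has_logits)
```

If this is what is happening, returning the per-expert logits from forward() instead of stashing them on the module would avoid the dependence on replica state.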

To work around this, I added "self.logits = []" to the ResNet_s init in ride_resenet_cifa.py, but then a different error occurs:

Traceback (most recent call last):
  File "train.py", line 110, in <module>
    main(config)
  File "train.py", line 75, in main
    trainer.train()
  File "/home/xxx/workspace/Oracle/RIDE_IR/base/base_trainer.py", line 76, in train
    result = self._train_epoch(epoch)
  File "/home/xxx/workspace/Oracle/RIDE_IR/trainer/trainer.py", line 148, in _train_epoch
    loss.backward()
AttributeError: 'int' object has no attribute 'backward'
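
This second error would be consistent with a criterion that starts from the integer 0 and adds one loss term per entry in the logits list: with the "self.logits = []" workaround the list may stay empty, so nothing is added and a plain int is returned. That accumulation pattern is an assumption about the loss code, not something shown in this thread. The sketch below reproduces the effect without torch; `ToyTensor` and `accumulate_loss` are hypothetical names used only for illustration.

```python
class ToyTensor:
    """Hypothetical stand-in for a tensor: supports + and backward()."""

    def __init__(self, value):
        self.value = value

    def __radd__(self, other):
        # Lets `0 + ToyTensor` produce a ToyTensor, mirroring `0 + tensor`.
        return ToyTensor(other + self.value)

    def __add__(self, other):
        return ToyTensor(self.value + other.value)

    def backward(self):
        return "backward ok"


def accumulate_loss(per_expert_losses):
    loss = 0  # stays a plain int unless at least one term is added
    for term in per_expert_losses:
        loss += term
    return loss


empty = accumulate_loss([])  # no terms: the int 0 comes back unchanged
filled = accumulate_loss([ToyTensor(1.5), ToyTensor(2.0)])

print(type(empty).__name__, type(filled).__name__)
```

Calling empty.backward() here raises the same kind of AttributeError as in the traceback, while filled.backward() succeeds.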

Have you encountered errors like this before, and could you offer any help? Thanks so much.
My torch-related conda package versions are listed below:
ffmpeg       4.3     hf484d3e_0                   pytorch
pytorch      1.9.0   py3.7_cuda10.2_cudnn7.6.5_0  pytorch
torchaudio   0.9.0   py37                         pytorch
torchvision  0.10.0  py37_cu102                   pytorch
python       3.7.11  h12debd9_0                   defaults

@frank-xwang
Owner

We have never encountered this error before. Can you try running with PyTorch 1.7? That is the version we used for our experiments.

@frank-xwang
Owner

We are closing this issue. Please feel free to reopen it if you are still seeing this error.
