
RuntimeError: dimension specified as 0 but tensor has no dimensions #42

Open
hahakid opened this issue Jun 28, 2018 · 8 comments

@hahakid

hahakid commented Jun 28, 2018

I tried the newest code (updated 6.28), and test_1024p.sh still hits the out-of-memory problem.
train_512p.sh works fine on a single GPU, but when using multiple GPUs I always get:

Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f95756ddf90>> ignored
Traceback (most recent call last):
File "train.py", line 61, in <module>
Variable(data['image']), Variable(data['feat']), infer=save_fake)
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 115, in forward
return self.gather(outputs, self.output_device)
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 127, in gather
return gather(outputs, output_device, dim=self.dim)
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
return gather_map(outputs)
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
return Gather.apply(target_device, dim, *outputs)
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in forward
ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in
ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
RuntimeError: dimension specified as 0 but tensor has no dimensions

I also tried changing the GPUs with --gpu_ids=1,2 or 1,2,3; the same error occurred.

When using train_1024p.sh, I get:
Traceback (most recent call last):
File "train.py", line 38, in
model = create_model(opt)
File "/media/f214/workspace/gan/pix2pixHD/models/models.py", line 15, in create_model
model.initialize(opt)
File "/media/f214/workspace/gan/pix2pixHD/models/pix2pixHD_model.py", line 60, in initialize
self.load_network(self.netG, 'G', opt.which_epoch, pretrained_path)
File "/media/f214/workspace/gan/pix2pixHD/models/base_model.py", line 60, in load_network
raise('Generator must exist!')
TypeError: exceptions must be old-style classes or derived from BaseException, not str
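
The TypeError here is just Python 2 rejecting raise('Generator must exist!'), which tries to raise a plain string rather than an exception; the condition it masks is that no pretrained generator checkpoint was found. A minimal sketch of the corrected line in models/base_model.py, assuming the surrounding file-existence check in load_network:

```python
# models/base_model.py, load_network() -- sketch of the fix
if not os.path.isfile(save_path):
    # raising a bare string is invalid; raise a real exception class instead
    raise RuntimeError('Generator must exist!')
```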

I tried the code on two servers, one with 4× 1080 Ti GPUs and one with 3× Titan X GPUs.
TensorRT 4.0
conda environment
CUDA 9.0 and cuDNN 7.1.3

@ghost

ghost commented Jun 28, 2018

I have the same issue.
My code works on a single GPU (EC2 p2.xlarge instance), but I get a similar error running on multiple GPUs (EC2 p2.8xlarge).

I launch the train command as:
python train.py --name xxxx --dataroot ./datasets/xxxx/ --resize_or_crop none --loadSize 512 --fineSize 512 --label_nc 0 --no_instance --no_flip --verbose --batchSize 8 --gpu_ids 0,1,2,3,4,5,6,7

The model is created but then I get this error:

Traceback (most recent call last):
File "train.py", line 61, in <module>
Variable(data['image']), Variable(data['feat']), infer=save_fake)
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 115, in forward
return self.gather(outputs, self.output_device)
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 127, in gather
return gather(outputs, output_device, dim=self.dim)
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
return gather_map(outputs)
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
return Gather.apply(target_device, dim, *outputs)
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 54, in forward
ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 54, in <lambda>
ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
RuntimeError: dimension specified as 0 but tensor has no dimensions

PyTorch version: 0.4.0

Maybe this is related...
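
For context, the failure is reproducible outside the repo: in PyTorch 0.4.0, reductions return 0-dim tensors, and DataParallel's gather calls .size(dim) on every tensor each replica returns, which is exactly what fails here. A minimal standalone sketch:

```python
import torch

loss = torch.tensor(1.0)          # 0-dim scalar, like a reduced loss in PyTorch 0.4
print(loss.dim())                 # 0
# gather effectively does loss.size(0), which raises:
#   RuntimeError: dimension specified as 0 but tensor has no dimensions
print(loss.unsqueeze(0).size(0))  # 1 -- a 1-element tensor gathers fine along dim 0
```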

@cientgu

cientgu commented Aug 2, 2018

@ouyangkid I have the same issue; did you find out how to fix it? I think maybe we need to rewrite the multi-GPU code.

@tcwang0509
Contributor

This is because the new PyTorch version does not accept scalars as losses. Just add something like
loss_list = [loss.unsqueeze(0) for loss in loss_list] before the model returns, and it should work.
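
A minimal sketch of where that line goes, assuming forward() collects its scalar losses into a list before returning (the variable names here are illustrative, not the repo's exact code):

```python
# end of the model's forward() -- sketch
loss_list = [loss_G, loss_D]                           # 0-dim scalars in PyTorch >= 0.4
loss_list = [loss.unsqueeze(0) for loss in loss_list]  # each becomes shape (1,)
return loss_list                                       # gather can now concatenate along dim 0
```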

@cientgu

cientgu commented Aug 3, 2018 via email

@cientgu

cientgu commented Aug 3, 2018

I found a simple solution to fix it: in pix2pixHD_model.py, reshape the five losses in the forward function, like loss_G_GAN = loss_G_GAN.reshape(1).
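
In context, that change at the end of the forward function would look like this (loss_G_GAN is from the comment above; the other four names are assumed to be the repo's remaining losses, so verify them against pix2pixHD_model.py):

```python
# pix2pixHD_model.py, forward() -- sketch: reshape each 0-dim loss to shape (1,)
loss_G_GAN = loss_G_GAN.reshape(1)
loss_G_GAN_Feat = loss_G_GAN_Feat.reshape(1)
loss_G_VGG = loss_G_VGG.reshape(1)
loss_D_real = loss_D_real.reshape(1)
loss_D_fake = loss_D_fake.reshape(1)
```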

@hahakid
Author

hahakid commented Aug 3, 2018

@cientgu Great work, I will try your solution when I finish some of my work.
And what's your PyTorch version?

@cientgu

cientgu commented Aug 3, 2018

@ouyangkid 0.4.0

@lkkchung

lkkchung commented Dec 9, 2018

The easiest fix for me was to roll back PyTorch: conda install pytorch=0.3.1 did the trick for me. (0.3.x predates 0-dim tensors, so reduced losses keep shape (1,) and gather works.)

SikkeyHuang added a commit to SikkeyHuang/pix2pixHD that referenced this issue Mar 13, 2019