
RuntimeError: dimension specified as 0 but tensor has no dimensions #42

Open
hahakid opened this issue Jun 28, 2018 · 8 comments

@hahakid

hahakid commented Jun 28, 2018

I tried the newest code (updated 6.28), and test_1024p.sh still hits the out-of-memory problem.
train_512p.sh works fine on a single GPU, but when using multiple GPUs I always get:

Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f95756ddf90>> ignored
Traceback (most recent call last):
File "train.py", line 61, in <module>
Variable(data['image']), Variable(data['feat']), infer=save_fake)
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 115, in forward
return self.gather(outputs, self.output_device)
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 127, in gather
return gather(outputs, output_device, dim=self.dim)
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
return gather_map(outputs)
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
return Gather.apply(target_device, dim, *outputs)
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in forward
ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in
ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
RuntimeError: dimension specified as 0 but tensor has no dimensions

I also tried changing the GPUs with --gpu_ids=1,2 or 1,2,3; the same error occurred.

When using train_1024p.sh, I get:
Traceback (most recent call last):
File "train.py", line 38, in
model = create_model(opt)
File "/media/f214/workspace/gan/pix2pixHD/models/models.py", line 15, in create_model
model.initialize(opt)
File "/media/f214/workspace/gan/pix2pixHD/models/pix2pixHD_model.py", line 60, in initialize
self.load_network(self.netG, 'G', opt.which_epoch, pretrained_path)
File "/media/f214/workspace/gan/pix2pixHD/models/base_model.py", line 60, in load_network
raise('Generator must exist!')
TypeError: exceptions must be old-style classes or derived from BaseException, not str
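
The TypeError here is just Python 2 rejecting raise('Generator must exist!'), which tries to raise a plain string rather than an exception; the condition it masks is that no pretrained generator checkpoint was found. A minimal sketch of the corrected line in models/base_model.py, assuming the surrounding file-existence check in load_network:

```python
# models/base_model.py, load_network() -- sketch of the fix
if not os.path.isfile(save_path):
    # raising a bare string is invalid; raise a real exception class instead
    raise RuntimeError('Generator must exist!')
```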

I tried the code on two servers, one with 4× 1080 Ti GPUs and one with 3× Titan X GPUs.
TensorRT 4.0
conda environment
CUDA 9.0 and cuDNN 7.1.3

@ghost

ghost commented Jun 28, 2018

I have the same issue.
My code works on a single GPU (EC2 p2.xlarge instance), but I get a similar error running on multiple GPUs (EC2 p2.8xlarge).

I launch the train command as:
python train.py --name xxxx --dataroot ./datasets/xxxx/ --resize_or_crop none --loadSize 512 --fineSize 512 --label_nc 0 --no_instance --no_flip --verbose --batchSize 8 --gpu_ids 0,1,2,3,4,5,6,7

The model is created but then I get this error:

Traceback (most recent call last):
File "train.py", line 61, in <module>
Variable(data['image']), Variable(data['feat']), infer=save_fake)
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/modules/module.py", line 491, in __call__
result = self.forward(*input, **kwargs)
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 115, in forward
return self.gather(outputs, self.output_device)
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 127, in gather
return gather(outputs, output_device, dim=self.dim)
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
return gather_map(outputs)
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
return type(out)(map(gather_map, zip(*outputs)))
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
return Gather.apply(target_device, dim, *outputs)
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 54, in forward
ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
File "/home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages/torch/nn/parallel/_functions.py", line 54, in <lambda>
ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
RuntimeError: dimension specified as 0 but tensor has no dimensions

PyTorch version: 0.4.0

Maybe this is related...
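
For context, the failure is reproducible outside the repo: in PyTorch 0.4.0, reductions return 0-dim tensors, and DataParallel's gather calls .size(dim) on every tensor each replica returns, which is exactly what fails here. A minimal standalone sketch:

```python
import torch

loss = torch.tensor(1.0)          # 0-dim scalar, like a reduced loss in PyTorch 0.4
print(loss.dim())                 # 0
# gather effectively does loss.size(0), which raises:
#   RuntimeError: dimension specified as 0 but tensor has no dimensions
print(loss.unsqueeze(0).size(0))  # 1 -- a 1-element tensor gathers fine along dim 0
```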

@cientgu

cientgu commented Aug 2, 2018

@ouyangkid I have the same issue; did you find out how to fix it? I think maybe we need to rewrite the multi-GPU code.

@tcwang0509
Contributor

This is because the new PyTorch version does not accept scalars as losses. Just add something like
loss_list = [loss.unsqueeze(0) for loss in loss_list] before the model returns, and it should work.
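
A minimal sketch of where that line goes, assuming forward() collects its scalar losses into a list before returning (the variable names here are illustrative, not the repo's exact code):

```python
# end of the model's forward() -- sketch
loss_list = [loss_G, loss_D]                           # 0-dim scalars in PyTorch >= 0.4
loss_list = [loss.unsqueeze(0) for loss in loss_list]  # each becomes shape (1,)
return loss_list                                       # gather can now concatenate along dim 0
```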

@cientgu

cientgu commented Aug 3, 2018 via email

@cientgu

cientgu commented Aug 3, 2018

I found a simple solution to fix it: in pix2pixHD_model.py, reshape the five losses in the forward function, like loss_G_GAN = loss_G_GAN.reshape(1).
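
In context, that change at the end of the forward function would look like this (loss_G_GAN is from the comment above; the other four names are assumed to be the repo's remaining losses, so verify them against pix2pixHD_model.py):

```python
# pix2pixHD_model.py, forward() -- sketch: reshape each 0-dim loss to shape (1,)
loss_G_GAN = loss_G_GAN.reshape(1)
loss_G_GAN_Feat = loss_G_GAN_Feat.reshape(1)
loss_G_VGG = loss_G_VGG.reshape(1)
loss_D_real = loss_D_real.reshape(1)
loss_D_fake = loss_D_fake.reshape(1)
```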

@hahakid
Author

hahakid commented Aug 3, 2018

@cientgu Great work, I will try your solution when I finish some of my work.
And what's your PyTorch version?

@cientgu

cientgu commented Aug 3, 2018

@ouyangkid 0.4.0

@lkkchung

lkkchung commented Dec 9, 2018

The easiest fix for me was to roll back PyTorch: conda install pytorch=0.3.1 did the trick for me. (0.3.x predates 0-dim tensors, so reduced losses keep shape (1,) and gather works.)

SikkeyHuang added a commit to SikkeyHuang/pix2pixHD that referenced this issue Mar 13, 2019