RuntimeError: dimension specified as 0 but tensor has no dimensions #42
Comments
I have the same issue. I launch the train command as: The model is created but then I get this error:
PyTorch version: 0.4.0. Maybe this is related... |
@ouyangkid I have the same issue. Did you find out how to fix it? I think maybe we should rewrite the multi-GPU code. |
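This is because newer PyTorch versions do not accept scalars as losses. Just add something like loss_list = [loss.unsqueeze(0) for loss in loss_list] before the model returns and it should work. |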
I rewrote the torch/nn/parallel/scatter_gather.py code and it works. Thanks for your reply.
On Friday, August 3, 2018, Ting-Chun Wang wrote:
This is because newer PyTorch versions do not accept scalars as losses. Just add something like
loss_list = [loss.unsqueeze(0) for loss in loss_list] before the model returns and it should work.
|
I found a simple fix: in pix2pixHD_model.py, reshape each of the five losses in the forward function, for example: loss_G_GAN = loss_G_GAN.reshape(1) |
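The two fixes suggested above (unsqueeze(0) and reshape(1)) do the same thing to a scalar loss: they turn a 0-dimensional tensor into a tensor of shape (1,), which can then be concatenated along dimension 0. Below is a minimal sketch of that idea. The helper name as_gatherable and the loss values are made up for illustration, and the torch.cat call only imitates what gathering across GPUs does; this is not the actual pix2pixHD code.

```python
import torch


def as_gatherable(losses):
    """Give every 0-dim (scalar) loss a leading dimension of size 1.

    Hypothetical helper, not part of pix2pixHD.
    """
    return [loss.reshape(1) for loss in losses]


# Stand-ins for the scalar losses computed in the model's forward function.
loss_G_GAN = torch.tensor(0.7)
loss_D_real = torch.tensor(0.2)

# unsqueeze(0) and reshape(1) give the same result for a scalar tensor.
same_result = torch.equal(loss_G_GAN.unsqueeze(0), loss_G_GAN.reshape(1))

fixed = as_gatherable([loss_G_GAN, loss_D_real])

# Imitate gathering the same loss from two GPU replicas along dim 0,
# then reduce back to a single scalar for backpropagation or logging.
gathered = torch.cat([fixed[0], fixed[0]], dim=0)
mean_loss = gathered.mean()
```

After this change each gathered loss has one entry per GPU, so the training script needs to reduce it (for example with a mean) before using it as a scalar.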
@cientgu Great work. I will try your solution once I have finished some other work. |
@ouyangkid 0.4.0 |
Easiest fix for me was to roll back pytorch. |
I tried the newest code (the 6.28 update), and test_1024p.sh still runs out of memory.
train_512p.sh works fine on a single GPU, but when using multiple GPUs I always get:
Exception NameError: "global name 'FileNotFoundError' is not defined" in <bound method _DataLoaderIter.__del__ of <torch.utils.data.dataloader._DataLoaderIter object at 0x7f95756ddf90>> ignored
Traceback (most recent call last):
  File "train.py", line 61, in <module>
    Variable(data['image']), Variable(data['feat']), infer=save_fake)
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/modules/module.py", line 491, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 115, in forward
    return self.gather(outputs, self.output_device)
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/data_parallel.py", line 127, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    return gather_map(outputs)
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/scatter_gather.py", line 55, in gather_map
    return Gather.apply(target_device, dim, *outputs)
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in forward
    ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
  File "/home/f214/anaconda2/lib/python2.7/site-packages/torch/nn/parallel/_functions.py", line 54, in <lambda>
    ctx.input_sizes = tuple(map(lambda i: i.size(ctx.dim), inputs))
RuntimeError: dimension specified as 0 but tensor has no dimensions
I also tried changing the GPUs with --gpu_ids=1,2 or 1,2,3, and the same error occurred.
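The last frame of the traceback above calls i.size(ctx.dim) on every output that is about to be gathered, and that call cannot succeed on a tensor with no dimensions. The sketch below reproduces just that one step outside of DataParallel. The function name can_gather_along is hypothetical, and the exception type is assumed to vary by PyTorch version (the thread shows RuntimeError on 0.4.0), so both likely types are caught.

```python
import torch


def can_gather_along(tensor, dim=0):
    """Return True if the tensor has a size along `dim`.

    Hypothetical helper that mirrors the failing i.size(ctx.dim) call.
    """
    try:
        tensor.size(dim)
        return True
    except (IndexError, RuntimeError):
        # A 0-dim tensor has no dimension 0 to ask about.
        return False


scalar_loss = torch.tensor(1.5)          # 0-dim, like a reduced loss value
vector_loss = scalar_loss.unsqueeze(0)   # shape (1,), one entry per replica

scalar_ok = can_gather_along(scalar_loss)
vector_ok = can_gather_along(vector_loss)
```

Under these assumptions scalar_ok is False and vector_ok is True, which matches the observation that the error only appears when more than one GPU is used and outputs have to be gathered.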
When using train_1024p.sh, I get:
Traceback (most recent call last):
  File "train.py", line 38, in <module>
    model = create_model(opt)
  File "/media/f214/workspace/gan/pix2pixHD/models/models.py", line 15, in create_model
    model.initialize(opt)
  File "/media/f214/workspace/gan/pix2pixHD/models/pix2pixHD_model.py", line 60, in initialize
    self.load_network(self.netG, 'G', opt.which_epoch, pretrained_path)
  File "/media/f214/workspace/gan/pix2pixHD/models/base_model.py", line 60, in load_network
    raise('Generator must exist!')
TypeError: exceptions must be old-style classes or derived from BaseException, not str
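The TypeError above hides the real problem: raise('Generator must exist!') tries to raise a plain string, which Python rejects, so the intended message about a missing generator checkpoint never appears. A minimal sketch of the same check with a real exception class follows. The function name find_checkpoint and the file naming pattern are assumptions for illustration, not the actual base_model.py code.

```python
import os


def find_checkpoint(save_dir, network_label, epoch_label):
    """Return the checkpoint path, or raise a proper exception if missing.

    Hypothetical helper; the '<epoch>_net_<label>.pth' pattern is assumed.
    """
    filename = "%s_net_%s.pth" % (epoch_label, network_label)
    path = os.path.join(save_dir, filename)
    if not os.path.isfile(path):
        # Raise an exception instance, not a bare string, so the message
        # reaches the user instead of triggering a TypeError.
        raise IOError("%s not found; the generator checkpoint must exist" % path)
    return path
```

With a change like this, running train_1024p.sh without a pretrained generator would report the missing file directly, which suggests that the underlying cause here is a checkpoint that could not be found at the expected path.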
I tried the code on both servers, with 4x 1080 Ti and 3x Titan X.
TensorRT 4.0
conda environment
CUDA 9.0 and cuDNN 7.1.3