
skip this point data_size = 1 #21

Closed
rharadgithub opened this issue Apr 25, 2018 · 15 comments

Comments


rharadgithub commented Apr 25, 2018

Dear sir, when I run the script bash ./scripts/train_edges2shoes.sh, the following RuntimeError occurs:
(epoch: 1, iters: 49400, time: 0.311) , z_encoded_mag: 0.577, G_total: 4.293, G_L1_encoded: 2.367, z_L1: 0.259, KL: 0.076, G_GAN: 1.002, D_GAN: 0.498, G_GAN2: 0.589, D_GAN2: 0.988
(epoch: 1, iters: 49600, time: 0.322) , z_encoded_mag: 0.409, G_total: 2.001, G_L1_encoded: 0.385, z_L1: 0.302, KL: 0.069, G_GAN: 0.794, D_GAN: 0.960, G_GAN2: 0.450, D_GAN2: 1.137
(epoch: 1, iters: 49800, time: 0.311) , z_encoded_mag: 0.939, G_total: 3.441, G_L1_encoded: 1.597, z_L1: 0.373, KL: 0.079, G_GAN: 0.833, D_GAN: 0.774, G_GAN2: 0.560, D_GAN2: 1.015
skip this point data_size = 1
Traceback (most recent call last):
File "./train.py", line 28, in <module>
model.update_G()
File "/home/rharad/junyanz/BicycleGAN/models/bicycle_gan_model.py", line 148, in update_G
self.backward_EG()
File "/home/rharad/junyanz/BicycleGAN/models/bicycle_gan_model.py", line 114, in backward_EG
self.loss_G.backward(retain_graph=True)
File "/home/rharad/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 167, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph, retain_variables)
File "/home/rharad/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 99, in backward
variables, grad_variables, retain_graph)
RuntimeError: Trying to backward through the graph a second time, but the buffers have already been freed. Specify retain_graph=True when calling backward the first time.

How can I solve this problem?
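For context, here is a minimal standalone sketch (not code from this repo) that reproduces the same RuntimeError message in PyTorch, by calling backward() twice on one graph:

```python
import torch

# Calling backward() a second time on the same graph without retain_graph=True
# reproduces the error from the traceback above.
x = torch.ones(2, requires_grad=True)
y = (x * 3).sum()
y.backward()          # the first backward pass frees the graph's buffers
try:
    y.backward()      # the second pass fails: the graph is already freed
except RuntimeError as err:
    print('RuntimeError:', err)
```

In this thread the root cause turns out to be different (forward() is skipped, see below), but the sketch shows what the error message itself refers to.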


junyanz commented Apr 27, 2018

Could you report your PyTorch version? The current code only works well with PyTorch 0.1-0.3.


rharadgithub commented Apr 27, 2018

Dear @junyanz,

In [1]: import torch

In [2]: print(torch.__version__)
0.3.1.post2
I tried:
Firstly, torch 0.3.1.post2 with Anaconda3 (Python 3.6.4)
Secondly, torch 0.2.0_1 with Python 2.7.12
Thirdly, torch 0.2.0_0 with Anaconda2 (Python 2.7.14)
Fourthly, torch 0.3.0.post4 with Anaconda3 (Python 3.6.4)

But the problem remains the same as before.

I am eagerly waiting for your reply.
Thank you.

@HelenMao

I am hitting the same problem as @rharadgithub.

@HelenMao

Is the problem the number of samples in the edges2shoes dataset?
The training set of edges2shoes has 49825 images, but batchSize = 2, so the last batch holds a single image. The code in def forward,

self.skip = self.opt.isTrain and self.input_A.size(0) < self.opt.batchSize
if self.skip:
    print('skip this point data_size = %d' % self.input_A.size(0))
    return

returns early on that batch, so forward never runs and the attributes it sets are missing. @rharadgithub

In fact, the lines

self.real_A_random = self.real_A[half_size:]

and

self.real_B_random = self.real_B[half_size:]

are not used at all. I suggest deleting them so that there is no bug when training on edges2shoes.
@junyanz
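To make the arithmetic concrete, a small standalone sketch (not repo code; last_batch_size is a hypothetical helper) of why an odd-sized dataset leaves a size-1 final batch that triggers the skip:

```python
def last_batch_size(n_samples, batch_size):
    """Size of the final batch when the loader keeps the remainder."""
    rem = n_samples % batch_size
    return rem if rem != 0 else batch_size

# edges2shoes: 49825 training images with batchSize = 2
print(last_batch_size(49825, 2))  # -> 1: smaller than batchSize, so forward() skips it
print(last_batch_size(49824, 2))  # -> 2: every batch is full, nothing is skipped
```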


rharadgithub commented May 2, 2018

Dear @HelenMao, yes, I am training on the edges2shoes dataset. Following your suggestion, I deleted L38 and L40 and ran bash ./scripts/train_edges2shoes.sh, but I still get the following error:
Traceback (most recent call last):
File "./train.py", line 27, in <module>
model.update_D(data)
File "/home/hr/BicycleGAN/models/bicycle_gan_model.py", line 117, in update_D
self.forward()
File "/home/hr/BicycleGAN/models/bicycle_gan_model.py", line 45, in forward
self.z_random = self.get_z_random(self.real_A_random.size(0), self.opt.nz, 'gauss')
AttributeError: 'BiCycleGANModel' object has no attribute 'real_A_random'

On the other hand, when I delete only

self.skip = self.opt.isTrain and self.input_A.size(0) < self.opt.batchSize
if self.skip:
    print('skip this point data_size = %d' % self.input_A.size(0))
    return

while keeping L38 and L40, and run bash ./scripts/train_edges2shoes.sh, I get this error instead:
Traceback (most recent call last):
File "./train.py", line 27, in <module>
model.update_D(data)
File "/home/hr/BicycleGAN/models/bicycle_gan_model.py", line 119, in update_D
self.forward()
File "/home/hr/BicycleGAN/models/bicycle_gan_model.py", line 38, in forward
self.real_A_random=self.real_A[half_size:]
File "/home/hr/anaconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 78, in __getitem__
return Index.apply(self, key)
File "/home/hr/anaconda3/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 89, in forward
result = i.index(ctx.index)
ValueError: result of slicing is an empty tensor

Did I misunderstand your suggestion?


HelenMao commented May 2, 2018

You can't delete L38 and L40 directly; the most convenient workaround is to delete one image from the training set, going from 49825 images to 49824.

The number of training images should be even. @rharadgithub
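A less destructive alternative (my own suggestion, not verified against this repo's data loader) is to let the DataLoader drop the incomplete final batch instead of deleting an image:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.randn(5, 3))                  # odd sample count, like 49825
loader = DataLoader(ds, batch_size=2, drop_last=True)  # silently drops the size-1 batch
sizes = [batch[0].size(0) for batch in loader]
print(sizes)  # every remaining batch is full-sized
```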

@rharadgithub

@HelenMao, I tried training on the maps dataset, which has an even number of samples, and it works. But for an odd number of samples the problem (AttributeError: 'BiCycleGANModel' object has no attribute 'real_A_random') remains, even after deleting L38 and L40.


HelenMao commented May 4, 2018

If you delete L38 and L40 directly, the later code that uses these variables is affected; that is why you get the AttributeError. @rharadgithub

I don't see why Junyan wrote the code to include the variable self.real_A_random in the first place. @junyanz


twak commented May 4, 2018

I think I fixed it in pull request #24.


junyanz commented May 4, 2018

Thanks for the bug report. I think the latest commit fixes the issue. self.real_A_random is sometimes used in the conditional-D case; I refactored the code so that self.real_A_random is no longer used for an unconditional D.


HelenMao commented May 5, 2018

@junyanz, thanks for your reply. I am also wondering about the unconditional-D case:

self.real_data_random = self.real_B_random

Why do we need self.real_B_random? It is a different image; why do we need another image when we already use the pair self.real_A_encoded and self.real_B_encoded?
@junyanz


WorkingCC commented May 9, 2018

I have the same question as @HelenMao. By the way, why is the batch size set to two instead of one?
@junyanz

@rharadgithub

Thanks to @junyanz


junyanz commented May 23, 2018

@HelenMao @WorkingCC Sorry for the late reply. This is a minor implementation detail that should not significantly affect the results. We chose to use different images for training cVAE-GAN and training cLR-GAN. That is also the reason for batchSize=2.
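To illustrate the idea, a minimal sketch (my simplification; split_batch is a hypothetical helper, and the repo's actual implementation differs): a batch of size 2 is split in half, one image pair feeding cVAE-GAN and the other feeding cLR-GAN:

```python
import torch

def split_batch(real_A, real_B):
    # First half of the batch trains cVAE-GAN (encoded z);
    # second half trains cLR-GAN (randomly sampled z).
    half = real_A.size(0) // 2
    encoded = (real_A[:half], real_B[:half])
    sampled = (real_A[half:], real_B[half:])
    return encoded, sampled

A = torch.randn(2, 3, 64, 64)   # batchSize = 2
B = torch.randn(2, 3, 64, 64)
encoded, sampled = split_batch(A, B)
print(encoded[0].size(0), sampled[0].size(0))  # one image pair per objective
```

This also shows why a size-1 final batch breaks the scheme: one of the two halves would be empty.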

@WorkingCC

@junyanz Thank you very much for your reply.
