Issues with Loss1 + 0*Loss2 and graph computation 

Hello, 

I ran into some puzzling errors while porting my Lua Torch code to PyTorch. After extensive debugging, I suspect the cause lies somewhere in PyTorch internals.

I am running on the master branch. To reproduce the errors, place the provided code in the examples/vae/ folder and run python xxx.py.

I attached a few modifications of examples/vae/main.py in the zip file [pytorch_debug.zip](https://github.com/pytorch/pytorch/files/1175297/pytorch_debug.zip)

(1) main_conv_new_rep.py: this code runs, but note that I modified the reparametrization part and removed the KLD error term.

(2) main_conv_new_rep_KLD.py: this code DOES NOT run and gives the error message below. The only difference from the previous code is the KLD term (line 144): **I set KLD to be 0.**
```

oat, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [65,0,0], thread: [59,0,0] Assertion `input >= 0. && input <= 1.` failed.
/home/shangw/pytorch/torch/lib/THCUNN/BCECriterion.cu:30: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::tuple<float, float, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [65,0,0], thread: [60,0,0] Assertion `input >= 0. && input <= 1.` failed.
/home/shangw/pytorch/torch/lib/THCUNN/BCECriterion.cu:30: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::tuple<float, float, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [65,0,0], thread: [61,0,0] Assertion `input >= 0. && input <= 1.` failed.
/home/shangw/pytorch/torch/lib/THCUNN/BCECriterion.cu:30: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::tuple<float, float, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [65,0,0], thread: [62,0,0] Assertion `input >= 0. && input <= 1.` failed.
/home/shangw/pytorch/torch/lib/THCUNN/BCECriterion.cu:30: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::tuple<float, float, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [65,0,0], thread: [63,0,0] Assertion `input >= 0. && input <= 1.` failed.
CUDA error after cudaEventDestroy in future dtor: device-side assert triggeredTraceback (most recent call last):
  File "main_conv_new_rep_kl.py", line 188, in <module>
    train(epoch)
  File "main_conv_new_rep_kl.py", line 159, in train
    loss = loss_function(recon_batch, data, mu, logvar)
  File "main_conv_new_rep_kl.py", line 135, in loss_function
    BCE = reconstruction_function(recon_x, x)
  File "/home/shangw/local/anaconda3/envs/py35s/lib/python3.5/site-packages/torch/nn/modules/module.py", line 225, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/shangw/local/anaconda3/envs/py35s/lib/python3.5/site-packages/torch/nn/modules/loss.py", line 34, in forward
    return backend_fn(self.size_average, weight=self.weight)(input, target)
  File "/home/shangw/local/anaconda3/envs/py35s/lib/python3.5/site-packages/torch/nn/_functions/thnn/loss.py", line 28, in forward
    result = super(BCELoss, self).forward(input, target)
  File "/home/shangw/local/anaconda3/envs/py35s/lib/python3.5/site-packages/torch/nn/_functions/thnn/auto.py", line 41, in forward
    output, *self.additional_args)
RuntimeError: cudaEventSynchronize in future::wait: device-side assert triggered
```

(3) main_conv_old_rep.py: this code DOES NOT run and gives the same error. The only difference between this file and the first one, main_conv_new_rep.py, is that I used the original reparametrization function (line 120), which is mathematically equivalent (and should be computationally equivalent as well).
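To illustrate what "mathematically equivalent" versus "computationally equivalent" can mean here, below is a minimal pure-Python sketch. It is not the code from the attached files: both function names are hypothetical, the first form (std = exp(0.5 * logvar)) is assumed to match the original reparametrization, and the second form (std = sqrt(exp(logvar))) is just one possible rewrite chosen for illustration. The two agree for moderate inputs but behave differently once exp(logvar) overflows.

```python
import math

def reparam_half_logvar(mu, logvar, eps):
    # Assumed "original" form: std = exp(0.5 * logvar), z = mu + eps * std.
    return mu + eps * math.exp(0.5 * logvar)

def reparam_sqrt_var(mu, logvar, eps):
    # Hypothetical rewrite: std = sqrt(exp(logvar)).
    # Same value on paper, but exp(logvar) overflows much earlier.
    return mu + eps * math.sqrt(math.exp(logvar))

# For moderate inputs the two forms agree to floating-point precision.
a = reparam_half_logvar(0.1, -2.0, 0.5)
b = reparam_sqrt_var(0.1, -2.0, 0.5)
agree = math.isclose(a, b, rel_tol=1e-12)

# For a large logvar, only the first form stays finite in double precision.
big_logvar = 800.0
z_big = reparam_half_logvar(0.0, big_logvar, 1.0)  # uses exp(400), finite
try:
    reparam_sqrt_var(0.0, big_logvar, 1.0)         # uses exp(800)
    overflowed = False
except OverflowError:
    overflowed = True
```

In single precision on the GPU the same effect would appear at much smaller values of logvar, and an overflow would show up as inf rather than a Python exception. Whether this is what happens in the attached scripts is not established here.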

If someone could look into this, I would greatly appreciate it, since this is somewhat time-sensitive. Thanks!
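For anyone triaging, here is a small pure-Python sketch of two facts that may be relevant to the `input >= 0. && input <= 1.` assertion and to the Loss1 + 0*Loss2 pattern in the title. The helper name `checked_bce` is hypothetical and this is not PyTorch's implementation; it only shows that (a) multiplying a non-finite term by 0 gives NaN under IEEE 754 rather than 0, and (b) a NaN input fails a [0, 1] range check just like an out-of-range value does.

```python
import math

def checked_bce(p, t, tiny=1e-12):
    # Hypothetical helper: scalar binary cross-entropy with an explicit
    # range check, mirroring the assertion in the error log.
    # NaN compares false with everything, so it fails this check too.
    if not (0.0 <= p <= 1.0):
        raise ValueError("BCE input outside [0, 1]: %r" % (p,))
    return -(t * math.log(p + tiny) + (1.0 - t) * math.log(1.0 - p + tiny))

# (a) A zero weight does not neutralize a non-finite loss term.
loss1 = 0.5
loss2 = float("inf")          # stand-in for a term that has blown up
total = loss1 + 0.0 * loss2   # 0 * inf is NaN, so the sum is NaN

# (b) A valid probability passes; a NaN "probability" is rejected.
ok_value = checked_bce(0.5, 1.0)
try:
    checked_bce(float("nan"), 1.0)
    nan_rejected = False
except ValueError:
    nan_rejected = True
```

A similar range or NaN check on `recon_x` just before the `reconstruction_function(recon_x, x)` call, run on the CPU, would likely give a more readable failure than the device-side assert.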

Issues with Loss1 + 0*Loss2 and graph computation #2209
