This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Bad squeeze in CPUForgetMult #11

Closed

santi-pdp opened this issue Nov 24, 2017 · 2 comments

@santi-pdp

Hi,

It looks like I've encountered a little bug when batch_size=1 at CPU inference (I haven't checked on GPU yet). While forwarding in CPUForgetMult, the hidden state is squeezed across all dimensions when appending each h to the resulting list of tensors, concretely:

result.append(h.squeeze())

It turns out the size of h at each iteration is (1, batch_size, feats), so when we squeeze with batch_size=1 the resulting tensor has size (feats,), and the final torch.stack(result) has size (seq_len, feats) instead of (seq_len, batch_size, feats).
This causes an error later in the QRNN forward, where C[-1:, :, :] tries to index the batch dimension, which no longer exists because of the squeeze. The fix is to specify the squeeze dimension explicitly as 0 (for the batch_first=False layout, which is the only one available at the moment).
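For illustration (this is not code from the repo, just a minimal sketch of the shape mismatch with assumed sizes), the difference between squeeze() and squeeze(0) when batch_size=1:

```python
import torch

# Shapes follow the issue report: h has size (1, batch_size, feats) at each step.
seq_len, batch_size, feats = 4, 1, 8
h = torch.zeros(1, batch_size, feats)

# h.squeeze() removes *every* size-1 dimension, so with batch_size=1
# the batch dimension disappears along with the leading time dimension.
assert h.squeeze().shape == (feats,)

# Stacking seq_len such tensors yields (seq_len, feats): the batch
# dimension is gone, so a later C[-1:, :, :] index has too few dims.
bad = torch.stack([h.squeeze() for _ in range(seq_len)])
assert bad.shape == (seq_len, feats)

# Squeezing only dimension 0 keeps the batch dimension intact.
good = torch.stack([h.squeeze(0) for _ in range(seq_len)])
assert good.shape == (seq_len, batch_size, feats)
```

With batch_size > 1 both variants happen to agree, which is why the bug only surfaces at batch_size=1.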

@mhart

mhart commented Nov 25, 2017

Was going to file what I think is a similar issue, just thought I'd check if it's the same root cause.

I have been following the steps in https://github.com/salesforce/awd-lstm-lm to generate a QRNN model, which completed successfully, but I get the following when trying to use it to generate text:

$ python generate.py --data ./data/mydata --checkpoint MYQRNNMODEL.pt --cuda
Traceback (most recent call last):
  File "generate.py", line 65, in <module>
    output, hidden = model(input, hidden)
  File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/awd-lstm-lm/model.py", line 82, in forward
    raw_output, new_h = rnn(raw_output, hidden[l])
  File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/miniconda3/lib/python3.6/site-packages/torchqrnn/qrnn.py", line 60, in forward
    Xm1 = [self.prevX if self.prevX is not None else X[:1, :, :] * 0, X[:-1, :, :]]
  File "/miniconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 76, in __getitem__
    return Index.apply(self, key)
  File "/miniconda3/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 16, in forward
    result = i.index(ctx.index)
ValueError: result of slicing is an empty tensor

(same with or without --cuda flag)
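For context, the failing line in the traceback slices off the last time step of X. A minimal sketch (shapes are assumed, not taken from the model) of why that slice is empty when generation feeds one token at a time:

```python
import torch

# Hypothetical shapes: generate.py feeds one token per step,
# so the input X has seq_len = 1.
X = torch.zeros(1, 1, 8)  # (seq_len, batch, feats)

# X[:-1, :, :] drops the last time step; with seq_len = 1 nothing
# remains, so the slice has zero length along the time dimension
# (older PyTorch raised "result of slicing is an empty tensor" here).
assert X[:-1, :, :].shape == (0, 1, 8)
```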

Any idea what I can do to get generate.py to work with this model? Or should I file a separate issue?

@Smerity
Contributor

Smerity commented Nov 25, 2017

Hey @mhart - that's a separate issue but one that's fixed in 2ffbd32

@santi-pdp - thanks, yes, this would be a problem, and your analysis is entirely correct. I've fixed it in d045e72

If you both reinstall via pip, the issues should be resolved. Thanks for reporting them! =]
