This repository has been archived by the owner on Feb 12, 2022. It is now read-only.

Bad squeeze in CPUForgetMult #11

Closed

santi-pdp opened this issue Nov 24, 2017 · 2 comments

@santi-pdp

Hi,

It looks like I've encountered a little bug when batch_size=1 at CPU inference (I haven't checked on GPU yet). While forwarding in CPUForgetMult, the hidden state is squeezed across all dimensions when appending each h to the resulting list of tensors, concretely:

result.append(h.squeeze())

It turns out the size of h at each iteration is (1, batch_size, feats), so when we squeeze with batch_size=1 the resulting tensor has size (feats,), and the final torch.stack(result) has size (seq_len, feats) instead of (seq_len, batch_size, feats).
This causes an error later in the QRNN forward, where C[-1:, :, :] tries to index the batch dimension, which no longer exists because of the squeeze. The fix is to specify the squeeze dimension explicitly as 0 (for the batch_first=False layout, which is the only one available at the moment).
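For illustration (this is not code from the repo, just a minimal sketch of the shape mismatch with assumed sizes), the difference between squeeze() and squeeze(0) when batch_size=1:

```python
import torch

# Shapes follow the issue report: h has size (1, batch_size, feats) at each step.
seq_len, batch_size, feats = 4, 1, 8
h = torch.zeros(1, batch_size, feats)

# h.squeeze() removes *every* size-1 dimension, so with batch_size=1
# the batch dimension disappears along with the leading time dimension.
assert h.squeeze().shape == (feats,)

# Stacking seq_len such tensors yields (seq_len, feats): the batch
# dimension is gone, so a later C[-1:, :, :] index has too few dims.
bad = torch.stack([h.squeeze() for _ in range(seq_len)])
assert bad.shape == (seq_len, feats)

# Squeezing only dimension 0 keeps the batch dimension intact.
good = torch.stack([h.squeeze(0) for _ in range(seq_len)])
assert good.shape == (seq_len, batch_size, feats)
```

With batch_size > 1 both variants happen to agree, which is why the bug only surfaces at batch_size=1.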

@mhart

mhart commented Nov 25, 2017

Was going to file what I think is a similar issue, just thought I'd check if it's the same root cause.

I have been following the steps in https://github.com/salesforce/awd-lstm-lm to generate a QRNN model, which completed successfully, but I get the following when trying to use it to generate text:

$ python generate.py --data ./data/mydata --checkpoint MYQRNNMODEL.pt --cuda
Traceback (most recent call last):
  File "generate.py", line 65, in <module>
    output, hidden = model(input, hidden)
  File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/awd-lstm-lm/model.py", line 82, in forward
    raw_output, new_h = rnn(raw_output, hidden[l])
  File "/miniconda3/lib/python3.6/site-packages/torch/nn/modules/module.py", line 224, in __call__
    result = self.forward(*input, **kwargs)
  File "/miniconda3/lib/python3.6/site-packages/torchqrnn/qrnn.py", line 60, in forward
    Xm1 = [self.prevX if self.prevX is not None else X[:1, :, :] * 0, X[:-1, :, :]]
  File "/miniconda3/lib/python3.6/site-packages/torch/autograd/variable.py", line 76, in __getitem__
    return Index.apply(self, key)
  File "/miniconda3/lib/python3.6/site-packages/torch/autograd/_functions/tensor.py", line 16, in forward
    result = i.index(ctx.index)
ValueError: result of slicing is an empty tensor

(same with or without --cuda flag)
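For context, the failing line in the traceback slices off the last time step of X. A minimal sketch (shapes are assumed, not taken from the model) of why that slice is empty when generation feeds one token at a time:

```python
import torch

# Hypothetical shapes: generate.py feeds one token per step,
# so the input X has seq_len = 1.
X = torch.zeros(1, 1, 8)  # (seq_len, batch, feats)

# X[:-1, :, :] drops the last time step; with seq_len = 1 nothing
# remains, so the slice has zero length along the time dimension
# (older PyTorch raised "result of slicing is an empty tensor" here).
assert X[:-1, :, :].shape == (0, 1, 8)
```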

Any idea what I can do to get generate.py to work with this model? Or should I file a separate issue?

@Smerity
Contributor

Smerity commented Nov 25, 2017

Hey @mhart - that's a separate issue but one that's fixed in 2ffbd32

@santi-pdp - thanks, yes, this would be a problem, and your analysis is entirely correct. I've fixed it in d045e72

If you both reinstall via pip, the issues should be resolved. Thanks for reporting them! =]
