Consistency problem with check and modules regarding bias #10

Closed

ClementPinard opened this issue Jun 11, 2018 · 3 comments

@ClementPinard (Contributor)

The problem only occurs on PyTorch master, because its backprop engine is stricter about gradient shapes.
When running benchmark.py cpp (or cuda):

Traceback (most recent call last):
  File "benchmark.py", line 43, in <module>
    (new_h.sum() + new_C.sum()).backward()
  File "/home/cpinard/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/cpinard/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Function LLTMFunctionBackward returned an invalid gradient at index 2 - expected shape [384] but got [1, 384]

This is because the module's bias parameter has size 3 * state_size while the backward outputs a tensor of size 1 x 3 * state_size. The problem also exists on torch 0.4.0, but there the backprop engine doesn't complain since the number of elements is the same.

So one solution could be to remove the keepdim=True in the d_bias computation, e.g. here (but it's the same for the Python baseline, cpp and cuda versions).
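To make the shape mismatch concrete, here is a minimal standalone sketch (not the repo's actual backward code; state_size and d_gates are just illustrative names) showing how keepdim changes the shape of the summed bias gradient:

    import torch

    state_size = 128
    # Pretend this is the gradient w.r.t. the gate pre-activations: [batch, 3 * state_size]
    d_gates = torch.randn(16, 3 * state_size)

    d_bias_keepdim = d_gates.sum(dim=0, keepdim=True)  # shape [1, 384] -> mismatches a 1D bias
    d_bias_flat = d_gates.sum(dim=0)                    # shape [384]    -> matches a 1D bias
    print(d_bias_keepdim.shape, d_bias_flat.shape)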

But then you get the opposite error message when running check.py and grad_check.py:

Traceback (most recent call last):
  File "check.py", line 107, in <module>
    check_backward(variables, options.cuda, options.verbose)
  File "check.py", line 53, in check_backward
    (baseline_values[0] + baseline_values[1]).sum().backward()
  File "/home/cpinard/anaconda3/lib/python3.6/site-packages/torch/tensor.py", line 93, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/cpinard/anaconda3/lib/python3.6/site-packages/torch/autograd/__init__.py", line 89, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: Function LLTMFunctionBackward returned an invalid gradient at index 2 - expected shape [1, 15] but got [15]

This is because the bias given to the function is now of size 1 x 15!

The fix is pretty simple, but requires deciding between two options (a rough sketch of both follows this list):

  • either make the bias parameter of every nn module of dimension 1 x ...
  • or squeeze bias in check.py and grad_check.py and remove the keepdim=True arguments when computing the d_bias sums.
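A rough sketch of what each option means in code (assuming the tutorial's 3 * state_size bias layout; names here are illustrative, not the repo's exact code):

    import torch
    import torch.nn as nn

    state_size = 5                    # 3 * state_size = 15, as in the check.py error above
    d_gates = torch.randn(8, 3 * state_size)

    # Option 1: declare the bias as 2D so it matches a keepdim=True reduction.
    bias_2d = nn.Parameter(torch.zeros(1, 3 * state_size))
    d_bias_2d = d_gates.sum(dim=0, keepdim=True)   # shape [1, 15]

    # Option 2: keep the bias 1D and drop keepdim in the backward reduction
    # (and squeeze the bias tensor used by check.py / grad_check.py).
    bias_1d = nn.Parameter(torch.zeros(3 * state_size))
    d_bias_1d = d_gates.sum(dim=0)                 # shape [15]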
@goldsborough (Contributor)

We recently added code to verify the gradient shape (pytorch/pytorch#8168), so it's expected that this would break. I'll fix it
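For context, a tiny self-contained example (unrelated to LLTM) of the kind of mismatch that this check rejects, assuming a PyTorch build that includes it:

    import torch

    class Doubler(torch.autograd.Function):
        @staticmethod
        def forward(ctx, x):
            return x * 2

        @staticmethod
        def backward(ctx, grad_out):
            # Correct values, wrong shape: the extra leading dim makes newer
            # PyTorch raise "returned an invalid gradient ... expected shape
            # [15] but got [1, 15]".
            return (grad_out * 2).unsqueeze(0)

    x = torch.randn(15, requires_grad=True)
    Doubler.apply(x).sum().backward()  # RuntimeError on builds with the shape check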

@goldsborough (Contributor)

Fixed on master

@ClementPinard (Contributor, Author)

Thanks for fixing it. However, you forgot to change it in python/lltm_baseline.py. The module there is never actually called, whether from benchmark.py or [grad_]check.py, so it doesn't trigger an error, but if you call it yourself you will hit the same problem.
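For completeness, a hedged sketch of the relevant change in a baseline-style module (illustrative only; the real python/lltm_baseline.py differs in detail, and the forward/backward code is omitted):

    import torch
    import torch.nn as nn

    class LLTMBaselineSketch(nn.Module):
        # Illustrative only: shows the bias shape choice, not the full module.
        def __init__(self, input_features, state_size):
            super().__init__()
            self.weights = nn.Parameter(
                torch.randn(3 * state_size, input_features + state_size))
            # Keep the bias 1D (3 * state_size) so it lines up with a backward
            # that reduces over the batch dimension without keepdim=True.
            self.bias = nn.Parameter(torch.randn(3 * state_size))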
