Adding support for CuDNN-based LSTM with projections #47725

Kipok · 2020-11-11T02:22:48Z

I didn't yet update the documentation, will add those change soon. A few other things that I didn't do, but want to clarify if I maybe should.

I didn't expose projections in c++ API: torch/csrc/api/src/nn/modules/rnn.cpp. Let me know if this is desirable and I will add those changes.
I didn't expose projections in "lstm_cell" function and "_thnn_differentiable_lstm_cell_backward" functions from aten/src/ATen/native/RNN.cpp. As far as I understand, they are not needed for nn.LSTM CPU execution. For lstm_cell, projections don't bring any real benefit, since if cell is used separately, it can be easily added in Python. For "_thnn_differentiable_lstm_cell_backward", I'm actually not sure where exactly that function is used, so I also disabled projections there for now. Please let me know if I should change that.
I added check that projections are not supported for quantized LSTMs to quantized_lstm_<data/input> functions. But I didn't add any checks to LSTMCell code. It seems that since I disabled projections in "lstm_cell" function, they should also not be available for quantized models through any other API than quantized_lstm_<data/input>. Please let me know if I'm not correct and I will add checks to other places.
Projections are not supported for CuDNN versions < 7.1.2. Should I add the check for CuDNN version and disable projections in that case? If so, what will be the best way to do that?
Currently I added projection weight as the last weight, so the layout is "w_ih, w_hh, b_ih, b_hh, w_hr". This breaks the assumption that biases come after weights and thus I had to add additional if-s in various places. Alternative way would be to have "w_ih, w_hh, w_hr, b_ih, b_hh" layout, in which case the assumption will be true. But in that case I will need to split the loop in get_parameters function from aten/src/ATen/native/cudnn/RNN.cpp. And in some cases, I will still need to add an "undefined" tensor in the 3rd position, because we get all 5 weights from CuDNN most of the time. So I'm not sure which way is better. Let me know if you think I should change to the weights-then-biases layout.

Kipok · 2020-12-09T23:18:47Z

Thanks @BowenBao for helping with onnx questions! I have changed the asserts and also added error message for the onnx export as LSTMs with projections are not supported there (will require modifying onnxruntime code to enable this support).

I think now only backward compatibility question remains open. @ngimel, @zou3519, please let me know which option you prefer here. If you are ok with breaking backward compatibility in this case, please let me know how to "add the operators to the allow-list".

ngimel · 2020-12-09T23:22:20Z

Let's try breaking bc, option 2. To make bc tests pass, you should add the functions you are breaking to allow_list in test/backward_compatibility/check_backward_compatibility.py with some future date (say, couple weeks ahead), look at the examples of functions already there.

ngimel · 2020-12-10T00:02:17Z

Importing to see what internal CI says.

facebook-github-bot

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Kipok · 2020-12-10T18:00:45Z

@ngimel, @zou3519, both onnx and bc tests are passing now, but there are still 2 failures related to some specific builds. Any suggestions on what needs to be done to fix those? Or is it ok that they fail?

ngimel · 2020-12-10T18:04:28Z

Bazel build looks unrelated, probably should go away with rebase. ROCm build should be fixed - you should modify projection tests to expect to raise RuntimeError on ROCm, and adjust tolerance for the failing fp16 test.

Kipok · 2020-12-11T18:50:57Z

@ngimel, what would be the right way to test if ROCm is used? I tried to add "TEST_WITH_ROCM" if, but it seems that in some cases it's still using cudnn? Is it possible to check it on the model level itself somehow, e.g. like I can distinguish between cpu and cuda by checking .device() property?

Kipok · 2020-12-15T18:30:39Z

@ngimel, @zou3519, I fixed ROCm tests, no everything passes, except for some test case that seems to be unrelated to my PR (at least it was not failing on the previous test runs). Is there anything else that I need to do to complete the merge? Should I rebase with master one more time to check if the failing test will be fixed?

ngimel · 2020-12-15T19:12:58Z

asan failure is unrelated. Let me try importing, thank you!

facebook-github-bot

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2020-12-16T21:22:50Z

@ngimel merged this pull request in 1b6d18a.

Summary: Somehow `mypy torch/quantization` got broken in the past couple of days: https://gist.github.com/vkuzo/07af454246f0a68e6fa8929beeec7e0d . I didn't see any relevant PRs other than #47725, which doesn't seem related. The error doesn't seem real, as the arguments to `_cudnn_rnn_flatten_weight` seem correct. For now, ignoring the failure so we have a clean `mypy` run on `torch/quantization`. Test Plan: ``` mypy torch/quantization ``` Reviewers: Subscribers: Tasks: Tags: [ghstack-poisoned]

Summary: Somehow `mypy torch/quantization` got broken in the past couple of days: https://gist.github.com/vkuzo/07af454246f0a68e6fa8929beeec7e0d . I didn't see any relevant PRs other than #47725, which doesn't seem related. The error doesn't seem real, as the arguments to `_cudnn_rnn_flatten_weight` seem correct. For now, ignoring the failure so we have a clean `mypy` run on `torch/quantization`. Test Plan: ``` mypy torch/quantization ``` Reviewers: Subscribers: Tasks: Tags: ghstack-source-id: cd1c4213022d7efb71946ca5325b1126845a014e Pull Request resolved: #49549

Summary: Somehow `mypy torch/quantization` got broken in the past couple of days: https://gist.github.com/vkuzo/07af454246f0a68e6fa8929beeec7e0d . I didn't see any relevant PRs other than #47725, which doesn't seem related. The error doesn't seem real, as the arguments to `_cudnn_rnn_flatten_weight` seem correct. For now, ignoring the failure so we have a clean `mypy` run on `torch/quantization`. Test Plan: ``` mypy torch/quantization ``` Reviewers: Subscribers: Tasks: Tags: Differential Revision: [D25616972](https://our.internmc.facebook.com/intern/diff/D25616972) [ghstack-poisoned]

Summary: Pull Request resolved: #49549 Somehow `mypy torch/quantization` got broken in the past couple of days: https://gist.github.com/vkuzo/07af454246f0a68e6fa8929beeec7e0d . I didn't see any relevant PRs other than #47725, which doesn't seem related. The error doesn't seem real, as the arguments to `_cudnn_rnn_flatten_weight` seem correct. For now, ignoring the failure so we have a clean `mypy` run on `torch/quantization`. Test Plan: ``` mypy torch/quantization ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D25616972 fbshipit-source-id: 46c207fe1565ec949c0b1f57d6cd0c93f627e6bd

Summary: Fixes pytorch#46213 I didn't yet update the documentation, will add those change soon. A few other things that I didn't do, but want to clarify if I maybe should. 1. I didn't expose projections in c++ API: torch/csrc/api/src/nn/modules/rnn.cpp. Let me know if this is desirable and I will add those changes. 2. I didn't expose projections in "lstm_cell" function and "_thnn_differentiable_lstm_cell_backward" functions from aten/src/ATen/native/RNN.cpp. As far as I understand, they are not needed for nn.LSTM CPU execution. For lstm_cell, projections don't bring any real benefit, since if cell is used separately, it can be easily added in Python. For "_thnn_differentiable_lstm_cell_backward", I'm actually not sure where exactly that function is used, so I also disabled projections there for now. Please let me know if I should change that. 3. I added check that projections are not supported for quantized LSTMs to quantized_lstm_<data/input> functions. But I didn't add any checks to LSTMCell code. It seems that since I disabled projections in "lstm_cell" function, they should also not be available for quantized models through any other API than quantized_lstm_<data/input>. Please let me know if I'm not correct and I will add checks to other places. 4. Projections are not supported for CuDNN versions < 7.1.2. Should I add the check for CuDNN version and disable projections in that case? If so, what will be the best way to do that? 5. Currently I added projection weight as the last weight, so the layout is "w_ih, w_hh, b_ih, b_hh, w_hr". This breaks the assumption that biases come after weights and thus I had to add additional if-s in various places. Alternative way would be to have "w_ih, w_hh, w_hr, b_ih, b_hh" layout, in which case the assumption will be true. But in that case I will need to split the loop in get_parameters function from aten/src/ATen/native/cudnn/RNN.cpp. And in some cases, I will still need to add an "undefined" tensor in the 3rd position, because we get all 5 weights from CuDNN most of the time. So I'm not sure which way is better. Let me know if you think I should change to the weights-then-biases layout. Pull Request resolved: pytorch#47725 Reviewed By: zou3519 Differential Revision: D25449794 Pulled By: ngimel fbshipit-source-id: fe6ce59e481d1f5fd861a8ff7fa13d1affcedb0c

Summary: Pull Request resolved: pytorch#49549 Somehow `mypy torch/quantization` got broken in the past couple of days: https://gist.github.com/vkuzo/07af454246f0a68e6fa8929beeec7e0d . I didn't see any relevant PRs other than pytorch#47725, which doesn't seem related. The error doesn't seem real, as the arguments to `_cudnn_rnn_flatten_weight` seem correct. For now, ignoring the failure so we have a clean `mypy` run on `torch/quantization`. Test Plan: ``` mypy torch/quantization ``` Imported from OSS Reviewed By: jerryzh168 Differential Revision: D25616972 fbshipit-source-id: 46c207fe1565ec949c0b1f57d6cd0c93f627e6bd

Igor Gitman added 30 commits October 21, 2020 11:46

Expose proj_dim parameter

2aa911e

Fix int -> int_64t issue

a6c987c

Start exposing proj_size throughout the code base

2a5fdce

Exposed through get_weight_buf and flatten_weights

ee74637

Expose proj_size through _cudnn_rnn

c72c9ce

Update get_parameters to work with projections

cd3acc2

Remove redundant proj_size, fix get_expected_data_ptrs

54a9849

Fix try_get_weight_buf

24e47c2

Fix _cudnn_rnn function

dab693d

Fix _cudnn_rnn_backward functions

4d07a2a

Add correct hx creation on python side

5481d77

Fix incorrect projection layers initialization

d0dc6db

Correct weight initialization on python side

956ed26

Fix output size issue

4d0feae

Fix multi-layer projections issue

134ad1d

Expose proj_size in setstate and extra_repr

9557035

Fix AutocastRNN to accept models with projections

2da10ee

Merge branch 'master' into cudnn_projections

b63facc

Add test TODOs, add check for non-cudnn code

b142a0a

Fix incorrect hidden states init for LSTM

44d0c67

Fix error for RNN/GRU of accessing undefined cx

da32e77

Fix no-bias projections lstm for fp32

2a2c3f9

Fix no-bias projection fp16 case

de5519d

Add check for rnn/gru, add initial unit tests

06fadd2

Add proj_size to test_variable_sequence

09d0fe1

Add projections to rnn_weight_norm test

ae84ba6

Add projections to cudnn_weight_format test

8b0544d

Add projections to cudnn_weight_tying test

db24db7

Add projections to rnn_args_check test

515341c

Add projections to rnn_check_device test

337b0e7

Igor Gitman added 2 commits December 9, 2020 15:09

Add onnx test to check projections not supported

8f95f8e

Merge branch 'master' into cudnn_projections

bb87f91

Add _cudnn_rnn functions to allow_list for bc

4d7d6ca

facebook-github-bot reviewed Dec 10, 2020

View reviewed changes

Igor Gitman added 2 commits December 10, 2020 17:21

Adjust tests to work on rocm

6bb8bff

Merge branch 'master' into cudnn_projections

6834b59

Igor Gitman added 2 commits December 14, 2020 14:44

Add more precise check for runtime error in tests

42f1563

Merge branch 'master' into cudnn_projections

f696eda

facebook-github-bot reviewed Dec 15, 2020

View reviewed changes

facebook-github-bot closed this in 1b6d18a Dec 16, 2020

facebook-github-bot added the Merged label Dec 16, 2020

vkuzo mentioned this pull request Dec 17, 2020

unbreak mypy torch/quantization #49549

Closed

mszhanyi mentioned this pull request Dec 18, 2020

pytorch_windows_vs2019_py36_cuda10.1_test1 started to fail frequently, which doesn't look like a regression specific to a particular PR #49558

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Adding support for CuDNN-based LSTM with projections #47725

Adding support for CuDNN-based LSTM with projections #47725

Kipok commented Nov 11, 2020

Kipok commented Dec 9, 2020

ngimel commented Dec 9, 2020

ngimel commented Dec 10, 2020

facebook-github-bot left a comment

Kipok commented Dec 10, 2020

ngimel commented Dec 10, 2020

Kipok commented Dec 11, 2020

Kipok commented Dec 15, 2020

ngimel commented Dec 15, 2020

facebook-github-bot left a comment

facebook-github-bot commented Dec 16, 2020

Adding support for CuDNN-based LSTM with projections #47725

Adding support for CuDNN-based LSTM with projections #47725

Conversation

Kipok commented Nov 11, 2020

Kipok commented Dec 9, 2020

ngimel commented Dec 9, 2020

ngimel commented Dec 10, 2020

facebook-github-bot left a comment

Choose a reason for hiding this comment

Kipok commented Dec 10, 2020

ngimel commented Dec 10, 2020

Kipok commented Dec 11, 2020

Kipok commented Dec 15, 2020

ngimel commented Dec 15, 2020

facebook-github-bot left a comment

Choose a reason for hiding this comment

facebook-github-bot commented Dec 16, 2020