
inductor: enable weight prepack for LSTM #103071

Closed · wants to merge 35 commits

Conversation

@chunyuan-w (Collaborator) commented Jun 6, 2023

Stack from ghstack (oldest at bottom):

  • Enabled LSTM weight prepack in inductor (a minimal sketch of exercising this path follows this list).
  • Added an mkldnn decomposition for LSTM that does not change across different seq_lens. With the previous decomposition, the graph would differ in the dynamic-shapes use case where seq_lens changes.
  • Extended several inductor utility functions to support List[Tensor] input. Previously those functions only supported Tensor input.
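As a minimal illustration (not the PR's test code; the module sizes and inputs below are made up), the prepack path can be exercised end to end by compiling an inference-mode LSTM on CPU with torch.compile and checking that the compiled module matches eager. The actual UT additionally checks that aten.mkldnn_rnn_layer shows up in the generated code.

```python
# Minimal sketch with assumed sizes (not the PR's UT): with this PR, inductor
# is expected to prepack the LSTM weights via the mkldnn_rnn_layer
# decomposition on CPU while leaving the numerics unchanged.
import torch

mod = torch.nn.LSTM(input_size=16, hidden_size=32, num_layers=2, batch_first=True).eval()
opt_mod = torch.compile(mod)  # inductor is the default backend

inp = torch.randn(4, 10, 16)  # (batch, seq_len, input_size)
with torch.no_grad():
    ref_out, (ref_h, ref_c) = mod(inp)
    opt_out, (opt_h, opt_c) = opt_mod(inp)

torch.testing.assert_close(opt_out, ref_out)
torch.testing.assert_close(opt_h, ref_h)
torch.testing.assert_close(opt_c, ref_c)
```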

Update 2023-07-26:

  • #103851 has moved CPU weight packing to after AOTAutograd. The support in this PR has been fixed to follow the same approach (mainly in 3b207f7#diff-6dffed1ade0ba3e887f9a4eafa3bfcec267ab2365b8adcb91bd391f49b3fd2e3).
    LSTM is decomposed into aten.mkldnn_rnn_layer by layer and by direction, and the weight prepack is done at the mkldnn_rnn_layer level.
  • Added a fix in the RNN __getstate__ function for the case where an LSTM module needs to be recompiled.
    When the module is compiled, the weight tensors, which are the named_parameters of the module, are converted to functional tensors here
    (https://github.com/pytorch/pytorch/blob/76fb72e24a5a4a47ad1f50c5c94d5c0b7e703531/torch/nn/utils/stateless.py#L125-L128):

        orig_parameters_and_buffers, _ = accessor.swap_tensors_dict(
            untied_parameters_and_buffers, allow_missing=True
        )
        yield

    The forward function of LSTM is then called
    (https://github.com/pytorch/pytorch/blob/76fb72e24a5a4a47ad1f50c5c94d5c0b7e703531/torch/_functorch/aot_autograd.py#L3379-L3381):

            out = Interpreter(mod).run(*args[params_len:], **kwargs)
        else:
            out = mod(*args[params_len:], **kwargs)

    In the forward function, _flat_weights is updated to match the weights, so its entries also become functional tensors
    (https://github.com/pytorch/pytorch/blob/76fb72e24a5a4a47ad1f50c5c94d5c0b7e703531/torch/nn/modules/rnn.py#L775-L778):

        def forward(self, input, hx=None):  # noqa: F811
            if not torch.jit.is_scripting():
                if self._weights_have_changed():
                    self._init_flat_weights()

    The weight tensors are converted back to the original tensors (which are no longer functional tensors) before exiting the _reparametrize_module context here
    (https://github.com/pytorch/pytorch/blob/76fb72e24a5a4a47ad1f50c5c94d5c0b7e703531/torch/nn/utils/stateless.py#L130-L142):

        new_parameters_and_buffers, _ = accessor.swap_tensors_dict(
            orig_parameters_and_buffers, allow_missing=True
        )
        # Sometimes the module is not completely stateless and has some in-place modifications on
        # the _parameters and _buffers dictionaries.
        # Write the changed parameters and buffers back to the original dict.
        parameters_and_buffers.update(
            {
                k: new_parameters_and_buffers[k]
                for k in parameters_and_buffers
                if k in new_parameters_and_buffers
            }
        )

    But since _flat_weights is not among the named_parameters of the module, it still holds functional tensors (only the parameters listed at
    https://github.com/pytorch/pytorch/blob/76fb72e24a5a4a47ad1f50c5c94d5c0b7e703531/torch/_functorch/aot_autograd.py#L3695-L3698 are converted to functional tensors and reverted back).
    At this point, if the model needs to be recompiled, deepcopy is called
    (https://github.com/pytorch/pytorch/blob/76fb72e24a5a4a47ad1f50c5c94d5c0b7e703531/torch/_dynamo/utils.py#L915-L917):

        def deepcopy_to_fake_tensor(obj, fake_mode):
            with torch._subclasses.fake_tensor.FakeCopyMode(fake_mode):
                return wrap_fake_exception(lambda: copy.deepcopy(obj))

    This reports UnImplemented, because _flat_weights still contains functional tensors, and triggers a graph break, which is not what we expect
    (https://github.com/pytorch/pytorch/blob/76fb72e24a5a4a47ad1f50c5c94d5c0b7e703531/torch/_subclasses/meta_utils.py#L514):

        torch._is_functional_tensor(t),

    Added a fix in __getstate__ to update _flat_weights whenever the weights have changed, which resolves this issue. The fix is covered in the test_lstm_packed UT; a hedged illustration of the scenario follows below.
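As a hedged illustration of the scenario above (not the actual test_lstm_packed code; sizes are made up): run the compiled LSTM with two different sequence lengths so that a recompile can occur, then confirm that the module's _flat_weights holds plain, non-functional tensors.

```python
# Hedged illustration with assumed sizes (not the PR's exact UT): a second call
# with a different seq_len can trigger a recompile, which deepcopies the
# module; with the __getstate__ fix, _flat_weights no longer holds functional
# tensors at that point, so no graph break is expected.
import torch

mod = torch.nn.LSTM(input_size=8, hidden_size=8, batch_first=True).eval()
opt_mod = torch.compile(mod)

with torch.no_grad():
    opt_mod(torch.randn(2, 5, 8))   # first compilation
    opt_mod(torch.randn(2, 11, 8))  # different seq_len -> possible recompile

for w in mod._flat_weights:
    assert not torch._is_functional_tensor(w)
```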

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @ngimel @yf225 @chenyang78 @kadeng @muchulee8 @aakhundov

@pytorch-bot (bot) commented Jun 6, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/103071

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 572a694:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions github-actions bot added ciflow/inductor module: cpu CPU specific problem (e.g., perf, algorithm) module: inductor labels Jun 6, 2023
chunyuan-w added a commit that referenced this pull request Jun 6, 2023
ghstack-source-id: 737c90dff8df6b21f3312b0f0292c27beb3ba0bc
Pull Request resolved: #103071
@chunyuan-w chunyuan-w added the topic: not user facing topic category label Jun 6, 2023
@chunyuan-w chunyuan-w marked this pull request as draft June 6, 2023 09:12
chunyuan-w added a commit that referenced this pull request Jun 6, 2023
ghstack-source-id: 0505af5e2b11cc766729c47f127a320878169b43
Pull Request resolved: #103071
@albanD albanD removed their request for review June 6, 2023 14:13
chunyuan-w added a commit that referenced this pull request Jun 7, 2023
ghstack-source-id: e0d0d9660c3b071d5d3bf44d06a3d887c514d880
Pull Request resolved: #103071
chunyuan-w added a commit that referenced this pull request Jun 7, 2023
ghstack-source-id: cd712c39a1cc7b60961edea4315a0fd036ad75e9
Pull Request resolved: #103071
chunyuan-w added a commit that referenced this pull request Jun 7, 2023
ghstack-source-id: 20bead60448b895bb6424ce963a6516eee796592
Pull Request resolved: #103071
chunyuan-w added a commit that referenced this pull request Jun 7, 2023
ghstack-source-id: e79ad2a25f0fb95d45802011126611193777f3f7
Pull Request resolved: #103071
chunyuan-w added a commit that referenced this pull request Jun 7, 2023
ghstack-source-id: 7369d20ccac4c998176e6b65a5881f4864d39d8a
Pull Request resolved: #103071
chunyuan-w added a commit that referenced this pull request Jun 9, 2023
ghstack-source-id: 3545e5a28d23a8312a1781ee45844b6388ddc15e
Pull Request resolved: #103071
chunyuan-w added a commit that referenced this pull request Jun 9, 2023
ghstack-source-id: 75f0d346360938c13c93b894a2c0055a3b8549c3
Pull Request resolved: #103071
@chunyuan-w chunyuan-w marked this pull request as ready for review June 9, 2023 02:44
chunyuan-w added a commit that referenced this pull request Jul 27, 2023
ghstack-source-id: 98a914ee892eac94baf0b48b9116eb26f77e66cf
Pull Request resolved: #103071
Review thread on the added test code:

    self.assertFalse(torch._is_functional_tensor(_flat_weight))

    self.assertTrue("aten.mkldnn_rnn_layer" in code)
    self.assertEqual(fn_opt(*inps), mod(*inps))
Collaborator (reviewer) commented:

Can we also cover the case of changing input sizes?

chunyuan-w (Collaborator Author) replied:

Added the test for input of different sequence length to cover this case.
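For reference, a sketch of what such a check could look like (illustrative only; this is not the test added in the PR): compare the compiled LSTM against eager for inputs of different sequence lengths.

```python
# Illustrative sketch (assumed sizes): run the compiled LSTM with two sequence
# lengths and check the results against eager mode.
import torch

mod = torch.nn.LSTM(input_size=8, hidden_size=16, batch_first=True).eval()
opt_mod = torch.compile(mod, dynamic=True)

with torch.no_grad():
    for seq_len in (7, 13):
        inp = torch.randn(2, seq_len, 8)
        ref_out, _ = mod(inp)
        opt_out, _ = opt_mod(inp)
        torch.testing.assert_close(opt_out, ref_out)
```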

Review thread on the lstm decomposition (diff excerpt):

    @@ -2636,6 +2686,43 @@ def one_layer_lstm_data(inp, hidden, params, has_biases, batch_sizes, reverse=Fa
        return out, hidden_out


    def use_mkldnn(input, hx, params):
Collaborator (reviewer) commented:

Add a docstring to summarize the condition here? Also, since it is specific to LSTM, is the name use_mkldnn too general? Or maybe move the function to be local to lstm_impl?

chunyuan-w (Collaborator Author) replied:

Docstring has been updated. Moved use_mkldnn to be a local function.
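Purely as a hypothetical illustration of the refactor discussed above (lstm_impl's signature and the specific conditions below are assumptions, not this PR's code), a local helper with a summary docstring could look roughly like this:

```python
import torch

def lstm_impl(input, hx, params, has_biases, num_layers,
              dropout, train, bidirectional, batch_first):
    # Hypothetical sketch: keep `use_mkldnn` local to the LSTM decomposition,
    # as suggested in the review. The conditions below are placeholders; the
    # real gating logic in this PR may check different properties.
    def use_mkldnn(input, hx, params):
        """Summarize when the oneDNN (mkldnn) LSTM path is taken.

        Placeholder condition: every tensor is fp32/bf16 on CPU; anything else
        falls back to the reference decomposition.
        """
        tensors = [input, *hx, *params]
        return all(t.device.type == "cpu" for t in tensors) and all(
            t.dtype in (torch.float32, torch.bfloat16) for t in tensors
        )

    if use_mkldnn(input, hx, params):
        pass  # mkldnn path: decompose into aten.mkldnn_rnn_layer per layer/direction
    else:
        pass  # reference decomposition fallback
```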

chunyuan-w added a commit that referenced this pull request Jul 27, 2023
ghstack-source-id: b24aaafb4d80787db30d50cd0211dbb31dac876c
Pull Request resolved: #103071
chunyuan-w added a commit that referenced this pull request Jul 27, 2023
ghstack-source-id: dfbf504eca0beef01b52cec1ac1c4928f98255e9
Pull Request resolved: #103071
@chunyuan-w (Collaborator Author) commented:

@pytorchbot merge

@pytorchmergebot commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot commented:

Merge failed

Reason: Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x 5e48cbabafa4c6d576c64b41c1735ee57b8b9034 returned non-zero exit code 1

    Auto-merging aten/src/ATen/native/native_functions.yaml
    Auto-merging test/inductor/test_cpu_repro.py
    Auto-merging torch/_inductor/fx_passes/mkldnn_fusion.py
    CONFLICT (content): Merge conflict in torch/_inductor/fx_passes/mkldnn_fusion.py
    Auto-merging torch/_inductor/graph.py
    error: could not apply 5e48cbabafa... inductor: enable weight prepack for LSTM
    hint: After resolving the conflicts, mark them with
    hint: "git add/rm <pathspec>", then run
    hint: "git cherry-pick --continue".
    hint: You can instead skip this commit with "git cherry-pick --skip".
    hint: To abort and get back to the state before "git cherry-pick",
    hint: run "git cherry-pick --abort".

Details for Dev Infra team (raised by workflow job).

chunyuan-w added a commit that referenced this pull request Jul 28, 2023
ghstack-source-id: edeae6eeca0bf8459f2f08d7963b51dedb674382
Pull Request resolved: #103071
@chunyuan-w (Collaborator Author) commented:

@pytorchbot merge

@pytorchmergebot commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


@pytorchmergebot commented:

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / win-vs2019-cpu-py3 / test (default, 2, 3, windows.4xlarge.nonephemeral)

Details for Dev Infra team (raised by workflow job).

@chunyuan-w (Collaborator Author) commented:

@pytorchbot merge

@pytorchmergebot commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


bobby-palmer pushed a commit to bobby-palmer/pytorch that referenced this pull request Jul 29, 2023
Pull Request resolved: pytorch#103071
Approved by: https://github.com/jgong5, https://github.com/jansel
@facebook-github-bot facebook-github-bot deleted the gh/chunyuan-w/51/head branch July 31, 2023 14:16
Labels: ciflow/inductor · ciflow/trunk · Merged · module: cpu · module: inductor · open source · topic: not user facing