Enable oneDNN implementation in LSTM op #91158
Conversation
Dr. CI: ✅ No failures as of commit 836a7dd. See artifacts and rendered test results at hud.pytorch.org/pr/91158.
test/test_mkldnn.py
Outdated
def get_rand_seed():
    return int(time.time() * 1000000000)
This would make the test results nondeterministic. Can we use a fixed seed?
Sure, done. I fixed the seed to 2023.
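One way the fixed-seed variant might look (a sketch, not the exact test change; whether the test keeps the helper or calls torch.manual_seed directly is an assumption, only the value 2023 comes from the reply above):

```python
import torch


def get_rand_seed():
    # Fixed seed (2023, per the discussion above) so test results are reproducible.
    return 2023


torch.manual_seed(get_rand_seed())
```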
torch/_meta_registrations.py
Outdated
    cy = torch.empty(0, device=input.device)
else:
    cy = cx_.new_empty(cx_.shape)
workspace = input.new_empty([hidden_size * 1024], dtype=torch.uint8)
I guess workspace doesn't matter here; just creating an empty tensor would be good enough?
Yes. An empty tensor works correctly here. Done.
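A rough sketch of what the updated meta registration could look like after this change, reusing the names input and cx_ from the snippet above; the function name and exact signature here are hypothetical, not the PR's actual code:

```python
import torch

# Hypothetical, simplified sketch: in a meta function only shapes, dtypes, and
# devices matter, so the workspace can simply be an empty uint8 tensor instead
# of one sized from oneDNN internals.
def lstm_meta_sketch(input, cx_, has_cell_state):
    if not has_cell_state:
        cy = torch.empty(0, device=input.device)
    else:
        cy = cx_.new_empty(cx_.shape)
    workspace = input.new_empty((0,), dtype=torch.uint8)
    return cy, workspace
```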
aten/src/ATen/native/mkldnn/RNN.cpp
Outdated
auto nblks = desc.blocking_desc().inner_nblks;
std::vector<int64_t> at_sizes(ndims + nblks);
auto padded_dims = desc.padded_dims();
auto blk_sizes = desc.blocking_desc().inner_blks;
auto blk_idxs = desc.blocking_desc().inner_idxs;
Do we really have to parse oneDNN's internal blocking descriptors to get the workspace ATen tensor? Can we just model it as a 1D tensor buffer on the ATen side?
Done with desc.get_size().
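To illustrate the simplification in Python rather than the C++ above: the workspace exposed to ATen is just a flat 1D byte buffer, so no blocking-layout parsing is needed. The byte count below is a stand-in for whatever oneDNN's desc.get_size() reports, not a real value:

```python
import torch

workspace_nbytes = 4096  # stand-in for the size reported by oneDNN's desc.get_size()
workspace = torch.empty(workspace_nbytes, dtype=torch.uint8)  # flat 1D byte buffer
assert workspace.dim() == 1 and workspace.numel() == workspace_nbytes
```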
aten/src/ATen/native/mkldnn/RNN.cpp
Outdated
}

auto input = input_;
bool is_input_packed = batch_sizes.size() != 0;
This is always false here, so why bother checking?
Removed this duplicated check.
This PR depends on the Meta-internal ideep/oneDNN upgrade. Do not merge it before that upgrade issue is resolved.
Hi @malfet, could you please help review this PR? Thanks!
Please address the review comments and add more code comments explaining what this code is trying to do; overall it looks fine, though.
Hi @malfet, I have addressed all the comments. The code is much better with the suggested changes. I will go ahead and merge this PR.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
We observe a 40-60% speedup in the tts_angular model on CPU in TorchBench (pytorch/benchmark#1376) thanks to this PR. Congrats!
Description
This PR enables a oneDNN implementation of the LSTM op to improve its performance. Both FP32 and BF16 are supported.
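For context, a minimal usage sketch (not part of the PR; shapes and sizes are arbitrary) exercising the LSTM op in both FP32 and BF16 on CPU:

```python
import torch

lstm = torch.nn.LSTM(input_size=64, hidden_size=128, num_layers=2, batch_first=True)
x = torch.randn(8, 50, 64)  # (batch, seq_len, input_size)

# FP32 inference on CPU
with torch.no_grad():
    out_fp32, (h, c) = lstm(x)

# BF16 inference on CPU via autocast
with torch.no_grad(), torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out_bf16, _ = lstm(x)
```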
Performance improvement
Measured on a 28-core CPX machine, with iomp and jemalloc configured.
We chose 8 LSTM input configurations (varying input_size, hidden_size, num_layers, bidirectional, bias, batch_first, dropout, batch_size, and seq_len); the final configuration is a real input from train-clean-100 in the LibriSpeech dataset. The performance improvements are shown in the figures below: the LSTM with the oneDNN implementation performs better than the original.
[Figures from the PR: speedup over the original implementation for single-socket and single-core runs]
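A simple way to run this kind of comparison locally is torch.utils.benchmark; the configuration values below are illustrative, not the exact ones benchmarked above:

```python
import torch
from torch.utils import benchmark

lstm = torch.nn.LSTM(input_size=256, hidden_size=512, num_layers=1, batch_first=True)
x = torch.randn(16, 100, 256)  # (batch, seq_len, input_size)

timer = benchmark.Timer(
    stmt="lstm(x)",
    globals={"lstm": lstm, "x": x},
    num_threads=torch.get_num_threads(),
)
print(timer.timeit(100))  # per-call time statistics over 100 runs
```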
cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @mcarilli @ptrblck @leslie-fang-intel