sanchitintel
Collaborator

@sanchitintel sanchitintel commented Mar 22, 2022

Description

Relanding #68111
Preview4 PR of this RFC.

On the basis of #50256, the below improvements are included:

  • The preview4 release branch of the oneDNN Graph API is used
  • The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.
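The type-check guard mentioned above can be pictured with a small, purely conceptual sketch. It is not the PR's implementation (the real guards are nodes in the JIT graph, written in C++); `TensorProfile`, `FakeTensor`, and `make_guarded` are hypothetical names, and guarding on dtype and shape is an assumption made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class TensorProfile:
    """Hypothetical record of the properties observed during profiling."""
    dtype: str
    shape: Tuple[int, ...]


@dataclass(frozen=True)
class FakeTensor:
    """Stand-in for a tensor; only carries the properties the guard inspects."""
    dtype: str
    shape: Tuple[int, ...]


def make_guarded(
    profile: Sequence[TensorProfile],
    fused: Callable[[Sequence[FakeTensor]], str],
    fallback: Callable[[Sequence[FakeTensor]], str],
) -> Callable[[Sequence[FakeTensor]], str]:
    """Return a callable that runs `fused` only when inputs match the profile."""

    def run(inputs: Sequence[FakeTensor]) -> str:
        matches = len(inputs) == len(profile) and all(
            t.dtype == p.dtype and t.shape == p.shape
            for t, p in zip(inputs, profile)
        )
        # If any profiled property differs, take the unfused path instead.
        return fused(inputs) if matches else fallback(inputs)

    return run


guarded = make_guarded(
    [TensorProfile("float32", (1, 3, 224, 224))],
    fused=lambda xs: "fused",
    fallback=lambda xs: "fallback",
)
```

Calling `guarded` with an input whose shape or dtype differs from the profiled one takes the fallback branch, which is the role the inserted type check nodes play for the fused partitions.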

User API:

The optimization pass is disabled by default. Users can enable it with:

torch.jit.enable_onednn_fusion(True)
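For anyone trying this out, here is a minimal sketch of how the flag might be used for inference with a traced model. It assumes a Linux build of PyTorch with oneDNN (MKLDNN) support that exposes `torch.jit.enable_onednn_fusion`. The toy `SmallConvNet` module, the input shape, and the three warm-up iterations are assumptions for illustration and are not part of this PR.

```python
import torch
import torch.nn as nn


class SmallConvNet(nn.Module):
    """Toy model used only for illustration (not from this PR)."""

    def __init__(self) -> None:
        super().__init__()
        self.conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.relu(self.conv(x))


torch.manual_seed(0)
model = SmallConvNet().eval()          # inference only
example = torch.rand(1, 3, 32, 32)

# The pass is off by default, so opt in explicitly.
torch.jit.enable_onednn_fusion(True)

with torch.no_grad():
    expected = model(example)          # eager reference result
    traced = torch.jit.trace(model, example)
    traced = torch.jit.freeze(traced)
    # Warm-up calls: the profiling executor records tensor properties
    # on early invocations before the optimized graph is used.
    for _ in range(3):
        traced(example)
    out = traced(example)

# Restore the default so later code is unaffected.
torch.jit.enable_onednn_fusion(False)
```

The comparison against the eager result is only a sanity check that the traced model produces the same output; it does not show whether any ops were actually fused.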

Performance:

pytorch/benchmark tool is used to compare the performance:

  • SkyLake 8180 (1 socket of 28 cores):
    image
  • SkyLake 8180 (single thread):
    image
    * By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
    ** We expect a performance gain after mapping transpose, contiguous & view to oneDNN Graph ops

Directory structure of the integration code

Fuser-related code is placed under:

torch/csrc/jit/codegen/onednn/

Optimization pass registration is done in:

torch/csrc/jit/passes/onednn_graph_fuser.h

CMake for the integration code is:

caffe2/CMakeLists.txt

Limitations

  • This PR supports the optimization only on Linux. Support for Windows and macOS will be enabled as a next step.
  • We have only optimized the inference use case.
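Given these limitations, a caller might want to gate the opt-in. The helper below is a hypothetical sketch, not an API from this PR: `should_enable_onednn_fusion` is an invented name, and using `sys.platform` to detect Linux is an assumption about how a caller would check the platform.

```python
import sys


def should_enable_onednn_fusion(platform: str = sys.platform,
                                training: bool = False) -> bool:
    """Return True only for the cases this PR covers: Linux and inference."""
    return platform.startswith("linux") and not training


# Possible usage (left as comments so the sketch has no torch dependency):
#
#   if should_enable_onednn_fusion(training=model.training):
#       torch.jit.enable_onednn_fusion(True)
```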

@facebook-github-bot
Contributor

facebook-github-bot commented Mar 22, 2022

💊 CI failures summary and remediations

As of commit c02ece3 (more details on the Dr. CI page):


  • 4/4 failures introduced in this PR

4 failures not recognized by patterns:

Job | Step | Action
GitHub Actions pull / linux-xenial-cuda11.3-py3.7-gcc7 / build | Unknown | 🔁 rerun
GitHub Actions pull / linux-bionic-rocm4.5-py3.7 / build | Unknown | 🔁 rerun
GitHub Actions pull / linux-xenial-py3.7-gcc5.4-mobile-lightweight-dispatch-build / build | Unknown | 🔁 rerun
GitHub Actions pull / linux-vulkan-bionic-py3.7-clang9 / build | Unknown | 🔁 rerun

This comment was automatically generated by Dr. CI (expand for details).

@facebook-github-bot facebook-github-bot added the oncall: jit Add this issue/PR to JIT oncall triage queue label Mar 22, 2022
@malfet
Contributor

malfet commented Mar 22, 2022

@sanchitintel to make review easier, can you simply cherry-pick the landed commit into the branch and then apply any other changes on top of that?

@sanchitintel
Collaborator Author

sanchitintel commented Mar 22, 2022

to make review easier, can you simply cherry-pick the landed commit into the branch and then apply any other changes on top of that?

Sorry @malfet, please clarify which landed commit you're referring to.
Please confirm if you mean first rebasing PR #68111 with the master branch, and then adding this commit to fix the lite-interpreter build. Thanks!

@malfet
Contributor

malfet commented Mar 22, 2022

This one: cd17683

chunyuan-w and others added 2 commits March 22, 2022 14:37
Summary:
## Description
Preview4 PR of this [RFC](pytorch#49444).

On the basis of pytorch#50256, the below improvements are included:

- The [preview4 release branch](https://github.com/oneapi-src/oneDNN/releases/tag/graph-v0.4.1) of the oneDNN Graph API is used
- The fuser now works with the profiling graph executor. We have inserted type check nodes to guard the profiled tensor properties.

### User API:
The optimization pass is disabled by default. Users could enable it by:
```
torch.jit.enable_onednn_fusion(True)
```

### Performance:
[pytorch/benchmark](https://github.com/pytorch/benchmark) tool is used to compare the performance:
- SkyLake 8180 (1 socket of 28 cores):

  ![image](https://user-images.githubusercontent.com/65992142/151162305-05e44425-a24e-4d5e-94e1-743b40b87a8c.png)

- SkyLake 8180 (single thread):

  ![image](https://user-images.githubusercontent.com/65992142/151162528-69f90b79-d08d-46b8-8775-d80a6ccbce8a.png)
 \* By mapping hardswish to oneDNN Graph, it’s 8% faster than PyTorch JIT (NNC + OFI)
  \** We expect performance gain after mapping transpose, contiguous & view to oneDNN graph ops

### Directory structure of the integration code
Fuser-related code are placed under:
```
torch/csrc/jit/codegen/onednn/
```

Optimization pass registration is done in:
```
torch/csrc/jit/passes/onednn_graph_fuser.h
```

CMake for the integration code is:
```
caffe2/CMakeLists.txt
```

## Limitations

- In this PR, we have only supported the optimization on Linux platform. The support on Windows and MacOS will be enabled as the next step.
- We have only optimized the inference use case.

Pull Request resolved: pytorch#68111

Reviewed By: eellison

Differential Revision: D34584878

Pulled By: malfet

fbshipit-source-id: ce817aa8cc9052ee9ed930c9cf66be83449e61a4
@sanchitintel sanchitintel force-pushed the onednn-graph-preview4 branch from 7b7dbfc to bc4739a Compare March 22, 2022 21:47
@sanchitintel
Collaborator Author

sanchitintel commented Mar 22, 2022

The Windows build failed while compiling a lite-interpreter file (test_jit/CMakeFiles/test_jit.dir/test_lite_interpreter.cpp.obj), but the failure seems to have an unrelated cause:

image

Will rebase later to check if the issue got fixed.
Similar failures in other PRs, such as #74586.

@sanchitintel
Collaborator Author

Closing & reopening as #74596. Thanks!

@sanchitintel
Collaborator Author

sanchitintel commented Mar 23, 2022

Somehow CI is not running on #74596. GitHub Actions outage is over

@sanchitintel sanchitintel reopened this Mar 23, 2022