[NVFUSER] refactor nvfuser build #89621

jjsjann123 · 2022-11-24T09:21:55Z

This PR is the first step towards refactoring the nvfuser build so that the codegen becomes a standalone library.

Contents of this PR:

- The nvfuser code base has been moved from ./torch/csrc/jit/codegen/cuda/ to ./nvfuser, except for the registration code for integration (interface.h/interface.cpp).
- The build system is split so that nvfuser generates its own .so files. Currently there are:
  - libnvfuser_codegen.so, which contains the integration, codegen, and runtime system of nvfuser.
  - nvfuser.so, which is nvfuser's Python API via pybind. The Python frontend is now exposed via nvfuser._C.XXX instead of torch._C._nvfuser.
- The nvfuser C++ tests are currently compiled into nvfuser_tests.
- CMake is refactored so that:
  - nvfuser now has its own CMakeLists.txt, which is under torch/csrc/jit/codegen/cuda/.
  - nvfuser backend code is no longer compiled inside libtorch_cuda_xxx.
  - nvfuser is added as a subdirectory in ./CMakeLists.txt at the very end, after torch is built.
  - Since nvfuser depends on torch, nvfuser is registered at runtime via dlopen (at::DynamicLibrary). This avoids a circular dependency in CMake, which would be a nightmare to handle. For details, see torch/csrc/jit/codegen/cuda/interface.cpp::LoadingNvfuserLibrary.
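The runtime-loading idea in the last item can be sketched in Python with ctypes, whose CDLL constructor calls dlopen on POSIX systems; here it stands in for at::DynamicLibrary. This is a minimal sketch under stated assumptions, not the code in interface.cpp::LoadingNvfuserLibrary: the class LazyFuserLoader, its methods, and the library name "nvfuser_codegen" are hypothetical. It shows the two properties that matter: the host never links against the optional library (so the build graph has no host-to-plugin edge and no cycle), and a missing library simply leaves the feature disabled.

```python
import ctypes
import ctypes.util
from typing import Optional


class LazyFuserLoader:
    """Hypothetical host-side loader for an optional codegen library."""

    def __init__(self, lib_name: str) -> None:
        self.lib_name = lib_name  # illustrative, e.g. "nvfuser_codegen"
        self._handle: Optional[ctypes.CDLL] = None
        self._tried = False  # only attempt the dlopen once

    def load(self) -> Optional[ctypes.CDLL]:
        if not self._tried:
            self._tried = True
            # Map a short name to a platform file name (e.g. libfoo.so) if the
            # system loader knows it; otherwise try the name as given.
            path = ctypes.util.find_library(self.lib_name) or self.lib_name
            try:
                # On POSIX, CDLL() calls dlopen(); RTLD_GLOBAL makes the
                # library's symbols visible to libraries loaded afterwards.
                self._handle = ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)
            except OSError:
                # Library missing or unloadable: the feature stays disabled.
                self._handle = None
        return self._handle

    def available(self) -> bool:
        return self.load() is not None

    def has_symbol(self, name: str) -> bool:
        # Look up an exported entry point by name after loading; ctypes
        # raises AttributeError for undefined symbols, so hasattr is safe.
        handle = self.load()
        return handle is not None and hasattr(handle, name)
```

For example, LazyFuserLoader("nvfuser_codegen").available() returns False on a machine without the library, and the caller can fall back to the non-fused path. Caching the first attempt avoids repeating a failing dlopen on every call.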

Future work, scoped for follow-up PRs:

- Since nvfuser codegen currently depends on torch, we need to refactor that dependency out so that nvfuser can be moved into a submodule without relying on dlopen to load the library. @malfet
- Because nvfuser now builds via CMake, we have effectively disabled the Bazel build for nvfuser. This could impact internal workloads at Meta, so we need to restore that support. cc'ing @vors

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @EikanWang @kevinstephano @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @Guobing-Chen @chunyuan-w @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire

pytorch-bot · 2022-11-24T09:21:57Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89621

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 74e53e1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot · 2023-01-17T21:53:26Z

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

jjsjann123 · 2023-01-17T22:26:58Z

Looks like I got a bad upstream commit. I'll keep an eye on the CI HUD and grab a clean commit once it turns green again.

davidberard98 · 2023-01-17T22:29:06Z

recommend using the viable/strict branch instead of master next time to avoid that issue :)

facebook-github-bot · 2023-01-18T18:26:24Z

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2023-01-18T20:57:13Z

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

jjsjann123 · 2023-01-18T21:09:59Z

Looks like the build failure has been patched. Sorry, I forgot to update the C++14 flag... 😛

davidberard98 · 2023-01-18T23:55:19Z

build_variables.bzl

-    "torch/csrc/jit/codegen/cuda/runtime/warp.cu",
-    "torch/csrc/jit/codegen/cuda/runtime/warp_rocm.cu",
-    "torch/csrc/jit/codegen/cuda/runtime/welford.cu",
+    "nvfuser/runtime/array.cu",


@jjsjann123 I think these need to be updated to third_party/nvfuser?
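Following the reviewer's suggestion, one way to keep these paths from drifting again is to derive them from a single root. This sketch is hypothetical: NVFUSER_RUNTIME_ROOT and nvfuser_runtime_sources are illustrative names, not part of build_variables.bzl, the root assumes the runtime sources live under third_party/nvfuser/runtime, and the file list contains only the four runtime files visible in this excerpt (the real list is longer). It sticks to the subset of Python that Starlark also accepts, so it could sit in a .bzl file.

```python
# Hypothetical: single source of truth for where nvfuser runtime files live.
NVFUSER_RUNTIME_ROOT = "third_party/nvfuser/runtime"

# Only the files visible in the review excerpt; the real list is longer.
_NVFUSER_RUNTIME_FILES = [
    "array.cu",
    "warp.cu",
    "warp_rocm.cu",
    "welford.cu",
]

def nvfuser_runtime_sources(root = NVFUSER_RUNTIME_ROOT):
    # Prefix every runtime file with the root so a future move edits one line.
    return [root + "/" + f for f in _NVFUSER_RUNTIME_FILES]
```

With this shape, relocating nvfuser again means changing NVFUSER_RUNTIME_ROOT rather than every entry.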

facebook-github-bot · 2023-01-19T01:12:13Z

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

jjsjann123 · 2023-01-19T08:51:54Z

Windows CI seems strangely flaky.

ModuleNotFoundError: No module named 'typing_extensions'

jjsjann123 · 2023-01-19T23:57:51Z

Hmm, I can't seem to get the flaky CI to pass... I'll keep trying.

jjsjann123 · 2023-01-20T17:24:17Z

Hmm, looks like upstream/viable/strict shows failing tests (inductor) in the HUD log. I tried to grab a commit that at least looks green across the columns. 🤞

facebook-github-bot · 2023-01-20T17:32:07Z

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

jjsjann123 · 2023-01-23T17:53:46Z

I think I need to bump this for import/land, but the HUD shows all later commits with some failures (even the commit pointed to by pytorch/viable/strict).
I'll keep an eye out for that. In the meantime, feel free to ping me if there are any specific commits you want me to pull, @davidberard98.

facebook-github-bot · 2023-01-23T23:32:34Z

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2023-01-23T23:34:54Z

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

davidberard98

stamp so I can attempt to land internally (don't use pytorchbot to merge this)

jjsjann123 · 2023-01-25T18:57:26Z

> stamp so I can attempt to land internally (don't use pytorchbot to merge this)

So excited!!! 🥳

facebook-github-bot · 2023-01-26T02:48:50Z

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

pytorchmergebot · 2023-01-26T02:50:39Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

jjsjann123 · 2023-01-26T05:45:08Z

🥳

jjsjann123 and others added 9 commits November 21, 2022 16:39

stripping nvfuser codegen source

44964f1

added CMake files for nvfuser

7d79a9a

adding nvfuser lib

83c414e

adding tests

2eccd4f

adding pybind11

1e2112b

updating pybind build

28db632

fixing torch._C._nvfuser -> torch._C_nvfuser

86880c3

opt out of nvfuser build on non-cuda

eb236f6

fixing enable flag

e709a02

pytorch-bot added the release notes: jit label (release notes category) Nov 24, 2022

github-actions bot added the NNC label Nov 24, 2022

jjsjann123 added the ciflow/trunk label (Trigger trunk jobs on your pull request) Nov 24, 2022

pytorchbot added the open source label Nov 24, 2022

jjsjann123 and others added 8 commits November 24, 2022 10:20

fixing cmake find packages

28741c0

fixing build from scratch

f741944

fixing signed/unsigned comparator

7eb8a51

hacky patch?!

557d9a1

fixing signed/unsigned comparator

17ae6dc

cmake attempt again

b72a618

patching cmake for python::python

153c449

fixing unsigned/signed comparator

1cb5e4b

jjsjann123 force-pushed the build branch from 058a264 to 1d43ca6 November 28, 2022 21:53

updating cmake dependencies

cc4de1e

jjsjann123 force-pushed the build branch from 1d43ca6 to cc4de1e November 28, 2022 22:15

jjsjann123 and others added 5 commits November 28, 2022 14:54

attempt at patching build

6b2b9c9

updating cache update on BUILD_NVFUSER

0416b6b

fixing signed/unsigned comparator

cd3d77f

fixing more signed/unsigned for tests

a73d7b1

Merge remote-tracking branch 'upstream/master' into HEAD

a962dfb

Merge remote-tracking branch 'upstream/viable/strict' into HEAD

76e9729

updating cpp standard

fdef2f8

davidberard98 reviewed Jan 18, 2023


updating nvfuser runtime file path in bzl rule

1b4d5cb

Merge commit 'f0e3c4929b7430e37dd96cda011a0ef72d3156d3' into HEAD

f130880

Merge remote-tracking branch 'origin/viable/strict' into HEAD

74e53e1

davidberard98 approved these changes Jan 25, 2023


pytorchmergebot added the Merged label Jan 26, 2023

pytorchmergebot closed this in c11b301 Jan 26, 2023

jjsjann123 deleted the build branch January 27, 2023 19:12

crcrpar mentioned this pull request Feb 9, 2023

Instance norm nvfuser NVIDIA/apex#1582 (Draft)
