Skip to content

Conversation

jjsjann123
Copy link
Collaborator

@jjsjann123 jjsjann123 commented Nov 24, 2022

This PR is the first step towards refactors the build for nvfuser in order to have the coegen being a standalone library.

Contents inside this PR:

  1. nvfuser code base has been moved to ./nvfuser, from ./torch/csrc/jit/codegen/cuda/, except for registration code for integration (interface.h/interface.cpp)
  2. splits the build system so nvfuser is generating its own .so files. Currently there are:
    • libnvfuser_codegen.so, which contains the integration, codegen and runtime system of nvfuser
    • nvfuser.so, which is nvfuser's python API via pybind. Python frontend is now exposed via nvfuser._C.XXX instead of torch._C._nvfuser
  3. nvfuser cpp tests is currently being compiled into nvfuser_tests
  4. cmake is refactored so that:
    • nvfuser now has its own CMakeLists.txt, which is under torch/csrc/jit/codegen/cuda/.
    • nvfuser backend code is not compiled inside libtorch_cuda_xxx any more
    • nvfuser is added as a subdirectory under ./CMakeLists.txt at the very end after torch is built.
    • since nvfuser has dependency on torch, the registration of nvfuser at runtime is done via dlopen (at::DynamicLibrary). This avoids circular dependency in cmake, which will be a nightmare to handle. For details, look at torch/csrc/jit/codegen/cuda/interface.cpp::LoadingNvfuserLibrary

Future work that's scoped in following PR:

  • Currently since nvfuser codegen has dependency on torch, we need to refactor that out so we can move nvfuser into a submodule and not rely on dlopen to load the library. @malfet
  • Since we moved nvfuser into a cmake build, we effectively disabled bazel build for nvfuser. This could impact internal workload at Meta, so we need to put support back. cc'ing @vors

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @EikanWang @kevinstephano @mlazos @soumith @voznesenskym @yanboliang @penguinwu @anijain2305 @Guobing-Chen @chunyuan-w @zhuhaozhe @blzheng @Xia-Weiwen @wenzhe-nrv @jiayisunx @peterbell10 @desertfire

@pytorch-bot
Copy link

pytorch-bot bot commented Nov 24, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/89621

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 74e53e1:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: jit release notes category label Nov 24, 2022
@github-actions github-actions bot added the NNC label Nov 24, 2022
@jjsjann123 jjsjann123 added the ciflow/trunk Trigger trunk jobs on your pull request label Nov 24, 2022
@facebook-github-bot
Copy link
Contributor

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jjsjann123
Copy link
Collaborator Author

Looks like I got a bad upstream commit. I'll keep an eye on the CI hud and grab a clean commit when it gets green again.

@davidberard98
Copy link
Contributor

recommend using the viable/strict branch instead of master next time to avoid that issue :)

@facebook-github-bot
Copy link
Contributor

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jjsjann123
Copy link
Collaborator Author

Looks like the build failure has been patched.. Sorry that I forgot to update c++14 flag... 😛

"torch/csrc/jit/codegen/cuda/runtime/warp.cu",
"torch/csrc/jit/codegen/cuda/runtime/warp_rocm.cu",
"torch/csrc/jit/codegen/cuda/runtime/welford.cu",
"nvfuser/runtime/array.cu",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jjsjann123 i think these need to be updated to third_party/nvfuser ?

@facebook-github-bot
Copy link
Contributor

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jjsjann123
Copy link
Collaborator Author

windows CI seems strangely flaky.

ModuleNotFoundError: No module named 'typing_extensions'

@jjsjann123
Copy link
Collaborator Author

hmmm. can't seem to get the flaky CI to pass... I'll keep trying.

@jjsjann123
Copy link
Collaborator Author

hmm. looks like upstream/viable/strict shows failing tests (inductor) in HUD log. I tried to grab one that at least looks green across the column. 🤞

@facebook-github-bot
Copy link
Contributor

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@jjsjann123
Copy link
Collaborator Author

I think I need to bump this for import/land, but HUD showing all later commits with some failures (even the commit pointed by pytorch/viable/strict).
I'll keep an eye out for that. In the mean time, feel free to ping me if there's any specific commits that you want me to pull @davidberard98

@facebook-github-bot
Copy link
Contributor

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

1 similar comment
@facebook-github-bot
Copy link
Contributor

@davidberard98 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@davidberard98 davidberard98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stamp so I can attempt to land internally (don't use pytorchbot to merge this)

@jjsjann123
Copy link
Collaborator Author

stamp so I can attempt to land internally (don't use pytorchbot to merge this)

So excited!!! 🥳

@facebook-github-bot
Copy link
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@jjsjann123
Copy link
Collaborator Author

🥳

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged module: cpu CPU specific problem (e.g., perf, algorithm) module: dynamo module: inductor module: nvfuser NNC open source release notes: jit release notes category release notes: quantization release notes category skip-pr-sanity-checks triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants