RFC - Creating an openxla-nvgpu project #71

Closed
stellaraccident opened this issue Mar 28, 2023 · 28 comments · Fixed by #81

@stellaraccident
Contributor

stellaraccident commented Mar 28, 2023

Now that RFC - Proposal to Build IREE Compiler Plugin Mechanism has been implemented and is minimally working with both an in-tree and an early-adopter out-of-tree implementation, we are ready to propose the official creation of a new openxla-nvgpu project to house:

  • Compiler plugins aimed at vendor specific integrations for NVIDIA GPUs.
  • Runtime extensions for advanced use cases of NVIDIA parts.
  • Project scaffolding and development flow tooling.

Since this will be the first plugin project of note, we expect that to a certain extent, it will co-evolve with the mechanisms.

Proposed Directory Structure

openxla-nvgpu will have a similar directory layout to IREE, upon which it depends:

compiler/
  src/
    openxla_nvgpu/
      Dialects/
runtime/
  src/
    openxla_nvgpu/
build_tools/

The build setup will use bazel_to_cmake for consistency and interop with the upstream IREE project (now that iree-org/iree#12765 has taken the steps to allow it to be used out of tree). The corresponding IREE build macros will be extended as needed.

Dependencies

This project depends directly on:

  • IREE (compiler and runtime platform).
  • openxla-pjrt-plugin (framework interop via PJRT).

It transitively depends on:

  • LLVM/MLIR via IREE (will be disaggregated as a standalone dep at a future time to be determined).
  • xla via openxla-pjrt-plugin for plugin and runtime support.
  • stablehlo via IREE.

Releases

In the very short term, this project will not produce independent releases and will aim to be usable by project developers who are willing to build it from HEAD.

In the longer term, we would like to introduce a new top-level openxla-compiler releaser project which is responsible for packaging the platform (IREE) with stable versions of all supported vendor compiler plugins and making it available as a standalone binary release. Such a project would eventually depend on this one and would effectively be the plugin-aggregated release of the present-day iree-compiler packages (which will continue to be released as the "vanilla" platform without out-of-kind, platform-specific dependencies).

Also in the longer term, as the PJRT plugin support evolves, we anticipate releasing openxla-nvgpu-pjrt binary packages that can be used to interface NVIDIA GPUs to supported ML frameworks via pip install.

Versioning and CI

Adding this top-level project pushes us firmly into a "many-repo" layout for OpenXLA projects. This will be further reinforced as IREE's dependencies and build tools are disaggregated over time and the top-level releaser projects are established.

As part of this, we will introduce a side-by-side workspace layout where dependencies are found relative to each other via a common parent directory. Example:

iree/
xla/
openxla-pjrt-plugin/
openxla-nvgpu/
openxla-compiler-releaser/

Such a layout will be called an "OpenXLA workspace", and we will provide a sync script and CI tooling to help manage it. Each project will pin to green release tags in its parent (or another form of stable commit tracking) by maintaining a local metadata file of its openxla dependencies. The sync script will be both a simple way for developers to track known-good sync points for the workspace and a way for CIs to advance. There will be a CI bot which advances dependent projects to the next stable sync points automatically. We expect that for projects that already have a strong release cadence, like IREE, this will update pins to new nightly releases, while others will cascade from at-head commits.

This process of versioning will be developed over time with an eye towards being the one way to manage openxla project dependencies. It will likely be somewhat manual to start with. This will be derived in spirit from the PJRT plugin sync script and enhanced to provide better release tracking and version bump capabilities.

Benchmarking

Benchmarks of the NVIDIA toolchain will be largely inherited and leveraged from the IREE project but run independently so as to provide a continuous view of the performance deltas and characteristics of the platform-independent upstream and the vendor-specific downstream.

Next steps

As a very next step, the openxla-nvgpu project will be bootstrapped in the iree-samples repository. It will be relocated, with history, to a new git repository once this RFC has matriculated.

Project Ownership

The project will be set up as a collaboration between Google and NVIDIA, and per OpenXLA governance, will share maintainer responsibility between contributors from both companies with the goal of NVIDIA engineers taking on core maintainer responsibility as the project bootstraps and evolves.

@pjannaty

Fantastic! Looking forward to the collaboration.

@pjannaty

Will this sit directly under https://github.com/openxla as in openxla/openxla-nvgpu?

@stellaraccident
Contributor Author

Will this sit directly under https://github.com/openxla as in openxla/openxla-nvgpu?

Yes, that is what I'm proposing.

@pjannaty

cc @nluehr

@mjsML
Member

mjsML commented Mar 28, 2023

Does that translate to “write access”?

@stellaraccident
Contributor Author

Does that translate to “write access”?

Yes (which we already do on these repos for other contributors), but I tried to phrase it more generally. Being a "component maintainer" in OpenXLA parlance grants some other privileges in terms of overall openxla direction/management.

@ezhulenev
Member

What will be the C++ namespace for all the new code in this project? ::xla::...?

@stellaraccident
Contributor Author

What will be the C++ namespace for all the new code in this project? ::xla::...?

Not sure I would speak to "all of the new code", but historically, dialects and components of IREE take a namespace like ::mlir::iree_compiler::IREE::<COMPONENT>, which nests well by letting everything resolve components by IREE::. For the record, I don't love that it is all rooted under mlir but that has been that way for a long time and could be cleaned up.

If not extending that namespacing scheme, I'd encourage at least something similar: ::openxla_compiler::XLA. I've deviated from this a couple of times in the past and always regretted it and come back to normalize.
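For concreteness, a minimal sketch of the nesting convention described above, using C++17 nested-namespace syntax (the CUDNN component name is purely hypothetical):

```cpp
// Illustrative sketch only: how a vendor component nests under the existing
// IREE convention; the CUDNN component name is hypothetical.
namespace mlir::iree_compiler::IREE::CUDNN {

class CUDNNDialect;  // e.g. a vendor-specific dialect declaration

}  // namespace mlir::iree_compiler::IREE::CUDNN

// Elsewhere in IREE-hosted code, components resolve via the IREE:: prefix:
using NvgpuCudnnDialect = mlir::iree_compiler::IREE::CUDNN::CUDNNDialect;
```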

@joker-eph
Contributor

Proposed Directory Structure

openxla-nvgpu will have a similar directory layout to IREE, upon which it depends:

compiler/
  src/
    openxla_nvgpu/
      Dialects/
runtime/
  src/
    openxla_nvgpu/
build_tools/

I'm curious why the sub-directory names repeat "openxla" — that is, why compiler/src/openxla_nvgpu instead of compiler/src/nvgpu?
Even further, if the repository itself is all about nvgpu, why repeat it at all? What else will be inside compiler/src in this repo?

@stellaraccident
Contributor Author

stellaraccident commented Mar 30, 2023

It is just following the IREE convention of the include directory being rooted at src/ and wanting to arrive at fully qualified include paths (i.e. `#include "openxla_nvgpu/..."`).

I've more or less resigned myself to the fact that the least bad thing is to repeat yourself exactly once in service of having globally unique include paths.

@ezhulenev
Member

I'm not a big fan of the mlir top-level namespace, or of iree_compiler (since nvgpu is really just a project on top of the IREE compiler), and I don't like the openxla_compiler namespace either, and I don't have any suggestions :)

What if I want to write a custom VM module under runtime/src/openxla_nvgpu? In IREE it's mostly under the iree and iree::vm namespaces; should it be ::openxla or regular ::xla here?

@joker-eph
Contributor

Oh I missed the point about the include directory convention, makes sense!
(I believe this convention is in place in mlir-hlo as well, I remember a rationale doc for the path structure, I think @burmako originated it, I don't know if it is public)

@stellaraccident
Contributor Author

I'm not a big fan of the mlir top-level namespace, or of iree_compiler (since nvgpu is really just a project on top of the IREE compiler), and I don't like the openxla_compiler namespace either, and I don't have any suggestions :)

What if I want to write a custom VM module under runtime/src/openxla_nvgpu? In IREE it's mostly under the iree and iree::vm namespaces; should it be ::openxla or regular ::xla here?

Just throwing things out there... ::openxla::nvgpu? We're going to regret whatever we choose. Might as well at least not start with a namespace that is already used.

@ezhulenev
Member

I'd go with ::openxla, ::openxla::compiler, ::openxla::runtime, etc., and not mention nvgpu at all. Do we see a single project depending on multiple "openxla compilers", e.g. openxla-nvgpu and openxla-amdgpu? Or linking them into a single binary?

@stellaraccident
Contributor Author

I'm game to try it. Like I said, I'm pretty sure that whatever we pick, we'll come back in a few months with some code written and apply a bit of sed, but I do think we want globally unique names: there are many use cases where these will be linked together, for both recommended and unrecommended reasons. Let's not set ourselves up for accidental name collisions.

Also, with C++17, the cost of namespaces (in terms of keystrokes) is a lot lower.

@joker-eph
Contributor

I would be concerned about plugins and target-specific components redefining symbols in a shared namespace: I'm not sure what eliding the target from the namespace buys us in practice.
(Nesting compilers under ::mlir (as in ::mlir::mhlo:: for example) is nice in that you have an implicit using namespace mlir; (maybe just a workaround for the google style banning using namespace?), but we're still heavily using a specific sub-namespace to compartmentalize.)
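To make that collision concern concrete, a small hypothetical sketch (the registerTarget function and the amdgpu project are invented for illustration):

```cpp
// Hypothetical illustration of the collision concern: if two vendor plugins
// both defined a symbol in one flat shared namespace, e.g.
//   namespace openxla { void registerTarget(); }  // defined in openxla-nvgpu
//   namespace openxla { void registerTarget(); }  // defined in openxla-amdgpu
// the two definitions would collide when linked into the same binary.
//
// Keeping each project in its own sub-namespace keeps the symbols distinct:
namespace openxla::nvgpu {
void registerTarget();  // nvgpu's (invented) registration entry point
}  // namespace openxla::nvgpu

namespace openxla::amdgpu {
void registerTarget();  // amdgpu's (invented) registration entry point
}  // namespace openxla::amdgpu
```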

@stellaraccident
Contributor Author

maybe just a workaround for the google style banning using namespace

(ftr - we're not exporting such legacy rules to our new OSS projects)

@ezhulenev
Member

Another directory naming question, why compiler/src/openxla_nvgpu and not compiler/src/openxla/nvgpu?

Will we also have some kind of openxla-base (util, ...) for libraries shared between different "backends", or the plan to keep shared code in iree repo?

Will we depend on absl/tsl, e.g. for logging, Status, StatusOr inside the nvgpu compiler/runtime? Or will the compiler use LLVM logging (almost nonexistent) and the runtime use IREE logging (no idea what its status is)?

@ezhulenev
Member

What will be the license? Copyright 2023 The OpenXLA Authors ... Licensed under the Apache License v2.0 with LLVM Exceptions.

@stellaraccident
Contributor Author

Will we also have some kind of openxla-base (util, ...) for libraries shared between different "backends", or is the plan to keep shared code in the iree repo?

That is being discussed, but we are biasing towards getting moving at the moment over busting out dependencies. We have some work to do to get the dev/versioning workflow going for the level of things we have, and I'd rather get some more mileage on that before we go too crazy with a lot of repositories.

Will we depend on absl/tsl, e.g. for logging, Status, StatusOr inside the nvgpu compiler/runtime? Or will the compiler use LLVM logging (almost nonexistent) and the runtime use IREE logging (no idea what its status is)?

The further "core-ward" we go, we have no plan to depend on absl/tsl, and I would be somewhat resistant to doing so because they have both proven to be problematic (so much so that we excised them after thinking "how bad could it be?").

Concrete thoughts...

I don't think that we should be mixing universes in the compiler code; we need to "build up" from LLVM rather than graft in other base libraries.

The runtime code for nvgpu has some more give to it from a dependency standpoint, but for the level of things expected to be in there, I would like to avoid the complexity that comes from taking complicated deps if possible.

Some of this stuff is preference and some has proven to be more trouble than it is worth in the past... The hard line that we can't cross in this repo is that dependencies must be thin and must have a well supported CMake build.

@stellaraccident
Contributor Author

Another directory naming question, why compiler/src/openxla_nvgpu and not compiler/src/openxla/nvgpu?

I don't have a preference.

@sherhut
Contributor

sherhut commented Mar 31, 2023

Never waste a good opportunity to bike-shed 😄

Logging, as brought up by @ezhulenev, caught my eye. For the compiler, I would suggest we try to use the diagnostic handler infrastructure as much as possible and log warnings/errors there. That will force us to provide messages with good context.
For "trace what this does" debugging use cases, I agree that LLVM_DEBUG is not great but will work for now. We will need to replace it later with something that has severity levels and other bells and whistles, but that can be done. I am used to the VLOG interface, and we could have a shim that maps it to LLVM_DEBUG if we decide to care about it already. Maybe IREE wants to provide this akin to what tsl does, so that different integrations can swap in what fits their needs. Depending on tsl just for that seems a bit heavy.
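As a rough sketch of what that split could look like (the openxla::nvgpu namespace and both function names are hypothetical; the MLIR diagnostic API and LLVM_DEBUG are the real pieces being discussed):

```cpp
#include "llvm/Support/Debug.h"
#include "llvm/Support/raw_ostream.h"
#include "mlir/IR/Operation.h"
#include "mlir/Support/LogicalResult.h"

#define DEBUG_TYPE "openxla-nvgpu"

namespace openxla::nvgpu {

// User-facing problems go through the MLIR diagnostic infrastructure so they
// carry source locations and can be handled uniformly by a diagnostic handler.
mlir::LogicalResult checkSupported(mlir::Operation *op, bool supported) {
  if (!supported)
    return op->emitError() << "operation not supported by the nvgpu backend";
  return mlir::success();
}

// Developer-facing tracing uses LLVM_DEBUG for now; a severity-aware,
// VLOG-style shim could later be layered on top of the same mechanism.
void traceLowering(mlir::Operation *op) {
  LLVM_DEBUG(llvm::dbgs() << "nvgpu lowering: " << op->getName() << "\n");
}

}  // namespace openxla::nvgpu
```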

Regarding namespaces, I agree with @stellaraccident: let's bias towards progress rather than the perfect choice for now. Having said that, I personally would use openxla::compiler::nvgpu for the CUDA compiler. We will have more compilers sooner rather than later, and having them in different namespaces helps my mental model of where code belongs. Might be xla bias, and I have no strong opinion. Just some 🚲 🛖 🖌️

Excited to see this project spin up!

@stellaraccident
Contributor Author

This has been a heavy overhead week for me but I should finally get some coding time this afternoon, and since I can probably bootstrap the project somewhat efficiently, I'll take a stab at that. As noted, I'll stage it in an iree-samples directory first and will then hand off to someone Google-side to create the repo (which requires a bit of red tape).

@stellaraccident
Contributor Author

All right... the above two commits seem to get me most of the way there. Things build, etc. Was a bit of a slog.

stellaraccident pushed a commit to iree-org/iree that referenced this issue Apr 3, 2023
…12888)

* Adds CMake scoped IREE_PACKAGE_ROOT_DIR and IREE_PACKAGE_ROOT_PREFIX
to replace hard-coded path to namespace logic in iree_package_ns (and
uses within IREE/removes the special casing).
* Adds support for `BAZEL_TO_CMAKE_PRESERVES_ALL_CONTENT_ABOVE_THIS_LINE` to
bazel_to_cmake. Been carrying this patch for a while and ended up needing it.
* Further generalizes bazel_to_cmake target resolution so that it is
customizable as needed out of tree.
* Moves the `iree::runtime::src::defs` target to `iree::defs` and puts
it in the right place in the tree to avoid special casing.
* Ditto but for `iree::compiler::src::defs`
* Adds a bit more logging to `_DEBUG_IREE_PACKAGE_NAME` mode.
* Makes iree_tablegen_library consult a scoped
`IREE_COMPILER_TABLEGEN_INCLUDE_DIRS` var for additional include
directories (makes it possible to use out of tree).
* Adds `NVPTXDesc` and `NVPTXInfo` targets to HAL_Target_CUDA. No idea
why this was triggering for me but was getting undefined deps. Must have
been coming in elsewhere in a more full featured build.
* Fixes iree-opt initialization sequence with respect to command line
options. Also fixed the test which should have been verifying this.
* Fixed pytype issue in bazel_to_cmake that could theoretically happen.

Fixes build issues related to the out of tree build for
openxla/community#71
@ezhulenev
Member

And an initial cuDNN custom module that does nothing related to cuDNN yet: https://github.com/iree-org/iree-samples/pull/123/files

stellaraccident added a commit to iree-org/iree-experimental that referenced this issue Apr 4, 2023
@stellaraccident
Contributor Author

Both of the initial commits have landed.

@theadactyl Can we get someone with admin rights on the org to create the openxla-nvgpu repo? It should be populated with https://github.com/iree-org/iree-samples/tree/main/openxla-nvgpu

@GMNGeoffrey
Contributor

Coming in late and I think just agreeing with what's already been decided, but I have strong positions on: not taking an absl or tsl dep in the compiler, and scoping the [sub]namespace to the project (so not just "iree" or "openxla" or "compiler").

I do think that the Google (Titus) advice on not having deeply nested namespaces is pretty good and not just a weird Google thing: https://abseil.io/tips/130. Every level of nesting gives us a fun new place for collisions. So I would vote for something like openxla_nvgpu as a top-level namespace. Probably nesting it under mlir would be fine, since we shouldn't define things that conflict with MLIR anyway. I think there was at some point a limitation of ODS that meant you had to define things in the mlir namespace, but I also suspect that this is a relic of the Google C++ style guide ban on using namespace mlir, which while not completely misguided IMO, seems excessively strict. using namespace mlir seems better than sneakily getting that by just defining everything nested under that namespace. I guess the difference is that you can only do the latter for one namespace. I think we can probably limit ourselves to using namespace mlir and using namespace llvm and then not have to nest our namespaces, unless there's some other reason to do so.
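A hedged sketch of the flat, project-scoped namespace approach argued for here, with using-directives pulling in mlir and llvm (the helper function is invented for illustration):

```cpp
#include "llvm/ADT/SmallVector.h"
#include "mlir/IR/BuiltinOps.h"

namespace openxla_nvgpu {

// Inside the single project namespace, pull in the two big upstream
// namespaces once rather than nesting our own code under ::mlir.
using namespace mlir;
using namespace llvm;

// Upstream types resolve unqualified; our own symbols stay scoped to the
// single, globally unique openxla_nvgpu namespace.
SmallVector<Operation *> collectOps(ModuleOp module) {
  SmallVector<Operation *> ops;
  module.walk([&](Operation *op) { ops.push_back(op); });
  return ops;
}

}  // namespace openxla_nvgpu
```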

@ezhulenev
Member

I don't like the openxla_nvgpu namespace, because presumably we'll have some kind of openxla-base shared by all openxla compilers, and it's nice to share a namespace to skip qualifying all imports.

+HUGE to using namespace, but only for a small number of things (mlir and llvm).

GMNGeoffrey pushed a commit to iree-org/iree-nvgpu that referenced this issue Apr 4, 2023
jpienaar pushed a commit to iree-org/iree that referenced this issue May 1, 2023
stellaraccident added a commit that referenced this issue May 25, 2023
Both repositories have long since been created and are their own top-level modules now.

Fixes #72
Fixes #71
NatashaKnk pushed a commit to NatashaKnk/iree that referenced this issue Jul 6, 2023