Start adding Python bindings for RTL dialect. #767

Merged: 12 commits merged into main from python-rtl-bindings on Mar 24, 2021
Conversation

mikeurbach
Contributor

Sharing work in progress here, but I can't actually get this to work. In the test case, I can build an RTL constant op with the line:

rtl.ConstantOp(i32, a50)

But when I go to print out the operations, I get this error:

dialect has no registered attribute printing hook
UNREACHABLE executed at /home/mikeurbach/alloy/external/llvm-project/mlir/include/mlir/IR/Dialect.h:89!
Aborted (core dumped)
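For reference, here's roughly the whole test (a sketch; import paths, the std.ConstantOp line, and constructor details only approximate what's in this branch):

```python
# Rough sketch of the failing test; exact module/import names may differ from
# this branch and the upstream MLIR Python bindings.
import circt
from mlir.ir import (Context, InsertionPoint, IntegerAttr, IntegerType,
                     Location, Module)
from mlir.dialects import std
from circt.dialects import rtl

with Context() as ctx, Location.unknown():
  # Register the CIRCT dialects (rtl, etc.) with this context.
  circt.register_dialects(ctx)

  i32 = IntegerType.get_signless(32)
  a50 = IntegerAttr.get(i32, 50)

  module = Module.create()
  with InsertionPoint(module.body):
    std.ConstantOp(i32, a50)  # Prints fine when the rtl op below is removed.
    rtl.ConstantOp(i32, a50)  # Builds fine...

  # ...but printing aborts with "dialect has no registered attribute printing hook".
  print(module)
```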

Some notes:

  • I also tried going through ctx.dialects.rtl.ConstantOp to build the op, but that leads to the same error
  • If I remove the line to build an rtl.ConstantOp, the test passes and prints out the std.ConstantOp no problem
  • If I remove the call to circt.register_dialects, and instead say ctx.allow_unregistered_dialect = True, it will correctly print out the rtl.ConstantOp in generic form
  • I was really confused by the error and couldn't figure out how to get a backtrace, so I hacked up Dialect.h to print the dialect namespace in that assertion... and it printed the empty string, which I think indicates printAttribute was invoked on the Standard dialect?
  • I've also tried registering the dialects on the context with a call like this: https://github.com/llvm/mlir-npcomp/blob/main/lib/CAPI/Registration.cpp#L16, or like this: https://github.com/llvm/llvm-project/blob/main/mlir/lib/CAPI/Registration/Registration.cpp#L15, directly inside the register_dialects method in CIRCTModule. This leads to the same error, and the registration calls in the current revision seem more in line with layering Python bindings on the CAPI, so that's what I've been testing.

I'm pretty stumped about what's going on. Since the error literally mentions dialects and registration, I'm pattern-matching this as a dialect registration issue. But I'm able to build ops no problem, and it only seems to be an issue when I try to print them. Not sure if that is any clue...

I've been trying to see what the Standard dialect does that our RTL dialect doesn't. I think I've got all the bindings set up the same, so I'm wondering if the handling in the MLIR Python bindings is subtly doing something differently for in-tree dialects versus ones we define. It seems like the MLIR Python bindings always just load the context with all the in-tree dialects on construction: https://github.com/llvm/llvm-project/blob/main/mlir/lib/Bindings/Python/IRModules.cpp#L460

@GeorgeLyon not sure if you've seen anything like this, but does it ring a bell?
@stellaraccident I believe this should be a minimal Python dialect bindings setup. If you have any free cycles, do you mind giving a once over to sanity check this?

@stellaraccident
Contributor

stellaraccident commented Mar 14, 2021

I suspect this may be the dynamic linking "ODR" bug rearing its head again. The symptom is a downstream effect of the TypeID for a dialect not matching, depending on which module is doing the query. The visible effect is often falling off the end of an unreachable block. If that is the case, it won't make any sense from a looking-at-the-code perspective.

Are you building with BUILD_SHARED_LIBS or with the DYLIB options (i.e. libMLIR)? For the former, I've been looking for a solid repro and suspect relative rpath canonicalization issues causing the dynamic linker to think that two instances of the same object are different. The latter is fairly broken in a few subtle ways.

You may want to try a variant of the hack in npcomp: https://github.com/llvm/mlir-npcomp/blob/4fd9b4afb526692ec4943633019eaa864315240b/python/npcomp/__init__.py#L1
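That hack essentially turns on RTLD_GLOBAL before the native extension modules get imported, so the dynamic loader resolves symbols (and thus TypeIDs) globally. A rough sketch of the same idea, placed at the very top of the Python package's __init__.py (the real npcomp code may differ in details):

```python
# Sketch of the RTLD_GLOBAL workaround (POSIX-only): widen dlopen flags before
# any native extension is loaded so all of them see the same dialect objects.
import sys

if hasattr(sys, "setdlopenflags"):
  import os
  sys.setdlopenflags(sys.getdlopenflags() | os.RTLD_GLOBAL)
```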

There are some relatively long-term fixes I'm working on upstream to help the dynamic linking situation. If enabling global ld loading fixes it, that is a good sign this is the same vague-linkage issue I'm referring to. I can also pull your branch and work on it (more evidence may help me further isolate the issue). I've not built circt before: is it pretty self-explanatory?

@mikeurbach
Contributor Author

I've been using the DYLIB options, but I can try the BUILD_SHARED_LIBS approach and see what happens.
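Roughly, the difference is just the linking flags passed to the LLVM/MLIR CMake configure (illustrative; the rest of the options follow our normal build scripts):

```sh
# DYLIB configuration (what I had been using): link against the aggregate
# libLLVM/libMLIR, e.g. -DLLVM_BUILD_LLVM_DYLIB=ON -DLLVM_LINK_LLVM_DYLIB=ON
# plus the MLIR equivalent.
# BUILD_SHARED_LIBS configuration (what I'll try next): per-component shared libs.
cmake -G Ninja ../llvm \
  -DLLVM_ENABLE_PROJECTS=mlir \
  -DBUILD_SHARED_LIBS=ON \
  -DCMAKE_BUILD_TYPE=Debug
```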

I missed that hack in npcomp, I will try that out as well.

If you want to build circt, we try to keep this up to date: https://github.com/llvm/circt/blob/main/docs/GettingStarted.md, and I hope it's unsurprising. I'm currently building mlir with circt and npcomp both as external projects, but I've also used the documented approach fairly recently.

@stellaraccident
Contributor

stellaraccident commented Mar 14, 2021 via email

@mikeurbach
Contributor Author

I think that makes sense. Thanks for the insight.

I just built this with BUILD_SHARED_LIBS instead, and that works. Hope this is a good data point... I'm still catching up on these issues.

@stellaraccident
Contributor

> I think that makes sense. Thanks for the insight.
>
> I just built this with BUILD_SHARED_LIBS instead, and that works. Hope this is a good data point... I'm still catching up on these issues.

Thanks, yes, that is a good data point. The rework to shared linking upstream basically builds the aggregate dylibs using the same mechanism as BUILD_SHARED_LIBS, which maintains the dependency graph appropriately (libLLVM/libMLIR does not).

I've been meaning to also switch npcomp to this mode. Good to know it works.

This adds a -shared-libs option to llhd-sim that is analogous to the
same option for mlir-cpu-runner. If specified, those libraries are
passed to the ExecutionEngine to dynamically load and link.
@teqdruid
Contributor

I had to install numpy to get LLVM to configure with CMake after your build-llvm.sh change. That's fine, but it should be noted somewhere. After that, I can compile and the test runs! (Why is this an integration test? It is a targeted test of basic functionality.) I'm compiling both CIRCT and LLVM with BUILD_SHARED_LIBS.

Really exciting!

@mikeurbach
Contributor Author

Thanks for taking a look.

The build-llvm change should go away before merging this, or at least that flag shouldn't always be set in there and numpy should only be required if you want to build python bindings. I'll make sure to document that dependency regardless.

The integration test is really a placeholder. This is the simplest test to check that the RTL dialect is registered. Eventually I'd like to be creating and wiring together rtl.modules and whatnot, and then calling ExportVerilog. In general, we could move these to be regular tests, but I thought we were just going to run them nightly for now.

Just creating an rtl.module is pretty clumsy at the moment, and I have a separate patch to add a builder with a small Python extension. I was thinking one thing at a time (at least until the integration test actually runs), and then I'll follow up with a better test as more support comes in.

@mikeurbach
Contributor Author

FYI, sorry for the apparent lack of progress here. This does work! But it currently doesn't pass the integration tests: https://github.com/llvm/circt/runs/2167346388?check_suite_focus=true. Something about where the MLIR Python sources get installed versus the CIRCT Python sources is off. I am hoping to get this wrapped up today or tomorrow.

mikeurbach force-pushed the python-rtl-bindings branch 7 times, most recently from b4f30b5 to 223e69b, on March 23, 2021 04:02
mikeurbach force-pushed the python-rtl-bindings branch 10 times, most recently from ebfab74 to 8e98e7a, on March 23, 2021 06:50
@mikeurbach
Contributor Author

mikeurbach commented Mar 23, 2021

I can run the Python integration tests locally, and run them with the Docker-based local integration testing setup.

However, in GitHub Actions they continue to fail. It looks like the MLIR Python bindings can't be found.

They fail when trying to import an MLIR Python module: https://github.com/llvm/circt/runs/2172901557?check_suite_focus=true#step:11:124

I made sure MLIR was building with Python bindings enabled, and busted the cache key in this branch so that the Python modules and other new files would be installed. It sure looks like the Cache LLVM action installed them: https://github.com/llvm/circt/runs/2167346388?check_suite_focus=true#step:6:8393

But the test continues to fail with that error. I added ls -alR llvm/install to the GitHub workflow to try and see what was going on, and sure enough, there is no python directory in there: https://github.com/llvm/circt/runs/2172901557?check_suite_focus=true#step:8:26

What gives? It seems like our Cache LLVM step is caching the whole llvm directory, so I would expect the Python bits to be installed. This branch uses a different cache key for nightly integration tests versus the regular tests, so I would think there is no conflict there. @teqdruid @youngar I think you guys have the most experience with this workflow, do you have any idea why the python directory is missing from llvm/install after the cache is restored?

EDIT: I found that in these workflow runs, LLVM is being restored from a different cache key than what saved it: https://github.com/llvm/circt/runs/2177073441?check_suite_focus=true#step:6:11

Sorry for the noise... I am updating that.

@teqdruid
Contributor

It's always the dumb stuff that gets ya. My experience is that when you hit this sort of thing ("this should work, damnit!"), it's best to take a break and then come back to it fresh... Tougher for me to do than it sounds.

I think we cache too much. We should either cache the build or install directory. I've had success building locally against the build/ directory but the install directory would be smaller and should have everything we need. The Windows build cache only caches build/ and install/ (since restore on Windows has trouble overwriting files). I've been too busy/lazy to make this change for the Linux builds.

@mikeurbach
Contributor Author

> it's best to take a break and then come back to it fresh

Yep, a cup of coffee and fresh eyes was all it took.

> I think we cache too much. We should either cache the build or install directory.

I think it will work against the install directory. I am going to open a separate PR for some general GitHub workflow improvements. In my wee-hours testing, I figured out that these CMake hints aren't actually doing anything:

-DMLIR_DIR=../llvm/install/lib/cmake/mlir/ \
-DLLVM_DIR=../llvm/install/lib/cmake/llvm/ \

The ../ is not needed. I believe that is the root cause of why llvm-lit wasn't being found, and why we had to stick with Ubuntu 18.04 for now in this workflow. When I go update that, I can also change it to just cache the install directory.
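For reference, the corrected hints should presumably be the same paths minus the leading ../ (assuming they are resolved relative to where the workflow invokes CMake):

```sh
-DMLIR_DIR=llvm/install/lib/cmake/mlir/ \
-DLLVM_DIR=llvm/install/lib/cmake/llvm/ \
```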

@teqdruid
Contributor

Thanks for dealing with the build flows!

@mikeurbach
Contributor Author

Moving the build changes unrelated to Python bindings over here: #810.

@mikeurbach
Contributor Author

@teqdruid I think this is ready to go. Nightly tests pass on this branch, and for the legs of the configuration matrix where BUILD_SHARED_LIBS is ON, we run the Python tests (this is another reason they are integration tests--I didn't want to impose build restrictions on our main PR check). Do you mind taking another look when you get a chance?

Regarding the NumPy dependency and the change in build-llvm.sh: I documented this (by referencing the upstream docs) and made the build-llvm.sh change generic. People should be able to keep running it as before to the same effect, and now you can also pass in any extra CMake definitions (like -DCIRCT_BINDINGS_PYTHON_ENABLED=ON). Curious what you think about this; I took inspiration from the mlir-npcomp script here. I'm no bash expert, but this seemed decent. I tried it out locally, but didn't think it was worth re-building the nightly workflow to test it.
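For example, something like this should now work (the script path is from memory, and the exact argument handling is whatever the updated script accepts):

```sh
# Same effect as before when run with no extra arguments; any extra -D flags
# are forwarded to the CMake configure.
./utils/build-llvm.sh -DCIRCT_BINDINGS_PYTHON_ENABLED=ON
```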

Since the "integration" test is so bare, here are some plans for the next steps, which I'd submit as separate patches:

  1. Add a Python extension to customize RTL module building, and add that to the test.
  2. Add bindings for the SV dialect. This should be nice and clean after some of the boilerplate in this PR, and could serve as an example of how to add bindings for other dialects (as we add more C APIs for them).
  3. Add bindings for ExportVerilog, including tests.

That's about as far as we can get in the "add Python bindings to the C API" department, then we can talk about pass managers and actually driving passes through Python.

@teqdruid (Contributor) left a comment

Looks great! Thanks for pushing this through!

@@ -0,0 +1,15 @@
//===-- RTLOps.td - Entry point for RTLOps bindings --------*- tablegen -*-===//
teqdruid (Contributor)

Is this file going to grow for the RTL dialect? If not, we will probably want to stuff all the dialects in here. No need to change now, just food for thought.

mikeurbach (Contributor Author)

It may grow a bit, but you are right, this is probably about it. However, unless something changes upstream, we will probably want to have several small files like this. This is the PYTHON_BINDING_TD_FILE mentioned here: https://mlir.llvm.org/docs/Bindings/Python/#integration-with-ods. I would expect one small one per dialect, similar to the .td files upstream: https://github.com/llvm/llvm-project/tree/main/mlir/lib/Bindings/Python

mikeurbach merged commit 9b2331e into main on Mar 24, 2021
mikeurbach deleted the python-rtl-bindings branch on March 24, 2021 02:13