Add python bindings for lighthouse #3
Conversation
FYI, you probably want to try this: https://github.com/llvm/llvm-project/blob/main/mlir/examples/standalone/pyproject.toml
Thanks @Groverkss!
Could you include build instructions somewhere, as well as the most basic of tests, to demonstrate what has been built and is now available for use from Python, e.g. the mlir bindings and which torch-mlir libs.
extern "C" { | ||
#endif | ||
|
||
/// Initializes poseidon. |
What's poseidon?
oops haha
# separation between the compiler and the runtime. The runtime bindings have no
# reason to ship with the mlir python bindings.
# TODO: Add an upstream cmake param for this vs having a global here.
add_compile_definitions("MLIR_PYTHON_PACKAGE_PREFIX=lighthouse.")
Can we please just use the mlir. prefix and instead namespace everything specific to lighthouse as mlir.lighthouse? I did it in tpp-mlir, i.e. with the mlir.tpp namespace and downstream dialects just ending up in mlir.dialects: https://github.com/libxsmm/tpp-mlir/blob/main/python/CMakeLists.txt
That every project is renaming the mlir namespace along with its upstream bindings is atrocious, IMHO. (So is the need for downstreams to build the full upstream bindings just to provide minor extensions - to be fixed.)
Why would we use the mlir prefix here? MLIR is a toolkit, not a compiler; we are trying to build a compiler. At most, I would say lighthouse.mlir is a better prefix than this. We can't just go and take over mlir's namespace. I disagree with mlir.lighthouse.

> That every project is renaming the mlir namespace along with its upstream bindings is atrocious, IMHO.

I think there is good reason to rename it. Say you have two projects on different versions of mlir, both trying to work in the same namespace. How are you going to guarantee that they work together? That just sounds like a disaster.
This is not an imaginary scenario. Say you decide to ship two PyTorch Dynamo implementations, lighthouse-1 and lighthouse-2. They don't depend on each other at all, but you ship them under the namespaces mlir.lighthouse1 and mlir.lighthouse2. Now, if your mlir versions don't match up, everything breaks. There is no dependency between these packages, but you still break.

> So is the need for downstreams to build the full upstream bindings just to provide minor extensions - to be fixed.

I don't know what you are talking about here. Clearly, the only bindings built in this PR are for the func dialect and mlir core. I don't know what "the full upstream bindings" refers to; we need all of this.
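To make the collision concern concrete, here is a small self-contained sketch (the project names and commit labels are made up; it only demonstrates Python-level package shadowing, nothing MLIR-specific):

```python
# Two hypothetical projects each ship their own copy of a package named "mlir",
# built against different LLVM commits. Python can only resolve one of them.
import sys
import tempfile
from pathlib import Path

root = Path(tempfile.mkdtemp())
for project, commit in [("project_a", "llvm-commit-X"), ("project_b", "llvm-commit-Y")]:
    pkg = root / project / "mlir"
    pkg.mkdir(parents=True)
    (pkg / "__init__.py").write_text(f"BUILT_AGAINST = {commit!r}\n")

# Both install locations end up on sys.path...
sys.path[:0] = [str(root / "project_a"), str(root / "project_b")]

# ...but "import mlir" picks exactly one copy; project_b silently gets project_a's.
import mlir
print(mlir.BUILT_AGAINST)  # -> 'llvm-commit-X'
```

With a distinct prefix such as lighthouse., each project imports the copy of the core bindings it was actually built with.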
extras/fx_importer.py | ||
extras/onnx_importer.py |
Where are these files?
That also brings to mind whether these build files have been tested in any way.
This is how the fx_importer works and is meant to work. torch-mlir exposes these bindings, and they only depend on mlir core (which you bring from somewhere) and the func dialect. When building your own aot.compile, you get a torch.export.ExportedProgram and send it down the fx_importer (see the sketch below).
Similar for onnx.
The fx_importer sources assume that the mlir core IR lives in ..ir: https://github.com/llvm/torch-mlir/blob/main/python/torch_mlir/extras/fx_importer.py#L95

> That also brings to mind whether these build files have been tested in any way.

I need to add tests for these things, but I'm currently only working on this on weekends and whatever time I have. We don't have anything concrete set up, so I'm mainly trying to get things in, then build an aot.compile flow on top, and then work on testing. I can add tests for checking that things are correctly imported, though.
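A rough, untested sketch of that flow, assuming torch-mlir's FxImporter API (FxImporter, import_frozen_program, .module) is re-exported unchanged under lighthouse.extras and that the lighthouse-prefixed core bindings satisfy its ..ir imports; it is only meant to illustrate the shape of an aot.compile entry point:

```python
import torch
from lighthouse import ir  # assumes the lighthouse. prefix from this PR
from lighthouse.extras.fx_importer import FxImporter


def import_exported_program(prog: torch.export.ExportedProgram) -> ir.Module:
    """Lower a torch.export.ExportedProgram to MLIR via the FX importer."""
    with ir.Context() as ctx:
        importer = FxImporter(context=ctx)
        importer.import_frozen_program(prog)  # emits MLIR into importer.module
        return importer.module


class Double(torch.nn.Module):
    def forward(self, x):
        return x * 2.0


exported = torch.export.export(Double(), (torch.ones(4),))
print(import_exported_program(exported))
```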
Rather than using submodules, can we instead use files with specified commits?
TPP-MLIR has the following: https://github.com/libxsmm/tpp-mlir/blob/main/build_tools/llvm_version.txt
We have found this pattern to give more flexibility, especially since we expect people to extend lighthouse in various ways, bringing in their own unusual dependencies and their own ways of managing them.
> Rather than using submodules, can we instead use files with specified commits?

Why? That's just .gitmodules with extra steps.

> We have found this pattern to give more flexibility, especially since we expect people to extend lighthouse in various ways, bringing in their own unusual dependencies and their own ways of managing them.

You do an out-of-tree build to extend it, or you fork it. I don't see what you mean by bringing in your own dependencies: if you fork, you can add whatever submodules you want. The build system can support BYOLLVM; I just didn't set it up because I want something basic to start with.
# If building features that require Python development, find them early in
# one invocation (some CMake versions are sensitive to resolving out of order).
#-------------------------------------------------------------------------------
option(LIGHTHOUSE_BUILD_PYTHON_BINDINGS "Builds the Lighthouse python bindings" ON)
Can we have one "test" file to document what we expect to be built? E.g. just a Python file containing from lighthouse import ir, or rather from mlir import ir, lighthouse.
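For reference, a minimal version of such a file could look like the sketch below, assuming the lighthouse. prefix used in this PR; the exact module paths (in particular lighthouse.dialects.func and lighthouse.extras.*) are assumptions about how the packages end up laid out, and importing the extras requires torch/onnx to be installed:

```python
# smoke_test.py -- check that the bindings built by this PR are importable.
from lighthouse import ir
from lighthouse.dialects import func
from lighthouse.extras import fx_importer, onnx_importer

# One trivial use of the core IR beyond "the imports resolve".
with ir.Context():
    assert ir.IntegerType.get_signless(32).width == 32

print("lighthouse python bindings OK")
```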
Yes, I need to add a test before I land this; I will do that.
FetchContent_Declare(
  nanobind
  GIT_REPOSITORY https://github.com/wjakob/nanobind.git
  GIT_TAG 0f9ce749b257fdfe701edb3cf6f7027ba029434a # v2.4.0
There's movement upstream on nanobind versions. I think llvm/llvm-project#161230 is most pertinent. Before landing, let's make sure we are aligned with what upstream is doing/about to do.
We land and then integrate the commit and change. This is the correct way of doing it until that PR lands; once it lands, we can integrate it.
As @makslevental has been making a lot of progress on the Python bindings lately -- thank you!! -- maybe he has a better perspective on setting up a new downstream project that extends upstream. E.g., I would like lighthouse to be an out-of-tree build and to rely on the upstream Python bindings for the upstream dialects, built with the rest of MLIR. Any new dialects or C++ MLIR code that needs bindings for lighthouse -- god forbid we need them -- should preferably be a separate small .so that we can load alongside the upstream bindings. @makslevental, do you have any suggestions?
Just so I'm not assuming things, can you give me a user story? What I think you're saying is that you want: [...]
Is that accurate?
Yes, essentially this! As MLIR is still quite unstable, I am okay with users needing to fiddle to get the exact versions (i.e. commits) of [...]. As a developer of a non-[...].
*: modulo needing to make sure the API I am using is correct for the specific commit of llvm-project.
Yeah, too bad it's currently impossible because of how symbol resolution happens between the bindings code (the Python C extensions with nanobind code) and the MLIR (C) APIs. I can explain more shortly, but I just wanted to quickly say that it is indeed impossible right now. The rest I agree with you on completely - it's what I wanted when I started a couple of years ago.
Here's what it would take: llvm/llvm-project#161782 (i.e. hardening that PR).
I'm not sure when we decided this is the plan for lighthouse. I disagree with a lot of this.
We will absolutely need new dialects. We will absolutely need to expose them as bindings. We cannot rely on upstream binding wheels (which upstream doesn't even ship! Maks is kind enough to build and ship them for us).
That's a weird way of handling dependencies. Let's say an mlir commit completely breaks lighthouse and we need at least 1-2 weeks of volunteer time to fix it. We cannot ask upstream to revert the commit, and we also cannot get stuck in a cycle of not integrating new commits because one commit broke us. Instead, we locally revert it in our fork and keep iterating while someone works on undoing the local revert. If we rely on upstream builds for this, it is never going to work.

Also, I'm very scared of how we will handle version mismatches this way. My understanding of lighthouse is as a compiler; if we build a compiler, we build everything with it and ship it. Do clang / rust ship a version where they ask you to download a specific version of llvm and then use clang with it? It just sounds like a version mismatch waiting to happen.

I'm not an expert on this Python stuff; I just know the things I'll need to make the export path work. I'm sure @makslevental can correct me and has a better way, but all in all, this is a basic setup to start building a proper torch.export flow. If we want to improve it, we can land and iterate. But let's not stop because a simple setup isn't the "ideal" setup you want.
The original design was for lighthouse to be a facilitator, not a compiler on its own. We're not trying to replicate iree/tpp-mlir here; we're trying to help existing (and new) MLIR-based compilers reuse (or repeat) the infrastructure needed to use MLIR in a compiler. If we create a structure here that is not only opinionated (that's fine) but hard-coded to a particular path, we'll make it harder for people to reuse, and we'll limit their ability to repeat it (i.e. copy and paste). That's not a good design. The shared-object method is one that has been discussed in upstream MLIR for years and is already in use, including in LLVM, so it's not a new thing. If there's a better way, I'm all ears. The main requirement is to be able to test different downstream compilers on the same lighthouse project just by adding components to it (say, a branch or a fork), without having to change CMake / submodules / glue code at all.
One of the key design principles in the forum discussions was that we most certainly don't want to have to create new dialects in lighthouse. If we need something from MLIR that it doesn't currently do, we make the case for it and do the work there, not here. We may need to create opinionated infrastructure and glue code, but such code needs to be as independent as possible (including using shared objects as implementation details).
I want to get to a place where we most definitely should ask upstream to fix it, and if not, revert it. This is already the case for the LLVM test-suite, and is one of the key reasons why we created the lighthouse project inside the LLVM organization.
If you treat lighthouse as yet another downstream, then it becomes scary indeed. But that's the wrong way of looking at it. This project is meant to be as canon as the LLVM test-suite in the first instance, and as clang in the long term.
Not at all. Lighthouse is a framework for building and testing compilers, bundled with a naive compiler plugin infrastructure as an example. If you want to build a new compiler, you can use that as a starting point. If you want to test your existing compiler, you can plug in using the same framework.
Clang is one compiler. LLVM also integrates with a huge number of other compilers, and the LLVM test-suite can test all of them (provided you can compile the tests). As expressed in the README, the first stage is to allow testing of different components and compilers; then we create some naive compiler example (in plugin form) while maintaining the ability of other compilers to co-exist. Only if all of that works and we form a community around it (like clang did) do we begin talking about stronger structural guarantees (as clang does). But that's probably a decade away, so I would not worry about it now.
Creates a simple build system that produces a CAPI library and Python bindings for lighthouse. It starts with only MLIR core IR and the MLIR func dialect, and exports the torch-mlir fx_importer/onnx_importer under lighthouse.extras.
I'm not an expert on CMake, so this is just a starting point; someone with more CMake knowledge can probably do a much better job.
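For completeness, a sketch of what the "core IR + func dialect" surface enables from Python once built, again assuming the lighthouse. prefix (this mirrors the upstream mlir bindings API and has not been tested against this PR):

```python
from lighthouse import ir
from lighthouse.dialects import func

# Build a trivial module with one func.func through the lighthouse-prefixed bindings.
with ir.Context(), ir.Location.unknown():
    module = ir.Module.create()
    with ir.InsertionPoint(module.body):
        f32 = ir.F32Type.get()

        @func.FuncOp.from_py_func(f32)
        def identity(x):
            return x

print(module)
```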