refactor(cli): embed MLIR/LLVM codegen into single hew binary #261
Merged
Replace the standalone hew-codegen executable with an embedded C++ MLIR/LLVM
backend linked directly into the hew Rust binary via a thin C API. The
compiler is now a single self-contained binary: parse, type-check, lower to
LLVM IR, and link — no child process, no pipe, no second binary to ship.
Architecture:
- hew-cli/build.rs invokes CMake to compile C++ into libHewCodegenCAPI.a
- CMake generates a .cargo file with link directives that Cargo reads
verbatim — zero parsing, CMake is sole authority on what to link
- HEW_EMBED_STATIC=1 (release) statically links all MLIR + LLVM + libc++
into a self-contained binary; dev mode uses shared linking for fast
incremental builds
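The hand-off described in the second bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's actual build.rs: the file name `link_directives.cargo` and the helper `forward_directives` are hypothetical, and the PR text does not say where the generated file lives. The only point is that the Rust side echoes lines to Cargo without interpreting them.

```rust
use std::{env, fs, path::Path};

/// Return each non-empty line of a CMake-generated directives file unchanged.
/// Nothing is parsed here: whatever CMake wrote is what Cargo sees.
fn forward_directives(contents: &str) -> Vec<&str> {
    contents
        .lines()
        .map(str::trim_end) // drop trailing whitespace only
        .filter(|line| !line.is_empty())
        .collect()
}

fn main() {
    // Hypothetical location and file name (assumption, not from the PR).
    let out_dir = env::var("OUT_DIR").unwrap_or_else(|_| ".".to_string());
    let path = Path::new(&out_dir).join("link_directives.cargo");

    match fs::read_to_string(&path) {
        Ok(contents) => {
            // Cargo reads build-script stdout, so printing is the whole protocol.
            for line in forward_directives(&contents) {
                println!("{line}");
            }
        }
        Err(err) => {
            println!("cargo:warning=no link directives at {}: {err}", path.display());
        }
    }
}
```

With this shape, adding or removing a library is a CMake-only change; the Rust side never needs to know which libraries exist.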
Build system:
- Glob all libMLIR*.a / libLLVM*.a with --start-group/--end-group instead
of curating library lists (MLIR cmake helpers miss transitive deps)
- Resolve libstdc++.a via CXX compiler -print-file-name (Cargo uses cc
not c++, so -static-libstdc++ is silently ignored)
- Prefer ${LLVM_PREFIX}/bin/clang over PATH clang on macOS to avoid Apple
Clang which cannot consume LLVM 22 bitcode
- Add Homebrew lib paths (/opt/homebrew/lib, /usr/local/lib) for macOS
static linking (zstd, zlib, xml2 not on default search path)
- Reject cross-arch compilation at configure time (host CMake build would
silently embed wrong-arch static library)
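The cross-arch rejection in the last bullet could look something like the sketch below if done on the Rust side. The PR text does not show the actual check, so the function names `arch_of` and `check_same_arch` and the error wording are assumptions. It relies only on the `HOST` and `TARGET` environment variables that Cargo sets for build scripts.

```rust
use std::env;

/// Architecture component of a target triple: the text before the first '-'.
fn arch_of(triple: &str) -> &str {
    triple.split('-').next().unwrap_or(triple)
}

/// Fail when host and target architectures differ, since a host CMake build
/// would otherwise produce a static library for the wrong architecture.
fn check_same_arch(host: &str, target: &str) -> Result<(), String> {
    let (h, t) = (arch_of(host), arch_of(target));
    if h == t {
        Ok(())
    } else {
        Err(format!("cross-arch build not supported: host is {h}, target is {t}"))
    }
}

fn main() {
    // Cargo sets HOST and TARGET for build scripts; the fallbacks only make
    // this sketch runnable outside Cargo.
    let host = env::var("HOST").unwrap_or_else(|_| "x86_64-unknown-linux-gnu".into());
    let target = env::var("TARGET").unwrap_or_else(|_| host.clone());

    if let Err(msg) = check_same_arch(&host, &target) {
        eprintln!("error: {msg}");
        std::process::exit(1);
    }
}
```

Comparing only the architecture component means a same-arch libc switch (for example gnu to musl on x86_64) is not rejected by this sketch.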
Clean cutover:
- Deleted standalone codegen_main.cpp and JSON reader path
- Removed hew-codegen from all 17+ packaging files: release.yml, install
scripts, Docker images, distro packages (Alpine, Debian, RPM, Arch, Nix,
Homebrew)
- Simplified nightly sanitizer workflow to use same build directory
- Makefile default target no longer depends on codegen; release target
uses HEW_EMBED_STATIC=1 cargo build exclusively
- No fallback paths, no compatibility shims, no dead code
New CLI flags: --emit-msgpack, --link-lib
Verified: Linux x86_64 (492/492 E2E tests), macOS arm64 (dev + static
release smoke tests).
build.rs now warns and returns early instead of panicking when CMake configure fails (LLVM/MLIR not installed). This lets cargo clippy and cargo check work without LLVM — only cargo build (which actually links) needs the native codegen library. CI jobs without LLVM provisioning (Clippy, coverage, Windows, macOS x86_64) now exclude hew-cli from workspace test runs since the test binary cannot link without the embedded codegen symbols.
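A minimal sketch of the warn-and-return pattern described above, assuming a helper named `run_configure` (hypothetical) and a placeholder CMake invocation, since the real arguments are not shown in this PR text. `cargo:warning=` is the standard Cargo directive for surfacing a message without failing the build script.

```rust
use std::process::Command;

/// Run a configure command and report failure as an Err instead of panicking.
fn run_configure(program: &str, args: &[&str]) -> Result<(), String> {
    match Command::new(program).args(args).status() {
        Ok(status) if status.success() => Ok(()),
        Ok(status) => Err(format!("{program} exited with {status}")),
        Err(err) => Err(format!("could not run {program}: {err}")),
    }
}

fn main() {
    // Placeholder invocation (assumption): stands in for the real configure step.
    if let Err(reason) = run_configure("cmake", &["--version"]) {
        // Warn and stop: check/clippy can proceed, only linking needs the library.
        println!("cargo:warning=skipping native codegen build: {reason}");
        return;
    }
    // ... build the native library and emit link directives here ...
}
```

The trade-off is the one the commit message names: the failure moves from build-script time to link time, so jobs that link hew-cli still need LLVM/MLIR.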
The generated msgpack_reader.cpp uses LF, but on Windows git may check out the file with CRLF. Normalize the checked-in content before comparing so the test passes cross-platform.
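The normalization step amounts to something like the following. The helper name `normalize_newlines` and the sample strings are illustrative assumptions; the actual test code is not shown in this PR text.

```rust
/// Convert CRLF line endings to LF so text compares equal across checkouts.
fn normalize_newlines(text: &str) -> String {
    text.replace("\r\n", "\n")
}

fn main() {
    // Freshly generated output always uses LF.
    let generated = "#include <vector>\nint main() {}\n";
    // The same file as git may check it out on Windows.
    let checked_in = "#include <vector>\r\nint main() {}\r\n";

    assert_eq!(normalize_newlines(checked_in), generated);
    println!("contents match after normalization");
}
```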
Drop the macos-13 Intel matrix entry — the runner is deprecated and jobs were being cancelled. macOS CI now runs only on arm64 (macos-14) with full codegen E2E testing. The matrix/strategy wrapper is removed since there is only one macOS configuration.
Three issues found by independent reviewers:
1. report_fatal_error → throw std::runtime_error (codegen.cpp)
The embedded codegen used llvm::report_fatal_error for bad target triples and missing generator stubs. In the old standalone-process model this only killed the child; after the cutover it aborts the host hew process. Now throws exceptions caught by the C API wrapper, producing a clean error message and exit code 1.
2. Remove darwin-x86_64 from release matrix
The release job ran x86_64-apple-darwin on macos-15 (ARM), but build.rs rejects cross-arch compilation. Removed the target from both the main release matrix and the VS Code extension matrix. Updated Homebrew formula to arm64-only.
3. Provision LLVM/MLIR for Docker and musl builds
The Dockerfile.release and release.yml musl job rebuilt hew-cli without LLVM installed; build.rs would skip codegen gracefully but the link step would fail with unresolved symbols. Added LLVM 22 packages to the Alpine Docker build stage and the Ubuntu musl release job. Added LLVM_PREFIX pre-flight check to build-packages.sh.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
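The idea behind item 1 (contain a fatal condition at the API boundary and turn it into an error the host can report) can be illustrated with a Rust analogue. This is not the C++ fix itself: the actual change throws std::runtime_error and catches it in the C API wrapper, whereas the sketch below swaps in Rust's `std::panic::catch_unwind`. The functions `lower` and `compile_guarded` are hypothetical stand-ins.

```rust
use std::panic;

/// Stand-in for a lowering step that can hit an unrecoverable condition.
fn lower(triple: &str) -> String {
    if triple.is_empty() {
        panic!("bad target triple");
    }
    format!("; target = {triple}")
}

/// Boundary wrapper: contain the failure and hand the caller an error message.
/// Note: this only works when panics unwind (not with panic = "abort").
fn compile_guarded(triple: &str) -> Result<String, String> {
    panic::catch_unwind(|| lower(triple)).map_err(|payload| {
        // Panic payloads are usually a &str or a String.
        if let Some(msg) = payload.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "unknown codegen failure".to_string()
        }
    })
}

fn main() {
    // Silence the default panic report so only the wrapper's message is shown.
    panic::set_hook(Box::new(|_| {}));

    for triple in ["x86_64-unknown-linux-gnu", ""] {
        match compile_guarded(triple) {
            Ok(ir) => println!("{ir}"),
            // A CLI would print this and exit with code 1.
            Err(msg) => eprintln!("error: {msg}"),
        }
    }
}
```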
This was referenced Mar 17, 2026
Replace the standalone hew-codegen process with an embedded C API. The hew binary is now self-contained: parse, type-check, MLIR lowering, LLVM IR generation, and native linking all happen in a single process.
Key changes
- hew_codegen_compile_msgpack() C API instead of spawning hew-codegen over stdin
- Standalone codegen executable deleted (codegen_main.cpp removed)
- New --emit-msgpack, --link-lib flags to hew CLI
- codegen target builds test infrastructure only
Build modes
- Release (HEW_EMBED_STATIC=1): self-contained binary with all MLIR/LLVM/C++ runtime statically linked (~127MB Linux, ~165MB macOS)
Verification
Stats
60 files changed, 3216 insertions, 4246 deletions (net -1030 lines)