refactor(cli): embed MLIR/LLVM codegen into single hew binary #261

Merged: slepp merged 5 commits into main from refactor/single-binary-codegen on Mar 17, 2026

slepp commented Mar 17, 2026

Replace the standalone hew-codegen process with an embedded C API. The hew binary is now self-contained: parse, type-check, MLIR lowering, LLVM IR generation, and native linking all happen in a single process.

Key changes

  • Embed codegen via hew_codegen_compile_msgpack() C API instead of spawning hew-codegen over stdin
  • Delete standalone hew-codegen binary (codegen_main.cpp removed)
  • Remove nlohmann/json dependency (msgpack-only reader path)
  • Add --emit-msgpack, --link-lib flags to hew CLI
  • Remove hew-codegen from all 17+ packaging/installer/release files
  • Simplify Makefile: codegen target builds test infrastructure only
  • Prefer LLVM_PREFIX/bin/clang over PATH on macOS (avoids Apple Clang)
  • Reject cross-arch compilation with clear error message
  • Add Homebrew library paths for macOS static linking
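
To make the first bullet concrete, here is a minimal Rust sketch of an in-process call across a C ABI boundary. The PR does not show the signature of `hew_codegen_compile_msgpack()`, so the sketch uses a stand-in named `mock_codegen_compile_msgpack` that is defined in Rust. The pointer-plus-length arguments and the "0 means success" status convention are assumptions for illustration only.

```rust
use std::os::raw::c_int;

/// Stand-in for the embedded codegen entry point (hypothetical).
/// The real function is implemented in C++ and its signature is not shown
/// in this PR; this shape is an assumption.
extern "C" fn mock_codegen_compile_msgpack(data: *const u8, len: usize) -> c_int {
    // Assumed convention: 0 = success, non-zero = failure.
    if data.is_null() || len == 0 {
        1
    } else {
        0
    }
}

/// Hands the serialized program to the (mock) C API in the same process.
/// No child process is spawned and nothing is written to a pipe.
fn compile_in_process(msgpack: &[u8]) -> Result<(), String> {
    // The slice is borrowed for the duration of the call only.
    let status = mock_codegen_compile_msgpack(msgpack.as_ptr(), msgpack.len());
    if status == 0 {
        Ok(())
    } else {
        Err(format!("codegen failed with status {}", status))
    }
}

/// Maps the outcome to a process exit code (1 on any failure).
fn exit_code(result: &Result<(), String>) -> i32 {
    if result.is_ok() {
        0
    } else {
        1
    }
}
```

With this shape, a codegen failure surfaces as an ordinary `Err` in the host process, which the CLI can print before exiting with code 1.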

Build modes

  • Dev (default): shared linking against system LLVM — fast incremental builds (~2s)
  • Static (HEW_EMBED_STATIC=1): self-contained binary with all MLIR/LLVM/C++ runtime statically linked (~127MB Linux, ~165MB macOS)
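
As a rough illustration of how a build script could switch between the two modes, the sketch below reads `HEW_EMBED_STATIC` and maps it to a mode. The `LinkMode` enum, the helper names, and the `-DHEW_EMBED_STATIC=ON/OFF` CMake option are hypothetical; only the environment variable name and the `cargo:rerun-if-env-changed` directive are taken from the PR and from Cargo respectively.

```rust
use std::env;

/// The two build modes described above (type name is illustrative).
#[derive(Debug, PartialEq)]
enum LinkMode {
    Shared,
    Static,
}

/// Treats exactly "1" as a request for a static build; anything else,
/// including an unset variable, falls back to the shared dev mode.
fn link_mode(value: Option<&str>) -> LinkMode {
    match value {
        Some("1") => LinkMode::Static,
        _ => LinkMode::Shared,
    }
}

/// Hypothetical CMake option forwarded to the C++ build.
fn cmake_define(mode: &LinkMode) -> &'static str {
    match mode {
        LinkMode::Static => "-DHEW_EMBED_STATIC=ON",
        LinkMode::Shared => "-DHEW_EMBED_STATIC=OFF",
    }
}

/// Reads the mode from the environment, as a build script might.
fn mode_from_env() -> LinkMode {
    // Ask Cargo to rerun the build script when the variable changes.
    println!("cargo:rerun-if-env-changed=HEW_EMBED_STATIC");
    link_mode(env::var("HEW_EMBED_STATIC").ok().as_deref())
}
```

Under these assumptions, `HEW_EMBED_STATIC=1 cargo build` selects the static mode and a plain `cargo build` stays in dev mode.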

Verification

  • Linux x86_64: 492/492 E2E tests pass, lint clean, fmt clean
  • macOS arm64: dev + static release builds, smoke tests pass (hello world, actors, generics, fibonacci)

Stats

60 files changed, 3216 insertions, 4246 deletions (net -1030 lines)

slepp and others added 5 commits March 16, 2026 19:15
Replace the standalone hew-codegen executable with an embedded C++ MLIR/LLVM
backend linked directly into the hew Rust binary via a thin C API.  The
compiler is now a single self-contained binary: parse, type-check, lower to
LLVM IR, and link — no child process, no pipe, no second binary to ship.

Architecture:
- hew-cli/build.rs invokes CMake to compile C++ into libHewCodegenCAPI.a
- CMake generates a .cargo file with link directives that Cargo reads
  verbatim — zero parsing, CMake is sole authority on what to link
- HEW_EMBED_STATIC=1 (release) statically links all MLIR + LLVM + libc++
  into a self-contained binary; dev mode uses shared linking for fast
  incremental builds
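
A minimal sketch of the "Cargo reads verbatim" idea, assuming the CMake-generated file holds one Cargo directive per line. The function names and the sample file contents are hypothetical; the point is that the build script forwards lines unchanged and makes no linking decisions of its own.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the directive lines unchanged, skipping only empty lines.
/// Nothing is parsed or rewritten: the generator decides what to link.
fn forward_directives(contents: &str) -> Vec<&str> {
    contents
        .lines()
        .filter(|line| !line.trim().is_empty())
        .collect()
}

/// Reads the generated file and prints each directive to stdout, which
/// is where Cargo picks up build-script instructions.
fn emit_directives(path: &Path) -> io::Result<()> {
    let contents = fs::read_to_string(path)?;
    for line in forward_directives(&contents) {
        println!("{}", line);
    }
    Ok(())
}
```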

Build system:
- Glob all libMLIR*.a / libLLVM*.a with --start-group/--end-group instead
  of curating library lists (MLIR cmake helpers miss transitive deps)
- Resolve libstdc++.a via CXX compiler -print-file-name (Cargo uses cc
  not c++, so -static-libstdc++ is silently ignored)
- Prefer ${LLVM_PREFIX}/bin/clang over PATH clang on macOS to avoid Apple
  Clang which cannot consume LLVM 22 bitcode
- Add Homebrew lib paths (/opt/homebrew/lib, /usr/local/lib) for macOS
  static linking (zstd, zlib, xml2 not on default search path)
- Reject cross-arch compilation at configure time (host CMake build would
  silently embed wrong-arch static library)
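
Two of the checks above can be sketched in Rust as follows. This is an illustration under assumptions, not the code from build.rs: the helper names are hypothetical, and the architecture comparison simply takes the first component of each target triple (Cargo exposes the triples to build scripts as the `HOST` and `TARGET` environment variables).

```rust
use std::path::{Path, PathBuf};

/// On macOS, prefers `<LLVM_PREFIX>/bin/clang` when it exists, otherwise
/// falls back to whatever `clang` resolves to on PATH.
fn select_clang(llvm_prefix: Option<&Path>, is_macos: bool) -> PathBuf {
    if is_macos {
        if let Some(prefix) = llvm_prefix {
            let candidate = prefix.join("bin").join("clang");
            if candidate.is_file() {
                return candidate;
            }
        }
    }
    PathBuf::from("clang")
}

/// Extracts the architecture component of a target triple,
/// e.g. "aarch64" from "aarch64-apple-darwin".
fn arch_of(triple: &str) -> &str {
    triple.split('-').next().unwrap_or(triple)
}

/// Fails at configure time when host and target architectures differ,
/// since a host CMake build would produce a wrong-arch static library.
fn reject_cross_arch(host: &str, target: &str) -> Result<(), String> {
    if arch_of(host) == arch_of(target) {
        Ok(())
    } else {
        Err(format!(
            "cross-arch compilation is not supported: host {} vs target {}",
            host, target
        ))
    }
}
```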

Clean cutover:
- Deleted standalone codegen_main.cpp and JSON reader path
- Removed hew-codegen from all 17+ packaging files: release.yml, install
  scripts, Docker images, distro packages (Alpine, Debian, RPM, Arch, Nix,
  Homebrew)
- Simplified nightly sanitizer workflow to use same build directory
- Makefile default target no longer depends on codegen; release target
  uses HEW_EMBED_STATIC=1 cargo build exclusively
- No fallback paths, no compatibility shims, no dead code

New CLI flags: --emit-msgpack, --link-lib

Verified: Linux x86_64 (492/492 E2E tests), macOS arm64 (dev + static
release smoke tests).

build.rs now warns and returns early instead of panicking when CMake
configure fails (LLVM/MLIR not installed).  This lets cargo clippy and
cargo check work without LLVM — only cargo build (which actually links)
needs the native codegen library.
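
The warn-and-return behaviour might look roughly like the sketch below. The function name and the warning text are hypothetical; `cargo:warning=` is the standard Cargo directive for surfacing a build-script warning, and `HewCodegenCAPI` is the library name given earlier in this PR.

```rust
/// Decides what the build script prints depending on whether the CMake
/// configure step succeeded. On failure it emits only a warning, so
/// `cargo check` and `cargo clippy` can proceed without LLVM installed.
fn configure_outcome(configure: Result<(), String>) -> Vec<String> {
    match configure {
        // Configure succeeded: link the embedded codegen library.
        Ok(()) => vec!["cargo:rustc-link-lib=static=HewCodegenCAPI".to_string()],
        // Configure failed: warn and return early instead of panicking.
        Err(reason) => vec![format!(
            "cargo:warning=skipping embedded codegen: {}",
            reason
        )],
    }
}
```

A later `cargo build` still fails at link time in that situation, because the codegen symbols are missing, but the failure comes from the linker rather than from a panic in the build script.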

CI jobs without LLVM provisioning (Clippy, coverage, Windows, macOS
x86_64) now exclude hew-cli from workspace test runs since the test
binary cannot link without the embedded codegen symbols.

The generated msgpack_reader.cpp uses LF, but on Windows git may
check out the file with CRLF.  Normalize the checked-in content
before comparing so the test passes cross-platform.
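
The normalization step can be sketched in a few lines of Rust. The function names are hypothetical and the actual test is not shown in this PR; the sketch only demonstrates converting CRLF to LF on the checked-in side before comparing.

```rust
/// Converts Windows line endings (CRLF) to Unix line endings (LF).
fn normalize_newlines(text: &str) -> String {
    text.replace("\r\n", "\n")
}

/// Compares the checked-in file with freshly generated output, ignoring
/// any CRLF conversion that git applied on checkout.
fn matches_generated(checked_in: &str, generated: &str) -> bool {
    normalize_newlines(checked_in) == generated
}
```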

Drop the macos-13 Intel matrix entry — the runner is deprecated and
jobs were being cancelled.  macOS CI now runs only on arm64 (macos-14)
with full codegen E2E testing.  The matrix/strategy wrapper is removed
since there is only one macOS configuration.

Three issues found by independent reviewers:

1. report_fatal_error → throw std::runtime_error (codegen.cpp)
   The embedded codegen used llvm::report_fatal_error for bad target
   triples and missing generator stubs.  In the old standalone-process
   model this only killed the child; after the cutover it aborts the
   host hew process.  Now throws exceptions caught by the C API wrapper,
   producing a clean error message and exit code 1.

2. Remove darwin-x86_64 from release matrix
   The release job ran x86_64-apple-darwin on macos-15 (ARM), but
   build.rs rejects cross-arch compilation.  Removed the target from
   both the main release matrix and the VS Code extension matrix.
   Updated Homebrew formula to arm64-only.

3. Provision LLVM/MLIR for Docker and musl builds
   The Dockerfile.release and release.yml musl job rebuilt hew-cli
   without LLVM installed — build.rs would skip codegen gracefully
   but the link step would fail with unresolved symbols.  Added LLVM 22
   packages to the Alpine Docker build stage and the Ubuntu musl release
   job.  Added LLVM_PREFIX pre-flight check to build-packages.sh.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
slepp enabled auto-merge (squash) March 17, 2026 04:38
slepp disabled auto-merge March 17, 2026 04:47
slepp merged commit cce70b8 into main Mar 17, 2026
12 checks passed
slepp deleted the refactor/single-binary-codegen branch March 17, 2026 04:47