Implementation plan: compiling C shims alongside Pony (v1) #5468
SeanTAllen
started this conversation in
ponyc
Implements discussion #5390. Drop a .c next to your .pony; ponyc compiles it in-process with embedded clang and links the object in. No second build system.
Settled scope
- C only (.c). .cpp/.cc deferred (C++ runtime linking). Flag family is designed C/C++-shared so C++ is additive later.
- A new compile pass (PASS_C), with clang diagnostics routed as ponyc errors.

Divergences from current ponyc

| Area | Current ponyc | v1 |
| --- | --- | --- |
| use flag storage | program-wide (program_t->libs) | per-package (package_t) |
| Files in a package dir | only .pony parsed | .c in the package's top dir compiled + linked |
| LLVM_ENABLE_PROJECTS | "lld" | "lld;clang" + clang static libs linked into ponyc |

Stage 1: build & link clang (standalone PR, merge-gated behind Stage 2)
Goal: ponyc builds with clang's libraries linked in and the resource dir shipped.
No feature yet; no behavior change.
- lib/CMakeLists.txt:183: "lld" → "lld;clang".
- CMakeLists.txt: add find_package(Clang CONFIG ...) mirroring the LLVM/LLD discovery at :27-28; define PONYC_CLANG_LIBS. Prefer clang's exported CMake targets (clangFrontend/CodeGen/Driver/Parse/Sema/Analysis/AST/Basic/Lex/Serialization/Edit/Support) so transitive deps resolve; fall back to a hardcoded list like PONYC_LLD_LIBS (:389) if the export isn't usable.
- src/ponyc/CMakeLists.txt:75: add ${PONYC_CLANG_LIBS} to target_link_libraries.
- Clang version: read it from clang/Basic/Version.h (CLANG_VERSION_MAJOR/MINOR) or add a compile def mirroring LLVM_VERSION at CMakeLists.txt:416.
- Resource dir: build/libs/lib/clang/<ver>/include. Ensure the ponyc install/release packaging carries it so it sits at a known offset from the ponyc binary (same idea as the bundled packages/ dir).
- make.ps1 always rebuilds from source; confirm the clang project builds on that path too.
Verify: ponyc builds & links; make test-core green (debug + release); binary runs. Measure & report: make libs build-time delta, ponyc binary-size delta, CI wall-clock delta. These are the costs the discussion's "one line" understates.
Stage 2 — the feature
2a. Directive schemes: src/libponyc/pkg/use.c, pkg/program.cc

- Register cinclude: and cdefine: in handlers[] (use.c:32-40) with allow_guard=true, allow_name=false, so "... if macosx" works for free. (cstd: and cflag: are both dropped from v1: cstd: is sugar over -std, and clang's default dialect compiles normal C glue; cflag:, a raw escape hatch, reintroduces a flag-precedence footgun against the locked target flags. Add either back if demand shows.)
- Each handler finds its package via ast_nearest(use, TK_PACKAGE) → ast_data and stores per-package (not program-wide).
- cinclude: resolves relative paths against the package dir exactly like use_path (program.cc:168-173); it's additive (a search list), so multiple includes never conflict and no check is needed.
- Bypass quoted_locator. lib:/path: run their locator through quoted_locator (program.cc:114-135), which rejects any value containing INVALID_LOCATOR_CHARS ("' \()$|&<>..., program.cc:109-111). C defines need exactly those (cdefine:VERSION="1.2" has quotes, include paths have spaces), so cinclude:/cdefine: must bypass it: store the value raw and pass it verbatim as one clang argv element. Safe, because clang takes an argv array, not a shell command, so there is no injection surface. cdefine: still parses the macro name (text before =) for the duplicate check; the value after = is opaque.
- cdefine: duplicate check: error if two active cdefine: directives define the same macro name, whether the values differ (FOO=1 vs FOO=2) or match (FOO=1 twice). A macro name is a declaration; redeclaring it is a mistake, like let a = 1 twice, and deterministic ordering would otherwise let one silently shadow the other. Macro name = text before =; on append, scan the package's c_defines for a same-name entry (the strlist_find pattern use_library uses). Point the error at the redefining directive, ideally at the original too (error frame); name the prior value when it differs. This diverges from use "lib:", which dedups exact duplicates: linking a lib twice is harmless, redeclaring a macro is not.
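The duplicate rule above can be sketched in a few lines of C. This is a minimal illustration only: it uses a flat array in place of the package's c_defines strlist, and the helper names (sketch_macro_name_len, sketch_find_same_macro) are hypothetical, not existing ponyc functions.

```c
#include <stddef.h>
#include <string.h>

// Length of the macro name in a cdefine value: everything before the first
// '=' (or the whole string when there is no '=').
static size_t sketch_macro_name_len(const char* define)
{
  const char* eq = strchr(define, '=');
  return (eq != NULL) ? (size_t)(eq - define) : strlen(define);
}

// Returns the index of an already-stored define with the same macro name as
// `candidate`, or -1 if there is none. The value after '=' is never compared,
// so FOO=1 vs FOO=2 and FOO=1 vs FOO=1 are both reported as duplicates.
// `defines` is a hypothetical flat stand-in for the per-package c_defines.
static int sketch_find_same_macro(const char* const* defines, size_t count,
  const char* candidate)
{
  size_t len = sketch_macro_name_len(candidate);

  for(size_t i = 0; i < count; i++)
  {
    if((sketch_macro_name_len(defines[i]) == len) &&
      (strncmp(defines[i], candidate, len) == 0))
      return (int)i;
  }

  return -1;
}
```

A handler built this way would call the lookup before appending; a non-negative result gives the index of the prior directive, which is what the error message needs to name the earlier value.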
Guards resolve cross-platform cases for free: a guarded-out directive never reaches the handler (guard evaluated first, use.c:138-142), so cdefine:FOO=1 if linux / cdefine:FOO=2 if windows aren't a conflict; only active-target defines are checked.

2b. Per-package storage: src/libponyc/pkg/package.c:45-60

- Add to package_t: strlist_t* c_includes, c_defines, c_sources.
- create_package (:535-583) initializes every field by hand (there is no zero-init), so each new field needs an explicit = NULL there.
- Free them in package_free (:1074-1081), not package_done (which only frees global package state). Add strlist_free for each of the three strlists, or they leak per package.
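The storage rules in 2b (explicit NULL initialization, one free per list) can be shown with stand-in types. Everything prefixed sketch_ below is hypothetical: it models only the three new fields and a tail-appending string list, not ponyc's real package_t or strlist_t.

```c
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-ins for strlist_t and package_t.
typedef struct sketch_strlist_t
{
  char* value;
  struct sketch_strlist_t* next;
} sketch_strlist_t;

typedef struct sketch_package_t
{
  sketch_strlist_t* c_includes;
  sketch_strlist_t* c_defines;
  sketch_strlist_t* c_sources;
} sketch_package_t;

// Tail-append, so iteration order equals insertion order.
static sketch_strlist_t* sketch_strlist_append(sketch_strlist_t* list,
  const char* value)
{
  size_t size = strlen(value) + 1;
  sketch_strlist_t* node = malloc(sizeof(sketch_strlist_t));
  char* copy = malloc(size);

  if((node == NULL) || (copy == NULL))
    abort();

  memcpy(copy, value, size);
  node->value = copy;
  node->next = NULL;

  if(list == NULL)
    return node;

  sketch_strlist_t* tail = list;
  while(tail->next != NULL)
    tail = tail->next;

  tail->next = node;
  return list;
}

static void sketch_strlist_free(sketch_strlist_t* list)
{
  while(list != NULL)
  {
    sketch_strlist_t* next = list->next;
    free(list->value);
    free(list);
    list = next;
  }
}

static sketch_package_t* sketch_package_create(void)
{
  sketch_package_t* package = malloc(sizeof(sketch_package_t));

  if(package == NULL)
    abort();

  // malloc does not zero memory: every new field is set by hand.
  package->c_includes = NULL;
  package->c_defines = NULL;
  package->c_sources = NULL;
  return package;
}

// One free per list; omitting any of them leaks that list per package.
static void sketch_package_free(sketch_package_t* package)
{
  sketch_strlist_free(package->c_includes);
  sketch_strlist_free(package->c_defines);
  sketch_strlist_free(package->c_sources);
  free(package);
}
```

The tail-append is a deliberate choice in this sketch: it is the property the determinism section relies on when it says the lists preserve insertion order.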
2c. .c discovery: src/libponyc/pkg/package.c:201-267

- In parse_files_in_dir, collect files ending in .c into a separate bucket and append their full paths to c_sources ((package_t*)ast_data(package)). They must not join the entries[] array that gets fed to parse_source_file (:262); that's the Pony parser and would choke on C. The extension logic is safe (strrchr + strcmp(".c") won't match .cc/.cpp; a bare .c hidden file is already skipped by the name[0]=='.' guard). Keep the deterministic sort. Single-directory only: vendored C in subdirs is untouched, as intended.
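The extension test described above is small enough to show whole. A minimal sketch, assuming the hidden-file guard and the strrchr + strcmp comparison named in the plan; the function name sketch_is_c_source is hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// True when a directory entry name should go into the C-source bucket.
static bool sketch_is_c_source(const char* name)
{
  // Hidden entries are skipped first, which also rejects a bare ".c".
  if(name[0] == '.')
    return false;

  // Compare the text from the last '.' onward, so ".cc" and ".cpp" do not
  // match and "a.b.c" does.
  const char* ext = strrchr(name, '.');
  return (ext != NULL) && (strcmp(ext, ".c") == 0);
}
```

Because the comparison is on the final extension only, a later C++ stage could add .cc/.cpp as separate cases without changing what v1 already accepts.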
- package.c:1056-1061 errors "no source files in package" when ast_child(package) == NULL. A .c adds no Pony AST, so a directory with only .c and no .pony stays an error. Keep that. A shim is C that travels with a Pony package; every real shim sits next to .pony, which satisfies the guard. Making a C-only directory a valid package is a new concept nobody asked for, so don't introduce it quietly (easier to give than take away). Worth a small message tweak ("no Pony source files in package") so someone who dropped only a .c gets a clear reason.

2c-bis. Existing full-program test collision (silent shadowing, not a link break)
Five test/full-program-tests/ programs ship additional.c next to main.pony, built into a <name>-additional library by test/full-program-tests/CMakeLists.txt (GLOB_RECURSE *.c *.cc; lib name from the first path component) and linked via use "lib:<name>-additional": c-callback, ffi-call-in-initializer, ffi-different-returns, ffi-return-arg-reachable, identity-digestof-object. Once .c discovery lands, ponyc would also compile each additional.c as a shim object.

Correction to an earlier claim in this plan: this is not a link failure. A shim .o vs a use "lib:" library is object-vs-library: the object's symbols win and the library is never pulled (a static archive's members aren't pulled once the symbol is defined; a shared lib's symbols are overridden). The build still passes; it just silently links the shim instead of the lib. That's the real harm: these tests exist to cover the use "lib:"-links-a-built-lib path, and they'd quietly stop exercising it, a silent loss of coverage that is worse than a loud failure. (Only .c collides in v1; .cc isn't discovered yet, so C++ fixtures are safe until the C++ stage. Note it for then.)
These are external-lib FFI tests, not shims, and the path where use "lib:" links a separately-built lib must stay covered. So keep them as external-lib tests and move their .c out of the scanned package dir:

- git mv <test>/additional.c <test>/c-src/additional.c for the five. GLOB_RECURSE still finds the subdir .c, and the lib name derives from the first path component (<test>), so <test>-additional is unchanged and the use "lib:" line in main.pony is untouched. ponyc's single-dir discovery no longer sees the .c.
- The new shim tests need the opposite: their .c must be built only by ponyc, never by the cmake glob (else it's double-built and collides). Scope GLOB_RECURSE to skip the shim tests (a shims/ subtree the glob excludes, or place them where the glob doesn't reach). This is the cmake change.
2d. The pass: src/libponyc/pass/pass.h, new genc.c

- Add PASS_C to the pass_id enum (pass.h:209-232) between PASS_FINALISER and PASS_REACH: before the codegen group, not among it. REACH is the boundary where the shared AST-pass driver (ast_passes_program, run by both real builds and the test harness) hands off to the split codegen drivers (codegen() / codegen_gen_test()). Sitting below REACH keeps PASS_C on the shared side: one call site, fail-fast, and --pass c stops for free (generate_passes returns before codegen when limit < PASS_REACH, pass.c:376). It needs nothing reach produces.
- Add its name ("c", unused today) in pass_name (pass.c:50-77); re-check the PASS_ALL <= AST_FLAG_PASS_MASK static assert (ast.c:64). Headroom is fine (mask 0x1F=31, PASS_ALL 20→21).
- Update PASS_HELP (pass.h:236-258), a hardcoded --pass help string flagged "update when pass_id changes." Insert " =c\n" in pass order (before =reach). Otherwise the static help (options.c:175) diverges from the dynamic listing (print_passes, options.c:194-214), which picks up the new pass automatically.
- Inserting PASS_C before REACH renumbers REACH and everything after it. The Pony mirror tools/lib/ponylang/pony_compiler/.../pass.pony uses hardcoded integers: add PassC and renumber Reach 14→15, Paint→16, LLVMIR→17, Bitcode→18, ASM→19, Obj→20, All→21, plus the union/primitive entries.
Run make test-pony-compiler test-pony-lint test-pony-lsp test-pony-doc.
- genc.c: bool genc(ast_t* program, pass_opt_t* opt) takes (program, opt), not compile_t: clang builds its own target from opt->triple/cpu/features, writes the .o to the output dir, reads each package's flags off ast_data, and records the object paths on program_t. Needing nothing from the LLVM compile_t is exactly what lets it run before codegen. Walk the package chain (ast_child(program) then the ast_sibling chain, i.e. all packages); for each package_t with c_sources, compile each source with clang in-process:
  - Target flags (from pass_opt_t, identical to Pony codegen): opt->triple, -mcpu=opt->cpu, target-features opt->features, PIC if opt->pic, -O0/-O2 per opt->release, debug per debug build, ABI opt->abi.
  - Package flags: -D<c_defines>, -I<c_includes> (resolved). No -std/cflag: in v1.
  - Output: .o in the output dir (suffix_filename pattern, codegen.c:1233; it currently takes compile_t, so factor a variant taking opt, or build the path from opt directly).
  - Diagnostics: route a custom DiagnosticConsumer into opt->check.errors (via errorf with file/line) so clang errors become ponyc errors. [UNVERIFIED: verify in the spike, 2f]. The whole error story and the compile-error tests (which assert on errors_t contents) rest on intercepting clang diagnostics programmatically. It's the API libclang/clangd are built on, so it's believed-true, but it's never been run here (clang is unbuilt until Stage 1). Don't let Stage 2 stand on it unverified.
  - Crash containment: wrap each per-TU compile in a CrashRecoveryContext (the guard libclang uses for in-process compilation). A clang ICE / report_fatal_error / assertion ends in abort() (llvm/.../ErrorHandling.cpp), and in-process that's ponyc dying: "Aborted (core dumped)," no message, no hint a shim was involved, filed as a ponyc bug. With the recovery context the operation fails instead of the process: report "internal error compiling <shim>.c" through errors_t, fail closed, with attribution. A few lines around the per-TU call. Triggers are rare (pathological C, clang bugs) but the failure shape is the worst available.
- Record each .o path on program_t; add program_c_object_count/_at accessors mirroring program_lib_count/_at (program.cc:390-412).
- Call genc from ast_passes_program (pass.c:335) after the ast_passes call, gated if(opt->limit >= PASS_C). Both the real build (main.c:56 program_load → ast_passes_program) and the test harness (util.cc:575) run ast_passes_program, so one call site covers both: no edits to codegen() or codegen_gen_test(), no --pass c early-return.
- Compile-error tests come for free (test_expected_errors runs ast_passes_program); the existing error-count/substring assertions work unchanged.
- The editor tooling (LSP) runs with limit ≤ FINALISER (< PASS_C), so the gate skips genc and clang is never invoked. Deliberate: shim C errors do not surface in-editor in v1. genc emits objects, which is wrong for an editor (.o spam per keystroke, possibly no output dir, half-edited code); in-editor errors need a diagnostics-only clang mode (-fsyntax-only, no codegen). That mode, if ever wanted, is a separate LSP path independent of where PASS_C sits, so this choice neither forecloses in-editor errors nor forces a future enum renumber.
- --pass obj contract: genc runs for any limit ≥ PASS_C, so --pass c, --pass obj, etc. each produce shim .o files in the output dir. They're cleaned up only on a successful PASS_ALL link (alongside the Pony object, genexe.cc:2198-2202). Under any non-link mode (--pass obj/asm/ir/c) the shim .o files persist, same as the Pony .o under --pass obj, which is the point of those modes (hand them to your own linker). A failed link leaves them too (cleanup is success-only). Stated so it's a contract, not an accident.
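The enum placement and the two limit checks from 2d can be summarized in a short sketch. It is a truncated, illustrative mirror rather than the real pass_id: passes before FINALISER are omitted, FINALISER = 13 is inferred from the "Reach 14→15" renumbering above, and every sketch_/SKETCH_ name is hypothetical.

```c
#include <stdbool.h>

// Truncated mirror of pass_id after the change. Only the tail of the enum
// is modelled; the numeric values follow the renumbering listed in the plan.
typedef enum sketch_pass_id
{
  SKETCH_PASS_FINALISER = 13,
  SKETCH_PASS_C,        // new pass, takes 14
  SKETCH_PASS_REACH,    // 14 -> 15
  SKETCH_PASS_PAINT,    // 16
  SKETCH_PASS_LLVM_IR,  // 17
  SKETCH_PASS_BITCODE,  // 18
  SKETCH_PASS_ASM,      // 19
  SKETCH_PASS_OBJ,      // 20
  SKETCH_PASS_ALL       // 20 -> 21
} sketch_pass_id;

// The pass id is packed into AST flags, so the largest id must fit the mask.
#define SKETCH_AST_FLAG_PASS_MASK 0x1F

_Static_assert(SKETCH_PASS_ALL <= SKETCH_AST_FLAG_PASS_MASK,
  "pass ids must fit in the AST flag pass mask");

// The gate described for the shared driver: genc runs for any limit at or
// past PASS_C.
static bool sketch_runs_genc(sketch_pass_id limit)
{
  return limit >= SKETCH_PASS_C;
}

// Codegen is skipped when the limit is below REACH, which is why a limit of
// PASS_C stops after the shim objects are produced.
static bool sketch_runs_codegen(sketch_pass_id limit)
{
  return limit >= SKETCH_PASS_REACH;
}
```

With these two predicates, a limit of FINALISER (the editor case) runs neither step, a limit of PASS_C runs genc only, and a limit of OBJ or ALL runs both.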
2d-bis. Determinism (reproducible builds)
The -I/-D flag order and the .c → .o compile/link order are deterministic by construction; the plan's job is to state it and protect it, not add machinery:

- cinclude:/cdefine: accumulate as pass_scope visits TK_USE nodes (scope.c:365) in AST order = sorted-file order (parse_files_in_dir qsort) + source order within a file. They append to per-package strlists, which preserve insertion order (ponyint_list_append tail-appends, list.c:25; iteration is head→tail). genc reads them in that order. This is the same deterministic path use "lib:" already uses.
- .c discovery uses the same qsort, so multiple .c compile and their .o append to the link in a stable order.
- The guarantee rests on two things: (a) parse_files_in_dir's sort and (b) appending in visit order + reading in insertion order. A comment at the discovery/append sites should say so.
- Test: a package with several cinclude:/cdefine: directives; assert the emitted order is stable.

2e. Link append: src/libponyc/codegen/genexe.cc

link_exe (:2020) just routes to the three platform linkers. All three must be edited: ELF (file_o at :1341), MachO (:1770), COFF (:1919). After the existing file_o + user-libs sequence, append program_c_object_count/_at paths to the LLD arg vector. COFF differs structurally (file_o precedes the lib search paths), so place the append with platform-specific care there.
2f. System include paths: src/libponyc/codegen/genexe.cc

- Reuse resolve_sysroot (:836) + opt->sysroot. Note find_libc_crt_dir and its neighbors find library/CRT dirs, not include dirs, so there's nothing there to factor for headers; deriving the system include dirs from the sysroot (<sysroot>/usr/include, /usr/include/<triple>, the macOS SDK path) is new code: the compile-side mirror of the link-side path work, sharing only the sysroot detection. These helpers are static in genexe.cc, so expose them or relocate the sysroot logic to a shared spot genc.c can call.
- Why not lean on clang's Driver: the clang we embed is vanilla upstream, and the per-distro toolchain knowledge that makes a distro's own clang reliable lives in that distro's downstream patches, which we don't carry. The upstream GCCInstallationDetector is generic best-effort and can disagree with the libc our linker already picked (musl vs glibc), so we own the include-path discovery off the same sysroot the linker uses, keeping compile and link consistent by construction. macOS SDK / Windows MSVC discovery is the corner to watch.
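For the Linux-style case, deriving include dirs from the sysroot is plain string assembly. The sketch below is an assumption-laden illustration: the function name, the fixed buffer size, placing both dirs under the sysroot, and listing the triple-specific dir first are all choices of this sketch, not decisions the plan has made. The macOS SDK and Windows cases are not modelled.

```c
#include <stddef.h>
#include <stdio.h>

#define SKETCH_PATH_CAP 256

// Writes up to `max` candidate system include dirs derived from `sysroot`
// into `out` and returns how many were written. A path that would not fit
// in SKETCH_PATH_CAP bytes is skipped rather than truncated.
static size_t sketch_sysroot_include_dirs(const char* sysroot,
  const char* triple, char out[][SKETCH_PATH_CAP], size_t max)
{
  size_t count = 0;

  if(count < max)
  {
    // Triple-specific headers, e.g. <sysroot>/usr/include/x86_64-linux-gnu.
    int len = snprintf(out[count], SKETCH_PATH_CAP, "%s/usr/include/%s",
      sysroot, triple);

    if((len > 0) && ((size_t)len < SKETCH_PATH_CAP))
      count++;
  }

  if(count < max)
  {
    // Generic headers under the same sysroot the linker uses.
    int len = snprintf(out[count], SKETCH_PATH_CAP, "%s/usr/include",
      sysroot);

    if((len > 0) && ((size_t)len < SKETCH_PATH_CAP))
      count++;
  }

  return count;
}
```

Each returned dir would then become one -isystem or -I style argv element for the embedded clang; which flag to use is left open here.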
- Spike first: a throwaway program that (1) compiles a deliberately broken shim.c in-process with a custom DiagnosticConsumer and confirms the error arrives structured (severity/file/line) in errors_t; (2) compiles a good shim.c and confirms a linkable .o comes out; (3) pokes Windows MSVC/SDK include discovery. This converts the §2d [UNVERIFIED] diagnostic-interception assumption to verified before genc.c is written: week-one knowledge, not week-ten.
2g. Tests (how to run)
- Compile-error test: make test-pony-compiler. New fixture: place a .c on disk (a temp package dir, or a magic package mapped to a temp dir) since inline-source fixtures can't carry a .c. Assert via test_expected_errors that a bad shim yields the expected clang-derived error text. This fixture mechanism is the new piece of discussion open-question #3.
- cdefine: duplicate test: make test-pony-compiler. This one is cheap: the check fires in pass_scope (the use handler), needs no .c on disk and no clang, so it's plain inline source with two same-name cdefine: directives, run to the scope pass, asserting the error. One case for differing values, one for identical values (both error). Add a passing case: distinct names, and guarded cross-platform defines that don't clash.
- End-to-end test: make test-core (debug and release). A package with a tiny shim + Pony calling it over FFI; compile + link + run; assert output. Goes through the real codegen/genexe path, exercising the link append.

Compile errors are now gtest-assertable via test_expected_errors; link-class failures (e.g. duplicate symbols across two shims) stay untestable in the full-program runner, the same gap as today. Deferring is fine; naming it so it isn't a silent omission.
2h. Examples
- examples/ffi-struct and examples/ffi-callbacks ship struct.c / callbacks.c next to their .pony, link a hand-built lib via use "lib:..." + use "path:./", and their READMEs document a manual C build step. v1 discovery would compile the .c as a shim and silently shadow the hand-built lib (object-vs-library, same mechanism as 2c-bis).
- Keep both as examples of use "lib:" linking of a prebuilt lib, which stays a real, distinct thing users do (not everything is a shim). Apply the same fix as the full-program tests: relocate struct.c / callbacks.c into a subdir (e.g. c-src/) so single-dir discovery skips them. use "lib:" / use "path:./" stay as-is; update each README's manual build step to point at the new .c location.
- Add a new example (examples/cshim): a .c next to its .pony, called over FFI, with use "cinclude:..." / use "cdefine:..."; ponyc compiles it, no manual build, no use "lib:". This is the feature's showcase. Update examples/README.md (pony-examples-readme conventions) for the new example and the relocated FFI sources.
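To make the showcase concrete, here is what the C half of such an example might look like. The file name, the function cshim_scale, and the macro CSHIM_SCALE are invented for illustration; the Pony lines in the comment are an assumed pairing that uses the cdefine: scheme proposed in this plan, which does not exist in ponyc today.

```c
#include <stdint.h>

// Hypothetical examples/cshim/shim.c, sitting next to main.pony.
//
// Assumed Pony side (illustrative only):
//   use "cdefine:CSHIM_SCALE=2"
//   use @cshim_scale[I32](value: I32)
//
// The fallback below keeps the file compilable when no cdefine: is given.
#ifndef CSHIM_SCALE
#define CSHIM_SCALE 1
#endif

// Multiplies `value` by the compile-time scale factor.
int32_t cshim_scale(int32_t value)
{
  return value * CSHIM_SCALE;
}
```

The point of the pairing is that the macro arrives through a use directive in the Pony source, so the whole build stays a single ponyc invocation.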
2i. Release notes + docs
- .release-notes/compile-c-shims.md: ## title matching the PR title, user-facing description with a code example. VERSION is 0.64.0 (released), so this is required.
- Linking model (docs): shim .c files are compiled to objects and linked directly, not archived, so (a) C constructors (__attribute__((constructor))) run, and (b) two shims defining the same external symbol are a loud duplicate-definition link error (object-vs-object), whereas a shim shadowing a use "lib:" library is silent (object-vs-library). Include the migration warning: a .c you previously built into a lib and linked via use "lib:" is now auto-compiled as a shim and silently shadows that lib; move it to a subdir or drop the lib. (This is the user-facing twin of the 2c-bis/2h migrations.)
- Per-OS shim code (docs): use guards condition the flags (cinclude:/cdefine:… if macosx); discovery has no per-file guard, so per-OS shim code uses whole-file #ifdef. Every .c is always compiled, and a platform-specific file #ifdefs its whole body to an empty object elsewhere. That is the design; say so.
- cdefine: vs ponyc -D (docs): cdefine:FOO is a C-preprocessor macro (clang -D), distinct from ponyc's own -D/--define build flags that drive Pony's ifdef. Same word, different machines; disambiguate it.
- Tutorial: the documentation of use schemes lives in ponylang/pony-tutorial, a separate-repo follow-up, noted in the PR.
Merge sequencing
Stage 1 is reviewed but held; it merges immediately before Stage 2 so main never carries an unused clang build. If the feature is abandoned, neither merges and there's nothing to back out.
Decisions (resolved)
- cstd: and cflag: are both dropped from v1 (2a). cstd: is sugar over -std; cflag: reintroduces a flag-precedence footgun against the locked target flags.
- Resource dir (Stage 1): release artifacts carry lib/clang/<ver>/include at a known offset from the binary on every platform.
- PASS_C placement (2d): no in-editor (LSP) shim diagnostics in v1 (would need a separate diagnostics-only path).
- C-only directories (2c): a package still needs at least one .pony; reverses an earlier note that relaxed the guard.
- cdefine: duplicates error (2a): any same-name redefinition, conflicting or identical value, like let a twice. cinclude: is additive, no check. Directive values bypass quoted_locator, so quotes/spaces (-DVERSION="1.2") work.
- Examples (2h): keep ffi-struct/ffi-callbacks as external-lib FFI examples (relocate their .c like the tests, dodge discovery); add a new examples/cshim to showcase the shim feature.
Assumptions to verify (Stage-1 spike, before Stage 2 — 2f)
- The error story and the compile-error tests rest on clang's DiagnosticConsumer → errors_t working in-process. Believed-true (libclang/clangd do it), never run here. The spike confirms it (plus a good shim → linkable .o, plus Windows include discovery) before genc.c is written.