Compiling C and C++ shims alongside Pony #5390
Replies: 3 comments
My initial thought on caching is: "that's a build system concern" and "given the stated goal it shouldn't be too expensive to recompile the shims every time".
A few corrections after reading the actual code, mostly to the "one-line change" and the header story. Turning clang on in the LLVM build really is one line: another entry in The header story is better than I first thought. I worried that compiling C would force us to reconstruct the compiler driver's include-path logic, since building a bare One piece is genuinely new. Clang's builtin headers ( None of this moves the shape of the proposal. It moves "turn clang on" from trivial to a contained piece of build work, and it confirms the header search is reachable by reusing what the embedded linker already does.
The v1 implementation plan is written up in #5468, C-only to start, grounded in the actual code: the
What I want to make easy
A lot of the time, calling a C library from Pony needs no C at all. You write the FFI declarations, you call straight into the library, and you're done. But not always. Sometimes the library doesn't map cleanly onto Pony's FFI. Sometimes the function you need is really a macro. Sometimes you have to build up a struct the way the library expects before you can hand it over. In those spots a little C smooths it out. A shim. A handful of functions that sit between Pony and the library and make the Pony side pleasant.
This is about that case. When a shim would make things better, I want it to be easy.
Today it isn't. You compile the C yourself, with your own compiler and your own flags, into a library, and then you point Pony at the result with use "lib:..." and use "path:...". Pony never touches the C. It only links the object that fell out the other end. So every project that reaches for a shim grows a second build step that lives outside Pony, and every contributor who checks out that project has to know about it.
I want that glue to just compile. You drop a .c next to your .pony, and when ponyc builds the package, it builds the C too and links it in. No second build system. No instructions in the README about running make first.
This is a small thing, and it should stay small. It's for shims and the C that travels with a package. It is not "use Pony as a C compiler." If you have a real C project, with its own build graph and code generation and a dozen translation units leaning on each other, you should build that with the tool made for it and link the result the way you already can. More on where that ambition lives further down.
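To make the kind of shim described above concrete, here is a minimal sketch of the two cases mentioned: a macro that needs a real symbol, and a struct that has to be filled in the way the library expects. Everything named lib_*, LIB_*, and shim_* is hypothetical; the macro and struct stand in for whatever a real library header would provide.

```c
#include <stdint.h>

/* Stand-in for a real library header. In practice this would be an
   #include of the library's own header; these names are hypothetical. */
#define LIB_MAKE_VERSION(maj, mnr) (((uint32_t)(maj) << 16) | (uint32_t)(mnr))

typedef struct lib_options {
  uint32_t version;
  uint32_t flags;
  int32_t timeout_ms;
} lib_options;

/* A macro has no symbol for an FFI call to bind to, so the shim wraps it
   in an ordinary function. */
uint32_t shim_make_version(uint32_t maj, uint32_t mnr)
{
  return LIB_MAKE_VERSION(maj, mnr);
}

/* Fills a caller-provided struct with the defaults the (hypothetical)
   library expects, so the Pony side does not have to know them. */
void shim_default_options(lib_options* out)
{
  out->version = LIB_MAKE_VERSION(1, 0);
  out->flags = 0;
  out->timeout_ms = 5000;
}

/* On the Pony side this would be reached through an ordinary FFI
   declaration, roughly:
     use @shim_make_version[U32](maj: U32, mnr: U32)
   (shown as an illustration, not as part of the proposal). */
```

The point of the sketch is the size: a shim like this is a few functions in one file, which is the scale the proposal is aimed at.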
How it would work
ponyc already vendors all of LLVM. The whole monorepo is sitting right there in the submodule, clang included. We don't build clang today; we only turn on LLD, which ponyc embeds and calls directly to link every program it produces. Turning clang on is a one-line change to which LLVM projects we build.
Once clang is in the build, ponyc can compile C in process. It builds a CompilerInvocation from an argument list, runs an emit-object action, and gets back an object file produced against the same LLVM that just compiled the Pony. Same target, same ABI, by construction. Those objects join the pile of objects ponyc already hands to the linker. The link step barely changes; it already takes a pile of objects and libraries and turns them into a binary.
Discovery is a convention. Any .c, .cpp, or .cc sitting in a package directory next to the .pony files gets compiled as part of that package. The extension sets the language. Nothing to enumerate, nothing to register. The C travels with the package the same way the Pony does, so when corral pulls a dependency that ships a shim, the shim comes along and builds with no extra work.
Where the flags come from
A C file almost never compiles on convention alone. It needs to find its headers. It needs a define or two. It needs a particular C standard. So we need a way to say those things, and the natural place to say them is the same place Pony already says "link this library" and "look for it here": a use directive.
There's a real decision hiding under that, and it's most of the design. Pony's use "lib:..." and use "path:..." collect into one global pile for the whole program. That's right for linking. Every library gets handed to the linker once, at the end, and it doesn't matter which package asked for it. Compiling C is the opposite. The headers package A needs aren't the headers package B needs. If A defines a macro to one value and B defines the same macro to another, those two facts can't share a bag. So the flags for compiling a package's C have to belong to that package, not to the program as a whole. That's the one real departure from how the existing use directives behave.
For the directives themselves, I'd keep one idea per line, the way lib: and path: already split the work:
- use "cinclude:./vendor/include" for a header search path
- use "cdefine:FOO=1" for a preprocessor define
- use "cstd:c11" for the language standard
- use "cflag:..." as an escape hatch for the odd flag with no home of its own
Naming the schemes instead of taking a raw flag string buys something concrete. use "path:..." already resolves a relative path against the package's own directory, so the library gets found no matter where you run ponyc from. cinclude: should do the same. use "cinclude:./include" then means "the include directory next to this source," and it stays true regardless of the directory clang runs in. A raw -I./include can't promise that.
And because a use can carry an if guard today, platform conditioning comes for free. use "cinclude:/opt/homebrew/include" if macosx already parses and already evaluates. We let the new schemes take a guard and we're done. Cross-platform C glue, where the include paths and defines differ by operating system, is the case where that matters most, and we don't build anything to get it.
The flags that have to match the Pony side aren't negotiable and don't belong to the user at all. The target triple, the CPU, the features, position-independent code, the optimization level, debug info: ponyc already has all of these, because it just used them to compile the Pony. It feeds the same values to clang. A user who could override them could produce an object that won't link, or worse, one that links and then misbehaves. Those come from the compiler, not the directive.
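As a rough illustration of the split described above, the sketch below turns one package's directives into an argument list and then appends the values the compiler owns. All of the names (scheme_t, directive_t, arglist_t, build_args) are hypothetical, and the driver-style spellings (-x, -I, -D, -std=, --target=, -fPIC) are an assumption about how the arguments might be written; the proposal does not say which form the in-process invocation would take. Appending the compiler-owned flags last is also only one possible policy; rejecting conflicting cflag: values outright would be another.

```c
#include <stdio.h>
#include <string.h>

#define MAX_ARGS 32
#define ARG_LEN 256

/* Hypothetical model of the four proposed schemes. */
typedef enum { C_INCLUDE, C_DEFINE, C_STD, C_FLAG } scheme_t;

typedef struct {
  scheme_t scheme;
  const char* value;
} directive_t;

typedef struct {
  char args[MAX_ARGS][ARG_LEN];
  int count;
} arglist_t;

/* Appends prefix+value as one argument; arguments past MAX_ARGS are dropped. */
static void push(arglist_t* out, const char* prefix, const char* value)
{
  if (out->count < MAX_ARGS)
    snprintf(out->args[out->count++], ARG_LEN, "%s%s", prefix, value);
}

/* The file extension picks the language: .cpp and .cc are C++, the rest C. */
static const char* language_for(const char* source)
{
  const char* dot = strrchr(source, '.');
  if (dot != NULL && (strcmp(dot, ".cpp") == 0 || strcmp(dot, ".cc") == 0))
    return "c++";
  return "c";
}

/* Builds the argument list for one source file from its own package's
   directives. Nothing from any other package is consulted. */
void build_args(arglist_t* out, const char* package_dir, const char* source,
                const directive_t* dirs, int ndirs, const char* triple)
{
  char path[ARG_LEN];
  out->count = 0;
  push(out, "-x", "");
  push(out, language_for(source), "");

  for (int i = 0; i < ndirs; i++) {
    const char* v = dirs[i].value;
    switch (dirs[i].scheme) {
      case C_INCLUDE:
        if (v[0] == '/') {
          push(out, "-I", v);
        } else {
          /* Relative paths resolve against the package directory, not the
             working directory. (POSIX-style paths only in this sketch.) */
          snprintf(path, sizeof path, "%s/%s", package_dir, v);
          push(out, "-I", path);
        }
        break;
      case C_DEFINE: push(out, "-D", v); break;
      case C_STD:    push(out, "-std=", v); break;
      case C_FLAG:   push(out, "", v); break;
    }
  }

  /* Compiler-owned values come from ponyc, never from a directive. */
  push(out, "--target=", triple);
  push(out, "-fPIC", "");
}
```

Called with package directory /pkg/shim, source glue.c, and the three directives cinclude:./vendor/include, cdefine:FOO=1, and cstd:c11, this produces -x c -I/pkg/shim/./vendor/include -DFOO=1 -std=c11 followed by the target and PIC flags.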
C and C++ share one set of flags to start. The extension still sets the language, so a .cpp compiles as C++ and a .c as C, but they draw from the same cstd: and cinclude: and the rest. If it turns out people genuinely need different standards for C and C++ in the same package, we split the family then. Easier to add the split later than to walk back two parallel families nobody needed.
The alternative: a Pony build system
The other way to solve this is bigger. Give Pony its own build system, a build file you write, the way Zig has build.zig. A real description of how to build the native pieces of your project, with whatever logic that takes. That's also where "use Pony as a C compiler" would live, because once you have arbitrary build logic, building a real C project from inside Pony is on the table.
It's overkill for what I'm after. It cuts against how Pony works today, where this kind of information lives in the source through use directives rather than off in a separate build file, and it's far more than the shim problem needs. If we ever want a full build system, that's its own conversation, and these directives don't stand in its way.
Open questions
A few things I don't have answers for yet.
Caching. ponyc would compile these C files on every build. A second build step normally rebuilds only what changed. Do we cache the compiled objects and skip the ones whose source and flags haven't moved, and if so, where do those objects live and how do we key the cache? Get it wrong and you get either stale objects or slow builds.
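One way to think about the keying question is a hash over everything that feeds the compile. The sketch below is only an illustration under that assumption: shim_cache_key is a hypothetical name, FNV-1a is chosen here for brevity rather than proposed for ponyc, and the key covers only the source bytes and the flag list. A real key would also have to account for the headers the source includes and for the compiler version, which this sketch ignores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Folds len bytes into a running 64-bit FNV-1a hash. */
static uint64_t fnv1a(uint64_t h, const void* data, size_t len)
{
  const unsigned char* p = (const unsigned char*)data;
  for (size_t i = 0; i < len; i++) {
    h ^= p[i];
    h *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
  }
  return h;
}

/* Hypothetical cache key: source contents plus every flag, in order.
   Each flag is hashed with its terminating NUL so flag boundaries are
   part of the hashed data. */
uint64_t shim_cache_key(const char* source, size_t source_len,
                        const char* const* flags, size_t nflags)
{
  uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
  h = fnv1a(h, source, source_len);
  for (size_t i = 0; i < nflags; i++)
    h = fnv1a(h, flags[i], strlen(flags[i]) + 1);
  return h;
}
```

With a key like this, an object is reused only when both the source and the flags hash to the same value; changing cdefine:FOO=1 to cdefine:FOO=2 changes the key and forces a recompile.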
Errors. When clang can't compile the C, the person staring at the failure is sitting in front of a Pony compiler. The error has to come out in a way that fits the rest of ponyc's diagnostics rather than dumping raw clang output and leaving them to work out which file it came from. How much of clang's diagnostics we route through Pony's, and how much we pass straight through, is an open call.
Compile-error tests. ponyc has a whole category of tests that assert a given program fails to compile with a given error. C that fails to compile is a new kind of compile failure. Whether it fits that existing machinery, and how, needs working out.