Reconsider AOT vs. compile-time generation of syntax trees #622
Description
Our current formulation of syntax trees assumes that we’ll be able to read the contents of node-types.json files at compile time. This only holds for local development and for files pulled in via pinned Git dependencies. For all other cases, the official word is that this is not expected to work. This means that any future publishing to Hackage is off the table, though things work for local dev and our downstream dependent projects.
But even the situation as it stands is far from optimal. For example, though Bazel tends to provide better in-IDE tooling, it doesn’t know how to find node-types files in REPLs, and even during standard builds it can’t find them without preprocessor trickery.
I think it’s time to consider whether generation of this code ahead-of-time is worth exploring. Here are some upsides and downsides of AOT code generation.
Upsides
- As mentioned above, the current approach basically only works on cabal due to implementation details of the build/REPL process.
- We already do AOT codegen for the Semantic_Proto serialization files. Note that that file, even though it comes out to something like 8000 SLoC, is well-behaved with respect to compile time and IDE support, in contrast to our code that does complicated Template Haskell splices. Indeed, I suspect that the authors of proto-lens avoided TH generation for reasons much like ours: TH has difficulty finding .proto files, and it would need to work with massive protobuf definitions.
- We also generate code for lingo-haskell.
- As mentioned above, our build process can become substantially simpler, and our IDE tooling will work more reliably (because it won’t ever try to activate a TH splice).
- We don’t update the grammars super-often, so this shouldn’t introduce a tremendous amount of code churn.
- Better caching (even with Bazel, which is much better at caching than cabal, we still encounter spurious rebuilds).
- Better project ergonomics (since the codegen splices are defined in tree-sitter).
Downsides
- More code to write.
- Less elegant than a pure-TH solution.
- It’s an extra step we have to be aware of during the update process.
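To make the AOT option more concrete, here is a minimal sketch of what a standalone generator could look like: it renders plain Haskell source text from a description of node types instead of splicing declarations in with TH. Everything in it is an assumption for illustration. The NodeType record is a hypothetical, heavily simplified stand-in for an entry in node-types.json, the module name Language.Example.AST is made up, and the JSON parsing step is omitted so that the sketch depends only on base.

```haskell
import Data.Char (toUpper)
import Data.List (intercalate)

-- | Hypothetical, simplified view of one entry in node-types.json.
-- A real generator would parse the JSON file instead of using literals.
data NodeType = NodeType
  { nodeName   :: String             -- e.g. "binary_expression"
  , nodeFields :: [(String, String)] -- (field name, child node type)
  }

-- | Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | Convert snake_case to PascalCase.
pascalCase :: String -> String
pascalCase = concatMap capitalize . splitOn '_'
  where
    capitalize []       = []
    capitalize (c : cs) = toUpper c : cs

-- | Render one node type as the source text of a data declaration.
-- This sketch ignores field names that clash across records or with
-- Haskell keywords; a real generator would have to rename those.
renderNode :: NodeType -> String
renderNode (NodeType name fields) = "data " ++ ty ++ " = " ++ ty ++ body
  where
    ty = pascalCase name
    body
      | null fields = ""
      | otherwise   = " { " ++ intercalate ", " (map renderField fields) ++ " }"
    renderField (field, child) = field ++ " :: " ++ pascalCase child

-- | Render a whole module as text, ready to be written to a .hs file.
renderModule :: String -> [NodeType] -> String
renderModule modName nodes = unlines $
  [ "-- Generated file; do not edit by hand."
  , "module " ++ modName ++ " where"
  , ""
  ] ++ map renderNode nodes

-- | Made-up sample input standing in for a parsed node-types.json.
sampleNodes :: [NodeType]
sampleNodes =
  [ NodeType "identifier" []
  , NodeType "binary_expression" [("left", "identifier"), ("right", "identifier")]
  ]

main :: IO ()
main = putStr (renderModule "Language.Example.AST" sampleNodes)
```

The generated text would be checked in (or produced by a separate build step), so the compiler and IDE tooling only ever see ordinary Haskell source and never need to locate node-types.json themselves.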
Another approach we could take is to drop cabal support entirely, but that would also preclude any Hackage releases, would still need some love to get working in a REPL context, and would entail a degree of tedious downstream changes. We could also (shudder) download the grammar definitions in the TH splices themselves. I hardly think that invoking network calls in TH is something we should encourage, though that’s the only way I can envision this possibly working with cabal.