Reconsider AOT vs. compile-time generation of syntax trees

Our current formulation of syntax trees assumes that we’ll be able to read the contents of `node-types.json` files at compile time. This holds only for local development and for files pulled in via pinned Git dependencies. For all other cases, the [official word](https://github.com/haskell/cabal/issues/7021) is that this is not expected to work. This means that [any future publishing to Hackage](https://github.com/github/semantic/issues/16) is off the table, though things work for local dev and our downstream dependent projects.

But even the current situation is far from optimal. For example, though Bazel tends to provide better in-IDE tooling, it doesn’t know how to find node-types files [in REPLs](https://github.com/tweag/rules_haskell/issues/1382), and even during standard builds it can’t find them [without preprocessor trickery](https://github.com/tweag/rules_haskell/issues/1337).

I think it’s time to consider whether generation of this code ahead-of-time is worth exploring. Here are some upsides and downsides of AOT code generation.

### Upsides

* As mentioned above, the current compile-time approach basically only works with `cabal`, due to implementation details of the build/REPL process. AOT generation would not have this restriction.
* We already do AOT codegen for the `Semantic_Proto` serialization files. Note that that file, even though it comes out to around 8000 SLoC, is well-behaved with respect to compile time and IDE support, in contrast to our code that does complicated Template Haskell splices. Indeed, I suspect that the authors of `proto-lens` avoided TH generation for the same reasons we are running into: TH has difficulty finding .proto files, and it would need to cope with massive protobuf definitions.
* We also generate code for `lingo-haskell`.
* As mentioned above, our build process can become substantially simpler, and our IDE tooling will work more reliably (because it won’t ever try to activate a TH splice).
* We don’t update the grammars very often, so this shouldn’t cause a tremendous amount of code churn.
* Better caching (even with Bazel, which is much better at caching than cabal, we still encounter spurious rebuilds).
* Better project ergonomics (since the codegen splices are defined in `tree-sitter`).

### Downsides

* More code to write.
* Less elegant than a pure-TH solution.
* It’s an extra step we have to be aware of during the update process.

Another approach we could take is to drop `cabal` support entirely. That would also preclude any Hackage releases, would still need some love to get working in a REPL context, and would entail a degree of tedious downstream changes. We could also _shudder_ download the grammar definitions in the TH splices themselves. That is the only way I can envision this possibly working with `cabal`, but I hardly think that making network calls in TH is something we should encourage.
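To make the AOT option more concrete, here is a minimal sketch of what a standalone generator could look like: it turns node descriptions into Haskell source text that would be checked in, instead of being spliced by TH. Everything in it is an assumption for illustration: `NodeType` is a hypothetical, heavily simplified stand-in for an entry in `node-types.json` (no JSON parsing, no subtypes, no multiplicity), `Node` is a placeholder for whatever type the fields would really have, and the rendered declarations are not the shape our real syntax types take.

```haskell
import Data.Char (toLower, toUpper)
import Data.List (intercalate)

-- | Hypothetical, simplified stand-in for one entry of node-types.json:
-- a grammar node name plus the names of its fields.
data NodeType = NodeType
  { nodeName   :: String
  , nodeFields :: [String]
  }

-- | Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | Turn a snake_case grammar name into a type/constructor name.
pascalCase :: String -> String
pascalCase = concatMap capitalize . splitOn '_'
  where
    capitalize []       = []
    capitalize (c : cs) = toUpper c : cs

-- | Turn a snake_case grammar name into a record field name.
-- Clashes with Haskell keywords or between types are not handled here.
camelCase :: String -> String
camelCase s = case pascalCase s of
  []       -> []
  (c : cs) -> toLower c : cs

-- | Render one data declaration as source text.
-- 'Node' is a placeholder type name, not something the generator defines.
renderDecl :: NodeType -> String
renderDecl (NodeType name fields) = case fields of
  [] -> "data " ++ ty ++ " = " ++ ty
  _  -> "data " ++ ty ++ " = " ++ ty
          ++ "\n  { " ++ intercalate "\n  , " (map renderField fields)
          ++ "\n  }"
  where
    ty = pascalCase name
    renderField f = camelCase f ++ " :: Node"

-- | Render a whole module: a header followed by one declaration per node.
renderModule :: String -> [NodeType] -> String
renderModule modName nodeTypes =
  intercalate "\n\n" (header : map renderDecl nodeTypes) ++ "\n"
  where
    header = "-- Generated file; do not edit by hand.\nmodule "
               ++ modName ++ " where"
```

A driver would call something like `writeFile path (renderModule "Some.Module" nodeTypes)` once per grammar whenever a grammar is updated, and the output would be committed. That is the extra update step listed under the downsides, but it means neither `cabal`, Bazel, nor the IDE ever has to locate `node-types.json` at build time.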
