Skip to content

Compact AST Representation

andychu edited this page Jun 28, 2024 · 54 revisions

Compact AST Representation

Array of Homogeneous Nodes / "Inverted" Struct-of-Arrays

As opposed to array of pointers to heterogeneous structs

  • Vertical object layout and compression for fixed heaps

    • Implemented in the Virgil Language by Ben Titzer
    • Vertical Object Layout basically means transposing objects and fields. Analogous to arrays of structs -> struct of arrays. Except the "array" is every instance of a given type on the heap.
  • Co-dfns APL Compiler Thesis -- Uses an array-based representation of an AST. Section 3.2 is The AST and Its Representation. What's novel is that this is a parallel compiler hosted on the CPU. So it's not just compact, but laid out in a way amenable to parallel computation.

  • Zig PR to use Struct-of-Arrays and Indices in AST. Proof of concept for other IRs: https://github.com/ziglang/zig/pull/7920

  • Modernizing Compiler Design for Carbon's Toolchain - CppNow 2023

    • Primary Homogeneous Dense Array packed with most ubiquitous data
    • Indexed access allows most connections to use adjacency (implicit pointer, no storage required)
    • Side arrays for secondary data, referred to with compressed index
    • Flyweight Handle wrappers

Compressed Pointers

This is related: more general, but more complex.

Layout Should be Orthogonal to Logic

  • You Can Have It All: Abstraction and Good Cache Performance. (2017) Divide objects into pools and can do SoA/AoS on individual sets of fields. It seems like this is done manually, which might be too onerous for big apps? The work hasn't been integrated into a compiler yet. Good related work section, citing a lot of the work above.

Tree Pass Fusion

I think the key point is that some of the frameworks introduce restrictions that make it hard to write a big program like a compiler. There is a hard tradeoff between the amount of fusing that can be done and how expressive the metalanguage is.

Serialization, GC Pressure, Code Reuse

Reducing Garbage Collection Pressure

Old Oil Stuff

  • Oil Blog Post: What is oheap?
    • Oheap V1 was a read-only format for transporting an ASDL tree across the Python/C++ boundary. Though it turned out there wasn't a single clean boundary in the shell's architecture.
  • OHeap2 was a read-write format, for OVM2.
    • At first I used it to represent .pyc files (Python's CodeObject)
    • Meant to replace marshal, pickle, and zipimport.
    • 4-16-N model: 4 byte refs, 16 byte cells, and N byte slabs.
    • See ovm2/oheap2.py
    • OHeap2 might still be a good idea, but I judged OVM2 not to be a good idea

Related

Clone this wiki locally