Faster passing ASTs from Rust to JS #2409

Open
overlookmotel opened this issue Feb 14, 2024 · 14 comments
Labels
A-ast Area - AST

Comments

@overlookmotel
Collaborator

overlookmotel commented Feb 14, 2024

Currently OXC's parser is extremely fast, but using it from NodeJS is not. The primary cause is the overhead of the JS/Rust boundary - specifically serializing/deserializing large AST structures, in order to pass them between the two "worlds".

Right now, it's not a problem, as OXC is mainly consumed as a Rust lib. However, I suspect that as OXC's transformer, linter, and minifier are built out and gain popularity, this may become a bottleneck, because people will be asking for a way to write transformer/linter/etc plugins in JavaScript, and the performance will not be up to their expectations.

Currently OXC uses JSON as the serialization format. There's a POC implementation using Flexbuffers, which I imagine is much faster.

However, I believe that OXC is uniquely placed to go one better, and cut the overhead of serialization/deserialization practically to zero - in a way that no other current tool that I'm aware of will be able to match.

Apologies in advance: this is going to be a long one...

Background: Why I think this is important

JavaScript as we know it today is the result of a great spurt of innovation over the past decade (particularly around ES6). Babel was pivotal in that process. Many of the new language features (e.g. array destructuring) are essentially syntax sugar, and a working implementation as a Babel plugin became both a requirement of the TC39 process, and an important part of the process of developing and refining features - allowing people to test them out and suggest improvements etc.

At this point, the trend towards tooling written in native languages like Rust is irreversible. This is great for DX. However, it does have the unfortunate side effect of making those tools less accessible to JavaScript developers who only "speak" JS. And of course it's JS programmers who are most familiar with the language, most aware of what its rough edges are, and most motivated to play a role in improving the language.

I believe that to enable the continued evolution of JS, it's important to ensure that, as Babel fades into the distance, the new crop of tools replacing it also fulfil the role Babel has played up until now, allowing JS developers to prototype new language features in the language they know best - JavaScript.

Therefore I feel it's important that transformer plugins written in JS continue to be a thing.

More "selfishly", from the point of view of OXC, I think there is also a real opportunity here. Most people's needs will be mostly met by the most common plugins which OXC will offer as standard, implemented in Rust.

However, I would bet that there's a very long tail of projects/companies who rely on at least one less popular Babel/ESLint plugin, and are therefore currently blocked from migrating from Babel/ESLint to OXC/SWC/etc. This is likely a major pain point for them.

Pursuing a goal of satisfying every developer's needs by re-implementing every plugin that has any user base would be an immense maintenance burden. And many companies/developers will not have the capability to do it themselves in Rust. If OXC can offer a solution for plugins in JS, and unlock their path to much faster builds, it could be a significant driver to adoption.

How to do it?

I attempted to tackle exactly this problem on SWC a couple of years ago: swc-project/swc#2175.

My first prototype using rkyv as the serializer did show solid performance gains vs JSON - around 4x. I had the beginnings of a 2nd version which was way faster again, based on a much faster serializer. But performance was still in roughly the same ballpark as Babel, rather than the order of magnitude improvement I was hoping for.

I came to the conclusion that the only way to achieve that kind of improvement was to remove serialization from the equation entirely, and that this could only be achieved by using an arena allocator. It became clear that SWC's maintainers did not feel JS plugins were a priority, and so would not consider that kind of fundamental re-architecting of the project to support it. So I abandoned the effort.

OXC, of course, already has an arena allocator at its core, so the largest problem is already solved.

How to destroy the overhead

It's really simple.

The requirements of a serialization format are that it must be reasonably space-efficient, and well-specified. Such a format already exists in OXC - the native Rust types for AST nodes.

So don't serialize at all!

OXC stores the entire AST in an arena. Rust can transfer the arena allocator's memory blocks via napi-rs to JavaScript, where it becomes NodeJS Buffer objects. This transfer is just passing pointers, involves no memory copying, and the overhead is close to zero.

On the JS side, you need a deserializer which understands the memory layout of the Rust types. This is the tricky part, but the deserializer code can be generated from a schema, or even from analysis of the type layouts within Rust itself (layout_inspect is a prototype of the latter approach).

(side note: TS type defs can also be auto-generated at same time)

From my experiments on SWC, the JS deserializer can be surprisingly performant (see graph here). Deserializing on JS side was twice as fast as Rust-side serialization with rkyv. I suspect that because the deserializer code is so simple and completely monomorphic, V8 is able to optimize it very effectively.

It's also possible to do the same in reverse. JS passes Buffers back to Rust, you reconstruct the arena, and just cast a pointer back to a &mut Program. Again, this is only possible because of the arena, and because all the AST node types are non-drop.
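A minimal sketch of this "cast bytes back to a typed reference" step, using a hypothetical two-field Program (not oxc's real type) and a plain u32 array standing in for the arena memory that JS hands back:

```rust
// With a stable #[repr(C)] layout and non-Drop types, bytes in the arena
// can be reinterpreted as a typed reference without any deserialization.
// `Program` here is an illustrative stand-in, not oxc's real AST root.
#[repr(C)]
struct Program {
    start: u32,
    end: u32,
}

fn main() {
    // Stand-in for arena memory handed back from JS (u32-aligned,
    // as the real arena buffer would be).
    let mut arena: [u32; 2] = [5, 9];

    // Cast a pointer into the buffer back to `&mut Program`.
    // Safe only because the layout is stable and the bytes are valid.
    let program: &mut Program = unsafe { &mut *(arena.as_mut_ptr() as *mut Program) };
    assert_eq!(program.start, 5);

    // Mutations through the reference write straight into the arena bytes.
    program.end = 100;
    assert_eq!(arena[1], 100);
    println!("cast ok");
}
```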

Complications

Enabling this would require some changes to OXC's internals, some of which are a bit annoying. So there are some trade-offs, and it might only be workable if the project feels it's appropriate to make JS plugins a "first class citizen" of OXC.

Stable type layouts

  1. All AST node types would need to be #[repr(C)] to ensure a stable layout. That's not a big deal in itself, I think, but the annoyance would be that e.g. bool fields would need to move to the last fields of types, to avoid excess padding.

  2. All AST enums would likely need to be #[repr(u8)] with explicit discriminants.

  3. Maybe there'd be a problem maintaining the niche optimization for Options, as deserializer needs to know the niche value for None, which Rust does not expose (I say "maybe" as I can see potential solutions to that).

These annoyances could be largely negated by using proc macros, but at the cost of increased compile times (not sure to what degree).
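A sketch of what these layout rules look like in practice, using illustrative types (not oxc's real AST): a #[repr(C)] struct with its bool field moved last, a #[repr(u8)] enum with explicit discriminants, and the Option niche that point 3 refers to:

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
#[allow(dead_code)]
pub struct Span {
    pub start: u32,
    pub end: u32,
}

// Hypothetical node type. With #[repr(C)], field offsets are stable and
// known ahead of time, so a JS deserializer can read them directly.
#[repr(C)]
#[allow(dead_code)]
pub struct NumericLiteral {
    pub span: Span,        // offset 0, 8 bytes
    pub value: f64,        // offset 8, 8 bytes
    pub raw_len: u32,      // offset 16
    // `bool` fields go last to avoid padding holes mid-struct.
    pub is_negative: bool, // offset 20
}

// #[repr(u8)] with explicit discriminants gives a stable 1-byte tag.
#[repr(u8)]
#[allow(dead_code)]
pub enum PropertyKind {
    Init = 0,
    Get = 1,
    Set = 2,
}

fn main() {
    assert_eq!(size_of::<Span>(), 8);
    assert_eq!(align_of::<NumericLiteral>(), 8);
    assert_eq!(size_of::<NumericLiteral>(), 24); // 21 bytes of data + 3 trailing padding
    assert_eq!(PropertyKind::Set as u8, 2);

    // The niche optimization from point 3: `None` is encoded in the pointer's
    // null niche, so the Option adds no size - but Rust doesn't tell an
    // external deserializer which bit pattern `None` uses.
    assert_eq!(size_of::<Option<&u8>>(), size_of::<&u8>());
    println!("layouts ok");
}
```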

Strings

2 problems here:

  1. All the data for the AST must be in the arena, or part of the source text, so JS can access it. This imposes some constraints on what you can put in an Atom.

  2. Decoding strings from UTF-8 is the most costly part of the JS deserializer. Each decode involves a call across the JS/native boundary, which is a major slowdown. So by far the most efficient way to handle it is to ensure all strings are stored together in one buffer, decode the whole lot in one go, and then slice up the resulting JS string to get each individual string. The allocator would probably need a separate StringStore arena. NB: This does not apply to strings which are already in the source text, as JS has that as a string already.

I don't think either of these are a big problem in the parser, but maybe they are in transformer or minifier?
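The single-string-buffer idea can be sketched like this. StringStore here is a hypothetical stand-in for the separate string arena mentioned above, with nodes holding (offset, len) pairs into one shared buffer - the JS side would decode that whole buffer once and slice it:

```rust
// Hypothetical "string store": all interned strings live contiguously in one
// buffer, and an Atom-like handle is just (offset, len) into that buffer.
struct StringStore {
    buf: String,
}

impl StringStore {
    fn new() -> Self {
        Self { buf: String::new() }
    }

    // Append a string, returning the (offset, len) a node would store.
    fn push(&mut self, s: &str) -> (u32, u32) {
        let offset = self.buf.len() as u32;
        self.buf.push_str(s);
        (offset, s.len() as u32)
    }

    // Resolve a handle back to a &str (the Rust-side equivalent of the JS
    // side slicing the one decoded string).
    fn get(&self, (offset, len): (u32, u32)) -> &str {
        &self.buf[offset as usize..(offset + len) as usize]
    }
}

fn main() {
    let mut store = StringStore::new();
    let a = store.push("foo");
    let b = store.push("barbaz");
    assert_eq!(store.get(a), "foo");
    assert_eq!(store.get(b), "barbaz");
    // One buffer crosses the boundary; one UTF-8 decode covers every string.
    assert_eq!(store.buf, "foobarbaz");
    println!("string store ok");
}
```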

Pointers

Box and Vec contain 64-bit pointers. On JS side, the deserializer needs to be able to convert a pointer to an offset in a Buffer, but JS does not have a u64 type. A further complication is that the arena is composed of multiple buffers.

This is doable without any changes to OXC's allocator. But to make it really fast might require a new arena allocator implementation which e.g. aligns buffers on 4 GiB memory boundaries, so only the bottom 32 bits of memory addresses are relevant. Or to have a 2nd allocator implementation which uses a WebAssembly.Memory as the backing storage for the arena. WASM Memory in V8 already has the 4 GiB alignment property, and can be extended dynamically up to 4 GiB without memory copies, so entire arena could be a single buffer.
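A sketch of the pointer-to-offset conversion under that 4 GiB-alignment assumption (the base address below is hypothetical). Because the arena base has all-zero low 32 bits, the low 32 bits of any pointer into it are the offset, which fits in a JS number:

```rust
// If the arena buffer is aligned to a 4 GiB boundary, a 64-bit pointer's
// low 32 bits ARE the offset into the buffer - no base subtraction needed,
// and the result fits in a plain JS number / Uint32.
fn ptr_to_offset(ptr: usize) -> u32 {
    (ptr & 0xFFFF_FFFF) as u32
}

fn main() {
    // Hypothetical 4 GiB-aligned arena base (low 32 bits all zero).
    let arena_base: usize = 0x2_0000_0000;
    let node_ptr = arena_base + 0x1234;
    assert_eq!(ptr_to_offset(node_ptr), 0x1234);
    println!("offset ok");
}
```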

In my opinion, replacing bumpalo could be a gain in itself anyway, as I don't think it's quite as optimized as it could be for OXC's types. But obviously that's significant work.

Further optimizations

Lazy deserialization

The above assumes that the entire AST needs to be deserialized on JS side. But in most cases, a plugin only cares about a few AST node types, which will comprise a small subset of the entire AST. Lazy deserialization could reduce the overhead of deserialization to only the parts of the AST which are actually needed.

Updating the AST

A transformer visitor on JS side could make whatever changes it wants to the AST by directly mutating the data in the buffer. No need to convert to JS Objects and then serialize it all back to a buffer. The user-facing API would hide this behind a "facade" of AST node objects with getters/setters, or Proxys.

This would be difficult to make work without breaking Rust's aliasing rules, as JavaScript allows shared mutable references. And the JS code writing to the buffer would essentially be fiddling with bytes in Rust's memory, so would need to be absolutely bullet-proof to ensure no UB.

This would be a real challenge, but the reward would be extreme speed. JS plugins will never be as fast as native, but my guess is that this could get them at least in the same ball park.

I would not propose that this be part of the v1 implementation, but the potential is I think worth considering when weighing up whether this effort overall is worthwhile or not.

WASM traverser

The number-crunching of following pointers and traversing the AST could be performed in WASM, with WASM returning control back to JS when it's found the next node the visitor wants. WASM is faster than JS for this kind of work, and crossing the JS/WASM boundary can in some circumstances be very low cost.

Conclusion

In my personal opinion:

  • This could be a very performant solution to a common need.
  • This feature could be an opportunity for OXC to differentiate itself from other JS tools. Because most other tools don't use arena allocators, they could not do this even if they wanted to.
  • I believe everything I've outlined above is technically achievable.
  • But there are significant challenges, and it would be a large effort.
  • Implementation could proceed in incremental steps. A working first version would only require a subset of the above.
  • In a few cases, some trade-offs with OXC's other aims might be required.

My questions are:

  • Do you see any potential in this?
  • If there are trade-offs required, would they be worth it?

Hopefully it goes without saying that if you are willing to consider something along these lines, I would be keen to work on it.

One last thing: I'm not sure if there's currently a solution for linter plugins on the table, but if not, perhaps this could be it?

@overlookmotel
Collaborator Author

By the way, I've opened this as an issue rather than discussion as issues have higher visibility, and I'd be really keen to get feedback from the community. But Boshen if you feel it'd be better as discussion, please feel free to move it across.

@ArnaudBarre
Contributor

Thanks for taking the time to investigate this! I personally think that JS interop is key to the success of tooling, so that the long tail of enterprise use cases can craft their custom format & lint rules by just dropping a few lines of TS into a config (which is for me one of the biggest reasons Vite is preferred over Parcel).

This weekend I tried to plug OXC's parser into Prettier. The main blocker for me is the missing support for comments in the AST, but the performance was already very noticeable. I think that, as a first step, having full support of the AST so that people can build custom formatters, linters, bundlers, or plug into existing tools would be really great for adoption and the community.

I know that the only way to go fast is to own the full stack, but I personally think that speeding up tools like ESLint or Prettier by 3x is already a big deal and a more manageable scope in the short term.

@overlookmotel
Collaborator Author

overlookmotel commented Feb 14, 2024

Thanks for going through this lengthy post and giving your thoughts @ArnaudBarre.

Good point! I had not considered the use case of other tools using OXC's parser stand-alone.

Making the NodeJS interface to OXC's parser faster would certainly have to be the first step in this process (though also not without its challenges). But it's nice to know you think benefits would start to become visible from that first stage, even before the next step of implementing a JS plugin framework for OXC's linter/transformer.

@Boshen
Member

Boshen commented Feb 16, 2024

I thought I saw your name somewhere before but never recalled until you mentioned that swc PR.

I had numerous discussions with different people and we all concluded that AST transfer is a dead end because of the conclusion of that PR.

Let me think about this for a bit before answering all the questions.

@overlookmotel
Collaborator Author

overlookmotel commented Feb 16, 2024

Thanks for coming back Boshen, and thanks for reading through my essay. I wrote way too long!

Wow I never had any idea anyone really noticed that SWC issue, let alone discussed it.

Yes, it was a disappointing conclusion. Personally, I felt the main thing was it was a bad fit with SWC's priorities - they were firmly committed to WASM plugins - and I was disheartened that after a lot of work, it was clearly going nowhere. But, personally, it felt more like "wrong place, wrong time" than that the concept had been proved unviable in principle. So I chalked it up as "R&D".

Of course, think about it as much or as little as you like. Obviously I'm keen, but I'm also aware there are complications, and it may not be the best path forwards. I would appreciate your thoughts when you have time.

The only point I'd like to make is that there's one fundamental difference between SWC and OXC, which completely unlocks the problem: arenas. Ultimately, the idea is not particularly novel or revolutionary - sharing state by sharing memory - but it's the arena which makes that old paradigm possible in this context.

@Boshen
Member

Boshen commented Feb 18, 2024

Here are my requirements after some research and thoughts:

  • Change UTF-8 to UTF-16 span positions
  • Serialize to estree for maximum ecosystem compatibility
  • Target a valuable use case to maximize the cost-benefit ratio of this feature task

@overlookmotel
Collaborator Author

overlookmotel commented Feb 19, 2024

Thanks for coming back Boshen. Some questions:

  • Change UTF-8 to UTF-16 span positions

Do you mean everywhere? (i.e. start and end fields of the Span type in Rust become UTF-16 positions) Or just in the JavaScript version of AST?

I'm aware the parser currently relies on start and end being UTF-8 positions to e.g. slice strings from the source code &source[start..end]. But we could find another mechanism for that using SourcePositions (which would also be a little more performant).

But I don't know if linter/transformer/minifier also rely on Span positions being UTF-8?

  • Serialize to estree for maximum ecosystem compatibility

As far as I'm aware, the differences between OXC's AST and estree are quite minimal, so this should be doable without translation being costly. Is there a list somewhere of the differences? (I thought I saw one, but now can't find it)

  • Target a valuable use case to maximize the cost-benefit ratio of this feature task

The first step would need to be speeding up the NodeJS parser interface (oxc-parser). Same should be possible for @oxc-parser/wasm without too much difficulty.

But after that, what do you think should be highest priority?

An AST visitor in JS with lazy deserialization which offers only a read-only interface to the AST would be much easier to implement than one in which the AST can be mutated. I assume that'd be sufficient for linter plugins?

@ArnaudBarre
Contributor

@overlookmotel I've added you to my WIP to explore OXC as a Prettier parser. You can look at it to see that the number of differences with TSESTree starts to be non-trivial (I've not yet finished the mapping).

@overlookmotel
Collaborator Author

overlookmotel commented Feb 19, 2024

@ArnaudBarre Thanks for sharing.

Doesn't look too bad. Some transformations are annoying (Typescript mostly), but in many cases, OXC's JSON output could be aligned with ESTree just by using #[serde(rename)] etc. And presumably can import all the tests from another ESTree implementation (e.g. Acorn), rather than writing a ton of tests from scratch.

I'd suggest best way to go about this would be to first get the current JSON output to be ESTree-compatible, and then work from there, replacing serde with "raw" transfer. Having a set of tests which already pass would make it much easier to catch any faults.

How complete is your translation implementation? Aside from the "TODO" comments, do you think there are a lot more differences still to be found?

@Boshen
Member

Boshen commented Feb 20, 2024

But after that, what do you think should be highest priority?

After some consideration, let's put a milestone on prettier-oxc-parser. I don't have the time and energy to work on oxc_prettier, so speeding up Prettier will bring its own value.


Let's gather up the requirements and a todo list after tonight.

@ArnaudBarre
Contributor

I think there are still differences to be discovered; the one TODO marks where I'd got to when looking at AST nodes one by one (by manually comparing the types). I think I can finish this tonight!

@ArnaudBarre
Contributor

I've pushed new commits with some new diffs. I've tried running the parser on the typescript node_modules folder and it hangs on /lib/lib.dom.d.ts, I will investigate tomorrow!

@matthew-dean

Just a thought - I think this is very important research and a valuable feature, but I would also add that it might be extremely valuable as a general-purpose strategy for any project communicating between Rust and JS (e.g. creating Rust-based tooling that allows JS plugins), so there may be some additional community value in documenting an independent package that efficiently does this conversion.

@overlookmotel
Collaborator Author

Thanks for your thoughts @matthew-dean. If it works, I agree it could have wider applications beyond OXC. For now, we're only at an early stage, and I think best to focus on trying to make it work within OXC. But, yes, I'd be keen to share whatever findings come up in that process further down the line.

@Boshen Boshen added the A-ast Area - AST label Mar 10, 2024
Boshen pushed a commit that referenced this issue Apr 28, 2024
OK, this is a big one...

I have done this as part of work on Traversable AST, but I believe it
has wider benefits, so thought better to spin it off into its own PR.

## What this PR does

This PR squashes all nested AST enum types (#2685).

e.g.: Previously:

```rs
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>),
    /* ...other Statement variants... */
    Declaration(Declaration<'a>),
}

pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>),
    /* ...other Declaration variants... */
}
```

After this PR:

```rs
#[repr(C, u8)]
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>) = 0,
    /* ...other Statement variants... */

    VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
    /* ...other Declaration variants... */
}

#[repr(C, u8)]
pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
    /* ...other Declaration variants... */
}
```

All `Declaration`'s variants are combined into `Statement`, but
`Declaration` type still exists.

As both types are `#[repr(C, u8)]`, and the discriminants are aligned, a
`Declaration` can be transmuted to a `Statement` at zero cost.
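A minimal, self-contained sketch of the trick, with u32 payloads standing in for the real boxed node types (requires Rust 1.66+ for explicit discriminants on variants with fields):

```rust
// Illustrative types, not oxc's real AST. Both enums are #[repr(C, u8)]
// (1-byte tag followed by the payload), and the shared variants use the
// SAME discriminant values in both enums.
#[repr(C, u8)]
#[derive(Debug, PartialEq)]
enum Statement {
    Block(u32) = 0,
    VariableDeclaration(u32) = 32,
    FunctionDeclaration(u32) = 33,
}

#[repr(C, u8)]
enum Declaration {
    VariableDeclaration(u32) = 32,
    FunctionDeclaration(u32) = 33,
}

fn main() {
    // Same layout, same size...
    assert_eq!(
        std::mem::size_of::<Statement>(),
        std::mem::size_of::<Declaration>()
    );

    // ...and aligned discriminants, so the bytes of a `Declaration` are
    // already valid `Statement` bytes: the conversion is a free transmute.
    let decl = Declaration::VariableDeclaration(7);
    let stmt: Statement = unsafe { std::mem::transmute(decl) };
    assert_eq!(stmt, Statement::VariableDeclaration(7));
    println!("transmute ok");
}
```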

This is the same thing as #2847, but here applied to *all* nested enums
in the AST, and with improved helper methods.

No enums increase in size, and a few get smaller. Indirection is reduced
for some types (this removes multiple levels of boxing).

## Why?

1. It is a prerequisite for Traversable AST (#2987).
2. It would help a lot with AST Transfer (#2409) - it solves the only
remaining blocker for this.
3. It is a step closer to making the whole AST `#[repr(C)]`.

## Why is it a good thing for the AST to be `#[repr(C)]`?

Oxc's direction appears to be increasingly to build up control over the
fundamental primitives we use, in order to unlock performance and
features. We have our own allocator, our own custom implementations for
`Box` and `Vec`, our own `IndexVec` (TBC). The AST is the central
building block of Oxc, and taking control of its memory layout feels
like a step in this same direction.

Oxc has a major advantage over other similar libraries in that it keeps
all the AST data in an arena. This opens the door to treating the AST
either as Rust types or as *pure data* (just bytes). That data can be
moved around and manipulated beyond what Rust natively allows.

However, to enable that, the types need to be well-specified, with
completely stable layouts. `#[repr(C)]` is the only tool Rust provides
to do this.

Once the types are `#[repr(C)]`, various features become possible:

1. Cheap transfer of the AST across boundaries without ser/deser - the
property used by AST Transfer.
2. Having multiple versions of the AST (standard, read-only,
traversable), and these AST representations can be converted to one
another at zero cost via transmute - the property used by the Traversable
AST scheme.
3. Caching AST data on disk (#3079) or transferring across network.
4. Stuff we haven't thought of yet!

Allowing the AST to be treated as pure data will likely unlock other
"next level" features further down the track (caching for "edge
bundling" comes to mind).

## The problem with `#[repr(C)]`

It's not *required* to squash nested enums to make the AST `#[repr(C)]`.

But the problem with `#[repr(C)]` is that it disables some compiler
optimizations. Without `#[repr(C)]`, the compiler squashes enums itself
in some cases (which is how `Statement` is currently 16 bytes). But
making the types `#[repr(C)]` as they are currently disables this
optimization.

So this PR essentially makes explicit what the compiler is already doing
- and in fact goes a bit further with the optimization than the compiler
is able to, in squashing 3 or 4 layers of nested enums (the compiler
only does up to 2 layers).

## Implementation

One enum "inheriting" variants from another is implemented with
`inherit_variants!` macro.

```rs
inherit_variants! {
#[repr(C, u8)]
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>),
    /* ...other Statement variants... */
    
    // `Declaration` variants added here by `inherit_variants!` macro
    @inherit Declaration
    // `ModuleDeclaration` variants added here by `inherit_variants!` macro
    @inherit ModuleDeclaration
}
}
```

The macro is *fairly* lightweight, and I think the above is quite easy
to understand. No proc macros.

The macro also implements utility methods for converting between enums
e.g. `Statement::as_declaration`. These methods are all zero-cost
(essentially transmutes).

New patterns for dealing with nested enums are introduced:

Creation:

```rs
// Old
let stmt = Statement::Declaration(Declaration::VariableDeclaration(var_decl));

// New
let stmt = Statement::VariableDeclaration(var_decl);
```

Conversion:

```rs
// Old
let stmt = Statement::Declaration(decl);

// New
let stmt = Statement::from(decl);
```

Testing:

```rs
// Old
if matches!(stmt, Statement::Declaration(_)) { }
if matches!(stmt, Statement::ModuleDeclaration(m) if m.is_import()) { }

// New
if stmt.is_declaration() { }
if matches!(stmt, Statement::ImportDeclaration(_)) { }
```

Branching:

```rs
// Old
if let Statement::Declaration(decl) = &stmt { decl.do_stuff() };

// New
if let Some(decl) = stmt.as_declaration() { decl.do_stuff() };
```

Matching:

```rs
// Old
match stmt {
    Statement::Declaration(decl) => visitor.visit(decl),
}

// New (exhaustive match)
match stmt {
    match_declaration!(Statement) => visitor.visit(stmt.to_declaration()),
}

// New (alternative)
match stmt {
    _ if stmt.is_declaration() => visitor.visit(stmt.to_declaration()),
}
```

New syntax has pluses and minuses vs the old. `match` syntax is worse,
but when working with a deeply nested enum, the code is much nicer -
it's shorter and easier to read.

This PR removes 200 lines from the linter with changes like this:


https://github.com/oxc-project/oxc/pull/3115/files#diff-dc417ff57352da6727a760ec6dee22de6816f8231fb69dbef1bf05d478699103L92-R95

```diff
- let AssignmentTarget::SimpleAssignmentTarget(simple_assignment_target) =
-     &assignment_expr.left
- else {
-     return;
- };
- let SimpleAssignmentTarget::AssignmentTargetIdentifier(ident) =
-     simple_assignment_target
+ let AssignmentTarget::AssignmentTargetIdentifier(ident) = &assignment_expr.left
else {
    return;
};
```
todor-a pushed a commit to todor-a/oxc that referenced this issue May 19, 2024