Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: add declaration variants to Statement #2847

Conversation

overlookmotel
Copy link
Collaborator

@overlookmotel overlookmotel commented Mar 28, 2024

An alternative solution to #2685.

Declaration variants are added to Statement, but the Declaration type is not removed, so can still be used independently.

The discriminants for the 2 enums are aligned, so Statement and Declaration have identical memory representations, and one can be transmuted to the other at zero cost. Type sizes remain the same, and no effect on performance.

Obviously it'd be preferable to avoid this entirely, but I'm having a great deal of trouble handling the nested enums on #2457, and this would really help.

Not making this PR with intent of it being merged as is, but just as a demonstration of what I'm exploring. @Boshen would you consider something like this?

NB: This could be simplified, removing the unsafe code, if Visit::visit_declaration doesn't need to be its own function and could be broken up into Visit::visit_var_declaration, Visit::visit_function_declaration etc. That would also remove a branch and probably speed up the visitor a little.

Copy link
Collaborator Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

Join @overlookmotel and the rest of your teammates on Graphite Graphite

@github-actions github-actions bot added A-linter Area - Linter A-parser Area - Parser A-semantic Area - Semantic A-minifier Area - Minifier A-ast Area - AST A-transformer Area - Transformer / Transpiler A-codegen Area - Code Generation A-prettier Area - Prettier labels Mar 28, 2024
Copy link

codspeed-hq bot commented Mar 28, 2024

CodSpeed Performance Report

Merging #2847 will not alter performance

Comparing 03-28-refactor_add_declaration_variants_to_Statement_ (6bcc103) with main (6177c2f)

Summary

✅ 36 untouched benchmarks

@Boshen
Copy link
Member

Boshen commented Mar 28, 2024

Oh this is clever for an intermediate step. Last time I tried but gave up because there were too many code changes.

Let me make a release before merging this. And then we'll continue the refactor.

@overlookmotel
Copy link
Collaborator Author

overlookmotel commented Mar 28, 2024

Thanks for coming back @Boshen.

Before we go ahead with this, can I just check you're really OK with it? The same nested enums problem also exists with other AST types, so this would be just the first of a series of PRs to do the same thing to each of them. Probably we shouldn't start down this path if you think it's going to get too much by the end.

I've started with the big one - Statement. All the other affected types are more rarely used, so changes will be lower impact (e.g. ExportDefaultDeclarationKind, AssignmentTargetMaybeDefault). But still, there'll be a good number of them - maybe 10 in total.

On the positive side, in a few cases we'll be able to reduce the size of the types by squashing 2 discriminants into 1.

2nd question: Does Visit::visit_declaration need to exist?

If we can get rid of it, we can make the changes in this PR much simpler, and remove the unsafe + transmuting.

Removing it wouldn't cause any problems within Oxc itself, but question is do external consumers need visit_declaration to exist in order to be able to override it in their own Visit impl? (they could still get whatever they need to do done by implementing visit_var_declaration, visit_function_declaration etc instead, but I don't know how Visit is used "in the wild", so I don't know if that'd be very unergonomic for users)

@Boshen
Copy link
Member

Boshen commented Mar 29, 2024

Before we go ahead with this, can I just check you're really OK with it?

It's ok.

But still, there'll be a good number of them - maybe 10 in total.

I think it's a case by case analysis.

Does Visit::visit_declaration need to exist?

Let's do the removal in another PR. Most of the usages are in the transformer, but we plan to nuke the current transformer, so we'll have less code to change.

@overlookmotel
Copy link
Collaborator Author

overlookmotel commented Mar 29, 2024

I think it's a case by case analysis.

It will only solve my problem on #2457 if we change them all, and make it a rule for AST types that there can be no nested enums. If we're not able/willing to do them all, there isn't much point in doing any of them (at least for my purposes in #2457) as I'll still have to find a way to handle the remaining cases.


The complete list of nested enums at present:

Statement -> Declaration

ArrayExpressionElement -> Expression
PropertyKey -> Expression
Argument -> Expression
JSXExpression -> Expression
ForStatementInit -> Expression
ExportDefaultDeclarationKind -> Expression
TSEnumMemberName -> Expression

ForStatementLeft -> AssignmentTarget
AssignmentTargetMaybeDefault -> AssignmentTarget
AssignmentTarget -> SimpleAssignmentTarget
AssignmentTarget -> AssignmentTargetPattern

TSTupleElement -> TSType
TSTypeQueryExprName -> TSTypeName
TSModuleReference -> TSTypeName

It's worse than I thought. Inlining all Expressions variants into ArrayExpressionElement and 6 other types seems impractical - at least without using a macro, and I know how you feel about that!

Let me think about it a bit more...


Going in the opposite direction, these enum variants could be unboxed to reduce indirection:

Statement::ModuleDeclaration: Box<ModuleDeclaration>
Expression::MemberExpression: Box<MemberExpression>
SimpleAssignmentTarget::MemberAssignmentTarget: Box<MemberExpression>
ChainElement::MemberExpression: Box<MemberExpression>

(side note: one nice thing about the types schema which #2457 produces is that it's really easy to run these kind of queries on the type definitions)

@Boshen
Copy link
Member

Boshen commented Mar 29, 2024

This is going to be impractical if we are going to bend the AST just for the sake of AST transfer 😢

@overlookmotel
Copy link
Collaborator Author

Well I had thought it's worth bending it to a degree, as in my view AST transfer is a pretty worthwhile feature (and as I said at the start, there would likely be some trade-offs involved to make it work). But, yes, probably this is too much.

Let me try again and see if I can find another solution to the problem of calculating niches for nested enums. Can we leave this PR open for now until I reach a conclusion?

It's annoying that Rust doesn't offer type introspection itself. It was going to but that effort fell apart due to a personal feud with/within the Rust Project.

@Boshen
Copy link
Member

Boshen commented Mar 29, 2024

Can we leave this PR open for now until I reach a conclusion?

Sure

It's annoying that Rust doesn't offer type introspection itself. It was going to but that effort fell apart due to a personal feud with/within the Rust Project.

I remember the drama ...

@overlookmotel
Copy link
Collaborator Author

Superceded by #3115.

Boshen pushed a commit that referenced this pull request Apr 28, 2024
OK, this is a big one...

I have done this as part of work on Traversable AST, but I believe it
has wider benefits, so thought better to spin it off into its own PR.

## What this PR does

This PR squashes all nested AST enum types (#2685).

e.g.: Previously:

```rs
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>),
    /* ...other Statement variants... */
    Declaration(Declaration<'a>),
}

pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>),
    /* ...other Declaration variants... */
}
```

After this PR:

```rs
#[repr(C, u8)]
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>) = 0,
    /* ...other Statement variants... */

    VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
    /* ...other Declaration variants... */
}

#[repr(C, u8)]
pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
    /* ...other Declaration variants... */
}
```

All `Declaration`'s variants are combined into `Statement`, but
`Declaration` type still exists.

As both types are `#[repr(C, u8)]`, and the discriminants are aligned, a
`Declaration` can be transmuted to a `Statement` at zero cost.

This is the same thing as #2847, but here applied to *all* nested enums
in the AST, and with improved helper methods.

No enums increase in size, and a few get smaller. Indirection is reduced
for some types (this removes multiple levels of boxing).

## Why?

1. It is a prerequisite for Traversable AST (#2987).
2. It would help a lot with AST Transfer (#2409) - it solves the only
remaining blocker for this.
3. It is a step closer to making the whole AST `#[repr(C)]`.

## Why is it a good thing for the AST to be `#[repr(C)]`?

Oxc's direction appears to be increasingly to build up control over the
fundamental primitives we use, in order to unlock performance and
features. We have our own allocator, our own custom implementations for
`Box` and `Vec`, our own `IndexVec` (TBC). The AST is the central
building block of Oxc, and taking control of its memory layout feels
like a step in this same direction.

Oxc has a major advantage over other similar libraries in that it keeps
all the AST data in an arena. This opens the door to treating the AST
either as Rust types or as *pure data* (just bytes). That data can be
moved around and manipulated beyond what Rust natively allows.

However, to enable that, the types need to be well-specified, with
completely stable layouts. `#[repr(C)]` is the only tool Rust provides
to do this.

Once the types are `#[repr(C)]`, various features become possible:

1. Cheap transfer of the AST across boundaries without ser/deser - the
property used by AST Transfer.
2. Having multiple versions of the AST (standard, read-only,
traversable), and these AST representations can be converted to one
other at zero cost via transmute - the property used by Traversable AST
scheme.
3. Caching AST data on disk (#3079) or transferring across network.
4. Stuff we haven't thought of yet!

Allowing the AST to be treated as pure data will likely unlock other
"next level" features further down the track (caching for "edge
bundling" comes to mind).

## The problem with `#[repr(C)]`

It's not *required* to squash nested enums to make the AST `#[repr(C)]`.

But the problem with `#[repr(C)]` is that it disables some compiler
optimizations. Without `#[repr(C)]`, the compiler squashes enums itself
in some cases (which is how `Statement` is currently 16 bytes). But
making the types `#[repr(C)]` as they are currently disables this
optimization.

So this PR essentially makes explicit what the compiler is already doing
- and in fact goes a bit further with the optimization than the compiler
is able to, in squashing 3 or 4 layers of nested enums (the compiler
only does up to 2 layers).

## Implementation

One enum "inheriting" variants from another is implemented with
`inherit_variants!` macro.

```rs
inherit_variants! {
#[repr(C, u8)]
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>),
    /* ...other Statement variants... */
    
    // `Declaration` variants added here by `inherit_variants!` macro
    @inherit Declaration
    // `ModuleDeclaration` variants added here by `inherit_variants!` macro
    @inherit ModuleDeclaration
}
}
```

The macro is *fairly* lightweight, and I think the above is quite easy
to understand. No proc macros.

The macro also implements utility methods for converting between enums
e.g. `Statement::as_declaration`. These methods are all zero-cost
(essentially transmutes).

New patterns for dealing with nested enums are introduced:

Creation:

```rs
// Old
let stmt = Statement::Declaration(Declaration::VariableDeclaration(var_decl));

// New
let stmt = Statement::VariableDeclaration(var_decl);
```

Conversion:

```rs
// Old
let stmt = Statement::Declaration(decl);

// New
let stmt = Statement::from(decl);
```

Testing:

```rs
// Old
if matches!(stmt, Statement::Declaration(_)) { }
if matches!(stmt, Statement::ModuleDeclaration(m) if m.is_import()) { }

// New
if stmt.is_declaration() { }
if matches!(stmt, Statement::ImportDeclaration(_)) { }
```

Branching:

```rs
// Old
if let Statement::Declaration(decl) = &stmt { decl.do_stuff() };

// New
if let Some(decl) = stmt.as_declaration() { decl.do_stuff() };
```

Matching:

```rs
// Old
match stmt {
    Statement::Declaration(decl) => visitor.visit(decl),
}

// New (exhaustive match)
match stmt {
    match_declaration!(Statement) => visitor.visit(stmt.to_declaration()),
}

// New (alternative)
match stmt {
    _ if stmt.is_declaration() => visitor.visit(stmt.to_declaration()),
}
```

New syntax has pluses and minuses vs the old. `match` syntax is worse,
but when working with a deeply nested enum, the code is much nicer -
it's shorter and easier to read.

This PR removes 200 lines from the linter with changes like this:


https://github.com/oxc-project/oxc/pull/3115/files#diff-dc417ff57352da6727a760ec6dee22de6816f8231fb69dbef1bf05d478699103L92-R95

```diff
- let AssignmentTarget::SimpleAssignmentTarget(simple_assignment_target) =
-     &assignment_expr.left
- else {
-     return;
- };
- let SimpleAssignmentTarget::AssignmentTargetIdentifier(ident) =
-     simple_assignment_target
+ let AssignmentTarget::AssignmentTargetIdentifier(ident) = &assignment_expr.left
else {
    return;
};
```
todor-a pushed a commit to todor-a/oxc that referenced this pull request May 19, 2024
OK, this is a big one...

I have done this as part of work on Traversable AST, but I believe it
has wider benefits, so thought better to spin it off into its own PR.

## What this PR does

This PR squashes all nested AST enum types (oxc-project#2685).

e.g.: Previously:

```rs
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>),
    /* ...other Statement variants... */
    Declaration(Declaration<'a>),
}

pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>),
    /* ...other Declaration variants... */
}
```

After this PR:

```rs
#[repr(C, u8)]
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>) = 0,
    /* ...other Statement variants... */

    VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
    /* ...other Declaration variants... */
}

#[repr(C, u8)]
pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32,
    /* ...other Declaration variants... */
}
```

All `Declaration`'s variants are combined into `Statement`, but
`Declaration` type still exists.

As both types are `#[repr(C, u8)]`, and the discriminants are aligned, a
`Declaration` can be transmuted to a `Statement` at zero cost.

This is the same thing as oxc-project#2847, but here applied to *all* nested enums
in the AST, and with improved helper methods.

No enums increase in size, and a few get smaller. Indirection is reduced
for some types (this removes multiple levels of boxing).

## Why?

1. It is a prerequisite for Traversable AST (oxc-project#2987).
2. It would help a lot with AST Transfer (oxc-project#2409) - it solves the only
remaining blocker for this.
3. It is a step closer to making the whole AST `#[repr(C)]`.

## Why is it a good thing for the AST to be `#[repr(C)]`?

Oxc's direction appears to be increasingly to build up control over the
fundamental primitives we use, in order to unlock performance and
features. We have our own allocator, our own custom implementations for
`Box` and `Vec`, our own `IndexVec` (TBC). The AST is the central
building block of Oxc, and taking control of its memory layout feels
like a step in this same direction.

Oxc has a major advantage over other similar libraries in that it keeps
all the AST data in an arena. This opens the door to treating the AST
either as Rust types or as *pure data* (just bytes). That data can be
moved around and manipulated beyond what Rust natively allows.

However, to enable that, the types need to be well-specified, with
completely stable layouts. `#[repr(C)]` is the only tool Rust provides
to do this.

Once the types are `#[repr(C)]`, various features become possible:

1. Cheap transfer of the AST across boundaries without ser/deser - the
property used by AST Transfer.
2. Having multiple versions of the AST (standard, read-only,
traversable), and these AST representations can be converted to one
other at zero cost via transmute - the property used by Traversable AST
scheme.
3. Caching AST data on disk (oxc-project#3079) or transferring across network.
4. Stuff we haven't thought of yet!

Allowing the AST to be treated as pure data will likely unlock other
"next level" features further down the track (caching for "edge
bundling" comes to mind).

## The problem with `#[repr(C)]`

It's not *required* to squash nested enums to make the AST `#[repr(C)]`.

But the problem with `#[repr(C)]` is that it disables some compiler
optimizations. Without `#[repr(C)]`, the compiler squashes enums itself
in some cases (which is how `Statement` is currently 16 bytes). But
making the types `#[repr(C)]` as they are currently disables this
optimization.

So this PR essentially makes explicit what the compiler is already doing
- and in fact goes a bit further with the optimization than the compiler
is able to, in squashing 3 or 4 layers of nested enums (the compiler
only does up to 2 layers).

## Implementation

One enum "inheriting" variants from another is implemented with
`inherit_variants!` macro.

```rs
inherit_variants! {
#[repr(C, u8)]
pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>),
    /* ...other Statement variants... */
    
    // `Declaration` variants added here by `inherit_variants!` macro
    @inherit Declaration
    // `ModuleDeclaration` variants added here by `inherit_variants!` macro
    @inherit ModuleDeclaration
}
}
```

The macro is *fairly* lightweight, and I think the above is quite easy
to understand. No proc macros.

The macro also implements utility methods for converting between enums
e.g. `Statement::as_declaration`. These methods are all zero-cost
(essentially transmutes).

New patterns for dealing with nested enums are introduced:

Creation:

```rs
// Old
let stmt = Statement::Declaration(Declaration::VariableDeclaration(var_decl));

// New
let stmt = Statement::VariableDeclaration(var_decl);
```

Conversion:

```rs
// Old
let stmt = Statement::Declaration(decl);

// New
let stmt = Statement::from(decl);
```

Testing:

```rs
// Old
if matches!(stmt, Statement::Declaration(_)) { }
if matches!(stmt, Statement::ModuleDeclaration(m) if m.is_import()) { }

// New
if stmt.is_declaration() { }
if matches!(stmt, Statement::ImportDeclaration(_)) { }
```

Branching:

```rs
// Old
if let Statement::Declaration(decl) = &stmt { decl.do_stuff() };

// New
if let Some(decl) = stmt.as_declaration() { decl.do_stuff() };
```

Matching:

```rs
// Old
match stmt {
    Statement::Declaration(decl) => visitor.visit(decl),
}

// New (exhaustive match)
match stmt {
    match_declaration!(Statement) => visitor.visit(stmt.to_declaration()),
}

// New (alternative)
match stmt {
    _ if stmt.is_declaration() => visitor.visit(stmt.to_declaration()),
}
```

New syntax has pluses and minuses vs the old. `match` syntax is worse,
but when working with a deeply nested enum, the code is much nicer -
it's shorter and easier to read.

This PR removes 200 lines from the linter with changes like this:


https://github.com/oxc-project/oxc/pull/3115/files#diff-dc417ff57352da6727a760ec6dee22de6816f8231fb69dbef1bf05d478699103L92-R95

```diff
- let AssignmentTarget::SimpleAssignmentTarget(simple_assignment_target) =
-     &assignment_expr.left
- else {
-     return;
- };
- let SimpleAssignmentTarget::AssignmentTargetIdentifier(ident) =
-     simple_assignment_target
+ let AssignmentTarget::AssignmentTargetIdentifier(ident) = &assignment_expr.left
else {
    return;
};
```
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-ast Area - AST A-codegen Area - Code Generation A-linter Area - Linter A-minifier Area - Minifier A-parser Area - Parser A-prettier Area - Prettier A-semantic Area - Semantic A-transformer Area - Transformer / Transpiler
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants