Combine `Statement` and `Declaration` types #2685

overlookmotel · 2024-03-11T15:05:25Z

Would it be a pain if we merged AST Statement and Declaration types into one?

Currently:

pub enum Statement<'a> {
    BlockStatement(Box<'a, BlockStatement<'a>>),
    /* ... */
    ModuleDeclaration(Box<'a, ModuleDeclaration<'a>>),

    Declaration(Declaration<'a>),
}

pub enum Declaration<'a> {
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>),
    /* ... */
    TSImportEqualsDeclaration(Box<'a, TSImportEqualsDeclaration<'a>>),
}

Could we make it just 1 enum combining all the variants:

pub enum Statement<'a> {
    // From `Statement`
    BlockStatement(Box<'a, BlockStatement<'a>>),
    /* ... */
    ModuleDeclaration(Box<'a, ModuleDeclaration<'a>>),

    // From `Declaration`
    VariableDeclaration(Box<'a, VariableDeclaration<'a>>),
    /* ... */
    TSImportEqualsDeclaration(Box<'a, TSImportEqualsDeclaration<'a>>),
}

I'm asking because this would simplify #2409. It is still possible to make it work without this change, but it's a bit more brittle.

To explain:

For AST transfer, ideally we want all enums to be #[repr(u8)] so the byte value for each option's discriminant can be specified explicitly. Then they're completely locked and you can read the bytes representing a Statement in memory, and reliably interpret them.

But in this case, the compiler is cleverly merging the discriminants for Statement and Declaration using niche optimization, which means Statement is 16 bytes. If we make Statement #[repr(u8)], it disables the niche optimization, and Statement becomes 24 bytes. Not good!

There is a way around this without changing the types: Don't make Statement #[repr(u8)], and the compiler in practice is predictable in what discriminants it assigns. But the compiler makes no promise not to change that at any minute without warning, so while I don't think likely that it will break, it could break at any time.

To get to the point... It'd be simpler to sidestep the issue entirely by folding Declaration's variants into Statement. Question is how much of a pain would that be? So is it worth it?

NB: The same problem applies to a few other types (only ExportDefaultDeclarationKind and AssignmentTargetMaybeDefault that I'm aware of, but maybe there's a few more). But of course Statement is the one which gets used absolutely everywhere.

The text was updated successfully, but these errors were encountered:

Boshen · 2024-03-12T12:19:36Z

Let me give this a try and see if the APIs around this part of the AST become difficult to use.

Boshen · 2024-03-12T12:59:54Z

🤔 The node is reused here

oxc/crates/oxc_ast/src/ast/js.rs

Lines 2498 to 2501 in 0f86333

    
           pub struct ExportNamedDeclaration<'a> { 
        
               #[serde(flatten)] 
        
               pub span: Span, 
        
               pub declaration: Option<Declaration<'a>>,

Edit:

jsparagus:

#[derive(Debug, PartialEq)]
pub enum Export<'alloc> {
    FunctionDeclaration(Function<'alloc>),
    ClassDeclaration(ClassDeclaration<'alloc>),
    VariableDeclaration(VariableDeclaration<'alloc>),
}

#[derive(Debug, PartialEq)]
pub enum ExportDefault<'alloc> {
    FunctionDeclaration(Function<'alloc>),
    ClassDeclaration(ClassDeclaration<'alloc>),
    Expression(arena::Box<'alloc, Expression<'alloc>>),
}

🤔

let me try this tomorrow.

overlookmotel · 2024-03-12T16:00:29Z

🤔 The node is reused here

Ah. That does complicate matters.

Just to repeat, we don't have to do this. There is another way around it, it's just a bit more brittle.

Boshen · 2024-03-13T10:25:13Z

After some experimentation (2 hours 😅 ), I conclude the current structure is better than everything flattened.

overlookmotel · 2024-03-13T10:35:53Z

OK cool. Thanks for trying it out. I'll switch to plan B.

@inherit

OK, this is a big one... I have done this as part of work on Traversable AST, but I believe it has wider benefits, so thought better to spin it off into its own PR. ## What this PR does This PR squashes all nested AST enum types (#2685). e.g.: Previously: ```rs pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>), /* ...other Statement variants... */ Declaration(Declaration<'a>), } pub enum Declaration<'a> { VariableDeclaration(Box<'a, VariableDeclaration<'a>>), /* ...other Declaration variants... */ } ``` After this PR: ```rs #[repr(C, u8)] pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>) = 0, /* ...other Statement variants... */ VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32, /* ...other Declaration variants... */ } #[repr(C, u8)] pub enum Declaration<'a> { VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32, /* ...other Declaration variants... */ } ``` All `Declaration`'s variants are combined into `Statement`, but `Declaration` type still exists. As both types are `#[repr(C, u8)]`, and the discriminants are aligned, a `Declaration` can be transmuted to a `Statement` at zero cost. This is the same thing as #2847, but here applied to *all* nested enums in the AST, and with improved helper methods. No enums increase in size, and a few get smaller. Indirection is reduced for some types (this removes multiple levels of boxing). ## Why? 1. It is a prerequisite for Traversable AST (#2987). 2. It would help a lot with AST Transfer (#2409) - it solves the only remaining blocker for this. 3. It is a step closer to making the whole AST `#[repr(C)]`. ## Why is it a good thing for the AST to be `#[repr(C)]`? Oxc's direction appears to be increasingly to build up control over the fundamental primitives we use, in order to unlock performance and features. We have our own allocator, our own custom implementations for `Box` and `Vec`, our own `IndexVec` (TBC). The AST is the central building block of Oxc, and taking control of its memory layout feels like a step in this same direction. Oxc has a major advantage over other similar libraries in that it keeps all the AST data in an arena. This opens the door to treating the AST either as Rust types or as *pure data* (just bytes). That data can be moved around and manipulated beyond what Rust natively allows. However, to enable that, the types need to be well-specified, with completely stable layouts. `#[repr(C)]` is the only tool Rust provides to do this. Once the types are `#[repr(C)]`, various features become possible: 1. Cheap transfer of the AST across boundaries without ser/deser - the property used by AST Transfer. 2. Having multiple versions of the AST (standard, read-only, traversable), and these AST representations can be converted to one other at zero cost via transmute - the property used by Traversable AST scheme. 3. Caching AST data on disk (#3079) or transferring across network. 4. Stuff we haven't thought of yet! Allowing the AST to be treated as pure data will likely unlock other "next level" features further down the track (caching for "edge bundling" comes to mind). ## The problem with `#[repr(C)]` It's not *required* to squash nested enums to make the AST `#[repr(C)]`. But the problem with `#[repr(C)]` is that it disables some compiler optimizations. Without `#[repr(C)]`, the compiler squashes enums itself in some cases (which is how `Statement` is currently 16 bytes). But making the types `#[repr(C)]` as they are currently disables this optimization. So this PR essentially makes explicit what the compiler is already doing - and in fact goes a bit further with the optimization than the compiler is able to, in squashing 3 or 4 layers of nested enums (the compiler only does up to 2 layers). ## Implementation One enum "inheriting" variants from another is implemented with `inherit_variants!` macro. ```rs inherit_variants! { #[repr(C, u8)] pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>), /* ...other Statement variants... */ // `Declaration` variants added here by `inherit_variants!` macro @inherit Declaration // `ModuleDeclaration` variants added here by `inherit_variants!` macro @inherit ModuleDeclaration } } ``` The macro is *fairly* lightweight, and I think the above is quite easy to understand. No proc macros. The macro also implements utility methods for converting between enums e.g. `Statement::as_declaration`. These methods are all zero-cost (essentially transmutes). New patterns for dealing with nested enums are introduced: Creation: ```rs // Old let stmt = Statement::Declaration(Declaration::VariableDeclaration(var_decl)); // New let stmt = Statement::VariableDeclaration(var_decl); ``` Conversion: ```rs // Old let stmt = Statement::Declaration(decl); // New let stmt = Statement::from(decl); ``` Testing: ```rs // Old if matches!(stmt, Statement::Declaration(_)) { } if matches!(stmt, Statement::ModuleDeclaration(m) if m.is_import()) { } // New if stmt.is_declaration() { } if matches!(stmt, Statement::ImportDeclaration(_)) { } ``` Branching: ```rs // Old if let Statement::Declaration(decl) = &stmt { decl.do_stuff() }; // New if let Some(decl) = stmt.as_declaration() { decl.do_stuff() }; ``` Matching: ```rs // Old match stmt { Statement::Declaration(decl) => visitor.visit(decl), } // New (exhaustive match) match stmt { match_declaration!(Statement) => visitor.visit(stmt.to_declaration()), } // New (alternative) match stmt { _ if stmt.is_declaration() => visitor.visit(stmt.to_declaration()), } ``` New syntax has pluses and minuses vs the old. `match` syntax is worse, but when working with a deeply nested enum, the code is much nicer - it's shorter and easier to read. This PR removes 200 lines from the linter with changes like this: https://github.com/oxc-project/oxc/pull/3115/files#diff-dc417ff57352da6727a760ec6dee22de6816f8231fb69dbef1bf05d478699103L92-R95 ```diff - let AssignmentTarget::SimpleAssignmentTarget(simple_assignment_target) = - &assignment_expr.left - else { - return; - }; - let SimpleAssignmentTarget::AssignmentTargetIdentifier(ident) = - simple_assignment_target + let AssignmentTarget::AssignmentTargetIdentifier(ident) = &assignment_expr.left else { return; }; ```

@inherit

OK, this is a big one... I have done this as part of work on Traversable AST, but I believe it has wider benefits, so thought better to spin it off into its own PR. ## What this PR does This PR squashes all nested AST enum types (oxc-project#2685). e.g.: Previously: ```rs pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>), /* ...other Statement variants... */ Declaration(Declaration<'a>), } pub enum Declaration<'a> { VariableDeclaration(Box<'a, VariableDeclaration<'a>>), /* ...other Declaration variants... */ } ``` After this PR: ```rs #[repr(C, u8)] pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>) = 0, /* ...other Statement variants... */ VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32, /* ...other Declaration variants... */ } #[repr(C, u8)] pub enum Declaration<'a> { VariableDeclaration(Box<'a, VariableDeclaration<'a>>) = 32, /* ...other Declaration variants... */ } ``` All `Declaration`'s variants are combined into `Statement`, but `Declaration` type still exists. As both types are `#[repr(C, u8)]`, and the discriminants are aligned, a `Declaration` can be transmuted to a `Statement` at zero cost. This is the same thing as oxc-project#2847, but here applied to *all* nested enums in the AST, and with improved helper methods. No enums increase in size, and a few get smaller. Indirection is reduced for some types (this removes multiple levels of boxing). ## Why? 1. It is a prerequisite for Traversable AST (oxc-project#2987). 2. It would help a lot with AST Transfer (oxc-project#2409) - it solves the only remaining blocker for this. 3. It is a step closer to making the whole AST `#[repr(C)]`. ## Why is it a good thing for the AST to be `#[repr(C)]`? Oxc's direction appears to be increasingly to build up control over the fundamental primitives we use, in order to unlock performance and features. We have our own allocator, our own custom implementations for `Box` and `Vec`, our own `IndexVec` (TBC). The AST is the central building block of Oxc, and taking control of its memory layout feels like a step in this same direction. Oxc has a major advantage over other similar libraries in that it keeps all the AST data in an arena. This opens the door to treating the AST either as Rust types or as *pure data* (just bytes). That data can be moved around and manipulated beyond what Rust natively allows. However, to enable that, the types need to be well-specified, with completely stable layouts. `#[repr(C)]` is the only tool Rust provides to do this. Once the types are `#[repr(C)]`, various features become possible: 1. Cheap transfer of the AST across boundaries without ser/deser - the property used by AST Transfer. 2. Having multiple versions of the AST (standard, read-only, traversable), and these AST representations can be converted to one other at zero cost via transmute - the property used by Traversable AST scheme. 3. Caching AST data on disk (oxc-project#3079) or transferring across network. 4. Stuff we haven't thought of yet! Allowing the AST to be treated as pure data will likely unlock other "next level" features further down the track (caching for "edge bundling" comes to mind). ## The problem with `#[repr(C)]` It's not *required* to squash nested enums to make the AST `#[repr(C)]`. But the problem with `#[repr(C)]` is that it disables some compiler optimizations. Without `#[repr(C)]`, the compiler squashes enums itself in some cases (which is how `Statement` is currently 16 bytes). But making the types `#[repr(C)]` as they are currently disables this optimization. So this PR essentially makes explicit what the compiler is already doing - and in fact goes a bit further with the optimization than the compiler is able to, in squashing 3 or 4 layers of nested enums (the compiler only does up to 2 layers). ## Implementation One enum "inheriting" variants from another is implemented with `inherit_variants!` macro. ```rs inherit_variants! { #[repr(C, u8)] pub enum Statement<'a> { BlockStatement(Box<'a, BlockStatement<'a>>), /* ...other Statement variants... */ // `Declaration` variants added here by `inherit_variants!` macro @inherit Declaration // `ModuleDeclaration` variants added here by `inherit_variants!` macro @inherit ModuleDeclaration } } ``` The macro is *fairly* lightweight, and I think the above is quite easy to understand. No proc macros. The macro also implements utility methods for converting between enums e.g. `Statement::as_declaration`. These methods are all zero-cost (essentially transmutes). New patterns for dealing with nested enums are introduced: Creation: ```rs // Old let stmt = Statement::Declaration(Declaration::VariableDeclaration(var_decl)); // New let stmt = Statement::VariableDeclaration(var_decl); ``` Conversion: ```rs // Old let stmt = Statement::Declaration(decl); // New let stmt = Statement::from(decl); ``` Testing: ```rs // Old if matches!(stmt, Statement::Declaration(_)) { } if matches!(stmt, Statement::ModuleDeclaration(m) if m.is_import()) { } // New if stmt.is_declaration() { } if matches!(stmt, Statement::ImportDeclaration(_)) { } ``` Branching: ```rs // Old if let Statement::Declaration(decl) = &stmt { decl.do_stuff() }; // New if let Some(decl) = stmt.as_declaration() { decl.do_stuff() }; ``` Matching: ```rs // Old match stmt { Statement::Declaration(decl) => visitor.visit(decl), } // New (exhaustive match) match stmt { match_declaration!(Statement) => visitor.visit(stmt.to_declaration()), } // New (alternative) match stmt { _ if stmt.is_declaration() => visitor.visit(stmt.to_declaration()), } ``` New syntax has pluses and minuses vs the old. `match` syntax is worse, but when working with a deeply nested enum, the code is much nicer - it's shorter and easier to read. This PR removes 200 lines from the linter with changes like this: https://github.com/oxc-project/oxc/pull/3115/files#diff-dc417ff57352da6727a760ec6dee22de6816f8231fb69dbef1bf05d478699103L92-R95 ```diff - let AssignmentTarget::SimpleAssignmentTarget(simple_assignment_target) = - &assignment_expr.left - else { - return; - }; - let SimpleAssignmentTarget::AssignmentTargetIdentifier(ident) = - simple_assignment_target + let AssignmentTarget::AssignmentTargetIdentifier(ident) = &assignment_expr.left else { return; }; ```

Boshen self-assigned this Mar 12, 2024

Boshen closed this as not planned Won't fix, can't repro, duplicate, stale Mar 13, 2024

overlookmotel mentioned this issue Mar 28, 2024

refactor: add declaration variants to Statement #2847

Closed

overlookmotel mentioned this issue Apr 27, 2024

refactor(ast): squash nested enums #3115

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Combine `Statement` and `Declaration` types #2685

Combine `Statement` and `Declaration` types #2685

overlookmotel commented Mar 11, 2024 •

edited

Loading

Boshen commented Mar 12, 2024

Boshen commented Mar 12, 2024 •

edited

Loading

overlookmotel commented Mar 12, 2024

Boshen commented Mar 13, 2024

overlookmotel commented Mar 13, 2024

Combine Statement and Declaration types #2685

Combine Statement and Declaration types #2685

Comments

overlookmotel commented Mar 11, 2024 • edited Loading

Boshen commented Mar 12, 2024

Boshen commented Mar 12, 2024 • edited Loading

overlookmotel commented Mar 12, 2024

Boshen commented Mar 13, 2024

overlookmotel commented Mar 13, 2024

Combine `Statement` and `Declaration` types #2685

Combine `Statement` and `Declaration` types #2685

overlookmotel commented Mar 11, 2024 •

edited

Loading

Boshen commented Mar 12, 2024 •

edited

Loading