Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(parser): serialize to estree #2463

Open
Boshen opened this issue Feb 21, 2024 · 32 comments
Open

feat(parser): serialize to estree #2463

Boshen opened this issue Feb 21, 2024 · 32 comments

Comments

@Boshen
Copy link
Member

Boshen commented Feb 21, 2024

We should serialize our AST to estree, to create less chaos in the JavaScript ecosystem.

We need to:

    • align the smaller parts to estree by modifying the AST structs directly
    • write serde serializers for the bigger ones
@overlookmotel
Copy link
Collaborator

@ArnaudBarre I see from your Github repos that you mostly code in JS/TS, so I don't know if you're comfortable writing Rust. But if you are, would you consider assisting with this?

What I'm thinking is that rather than building out further your OXC prettier parser layer in JS, could you instead submit the changes upstream in OXC?

For the mismatches where all that's needed is changing field names, it's as simple as e.g.:

pub struct ComputedMemberExpression<'a> {
     #[cfg_attr(feature = "serde", serde(flatten))]
     pub span: Span,
     pub object: Expression<'a>,
+    #[cfg_attr(feature = "serde", serde(rename = "property"))]
     pub expression: Expression<'a>,
     pub optional: bool, // for optional chaining
 }

Where difference is snake_case vs camelCase:

#[derive(Debug, Hash)]
-#[cfg_attr(feature = "serde", derive(Serialize), serde(tag = "type"))]
+#[cfg_attr(feature = "serde", derive(Serialize), serde(tag = "type", rename_all = "camelCase"))]
#[cfg_attr(all(feature = "serde", feature = "wasm"), derive(tsify::Tsify))]
pub struct NewExpression<'a> {
    #[cfg_attr(feature = "serde", serde(flatten))]
    pub span: Span,
    pub callee: Expression<'a>,
    pub arguments: Vec<'a, Argument<'a>>,
    pub type_parameters: Option<Box<'a, TSTypeParameterInstantiation<'a>>>,
}

To cover all the node.importKind = node.importKind.type transforms, I think all we need is:

#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
-#[cfg_attr(feature = "serde", derive(Serialize), serde(tag = "type", rename_all = "camelCase"))]
+#[cfg_attr(feature = "serde", derive(Serialize), serde(rename_all = "camelCase"))]
#[cfg_attr(all(feature = "serde", feature = "wasm"), derive(tsify::Tsify))]
pub enum ImportOrExportKind {
    Value,
    Type,
}

Obviously, there are other more complex changes to be made which will require manual impl of serde::Serialize (horrible!), but if we could at least deal with the simpler ones first in this manner, that'd make it easier to see what's remaining.

If you're not comfortable with Rust, no worries. I can make the changes, and your work on identifying what they are would be a huge help with that. Please just let me know which way you'd like to go.

@overlookmotel
Copy link
Collaborator

@Boshen I think you mentioned that you previously had a branch where you'd written the impl Serde::Serializes for translation to ESTree. Do you have any idea what might be? I can do it from scratch, but if you already did it and can find that code...

@ArnaudBarre
Copy link
Contributor

Never done Rust but adding metadata to structs or refactoring property names via IDE should be doable! The complex part is to decide when to update the rust name and when to update just the serialize name/struct.

Let's start with the AST nodes with a slight name mismatch:

TSESTree OXC
ArrowFunctionExpression ArrowExpression
NumericLiteral NumberLiteral
Property BindingProperty
TSThisType TSThisKeyword
TSTypeOperator TSTypeOperatorType

@overlookmotel
Copy link
Collaborator

overlookmotel commented Feb 21, 2024

Never done Rust but adding metadata to structs or refactoring property names via IDE should be doable!

Thanks for your willingness! I'm very happy to help if you need any pointers.

Boshen may be happy for us to rename the actual Rust struct property names, but to start with I'd suggest just using serde(rename) (like ComputedMemberExpression example above). This will alter the JSON output, without changing the actual Rust types, so no API changes for anyone consuming OXC as a Rust lib.

To test out changes you make via the Node API, you can build with:

cd napi/parser
npm run build

That builds the Rust binary, and also a JS binding napi/parser/index.js which you can test out from your NodeJS code with a relative import e.g. import { type ParserOptions, parseSync } from "../../oxc/napi/parser/index.js";.

Just to explain what the complex-looking #[cfg_attr(...)] syntax is about:

Support for JSON serialization is only included in the build if you compile with the serde "feature" enabled. serde is the serialization library, and it's relatively heavy, so it's only included if the consumer requests it by enabling the "feature" (it's enabled in the Node API build).

serde(rename) etc is how you instruct Serde library how you want your serialized JSON to look.

#[cfg_attr(feature = "serde", serde(rename = "property"))] says to compiler "ignore everything else in this #[...] unless 'serde' feature is enabled".

Hope this is useful background. Please excuse me if I'm telling stuff that you already know/is obvious. It wasn't obvious at all to me when I started learning Rust!

@Boshen
Copy link
Member Author

Boshen commented Feb 21, 2024

I can help with the changes, but I prefer a test infrastructure in-place first 😁

Probably running prettier tests with oxc-prettier-parser in CI?

@overlookmotel
Copy link
Collaborator

overlookmotel commented Feb 21, 2024

I can help with the changes, but I prefer a test infrastructure in-place first 😁

Yes, makes sense. Would you want to set that up, or shall I look into it?

Probably running prettier tests with oxc-prettier-parser in CI?

I imagine that'd do it. If we wanted to be really thorough, could also run Acorn's tests which I assume will specifically test every possible AST node type. But I don't think Acorn supports TypeScript, so would maybe need typescript-eslint's tests too. Not sure how much time setting all that up would take - maybe worth it, maybe not.

@overlookmotel
Copy link
Collaborator

overlookmotel commented Feb 21, 2024

Let's start with the AST nodes with a slight name mismatch

To change type names e.g.:

#[derive(Debug, Clone)]
-#[cfg_attr(feature = "serde", derive(Serialize), serde(tag = "type"))]
+#[cfg_attr(feature = "serde", derive(Serialize), serde(tag = "type", rename = "NumericLiteral"))]
#[cfg_attr(all(feature = "serde", feature = "wasm"), derive(tsify::Tsify))]
pub struct NumberLiteral<'a> {
    #[cfg_attr(feature = "serde", serde(flatten))]
    pub span: Span,
    pub value: f64,
    #[cfg_attr(feature = "serde", serde(skip))]
    pub raw: &'a str,
    #[cfg_attr(feature = "serde", serde(skip))]
    pub base: NumberBase,
}

@Boshen
Copy link
Member Author

Boshen commented Feb 21, 2024

Yes, makes sense. Would you want to set that up, or shall I look into it?

I'll look into it.

To change type names e.g.:

I'll just change the struct names instead 😅

@Boshen
Copy link
Member Author

Boshen commented Feb 21, 2024

According the the prettier documentation, it seems like we just need to align with @typescript-eslint/typescript-estree:

https://prettier.io/docs/en/options.html#parser

By making a new package that exports the exact same stuff as https://github.com/typescript-eslint/typescript-eslint/tree/main/packages/typescript-estree

as then patch prettier's package.json's "@typescript-eslint/typescript-estree": "7.0.1" with our own package.

The final setup would just be running yarn test.

@Boshen
Copy link
Member Author

Boshen commented Feb 21, 2024

hmm ... it's harder than I thought, it requires more information beyond just the AST

https://github.com/prettier/prettier/blob/53111dd6553084c5108a10d5aa91d1c4bbb75454/src/language-js/parse/typescript.js#L15-L18

@ArnaudBarre
Copy link
Contributor

@overlookmotel Thanks a lot for the explication!

Yeah the main blocker will be adding comments to the AST.

I don't know how this can be done with performance in mind, but even for bundles/minifiers related task done all in rust, having, being able to check if functions have special comments will be required. I know that esbuild does does some heuristic to keep comments only in places where it might contain some bundler information, but for formatting purpose all comments are needed :/

@Boshen
Copy link
Member Author

Boshen commented Feb 21, 2024

We have comments in an array, I'll get it returned to JS.

@Boshen
Copy link
Member Author

Boshen commented Feb 21, 2024

Released napi oxc-parser@0.5.0

  • Returned comments
  • Renamed AST nodes:
TSESTree OXC
ArrowFunctionExpression ArrowExpression
NumericLiteral NumberLiteral
TSThisType TSThisKeyword
TSTypeOperator TSTypeOperatorType

(Property isn't changed yet)

@ArnaudBarre
Copy link
Contributor

Cool I'll keep you updated tonight with my next blocker!

@overlookmotel
Copy link
Collaborator

@Boshen Can we also rename the fields in Rust types to match ESTree? e.g. ComputedMemberExpression's expression -> property.

Personally, I think yes, as it'll be more familiar to contributors, and it sounds like breaking changes are not a big deal pre-1.0, but maybe there's some other context I'm not aware of.

@ArnaudBarre
Copy link
Contributor

Actually Property in TSESTree map to BindingProperty & ObjectProperty so we have the same situation as for identifiers in #2521

Boshen pushed a commit that referenced this issue Feb 27, 2024
A few more I missed in #2506. Re #2463.

Only remaining snake_case in the current types of the AST:
`trailing_comma` in ArrayExpression, ObjectExpression &
ArrayAssignmentTarget.
@Boshen
Copy link
Member Author

Boshen commented Feb 27, 2024

Let me sit on renaming these spec names to estree names for a bit. I'm in favor of full estree compatibility but I need to think about the consequences and maintainability issues.

@kazupon
Copy link

kazupon commented Feb 27, 2024

hmm ... it's harder than I thought, it requires more information beyond just the AST

https://github.com/prettier/prettier/blob/53111dd6553084c5108a10d5aa91d1c4bbb75454/src/ language-js/parse/typescript.js#L15-L18

I think @sosukesuzuki might give you something advice.

Boshen added a commit that referenced this issue Mar 3, 2024
…ression (#2574)

That a tricky one, because it's time to decide what does ESTree
compliant means in the TS world (re #2463)

This code:

```ts
export declare class ByteBuffer {
  clear(): void;
     // ^^
}
```

- Is parsed by
[Babel](https://astexplorer.net/#/gist/9f43cf0fe91760e11f1572934681c92f/d38530204eaa6e12dbd14f3ef2d7d2fcb7da7bc2)
as `FunctionExpression` with an empty body
- By
[TS-ESLint](https://astexplorer.net/#/gist/626dd346956a02c221d4e128450652af/4ea4e2feb5ae7bb8787f8c7e452d442c935c1f09)
as
[TSEmptyBodyFunctionExpression](typescript-eslint/typescript-eslint#1289)
- By
[OXC](https://oxc-project.github.io/oxc/playground/?code=3YCAAIC1gICAgICAgICyHorESipoTXPdvBaE9wxzlOraoWs19SUxDvdcwSVU0kbBO2b7ppX3x2P5IhQlpGHOYEHNCEfLf38HUICA)
as `TSDeclareFunction`

I'm going the easy way to fix this to the Babel way, but I think
following TS-ESLint would make sense. There is an [open babel
issue](babel/babel#13878) about that.

Edit: Ok that not so easy and require updating some logic.

---------

Co-authored-by: Boshen <boshenc@gmail.com>
@ArnaudBarre
Copy link
Contributor

I got my first mismatch because of utf-8/utf-16 mismatch. I saw that sourcemap have been merged, does this mean that the parser now tracks utf-16 positions?

@overlookmotel
Copy link
Collaborator

overlookmotel commented Mar 4, 2024

As far as I can see, this is only in the codegen so far. We could do UTF-8 -> UTF-16 mapping in the lexer, but problem is where to store the UTF-16 spans - we don't want to add them to the AST as that'd be an extra 8 bytes for every AST node. So it needs to be done, but exactly how is a bit of an open question.

Maybe we could move "line tables" creation from codegen to the parser. But we should probably discuss that on #959.

By the way, I see you're banging out PRs at breakneck pace. Brilliant. If you really hadn't touched Rust before, then I must say you're a very quick learner!

@ArnaudBarre
Copy link
Contributor

Yeah better to discuss this there!

Thanks! I was told Rust error messages are great and I can confirm. And in IDE copilot chat is also really cool when navigating in unknown codebase. For now I've always stopped when it worked so I don't always understand things, if I want to do more serious things I will read the book!

a-rustacean pushed a commit to a-rustacean/oxc that referenced this issue Mar 4, 2024
…ression (oxc-project#2574)

That a tricky one, because it's time to decide what does ESTree
compliant means in the TS world (re oxc-project#2463)

This code:

```ts
export declare class ByteBuffer {
  clear(): void;
     // ^^
}
```

- Is parsed by
[Babel](https://astexplorer.net/#/gist/9f43cf0fe91760e11f1572934681c92f/d38530204eaa6e12dbd14f3ef2d7d2fcb7da7bc2)
as `FunctionExpression` with an empty body
- By
[TS-ESLint](https://astexplorer.net/#/gist/626dd346956a02c221d4e128450652af/4ea4e2feb5ae7bb8787f8c7e452d442c935c1f09)
as
[TSEmptyBodyFunctionExpression](typescript-eslint/typescript-eslint#1289)
- By
[OXC](https://oxc-project.github.io/oxc/playground/?code=3YCAAIC1gICAgICAgICyHorESipoTXPdvBaE9wxzlOraoWs19SUxDvdcwSVU0kbBO2b7ppX3x2P5IhQlpGHOYEHNCEfLf38HUICA)
as `TSDeclareFunction`

I'm going the easy way to fix this to the Babel way, but I think
following TS-ESLint would make sense. There is an [open babel
issue](babel/babel#13878) about that.

Edit: Ok that not so easy and require updating some logic.

---------

Co-authored-by: Boshen <boshenc@gmail.com>
@overlookmotel
Copy link
Collaborator

By the way, I'm keeping #2457 up to date on top of all the changes to AST you're making. So far, all the changes are working for "raw" transfer too. We're getting there...

@Boshen
Copy link
Member Author

Boshen commented Mar 5, 2024

We reached the consensus of serializing to a estree compatible AST, all other issues can be raised separately.

@Boshen Boshen closed this as completed Mar 5, 2024
Boshen pushed a commit that referenced this issue Mar 8, 2024
A step towards #2463.

This PR adds `rest` onto end of `elements` / `properties` array in JSON
AST for `ObjectPattern`, `ArrayPattern`, `ObjectAssignmentTarget`,
`ArrayAssignmentTarget` and `FormalParameters`.
@overlookmotel overlookmotel reopened this Mar 13, 2024
@overlookmotel
Copy link
Collaborator

overlookmotel commented Mar 13, 2024

Re-opening this so we can track progress towards this goal in one place. It's proved a little difficult using separate issues for each change, as many of them are tiny changes, and often interconnected.

@ArnaudBarre Can we catch up on where we're up to, and what remains?

I'd imagine quite a bit of what's left you'll be able to make the changes Rust-side just by adding #[serde(rename)] tags. But please do let me know which parts likely need custom Serialize impls, and I can write them.

I've been following your prettier-oxc-parser JS-side transform and trying to implement what I can, but I'm getting to the point where I can't follow all of it.

For example, in a bunch of types, there is setProp(node, "typeArguments", node.typeParameters) which looks like renaming the field, but then node.typeParameters isn't deleted. I assume this is just a mistake, but would prefer to be sure before committing the change.

By the way, part of my motivation to implement all transforms on Rust-side is it makes it much easier to keep #2457 in sync. If it's all done in serde, then I can just make sure that parse_sync_raw's output matches parse_sync.

@ArnaudBarre
Copy link
Contributor

ArnaudBarre commented Mar 13, 2024

This is not exhaustive but should cover all 'types' of differences to resolve:

  • Part of the AST is still based on the "import assertion" spec and should be updated. I want to tackle this I think it's a good next for me to do.
  • setProp(node, "typeArguments", node.typeParameters) is not a mistake, it's because TSESLint deprecated typeParameters in multiple nodes for typeArguments but keep both for now for backward compatibility. I'm not sure what is the right move for OXC here. For ref: fix: rename typeParameters to typeArguments where needed typescript-eslint/typescript-eslint#5384
  • Sometimes node are missing boolean, like prefix: true on UnaryExpression. I don't yet know how to do that in serde
  • StaticMemberExpression, ComputedMemberExpression & PrivateFieldExpression need to be serialized to MemberExpression with few changes
  • StringLiteral, BooleanLiteral, ... need to be serialized as Literal with raw value set. For BigInt, because it's not in the JSON spec, it would still requires some work on the JS side but this can be done as part of the reviver function when desrializing.
  • modifiers is not an ESTree concept. So anytime there is this prop in OXC it probably means you need to set one or two boolean on the equivalent node. Examples are VariableDeclaration.declare, ClassExpression.abstract, TSEnumDeclaration.const. My take here is that it will make the OXC AST more clean and easy to explore to have named booleans on nodes instead of a comment to say which one are 'allowed'. But I don't know the implication of this.
  • Some types that are nullable in OXC default to empty array in ESTree, like TSInterfaceDeclaration.extends, ClassExpression.implements
  • FunctionBody is a BlockStatement in ESTree
  • FormalParameters is closer to ESTree now, but should be inlined directly into <node>.params in multiple nodes
  • TSMappedTypeModifierOperator is boolean | '-' | '+' | undefined in TSESTree, I think my conversion is incomplete at the moment
  • UsingDeclaration is a VariableDeclaration (Not done in my wrapper)
  • Not sure yet, but the structure of hashbang, comments & directives needs some updates

Some changes could be seen as rename only, but as soon we touch the name of the nodes the type generation became complex to handle that why I didn't push more PRs for that for now.

For info the TSUnionType hack is really prettier specific.

But the biggest blocker to make my repo public is the utf16 conversion. The time to print some large files goes in seconds with the dump implementation, and because I saw you plan to solve that I don't want to optimize for it.

Some good news is that my repo successfully gave the same output of prettier on ~500 TSX files, ~8k dts files & 10k files inside my node_modules. This is obviously a very biased dataset, but still promising!

@overlookmotel
Copy link
Collaborator

overlookmotel commented Mar 14, 2024

Thanks loads for writing this up. A slightly daunting list. Thank god we are doing this as a team effort!

  • Part of the AST is still based on the "import assertion" spec and should be updated. I want to tackle this I think it's a good next for me to do.

OK great. If I've understood you right, this sounds like maybe we should alter the Rust types to align better, rather than translating in serde. @Boshen Do you agree?

Ah I see. In typescript-eslint/typescript-eslint#5384 (comment) they say "in v7 we remove the old property". v7 landed last month but unfortunately it doesn't look like they followed through on that. Nonetheless, the old typeParameters was deprecated in v6 which is now 1 major behind.

In my view, I think we should just rename the field to typeArguments, and go with that. If it turns out later that duplicate typeParameters field is required for compat with e.g. old ESLint plugins, we could deal with it then.

Duplicating the field would require a custom impl serde::Serialize and I think these are best avoided as much as possible as it makes Oxc's code less maintainable.

  • Sometimes node are missing boolean, like prefix: true on UnaryExpression. I don't yet know how to do that in serde

I don't think you can with what serde offers out-of-the-box. We can do it with a custom Serialize impl, but maybe I can come up with a better way. Please leave that question with me for a bit...

  • StaticMemberExpression, ComputedMemberExpression & PrivateFieldExpression need to be serialized to MemberExpression with few changes.
  • StringLiteral, BooleanLiteral, ... need to be serialized as Literal with raw value set.

I can do these. Now we've done it with identifiers, we have a good path to follow, won't take long.

For BigInt, because it's not in the JSON spec, it would still requires some work on the JS side but this can be done as part of the reviver function when deserializing.

I think we can do this in serde. Looks like JS AST doesn't need to contain an actual BigInt (see @typescript-eslint/parser's and acorn's output). value field is a string "(BigInt) 10n".

Again, requires a custom impl Serialize. Grr! Another one for my list.

  • modifiers is not an ESTree concept. So anytime there is this prop in OXC it probably means you need to set one or two boolean on the equivalent node. Examples are VariableDeclaration.declare, ClassExpression.abstract, TSEnumDeclaration.const. My take here is that it will make the OXC AST more clean and easy to explore to have named booleans on nodes instead of a comment to say which one are 'allowed'. But I don't know the implication of this.

If compat with ESTree is now a primary goal for Oxc, then yes I agree would be good to consider altering the Rust types, but maybe tricky without bloating size of the AST types that contain them. @Boshen?

  • Some types that are nullable in OXC default to empty array in ESTree, like TSInterfaceDeclaration.extends, ClassExpression.implements
  • FunctionBody is a BlockStatement in ESTree

I'll look into these 2. Hopefully can use #[serde(serialize_with)] for these.

  • FormalParameters is closer to ESTree now, but should be inlined directly into <node>.params in multiple nodes

I'll take a look. May be quite simple.

  • TSMappedTypeModifierOperator is boolean | '-' | '+' | undefined in TSESTree, I think my conversion is incomplete at the moment
  • UsingDeclaration is a VariableDeclaration (Not done in my wrapper)
  • Not sure yet, but the structure of hashbang, comments & directives needs some updates

OK, let's leave these for now until you've identified exactly what's needed.

Some changes could be seen as rename only, but as soon we touch the name of the nodes the type generation became complex to handle that why I didn't push more PRs for that for now.

As above, we now have a good path to follow for these, but it's a bit tricky. Totally fine to leave it to me / Boshen.

But the biggest blocker to make my repo public is the utf16 conversion. The time to print some large files goes in seconds with the dump implementation, and because I saw you plan to solve that I don't want to optimize for it.

We have agreed a way forwards for this, but it's not going to happen soon. All the optimized search routines I wrote in lexer will need to be rewritten, and will need to make sure this change doesn't destroy performance in the process. The last 2 of those optimizations are still a WIP at present, and I would like to get them finished and merged first before embarking on the next thing, and we also need to figure out how to test it. So it's a multi-step process (and not the easiest problem to start with).

So UTF-8 to UTF-16 translation will have to be on JS side for now. You may want to spend a bit of time optimizing it, if there's low-hanging fruit to be picked, but be aware it'll all be replaced later on.

Some good news is that my repo successfully gave the same output of prettier on ~500 TSX files, ~8k dts files & 10k files inside my node_modules. This is obviously a very biased dataset, but still promising!

That is extremely good news! Nice one! This effort is proving a long road, but I do feel like we're getting there...

PS I've converted your list to checkboxes so we can tick them off as we go.

@overlookmotel
Copy link
Collaborator

@Boshen To avoid you wading through all the above, just to flag that the 2 items which I think most need your input are "import assertions" and modifiers.

@hyf0
Copy link
Contributor

hyf0 commented Mar 19, 2024

Kind of curious the steps of how oxc pass the serialized ast to the js. If I understand correctly, it should be:

(oxc AST in rust) => (serialize to json rust string) => pass json rust string(utf8) to js => (node napi convert utf8 to utf16 to get json js string) => JSON.parse json js string to get estree.

The "(node napi convert utf8 to utf16 to get json js string)" step is the classical communication cost we talk about between rust and js for NAPI-RS program.

If serde could emit utf16 string directly, by using v8::String::NewExternalTwoByte API, we could pass the string generated in rust to js without any overhead.

(oxc AST in rust) => (serialize to json utf16 string) => pass json utf16 string to js => JSON.parse json utf16 string to get estree.

@overlookmotel
Copy link
Collaborator

overlookmotel commented Mar 19, 2024

Kind of curious the steps of how oxc pass the serialized ast to the js

You have it right - Rust types serialized to JSON with serde.

I was not aware that there's a further conversion involved in NAPI from UTF-8 to UTF-16. As I understand it, V8 also supports UTF-8 strings internally, so I'd assumed it used that string representation (the "OneByte" in v8::String::NewExternalOneByte). Maybe that assumption is incorrect - we'd need to look into it. But as far as I'm aware, serde only supports outputting JSON as UTF-8.

We could also pass the JSON from Rust to JS as a Uint8Array and then use TextDecoder to convert to a JS string on JS side, if it's faster. But if that does turn out to be faster than letting napi-rs handle the string transfer, really that should be addressed in napi-rs, as it's a very common use case.

In any case, the end goal is to transfer ASTs to JS with binary transfer (#2409), which doesn't involve strings at all, and will be much faster than any JSON-based scheme. Current WIP (#2457) is x3-4 faster than JSON, but I expect it to get faster still once it's optimized. The WIP works (output is correct for all inputs I've tested it on), but it needs a solid block of tests to cover all edge cases, and one technicality with the allocator needs sorting out. All in all, I'd guess it'll be ready in about 2 months (my time to work on it is limited at present). Is the need for faster transfer more urgent than that? Do you think we need to try to optimize the current JSON-based scheme in the meantime?

By the way, this issue "serialize to estree" is orthogonal to the question of how the AST is transferred - it relates to the shape of the JS AST. We're currently working on aligning JS AST with ESTree by using serde, but AST transfer WIP is also tracking the changes, so switching over from one to the other should be seamless.

@overlookmotel
Copy link
Collaborator

This isn't really relevant to this issue, but since we're discussing it here already...

I looked further into the new external string APIs in NAPI. NAPI since NodeJS v20.4.0 also provides a method node_api_create_external_string_latin1. This is partially exposed by napi-rs (only in sys crate at present, with "experimental" feature). We could make serde_json output pure ASCII (see serde-rs/json#907) and then use the external_string_latin1 API to transfer that JSON string to JS without any memory copy. This would be simpler and faster than converting to UTF-16.

But... I suspect buffer-based AST transfer will be ready before these APIs become fully supported in napi-rs.

IWANABETHATGUY pushed a commit to IWANABETHATGUY/oxc that referenced this issue May 29, 2024
IWANABETHATGUY pushed a commit to IWANABETHATGUY/oxc that referenced this issue May 29, 2024
A few more I missed in oxc-project#2506. Re oxc-project#2463.

Only remaining snake_case in the current types of the AST:
`trailing_comma` in ArrayExpression, ObjectExpression &
ArrayAssignmentTarget.
@zmzlois
Copy link

zmzlois commented Sep 29, 2024

I am looking at several tools to parse code atm and benchmarking their performance (none-Rust related sadly):

  • hermes-parser
  • typescript compiler API
  • tree-sitter
  • ast-grep
  • V8 + node add on

found this issue provided great value, drop by to say thanks

@overlookmotel
Copy link
Collaborator

Hi @zmzlois. Thanks for letting us know our ramblings provided some useful background!

By the way, we met at the JS Monthly meetup last week. I'm the one who asked you what the moral of the story of the tale of bundlers was. Very much enjoyed your talk.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants