-
-
Notifications
You must be signed in to change notification settings - Fork 454
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(parser): serialize to estree #2463
Comments
@ArnaudBarre I see from your Github repos that you mostly code in JS/TS, so I don't know if you're comfortable writing Rust. But if you are, would you consider assisting with this? What I'm thinking is that rather than building out further your OXC prettier parser layer in JS, could you instead submit the changes upstream in OXC? For the mismatches where all that's needed is changing field names, it's as simple as e.g.: pub struct ComputedMemberExpression<'a> {
#[cfg_attr(feature = "serde", serde(flatten))]
pub span: Span,
pub object: Expression<'a>,
+ #[cfg_attr(feature = "serde", serde(rename = "property"))]
pub expression: Expression<'a>,
pub optional: bool, // for optional chaining
} Where difference is snake_case vs camelCase: #[derive(Debug, Hash)]
-#[cfg_attr(feature = "serde", derive(Serialize), serde(tag = "type"))]
+#[cfg_attr(feature = "serde", derive(Serialize), serde(tag = "type", rename_all = "camelCase"))]
#[cfg_attr(all(feature = "serde", feature = "wasm"), derive(tsify::Tsify))]
pub struct NewExpression<'a> {
#[cfg_attr(feature = "serde", serde(flatten))]
pub span: Span,
pub callee: Expression<'a>,
pub arguments: Vec<'a, Argument<'a>>,
pub type_parameters: Option<Box<'a, TSTypeParameterInstantiation<'a>>>,
} To cover all the #[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
-#[cfg_attr(feature = "serde", derive(Serialize), serde(tag = "type", rename_all = "camelCase"))]
+#[cfg_attr(feature = "serde", derive(Serialize), serde(rename_all = "camelCase"))]
#[cfg_attr(all(feature = "serde", feature = "wasm"), derive(tsify::Tsify))]
pub enum ImportOrExportKind {
Value,
Type,
} Obviously, there are other more complex changes to be made which will require manual If you're not comfortable with Rust, no worries. I can make the changes, and your work on identifying what they are would be a huge help with that. Please just let me know which way you'd like to go. |
@Boshen I think you mentioned that you previously had a branch where you'd written the |
Never done Rust but adding metadata to structs or refactoring property names via IDE should be doable! The complex part is to decide when to update the rust name and when to update just the serialize name/struct. Let's start with the AST nodes with a slight name mismatch:
|
Thanks for your willingness! I'm very happy to help if you need any pointers. Boshen may be happy for us to rename the actual Rust struct property names, but to start with I'd suggest just using To test out changes you make via the Node API, you can build with: cd napi/parser
npm run build That builds the Rust binary, and also a JS binding Just to explain what the complex-looking Support for JSON serialization is only included in the build if you compile with the
Hope this is useful background. Please excuse me if I'm telling stuff that you already know/is obvious. It wasn't obvious at all to me when I started learning Rust! |
I can help with the changes, but I prefer a test infrastructure in-place first 😁 Probably running prettier tests with oxc-prettier-parser in CI? |
Yes, makes sense. Would you want to set that up, or shall I look into it?
I imagine that'd do it. If we wanted to be really thorough, could also run Acorn's tests which I assume will specifically test every possible AST node type. But I don't think Acorn supports TypeScript, so would maybe need typescript-eslint's tests too. Not sure how much time setting all that up would take - maybe worth it, maybe not. |
To change type names e.g.: #[derive(Debug, Clone)]
-#[cfg_attr(feature = "serde", derive(Serialize), serde(tag = "type"))]
+#[cfg_attr(feature = "serde", derive(Serialize), serde(tag = "type", rename = "NumericLiteral"))]
#[cfg_attr(all(feature = "serde", feature = "wasm"), derive(tsify::Tsify))]
pub struct NumberLiteral<'a> {
#[cfg_attr(feature = "serde", serde(flatten))]
pub span: Span,
pub value: f64,
#[cfg_attr(feature = "serde", serde(skip))]
pub raw: &'a str,
#[cfg_attr(feature = "serde", serde(skip))]
pub base: NumberBase,
} |
I'll look into it.
I'll just change the struct names instead 😅 |
According the the prettier documentation, it seems like we just need to align with @typescript-eslint/typescript-estree: https://prettier.io/docs/en/options.html#parser
By making a new package that exports the exact same stuff as https://github.com/typescript-eslint/typescript-eslint/tree/main/packages/typescript-estree as then patch prettier's package.json's The final setup would just be running |
hmm ... it's harder than I thought, it requires more information beyond just the AST |
@overlookmotel Thanks a lot for the explication! Yeah the main blocker will be adding comments to the AST. I don't know how this can be done with performance in mind, but even for bundles/minifiers related task done all in rust, having, being able to check if functions have special comments will be required. I know that esbuild does does some heuristic to keep comments only in places where it might contain some bundler information, but for formatting purpose all comments are needed :/ |
We have comments in an array, I'll get it returned to JS. |
Released napi oxc-parser@0.5.0
( |
Cool I'll keep you updated tonight with my next blocker! |
@Boshen Can we also rename the fields in Rust types to match ESTree? e.g. Personally, I think yes, as it'll be more familiar to contributors, and it sounds like breaking changes are not a big deal pre-1.0, but maybe there's some other context I'm not aware of. |
Actually Property in TSESTree map to |
Let me sit on renaming these spec names to estree names for a bit. I'm in favor of full estree compatibility but I need to think about the consequences and maintainability issues. |
I think @sosukesuzuki might give you something advice. |
…ression (#2574) That a tricky one, because it's time to decide what does ESTree compliant means in the TS world (re #2463) This code: ```ts export declare class ByteBuffer { clear(): void; // ^^ } ``` - Is parsed by [Babel](https://astexplorer.net/#/gist/9f43cf0fe91760e11f1572934681c92f/d38530204eaa6e12dbd14f3ef2d7d2fcb7da7bc2) as `FunctionExpression` with an empty body - By [TS-ESLint](https://astexplorer.net/#/gist/626dd346956a02c221d4e128450652af/4ea4e2feb5ae7bb8787f8c7e452d442c935c1f09) as [TSEmptyBodyFunctionExpression](typescript-eslint/typescript-eslint#1289) - By [OXC](https://oxc-project.github.io/oxc/playground/?code=3YCAAIC1gICAgICAgICyHorESipoTXPdvBaE9wxzlOraoWs19SUxDvdcwSVU0kbBO2b7ppX3x2P5IhQlpGHOYEHNCEfLf38HUICA) as `TSDeclareFunction` I'm going the easy way to fix this to the Babel way, but I think following TS-ESLint would make sense. There is an [open babel issue](babel/babel#13878) about that. Edit: Ok that not so easy and require updating some logic. --------- Co-authored-by: Boshen <boshenc@gmail.com>
I got my first mismatch because of utf-8/utf-16 mismatch. I saw that sourcemap have been merged, does this mean that the parser now tracks utf-16 positions? |
As far as I can see, this is only in the codegen so far. We could do UTF-8 -> UTF-16 mapping in the lexer, but problem is where to store the UTF-16 spans - we don't want to add them to the AST as that'd be an extra 8 bytes for every AST node. So it needs to be done, but exactly how is a bit of an open question. Maybe we could move "line tables" creation from codegen to the parser. But we should probably discuss that on #959. By the way, I see you're banging out PRs at breakneck pace. Brilliant. If you really hadn't touched Rust before, then I must say you're a very quick learner! |
Yeah better to discuss this there! Thanks! I was told Rust error messages are great and I can confirm. And in IDE copilot chat is also really cool when navigating in unknown codebase. For now I've always stopped when it worked so I don't always understand things, if I want to do more serious things I will read the book! |
…ression (oxc-project#2574) That a tricky one, because it's time to decide what does ESTree compliant means in the TS world (re oxc-project#2463) This code: ```ts export declare class ByteBuffer { clear(): void; // ^^ } ``` - Is parsed by [Babel](https://astexplorer.net/#/gist/9f43cf0fe91760e11f1572934681c92f/d38530204eaa6e12dbd14f3ef2d7d2fcb7da7bc2) as `FunctionExpression` with an empty body - By [TS-ESLint](https://astexplorer.net/#/gist/626dd346956a02c221d4e128450652af/4ea4e2feb5ae7bb8787f8c7e452d442c935c1f09) as [TSEmptyBodyFunctionExpression](typescript-eslint/typescript-eslint#1289) - By [OXC](https://oxc-project.github.io/oxc/playground/?code=3YCAAIC1gICAgICAgICyHorESipoTXPdvBaE9wxzlOraoWs19SUxDvdcwSVU0kbBO2b7ppX3x2P5IhQlpGHOYEHNCEfLf38HUICA) as `TSDeclareFunction` I'm going the easy way to fix this to the Babel way, but I think following TS-ESLint would make sense. There is an [open babel issue](babel/babel#13878) about that. Edit: Ok that not so easy and require updating some logic. --------- Co-authored-by: Boshen <boshenc@gmail.com>
By the way, I'm keeping #2457 up to date on top of all the changes to AST you're making. So far, all the changes are working for "raw" transfer too. We're getting there... |
We reached the consensus of serializing to a estree compatible AST, all other issues can be raised separately. |
A step towards #2463. This PR adds `rest` onto end of `elements` / `properties` array in JSON AST for `ObjectPattern`, `ArrayPattern`, `ObjectAssignmentTarget`, `ArrayAssignmentTarget` and `FormalParameters`.
Re-opening this so we can track progress towards this goal in one place. It's proved a little difficult using separate issues for each change, as many of them are tiny changes, and often interconnected. @ArnaudBarre Can we catch up on where we're up to, and what remains? I'd imagine quite a bit of what's left you'll be able to make the changes Rust-side just by adding I've been following your For example, in a bunch of types, there is By the way, part of my motivation to implement all transforms on Rust-side is it makes it much easier to keep #2457 in sync. If it's all done in serde, then I can just make sure that |
This is not exhaustive but should cover all 'types' of differences to resolve:
Some changes could be seen as rename only, but as soon we touch the name of the nodes the type generation became complex to handle that why I didn't push more PRs for that for now. For info the But the biggest blocker to make my repo public is the utf16 conversion. The time to print some large files goes in seconds with the dump implementation, and because I saw you plan to solve that I don't want to optimize for it. Some good news is that my repo successfully gave the same output of prettier on ~500 TSX files, ~8k dts files & 10k files inside my node_modules. This is obviously a very biased dataset, but still promising! |
Thanks loads for writing this up. A slightly daunting list. Thank god we are doing this as a team effort!
OK great. If I've understood you right, this sounds like maybe we should alter the Rust types to align better, rather than translating in serde. @Boshen Do you agree?
Ah I see. In typescript-eslint/typescript-eslint#5384 (comment) they say "in v7 we remove the old property". v7 landed last month but unfortunately it doesn't look like they followed through on that. Nonetheless, the old In my view, I think we should just rename the field to Duplicating the field would require a custom
I don't think you can with what
I can do these. Now we've done it with identifiers, we have a good path to follow, won't take long.
I think we can do this in Again, requires a custom
If compat with ESTree is now a primary goal for Oxc, then yes I agree would be good to consider altering the Rust types, but maybe tricky without bloating size of the AST types that contain them. @Boshen?
I'll look into these 2. Hopefully can use
I'll take a look. May be quite simple.
OK, let's leave these for now until you've identified exactly what's needed.
As above, we now have a good path to follow for these, but it's a bit tricky. Totally fine to leave it to me / Boshen.
We have agreed a way forwards for this, but it's not going to happen soon. All the optimized search routines I wrote in lexer will need to be rewritten, and will need to make sure this change doesn't destroy performance in the process. The last 2 of those optimizations are still a WIP at present, and I would like to get them finished and merged first before embarking on the next thing, and we also need to figure out how to test it. So it's a multi-step process (and not the easiest problem to start with). So UTF-8 to UTF-16 translation will have to be on JS side for now. You may want to spend a bit of time optimizing it, if there's low-hanging fruit to be picked, but be aware it'll all be replaced later on.
That is extremely good news! Nice one! This effort is proving a long road, but I do feel like we're getting there... PS I've converted your list to checkboxes so we can tick them off as we go. |
@Boshen To avoid you wading through all the above, just to flag that the 2 items which I think most need your input are "import assertions" and modifiers. |
Kind of curious the steps of how oxc pass the serialized ast to the js. If I understand correctly, it should be: ( The "(node napi convert utf8 to utf16 to get json If ( |
You have it right - Rust types serialized to JSON with serde. I was not aware that there's a further conversion involved in NAPI from UTF-8 to UTF-16. As I understand it, V8 also supports UTF-8 strings internally, so I'd assumed it used that string representation (the "OneByte" in We could also pass the JSON from Rust to JS as a In any case, the end goal is to transfer ASTs to JS with binary transfer (#2409), which doesn't involve strings at all, and will be much faster than any JSON-based scheme. Current WIP (#2457) is x3-4 faster than JSON, but I expect it to get faster still once it's optimized. The WIP works (output is correct for all inputs I've tested it on), but it needs a solid block of tests to cover all edge cases, and one technicality with the allocator needs sorting out. All in all, I'd guess it'll be ready in about 2 months (my time to work on it is limited at present). Is the need for faster transfer more urgent than that? Do you think we need to try to optimize the current JSON-based scheme in the meantime? By the way, this issue "serialize to estree" is orthogonal to the question of how the AST is transferred - it relates to the shape of the JS AST. We're currently working on aligning JS AST with ESTree by using |
This isn't really relevant to this issue, but since we're discussing it here already... I looked further into the new external string APIs in NAPI. NAPI since NodeJS v20.4.0 also provides a method But... I suspect buffer-based AST transfer will be ready before these APIs become fully supported in napi-rs. |
A few more I missed in oxc-project#2506. Re oxc-project#2463. Only remaining snake_case in the current types of the AST: `trailing_comma` in ArrayExpression, ObjectExpression & ArrayAssignmentTarget.
I am looking at several tools to parse code atm and benchmarking their performance (none-Rust related sadly):
found this issue provided great value, drop by to say thanks |
Hi @zmzlois. Thanks for letting us know our ramblings provided some useful background! By the way, we met at the JS Monthly meetup last week. I'm the one who asked you what the moral of the story of the tale of bundlers was. Very much enjoyed your talk. |
We should serialize our AST to estree, to create less chaos in the JavaScript ecosystem.
We need to:
The text was updated successfully, but these errors were encountered: