
Performance improvement and potential tools to adopt #35

Open
StringKe opened this issue Nov 10, 2023 · 33 comments

Comments

@StringKe

I have a 5MB file that needs to be processed. I've tried running it directly in Node, and it takes up to 21GB of memory in some transform processing.

However, some of the processing may fail, so maybe it would be possible to save the data that was processed successfully, and reuse it next time?

@pionxzh
Owner

pionxzh commented Nov 10, 2023

Did you clone the project and manually run it directly in Node.js? I want to confirm how you ran the whole thing.

Can you share the source file here, or privately via email?
21GB is quite scary 😢

@StringKe
Author

Did you clone the project and manually run it directly in Node.js? I want to confirm how you ran the whole thing.

Can you share the source file here, or privately via email? 21GB is quite scary 😢

Yes, I copied the source code to run it.

What is your email address? You can base64-encode it and reply to me.

@pionxzh

This comment was marked as outdated.

@pionxzh pionxzh added the enhancement New feature or request label Nov 14, 2023
@pionxzh
Owner

pionxzh commented Nov 14, 2023

I can confirm that this file will OOM during the unpacking process.

@0xdevalias

The following issue is also probably tangentially relevant to this, as parsing a large file may result in files getting silently lost along the way:

@pionxzh
Owner

pionxzh commented Nov 16, 2023

Yes, they are relevant.

@0xdevalias

0xdevalias commented Nov 17, 2023

I've tried running it directly in node and it takes up to 21G of memory space in some transform processing.

I can confirm that this file will OOM during the unpacking process.

My suggestion here would be a rather major change, so it's not worth looking into too deeply unless it turns out there is no good/efficient way to manage this with the current AST parsers/etc. But one thing I stumbled across/was thinking about the other day is that the swc / tree-sitter / etc Rust parsers can apparently be used from JS apps, and that might allow us to run the unminify process much faster and/or in a more memory-efficient way:

These links/resources probably aren't exhaustive; but figured I would share them as a starting point in case this was a path that was worth looking into at some stage:

tree-sitter / web-tree-sitter

swc / @swc/wasm-web / swc_ecma_parser

ast-grep

Etc

  • https://github.com/rustwasm/wasm-pack
    • This tool seeks to be a one-stop shop for building and working with rust-generated WebAssembly that you would like to interop with JavaScript, in the browser or with Node.js. wasm-pack helps you build rust-generated WebAssembly packages that you could publish to the npm registry, or otherwise use alongside any javascript packages in workflows that you already use.

    • https://rustwasm.github.io/wasm-pack/book/

@pionxzh
Owner

pionxzh commented Nov 19, 2023

I decided to suspend this request for a while. This feature will be implemented in the new web IDE instead of the current playground. In the meantime, you can use the CLI to run it on your local machine.

@0xdevalias

@pionxzh Oo, new web IDE sounds interesting. Are you able to tell us more about your plan/roadmap with it, and what you're hoping it will allow/enable?

@pionxzh
Owner

pionxzh commented Nov 20, 2023

I tried the sample with the new CLI.

It took ~2GB and 60 seconds for the unpacker to pass (and it failed to unpack into multiple files 😞; I will check further!).

Generated 1 modules from main.ba3b216f.js to out (61,265.4ms)

For unminify, it took 3.4 hours... to finish the whole process. Memory usage is from 100MB ~ 1GB ~ 2GB depending on the rule.

I will improve the CLI to write files more frequently during the unminify process, and add a --perf flag for recording the time and memory usage.

@StringKe
Author

With the CLI, you may need to prompt the user to pass Node's V8 flag --max-old-space-size to extend the memory available to Node when it runs out of memory.

@pionxzh
Owner

pionxzh commented Nov 20, 2023

Are you able to tell us more about your plan/roadmap with it, and what you're hoping it will allow/enable?

@0xdevalias I have created a package /packages/ide which provides a vscode-like experience. I want to use it to replace the existing playground. I am still trying to figure out the idea haha.. Module mapping won't be exposed; instead, you can directly edit the filename in the file explorer. And two tabs will be provided for the compare view, which you can drag and organize in any form you like. The editor would be Monaco, which offers a better editing experience. Plus features like download, better unminify handling, error reporting...

But I might drop the idea and simply put Monaco into the existing playground.

@pionxzh
Owner

pionxzh commented Nov 20, 2023

@StringKe We can add some hints in the CLI usage description. But I didn't hit an OOM at any point with your sample.

@0xdevalias

I have created a package /packages/ide which provides a vscode-like experience.

@pionxzh Oo.. nice! I created a new issue for this, and will add my /2c over there so as not to clutter this thread too much:


For unminify, it took 3.4 hours... to finish the whole process. Memory usage is from 100MB ~ 1GB ~ 2GB depending on the rule.

I will improve the CLI to write files more frequently during the unminify process, and add a --perf flag for recording the time and memory usage.

@pionxzh With the --perf flag, it would be interesting to see how much time is spent in each different section of the CLI/etc. I'm not sure how hard it would be to record stats/data that could be loaded into something like the Chrome DevTools profiler to show a flamegraph of where the time was spent, but that could help in knowing where to optimise first.

@pionxzh
Owner

pionxzh commented Nov 20, 2023

I think you don't need additional work on the CLI to export the flamegraph? It should be the same as how people profile a normal Node.js project.

@0xdevalias

0xdevalias commented Nov 20, 2023

Should be same as how people measure normal nodejs project

@pionxzh True true.. I think I got confused when you were talking about adding a --perf flag; the flamegraph/profiling tooling tends to include timing/memory data already, doesn't it?

I asked ChatGPT, and it gave some good tool suggestions/how to, included in the expandable below:

Some notes from ChatGPT on profiling

tl;dr:

To get granular timing and performance data for a JavaScript library and a Node.js CLI application, you can use several tools and methods. Here's a step-by-step guide:

1. Node.js Built-in Profiler

Node.js has a built-in profiler that can be used to capture CPU profiles. You can run your Node.js application with the --prof flag. For example:

node --prof your-app.js

This will generate a .log file with profiling data.

2. Convert Profiling Data for Chrome DevTools

To analyze the profiling data in Chrome DevTools, you need to convert the .log file to a format that Chrome can understand. You can use the node --prof-process command for this:

node --prof-process isolate-0xnnnnnnnnnnnn-v8.log > processed-profile.txt

3. Visualizing with Chrome DevTools

Note that the --prof-process output is a plain-text report meant to be read directly. To get an interactive flame chart in Chrome DevTools, record a .cpuprofile instead (for example with node --cpu-prof) and load it via the DevTools "JavaScript Profiler" panel.

4. Using Third-party Tools

a) 0x

0x is a powerful tool for generating flame graphs for Node.js applications. Install it globally using npm:

npm install -g 0x

Then, run your Node.js application with 0x:

0x your-app.js

0x will automatically generate an HTML file with an interactive flame graph.

b) clinic.js

Clinic.js is another comprehensive profiling toolset. It can be installed via npm:

npm install -g clinic

To profile your application, use:

clinic doctor -- node your-app.js

or

clinic flame -- node your-app.js

This will generate detailed reports and flame graphs.

5. Analyzing the Data

Once you have the flame graphs and performance reports:

  • Look for "hot" functions that consume a lot of CPU time.
  • Identify patterns of high memory usage.
  • Examine async operations and their impact on performance.

6. Optimizing Based on Data

  • Refactor or optimize hot functions.
  • Reduce unnecessary computations or memory allocations.
  • Optimize async flows and database queries if they are the bottlenecks.

Additional Tips

  • Ensure that your Node.js version is up-to-date for the best profiling support.
  • Sometimes, running these tools in a production-like environment can give more accurate results.

Using these steps, you can capture detailed performance data for your JavaScript library and Node.js CLI application, analyze it using tools like Chrome DevTools, and identify areas for optimization.


Running the previous tools on my sample webpack'd code (Ref):

cd ./unpacked/_next/static/chunks

0x:

⇒ npx 0x --open -- node ../../../../node_modules/@wakaru/unpacker/dist/cli.cjs 496.js -o ./496-v2
🔥  ProfilingGenerated 51 modules from 496.js to 496-v2 (5,408.6ms)
🔥  Flamegraph generated in
file:///Users/REDACTED/dev/0xdevalias/REDACTED/unpacked/_next/static/chunks/62569.0x/flamegraph.html

image

 ⇒ npx 0x --open -- node ../../../../node_modules/@wakaru/unminify/dist/cli.cjs ./496-v2/*.js -o ./496-v2-unminified
🔥  Profiling• Transforming 496-v2-unminified/496-v2/module-10604.js (1,681.9ms)
• Transforming 496-v2-unminified/496-v2/module-13282.js (650.8ms)
• Transforming 496-v2-unminified/496-v2/module-17915.js (1,994.3ms)
• Transforming 496-v2-unminified/496-v2/module-180.js (183ms)
• Transforming 496-v2-unminified/496-v2/module-19051.js (158ms)
• Transforming 496-v2-unminified/496-v2/module-21437.js (911.3ms)
• Transforming 496-v2-unminified/496-v2/module-2368.js (551.6ms)
• Transforming 496-v2-unminified/496-v2/module-24148.js (226.5ms)
• Transforming 496-v2-unminified/496-v2/module-25094.js (1,217ms)
• Transforming 496-v2-unminified/496-v2/module-25345.js (674.5ms)
• Transforming 496-v2-unminified/496-v2/module-25422.js (451.2ms)
• Transforming 496-v2-unminified/496-v2/module-30931.js (975.5ms)
• Transforming 496-v2-unminified/496-v2/module-31721.js (551.3ms)
• Transforming 496-v2-unminified/496-v2/module-32165.js (625.8ms)
• Transforming 496-v2-unminified/496-v2/module-32689.js (424.6ms)
• Transforming 496-v2-unminified/496-v2/module-36112.js (486.7ms)
• Transforming 496-v2-unminified/496-v2/module-36716.js (1,069.3ms)
• Transforming 496-v2-unminified/496-v2/module-37541.js (113.7ms)
• Transforming 496-v2-unminified/496-v2/module-38631.js (151.8ms)
• Transforming 496-v2-unminified/496-v2/module-44925.js (64ms)
• Transforming 496-v2-unminified/496-v2/module-45036.js (27,773.9ms)
• Transforming 496-v2-unminified/496-v2/module-46110.js (839.3ms)
[un-jsx] children from attribute and arguments are both present: jsx(g.Z, {
# ..snip..
• Transforming 496-v2-unminified/496-v2/module-48101.js (1,249.6ms)
[un-jsx] children from attribute and arguments are both present: jsx("div", { className: "text-yellow-500", children: e }, t)
[un-jsx] children from attribute and arguments are both present: jsx("div", { className: "text-red-500", children: e }, t)
• Transforming 496-v2-unminified/496-v2/module-49910.js (876.8ms)
• Transforming 496-v2-unminified/496-v2/module-5046.js (283.5ms)
• Transforming 496-v2-unminified/496-v2/module-56244.js (872.3ms)
• Transforming 496-v2-unminified/496-v2/module-57311.js (3,180.6ms)
• Transforming 496-v2-unminified/496-v2/module-57924.js (428.4ms)
• Transforming 496-v2-unminified/496-v2/module-61119.js (917.9ms)
• Transforming 496-v2-unminified/496-v2/module-62440.js (160.7ms)
[un-jsx] children from attribute and arguments are both present: jsx(w.Z, {
# ..snip..
[un-jsx] children from attribute and arguments are both present: jsx("div", {
# ..snip..
[un-jsx] children from attribute and arguments are both present: jsx(Et, { x: i, children: e }, t)
• Transforming 496-v2-unminified/496-v2/module-63390.js (5,776.7ms)
[un-jsx] children from attribute and arguments are both present: jsx(m.E.div, {
# ..snip..
[un-jsx] children from attribute and arguments are both present: jsxs(N.Z, {
# ..snip..
[un-jsx] children from attribute and arguments are both present: jsx(N.Z, {
# ..snip..
• Transforming 496-v2-unminified/496-v2/module-63727.js (4,760.3ms)
• Transforming 496-v2-unminified/496-v2/module-66523.js (447.6ms)
• Transforming 496-v2-unminified/496-v2/module-69403.js (309.9ms)
• Transforming 496-v2-unminified/496-v2/module-697.js (266.3ms)
• Transforming 496-v2-unminified/496-v2/module-74437.js (263.9ms)
• Transforming 496-v2-unminified/496-v2/module-75179.js (274.9ms)
• Transforming 496-v2-unminified/496-v2/module-75515.js (181.2ms)
• Transforming 496-v2-unminified/496-v2/module-75527.js (4,662ms)
• Transforming 496-v2-unminified/496-v2/module-76559.js (382.7ms)
• Transforming 496-v2-unminified/496-v2/module-77442.js (367.2ms)
• Transforming 496-v2-unminified/496-v2/module-85449.js (322.9ms)
• Transforming 496-v2-unminified/496-v2/module-86433.js (529.2ms)
• Transforming 496-v2-unminified/496-v2/module-86526.js (97.1ms)
• Transforming 496-v2-unminified/496-v2/module-86573.js (1,461.6ms)
• Transforming 496-v2-unminified/496-v2/module-870.js (268ms)
• Transforming 496-v2-unminified/496-v2/module-87105.js (6,067.3ms)
• Transforming 496-v2-unminified/496-v2/module-87316.js (194.1ms)
• Transforming 496-v2-unminified/496-v2/module-90076.js (1,509.4ms)
• Transforming 496-v2-unminified/496-v2/module-94860.js (621ms)
• Transforming 496-v2-unminified/496-v2/module-97732.js (1,447.6ms)
🔥  Flamegraph generated in
file:///Users/REDACTED/dev/0xdevalias/REDACTED/unpacked/_next/static/chunks/62934.0x/flamegraph.html

image


clinic:

⇒ npx clinic doctor -- node ../../../../node_modules/@wakaru/unpacker/dist/cli.cjs 496.js -o ./496-v2
To generate the report press: Ctrl + C
Generated 51 modules from 496.js to 496-v2 (4,701.4ms)
Analysing data
Generated HTML file is file:///Users/REDACTED/dev/0xdevalias/REDACTED/unpacked/_next/static/chunks/.clinic/65647.clinic-doctor.html

image

⇒ npx clinic doctor -- node ../../../../node_modules/@wakaru/unminify/dist/cli.cjs ./496-v2/*.js -o ./496-v2-unminified
To generate the report press: Ctrl + C
• Transforming 496-v2-unminified/496-v2/module-10604.js (891.1ms)
# ..snip..
Analysing data
Generated HTML file is file:///Users/REDACTED/dev/0xdevalias/REDACTED/unpacked/_next/static/chunks/.clinic/65956.clinic-doctor.html

image

⇒ npx clinic flame -- node ../../../../node_modules/@wakaru/unminify/dist/cli.cjs ./496-v2/*.js -o ./496-v2-unminified
To generate the report press: Ctrl + C
• Transforming 496-v2-unminified/496-v2/module-10604.js (1,591.3ms)
# ..snip..
[==  ] Analysing dataRangeError: Invalid string length
    at JSON.stringify (<anonymous>)
    at zeroEks (/Users/devalias/dev/0xdevalias/chatgpt-source-watch/node_modules/0x/index.js:56:51)
    at runMicrotasks (<anonymous>)
    at processTicksAndRejections (node:internal/process/task_queues:96:5)

@pionxzh
Owner

pionxzh commented Nov 28, 2023

I will try ast-grep for simple tasks. I have set up some benchmarks that can help us make the decision.

@pionxzh
Owner

pionxzh commented Nov 29, 2023

The benchmark shows that ast-grep can be 3-4 times faster and requires a smaller memory footprint. However, at present we only have the napi binding, which can only run in the CLI. A WASM build is required for the browser environment (playground). I'm not too sure about the future direction of this.

We have several options to consider:

  1. Keep migrating part of the rules to ast-grep, and only replace them on CLI.
  2. Pursue the development of a WASM build for ast-grep. This could involve either initiating our own build or waiting for an official build to become available. Once achieved, we could rewrite part of the core logic like import scanning and simple transformation to ast-grep.
  3. Forget about it and focus on optimizing the current implementation.

It's important to note that while ast-grep fulfills many of our requirements, we do not plan to rewrite complex rules using it. Complex rules often involve scope awareness, advanced structure matching, and AST node building. Therefore, it's challenging to determine whether the overall performance gains will justify the effort required for integration.

@0xdevalias

0xdevalias commented Nov 29, 2023

@pionxzh Was there a reason you chose ast-grep specifically out of the suggestions I made above? What about tree-sitter / web-tree-sitter and similar? They would be usable across both CLI and web, and should theoretically be faster and use less memory too?

I feel like it could be an ideal choice:


I feel like having rules that only work on the CLI and not the web UI will increase maintenance effort, and could easily lead to the two becoming out of sync, which sounds like a bad direction for the project.

Similarly, since you also say you wouldn't use it for more complex rules/rewrites: even if we did have a web-capable build of it, I wonder if it's worth fragmenting the rules and how they're implemented between two tools/methods/styles, as contributors/maintainers would then need to know how to write simpler rules in one tool and more complex rules in another.

If this was a path that was followed, here are some potentially relevant resources; that seem to suggest ast-grep already compiles to webassembly to support their own playground; and that it's seemingly based on tree-sitter / web-tree-sitter anyway, so tree-sitter is probably a more direct parser to look at:


Edit: Looking at the ast-grep GitHub org, there are a few more projects there that might be useful with regards to tree-sitter:


@pionxzh Curious, are your benchmarking tools/etc added to this repo? Or just a local thing?

@pionxzh
Owner

pionxzh commented Nov 29, 2023

Tree-sitter is an excellent component, but it doesn't provide the matcher required by the rules. ast-grep also uses tree-sitter underneath, while providing a robust matcher and replacer. I can also use magic-string to manipulate the source code in combination.

@0xdevalias

0xdevalias commented Nov 29, 2023

@pionxzh When you say "doesn't provide the matcher", what do you mean? tree-sitter has a built in query language; that from my brief time reading through it, sounds pretty powerful?

  • https://tree-sitter.github.io/tree-sitter/using-parsers#pattern-matching-with-queries
    • Pattern Matching with Queries
      Many code analysis tasks involve searching for patterns in syntax trees. Tree-sitter provides a small declarative language for expressing these patterns and searching for matches.

    • A query consists of one or more patterns, where each pattern is an S-expression that matches a certain set of nodes in a syntax tree. The expression to match a given node consists of a pair of parentheses containing two things: the node’s type, and optionally, a series of other S-expressions that match the node’s children.

    • You can also constrain a pattern so that it only matches nodes that lack a certain field.

    • When matching patterns, you may want to process specific nodes within the pattern. Captures allow you to associate names with specific nodes in a pattern, so that you can later refer to those nodes by those names.

    • You can also specify arbitrary metadata and conditions associated with a pattern by adding predicate S-expressions anywhere within your pattern.

      • Note — Predicates are not handled directly by the Tree-sitter C library. They are just exposed in a structured form so that higher-level code can perform the filtering. However, higher-level bindings to Tree-sitter like the Rust Crate or the WebAssembly binding do implement a few common predicates like the #eq?, #match?, and #any-of? predicates explained above.

      • To recap about the predicates Tree-Sitter’s bindings support:

        • #eq?: checks for a direct match against a capture or string
        • #match?: checks for a match against a regular expression
        • #any-of?: checks for a match against a list of strings
        • Adding not- to the beginning of any of these predicates will negate the match
        • By default, a quantified capture will only match if all of the nodes match the predicate
        • Adding any- before the eq or match predicates will instead match if any of the nodes match the predicate
    • etc
  • https://tree-sitter.github.io/tree-sitter/using-parsers#the-query-api
    • Create a query by specifying a string containing one or more patterns
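As a concrete illustration of the query syntax quoted above, here's what a pattern for `void 0` expressions (a transform discussed elsewhere in this thread) might look like, assuming the tree-sitter-javascript grammar's node and field names:

```scheme
; Match `void 0` (tree-sitter-javascript grammar), capturing the whole
; expression so a rewriter can later replace it with `undefined`.
((unary_expression
   operator: "void"
   argument: (number) @operand) @void-expr
 (#eq? @operand "0"))
```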

It's also seemingly commonly used as the basis for 'code navigation systems', which could be useful in the web-IDE side of things:

  • https://tree-sitter.github.io/tree-sitter/code-navigation-systems
    • Tree-sitter can be used in conjunction with its tree query language as a part of code navigation systems. An example of such a system can be seen in the tree-sitter tags command, which emits a textual dump of the interesting syntactic nodes in its file argument.

      A notable application of this is GitHub’s support for search-based code navigation.

      This document exists to describe how to integrate with such systems, and how to extend this functionality to any language with a Tree-sitter grammar.

    • Tagging and captures
      Tagging is the act of identifying the entities that can be named in a program. We use Tree-sitter queries to find those entities. Having found them, you use a syntax capture to label the entity and its name.
      The essence of a given tag lies in two pieces of data: the role of the entity that is matched (i.e. whether it is a definition or a reference) and the kind of that entity, which describes how the entity is used (i.e. whether it’s a class definition, function call, variable reference, and so on). Our convention is to use a syntax capture following the @role.kind capture name format, and another inner capture, always called @name, that pulls out the name of a given identifier.

      You may optionally include a capture named @doc to bind a docstring.


This 'magic string'?

  • https://github.com/Rich-Harris/magic-string
    • magic-string

    • Suppose you have some source code. You want to make some light modifications to it - replacing a few characters here and there, wrapping it with a header and footer, etc - and ideally you'd like to generate a source map at the end of it. You've thought about using something like recast (which allows you to generate an AST from some JavaScript, manipulate it, and reprint it with a sourcemap without losing your comments and formatting), but it seems like overkill for your needs (or maybe the source code isn't JavaScript).

    • Your requirements are, frankly, rather niche. But they're requirements that I also have, and for which I made magic-string. It's a small, fast utility for manipulating strings and generating sourcemaps.

  • https://github.com/sxzz/magic-string-ast
    • Extend Babel AST for magic-string

@pionxzh
Owner

pionxzh commented Nov 30, 2023

Sorry, I didn't check the whole document.

This is the test result on Node.js (wasm on Node.js is slower than the native binding, but ¯\_(ツ)_/¯)
image

Perf of the void 0 → undefined transformation. Tree-sitter's performance is good on large samples.

  • 10 LOC: 1100 | 1000
  • 100 LOC: 135 | 467
  • 1000 LOC: 7 | 81
  • 5000 LOC: 1? | 16

Its API is not so convenient but acceptable. I feel it's powerful, but there is a learning curve. Might need to take some time to really read through the documentation.


This 'magic string'?

Yes. The first link you provided.

@0xdevalias

0xdevalias commented Dec 1, 2023

Sorry, I didn't check the whole document.

@pionxzh No worries :)

This is the test result on Node.js (wasm on Node.js is slower than the native binding, but ¯\_(ツ)_/¯)

@pionxzh Is that graph showing tree-sitter wasm on the left and native on the right? Or is that jscodeshift/babel on the left, and tree-sitter on the right?

Wasm is known to have some overhead that makes it slower than native bindings, so that makes sense.

Its API is not so convenient but acceptable. I feel it's powerful, but there is a learning curve. Might need to take some time to really read through the document.

@pionxzh nods that makes sense. I haven't worked with it properly before, so can't give any real world experiences on it; but from reading through the documentation, it sounds like it's pretty powerful.

Curious, are your benchmarking tools/etc added to this repo? Or just a local thing?

@pionxzh You might have missed this question earlier.

@pionxzh
Owner

pionxzh commented Dec 1, 2023

Oh, you can check the benchmark at /benches. Run with npx tsx ./benches/un-undefined.ts. I have a branch for ast-grep, but I forgot to push it haha. And there is another branch for tree-sitter, still on my PC. Will push them when I get home.

@0xdevalias

0xdevalias commented Dec 1, 2023

Oh, you can check the benchmark at /benches.

@pionxzh Oh true! I totally missed that somehow (committed 3cd85f6). Thanks :)

I have a branch for ast-grep, but I forgot to push it haha. And there is another branch for tree-sitter, still on my PC. Will push them when I get home.

@pionxzh Haha. Sounds good, thanks! :)


There was another parser project I stumbled across while looking at ast-grep's own benchmarks (Ref), though I haven't looked into it much yet. From a quick skim it sounds interesting as well, and it claims that its parser is 2x faster than swc:

@pionxzh
Owner

pionxzh commented Dec 1, 2023

Oh I know this project, will check em. I think now the requirements would be:

  1. Having a wasm binding
  2. Provides AST query functionality (being able to go to parent and child nodes would be a plus)
  3. Query results provide range information

There is a wasm binding, but it's not so ready-to-use. It's currently built for their playground. Not sure, will check more on these options.

@0xdevalias

0xdevalias commented Dec 1, 2023

I think now the requirement would be...

@pionxzh nods, these make sense and sound reasonable.

Query results provide range information

@pionxzh Though curious, what is this aspect needed for?

There is a wasm binding, but it's not so ready-to-use. It's currently built for their playground.

@pionxzh Yeah, sounds like it's not very maintained currently; or at the very least, it's not currently published:

The playground's package.json loads it as such:

There doesn't seem to be a npm/wasm-web in the main repo:

But the website/package.json has scripts to build it:

Some other historical context:

From a brief skim of the issues, some of these might be relevant/of interest also:

The issue on benchmarking linked to swc's benchmark page (which seems to have a really cool 'chart over time' to see performance based on commits, which seems to use this tool:

Boa's benchmarks also look similar, and seem to use a tool called criterion for it:

@0xdevalias

0xdevalias commented Dec 1, 2023

ast grep also uses tree sitter underlying, providing a robust matcher and replacer.

Since I got curious about the specifics of ast-grep's matcher/replacer, a few resources related to the higher level docs for it:

And digging into the code (as it might be useful to understand aspects of it if going down the ast-grep and/or tree-sitter path):

This might also be an interesting/useful read:

@0xdevalias

0xdevalias commented Dec 4, 2023

There's a similar 'performance' related issue over on the j4k0xb/webcrack repo:

While the full thread may be of interest, here are some particularly relevant snippets based on some of the alternative parsers/etc they have explored:


that the swc / tree-sitter / etc Rust/etc parsers can apparently be used from JS apps

I experimented a while ago with using rust parsers and converting back to the babel AST format: https://github.com/coderaiser/swc-to-babel

The parsing is faster but unfortunately transferring all the objects to the JS context has so much overhead that its slower than the babel parser. So it should most likely be entirely JS or a native language, not mixed.

Another WIP JS-only attempt with https://github.com/meriyah/meriyah + https://github.com/j4k0xb/estree-to-babel/tree/perf could speed up parsing from 1.5s to 900ms for a large file

Not sure if it's worth it since parsing takes up like 2% and transforms 98% of the time

Originally posted by @j4k0xb in j4k0xb/webcrack#24 (comment)


How did you go about transfering the objects back to JS out of curiosity?

It's serializing it with serde_json::to_string and deserializing with JSON.parse.

Idk how to benchmark how long the native parsing vs serializing takes, here's the combined time for a 1.2MB bundle (380k nodes):

  • swc parse: 959 ms (bindings.parseSync 491 ms + JSON.parse 466 ms)
  • babel parse: 442 ms
  • meriyah parse: 171 ms

or alternatively, maybe a way to keep the actual AST objects within the rust side of things, but be able to manipulate them from JS functions/similar?

Definitely a possibility, but again the whole project would have to be rewritten in rust

sometimes it might take an unreasonable amount of time and/or memory to try and unbundle them

Do you have examples where this happens? The current time/memory usage is not that unreasonable imo:

  • 1.2MB webpack bundle: 1.1s unpack, 6s total, 450mb memory
  • 7.2MB obfuscated browserify bundle: 0.3s unpack, 14s total, 1.1GB memory
  • 8MB webpack bundle: 4s unpack, 17s total, 1.2GB memory

Originally posted by @j4k0xb in j4k0xb/webcrack#24 (comment)

@0xdevalias

0xdevalias commented Dec 4, 2023

And here are some ideas I've had/been exploring around how to 'ease the overhead' of crossing back and forth between the JS/Rust boundary, and avoid needing to rewrite the entire project in Rust to benefit from Rust-based parsers:

Definitely a possibility, but again the whole project would have to be rewritten in rust

Yeah.. that would be less than ideal.

I don't know how possible it is, or if there are existing libraries/etc that would make it easier, but my original thought in this area was if it was possible to basically 'describe the transformation to be made' on the JS side of things, and then pass that through to the rust side for it to do the 'heavy lifting'.

If that were possible, then it should remove the need to convert the whole project to rust, and also would remove the performance penalty of needing to pass the whole AST back and forth across the rust/JS boundary.

The 2 ways I was initially pondering that this might be possible were:

  • a rust library that exposes an 'AST transformation DSL', such that the JS side could use that DSL to describe the transformations, then pass those to the rust side of things to actually process/run
  • a rust library that can run/interpret some basic JS code within rust itself, and use that to apply the transformations (that are still written in JS)
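The first of those two ideas (an 'AST transformation DSL') could be sketched as plain data; the shape and field names here are invented for illustration:

```javascript
// Hypothetical sketch: describe a transformation as serializable data on the
// JS side, so only this small description (not the whole AST) crosses the
// JS/Rust boundary.
const transform = {
  match: {
    type: 'UnaryExpression',
    operator: 'void',
    argument: { type: 'NumericLiteral', value: 0 },
  },
  replaceWith: { type: 'Identifier', name: 'undefined' },
};

// The JS side would serialize the description once...
const payload = JSON.stringify(transform);
// ...and a Rust engine (e.g. behind a napi call) would walk the AST
// and apply every match, never shipping the tree back to JS.
```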

There was also one more idea I was pondering, though not sure if it would be impractical due to needing to cross back and forth across the rust/JS boundary (this idea is less thought through than the others above; and realistically could probably work with/alongside them, particularly the DSL idea):

  • somehow do the main 'AST walking'/modification on the rust side of things, but be able to specify when it should pass a smaller subset of the AST back to the JS side for more specific processing
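As a rough plain-JS sketch of that last idea, an ordinary recursive walker stands in for the Rust side here, and the callback marks the only point where the rust/JS boundary would actually be crossed (all names hypothetical):

```javascript
// Hypothetical sketch: the heavy tree walk stays "native" (a plain recursive
// walker stands in for Rust here), and only the nodes the predicate selects
// cross over to the JS callback for more specific processing.
function walk(node, shouldVisit, onMatch) {
  if (shouldVisit(node)) onMatch(node); // the only boundary crossing
  for (const child of node.children ?? []) {
    walk(child, shouldVisit, onMatch);
  }
}

const tree = {
  type: 'Program',
  children: [
    { type: 'FunctionDeclaration', name: 'a', children: [] },
    { type: 'VariableDeclaration', children: [] },
    { type: 'FunctionDeclaration', name: 'b', children: [] },
  ],
};

const found = [];
walk(tree, (n) => n.type === 'FunctionDeclaration', (n) => found.push(n.name));
console.log(found); // prints: [ 'a', 'b' ]
```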

Some related ideas/explorations based on the above potentials were:

Originally posted by @0xdevalias in j4k0xb/webcrack#24 (comment)

@pionxzh
Owner

pionxzh commented Dec 4, 2023

Thanks for the insight. https://github.com/meriyah/meriyah looks interesting. I will spend some time looking into it.

I wonder if there are any methods that could be used to optimize that in a way that would speed things up; or alternatively, maybe a way to keep the actual AST objects within the rust side of things, but be able to manipulate them from JS functions/similar?

Actually that's how ast-grep works. Here is a quote from Benchmark TypeScript Parsers: Demystify Rust Tooling Performance:

Tree-sitter and ast-grep avoid serde overhead by returning a tree object rather than a full AST structure. Accessing tree nodes requires invoking Rust methods from JavaScript, which distributes the cost over the reading process.
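A conceptual plain-JS sketch of that handle-based pattern: the parse result is an opaque handle, and node data is computed only when a method is called, rather than serializing the entire AST up front. Everything here is illustrative, not the real binding API; see @ast-grep/napi for the actual interface.

```javascript
// Conceptual sketch of handle-based AST access: the tree object is opaque,
// and the parse/serde cost is amortized over reads instead of paid up front.
class LazyTree {
  constructor(source) {
    this.source = source; // in the real binding this stays on the Rust side
    this.cached = null;
  }
  root() {
    // Node data is materialized only on first access.
    if (this.cached === null) {
      this.cached = { statements: this.source.split(';').filter(Boolean) };
    }
    return this.cached;
  }
}

const tree = new LazyTree('const a = 1;const b = 2;');
console.log(tree.root().statements.length); // prints: 2
```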

btw, we are kind of off the topic in this long thread lol. Let me update the title.

@pionxzh pionxzh changed the title Split code and save progress? Performance improvement and potential tools to adopt Dec 4, 2023
@pionxzh pionxzh added discussion and removed enhancement New feature or request labels Dec 4, 2023
@0xdevalias

0xdevalias commented Dec 4, 2023

Thanks for the insight

@pionxzh No worries :)


I will spend some time looking into it.

@pionxzh Sounds good. Would be interested to hear what you think of it once you do.


Actually that's how ast-grep works.

@pionxzh Oh true. I was thinking it probably did, particularly after skimming the source the other day (Ref), but I still wasn't 100% sure.


Here is a quote from Benchmark TypeScript Parsers: Demystify Rust Tooling Performance:

tree-sitter and ast-grep avoid serde overhead by returning a tree object rather than a full AST structure. Accessing tree nodes requires invoking Rust methods from JavaScript, which distributes the cost over the reading process.

@pionxzh That's super interesting and neat to know. For future reference, here's a link to the non-medium-paywalled version of that article (Ref). It was a really interesting read in general!

The results of the benchmarks for synchronous performance were pretty interesting. I definitely didn't expect swc to perform so poorly in general, nor babel to unexpectedly perform so much better on the medium-sized file than most other parsers (I wonder what made it perform so well there?). It was also interesting to see that ast-grep consistently beat tree-sitter on its own, despite using tree-sitter as its parser.

[chart: synchronous parsing benchmark results across parsers]

It was also really interesting in the async parsing to see just how much ast-grep seems to dominate everything; again seeming to perform way better than the tree-sitter parser it's built on top of:

[chart: async parsing benchmark results across parsers]

These notes towards the end were also interesting/worth paying attention to:

Native Parser Performance Tricks

tree-sitter & ast-grep's Edge

These parsers manage to bypass serde costs post-parsing by returning a Rust object wrapper to Node.js. This strategy, while efficient, can lead to slower AST access in JavaScript as the cost is amortized over the reading phase.

ast-grep's async advantage:

ast-grep's performance in concurrent parsing scenarios is largely due to its utilization of multiple libuv threads. By default, the libuv thread pool size is set to four, but there's potential to enhance performance further by expanding the thread pool size, thus fully leveraging the available CPU cores.

Cool to see this note at the very end aligns with one of the thoughts I had too!

Shifting Workloads to Rust: The creation of a domain-specific language (DSL) tailored for AST node querying could shift a greater portion of computational work to the Rust side, enhancing overall efficiency.


we are kind of off the topic in this long thread lol. Let me update the title.

@pionxzh Haha yeah, I was thinking that recently as well, it certainly seems to have evolved into more of a general 'performance optimisation' exploration.


Was skimming through the ast-grep repo/issues, and the following looked like potentially interesting ones to watch for future improvements (both features, and performance). The tl;dr is that they seem to have a lot of little performance improvement ideas in mind already:

@pionxzh
Owner

pionxzh commented Jan 21, 2024

@StringKe

  • Data is now stored in IndexedDB, no more size limitation. Playground refactor #106
  • A file size checker and the --max-old-space-size hint have been implemented.
