
Performance of async printers (was: Adopt swc parser?) #13158

Closed
fisker opened this issue Jul 23, 2022 · 46 comments
Labels
status:needs discussion - Issues needing discussion and a decision to be made before action can be taken
type:meta - Issues about the status of Prettier, or anything else about Prettier itself
type:perf - Issue with performance of Prettier

Comments

@fisker
Member

fisker commented Jul 23, 2022

I don't know how much faster it can be.

The parser is not really the problem slowing us down, but it should be faster.

I hope we can rewrite our printer in Rust? Maybe?

@fisker fisker added the status:needs discussion and type:meta labels Jul 23, 2022
@cspotcode

cspotcode commented Jul 23, 2022

A collaboration with dprint might make sense for this, since it already uses swc's parser and is written in Rust. It could be a Prettier mode within dprint, or something like that.

@sosukesuzuki
Member

FYI, I previously tried to use the swc (Wasm) parser for Prettier, but it did not improve perf.

@cspotcode

How did that look? Was swc doing the parsing, and then a JS AST was somehow being generated from that? I assume it's pretty slow to convert the Rust structs to JS objects.

@fisker
Member Author

fisker commented Jul 23, 2022

Prettier spends much more time in the print step than in the parse step.

@fisker
Member Author

fisker commented Jul 23, 2022

FYI, I previously tried to use the swc (Wasm) parser for Prettier, but it did not improve perf.

Do you have any numbers to share?

@sosukesuzuki
Member

Sorry, it was on my laptop... and I've since switched laptops, so I lost the branch.

@cspotcode

Do you have more information about #13158 (comment)?

Was the Rust AST converted to JS objects?

@sosukesuzuki
Member

sosukesuzuki commented Jul 23, 2022

Maybe there are several sources of overhead in the SWC ESTree parser:

  1. Converting the SWC AST to ESTree
  2. Bringing Wasm-land objects to JS land

I think the SWC Wasm parser cannot improve perf for us for the above reasons.

@cspotcode

Right, so for this idea to be viable, Prettier's printer needs to be ported to Rust.

@cspotcode

I'm no expert here, jumping in cold, but I see that a lot of printer logic does stuff like this: print('id'). I assume that's quite a challenge for the VM to optimize, as opposed to something like printer.printIdent(node.id), which can be monomorphic and inlined? I'm guessing there are some inefficiencies baked into this printer design that affect performance.
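
For illustration, a minimal sketch of the two call styles being contrasted; the function and class names are hypothetical, not Prettier's actual printer API:

// The concern above: with string-keyed printing, every node type funnels
// through the same print(selector) call site, which is harder for the JIT to specialize.
function printVariableDeclarator(path, print) {
  return ["const ", print("id"), " = ", print("init"), ";"];
}

// Direct method calls: each call site targets one known function, so it can
// stay monomorphic and be inlined.
class DirectPrinter {
  printIdentifier(node) {
    return node.name;
  }
  printLiteral(node) {
    return String(node.value);
  }
  printVariableDeclarator(node) {
    return ["const ", this.printIdentifier(node.id), " = ", this.printLiteral(node.init), ";"];
  }
}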

@cspotcode

Another performance-related question: was performance measured before and after for #13104?
I'm worried all those awaits slowed things down.

@fisker
Member Author

fisker commented Jul 23, 2022

Yes, I did; there were no noticeable differences.

@cspotcode

cspotcode commented Jul 23, 2022

My benchmarking shows a 10-20% drop in performance due to #13104. Is there a difference between our testing methodologies?

❯ hyperfine 'node ./bench.mjs before 100 10 parallel' 'node ./bench.mjs after 100 10 parallel' 'node ./bench.mjs before 100 10 serial' 'node ./bench.mjs after 100 10 serial' --show-output
Benchmark #1: node ./bench.mjs before 100 10 parallel
  Time (mean ± σ):     25.564 s ±  0.153 s    [User: 34.482 s, System: 0.650 s]
  Range (min … max):   25.387 s … 25.891 s    10 runs
 
Benchmark #2: node ./bench.mjs after 100 10 parallel
  Time (mean ± σ):     29.335 s ±  0.125 s    [User: 41.025 s, System: 0.936 s]
  Range (min … max):   29.153 s … 29.550 s    10 runs
 
Benchmark #3: node ./bench.mjs before 100 10 serial
  Time (mean ± σ):     25.446 s ±  0.179 s    [User: 34.401 s, System: 0.542 s]
  Range (min … max):   25.169 s … 25.828 s    10 runs
 
Benchmark #4: node ./bench.mjs after 100 10 serial
  Time (mean ± σ):     32.375 s ±  0.211 s    [User: 42.611 s, System: 0.776 s]
  Range (min … max):   32.013 s … 32.588 s    10 runs
 
Summary
  'node ./bench.mjs before 100 10 serial' ran
    1.00 ± 0.01 times faster than 'node ./bench.mjs before 100 10 parallel'
    1.15 ± 0.01 times faster than 'node ./bench.mjs after 100 10 parallel'
    1.27 ± 0.01 times faster than 'node ./bench.mjs after 100 10 serial'

benchmark.sh
#!/usr/bin/env bash

# Set to 1 on the first run to create and build the "before"/"after" worktrees.
doSetup=0

if [ "$doSetup" = 1 ]
then
    git worktree add before
    git worktree add after
    (
        cd before
        git checkout 8dae4abf41e1ccec3515e8acea159505bbd55f33~1
        yarn
        yarn build
    )
    (
        cd after
        git checkout 8dae4abf41e1ccec3515e8acea159505bbd55f33
        yarn
        yarn build
    )
fi

#https://github.com/sharkdp/hyperfine
hyperfine 'node ./bench.mjs before 100 10 parallel' 'node ./bench.mjs after 100 10 parallel' 'node ./bench.mjs before 100 10 serial' 'node ./bench.mjs after 100 10 serial'
bench.mjs
import {readFileSync} from 'fs';

// Usage: node ./bench.mjs <before|after> <groupCount> <groupSize> <serial|parallel>
const [, , version, groupCountString, groupSizeString, method] = process.argv;
const groupCount = +groupCountString;
const groupSize = +groupSizeString;
const prettierApi = await import(`./${version}/index.js`);

const printResults = false;
const sourceText = readFileSync('./before/src/main/comments.js', 'utf8');

for(let i = 0; i < groupCount; i++) {
    if(method === 'serial') {
        for(let i = 0; i < groupSize; i++) {
            const result = await prettierApi.format(sourceText, {
                parser: 'typescript'
            });
            if(printResults) console.log(result);
        }
    }
    if(method === 'parallel') {
        const promises = [];
        for(let i = 0; i < groupSize; i++) {
            promises.push(prettierApi.format(sourceText, {
                parser: 'typescript'
            }));
        }
        await Promise.allSettled(promises);
    }
}

@liuxingbaoyu
Contributor

I think it's hard to improve speed by migrating just part of the logic to Rust or Wasm,
because even when JS interacts with V8's native (C++) side, object serialization/deserialization is slow enough to hurt.

@fisker
Member Author

fisker commented Jul 23, 2022

https://github.com/prettier/prettier/blob/next/CONTRIBUTING.md#performance

I ran this script on both the next and main branches, on .js and .md files. I chose .md because we happen to have a big CHANGELOG.md in the project root, and the markdown parser is also async now.

@thorn0
Member

thorn0 commented Jul 23, 2022

@cspotcode To test the built version, you need to change this:

-const prettierApi = await import(`./${version}/index.js`);
+const prettierApi = await import(`./${version}/dist/index.js`);

@cspotcode

Even after updating my benchmark to use the built version, I get similar results. But I'll try the scripts in CONTRIBUTING.md to see if they yield different results.


❯ hyperfine 'node ./bench.mjs before 100 10 parallel' 'node ./bench.mjs after 100 10 parallel' 'node ./bench.mjs before 100 10 serial' 'node ./bench.mjs after 100 10 serial'
Benchmark #1: node ./bench.mjs before 100 10 parallel
  Time (mean ± σ):     24.749 s ±  0.764 s    [User: 35.986 s, System: 0.871 s]
  Range (min … max):   23.937 s … 25.806 s    10 runs

Benchmark #2: node ./bench.mjs after 100 10 parallel
  Time (mean ± σ):     28.362 s ±  0.176 s    [User: 40.760 s, System: 0.982 s]
  Range (min … max):   28.041 s … 28.607 s    10 runs

Benchmark #3: node ./bench.mjs before 100 10 serial
  Time (mean ± σ):     23.774 s ±  0.189 s    [User: 32.920 s, System: 0.410 s]
  Range (min … max):   23.464 s … 23.995 s    10 runs

Benchmark #4: node ./bench.mjs after 100 10 serial
  Time (mean ± σ):     30.384 s ±  0.172 s    [User: 41.535 s, System: 0.724 s]
  Range (min … max):   30.112 s … 30.632 s    10 runs

Summary
  'node ./bench.mjs before 100 10 serial' ran
    1.04 ± 0.03 times faster than 'node ./bench.mjs before 100 10 parallel'
    1.19 ± 0.01 times faster than 'node ./bench.mjs after 100 10 parallel'
    1.28 ± 0.01 times faster than 'node ./bench.mjs after 100 10 serial'

@cspotcode

I still get a big difference between 6c722c7990f4e8b03b1ed6c01fbbe749eaed0224 and 8dae4abf41e1ccec3515e8acea159505bbd55f33

52.3 ops/sec vs 37.5 ops/sec

"After" runs at 71% the speed of "before."

❯ for dir in before after ; do ( cd $dir ; NODE_ENV=production node ./dist/bin-prettier.js --debug-benchmark ../before/src/main/comments.js ) ; done
...
src/main/comments.js
[debug] '--debug-benchmark' option found, measuring formatWithCursor with 'benchmark' module.
[debug] '--debug-benchmark' measurements for formatWithCursor: {
[debug]   "benchmark": "format x 52.30 ops/sec ±15.10% (70 runs sampled)",
[debug]   "hz": 52.30170071681723,
[debug]   "ms": 76.47934857142856
[debug] }
'--debug-benchmark' option found, skipped print code or write files.
...
[debug] '--debug-benchmark' measurements for formatWithCursor: {
[debug]   "benchmark": "format x 37.51 ops/sec ±13.36% (67 runs sampled)",
[debug]   "hz": 37.51091525644132,
[debug]   "ms": 79.97672089552238
[debug] }
'--debug-benchmark' option found, skipped print code or write files.

@fisker
Member Author

fisker commented Jul 24, 2022

Thanks for testing; maybe I did something wrong, I'll verify again later. But asynchronous parser support has been requested, and I can't figure out a way to do async parse with synchronous print, so maybe we'll have to accept that.

@cspotcode

Usually, if function A calls function B, and if B is async, then A must be async, too. But the opposite is not true: if function A is async, function B can remain sync.

By that logic, is the problem that the printer must call the parser? Shouldn't parse happen first, and then print?
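
A tiny example of that direction of propagation (every name here is hypothetical, for illustration only):

// Hypothetical async parser used only for illustration.
async function parseAsync(source) {
  return { type: "Program", body: [], source };
}

// Sync printer: can be called from anywhere, including async code.
function printAst(ast) {
  return JSON.stringify(ast);
}

// Only the function that awaits the parser needs to become async;
// the printer it calls can stay synchronous.
async function format(source) {
  const ast = await parseAsync(source);
  return printAst(ast);
}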

@fisker
Member Author

fisker commented Jul 24, 2022

You are right. The problem is that we support languages embedded in other languages, e.g. CSS/JS in HTML, so we have to call parse during print. I also thought about parsing them before print, so we wouldn't need an async printer, but the problem is we don't even know how to traverse the AST; currently each AST node prints its children itself. I thought about this for a long, long time before I made the move. I can't see a better solution.

@cspotcode

I see, thanks for the explanation. As I understand it, embed does two things:

  1. It decides whether or not it should embed. For the majority of nodes, the answer is "no."
  2. It does the embedding: parse and print, return a Doc.

I understand that step 2 must be async. But do any plugins require step 1 to be async?

If step 1 can remain sync, then maybe step 2 can be deferred. The Doc can get a placeholder <embed will go here> and embed can run async. But the rest of the AST does not need to wait; it can continue to print() synchronously.

Once the sync print() is complete, all pending embed operations are awaited and inserted into their placeholders.

The majority of cases are not going to embed. It seems an awful shame to reduce performance by 20% or 30%, rather than tweaking the embedding API to preserve that performance.
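
A rough sketch of that placeholder idea, assuming the "should we embed?" check can stay synchronous; printSync, shouldEmbed, and embedAsync are hypothetical names, not Prettier's real plugin API:

async function formatWithDeferredEmbeds(ast, printer) {
  const pending = [];

  // Pass 1: fully synchronous print. Nodes that should be embedded only get a
  // mutable placeholder doc; the async embed work is started but not awaited.
  const doc = printer.printSync(ast, (node) => {
    if (!printer.shouldEmbed(node)) {
      return null; // step 1: cheap sync decision, the common case
    }
    const placeholder = { type: "embed-placeholder", parts: [] };
    pending.push(
      printer.embedAsync(node).then((embeddedDoc) => {
        placeholder.parts.push(embeddedDoc); // step 2: fill the placeholder in later
      })
    );
    return placeholder;
  });

  // Pass 2: wait for every embed, after which the doc tree is complete.
  await Promise.all(pending);
  return doc;
}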

@fisker
Member Author

fisker commented Jul 24, 2022

But the rest of the AST does not need to wait;

Unfortunately it does; sometimes we need to make decisions based on the printed docs. Also, embed is allowed to fail, in which case we need to call print as a fallback, but maybe this can be solved; we can wait another round.

@fisker
Member Author

fisker commented Jul 24, 2022

It seems an awful shame to reduce performance by 20% or 30%, rather than tweaking the embedding API to preserve that performance.

I don't like that either; I'll be glad if someone can come up with a solution.

@thorn0
Member

thorn0 commented Jul 24, 2022

Maybe we can do something similar to what React does with throwing promises. Something like this in ast-to-doc.js (pseudocode):

let doc;
while (!doc) {
  try {
    doc = mainPrint();
  } catch (caught) {
    if (caught instanceof AsyncWorkBailout) {
      // Await the embed's async work, cache the result, and restart printing.
      cache[caught.node] = await caught.promise;
      path.resetToRoot();
    } else {
      throw caught;
    }
  }
}

Docs for individual nodes are cached (not all of them, but that can be improved), so restarting the printer might be cheap. Still looks like a lot of overhead.
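
For context, the embed side of that pseudocode might look roughly like this; the AsyncWorkBailout shape, the Map-style cache, and the parseAndPrintEmbedded callback are all hypothetical:

class AsyncWorkBailout extends Error {
  constructor(node, promise) {
    super("async embed work pending");
    this.node = node;
    this.promise = promise;
  }
}

function printEmbeddedNode(node, cache, parseAndPrintEmbedded) {
  if (cache.has(node)) {
    return cache.get(node); // restart pass: the awaited doc is already cached
  }
  // First pass: start the async work and unwind the synchronous printer.
  throw new AsyncWorkBailout(node, parseAndPrintEmbedded(node));
}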

@fisker
Member Author

fisker commented Jul 24, 2022

Interesting.

Still looks like a lot of overhead.

Agree.

@thorn0
Member

thorn0 commented Jul 24, 2022

I also thought about parsing them before print, so we wouldn't need an async printer, but the problem is we don't even know how to traverse the AST; currently each AST node prints its children itself.

This seems to be a better option. Traversal can be implemented in a generic way, like it's done when attaching comments.
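
A sketch of what such a generic pre-print traversal might look like; getChildNodes and the embed callback are placeholder names, not the actual visitor-keys machinery:

// Visit every node once before printing and resolve the async embed work up
// front, so the printer itself can remain synchronous.
async function collectEmbeddedDocs(ast, getChildNodes, embed) {
  const embeddedDocs = new Map();
  const stack = [ast];

  while (stack.length > 0) {
    const node = stack.pop();
    const result = embed(node); // may return a doc, a promise, or undefined
    if (result !== undefined) {
      embeddedDocs.set(node, await result); // awaiting a non-promise is harmless
    }
    stack.push(...getChildNodes(node));
  }

  return embeddedDocs; // the sync printer looks embedded docs up here
}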

@thorn0
Member

thorn0 commented Aug 1, 2022

I also thought about parsing them before print, so we wouldn't need an async printer, but the problem is we don't even know how to traverse the AST; currently each AST node prints its children itself.

I tried this approach (see thorn0@89f1ad2). The results are even worse:

  'node ./bench.mjs before 100 10 serial' ran
    1.05 ± 0.05 times faster than 'node ./bench.mjs before 100 10 parallel'
    1.27 ± 0.04 times faster than 'node ./bench.mjs after 100 10 parallel'
    1.34 ± 0.03 times faster than 'node ./bench.mjs after 100 10 serial'
    1.58 ± 0.03 times faster than 'node ./bench.mjs alt-async 100 10 parallel'
    1.65 ± 0.04 times faster than 'node ./bench.mjs alt-async 100 10 serial'

I might have missed some low-hanging-fruit optimizations, but this async AST traversal seems to be really expensive.

@fisker
Member Author

fisker commented Aug 1, 2022

Maybe we can still use a sync traversal, add promises for the embeds, and then await them after the traversal?

@thorn0
Member

thorn0 commented Aug 1, 2022

IDK how to implement that properly. Consider this input:

function HelloWorld() {
        return html`
          <h3>Bar List</h3>
          ${bars.map(
            (bar) =>   html`
             <p >${bar}</p>
           `
          )}
        `;
      }

When embed for the outer html template literal is called, the promise for the inner one must already be resolved.
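
One way to satisfy that ordering is to resolve embeds bottom-up (post-order), so the inner literal's doc exists before the outer embed runs; getChildNodes and embed are again hypothetical stand-ins:

// Depth-first, children before parent: by the time embed runs for the outer
// html`...` literal, the docs for the nested literals are already in `resolved`.
async function resolveEmbedsPostOrder(node, getChildNodes, embed, resolved = new Map()) {
  for (const child of getChildNodes(node)) {
    await resolveEmbedsPostOrder(child, getChildNodes, embed, resolved);
  }
  const doc = await embed(node, resolved); // inner docs are available via `resolved`
  if (doc !== undefined) {
    resolved.set(node, doc);
  }
  return resolved;
}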

@fisker
Member Author

fisker commented Aug 1, 2022

I guess we'll have to pass a different, async print to embed, and use a separate internal embeds mechanism? I tried a little bit, but it seems hard...

@thorn0
Member

thorn0 commented Aug 1, 2022

Seems to work; I didn't benchmark it yet: thorn0@6749a2f

@thorn0
Member

thorn0 commented Aug 1, 2022

Not bad:

  'node ./bench.mjs before 100 10 serial' ran
    1.06 ± 0.05 times faster than 'node ./bench.mjs before 100 10 parallel'
    1.10 ± 0.03 times faster than 'node ./bench.mjs alt-async 100 10 serial'
    1.15 ± 0.03 times faster than 'node ./bench.mjs alt-async 100 10 parallel'
    1.24 ± 0.04 times faster than 'node ./bench.mjs after 100 10 parallel'
    1.31 ± 0.04 times faster than 'node ./bench.mjs after 100 10 serial'

It can be optimized by splitting printer.embed into mightBeEmbed (a sync check) and embed (possibly async, called only if mightBeEmbed returned true). This way we'll need to clone far fewer path stack arrays.
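
Roughly what that split could look like for a tagged html template, purely as an example; the hook shapes and the embeddedFormat helper are assumptions, not the final API:

// Cheap, synchronous check that runs for every node. No promises, and the
// expensive path-stack clone is skipped for the vast majority of nodes.
function mightBeEmbed(node) {
  return (
    node.type === "TaggedTemplateExpression" &&
    node.tag.type === "Identifier" &&
    node.tag.name === "html"
  );
}

// Possibly async work, only reached for nodes that passed the sync check.
async function embed(node, embeddedFormat) {
  const text = node.quasi.quasis.map((quasi) => quasi.value.raw).join("");
  return embeddedFormat(text, { parser: "html" }); // hypothetical async formatter
}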

@fisker
Member Author

fisker commented Aug 1, 2022

That's great!

@thorn0
Member

thorn0 commented Aug 1, 2022

Results for the mentioned optimization (thorn0@41e905d):

  'node ./bench.mjs before 100 10 serial' ran
    1.04 ± 0.06 times faster than 'node ./bench.mjs before 100 10 parallel'
    1.07 ± 0.04 times faster than 'node ./bench.mjs alt-async 100 10 serial'
    1.16 ± 0.05 times faster than 'node ./bench.mjs alt-async 100 10 parallel'
    1.25 ± 0.05 times faster than 'node ./bench.mjs after 100 10 parallel'
    1.32 ± 0.04 times faster than 'node ./bench.mjs after 100 10 serial'

I wonder what's going on with the "parallel" results.

@fisker
Member Author

fisker commented Aug 1, 2022

Seems like not much difference from #13158 (comment).

@thorn0
Member

thorn0 commented Aug 2, 2022

Reused printer.massageAstNode.ignoredProperties, so that it doesn't recurse into properties like range (thorn0@1435322):

  'node ./bench.mjs before 100 10 serial' ran
    1.04 ± 0.03 times faster than 'node ./bench.mjs alt-async 100 10 serial'
    1.05 ± 0.05 times faster than 'node ./bench.mjs before 100 10 parallel'
    1.14 ± 0.05 times faster than 'node ./bench.mjs alt-async 100 10 parallel'
    1.25 ± 0.04 times faster than 'node ./bench.mjs after 100 10 parallel'
    1.32 ± 0.03 times faster than 'node ./bench.mjs after 100 10 serial'

Seems like not much difference from

embed is sync for the JS parser most of the time. If it were defined as an async function, the difference would be bigger.
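
That difference comes from promise overhead: a function declared async always returns a promise, while a plain embed can return a doc directly. A sketch (illustrative only, not the actual implementation) of how a caller can pay for the microtask only in the genuinely async case:

// Only the rare case where embed actually returns a promise touches the
// microtask queue; the common synchronous case stays on the fast path.
function withEmbedResult(result, onDoc) {
  if (result && typeof result.then === "function") {
    return result.then(onDoc); // async path, e.g. CSS embedded in HTML
  }
  return onDoc(result); // sync path: no await, no extra tick
}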

@thorn0
Member

thorn0 commented Aug 2, 2022

Okay, I think it's good enough. thorn0@77ca54c

  'node ./bench.mjs before 100 10 serial' ran
    1.03 ± 0.03 times faster than 'node ./bench.mjs alt-async 100 10 serial'
    1.07 ± 0.05 times faster than 'node ./bench.mjs before 100 10 parallel'
    1.11 ± 0.03 times faster than 'node ./bench.mjs alt-async 100 10 parallel'
    1.23 ± 0.03 times faster than 'node ./bench.mjs after 100 10 parallel'
    1.32 ± 0.03 times faster than 'node ./bench.mjs after 100 10 serial'

@thorn0
Member

thorn0 commented Aug 2, 2022

@fisker @sosukesuzuki If you're okay with this solution (diff), how do we proceed? IMHO it doesn't make sense to open a PR against next, as the revert is going to be really messy. I'd say it'd be better to start a new branch at 6c722c7 and open a PR against it. Once it's merged, we'll have to cherry-pick commits from next and relabel the new branch as next. WDYT?

@fisker
Member Author

fisker commented Aug 2, 2022

I'm okay with your solution. There are already 67 commits after #13104; I prefer to revert #13104.

@fisker
Member Author

fisker commented Aug 2, 2022

I'll try to revert it.

@thorn0
Member

thorn0 commented Aug 2, 2022

Great. I tried and gave up.

@fisker
Member Author

fisker commented Aug 2, 2022

#13210

@cspotcode

Meta thought: does it make sense to rename this issue? If we do that, it will be more intuitive when it appears in notifications.

@thorn0 thorn0 changed the title from "Adopt swc parser?" to "Performance of async printers (was: Adopt swc parser?)" Aug 4, 2022
@thorn0 thorn0 added the type:perf label Aug 4, 2022
@sosukesuzuki
Member

@thorn0 @fisker Can we close this now that #13211 has been merged into next?

@thorn0
Member

thorn0 commented Aug 6, 2022

@sosukesuzuki Yes, let's close it. If someone wants to discuss Rust, please open a new issue.

@thorn0 thorn0 closed this as completed Aug 6, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 7, 2023