
Performance of async printers (was: Adopt swc parser?) #13158

Closed
fisker opened this issue Jul 23, 2022 · 46 comments
Labels
status:needs discussion - Issues needing discussion and a decision to be made before action can be taken
type:meta - Issues about the status of Prettier, or anything else about Prettier itself
type:perf - Issue with performance of Prettier

Comments

@fisker
Member

fisker commented Jul 23, 2022

I don't know how much faster it can be.

The parser is not really the problem slowing us down, but it should be faster.

I hope we can rewrite our printer in Rust? Maybe?

@fisker fisker added the status:needs discussion and type:meta labels Jul 23, 2022
@cspotcode

cspotcode commented Jul 23, 2022

A collaboration with dprint might make sense for this, since it already uses swc's parser and is written in Rust. It could be a Prettier mode within dprint, or something like that.

@sosukesuzuki
Member

FYI, I previously tried to use the swc (Wasm) parser for Prettier, but it did not improve perf.

@cspotcode

How did that look? Was swc doing the parsing, and then a JS AST was somehow being generated from that? I assume it's pretty slow to convert the Rust structs to JS objects.

@fisker
Member Author

fisker commented Jul 23, 2022

Prettier spends much more time in the print step than in the parse step.

@fisker
Member Author

fisker commented Jul 23, 2022

FYI, I previously tried to use the swc (Wasm) parser for Prettier, but it did not improve perf.

Do you have any numbers to share?

@sosukesuzuki
Member

Sorry, it was on my laptop... and I've since switched laptops, so I lost the branch.

@cspotcode

Do you have more information about #13158 (comment)?

Was the Rust AST converted to JS objects?

@sosukesuzuki
Member

sosukesuzuki commented Jul 23, 2022

Maybe there are several sources of overhead in the SWC ESTree parser:

  1. Converting the SWC AST to ESTree
  2. Bringing Wasm-land objects to JS land

I think the SWC Wasm parser cannot improve perf for us for the above reasons.

@cspotcode

Right, so for this idea to be viable, Prettier's printer needs to be ported to Rust.

@cspotcode

I'm no expert here, jumping in cold, but I see that a lot of printer logic does stuff like this: print('id'). I assume that's quite a challenge for the VM to optimize, as opposed to something like printer.printIdent(node.id), which can be monomorphic and inlined? I'm guessing there are some inefficiencies baked into this printer design that affect performance.
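
For illustration, a minimal sketch of the two call styles being contrasted; the function and class names are hypothetical, not Prettier's actual printer API:

// The concern above: with string-keyed printing, every node type funnels
// through the same print(selector) call site, which is harder for the JIT to specialize.
function printVariableDeclarator(path, print) {
  return ["const ", print("id"), " = ", print("init"), ";"];
}

// Direct method calls: each call site targets one known function, so it can
// stay monomorphic and be inlined.
class DirectPrinter {
  printIdentifier(node) {
    return node.name;
  }
  printLiteral(node) {
    return String(node.value);
  }
  printVariableDeclarator(node) {
    return ["const ", this.printIdentifier(node.id), " = ", this.printLiteral(node.init), ";"];
  }
}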

@cspotcode

Another performance-related question: was performance measured before and after for #13104?
I'm worried all those awaits slowed things down.

@fisker
Member Author

fisker commented Jul 23, 2022

Yes, I did; there were no noticeable differences.

@cspotcode

cspotcode commented Jul 23, 2022

My benchmarking shows a 10-20% drop in performance due to #13104. Is there a difference between our testing methodologies?

❯ hyperfine 'node ./bench.mjs before 100 10 parallel' 'node ./bench.mjs after 100 10 parallel' 'node ./bench.mjs before 100 10 serial' 'node ./bench.mjs after 100 10 serial' --show-output
Benchmark #1: node ./bench.mjs before 100 10 parallel
  Time (mean ± σ):     25.564 s ±  0.153 s    [User: 34.482 s, System: 0.650 s]
  Range (min … max):   25.387 s … 25.891 s    10 runs
 
Benchmark #2: node ./bench.mjs after 100 10 parallel
  Time (mean ± σ):     29.335 s ±  0.125 s    [User: 41.025 s, System: 0.936 s]
  Range (min … max):   29.153 s … 29.550 s    10 runs
 
Benchmark #3: node ./bench.mjs before 100 10 serial
  Time (mean ± σ):     25.446 s ±  0.179 s    [User: 34.401 s, System: 0.542 s]
  Range (min … max):   25.169 s … 25.828 s    10 runs
 
Benchmark #4: node ./bench.mjs after 100 10 serial
  Time (mean ± σ):     32.375 s ±  0.211 s    [User: 42.611 s, System: 0.776 s]
  Range (min … max):   32.013 s … 32.588 s    10 runs
 
Summary
  'node ./bench.mjs before 100 10 serial' ran
    1.00 ± 0.01 times faster than 'node ./bench.mjs before 100 10 parallel'
    1.15 ± 0.01 times faster than 'node ./bench.mjs after 100 10 parallel'
    1.27 ± 0.01 times faster than 'node ./bench.mjs after 100 10 serial'

benchmark.sh
#!/usr/bin/env bash

# Set to 1 on the first run to create and build the "before"/"after" worktrees.
doSetup=0

if [ "$doSetup" = 1 ]
then
    git worktree add before
    git worktree add after
    (
        cd before
        git checkout 8dae4abf41e1ccec3515e8acea159505bbd55f33~1
        yarn
        yarn build
    )
    (
        cd after
        git checkout 8dae4abf41e1ccec3515e8acea159505bbd55f33
        yarn
        yarn build
    )
fi

#https://github.com/sharkdp/hyperfine
hyperfine 'node ./bench.mjs before 100 10 parallel' 'node ./bench.mjs after 100 10 parallel' 'node ./bench.mjs before 100 10 serial' 'node ./bench.mjs after 100 10 serial'
bench.mjs
import {readFileSync} from 'fs';

// Usage: node ./bench.mjs <before|after> <groupCount> <groupSize> <serial|parallel>
const [, , version, groupCountString, groupSizeString, method] = process.argv;
const groupCount = +groupCountString;
const groupSize = +groupSizeString;
const prettierApi = await import(`./${version}/index.js`);

const printResults = false;
const sourceText = readFileSync('./before/src/main/comments.js', 'utf8');

for(let i = 0; i < groupCount; i++) {
    if(method === 'serial') {
        for(let i = 0; i < groupSize; i++) {
            const result = await prettierApi.format(sourceText, {
                parser: 'typescript'
            });
            if(printResults) console.log(result);
        }
    }
    if(method === 'parallel') {
        const promises = [];
        for(let i = 0; i < groupSize; i++) {
            promises.push(prettierApi.format(sourceText, {
                parser: 'typescript'
            }));
        }
        await Promise.allSettled(promises);
    }
}

@liuxingbaoyu
Contributor

I think it's hard to improve speed by migrating just part of the logic to Rust or Wasm,
because even when JS interacts with V8's native (C++) side, object serialization/deserialization is slow enough to hurt.

@fisker
Member Author

fisker commented Jul 23, 2022

https://github.com/prettier/prettier/blob/next/CONTRIBUTING.md#performance

I ran this script on both the next and main branches, on .js and .md files. I chose .md because we happen to have a big CHANGELOG.md in the project root, and the markdown parser is also async now.

@thorn0
Member

thorn0 commented Jul 23, 2022

@cspotcode To test the built version, you need to change this:

-const prettierApi = await import(`./${version}/index.js`);
+const prettierApi = await import(`./${version}/dist/index.js`);

@cspotcode

Even after updating my benchmark to use the built version, I get similar results. But I'll try the scripts in CONTRIBUTING.md to see if they yield different results.


❯ hyperfine 'node ./bench.mjs before 100 10 parallel' 'node ./bench.mjs after 100 10 parallel' 'node ./bench.mjs before 100 10 serial' 'node ./bench.mjs after 100 10 serial'
Benchmark #1: node ./bench.mjs before 100 10 parallel
  Time (mean ± σ):     24.749 s ±  0.764 s    [User: 35.986 s, System: 0.871 s]
  Range (min … max):   23.937 s … 25.806 s    10 runs

Benchmark #2: node ./bench.mjs after 100 10 parallel
  Time (mean ± σ):     28.362 s ±  0.176 s    [User: 40.760 s, System: 0.982 s]
  Range (min … max):   28.041 s … 28.607 s    10 runs

Benchmark #3: node ./bench.mjs before 100 10 serial
  Time (mean ± σ):     23.774 s ±  0.189 s    [User: 32.920 s, System: 0.410 s]
  Range (min … max):   23.464 s … 23.995 s    10 runs

Benchmark #4: node ./bench.mjs after 100 10 serial
  Time (mean ± σ):     30.384 s ±  0.172 s    [User: 41.535 s, System: 0.724 s]
  Range (min … max):   30.112 s … 30.632 s    10 runs

Summary
  'node ./bench.mjs before 100 10 serial' ran
    1.04 ± 0.03 times faster than 'node ./bench.mjs before 100 10 parallel'
    1.19 ± 0.01 times faster than 'node ./bench.mjs after 100 10 parallel'
    1.28 ± 0.01 times faster than 'node ./bench.mjs after 100 10 serial'

@cspotcode

I still get a big difference between 6c722c7990f4e8b03b1ed6c01fbbe749eaed0224 and 8dae4abf41e1ccec3515e8acea159505bbd55f33

52.3 ops/sec vs 37.5 ops/sec

"After" runs at 71% the speed of "before."

❯ for dir in before after ; do ( cd $dir ; NODE_ENV=production node ./dist/bin-prettier.js --debug-benchmark ../before/src/main/comments.js ) ; done
...
src/main/comments.js
[debug] '--debug-benchmark' option found, measuring formatWithCursor with 'benchmark' module.
[debug] '--debug-benchmark' measurements for formatWithCursor: {
[debug]   "benchmark": "format x 52.30 ops/sec ±15.10% (70 runs sampled)",
[debug]   "hz": 52.30170071681723,
[debug]   "ms": 76.47934857142856
[debug] }
'--debug-benchmark' option found, skipped print code or write files.
...
[debug] '--debug-benchmark' measurements for formatWithCursor: {
[debug]   "benchmark": "format x 37.51 ops/sec ±13.36% (67 runs sampled)",
[debug]   "hz": 37.51091525644132,
[debug]   "ms": 79.97672089552238
[debug] }
'--debug-benchmark' option found, skipped print code or write files.

@fisker
Member Author

fisker commented Jul 24, 2022

Thanks for testing; maybe I did something wrong, I'll verify again later. But asynchronous parser support has been requested, and I can't figure out a way to do async parse with synchronous print, so maybe we'll have to accept that.

@cspotcode

Usually, if function A calls function B, and if B is async, then A must be async, too. But the opposite is not true: if function A is async, function B can remain sync.

By that logic, is the problem that the printer must call the parser? Shouldn't parse happen first, and then print?
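
A tiny example of that direction of propagation (every name here is hypothetical, for illustration only):

// Hypothetical async parser used only for illustration.
async function parseAsync(source) {
  return { type: "Program", body: [], source };
}

// Sync printer: can be called from anywhere, including async code.
function printAst(ast) {
  return JSON.stringify(ast);
}

// Only the function that awaits the parser needs to become async;
// the printer it calls can stay synchronous.
async function format(source) {
  const ast = await parseAsync(source);
  return printAst(ast);
}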

@fisker
Member Author

fisker commented Jul 24, 2022

You are right. The problem is that we support languages embedded in other languages, e.g. CSS/JS in HTML, so we have to call parse during print. I also thought about parsing them before print, so we wouldn't need an async printer, but the problem is we don't even know how to traverse the AST; currently each AST node prints its children itself. I thought about this for a long, long time before I made the move. I can't see a better solution.

@cspotcode

I see, thanks for the explanation. As I understand it, embed does two things:

  1. It decides whether or not it should embed. For the majority of nodes, the answer is "no."
  2. It does the embedding: parse and print, return a Doc.

I understand that step 2 must be async. But do any plugins require step 1 to be async?

If step 1 can remain sync, then maybe step 2 can be deferred. The Doc can get a placeholder <embed will go here> and embed can run async. But the rest of the AST does not need to wait; it can continue to print() synchronously.

Once the sync print() is complete, all pending embed operations are awaited and inserted into their placeholders.

The majority of cases are not going to embed. It seems an awful shame to reduce performance by 20% or 30%, rather than tweaking the embedding API to preserve that performance.
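
A rough sketch of that placeholder idea, assuming the "should we embed?" check can stay synchronous; printSync, shouldEmbed, and embedAsync are hypothetical names, not Prettier's real plugin API:

async function formatWithDeferredEmbeds(ast, printer) {
  const pending = [];

  // Pass 1: fully synchronous print. Nodes that should be embedded only get a
  // mutable placeholder doc; the async embed work is started but not awaited.
  const doc = printer.printSync(ast, (node) => {
    if (!printer.shouldEmbed(node)) {
      return null; // step 1: cheap sync decision, the common case
    }
    const placeholder = { type: "embed-placeholder", parts: [] };
    pending.push(
      printer.embedAsync(node).then((embeddedDoc) => {
        placeholder.parts.push(embeddedDoc); // step 2: fill the placeholder in later
      })
    );
    return placeholder;
  });

  // Pass 2: wait for every embed, after which the doc tree is complete.
  await Promise.all(pending);
  return doc;
}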

@fisker
Member Author

fisker commented Jul 24, 2022

But the rest of the AST does not need to wait;

Unfortunately it does; sometimes we need to make decisions based on the printed docs. Also, embed is allowed to fail, in which case we need to call print as a fallback, but maybe this can be solved; we can wait another round.

@fisker
Member Author

fisker commented Jul 24, 2022

It seems an awful shame to reduce performance by 20% or 30%, rather than tweaking the embedding API to preserve that performance.

I don't like that either; I'll be glad if someone can come up with a solution.

@thorn0
Member

thorn0 commented Jul 24, 2022

Maybe we can do something similar to what React does with throwing promises. Something like this in ast-to-doc.js (pseudocode):

let doc;
while (!doc) {
  try {
    doc = mainPrint();
  } catch (caught) {
    if (caught instanceof AsyncWorkBailout) {
      // Await the embed's async work, cache the result, and restart printing.
      cache[caught.node] = await caught.promise;
      path.resetToRoot();
    } else {
      throw caught;
    }
  }
}

Docs for individual nodes are cached (not all of them, but that can be improved), so restarting the printer might be cheap. Still looks like a lot of overhead.
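
For context, the embed side of that pseudocode might look roughly like this; the AsyncWorkBailout shape, the Map-style cache, and the parseAndPrintEmbedded callback are all hypothetical:

class AsyncWorkBailout extends Error {
  constructor(node, promise) {
    super("async embed work pending");
    this.node = node;
    this.promise = promise;
  }
}

function printEmbeddedNode(node, cache, parseAndPrintEmbedded) {
  if (cache.has(node)) {
    return cache.get(node); // restart pass: the awaited doc is already cached
  }
  // First pass: start the async work and unwind the synchronous printer.
  throw new AsyncWorkBailout(node, parseAndPrintEmbedded(node));
}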

@fisker
Member Author

fisker commented Jul 24, 2022

Interesting.

Still looks like a lot of overhead.

Agree.

@thorn0
Member

thorn0 commented Jul 24, 2022

I also thought about parsing them before print, so we wouldn't need an async printer, but the problem is we don't even know how to traverse the AST; currently each AST node prints its children itself.

This seems to be a better option. Traversal can be implemented in a generic way, like it's done when attaching comments.
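
A sketch of what such a generic pre-print traversal might look like; getChildNodes and the embed callback are placeholder names, not the actual visitor-keys machinery:

// Visit every node once before printing and resolve the async embed work up
// front, so the printer itself can remain synchronous.
async function collectEmbeddedDocs(ast, getChildNodes, embed) {
  const embeddedDocs = new Map();
  const stack = [ast];

  while (stack.length > 0) {
    const node = stack.pop();
    const result = embed(node); // may return a doc, a promise, or undefined
    if (result !== undefined) {
      embeddedDocs.set(node, await result); // awaiting a non-promise is harmless
    }
    stack.push(...getChildNodes(node));
  }

  return embeddedDocs; // the sync printer looks embedded docs up here
}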

@thorn0
Member

thorn0 commented Aug 1, 2022

I also thought about parsing them before print, so we wouldn't need an async printer, but the problem is we don't even know how to traverse the AST; currently each AST node prints its children itself.

I tried this approach (see thorn0@89f1ad2). The results are even worse:

  'node ./bench.mjs before 100 10 serial' ran
    1.05 ± 0.05 times faster than 'node ./bench.mjs before 100 10 parallel'
    1.27 ± 0.04 times faster than 'node ./bench.mjs after 100 10 parallel'
    1.34 ± 0.03 times faster than 'node ./bench.mjs after 100 10 serial'
    1.58 ± 0.03 times faster than 'node ./bench.mjs alt-async 100 10 parallel'
    1.65 ± 0.04 times faster than 'node ./bench.mjs alt-async 100 10 serial'

I might have missed some low-hanging-fruit optimizations, but this async AST traversal seems to be really expensive.

@fisker
Member Author

fisker commented Aug 1, 2022

Maybe we can still use a sync traversal, add promises for the embeds, and then await them after the traversal?

@thorn0
Member

thorn0 commented Aug 1, 2022

IDK how to implement that properly. Consider this input:

function HelloWorld() {
        return html`
          <h3>Bar List</h3>
          ${bars.map(
            (bar) =>   html`
             <p >${bar}</p>
           `
          )}
        `;
      }

When embed for the outer html template literal is called, the promise for the inner one must already be resolved.
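
One way to satisfy that ordering is to resolve embeds bottom-up (post-order), so the inner literal's doc exists before the outer embed runs; getChildNodes and embed are again hypothetical stand-ins:

// Depth-first, children before parent: by the time embed runs for the outer
// html`...` literal, the docs for the nested literals are already in `resolved`.
async function resolveEmbedsPostOrder(node, getChildNodes, embed, resolved = new Map()) {
  for (const child of getChildNodes(node)) {
    await resolveEmbedsPostOrder(child, getChildNodes, embed, resolved);
  }
  const doc = await embed(node, resolved); // inner docs are available via `resolved`
  if (doc !== undefined) {
    resolved.set(node, doc);
  }
  return resolved;
}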

@fisker
Member Author

fisker commented Aug 1, 2022

I guess we'll have to pass a different, async print to embed, and use a separate internal embeds mechanism? I tried a little bit, but it seems hard...

@thorn0
Member

thorn0 commented Aug 1, 2022

Seems to work; I didn't benchmark it yet: thorn0@6749a2f

@thorn0
Member

thorn0 commented Aug 1, 2022

Not bad:

  'node ./bench.mjs before 100 10 serial' ran
    1.06 ± 0.05 times faster than 'node ./bench.mjs before 100 10 parallel'
    1.10 ± 0.03 times faster than 'node ./bench.mjs alt-async 100 10 serial'
    1.15 ± 0.03 times faster than 'node ./bench.mjs alt-async 100 10 parallel'
    1.24 ± 0.04 times faster than 'node ./bench.mjs after 100 10 parallel'
    1.31 ± 0.04 times faster than 'node ./bench.mjs after 100 10 serial'

It can be optimized by splitting printer.embed into mightBeEmbed (a sync check) and embed (possibly async, called only if mightBeEmbed returned true). This way we'll need to clone far fewer path stack arrays.
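
Roughly what that split could look like for a tagged html template, purely as an example; the hook shapes and the embeddedFormat helper are assumptions, not the final API:

// Cheap, synchronous check that runs for every node. No promises, and the
// expensive path-stack clone is skipped for the vast majority of nodes.
function mightBeEmbed(node) {
  return (
    node.type === "TaggedTemplateExpression" &&
    node.tag.type === "Identifier" &&
    node.tag.name === "html"
  );
}

// Possibly async work, only reached for nodes that passed the sync check.
async function embed(node, embeddedFormat) {
  const text = node.quasi.quasis.map((quasi) => quasi.value.raw).join("");
  return embeddedFormat(text, { parser: "html" }); // hypothetical async formatter
}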

@fisker
Member Author

fisker commented Aug 1, 2022

That's great!

@thorn0
Member

thorn0 commented Aug 1, 2022

Results for the mentioned optimization (thorn0@41e905d):

  'node ./bench.mjs before 100 10 serial' ran
    1.04 ± 0.06 times faster than 'node ./bench.mjs before 100 10 parallel'
    1.07 ± 0.04 times faster than 'node ./bench.mjs alt-async 100 10 serial'
    1.16 ± 0.05 times faster than 'node ./bench.mjs alt-async 100 10 parallel'
    1.25 ± 0.05 times faster than 'node ./bench.mjs after 100 10 parallel'
    1.32 ± 0.04 times faster than 'node ./bench.mjs after 100 10 serial'

I wonder what's going on with the "parallel" results.

@fisker
Member Author

fisker commented Aug 1, 2022

Seems like not much difference from #13158 (comment).

@thorn0
Member

thorn0 commented Aug 2, 2022

Reused printer.massageAstNode.ignoredProperties, so that it doesn't recurse into properties like range (thorn0@1435322):

  'node ./bench.mjs before 100 10 serial' ran
    1.04 ± 0.03 times faster than 'node ./bench.mjs alt-async 100 10 serial'
    1.05 ± 0.05 times faster than 'node ./bench.mjs before 100 10 parallel'
    1.14 ± 0.05 times faster than 'node ./bench.mjs alt-async 100 10 parallel'
    1.25 ± 0.04 times faster than 'node ./bench.mjs after 100 10 parallel'
    1.32 ± 0.03 times faster than 'node ./bench.mjs after 100 10 serial'

Seems like not much difference from

embed is sync for the JS parser most of the time. If it were defined as an async function, the difference would be bigger.
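
That difference comes from promise overhead: a function declared async always returns a promise, while a plain embed can return a doc directly. A sketch (illustrative only, not the actual implementation) of how a caller can pay for the microtask only in the genuinely async case:

// Only the rare case where embed actually returns a promise touches the
// microtask queue; the common synchronous case stays on the fast path.
function withEmbedResult(result, onDoc) {
  if (result && typeof result.then === "function") {
    return result.then(onDoc); // async path, e.g. CSS embedded in HTML
  }
  return onDoc(result); // sync path: no await, no extra tick
}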

@thorn0
Member

thorn0 commented Aug 2, 2022

Okay, I think it's good enough. thorn0@77ca54c

  'node ./bench.mjs before 100 10 serial' ran
    1.03 ± 0.03 times faster than 'node ./bench.mjs alt-async 100 10 serial'
    1.07 ± 0.05 times faster than 'node ./bench.mjs before 100 10 parallel'
    1.11 ± 0.03 times faster than 'node ./bench.mjs alt-async 100 10 parallel'
    1.23 ± 0.03 times faster than 'node ./bench.mjs after 100 10 parallel'
    1.32 ± 0.03 times faster than 'node ./bench.mjs after 100 10 serial'

@thorn0
Member

thorn0 commented Aug 2, 2022

@fisker @sosukesuzuki If you're okay with this solution (diff), how do we proceed? IMHO it doesn't make sense to open a PR against next, as the revert is going to be really messy. I'd say it'd be better to start a new branch at 6c722c7 and open a PR against it. Once it's merged, we'll have to cherry-pick commits from next and relabel the new branch as next. WDYT?

@fisker
Member Author

fisker commented Aug 2, 2022

I'm okay with your solution. There are already 67 commits after #13104; I prefer to revert #13104.

@fisker
Member Author

fisker commented Aug 2, 2022

I'll try to revert it.

@thorn0
Member

thorn0 commented Aug 2, 2022

Great. I tried and gave up.

@fisker
Member Author

fisker commented Aug 2, 2022

#13210

@cspotcode

Meta thought: does it make sense to rename this issue? If we do that, it will be more intuitive when it appears in notifications.

@thorn0 thorn0 changed the title from "Adopt swc parser?" to "Performance of async printers (was: Adopt swc parser?)" Aug 4, 2022
@thorn0 thorn0 added the type:perf label Aug 4, 2022
@sosukesuzuki
Member

@thorn0 @fisker Can we close this now that #13211 has been merged into next?

@thorn0
Member

thorn0 commented Aug 6, 2022

@sosukesuzuki Yes, let's close it. If someone wants to discuss Rust, please open a new issue.

@thorn0 thorn0 closed this as completed Aug 6, 2022
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 7, 2023