
perf: introduce parseAstAsync and parallelize parsing AST #5202

Merged
merged 10 commits into rollup:master on Oct 31, 2023

Conversation

sapphi-red
Contributor

This PR contains:

  • bugfix
  • feature
  • refactor
  • documentation
  • other

Are tests included?

  • yes (bugfixes and features will not be merged without tests)
  • no

Breaking Changes?

  • yes (breaking changes will not be merged unless absolutely necessary)
  • no

List any relevant issue numbers:

Description

This PR introduces a new function, parseAstAsync, that runs the parse on a separate thread by using AsyncTask. There is a simpler way to run the work in parallel using tokio, but because SWC uses rayon, a different parallelism library, I think it is better to avoid that.
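
For illustration, a minimal usage sketch of the new function (the export path is an assumption, mirroring how parseAst is exposed; the inputs are hypothetical):

import { parseAstAsync } from 'rollup/parseAst' // assumed export path

const sources = ['export const a = 1', 'export const b = 2'] // hypothetical module sources

// Each call runs the native SWC parse on a background thread via AsyncTask,
// so multiple modules can be parsed in parallel while the JS thread stays free.
const asts = await Promise.all(sources.map(code => parseAstAsync(code)))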

Running the benchmark from #5182 (using the esbuild repo instead), I got a ~850ms / ~12% improvement on my laptop. Combined with #5201, I got a ~1100ms / ~16% improvement.

Benchmark results

rollup 3.29.4

# BUILD: 6079ms, 934 MB / 944 MB
## initialize: 1ms, 3.55 kB / 9.85 MB
## generate module graph: 2891ms, 765 MB / 775 MB
- plugin 0 (stdin) - resolveId: 16ms, -3.18 MB / 773 MB
- plugin 0 (stdin) - load: 7ms, 883 kB / 774 MB
generate ast: 632ms, 159 MB / 774 MB
analyze ast: 1382ms, 829 MB / 774 MB
## sort and bind modules: 253ms, 38.4 MB / 813 MB
## mark included statements: 2934ms, 131 MB / 944 MB
treeshaking pass 1: 1013ms, 102 MB / 917 MB
treeshaking pass 2: 457ms, 24 MB / 941 MB
treeshaking pass 3: 184ms, -5.31 MB / 936 MB
treeshaking pass 4: 178ms, 6.55 MB / 942 MB
treeshaking pass 5: 198ms, -115 kB / 942 MB
treeshaking pass 6: 169ms, 14.5 MB / 957 MB
treeshaking pass 7: 164ms, -5.48 MB / 951 MB
treeshaking pass 8: 152ms, 3.51 MB / 955 MB
treeshaking pass 9: 137ms, 1.66 MB / 956 MB
treeshaking pass 10: 137ms, 2.53 MB / 959 MB
treeshaking pass 11: 139ms, -15.3 MB / 944 MB
# GENERATE: 553ms, 107 MB / 1.05 GB
## initialize render: 0ms, 3.46 kB / 946 MB
## generate chunks: 47ms, 4.11 MB / 950 MB
optimize chunks: 1ms, 394 kB / 950 MB
## render chunks: 487ms, 83.1 MB / 1.03 GB
## transform chunks: 19ms, 19.7 MB / 1.05 GB
## generate bundle: 0ms, 11.7 kB / 1.05 GB
# WRITE: 31ms, 19.7 MB / 1.07 GB

rollup 4.0.2

# BUILD: 6656ms, 917 MB / 926 MB
## initialize: 0ms, 3.55 kB / 9.06 MB
## generate module graph: 3437ms, 747 MB / 756 MB
- plugin 0 (stdin) - resolveId: 13ms, 3.18 MB / 755 MB
- plugin 0 (stdin) - load: 8ms, 882 kB / 756 MB
generate ast: 1177ms, 137 MB / 756 MB
analyze ast: 1419ms, 721 MB / 756 MB
## sort and bind modules: 245ms, 38.3 MB / 794 MB
## mark included statements: 2973ms, 132 MB / 926 MB
treeshaking pass 1: 1123ms, 103 MB / 899 MB
treeshaking pass 2: 464ms, 24.5 MB / 924 MB
treeshaking pass 3: 175ms, -5.22 MB / 918 MB
treeshaking pass 4: 173ms, 6.72 MB / 925 MB
treeshaking pass 5: 196ms, -284 kB / 925 MB
treeshaking pass 6: 161ms, 14.4 MB / 939 MB
treeshaking pass 7: 150ms, -5.39 MB / 934 MB
treeshaking pass 8: 136ms, 3.57 MB / 937 MB
treeshaking pass 9: 119ms, 1.7 MB / 939 MB
treeshaking pass 10: 124ms, 2.52 MB / 942 MB
treeshaking pass 11: 147ms, -15.3 MB / 926 MB
# GENERATE: 573ms, 107 MB / 1.04 GB
## initialize render: 0ms, 3.46 kB / 928 MB
## generate chunks: 50ms, 3.55 MB / 932 MB
optimize chunks: 1ms, 428 kB / 933 MB
## render chunks: 503ms, 82.5 MB / 1.01 GB
## transform chunks: 19ms, 20.8 MB / 1.04 GB
## generate bundle: 0ms, 1.74 kB / 1.04 GB
# WRITE: 33ms, 19.5 MB / 1.05 GB

rollup 4.1.0

# BUILD: 6597ms, 847 MB / 855 MB
## initialize: 0ms, 3.55 kB / 8.51 MB
## generate module graph: 3451ms, 667 MB / 676 MB
- plugin 0 (stdin) - resolveId: 16ms, -6.69 MB / 677 MB
- plugin 0 (stdin) - load: 8ms, 882 kB / 677 MB
generate ast: 1208ms, 168 MB / 677 MB
analyze ast: 1404ms, 650 MB / 677 MB
## sort and bind modules: 246ms, 37.3 MB / 713 MB
## mark included statements: 2900ms, 142 MB / 855 MB
treeshaking pass 1: 1086ms, 100 MB / 815 MB
treeshaking pass 2: 474ms, 22.1 MB / 837 MB
treeshaking pass 3: 183ms, 10.3 MB / 847 MB
treeshaking pass 4: 171ms, 6.99 MB / 854 MB
treeshaking pass 5: 194ms, -688 kB / 853 MB
treeshaking pass 6: 160ms, -1.58 MB / 852 MB
treeshaking pass 7: 140ms, -5.54 MB / 846 MB
treeshaking pass 8: 137ms, 3.62 MB / 850 MB
treeshaking pass 9: 116ms, 1.5 MB / 851 MB
treeshaking pass 10: 119ms, 2.55 MB / 854 MB
treeshaking pass 11: 110ms, 1.15 MB / 855 MB
# GENERATE: 569ms, 101 MB / 958 MB
## initialize render: 0ms, 3.46 kB / 857 MB
## generate chunks: 49ms, 1.97 MB / 859 MB
optimize chunks: 1ms, 380 kB / 860 MB
## render chunks: 503ms, 72.7 MB / 932 MB
## transform chunks: 16ms, 26.2 MB / 958 MB
## generate bundle: 0ms, 1.74 kB / 958 MB
# WRITE: 33ms, 17.5 MB / 975 MB

parseAstAsync (~850ms / ~12% improvement)

# BUILD: 5754ms, 852 MB / 861 MB
## initialize: 0ms, 3.55 kB / 9.23 MB
## generate module graph: 2588ms, 672 MB / 681 MB
- plugin 0 (stdin) - resolveId: 13ms, 2.95 MB / 682 MB
- plugin 0 (stdin) - load: 7ms, 830 kB / 681 MB
generate ast: 9531ms, 4.58 GB / 682 MB
analyze ast: 1429ms, 704 MB / 682 MB
## sort and bind modules: 243ms, 37.3 MB / 719 MB
## mark included statements: 2923ms, 143 MB / 861 MB
treeshaking pass 1: 1095ms, 101 MB / 821 MB
treeshaking pass 2: 482ms, 22.6 MB / 844 MB
treeshaking pass 3: 175ms, 10.2 MB / 854 MB
treeshaking pass 4: 168ms, -9.41 MB / 844 MB
treeshaking pass 5: 188ms, 15.3 MB / 860 MB
treeshaking pass 6: 160ms, -1.65 MB / 858 MB
treeshaking pass 7: 138ms, -5.42 MB / 853 MB
treeshaking pass 8: 126ms, 3.47 MB / 856 MB
treeshaking pass 9: 124ms, 1.52 MB / 858 MB
treeshaking pass 10: 116ms, 2.52 MB / 860 MB
treeshaking pass 11: 144ms, 1.16 MB / 861 MB
# GENERATE: 583ms, 99.4 MB / 963 MB
## initialize render: 0ms, 3.46 kB / 863 MB
## generate chunks: 49ms, 1.65 MB / 865 MB
optimize chunks: 1ms, 389 kB / 866 MB
## render chunks: 517ms, 71.4 MB / 936 MB
## transform chunks: 16ms, 26.4 MB / 963 MB
## generate bundle: 0ms, 1.74 kB / 963 MB
# WRITE: 38ms, 17.4 MB / 980 MB

using mimalloc + parseAstAsync (~1100ms / ~16% improvement)

# BUILD: 5550ms, 846 MB / 855 MB
## initialize: 0ms, 3.55 kB / 9.25 MB
## generate module graph: 2444ms, 667 MB / 676 MB
- plugin 0 (stdin) - resolveId: 13ms, 3.17 MB / 678 MB
- plugin 0 (stdin) - load: 6ms, 885 kB / 678 MB
generate ast: 11716ms, 5.85 GB / 679 MB
analyze ast: 1366ms, 726 MB / 679 MB
## sort and bind modules: 235ms, 37.5 MB / 713 MB
## mark included statements: 2870ms, 142 MB / 855 MB
treeshaking pass 1: 1043ms, 101 MB / 815 MB
treeshaking pass 2: 483ms, 21.6 MB / 837 MB
treeshaking pass 3: 180ms, 10.3 MB / 847 MB
treeshaking pass 4: 168ms, 6.61 MB / 854 MB
treeshaking pass 5: 192ms, -642 kB / 853 MB
treeshaking pass 6: 165ms, -1.66 MB / 852 MB
treeshaking pass 7: 144ms, -5.52 MB / 846 MB
treeshaking pass 8: 136ms, 3.39 MB / 850 MB
treeshaking pass 9: 114ms, 1.47 MB / 851 MB
treeshaking pass 10: 120ms, 2.68 MB / 854 MB
treeshaking pass 11: 119ms, 1.16 MB / 855 MB
# GENERATE: 558ms, 101 MB / 958 MB
## initialize render: 0ms, 3.46 kB / 857 MB
## generate chunks: 49ms, 1.92 MB / 859 MB
optimize chunks: 1ms, 382 kB / 860 MB
## render chunks: 492ms, 73 MB / 932 MB
## transform chunks: 17ms, 26.2 MB / 958 MB
## generate bundle: 0ms, 1.74 kB / 958 MB
# WRITE: 31ms, 17.8 MB / 976 MB
Specs of my laptop
  • CPU: Intel Core-i7 1360P
  • Memory: DDR5-4800 32GB
  • OS: Windows 11

This change increases the binary size by 0.13MB (3.25MB -> 3.38MB).


@codecov

codecov bot commented Oct 14, 2023

Codecov Report

Merging #5202 (f284981) into master (5865fbd) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #5202      +/-   ##
==========================================
- Coverage   98.82%   98.81%   -0.01%     
==========================================
  Files         231      231              
  Lines        8850     8861      +11     
  Branches     2315     2316       +1     
==========================================
+ Hits         8746     8756      +10     
- Misses         43       44       +1     
  Partials       61       61              
Files Coverage Δ
native.js 76.47% <100.00%> (+1.47%) ⬆️
src/Module.ts 99.61% <100.00%> (-0.20%) ⬇️
src/ModuleLoader.ts 99.59% <100.00%> (ø)
src/utils/logs.ts 97.66% <ø> (ø)
src/utils/parseAst.ts 100.00% <100.00%> (ø)

@lukastaegert left a comment
Member

Thank you, that looks really good, especially the numbers. Still, I actually started working in another direction that I would like to compare with this one first: moving the parallelization out of Rust and instead using a worker in JS.
The advantage would be that this would also parallelize the WASM build, though that may only be relevant to some. Creating the worker(s) would add more overhead, but considering the potential savings, I would start with a single persistent worker. I hope to finish it tomorrow or the day after; then we can compare.
By the way, if we split parseAst and separately exposed parseToBuffer and convertBufferToProgram, people could easily implement their own worker approach by putting the first function into a worker (see the sketch below). With your approach, of course, everyone would get a parallelized build, but there would be no limit to the number of threads.
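
For illustration, a rough sketch of that split-API idea using node:worker_threads; parseToBuffer and convertBufferToProgram are the hypothetical exports named above, and their module path and signatures are assumptions:

// parseWorker.ts (hypothetical worker file)
import { parentPort } from 'node:worker_threads'
import { parseToBuffer } from 'rollup/parseAst' // hypothetical export

parentPort!.on('message', ({ id, code }: { id: number; code: string }) => {
	// The heavy parse runs off the main thread; only a buffer travels back.
	parentPort!.postMessage({ id, buffer: parseToBuffer(code) })
})

// main.ts (main thread)
import { Worker } from 'node:worker_threads'
import { convertBufferToProgram } from 'rollup/parseAst' // hypothetical export

const worker = new Worker(new URL('./parseWorker.js', import.meta.url))
let nextId = 0

function parseInWorker(code: string) {
	return new Promise(resolve => {
		const id = nextId++
		const onMessage = (message: { id: number; buffer: Uint8Array }) => {
			if (message.id !== id) return
			worker.off('message', onMessage)
			// Only the cheap buffer-to-AST conversion happens on the main thread.
			resolve(convertBufferToProgram(message.buffer))
		}
		worker.on('message', onMessage)
		worker.postMessage({ id, code })
	})
}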

@sapphi-red
Contributor Author

Still, I actually started working into another direction, which I would like to compare with this one first, which would be to move parallelizing out of Rust and instead use a worker in JS.

Awesome!

With your approach, of course, everyone would get a parallelized build but there would be no limit to the number of threads.

I guess it is limited by the UV_THREADPOOL_SIZE value. AsyncTask seems to use libuv threads, and my understanding is that the number of libuv threads is limited by UV_THREADPOOL_SIZE.
(AsyncTask seems to use napi_create_async_work under the hood, which runs on a worker-pool thread, and that worker pool appears to be the same as the libuv thread pool.)
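
As a hedged aside on that bound: the libuv pool size is effectively fixed before the thread pool is first used, so it needs to be set in the environment rather than at runtime.

// The libuv thread pool defaults to 4 threads; to allow more parallel parses
// it has to be configured before launching Node, e.g.
//   UV_THREADPOOL_SIZE=8 npx rollup -c
// At runtime the setting can only be inspected:
console.log('UV_THREADPOOL_SIZE =', process.env.UV_THREADPOOL_SIZE ?? '(unset, libuv default of 4)')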

@lukastaegert lukastaegert mentioned this pull request Oct 17, 2023
@lukastaegert
Member

Ok, my worker attempt is at #5211, but I already see that at least the tests run MUCH slower with a worker (though I have not done much profiling yet). It seems that workers are much less lightweight than threads in Rust, which tempts me to parallelize in Rust instead, as you implemented, and accept that the WASM build will remain somewhat slower for now.

@lukastaegert
Member

lukastaegert commented Oct 21, 2023

Ok, to be more precise, the worker approach is much slower for very small builds, as you would have in a test: a build of 40ms now takes 80ms for me. On the other hand, using the "ten times three.js" benchmark, the worker approach is still considerably faster for me, around 10%. So it seems there is just an initial overhead for workers that cannot be ignored.
Trying the same with the Rust-parallelized build, I see no slow-down for the "small" build (on the contrary), while I see similar performance improvements for the large build. So for now, I would prefer your version.
We can still think if we want the worker for the WASM build, but it is much trickier to handle, as you need to make sure the worker is torn down properly in order for rollup to terminate gracefully.

@sapphi-red
Contributor Author

We can still think if we want the worker for the WASM build, but it is much trickier to handle, as you need to make sure the worker is torn down properly in order for rollup to terminate gracefully.

Would a different interface make it easier to handle? This interface would work for Vite.

// Reference-count users of a single shared parse worker.
let workerRefCount = 0
let worker: Worker | undefined

const getWorker = () => {
	workerRefCount++
	// Lazily create the worker on first use and reuse it afterwards.
	worker ??= new Worker('/path/to/parseWorker')
	return worker
}
const stopWorker = () => {
	workerRefCount--
	if (workerRefCount === 0) {
		// The last user released the parser: terminate so the process can exit.
		worker?.terminate()
		worker = undefined
	}
}

export async function createAsyncParser() {
	const w = getWorker()
	const parseAsync = async (
		code: string,
		allowReturnOutsideFunction: boolean,
		_signal?: AbortSignal | undefined | null
	) =>
		// `w.parse` stands in for a message round-trip to the parse worker.
		w.parse(code, allowReturnOutsideFunction)
	const stop = () => {
		stopWorker()
	}
	// warn for node if the process exited with a non-zero exit code without calling `stop`?

	return { parse: parseAsync, stop }
}
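
For completeness, a hedged example of how a consumer might use this proposed (not yet existing) interface:

const code = 'export default 1' // hypothetical input

const parser = await createAsyncParser()
try {
	const ast = await parser.parse(code, false)
	// ... hand the AST to the rest of the build ...
} finally {
	// Releasing the last reference terminates the shared worker so the
	// process can exit cleanly.
	parser.stop()
}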

@lukastaegert
Member

The problem that Rollup is facing is that it does not fully know when it can terminate the worker. Usually, you can throw away the worker after the build phase. However, there are some edge cases where you still need to do parsing during the generate phase. But we cannot know beforehand if and how many outputs will be generated. We could also tie it to the closeBundle hook, but then it is up to the user to trigger that hook. I think I would stick with the asynchronous Rust approach for now and revisit the topic once more code has been ported to Rust so that parallelisation has more of an impact.

@lukastaegert lukastaegert merged commit 49b57c2 into rollup:master Oct 31, 2023
26 of 27 checks passed
@github-actions

This PR has been released as part of rollup@4.2.0. You can test it via npm install rollup.
