
perf: introduce parseAstAsync and parallelize parsing AST #5202

Merged
merged 10 commits into rollup:master on Oct 31, 2023

Conversation

sapphi-red
Contributor

This PR contains:

  • bugfix
  • feature
  • refactor
  • documentation
  • other

Are tests included?

  • yes (bugfixes and features will not be merged without tests)
  • no

Breaking Changes?

  • yes (breaking changes will not be merged unless absolutely necessary)
  • no

List any relevant issue numbers:

Description

This PR introduces a new function, parseAstAsync, that runs the parse on a separate thread by using AsyncTask. There is a simpler way to run the work in parallel using tokio, but because SWC uses rayon, a different parallelism library, I think it is better to avoid that.
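
For illustration, a minimal usage sketch of the new function (the export path is an assumption, mirroring how parseAst is exposed; the inputs are hypothetical):

import { parseAstAsync } from 'rollup/parseAst' // assumed export path

const sources = ['export const a = 1', 'export const b = 2'] // hypothetical module sources

// Each call runs the native SWC parse on a background thread via AsyncTask,
// so multiple modules can be parsed in parallel while the JS thread stays free.
const asts = await Promise.all(sources.map(code => parseAstAsync(code)))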

Running the benchmark from #5182 (using the esbuild repo instead), I got a ~850ms / ~12% improvement on my laptop. Combined with #5201, I got a ~1100ms / ~16% improvement.

Benchmark results

rollup 3.29.4

# BUILD: 6079ms, 934 MB / 944 MB
## initialize: 1ms, 3.55 kB / 9.85 MB
## generate module graph: 2891ms, 765 MB / 775 MB
- plugin 0 (stdin) - resolveId: 16ms, -3.18 MB / 773 MB
- plugin 0 (stdin) - load: 7ms, 883 kB / 774 MB
generate ast: 632ms, 159 MB / 774 MB
analyze ast: 1382ms, 829 MB / 774 MB
## sort and bind modules: 253ms, 38.4 MB / 813 MB
## mark included statements: 2934ms, 131 MB / 944 MB
treeshaking pass 1: 1013ms, 102 MB / 917 MB
treeshaking pass 2: 457ms, 24 MB / 941 MB
treeshaking pass 3: 184ms, -5.31 MB / 936 MB
treeshaking pass 4: 178ms, 6.55 MB / 942 MB
treeshaking pass 5: 198ms, -115 kB / 942 MB
treeshaking pass 6: 169ms, 14.5 MB / 957 MB
treeshaking pass 7: 164ms, -5.48 MB / 951 MB
treeshaking pass 8: 152ms, 3.51 MB / 955 MB
treeshaking pass 9: 137ms, 1.66 MB / 956 MB
treeshaking pass 10: 137ms, 2.53 MB / 959 MB
treeshaking pass 11: 139ms, -15.3 MB / 944 MB
# GENERATE: 553ms, 107 MB / 1.05 GB
## initialize render: 0ms, 3.46 kB / 946 MB
## generate chunks: 47ms, 4.11 MB / 950 MB
optimize chunks: 1ms, 394 kB / 950 MB
## render chunks: 487ms, 83.1 MB / 1.03 GB
## transform chunks: 19ms, 19.7 MB / 1.05 GB
## generate bundle: 0ms, 11.7 kB / 1.05 GB
# WRITE: 31ms, 19.7 MB / 1.07 GB

rollup 4.0.2

# BUILD: 6656ms, 917 MB / 926 MB
## initialize: 0ms, 3.55 kB / 9.06 MB
## generate module graph: 3437ms, 747 MB / 756 MB
- plugin 0 (stdin) - resolveId: 13ms, 3.18 MB / 755 MB
- plugin 0 (stdin) - load: 8ms, 882 kB / 756 MB
generate ast: 1177ms, 137 MB / 756 MB
analyze ast: 1419ms, 721 MB / 756 MB
## sort and bind modules: 245ms, 38.3 MB / 794 MB
## mark included statements: 2973ms, 132 MB / 926 MB
treeshaking pass 1: 1123ms, 103 MB / 899 MB
treeshaking pass 2: 464ms, 24.5 MB / 924 MB
treeshaking pass 3: 175ms, -5.22 MB / 918 MB
treeshaking pass 4: 173ms, 6.72 MB / 925 MB
treeshaking pass 5: 196ms, -284 kB / 925 MB
treeshaking pass 6: 161ms, 14.4 MB / 939 MB
treeshaking pass 7: 150ms, -5.39 MB / 934 MB
treeshaking pass 8: 136ms, 3.57 MB / 937 MB
treeshaking pass 9: 119ms, 1.7 MB / 939 MB
treeshaking pass 10: 124ms, 2.52 MB / 942 MB
treeshaking pass 11: 147ms, -15.3 MB / 926 MB
# GENERATE: 573ms, 107 MB / 1.04 GB
## initialize render: 0ms, 3.46 kB / 928 MB
## generate chunks: 50ms, 3.55 MB / 932 MB
optimize chunks: 1ms, 428 kB / 933 MB
## render chunks: 503ms, 82.5 MB / 1.01 GB
## transform chunks: 19ms, 20.8 MB / 1.04 GB
## generate bundle: 0ms, 1.74 kB / 1.04 GB
# WRITE: 33ms, 19.5 MB / 1.05 GB

rollup 4.1.0

# BUILD: 6597ms, 847 MB / 855 MB
## initialize: 0ms, 3.55 kB / 8.51 MB
## generate module graph: 3451ms, 667 MB / 676 MB
- plugin 0 (stdin) - resolveId: 16ms, -6.69 MB / 677 MB
- plugin 0 (stdin) - load: 8ms, 882 kB / 677 MB
generate ast: 1208ms, 168 MB / 677 MB
analyze ast: 1404ms, 650 MB / 677 MB
## sort and bind modules: 246ms, 37.3 MB / 713 MB
## mark included statements: 2900ms, 142 MB / 855 MB
treeshaking pass 1: 1086ms, 100 MB / 815 MB
treeshaking pass 2: 474ms, 22.1 MB / 837 MB
treeshaking pass 3: 183ms, 10.3 MB / 847 MB
treeshaking pass 4: 171ms, 6.99 MB / 854 MB
treeshaking pass 5: 194ms, -688 kB / 853 MB
treeshaking pass 6: 160ms, -1.58 MB / 852 MB
treeshaking pass 7: 140ms, -5.54 MB / 846 MB
treeshaking pass 8: 137ms, 3.62 MB / 850 MB
treeshaking pass 9: 116ms, 1.5 MB / 851 MB
treeshaking pass 10: 119ms, 2.55 MB / 854 MB
treeshaking pass 11: 110ms, 1.15 MB / 855 MB
# GENERATE: 569ms, 101 MB / 958 MB
## initialize render: 0ms, 3.46 kB / 857 MB
## generate chunks: 49ms, 1.97 MB / 859 MB
optimize chunks: 1ms, 380 kB / 860 MB
## render chunks: 503ms, 72.7 MB / 932 MB
## transform chunks: 16ms, 26.2 MB / 958 MB
## generate bundle: 0ms, 1.74 kB / 958 MB
# WRITE: 33ms, 17.5 MB / 975 MB

parseAstAsync (~850ms / ~12% improvement)

# BUILD: 5754ms, 852 MB / 861 MB
## initialize: 0ms, 3.55 kB / 9.23 MB
## generate module graph: 2588ms, 672 MB / 681 MB
- plugin 0 (stdin) - resolveId: 13ms, 2.95 MB / 682 MB
- plugin 0 (stdin) - load: 7ms, 830 kB / 681 MB
generate ast: 9531ms, 4.58 GB / 682 MB
analyze ast: 1429ms, 704 MB / 682 MB
## sort and bind modules: 243ms, 37.3 MB / 719 MB
## mark included statements: 2923ms, 143 MB / 861 MB
treeshaking pass 1: 1095ms, 101 MB / 821 MB
treeshaking pass 2: 482ms, 22.6 MB / 844 MB
treeshaking pass 3: 175ms, 10.2 MB / 854 MB
treeshaking pass 4: 168ms, -9.41 MB / 844 MB
treeshaking pass 5: 188ms, 15.3 MB / 860 MB
treeshaking pass 6: 160ms, -1.65 MB / 858 MB
treeshaking pass 7: 138ms, -5.42 MB / 853 MB
treeshaking pass 8: 126ms, 3.47 MB / 856 MB
treeshaking pass 9: 124ms, 1.52 MB / 858 MB
treeshaking pass 10: 116ms, 2.52 MB / 860 MB
treeshaking pass 11: 144ms, 1.16 MB / 861 MB
# GENERATE: 583ms, 99.4 MB / 963 MB
## initialize render: 0ms, 3.46 kB / 863 MB
## generate chunks: 49ms, 1.65 MB / 865 MB
optimize chunks: 1ms, 389 kB / 866 MB
## render chunks: 517ms, 71.4 MB / 936 MB
## transform chunks: 16ms, 26.4 MB / 963 MB
## generate bundle: 0ms, 1.74 kB / 963 MB
# WRITE: 38ms, 17.4 MB / 980 MB

using mimalloc + parseAstAsync (~1100ms / ~16% improvement)

# BUILD: 5550ms, 846 MB / 855 MB
## initialize: 0ms, 3.55 kB / 9.25 MB
## generate module graph: 2444ms, 667 MB / 676 MB
- plugin 0 (stdin) - resolveId: 13ms, 3.17 MB / 678 MB
- plugin 0 (stdin) - load: 6ms, 885 kB / 678 MB
generate ast: 11716ms, 5.85 GB / 679 MB
analyze ast: 1366ms, 726 MB / 679 MB
## sort and bind modules: 235ms, 37.5 MB / 713 MB
## mark included statements: 2870ms, 142 MB / 855 MB
treeshaking pass 1: 1043ms, 101 MB / 815 MB
treeshaking pass 2: 483ms, 21.6 MB / 837 MB
treeshaking pass 3: 180ms, 10.3 MB / 847 MB
treeshaking pass 4: 168ms, 6.61 MB / 854 MB
treeshaking pass 5: 192ms, -642 kB / 853 MB
treeshaking pass 6: 165ms, -1.66 MB / 852 MB
treeshaking pass 7: 144ms, -5.52 MB / 846 MB
treeshaking pass 8: 136ms, 3.39 MB / 850 MB
treeshaking pass 9: 114ms, 1.47 MB / 851 MB
treeshaking pass 10: 120ms, 2.68 MB / 854 MB
treeshaking pass 11: 119ms, 1.16 MB / 855 MB
# GENERATE: 558ms, 101 MB / 958 MB
## initialize render: 0ms, 3.46 kB / 857 MB
## generate chunks: 49ms, 1.92 MB / 859 MB
optimize chunks: 1ms, 382 kB / 860 MB
## render chunks: 492ms, 73 MB / 932 MB
## transform chunks: 17ms, 26.2 MB / 958 MB
## generate bundle: 0ms, 1.74 kB / 958 MB
# WRITE: 31ms, 17.8 MB / 976 MB
Specs of my laptop
  • CPU: Intel Core-i7 1360P
  • Memory: DDR5-4800 32GB
  • OS: Windows 11

This change increases the binary size by 0.13MB (3.25MB -> 3.38MB).


@codecov

codecov bot commented Oct 14, 2023

Codecov Report

Merging #5202 (f284981) into master (5865fbd) will decrease coverage by 0.01%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #5202      +/-   ##
==========================================
- Coverage   98.82%   98.81%   -0.01%     
==========================================
  Files         231      231              
  Lines        8850     8861      +11     
  Branches     2315     2316       +1     
==========================================
+ Hits         8746     8756      +10     
- Misses         43       44       +1     
  Partials       61       61              
Files Coverage Δ
native.js 76.47% <100.00%> (+1.47%) ⬆️
src/Module.ts 99.61% <100.00%> (-0.20%) ⬇️
src/ModuleLoader.ts 99.59% <100.00%> (ø)
src/utils/logs.ts 97.66% <ø> (ø)
src/utils/parseAst.ts 100.00% <100.00%> (ø)

@lukastaegert left a comment
Member

Thank you, that looks really good, especially the numbers. Still, I actually started working in another direction that I would like to compare with this one first: moving the parallelization out of Rust and instead using a worker in JS.
The advantage would be that this would also parallelize the WASM build, though that may only be relevant to some. Creating the worker(s) would add more overhead, but considering the potential savings, I would start with a single persistent worker. I hope to finish it tomorrow or the day after; then we can compare.
By the way, if we split parseAst and separately exposed parseToBuffer and convertBufferToProgram, people could easily implement their own worker approach by putting the first function into a worker (see the sketch below). With your approach, of course, everyone would get a parallelized build, but there would be no limit to the number of threads.
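
For illustration, a rough sketch of that split-API idea using node:worker_threads; parseToBuffer and convertBufferToProgram are the hypothetical exports named above, and their module path and signatures are assumptions:

// parseWorker.ts (hypothetical worker file)
import { parentPort } from 'node:worker_threads'
import { parseToBuffer } from 'rollup/parseAst' // hypothetical export

parentPort!.on('message', ({ id, code }: { id: number; code: string }) => {
	// The heavy parse runs off the main thread; only a buffer travels back.
	parentPort!.postMessage({ id, buffer: parseToBuffer(code) })
})

// main.ts (main thread)
import { Worker } from 'node:worker_threads'
import { convertBufferToProgram } from 'rollup/parseAst' // hypothetical export

const worker = new Worker(new URL('./parseWorker.js', import.meta.url))
let nextId = 0

function parseInWorker(code: string) {
	return new Promise(resolve => {
		const id = nextId++
		const onMessage = (message: { id: number; buffer: Uint8Array }) => {
			if (message.id !== id) return
			worker.off('message', onMessage)
			// Only the cheap buffer-to-AST conversion happens on the main thread.
			resolve(convertBufferToProgram(message.buffer))
		}
		worker.on('message', onMessage)
		worker.postMessage({ id, code })
	})
}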

@sapphi-red
Contributor Author

Still, I actually started working into another direction, which I would like to compare with this one first, which would be to move parallelizing out of Rust and instead use a worker in JS.

Awesome!

With your approach, of course, everyone would get a parallelized build but there would be no limit to the number of threads.

I guess it is limited by the UV_THREADPOOL_SIZE value. AsyncTask seems to use libuv threads, and my understanding is that the number of libuv threads is limited by UV_THREADPOOL_SIZE.
(AsyncTask seems to use napi_create_async_work under the hood, which runs on a worker-pool thread, and that worker pool appears to be the same as the libuv thread pool.)
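
As a hedged aside on that bound: the libuv pool size is effectively fixed before the thread pool is first used, so it needs to be set in the environment rather than at runtime.

// The libuv thread pool defaults to 4 threads; to allow more parallel parses
// it has to be configured before launching Node, e.g.
//   UV_THREADPOOL_SIZE=8 npx rollup -c
// At runtime the setting can only be inspected:
console.log('UV_THREADPOOL_SIZE =', process.env.UV_THREADPOOL_SIZE ?? '(unset, libuv default of 4)')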

@lukastaegert lukastaegert mentioned this pull request Oct 17, 2023
@lukastaegert
Member

Ok, my worker attempt is at #5211, but I already see that at least the tests run MUCH slower with a worker (though I have not done much profiling yet). It seems that workers are much less lightweight than threads in Rust, which tempts me to parallelize in Rust instead, as you implemented, and accept that the WASM build will remain somewhat slower for now.

@lukastaegert
Member

lukastaegert commented Oct 21, 2023

Ok, to be more precise, the worker approach is much slower for very small builds, as you would have in a test: a build of 40ms now takes 80ms for me. On the other hand, using the "ten times three.js" benchmark, the worker approach is still considerably faster for me, around 10%. So it seems there is just an initial overhead for workers that cannot be ignored.
Trying the same with the Rust-parallelized build, I see no slow-down for the "small" build (on the contrary), while I see similar performance improvements for the large build. So for now, I would prefer your version.
We can still think if we want the worker for the WASM build, but it is much trickier to handle, as you need to make sure the worker is torn down properly in order for rollup to terminate gracefully.

@sapphi-red
Contributor Author

We can still think if we want the worker for the WASM build, but it is much trickier to handle, as you need to make sure the worker is torn down properly in order for rollup to terminate gracefully.

Would a different interface make it easier to handle? This interface would work for Vite.

// Reference-count users of a single shared parse worker.
let workerRefCount = 0
let worker: Worker | undefined

const getWorker = () => {
	workerRefCount++
	// Lazily create the worker on first use and reuse it afterwards.
	worker ??= new Worker('/path/to/parseWorker')
	return worker
}
const stopWorker = () => {
	workerRefCount--
	if (workerRefCount === 0) {
		// The last user released the parser: terminate so the process can exit.
		worker?.terminate()
		worker = undefined
	}
}

export async function createAsyncParser() {
	const w = getWorker()
	const parseAsync = async (
		code: string,
		allowReturnOutsideFunction: boolean,
		_signal?: AbortSignal | undefined | null
	) =>
		// `w.parse` stands in for a message round-trip to the parse worker.
		w.parse(code, allowReturnOutsideFunction)
	const stop = () => {
		stopWorker()
	}
	// warn for node if the process exited with a non-zero exit code without calling `stop`?

	return { parse: parseAsync, stop }
}
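
For completeness, a hedged example of how a consumer might use this proposed (not yet existing) interface:

const code = 'export default 1' // hypothetical input

const parser = await createAsyncParser()
try {
	const ast = await parser.parse(code, false)
	// ... hand the AST to the rest of the build ...
} finally {
	// Releasing the last reference terminates the shared worker so the
	// process can exit cleanly.
	parser.stop()
}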

@lukastaegert
Member

The problem that Rollup is facing is that it does not fully know when it can terminate the worker. Usually, you can throw away the worker after the build phase. However, there are some edge cases where you still need to do parsing during the generate phase. But we cannot know beforehand if and how many outputs will be generated. We could also tie it to the closeBundle hook, but then it is up to the user to trigger that hook. I think I would stick with the asynchronous Rust approach for now and revisit the topic once more code has been ported to Rust so that parallelisation has more of an impact.

@lukastaegert lukastaegert merged commit 49b57c2 into rollup:master Oct 31, 2023
26 of 27 checks passed
@github-actions

This PR has been released as part of rollup@4.2.0. You can test it via npm install rollup.
