-
Notifications
You must be signed in to change notification settings - Fork 9.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Transfer, rather than copy, CMap data to the worker-thread #11118
Transfer, rather than copy, CMap data to the worker-thread #11118
Conversation
It recently occurred to me that the CMap data should be an excellent candidate for transfering. This will help reduce peak memory usage for PDF documents using CMaps, since transfering of data avoids duplicating it on both the main- and worker-threads. Unfortunately it's not possible to actually transfer data when *returning* data through `sendWithPromise`, and another solution had to be used. Initially I looked at using one message for requesting the data, and another message for returning the actual CMap data. While that should have worked, it would have meant adding a lot more complexity particularly on the worker-thread. Hence the simplest solution, at least in my opinion, is to utilize `sendWithStream` since that makes it *really* easy to transfer the CMap data. (This required PR 11115 to land first, since otherwise CMap fetch errors won't propagate correctly to the worker-thread.) Please note that the patch *purposely* only changes the API to Worker communication, and not the API *itself* since changing the interface of `CMapReaderFactory` would be a breaking change. Furthermore, given the relatively small size of the `.bcmap` files (the largest one is smaller than the default range-request size) streaming doesn't really seem necessary either.
/botio test |
From: Bot.io (Linux m4)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.67.70.0:8877/f5a393c4d79b04f/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://54.215.176.217:8877/3e13d34d39106f0/output.txt |
From: Bot.io (Linux m4)SuccessFull output at http://54.67.70.0:8877/f5a393c4d79b04f/output.txt Total script time: 17.61 mins
|
From: Bot.io (Windows)SuccessFull output at http://54.215.176.217:8877/3e13d34d39106f0/output.txt Total script time: 25.97 mins
|
I think this is a really good idea. Do you perhaps have measurements of the peak memory usage before/after this patch, or is that difficult to do? |
I'd expect that the amount of allocated memory is reduced by (approximately) the total size of the loaded CMap files, with e.g. |
Nice work; thanks! |
It recently occurred to me that the CMap data should be an excellent candidate for transfering.
This will help reduce peak memory usage for PDF documents using CMaps, since transfering of data avoids duplicating it on both the main- and worker-threads.
Unfortunately it's not possible to actually transfer data when returning data through
sendWithPromise
, and another solution had to be used.Initially I looked at using one message for requesting the data, and another message for returning the actual CMap data. While that should have worked, it would have meant adding a lot more complexity particularly on the worker-thread.
Hence the simplest solution, at least in my opinion, is to utilize
sendWithStream
since that makes it really easy to transfer the CMap data. (This required PR #11115 to land first, since otherwise CMap fetch errors won't propagate correctly to the worker-thread.)Please note that the patch purposely only changes the API to Worker communication, and not the API itself since changing the interface of
CMapReaderFactory
would be a breaking change.Furthermore, given the relatively small size of the
.bcmap
files (the largest one is smaller than the default range-request size) streaming doesn't really seem necessary either.