Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Transfer, rather than copy, CMap data to the worker-thread #11118

Merged

Conversation

Snuffleupagus
Copy link
Collaborator

It recently occurred to me that the CMap data should be an excellent candidate for transfering.
This will help reduce peak memory usage for PDF documents using CMaps, since transfering of data avoids duplicating it on both the main- and worker-threads.

Unfortunately it's not possible to actually transfer data when returning data through sendWithPromise, and another solution had to be used.
Initially I looked at using one message for requesting the data, and another message for returning the actual CMap data. While that should have worked, it would have meant adding a lot more complexity particularly on the worker-thread.
Hence the simplest solution, at least in my opinion, is to utilize sendWithStream since that makes it really easy to transfer the CMap data. (This required PR #11115 to land first, since otherwise CMap fetch errors won't propagate correctly to the worker-thread.)

Please note that the patch purposely only changes the API to Worker communication, and not the API itself since changing the interface of CMapReaderFactory would be a breaking change.
Furthermore, given the relatively small size of the .bcmap files (the largest one is smaller than the default range-request size) streaming doesn't really seem necessary either.

It recently occurred to me that the CMap data should be an excellent candidate for transfering.
This will help reduce peak memory usage for PDF documents using CMaps, since transfering of data avoids duplicating it on both the main- and worker-threads.

Unfortunately it's not possible to actually transfer data when *returning* data through `sendWithPromise`, and another solution had to be used.
Initially I looked at using one message for requesting the data, and another message for returning the actual CMap data. While that should have worked, it would have meant adding a lot more complexity particularly on the worker-thread.
Hence the simplest solution, at least in my opinion, is to utilize `sendWithStream` since that makes it *really* easy to transfer the CMap data. (This required PR 11115 to land first, since otherwise CMap fetch errors won't propagate correctly to the worker-thread.)

Please note that the patch *purposely* only changes the API to Worker communication, and not the API *itself* since changing the interface of `CMapReaderFactory` would be a breaking change.
Furthermore, given the relatively small size of the `.bcmap` files (the largest one is smaller than the default range-request size) streaming doesn't really seem necessary either.
@Snuffleupagus
Copy link
Collaborator Author

/botio test

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2019

From: Bot.io (Linux m4)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/f5a393c4d79b04f/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2019

From: Bot.io (Windows)


Received

Command cmd_test from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/3e13d34d39106f0/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2019

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/f5a393c4d79b04f/output.txt

Total script time: 17.61 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@pdfjsbot
Copy link

pdfjsbot commented Sep 4, 2019

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/3e13d34d39106f0/output.txt

Total script time: 25.97 mins

  • Font tests: Passed
  • Unit tests: Passed
  • Regression tests: Passed

@timvandermeij
Copy link
Contributor

I think this is a really good idea. Do you perhaps have measurements of the peak memory usage before/after this patch, or is that difficult to do?

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Sep 5, 2019

Do you perhaps have measurements of the peak memory usage before/after this patch, or is that difficult to do?

I'd expect that the amount of allocated memory is reduced by (approximately) the total size of the loaded CMap files, with e.g. mao.pdf from the test-suite we're loading approx. 40 kB of CMap data.
When looking at about:memory it appears that the size of the ArrayBuffer allocations are reduced by the expected amount, but please keep in mind that I'm really not very knowledgeable about these matters (i.e. measuring detailed memory usage).

@timvandermeij timvandermeij merged commit 37d5b80 into mozilla:master Sep 6, 2019
@timvandermeij
Copy link
Contributor

Nice work; thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants