Export the "raw" toUnicode-data from PartialEvaluator.preEvaluateFont
#13354
Conversation
From: Bot.io (Linux m4): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.67.70.0:8877/b130e0f9786275a/output.txt

From: Bot.io (Windows): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://3.101.106.178:8877/92dbee648649bb4/output.txt

From: Bot.io (Linux m4): Failed. Full output at http://54.67.70.0:8877/b130e0f9786275a/output.txt. Total script time: 25.87 mins.
Image differences available at: http://54.67.70.0:8877/b130e0f9786275a/reftest-analyzer.html#web=eq.log

From: Bot.io (Windows): Failed. Full output at http://3.101.106.178:8877/92dbee648649bb4/output.txt. Total script time: 28.97 mins.
Image differences available at: http://3.101.106.178:8877/92dbee648649bb4/reftest-analyzer.html#web=eq.log
Force-pushed from 94df095 to 7435cb9.
…uateFont` Rather than re-fetching/re-parsing these properties immediately in `PartialEvaluator.translateFont`, we can simply export them instead. (Obviously the effect will be really tiny, but there is less parsing overall this way.)
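The pattern described above, letting a pre-evaluation step export the properties it has already fetched so that a later step can reuse them, can be sketched roughly as follows. All names here (`preEvaluateFont`, `translateFont`, the `Map`-based dictionary stand-in, and the `FirstChar`/`LastChar` properties) are illustrative simplifications, not the actual pdf.js implementation:

```javascript
// Sketch: a pre-evaluation step fetches properties once and exports them
// in its result, so that the later translation step can simply reuse them
// instead of re-fetching/re-parsing from the dictionary.

function preEvaluateFont(dict) {
  // Fetch once, during pre-evaluation...
  const firstChar = dict.get("FirstChar") ?? 0;
  const lastChar = dict.get("LastChar") ?? 255;
  // ...and export the values as part of the pre-evaluation result.
  return { firstChar, lastChar };
}

function translateFont(dict, { firstChar, lastChar }) {
  // Reuse the exported values; no second dict.get() round-trip needed.
  return { firstChar, lastChar, glyphCount: lastChar - firstChar + 1 };
}

// Minimal stand-in for a PDF dictionary.
const dict = new Map([
  ["FirstChar", 32],
  ["LastChar", 126],
]);
const hints = preEvaluateFont(dict);
const font = translateFont(dict, hints);
console.log(font.glyphCount); // 95
```

As the commit message notes, the per-property saving is tiny; the point is simply that each property is parsed once rather than twice.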
…ont` Compared to other data-structures, such as e.g. `Dict`s, we're purposely *not* caching Streams on the `XRef`-instance.[1]

The, somewhat unfortunate, effect of Streams not being cached is that repeatedly getting the *same* Stream-data requires re-parsing/re-initializing of a bunch of data; see `XRef.fetch` and related methods.

For the font-parsing in particular, we're currently fetching the `toUnicode`-data, which is very often a Stream, in `PartialEvaluator.preEvaluateFont` and then *again* in `PartialEvaluator.extractDataStructures` soon afterwards. By instead letting `PartialEvaluator.preEvaluateFont` export the "raw" `toUnicode`-data, we can avoid *some* unnecessary re-parsing/re-initializing when handling fonts.

*Please note:* In this particular case, given that `PartialEvaluator.preEvaluateFont` only accesses the "raw" `toUnicode` data, exporting a Stream should be safe.

---

[1] The reasons for this include:
- Streams, especially `DecodeStream`-instances, can become *very* large once read. Hence caching them really isn't a good idea, simply because of the (potential) memory impact of doing so.
- Attempting to read from the *same* Stream-instance more than once won't work, unless it's `reset` in between, since using any method such as e.g. `getBytes` always starts at the current data position.
- Given that parsing, even in the worker-thread, is now fairly asynchronous, it's generally impossible to assert that any one Stream-instance isn't being accessed "concurrently" by e.g. different `getOperatorList` calls. Hence `reset`-ing a cached Stream-instance isn't going to work in the general case.
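The second footnote reason, that reading the same Stream-instance twice requires a `reset` in between, can be illustrated with a toy stream. This `ToyStream` class is a simplified stand-in mirroring the pdf.js concepts (`getBytes`, `reset`, an internal read position), not the real implementation:

```javascript
// Toy illustration of why cached Stream-instances are problematic:
// getBytes() advances an internal position, so a second read on the same
// instance yields nothing unless reset() is called in between.

class ToyStream {
  constructor(bytes) {
    this.bytes = bytes;
    this.pos = 0;
  }
  getBytes() {
    // Return everything from the current position onward, then move the
    // position to the end of the data.
    const result = this.bytes.subarray(this.pos);
    this.pos = this.bytes.length;
    return result;
  }
  reset() {
    this.pos = 0;
  }
}

const stream = new ToyStream(new Uint8Array([1, 2, 3]));
console.log(stream.getBytes().length); // 3
console.log(stream.getBytes().length); // 0 -- position is already at the end
stream.reset();
console.log(stream.getBytes().length); // 3 again, after reset()
```

And since several asynchronous consumers may hold the same cached instance, one consumer's `reset` could rewind the stream out from under another, which is the third reason above.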
Force-pushed from 7435cb9 to 6eef69d.
From: Bot.io (Linux m4): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://54.67.70.0:8877/24d2852924f5720/output.txt

From: Bot.io (Windows): Received. Command cmd_test from @Snuffleupagus received. Current queue size: 0. Live output at: http://3.101.106.178:8877/04ec3280681c6fc/output.txt

From: Bot.io (Linux m4): Failed. Full output at http://54.67.70.0:8877/24d2852924f5720/output.txt. Total script time: 25.75 mins.
Image differences available at: http://54.67.70.0:8877/24d2852924f5720/reftest-analyzer.html#web=eq.log

From: Bot.io (Windows): Failed. Full output at http://3.101.106.178:8877/04ec3280681c6fc/output.txt. Total script time: 28.82 mins.
Image differences available at: http://3.101.106.178:8877/04ec3280681c6fc/reftest-analyzer.html#web=eq.log
Looks good to me; thanks!