-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid unnecessary re-reading of streams #4405
Avoid unnecessary re-reading of streams #4405
Conversation
|
||
DecodeStream.call(this); | ||
} | ||
|
||
JpxStream.prototype = Object.create(DecodeStream.prototype); | ||
|
||
JpxStream.prototype.__defineGetter__('bytes', function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use:
Object.defineProperty(JpxStream.prototype, 'bytes', {
get: function JpxStream_bytes() {
var bytes = this.stream.getBytes(this.length);
return shadow(this, 'bytes', bytes);
},
configurable: true
});
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I understand correctly, that creates a getter called 'bytes', and the first time it's called the getter will be replaced with a read-only property? Is there an advantage doing it that way compared to the way I've done it?
I changed how the getters are defined. |
/botio test |
From: Bot.io (Linux)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://107.21.233.14:8877/1aaf341b354758c/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://107.22.172.223:8877/39942240785f43e/output.txt |
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/1aaf341b354758c/output.txt Total script time: 26.31 mins
|
From: Bot.io (Windows)FailedFull output at http://107.22.172.223:8877/39942240785f43e/output.txt Total script time: 36.65 mins
Image differences available at: http://107.22.172.223:8877/39942240785f43e/reftest-analyzer.html#web=eq.log |
|
||
var b; | ||
while (codeSize < bits) { | ||
if (typeof (b = bytes[bytesPos++]) == 'undefined') | ||
if (typeof (b = str.getByte()) === 'undefined') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getByte() returns -1 if byte after eof is requested
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yes. I'll fix, but I probably won't get to it until early next week.
I guess this means there aren't any tests with invalid FlateDecode streams in them :(
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that's just mean we have tests only for valid/complete DEFLATE streams -- these have valid EOF marked.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right :) the same you just said, sorry
/botio-linux preview |
From: Bot.io (Linux)ReceivedCommand cmd_preview from @timvandermeij received. Current queue size: 0 Live output at: http://107.21.233.14:8877/b4d80886a9a7f03/output.txt |
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/b4d80886a9a7f03/output.txt Total script time: 0.35 mins Published
|
Awesome! Major difference indeed. |
|
||
DecodeStream.call(this); | ||
} | ||
|
||
JpegStream.prototype = Object.create(DecodeStream.prototype); | ||
|
||
Object.defineProperty(JpegStream.prototype, 'bytes', { | ||
get: function JpegStream_bytes() { | ||
return shadow(this, 'bytes', this.stream.getBytes(this.length)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add shadow
to the list of globals to fix the lint errors.
I fixed the eof/-1 confusion, and tested the new code worked by deliberately corrupting a FlateStream and checking that the error message was the same. I also fixed the jshint warning. |
Beside the very nice performance improvement and memory reduction that this patch provides, it might also address #4084 at the same time. /botio test |
From: Bot.io (Linux)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://107.21.233.14:8877/28348dd1476303a/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://107.22.172.223:8877/ebc37f1ad12b97a/output.txt |
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/28348dd1476303a/output.txt Total script time: 26.35 mins
|
From: Bot.io (Windows)FailedFull output at http://107.22.172.223:8877/ebc37f1ad12b97a/output.txt Total script time: 36.60 mins
Image differences available at: http://107.22.172.223:8877/ebc37f1ad12b97a/reftest-analyzer.html#web=eq.log |
@@ -201,6 +201,7 @@ var ChunkedStream = (function ChunkedStreamClosure() { | |||
} | |||
return missingChunks; | |||
}; | |||
this.ensureRange(start, start + length); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer a solution that does not use ensureRange. EnsureRange throws an exception when any pieces are missing so any function that calls makeSubstream must now handle this exception. Would it be enough to just fire of a request for all the missing chunks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think removing the ensureRange() call is a bad idea. For one, the subsequent patch isn't valid -- it cause the sewer PDF to hang.
The whole point of the patch is to avoid doing any work that might subsequently be thrown away. Your solution would be a partial fix -- depending on the load speed, those requests might be fulfilled by the time we're parsing. But they might not, so you could still end up doing wasted work.
There aren't that many calls to makeSubStream() anyway.
By checking if the data is all present before making a substream, we avoid cases where we parse part of a stream and then throw a MissingDataException part-way through, which forces us to later re-read the stream -- possibly multiple times. This is a sizeable performance win for some cases when file loading is slow (e.g. over the web).
This avoids lots of unnecessary work when such streams are referred to via fetch(), and so their bytes aren't subsequently read. This is a large performance win on some files.
I moved the ensureRange() call earlier in the function, per bdahl's suggestion on IRC. |
/botio test |
From: Bot.io (Linux)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://107.21.233.14:8877/f5a835cafb19526/output.txt |
From: Bot.io (Windows)ReceivedCommand cmd_test from @Snuffleupagus received. Current queue size: 0 Live output at: http://107.22.172.223:8877/ec2e68d99362d5d/output.txt |
From: Bot.io (Linux)SuccessFull output at http://107.21.233.14:8877/f5a835cafb19526/output.txt Total script time: 26.34 mins
|
From: Bot.io (Windows)FailedFull output at http://107.22.172.223:8877/ec2e68d99362d5d/output.txt Total script time: 36.54 mins
Image differences available at: http://107.22.172.223:8877/ec2e68d99362d5d/reftest-analyzer.html#web=eq.log |
Avoid unnecessary re-reading of streams
The first two patches in this pull request reduce the load time of http://www.minotnd.org/pdf/engineering/sanitary%20sewer/24x36_ntrunk.pdf (when loaded through a local web server rather than directly from file), from ~28 seconds to ~6 seconds, and reduce peak memory consumption from ~405 MiB to ~355 MiB.
The third patch doesn't have any effect on the load time/memory consumption of the files I tested, but it's certainly conceivable that there are files for which it would help. Getting all of a stream's bytes up-front is an anti-pattern that's worth avoiding in general.