Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Firefox regression] Fix disableRange=true bug in PDFDataTransportStream #10675

Conversation

Snuffleupagus
Copy link
Collaborator

@Snuffleupagus Snuffleupagus commented Mar 24, 2019

Currently if trying to set disableRange=true in the built-in PDF Viewer in Firefox, either through about:config or via the URL hash, the PDF document will never load. It appears that this has been broken for a couple of years, without anyone noticing.

Obviously it's not a good idea to set disableRange=true, however it seems that this bug affects the PDF Viewer in Firefox even with default settings:

  • In the case where initialData already contains the entire file, we're forced to dispatch a range request to re-fetch already available data just so that file loading may complete.
  • (In the case where the data arrives, via streaming, before being specifically requested through requestDataRange, we're also forced to re-fetch data unnecessarily.) This part was removed, to reduce the scope/risk of the patch somewhat.

In the cases outlined above, we're having to re-fetch already available data thus potentially delaying loading/rendering of PDF files in Firefox (and wasting resources in the process).

/cc @yurydelendik Could you please, time permitting of course, check that this patch makes sense?

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Mar 24, 2019

Apparently I'm not really understanding the code well enough to fix this, since one of the existing unit-tests started failing now :-(

Edit: I've also tested setting this._done = true; only in the _queuedChunks case, but while it fixes the test failure it does nothing for the bug which prompted this patch.

pdf.js/test/unit/api_spec.js

Lines 1496 to 1527 in 9b5a937

it('should fetch document info and page using range and streaming',
function(done) {
const initialDataLength = 4000;
let fetches = 0, loadingTask;
dataPromise.then(function(data) {
const initialData = data.subarray(0, initialDataLength);
const transport = new PDFDataRangeTransport(data.length, initialData);
transport.requestDataRange = function(begin, end) {
fetches++;
if (fetches === 1) {
// Send rest of the data on first range request.
transport.onDataProgressiveRead(data.subarray(initialDataLength));
}
waitSome(function() {
transport.onDataRange(begin, data.subarray(begin, end));
});
};
loadingTask = getDocument(transport);
return loadingTask.promise;
}).then(function(pdfDocument) {
expect(pdfDocument.numPages).toEqual(14);
return pdfDocument.getPage(10);
}).then(function(pdfPage) {
expect(pdfPage.rotate).toEqual(0);
expect(fetches).toEqual(1);
waitSome(function() {
loadingTask.destroy().then(done);
});
}).catch(done.fail);

@Snuffleupagus Snuffleupagus force-pushed the PDFDataTransportStream-disableRange branch 2 times, most recently from 080ca42 to 89da03f Compare March 24, 2019 15:50
Copy link
Contributor

@yurydelendik yurydelendik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really good. Thank you for the patch.

@yurydelendik
Copy link
Contributor

Actually let me think more about this PR. As I recall, we cannot trust _contentLength.

@yurydelendik
Copy link
Contributor

I spy

if (end <= this.progressiveDataLength) {
. It means if file is partially loaded, the range request is ignored. I wonder if we can do the same at the display layer, in case if it is just a async timing issue between data arriving to core layer and display one.

@yurydelendik yurydelendik self-requested a review March 25, 2019 12:57
@Snuffleupagus
Copy link
Collaborator Author

Actually let me think more about this PR. As I recall, we cannot trust _contentLength.

Excellent point about contentLength not being trusted, I should have thought about that myself!

So the current patch obviously won't cut it, as far as solutions go, but I'm wondering if a possible fix would be to dispatch a message (e.g. "progressiveComplete") from https://searchfox.org/mozilla-central/rev/56705678f5fc363be5e0237e1686f619b0d23009/browser/extensions/pdfjs/content/PdfStreamConverter.jsm#626-628 which is then passed to PDFDataTransportStream such that the PDFDataTransportStreamReader._done property can be set?

It means if file is partially loaded, the range request is ignored. I wonder if we can do the same at the display layer, in case if it is just a async timing issue between data arriving to core layer and display one.

Good find, that looks very much relevant here as well. Implementing something similar for PDFDataTransportStream doesn't look too difficult (famous last words) so I can try that later.

@Snuffleupagus Snuffleupagus force-pushed the PDFDataTransportStream-disableRange branch from 89da03f to 0f14872 Compare March 26, 2019 15:11
@Snuffleupagus
Copy link
Collaborator Author

The updated patch seems to work both with disableRange=true in Firefox and with the unit-tests, and doesn't require outright trusting the contentLength. Note that it will also require a few minor, in my opinion at least, changes to the code in mozilla-central; please see the diff below.

@yurydelendik Thanks for helping out here; does this look like a more viable approach to you?
Also, the ideas from #10675 (comment) are now deferred for a later patch, since I didn't want to touch too much code and thus risk regressing file loading in general.

diff --git a/browser/extensions/pdfjs/content/PdfStreamConverter.jsm b/browser/extensions/pdfjs/content/PdfStreamConverter.jsm
--- a/browser/extensions/pdfjs/content/PdfStreamConverter.jsm
+++ b/browser/extensions/pdfjs/content/PdfStreamConverter.jsm
@@ -168,16 +168,19 @@ PdfDataListener.prototype = {
     let writeOffset = 0;
     while (this.buffers.length) {
       let buffer = this.buffers.shift();
       combinedArray.set(buffer, writeOffset);
       writeOffset += buffer.length;
     }
     return combinedArray;
   },
+  get isDone() {
+    return !!this.isDataReady;
+  },
   finish: function PdfDataListener_finish() {
     this.isDataReady = true;
     if (this.oncompleteCallback) {
       this.oncompleteCallback(this.readData());
     }
   },
   error: function PdfDataListener_error(errorCode) {
     this.errorCode = errorCode;
@@ -601,45 +604,53 @@ class RangedChromeActions extends Chrome
     // requests, which we need to abort when we leave the page
     domWindow.addEventListener("unload", function unload(e) {
       domWindow.removeEventListener(e.type, unload);
       self.abortLoading();
     });
   }
 
   initPassiveLoading() {
-    var data;
+    let data, done;
     if (!this.streamingEnabled) {
       this.originalRequest.cancel(Cr.NS_BINDING_ABORTED);
       this.originalRequest = null;
       data = this.dataListener.readData();
+      done = this.dataListener.isDone;
       this.dataListener = null;
     } else {
       data = this.dataListener.readData();
+      done = this.dataListener.isDone;
 
       this.dataListener.onprogress = (loaded, total) => {
         this.domWindow.postMessage({
           pdfjsLoadAction: "progressiveRead",
           loaded,
           total,
           chunk: this.dataListener.readData(),
         }, PDF_VIEWER_ORIGIN);
       };
       this.dataListener.oncomplete = () => {
+        if (!done && this.dataListener.isDone) {
+          this.domWindow.postMessage({
+            pdfjsLoadAction: "progressiveDone",
+          }, PDF_VIEWER_ORIGIN);
+        }
         this.dataListener = null;
       };
     }
 
     this.domWindow.postMessage({
       pdfjsLoadAction: "supportsRangedLoading",
       rangeEnabled: this.rangeEnabled,
       streamingEnabled: this.streamingEnabled,
       pdfUrl: this.pdfUrl,
       length: this.contentLength,
       data,
+      done,
     }, PDF_VIEWER_ORIGIN);
 
     return true;
   }
 
   requestDataRange(args) {
     if (!this.rangeEnabled) {
       return;

@Snuffleupagus
Copy link
Collaborator Author

/botio unittest

@pdfjsbot
Copy link

From: Bot.io (Windows)


Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/51aec803513ac1b/output.txt

@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/d6a084b392613fa/output.txt

…Stream`

Currently if trying to set `disableRange=true` in the built-in PDF Viewer in Firefox, either through `about:config` or via the URL hash, the PDF document will never load. It appears that this has been broken for a couple of years, without anyone noticing.

Obviously it's not a good idea to set `disableRange=true`, however it seems that this bug affects the PDF Viewer in Firefox even with default settings:
 - In the case where `initialData` already contains the *entire* file, we're forced to dispatch a range request to re-fetch already available data just so that file loading may complete.
 - (In the case where the data arrives, via streaming, before being specifically requested through `requestDataRange`, we're also forced to re-fetch data unnecessarily.) *This part was removed, to reduce the scope/risk of the patch somewhat.*

In the cases outlined above, we're having to re-fetch already available data thus potentially delaying loading/rendering of PDF files in Firefox (and wasting resources in the process).
@pdfjsbot
Copy link

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/d6a084b392613fa/output.txt

Total script time: 2.86 mins

  • Unit Tests: Passed

@Snuffleupagus Snuffleupagus force-pushed the PDFDataTransportStream-disableRange branch from 0f14872 to bb384dd Compare March 26, 2019 15:34
@pdfjsbot
Copy link

From: Bot.io (Windows)


Success

Full output at http://54.215.176.217:8877/51aec803513ac1b/output.txt

Total script time: 5.86 mins

  • Unit Tests: Passed

Copy link
Contributor

@yurydelendik yurydelendik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approach is right. (The end goal was to have FirefoxCom match Streams API). Let's do it as you implemented for now. Thanks.

@timvandermeij
Copy link
Contributor

Have the changes in mozilla-central already been made, since I think that that's a requirement for merging this, or is that not strictly necessary to land this here?

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Mar 29, 2019

Have the changes in mozilla-central already been made, since I think that that's a requirement for merging this, or is that not strictly necessary to land this here?

Ideally, they should probably land simultaneously with the changes being made here.

Also, I'm no longer able to submit patches for mozilla-central, since you now need a 2FA device to do so. Currently I don't have a device that I'm able to use for this purpose, and having to essentially pay for the "privilege" of being able to submit a patch I've written in my spare time just feels wrong to me :-(

@timvandermeij
Copy link
Contributor

timvandermeij commented Mar 29, 2019

Also, I'm no longer able to submit patches for mozilla-central, since you now need a 2FA device to do so.

It's also (usually?) possible for 2FA that you're presented with a secret key (I believe it's 16 characters) so that you can use, for example, Google Authenticator from a desktop. Refer to https://www.labnol.org/internet/google-authenticator-for-desktop/25341/ and https://github.com/grahammitchell/google-authenticator. Perhaps you already thought of such a solution though, but I thought I'd just mention it since it might help out with not needing an actual device.

@Snuffleupagus
Copy link
Collaborator Author

Snuffleupagus commented Mar 30, 2019

Perhaps you already thought of such a solution though, but I thought I'd just mention it since it might help out with not needing an actual device.

While that ought to work, and there's even e.g. a Firefox addon that could probably be used, doesn't it sort of negate the whole point of 2FA by using one and the same device though?
Anyway, I really appreciate your suggestions here!

@timvandermeij
Copy link
Contributor

timvandermeij commented Mar 30, 2019

doesn't it sort of negate the whole point of 2FA by using one and the same device though

Not necessarily. Two-factor authentication is nothing more than adding an additional step to the login process. It doesn't actually have to be another device, it should just be another type of data that someone else cannot obtain (easily). From https://en.wikipedia.org/wiki/Wikipedia:Simple_2FA: "You can use apps like WinAuth and KeeWeb to handle 2FA tokens on a desktop computer. This is the recommended way to use 2FA if you don't have a smartphone or tablet computer". I would highly recommend saving the 2FA secret in a password manager so it's properly encrypted. If your username/password would become public after a database breach, the 2FA token would still prevent people from logging in to your account.

@Snuffleupagus
Copy link
Collaborator Author

Have the changes in mozilla-central already been made, since I think that that's a requirement for merging this, or is that not strictly necessary to land this here?

Looking at this again I don't believe that needs to happen first, since the only thing that will happen without the PdfStreamConverter.jsm changes is that the disableRange regression simply won't be fixed yet.

@timvandermeij Mind landing this to get the ball rolling; thanks!
(Also, please feel free to mark a couple of my comments above as off-topic if you like.)


@rvandermeulen Could I please trouble you to include the following diff, https://gist.github.com/Snuffleupagus/960c70fe1775af04f8711f1699580a94, when the changes from this PR are uplifted to mozilla-central next?
(I'm still struggling with the whole Phabricator situation, and the Windows setup is "interesting" to say the least, so I'm unsure when I'll be able to submit patches myself.)

@rvandermeulen
Copy link

Sure thing! FWIW, I've been using https://bitbucket.org/kmaglione/hgext/src/default/phabricator.py on Windows rather than messing with arc and it works well. Feel free to drop me a line on email or IRC if you need help setting it up.

@timvandermeij
Copy link
Contributor

/botio-linux preview

@pdfjsbot
Copy link

pdfjsbot commented Apr 4, 2019

From: Bot.io (Linux m4)


Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/0c1279cc15b084e/output.txt

@pdfjsbot
Copy link

pdfjsbot commented Apr 4, 2019

From: Bot.io (Linux m4)


Success

Full output at http://54.67.70.0:8877/0c1279cc15b084e/output.txt

Total script time: 1.84 mins

Published

@timvandermeij timvandermeij merged commit 072c586 into mozilla:master Apr 4, 2019
@timvandermeij
Copy link
Contributor

Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants