[Firefox regression] Fix `disableRange=true` bug in `PDFDataTransportStream` #10675

Snuffleupagus · 2019-03-24T12:42:51Z

Currently if trying to set disableRange=true in the built-in PDF Viewer in Firefox, either through about:config or via the URL hash, the PDF document will never load. It appears that this has been broken for a couple of years, without anyone noticing.

Obviously it's not a good idea to set disableRange=true, however it seems that this bug affects the PDF Viewer in Firefox even with default settings:

In the case where initialData already contains the entire file, we're forced to dispatch a range request to re-fetch already available data just so that file loading may complete.
(In the case where the data arrives, via streaming, before being specifically requested through requestDataRange, we're also forced to re-fetch data unnecessarily.) This part was removed, to reduce the scope/risk of the patch somewhat.

In the cases outlined above, we're having to re-fetch already available data thus potentially delaying loading/rendering of PDF files in Firefox (and wasting resources in the process).

/cc @yurydelendik Could you please, time permitting of course, check that this patch makes sense?

Snuffleupagus · 2019-03-24T13:02:03Z

Apparently I'm not really understanding the code well enough to fix this, since one of the existing unit-tests started failing now :-(

Edit: I've also tested setting this._done = true; only in the _queuedChunks case, but while it fixes the test failure it does nothing for the bug which prompted this patch.

pdf.js/test/unit/api_spec.js

Lines 1496 to 1527 in 9b5a937

    
           it('should fetch document info and page using range and streaming', 
        
               function(done) { 
        
             const initialDataLength = 4000; 
        
             let fetches = 0, loadingTask; 
        
             dataPromise.then(function(data) { 
        
               const initialData = data.subarray(0, initialDataLength); 
        
               const transport = new PDFDataRangeTransport(data.length, initialData); 
        
               transport.requestDataRange = function(begin, end) { 
        
                 fetches++; 
        
                 if (fetches === 1) { 
        
                   // Send rest of the data on first range request. 
        
                   transport.onDataProgressiveRead(data.subarray(initialDataLength)); 
        
                 } 
        
                 waitSome(function() { 
        
                   transport.onDataRange(begin, data.subarray(begin, end)); 
        
                 }); 
        
               }; 
        
               loadingTask = getDocument(transport); 
        
               return loadingTask.promise; 
        
             }).then(function(pdfDocument) { 
        
               expect(pdfDocument.numPages).toEqual(14); 
        
               return pdfDocument.getPage(10); 
        
             }).then(function(pdfPage) { 
        
               expect(pdfPage.rotate).toEqual(0); 
        
               expect(fetches).toEqual(1); 
        
               waitSome(function() { 
        
                 loadingTask.destroy().then(done); 
        
               }); 
        
             }).catch(done.fail);

yurydelendik

Really good. Thank you for the patch.

yurydelendik · 2019-03-25T12:42:56Z

Actually let me think more about this PR. As I recall, we cannot trust _contentLength.

yurydelendik · 2019-03-25T12:57:32Z

I spy

pdf.js/src/core/chunked_stream.js

Line 116 in 89da03f

if (end <= this.progressiveDataLength) {

. It means if file is partially loaded, the range request is ignored. I wonder if we can do the same at the display layer, in case if it is just a async timing issue between data arriving to core layer and display one.

Snuffleupagus · 2019-03-25T14:33:02Z

Actually let me think more about this PR. As I recall, we cannot trust _contentLength.

Excellent point about contentLength not being trusted, I should have thought about that myself!

So the current patch obviously won't cut it, as far as solutions go, but I'm wondering if a possible fix would be to dispatch a message (e.g. "progressiveComplete") from https://searchfox.org/mozilla-central/rev/56705678f5fc363be5e0237e1686f619b0d23009/browser/extensions/pdfjs/content/PdfStreamConverter.jsm#626-628 which is then passed to PDFDataTransportStream such that the PDFDataTransportStreamReader._done property can be set?

It means if file is partially loaded, the range request is ignored. I wonder if we can do the same at the display layer, in case if it is just a async timing issue between data arriving to core layer and display one.

Good find, that looks very much relevant here as well. Implementing something similar for PDFDataTransportStream doesn't look too difficult (famous last words) so I can try that later.

Snuffleupagus · 2019-03-26T15:25:34Z

The updated patch seems to work both with disableRange=true in Firefox and with the unit-tests, and doesn't require outright trusting the contentLength. Note that it will also require a few minor, in my opinion at least, changes to the code in mozilla-central; please see the diff below.

@yurydelendik Thanks for helping out here; does this look like a more viable approach to you?
Also, the ideas from #10675 (comment) are now deferred for a later patch, since I didn't want to touch too much code and thus risk regressing file loading in general.

diff --git a/browser/extensions/pdfjs/content/PdfStreamConverter.jsm b/browser/extensions/pdfjs/content/PdfStreamConverter.jsm
--- a/browser/extensions/pdfjs/content/PdfStreamConverter.jsm
+++ b/browser/extensions/pdfjs/content/PdfStreamConverter.jsm
@@ -168,16 +168,19 @@ PdfDataListener.prototype = {
     let writeOffset = 0;
     while (this.buffers.length) {
       let buffer = this.buffers.shift();
       combinedArray.set(buffer, writeOffset);
       writeOffset += buffer.length;
     }
     return combinedArray;
   },
+  get isDone() {
+    return !!this.isDataReady;
+  },
   finish: function PdfDataListener_finish() {
     this.isDataReady = true;
     if (this.oncompleteCallback) {
       this.oncompleteCallback(this.readData());
     }
   },
   error: function PdfDataListener_error(errorCode) {
     this.errorCode = errorCode;
@@ -601,45 +604,53 @@ class RangedChromeActions extends Chrome
     // requests, which we need to abort when we leave the page
     domWindow.addEventListener("unload", function unload(e) {
       domWindow.removeEventListener(e.type, unload);
       self.abortLoading();
     });
   }
 
   initPassiveLoading() {
-    var data;
+    let data, done;
     if (!this.streamingEnabled) {
       this.originalRequest.cancel(Cr.NS_BINDING_ABORTED);
       this.originalRequest = null;
       data = this.dataListener.readData();
+      done = this.dataListener.isDone;
       this.dataListener = null;
     } else {
       data = this.dataListener.readData();
+      done = this.dataListener.isDone;
 
       this.dataListener.onprogress = (loaded, total) => {
         this.domWindow.postMessage({
           pdfjsLoadAction: "progressiveRead",
           loaded,
           total,
           chunk: this.dataListener.readData(),
         }, PDF_VIEWER_ORIGIN);
       };
       this.dataListener.oncomplete = () => {
+        if (!done && this.dataListener.isDone) {
+          this.domWindow.postMessage({
+            pdfjsLoadAction: "progressiveDone",
+          }, PDF_VIEWER_ORIGIN);
+        }
         this.dataListener = null;
       };
     }
 
     this.domWindow.postMessage({
       pdfjsLoadAction: "supportsRangedLoading",
       rangeEnabled: this.rangeEnabled,
       streamingEnabled: this.streamingEnabled,
       pdfUrl: this.pdfUrl,
       length: this.contentLength,
       data,
+      done,
     }, PDF_VIEWER_ORIGIN);
 
     return true;
   }
 
   requestDataRange(args) {
     if (!this.rangeEnabled) {
       return;

Snuffleupagus · 2019-03-26T15:31:27Z

/botio unittest

pdfjsbot · 2019-03-26T15:31:29Z

From: Bot.io (Windows)

Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.215.176.217:8877/51aec803513ac1b/output.txt

pdfjsbot · 2019-03-26T15:31:29Z

From: Bot.io (Linux m4)

Received

Command cmd_unittest from @Snuffleupagus received. Current queue size: 0

Live output at: http://54.67.70.0:8877/d6a084b392613fa/output.txt

…Stream` Currently if trying to set `disableRange=true` in the built-in PDF Viewer in Firefox, either through `about:config` or via the URL hash, the PDF document will never load. It appears that this has been broken for a couple of years, without anyone noticing. Obviously it's not a good idea to set `disableRange=true`, however it seems that this bug affects the PDF Viewer in Firefox even with default settings: - In the case where `initialData` already contains the *entire* file, we're forced to dispatch a range request to re-fetch already available data just so that file loading may complete. - (In the case where the data arrives, via streaming, before being specifically requested through `requestDataRange`, we're also forced to re-fetch data unnecessarily.) *This part was removed, to reduce the scope/risk of the patch somewhat.* In the cases outlined above, we're having to re-fetch already available data thus potentially delaying loading/rendering of PDF files in Firefox (and wasting resources in the process).

pdfjsbot · 2019-03-26T15:34:24Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/d6a084b392613fa/output.txt

Total script time: 2.86 mins

Unit Tests: Passed

pdfjsbot · 2019-03-26T15:37:21Z

From: Bot.io (Windows)

Success

Full output at http://54.215.176.217:8877/51aec803513ac1b/output.txt

Total script time: 5.86 mins

Unit Tests: Passed

yurydelendik

Approach is right. (The end goal was to have FirefoxCom match Streams API). Let's do it as you implemented for now. Thanks.

timvandermeij · 2019-03-29T22:12:13Z

Have the changes in mozilla-central already been made, since I think that that's a requirement for merging this, or is that not strictly necessary to land this here?

Snuffleupagus · 2019-03-29T22:15:28Z

Have the changes in mozilla-central already been made, since I think that that's a requirement for merging this, or is that not strictly necessary to land this here?

Ideally, they should probably land simultaneously with the changes being made here.

Also, I'm no longer able to submit patches for mozilla-central, since you now need a 2FA device to do so. Currently I don't have a device that I'm able to use for this purpose, and having to essentially pay for the "privilege" of being able to submit a patch I've written in my spare time just feels wrong to me :-(

timvandermeij · 2019-03-29T22:34:39Z

Also, I'm no longer able to submit patches for mozilla-central, since you now need a 2FA device to do so.

It's also (usually?) possible for 2FA that you're presented with a secret key (I believe it's 16 characters) so that you can use, for example, Google Authenticator from a desktop. Refer to https://www.labnol.org/internet/google-authenticator-for-desktop/25341/ and https://github.com/grahammitchell/google-authenticator. Perhaps you already thought of such a solution though, but I thought I'd just mention it since it might help out with not needing an actual device.

Snuffleupagus · 2019-03-30T07:55:18Z

Perhaps you already thought of such a solution though, but I thought I'd just mention it since it might help out with not needing an actual device.

While that ought to work, and there's even e.g. a Firefox addon that could probably be used, doesn't it sort of negate the whole point of 2FA by using one and the same device though?
Anyway, I really appreciate your suggestions here!

timvandermeij · 2019-03-30T13:25:53Z

doesn't it sort of negate the whole point of 2FA by using one and the same device though

Not necessarily. Two-factor authentication is nothing more than adding an additional step to the login process. It doesn't actually have to be another device, it should just be another type of data that someone else cannot obtain (easily). From https://en.wikipedia.org/wiki/Wikipedia:Simple_2FA: "You can use apps like WinAuth and KeeWeb to handle 2FA tokens on a desktop computer. This is the recommended way to use 2FA if you don't have a smartphone or tablet computer". I would highly recommend saving the 2FA secret in a password manager so it's properly encrypted. If your username/password would become public after a database breach, the 2FA token would still prevent people from logging in to your account.

Snuffleupagus · 2019-04-04T14:48:44Z

Have the changes in mozilla-central already been made, since I think that that's a requirement for merging this, or is that not strictly necessary to land this here?

Looking at this again I don't believe that needs to happen first, since the only thing that will happen without the PdfStreamConverter.jsm changes is that the disableRange regression simply won't be fixed yet.

@timvandermeij Mind landing this to get the ball rolling; thanks!
(Also, please feel free to mark a couple of my comments above as off-topic if you like.)

@rvandermeulen Could I please trouble you to include the following diff, https://gist.github.com/Snuffleupagus/960c70fe1775af04f8711f1699580a94, when the changes from this PR are uplifted to mozilla-central next?
(I'm still struggling with the whole Phabricator situation, and the Windows setup is "interesting" to say the least, so I'm unsure when I'll be able to submit patches myself.)

rvandermeulen · 2019-04-04T14:54:43Z

Sure thing! FWIW, I've been using https://bitbucket.org/kmaglione/hgext/src/default/phabricator.py on Windows rather than messing with arc and it works well. Feel free to drop me a line on email or IRC if you need help setting it up.

timvandermeij · 2019-04-04T21:05:10Z

/botio-linux preview

pdfjsbot · 2019-04-04T21:05:11Z

From: Bot.io (Linux m4)

Received

Command cmd_preview from @timvandermeij received. Current queue size: 0

Live output at: http://54.67.70.0:8877/0c1279cc15b084e/output.txt

pdfjsbot · 2019-04-04T21:07:02Z

From: Bot.io (Linux m4)

Success

Full output at http://54.67.70.0:8877/0c1279cc15b084e/output.txt

Total script time: 1.84 mins

Published

Viewer: http://54.67.70.0:8877/0c1279cc15b084e/web/viewer.html

timvandermeij · 2019-04-04T21:07:53Z

Thank you!

Snuffleupagus force-pushed the PDFDataTransportStream-disableRange branch from 1e449bc to 9592e3e Compare March 24, 2019 12:46

timvandermeij added the core label Mar 24, 2019

timvandermeij requested a review from yurydelendik March 24, 2019 12:46

timvandermeij added regression firefox-specific and removed firefox-specific labels Mar 24, 2019

Snuffleupagus force-pushed the PDFDataTransportStream-disableRange branch 2 times, most recently from 080ca42 to 89da03f Compare March 24, 2019 15:50

yurydelendik approved these changes Mar 25, 2019

View reviewed changes

yurydelendik self-requested a review March 25, 2019 12:57

Snuffleupagus force-pushed the PDFDataTransportStream-disableRange branch from 89da03f to 0f14872 Compare March 26, 2019 15:11

Snuffleupagus force-pushed the PDFDataTransportStream-disableRange branch from 0f14872 to bb384dd Compare March 26, 2019 15:34

yurydelendik approved these changes Mar 26, 2019

View reviewed changes

Snuffleupagus mentioned this pull request Mar 28, 2019

Avoid dispatching range requests to fetch PDF data that's already loaded with streaming (PR 10675 follow-up) #10694

Merged

timvandermeij merged commit 072c586 into mozilla:master Apr 4, 2019

timvandermeij removed the regression label Apr 4, 2019

Snuffleupagus deleted the PDFDataTransportStream-disableRange branch April 5, 2019 09:30

Snuffleupagus mentioned this pull request Apr 6, 2019

[Firefox] Ensure that loading progress is reported, and the loadingBar updated, when disableRange=true is set #10714

Merged

Snuffleupagus mentioned this pull request Nov 1, 2021

[Firefox] Handle errors if loading failed before the "supportsRangedLoading" message was sent (bug 1732141) #14217

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Firefox regression] Fix `disableRange=true` bug in `PDFDataTransportStream` #10675

[Firefox regression] Fix `disableRange=true` bug in `PDFDataTransportStream` #10675

Snuffleupagus commented Mar 24, 2019 •

edited

Loading

Snuffleupagus commented Mar 24, 2019 •

edited

Loading

yurydelendik left a comment

yurydelendik commented Mar 25, 2019

yurydelendik commented Mar 25, 2019

Snuffleupagus commented Mar 25, 2019

Snuffleupagus commented Mar 26, 2019

Snuffleupagus commented Mar 26, 2019

pdfjsbot commented Mar 26, 2019

pdfjsbot commented Mar 26, 2019

pdfjsbot commented Mar 26, 2019

pdfjsbot commented Mar 26, 2019

yurydelendik left a comment

timvandermeij commented Mar 29, 2019

Snuffleupagus commented Mar 29, 2019 •

edited

Loading

timvandermeij commented Mar 29, 2019 •

edited

Loading

Snuffleupagus commented Mar 30, 2019 •

edited

Loading

timvandermeij commented Mar 30, 2019 •

edited

Loading

Snuffleupagus commented Apr 4, 2019

rvandermeulen commented Apr 4, 2019

timvandermeij commented Apr 4, 2019

pdfjsbot commented Apr 4, 2019

pdfjsbot commented Apr 4, 2019

timvandermeij commented Apr 4, 2019

[Firefox regression] Fix disableRange=true bug in PDFDataTransportStream #10675

[Firefox regression] Fix disableRange=true bug in PDFDataTransportStream #10675

Conversation

Snuffleupagus commented Mar 24, 2019 • edited Loading

Snuffleupagus commented Mar 24, 2019 • edited Loading

yurydelendik left a comment

Choose a reason for hiding this comment

yurydelendik commented Mar 25, 2019

yurydelendik commented Mar 25, 2019

Snuffleupagus commented Mar 25, 2019

Snuffleupagus commented Mar 26, 2019

Snuffleupagus commented Mar 26, 2019

pdfjsbot commented Mar 26, 2019

From: Bot.io (Windows)

Received

pdfjsbot commented Mar 26, 2019

From: Bot.io (Linux m4)

Received

pdfjsbot commented Mar 26, 2019

From: Bot.io (Linux m4)

Success

pdfjsbot commented Mar 26, 2019

From: Bot.io (Windows)

Success

yurydelendik left a comment

Choose a reason for hiding this comment

timvandermeij commented Mar 29, 2019

Snuffleupagus commented Mar 29, 2019 • edited Loading

timvandermeij commented Mar 29, 2019 • edited Loading

Snuffleupagus commented Mar 30, 2019 • edited Loading

timvandermeij commented Mar 30, 2019 • edited Loading

Snuffleupagus commented Apr 4, 2019

rvandermeulen commented Apr 4, 2019

timvandermeij commented Apr 4, 2019

pdfjsbot commented Apr 4, 2019

From: Bot.io (Linux m4)

Received

pdfjsbot commented Apr 4, 2019

From: Bot.io (Linux m4)

Success

Published

timvandermeij commented Apr 4, 2019

[Firefox regression] Fix `disableRange=true` bug in `PDFDataTransportStream` #10675

[Firefox regression] Fix `disableRange=true` bug in `PDFDataTransportStream` #10675

Snuffleupagus commented Mar 24, 2019 •

edited

Loading

Snuffleupagus commented Mar 24, 2019 •

edited

Loading

Snuffleupagus commented Mar 29, 2019 •

edited

Loading

timvandermeij commented Mar 29, 2019 •

edited

Loading

Snuffleupagus commented Mar 30, 2019 •

edited

Loading

timvandermeij commented Mar 30, 2019 •

edited

Loading