Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race condition causes annotations to sometimes fail to anchor in PDFs #1330

Closed
robertknight opened this issue Aug 27, 2019 · 2 comments · Fixed by #1352
Closed

Race condition causes annotations to sometimes fail to anchor in PDFs #1330

robertknight opened this issue Aug 27, 2019 · 2 comments · Fixed by #1352
Assignees

Comments

@robertknight
Copy link
Member

Steps to reproduce

Go to https://via.hypothes.is/https://www.nsqol.org/wp-content/uploads/2019/02/National-Standards-for-Quality-Online-Teaching.pdf and select the "Public" group

Expected result

There is an annotation on the last page (https://hypothes.is/a/LJMxNsjFEem55S9pNtX0Aw). It should always anchor.

Actual result

Testing in Chrome 78.0.3887.7, I see the annotation fail to anchor once every few attempts.

Inspecting the store state in the sidebar showed that no anchoring timeout had occurred but instead the annotation had just failed to anchor:

Screenshot 2019-08-27 13 27 48

Adding a breakpoint in the client code in annotator.bundle.js where anchoring failures are handled (the catch clause in response to a failed anchor - see code), I got the following exception:

Screenshot 2019-08-27 13 36 05

Which points to pdfPage being undefined here:

.pdfPage.getTextContent({

Looking at the implementation of the getPageView method in PDF.js which returns the object that is expected to have a pdfPage property, it looks like there are indeed certain states when that method could return undefined. Reading through the implementation, it looks like we should be using pdfDocument.getPage whenever we want a reference to a page, which returns a Promise<PDFPageProxy> on which we can call getTextContent directly.

@robertknight
Copy link
Member Author

Assigning this to myself since I'd already started looking into solutions.

@robertknight
Copy link
Member Author

To reproduce the issue with the client and Via locally more easily, apply the following diff to Via:

diff --git a/static/pdfjs/web/viewer.js b/static/pdfjs/web/viewer.js
index ca12733..4d501ab 100644
--- a/static/pdfjs/web/viewer.js
+++ b/static/pdfjs/web/viewer.js
@@ -4304,9 +4304,14 @@ var PDFViewer = (function pdfViewer() {
       var firstPagePromise = pdfDocument.getPage(1);
       this.firstPagePromise = firstPagePromise;
 
+      const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
+
       // Fetch a single page so we can get a viewport that will be the default
       // viewport for all pages
-      return firstPagePromise.then(function(pdfPage) {
+      return firstPagePromise.then(async function(pdfPage) {
+        // TESTING
+        await delay(500);
+
         var scale = this._currentScale || 1.0;
         var viewport = pdfPage.getViewport(scale * CSS_UNITS);
         for (var pageNum = 1; pageNum <= pagesCount; ++pageNum) {

Note that this will cause a couple of unrelated errors in the console and the PDF page will not render immediately until you scroll it a bit, but it demonstrates the orphaning issue.

robertknight added a commit that referenced this issue Sep 5, 2019
Fix a race condition that could occur when the client tried to anchor
annotations in a PDF while the PDF was still loading.

PDF.js' `PDFViewer` class initializes its `PDFPageView` objects, as
returned by `PDFViewer.getPageView`, asynchronously. While the page
views are being initialized, `PDFViewer.pagesCount` returns 0 and
`PDFViewer.getPageView` will return a nullish value. When loading has
progressed further, `getPageView` returns a `PDFPageView` but with no
`pdfPage` property. Once the page views are fully ready, a bubbling
"pagesloaded" event is dispatched at the PDF viewer's container DOM
element.

The PDF anchoring code previously assumed that page views were always
immediately available and they always had a `pdfPage` property. If this
was not the case, anchoring would fail and all/many annotations would
appear as orphans in the client. See the associated issue for ways to
reproduce this locally with Via.

Resolve the issue by making the `getPageView` helper which wraps
`PDFViewer.getPageView` async, and block in that function until the page
views are fully ready.

 - Make `getPageView` helper async and block if necessary until page
   views are ready.
 - Add a test to simulate scenarios where page views are not ready yet
 - Change the `quotePositionCache` cache to store page indexes rather
   than `PDFPageView` objects. This means that less code in the file
   needs to know about `PDFPageView` objects, and deal with the fact
   that they may not be immediately available. It also means it should
   be easier to adapt the code to future PDF.js API changes.

Fixes #1330
robertknight added a commit that referenced this issue Sep 5, 2019
Fix a race condition that could occur when the client tried to anchor
annotations in a PDF while the PDF was still loading.

PDF.js' `PDFViewer` class initializes its `PDFPageView` objects, as
returned by `PDFViewer.getPageView`, asynchronously. While the page
views are being initialized, `PDFViewer.pagesCount` returns 0 and
`PDFViewer.getPageView` will return a nullish value. When loading has
progressed further, `getPageView` returns a `PDFPageView` but with no
`pdfPage` property. Once the page views are fully ready, a bubbling
"pagesloaded" event is dispatched at the PDF viewer's container DOM
element.

The PDF anchoring code previously assumed that page views were always
immediately available and they always had a `pdfPage` property. If this
was not the case, anchoring would fail and all/many annotations would
appear as orphans in the client. See the associated issue for ways to
reproduce this locally with Via.

Resolve the issue by making the `getPageView` helper which wraps
`PDFViewer.getPageView` async, and block in that function until the page
views are fully ready.

 - Make `getPageView` helper async and block if necessary until page
   views are ready.
 - Add a test to simulate scenarios where page views are not ready yet
 - Change the `quotePositionCache` cache to store page indexes rather
   than `PDFPageView` objects. This means that less code in the file
   needs to know about `PDFPageView` objects, and deal with the fact
   that they may not be immediately available. It also means it should
   be easier to adapt the code to future PDF.js API changes.

Fixes #1330
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants