Race condition causes annotations to sometimes fail to anchor in PDFs #1330

robertknight · 2019-08-27T13:15:41Z

Steps to reproduce

Go to https://via.hypothes.is/https://www.nsqol.org/wp-content/uploads/2019/02/National-Standards-for-Quality-Online-Teaching.pdf and select the "Public" group

Expected result

There is an annotation on the last page (https://hypothes.is/a/LJMxNsjFEem55S9pNtX0Aw). It should always anchor.

Actual result

Testing in Chrome 78.0.3887.7, I see the annotation fail to anchor once every few attempts.

Inspecting the store state in the sidebar showed that no anchoring timeout had occurred but instead the annotation had just failed to anchor:

Adding a breakpoint in the client code in annotator.bundle.js where anchoring failures are handled (the catch clause in response to a failed anchor - see code), I got the following exception:

Which points to pdfPage being undefined here:

client/src/annotator/anchoring/pdf.js

Line 76 in 4dfebbd

.pdfPage.getTextContent({

Looking at the implementation of the getPageView method in PDF.js which returns the object that is expected to have a pdfPage property, it looks like there are indeed certain states when that method could return undefined. Reading through the implementation, it looks like we should be using pdfDocument.getPage whenever we want a reference to a page, which returns a Promise<PDFPageProxy> on which we can call getTextContent directly.

The text was updated successfully, but these errors were encountered:

robertknight · 2019-09-05T07:07:46Z

Assigning this to myself since I'd already started looking into solutions.

robertknight · 2019-09-05T07:43:35Z

To reproduce the issue with the client and Via locally more easily, apply the following diff to Via:

diff --git a/static/pdfjs/web/viewer.js b/static/pdfjs/web/viewer.js
index ca12733..4d501ab 100644
--- a/static/pdfjs/web/viewer.js
+++ b/static/pdfjs/web/viewer.js
@@ -4304,9 +4304,14 @@ var PDFViewer = (function pdfViewer() {
       var firstPagePromise = pdfDocument.getPage(1);
       this.firstPagePromise = firstPagePromise;
 
+      const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
+
       // Fetch a single page so we can get a viewport that will be the default
       // viewport for all pages
-      return firstPagePromise.then(function(pdfPage) {
+      return firstPagePromise.then(async function(pdfPage) {
+        // TESTING
+        await delay(500);
+
         var scale = this._currentScale || 1.0;
         var viewport = pdfPage.getViewport(scale * CSS_UNITS);
         for (var pageNum = 1; pageNum <= pagesCount; ++pageNum) {

Note that this will cause a couple of unrelated errors in the console and the PDF page will not render immediately until you scroll it a bit, but it demonstrates the orphaning issue.

Fix a race condition that could occur when the client tried to anchor annotations in a PDF while the PDF was still loading. PDF.js' `PDFViewer` class initializes its `PDFPageView` objects, as returned by `PDFViewer.getPageView`, asynchronously. While the page views are being initialized, `PDFViewer.pagesCount` returns 0 and `PDFViewer.getPageView` will return a nullish value. When loading has progressed further, `getPageView` returns a `PDFPageView` but with no `pdfPage` property. Once the page views are fully ready, a bubbling "pagesloaded" event is dispatched at the PDF viewer's container DOM element. The PDF anchoring code previously assumed that page views were always immediately available and they always had a `pdfPage` property. If this was not the case, anchoring would fail and all/many annotations would appear as orphans in the client. See the associated issue for ways to reproduce this locally with Via. Resolve the issue by making the `getPageView` helper which wraps `PDFViewer.getPageView` async, and block in that function until the page views are fully ready. - Make `getPageView` helper async and block if necessary until page views are ready. - Add a test to simulate scenarios where page views are not ready yet - Change the `quotePositionCache` cache to store page indexes rather than `PDFPageView` objects. This means that less code in the file needs to know about `PDFPageView` objects, and deal with the fact that they may not be immediately available. It also means it should be easier to adapt the code to future PDF.js API changes. Fixes #1330

klemay mentioned this issue Aug 27, 2019

Spike: PDF annotations orphaning in LMS app hypothesis/lms#875

Closed

robertknight added the Added to sprint label Aug 27, 2019

lyzadanger removed the Added to sprint label Sep 4, 2019

robertknight self-assigned this Sep 5, 2019

robertknight mentioned this issue Sep 5, 2019

Fix race condition when anchoring in a PDF during initial document load #1352

Merged

robertknight closed this as completed in #1352 Sep 10, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Race condition causes annotations to sometimes fail to anchor in PDFs #1330

Race condition causes annotations to sometimes fail to anchor in PDFs #1330

robertknight commented Aug 27, 2019

robertknight commented Sep 5, 2019

robertknight commented Sep 5, 2019

Race condition causes annotations to sometimes fail to anchor in PDFs #1330

Race condition causes annotations to sometimes fail to anchor in PDFs #1330

Comments

robertknight commented Aug 27, 2019

robertknight commented Sep 5, 2019

robertknight commented Sep 5, 2019