-
Notifications
You must be signed in to change notification settings - Fork 194
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Race condition causes annotations to sometimes fail to anchor in PDFs #1330
Comments
Assigning this to myself since I'd already started looking into solutions. |
To reproduce the issue with the client and Via locally more easily, apply the following diff to Via: diff --git a/static/pdfjs/web/viewer.js b/static/pdfjs/web/viewer.js
index ca12733..4d501ab 100644
--- a/static/pdfjs/web/viewer.js
+++ b/static/pdfjs/web/viewer.js
@@ -4304,9 +4304,14 @@ var PDFViewer = (function pdfViewer() {
var firstPagePromise = pdfDocument.getPage(1);
this.firstPagePromise = firstPagePromise;
+ const delay = ms => new Promise(resolve => setTimeout(resolve, ms));
+
// Fetch a single page so we can get a viewport that will be the default
// viewport for all pages
- return firstPagePromise.then(function(pdfPage) {
+ return firstPagePromise.then(async function(pdfPage) {
+ // TESTING
+ await delay(500);
+
var scale = this._currentScale || 1.0;
var viewport = pdfPage.getViewport(scale * CSS_UNITS);
for (var pageNum = 1; pageNum <= pagesCount; ++pageNum) { Note that this will cause a couple of unrelated errors in the console and the PDF page will not render immediately until you scroll it a bit, but it demonstrates the orphaning issue. |
robertknight
added a commit
that referenced
this issue
Sep 5, 2019
Fix a race condition that could occur when the client tried to anchor annotations in a PDF while the PDF was still loading. PDF.js' `PDFViewer` class initializes its `PDFPageView` objects, as returned by `PDFViewer.getPageView`, asynchronously. While the page views are being initialized, `PDFViewer.pagesCount` returns 0 and `PDFViewer.getPageView` will return a nullish value. When loading has progressed further, `getPageView` returns a `PDFPageView` but with no `pdfPage` property. Once the page views are fully ready, a bubbling "pagesloaded" event is dispatched at the PDF viewer's container DOM element. The PDF anchoring code previously assumed that page views were always immediately available and they always had a `pdfPage` property. If this was not the case, anchoring would fail and all/many annotations would appear as orphans in the client. See the associated issue for ways to reproduce this locally with Via. Resolve the issue by making the `getPageView` helper which wraps `PDFViewer.getPageView` async, and block in that function until the page views are fully ready. - Make `getPageView` helper async and block if necessary until page views are ready. - Add a test to simulate scenarios where page views are not ready yet - Change the `quotePositionCache` cache to store page indexes rather than `PDFPageView` objects. This means that less code in the file needs to know about `PDFPageView` objects, and deal with the fact that they may not be immediately available. It also means it should be easier to adapt the code to future PDF.js API changes. Fixes #1330
robertknight
added a commit
that referenced
this issue
Sep 5, 2019
Fix a race condition that could occur when the client tried to anchor annotations in a PDF while the PDF was still loading. PDF.js' `PDFViewer` class initializes its `PDFPageView` objects, as returned by `PDFViewer.getPageView`, asynchronously. While the page views are being initialized, `PDFViewer.pagesCount` returns 0 and `PDFViewer.getPageView` will return a nullish value. When loading has progressed further, `getPageView` returns a `PDFPageView` but with no `pdfPage` property. Once the page views are fully ready, a bubbling "pagesloaded" event is dispatched at the PDF viewer's container DOM element. The PDF anchoring code previously assumed that page views were always immediately available and they always had a `pdfPage` property. If this was not the case, anchoring would fail and all/many annotations would appear as orphans in the client. See the associated issue for ways to reproduce this locally with Via. Resolve the issue by making the `getPageView` helper which wraps `PDFViewer.getPageView` async, and block in that function until the page views are fully ready. - Make `getPageView` helper async and block if necessary until page views are ready. - Add a test to simulate scenarios where page views are not ready yet - Change the `quotePositionCache` cache to store page indexes rather than `PDFPageView` objects. This means that less code in the file needs to know about `PDFPageView` objects, and deal with the fact that they may not be immediately available. It also means it should be easier to adapt the code to future PDF.js API changes. Fixes #1330
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Steps to reproduce
Go to https://via.hypothes.is/https://www.nsqol.org/wp-content/uploads/2019/02/National-Standards-for-Quality-Online-Teaching.pdf and select the "Public" group
Expected result
There is an annotation on the last page (https://hypothes.is/a/LJMxNsjFEem55S9pNtX0Aw). It should always anchor.
Actual result
Testing in Chrome 78.0.3887.7, I see the annotation fail to anchor once every few attempts.
Inspecting the store state in the sidebar showed that no anchoring timeout had occurred but instead the annotation had just failed to anchor:
Adding a breakpoint in the client code in
annotator.bundle.js
where anchoring failures are handled (thecatch
clause in response to a failedanchor
- see code), I got the following exception:Which points to
pdfPage
beingundefined
here:client/src/annotator/anchoring/pdf.js
Line 76 in 4dfebbd
Looking at the implementation of the
getPageView
method in PDF.js which returns the object that is expected to have apdfPage
property, it looks like there are indeed certain states when that method could return undefined. Reading through the implementation, it looks like we should be usingpdfDocument.getPage
whenever we want a reference to a page, which returns aPromise<PDFPageProxy>
on which we can callgetTextContent
directly.The text was updated successfully, but these errors were encountered: