Annotations immediately orphan when selecting text in a PDF that includes last character on page #1329

robertknight · 2019-08-27T12:01:20Z

This issue was created as part of investigating hypothesis/lms#875.

Steps to reproduce

Go to https://via.hypothes.is/https://arxiv.org/pdf/1908.04683v3.pdf, scroll down to page 5 and select the text beginning at the bottom of the page from "For our experiments" up to and including the page number "5" at the bottom of the page.
Click "Annotate"

Expected result: The selected text is highlighted and a new annotation appears in the "Annotations" tab in the client
Actual result: The selected text is not highlighted and the annotation appears in the "Orphans" tab in the client

After reloading the page, the annotation continues to be an orphan instead of anchoring.

Notes

The issue occurs whenever a selection to annotate matches these criteria:

The text is unique enough within the document that it does not fuzzily/roughly match any other text
The selected text includes the last character in the page's text. More specifically, it needs to extend up to the end of the invisible div.textLayer element that PDF.js creates for each page to enable text selection.

mkdir-washington-edu · 2019-09-13T13:29:27Z

Also occurs here: https://hypothesis.zendesk.com/agent/tickets/5926

robertknight · 2019-10-07T17:04:13Z

The immediate problem here is an upstream issue in the dom-anchor-text-position library. In brief:

1. When anchoring a quote within a PDF, the client fetches the text of each page and attempts to locate the quote within that text.
2. If successful, there are two possibilities: either the page where the quote was found has been rendered, in which case there will be a text layer <div> used for text selection, or it has not, in which case there will just be a placeholder for the text layer.
3. If there is a text layer, the quote is mapped to a text position selector within the text layer, and the text position selector is mapped to a DOM Range using dom-anchor-text-position. This is basically the same as if we were doing HTML anchoring.

There is a problem in (3) if the generated text position includes the final character of the text layer <div>, as this triggers an exception in dom-anchor-text-position.

Although the basic problem is not actually specific to PDF annotation, it essentially never happens for HTML annotation because there are always some characters in an HTML document beyond what the user can actually select. With PDF annotation however, it is actually possible to select the last character in the text layer.
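To make the failing condition concrete, here is a minimal sketch of how a quote at the very end of a page produces a text position whose end offset equals the length of the page text. The helper name `quoteToTextPosition` and the sample strings are made up for illustration, and the exact-match lookup stands in for the client's real quote matching, which is more involved.

```javascript
// Hypothetical helper (not part of the client or any library): locate an
// exact quote in a page's text and return it as a text position.
function quoteToTextPosition(pageText, quote) {
  const start = pageText.indexOf(quote);
  if (start === -1) {
    return null; // quote not found on this page
  }
  return { start, end: start + quote.length };
}

// Made-up page text that ends with the page number, as in the report above.
const pageText = 'For our experiments we used two datasets. 5';
const quote = 'two datasets. 5';

const position = quoteToTextPosition(pageText, quote);

// The selection runs to the last character, so the end offset is not
// "inside" the text; it is exactly the text length.
console.log(position.end === pageText.length); // prints true
```

Any code that later treats `end` as an index that must be strictly less than the text length will reject this position, even though it describes a valid selection.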

I've submitted a PR and test cases upstream. Depending on how soon Randall is able to take a look and whether any further changes are needed, we'll either upgrade the library or temporarily use a fork.

Add a new implementation of conversion from text positions (that is, offsets within an element's `textContent`) to DOM `Range`s along with test cases. This addresses an issue with the existing implementation of `toRange` in the `dom-anchor-text-position` package where conversion fails when the text position includes the end of the element's text. Even if/when the issue is addressed upstream, I think it would be useful to retain these test cases to guard against future regressions. See #1329

Change PDF anchoring to use the new text position => Range implementation from the `src/annotator/anchoring/text-position` module which fixes an issue when the last text in a PDF page is selected. Fixes #1329
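As a rough illustration of what converting a text position to a range involves, the sketch below resolves offsets against an ordered list of text node contents. It is not the code from `src/annotator/anchoring/text-position`; the names `resolveOffset` and `textPositionToBoundaries` are hypothetical, and text nodes are represented as plain strings so the example runs without a DOM.

```javascript
// Sketch only. `textNodes` stands in for the text nodes under a root
// element, in document order, represented here as plain strings.
function resolveOffset(textNodes, offset) {
  if (offset < 0) {
    throw new RangeError('Offset must not be negative');
  }
  let consumed = 0;
  for (let i = 0; i < textNodes.length; i++) {
    const length = textNodes[i].length;
    // `<=` rather than `<`: an offset equal to the end of a node,
    // including the end of the last node, is a valid boundary.
    if (offset <= consumed + length) {
      return { nodeIndex: i, offset: offset - consumed };
    }
    consumed += length;
  }
  throw new RangeError('Offset exceeds text length');
}

// Map a { start, end } text position to a pair of range boundaries.
function textPositionToBoundaries(textNodes, start, end) {
  if (start > end) {
    throw new RangeError('Start must not be after end');
  }
  return {
    start: resolveOffset(textNodes, start),
    end: resolveOffset(textNodes, end),
  };
}

// Made-up text layer content, 21 characters in total.
const layerNodes = ['For our ', 'experiments', ' 5'];
const boundaries = textPositionToBoundaries(layerNodes, 20, 21);
console.log(boundaries.end); // { nodeIndex: 2, offset: 2 }
```

With a real DOM, each resolved boundary would be passed to `Range.setStart` or `Range.setEnd` along with the corresponding text node. Note that this sketch throws for a root with no text nodes, even at offset 0; a fuller implementation would need to decide how to handle that case.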

klemay mentioned this issue Aug 27, 2019

Spike: PDF annotations orphaning in LMS app hypothesis/lms#875

Closed

robertknight added the Added to sprint label Aug 27, 2019

lyzadanger removed the Added to sprint label Sep 4, 2019

robertknight mentioned this issue Sep 23, 2019

Make unexpected errors during anchoring easier to see/debug #1370

Open

robertknight self-assigned this Oct 7, 2019

robertknight mentioned this issue Oct 7, 2019

Handle start/end positions equal to the root's text content length tilgovi/dom-anchor-text-position#7

Closed

tilgovi added a commit to tilgovi/dom-anchor-text-position that referenced this issue Oct 8, 2019

Fix toRange handling of overruns and empty roots

9dea0b4

Address hypothesis/client#1329 and close #2.

tilgovi added a commit to tilgovi/dom-anchor-text-position that referenced this issue Oct 12, 2019

Fix toRange handling of overruns and empty roots

f9465a4

Address hypothesis/client#1329 and close #2.

robertknight added a commit that referenced this issue Oct 22, 2019

Add a failing test case for annotation matching last text on page

623a359

This currently fails due to a bug in `dom-anchor-text-position`'s `toRange` implementation. See #1329

robertknight mentioned this issue Oct 22, 2019

Fix PDF anchoring when annotation refers to last text on a page #1449

Merged

tilgovi added a commit to tilgovi/dom-anchor-text-position that referenced this issue Oct 28, 2019

Use dom-seek v5 to handle toRange exceptions

9178ef1

Address hypothesis/client#1329 and close #2.

tilgovi added a commit to tilgovi/dom-anchor-text-position that referenced this issue Oct 28, 2019

Use dom-seek v5 to handle toRange exceptions

24e278a

Address hypothesis/client#1329 and close #2.

LMS007 mentioned this issue Oct 28, 2019

Creating annotation fails when selection starts or ends in a link in a PDF #1464

Closed

robertknight closed this as completed in #1449 Oct 29, 2019

tilgovi added a commit to tilgovi/dom-anchor-text-position that referenced this issue Mar 26, 2020

Use dom-seek v5 to handle toRange exceptions

c0368ac

Address hypothesis/client#1329 and close #2.
