Anchoring rewrite #2362

Merged
merged 82 commits into from Jul 17, 2015

Projects

None yet

3 participants

@tilgovi
Contributor
tilgovi commented Jul 14, 2015

No description provided.

tilgovi added some commits Mar 31, 2015
@tilgovi tilgovi Anchoring rewrite
It went down something like this:

๐ŸŒ†โœจ๐Ÿ‘ฝ๐Ÿ’ฆ๐Ÿ™๐Ÿ’ฅ๐ŸŒ‡
92198c6
@tilgovi tilgovi Make the anchoring integration asynchronous
The core of anchoring in the guest code is now asynchronous.

- Anchoring annotations happens asynchronously in the setup function.

- Creating an annotation is asynchronous, and depends upon the resolution
  of the targets, which may depend on the resolution of the document metadata.
  The first events fire only after this happens.

- The Anchor API can now optionally return Promises when converting ranges
  and selectors, though all the present implementations are synchronous.
85588e4
@tilgovi tilgovi Take options parameter in anchoring methods
Rather than having each of the anchoring implementations have different
call signatures for their methods, use an options argument that varies
in content between the implementations, but probably shares some core
options like the root node.
dc143f9
@tilgovi tilgovi Use dom-seek instead of text-walker b293689
@tilgovi tilgovi Use platform Promise when available f294504
@tilgovi tilgovi Clean up some try/catch with Promises ff3ac89
@tilgovi tilgovi Move anchoring into its own module feda483
@tilgovi tilgovi Move highlighter out of lib 54122ec
@tilgovi tilgovi No more domChange events 4ec9d59
@tilgovi tilgovi Draw highlights in an animation frame 0bd115e
@tilgovi tilgovi Remove dead IE code 63cc91d
@tilgovi tilgovi Depend on main.js 5fab20e
@tilgovi tilgovi Support basic document info without plugins 185a98b
@tilgovi tilgovi Sort out the PDF metadata anew d44df90
@tilgovi tilgovi Add scrollables option to the BucketBar a678c6d
@tilgovi tilgovi Fix PDF page rendered event target 0f96031
@tilgovi tilgovi Naive re-anchoring attempts on PDF page render 54ac2a7
@tilgovi tilgovi PDF TextPosition and TextQuote starting to work
TextPositionAnchor can be generated and re-anchored within a PDF so
long as the selection doesn't span a page boundary.

TextQuoteAnchor is generated, but not re-anchored.
2e482bd
@tilgovi tilgovi Remove unnecessary fat arrows 15a6d87
@tilgovi tilgovi Bucket bar public update method 9cf6bae
@tilgovi tilgovi Update bucket bar after anchoring 7c38f28
@tilgovi tilgovi Restore highlight events a90971e
@tilgovi tilgovi PDF anchoring to unrendered pages 59cac03
@tilgovi tilgovi Restore PDF.js highlight color overrides 8a89aef
@tilgovi tilgovi Actually pass the TextQuote position hint 8c0117e
@tilgovi tilgovi Fall back to searching without context
Sometimes the context might hinder the anchoring if the context has
changed but the quote has not.
353b75e
@tilgovi tilgovi Fix bucket bar bottom bucket on some documents a7832f9
@tilgovi tilgovi Update for latest dom-seek 16ec4a7
@tilgovi tilgovi Add NodeIterator shim for IE a9fde36
@tilgovi tilgovi Use vanilla diff-match-patch 4f60030
@tilgovi tilgovi Fuzzy anchoring for PDF ac6509e
@tilgovi tilgovi Anchor attempts happen in an animation frame f7e82a2
@tilgovi tilgovi Newest node-iterator-shim 469a1cd
@tilgovi tilgovi Whitespace 9b3e335
@tilgovi tilgovi Only reanchor when a pdf page has placeholders 599f21f
@tilgovi tilgovi PDF page offset caching b8f54f1
@tilgovi tilgovi Big refactoring of anchor storage and flow
In order to clean up vocabulary and better serve the needs of dynamic
pages, refactor `setupAnnotation` to deal with re-setup, unhighlighting
targets that have been removed from the annotation and re-anchoring
anchors that have lost their ranges.

- Store the Range on the anchor

  * The BucketBar would use it for getBoundingClientRect, except that
    it's a bit broken on Chrome.

  * The PDF Plugin can delete it to trigger re-anchoring no page load

- Use MutationObservers in the PDF plugin and make smarter judgements
  about when pages are done rendering by using the renderState in
  addition to the text layer renderingDone flag.
c75e72e
@tilgovi tilgovi Use platform Promise when available d79b151
@tilgovi tilgovi Middle v-align placeholder bucket on the pdf page d87b607
@tilgovi tilgovi Position quote assertion for unrendered PDF pages
Avoids falsely anchoring to positions that won't resolve once a PDF
page renders.
397e47b
@tilgovi tilgovi Fuzzy anchoring on unrendered PDF pages
Also change the PDF anchoring to leverage the HTML anchoring by
allowing the options to be passed into the module-level anchor
functions in the pdf and html modules so that the pdf module can
be only responsible for finding the appropriate page and then
delegating.
bd9995e
@tilgovi tilgovi The prefix is probably not worth it on long quotes
It'll cause more false positives than it helps.
aa1a994
@tilgovi tilgovi Consistency fixes
- anchor / describe both take options
- TextQuoteAnchor takes position as an option but it's not stored
- helper closures sprinkled throughout have fewer _ prefixes
40668dd
@tilgovi tilgovi Remove all the fancy throttling of anchoring
This is premature optimization and makes things way more difficult
to reason about.
0414f3f
@tilgovi tilgovi Make the bucket bar listen to new anchor events
I won't try to do anything special here yet, instead just calling
it imperatively. Sort this out later.
feb43ab
@tilgovi tilgovi Restore the location sort for annotations d5deb03
@tilgovi tilgovi Update highlighter tests 08a627a
@tilgovi tilgovi Some edge case handling in Guest ffb57d2
@tilgovi tilgovi Get guest tests passing cadf0e5
@tilgovi tilgovi Add basic anchor storage tests for setupAnnotation 37a3b47
@tilgovi tilgovi Update add/remove anchors to annotations tests 1a06aba
@tilgovi tilgovi Add deleteAnnotation tests fcc6572
@tilgovi tilgovi Let the application manage an anchoring cache
This is maybe too verbose for now but at least it's explicit and
doesn't assume that anchoring implementations can be fully
responsible for cache invalidation.
6e309e0
@tilgovi tilgovi Use the cache for quote -> position d69879a
@tilgovi tilgovi Search pdf pages serialially and finish fast
Rather than searching all the pages in parallel, search them serially
and terminate early with a match.
e552070
@tilgovi tilgovi Search pages outward from expected position
Rather than searching PDF pages in order, use the position anchor
(when available) to spiral the search outward from the page that
likely contains the quote.
e449d19
@tilgovi tilgovi Store the normalized range in the RangeAnchor
This makes it more robust against change by storing its description
at the container level rather than a DOM Range object that could get
mutated unpredictably by text node splitting.
bd01990
@tilgovi tilgovi Get rid of 'new Promise(raf)'
I'm not sure this works because I'm not sure the first then function
will be called synchronously by the render frame invoking resolve.
4a26ae9
@tilgovi tilgovi Upgrade to dom-seek v1.0.0-beta.2 fd9e897
@tilgovi tilgovi Clean up the PDF cache usage b7ebdcf
@tilgovi tilgovi Match PDF.js behavior for skipping whitespace 880e614
@tilgovi tilgovi Upgrade dom-seek and node-iterator-shim 8dbd233
@tilgovi tilgovi Make the choice of anchoring less magical
And hide the cache details from the default implementation that
doesn't use them.
a3cf4bd
@tilgovi tilgovi Resolve the anchor positions before frame sync 2695d7c
@tilgovi tilgovi Fall back to context-free anchor when prefix fails 8dccf68
@tilgovi tilgovi Target all affected annotations on highlight event 0d534ce
@tilgovi tilgovi Remove excess asynchrony in anchoring
All of the anchoring types are actually synchronous, for now. Having
promise callbacks between the selector->anchor and anchor->range
conversions sometimes creates race conditions when multiple
annotations anchor concurrently. Punt on this for now.
1f915a4
@tilgovi tilgovi Remove unused variable 66968b2
@tilgovi tilgovi Restore some asynchrony more beautifully
While transforming a selector to a range must occur synchronously,
the individual strategies can be attempted in separate animation
frames.

Also, make this more beautiful.
0ff05a2
@tilgovi tilgovi Propagate errors in animationPromise bd7e36c
@tilgovi tilgovi Fix tests for promise-returning anchoring contract 82657d3
@tilgovi tilgovi Switch to new standalone anchoring libs
9710990
@tilgovi
Contributor
tilgovi commented Jul 14, 2015

Discuss.

@dwhly
Member
dwhly commented Jul 14, 2015

AWESOME.
screen shot 2015-07-14 at 3 35 11 pm

@tilgovi tilgovi commented on the diff Jul 14, 2015
h/static/scripts/annotator/anchoring/pdf.coffee
+
+
+###*
+# Anchor a set of selectors.
+#
+# This function converts a set of selectors into a document range using.
+# It encapsulates the core anchoring algorithm, using the selectors alone or
+# in combination to establish the best anchor within the document.
+#
+# :param Element root: The root element of the anchoring context.
+# :param Array selectors: The selectors to try.
+# :param Object options: Options to pass to the anchor implementations.
+# :return: A Promise that resolves to a Range on success.
+# :rtype: Promise
+####
+exports.anchor = (root, selectors, options = {}) ->
@tilgovi
tilgovi Jul 14, 2015 Contributor

I think this needs some cleaning up. The placeholder stuff is confusing and could use comments. The anchorByPosition function could be broken out for readability and given parameters rather than closing over the local scope.

@nickstenning
nickstenning Jul 15, 2015 Member

Agreed. I think both anchorByPosition and findInPages probably need extracting. I think a lot of the utility functions that essentially abstract over the PDF.js interface could do with being in their own module with collaboration tests for how they exercise PDF.js.

@nickstenning
nickstenning Jul 15, 2015 Member

And anchorByPosition itself could probably be reimplemented on top of a isPageRendered function in such a module.

@tilgovi tilgovi commented on the diff Jul 14, 2015
h/static/scripts/annotator/anchoring/types.coffee
+xpathRange = Annotator.Range
+
+
+# Helper function for throwing common errors
+missingParameter = (name) ->
+ throw new Error('missing required parameter "' + name + '"')
+
+
+###*
+# class:: RangeAnchor(range)
+#
+# This anchor type represents a DOM Range.
+#
+# :param Range range: A range describing the anchor.
+###
+class RangeAnchor
@tilgovi
tilgovi Jul 14, 2015 Contributor

This class isn't broken out into a separate library. I'm actually not sure how useful it is in practice (the position anchors perform very well) and it doesn't map straightforwardly to any Web Annotations Data Model selector types, so I'm holding it here for the moment.

@nickstenning
nickstenning Jul 15, 2015 Member

Let's leave it in for now but I'm totally up for removing it later. Certainly if it's not useful above and beyond the position selectors, then given that it's the cause of some of Annotator's more idiosyncratic (read: annoying) behaviour, such as splitting textNodes in the DOM when you make selections, I'd be all for removing it once we've checked the database to ensure it's not needed.

@tilgovi tilgovi and 1 other commented on an outdated diff Jul 14, 2015
h/static/scripts/annotator/guest.coffee
@@ -1,7 +1,20 @@
-Promise = require('es6-promise').Promise
+raf = require('raf')
+Promise = global.Promise ? require('es6-promise').Promise
@tilgovi
tilgovi Jul 14, 2015 Contributor

This pattern happens quite a bit. It's important because the es6-promise lib doesn't appear smart enough to delegate to the native implementation and using the polyfill has a negative performance impact due to setTimeout.

@nickstenning
nickstenning Jul 15, 2015 Member

Not sure I understand this. Why does using the polyfill have a negative performance impact, when it's a no-op in environments that already define Promise?

@tilgovi
tilgovi Jul 15, 2015 Contributor

Oh, I totally understand now. I should be just getting Promise from the global and doing require('es6-promise/polyfill'). The entire problem is because I'm requiring the polyfill version directly. Thanks for catching that.

@tilgovi
tilgovi Jul 15, 2015 Contributor

It's actually easier than that. require('es6-promise') runs the polyfill automatically but exports the shim. The solution is just to require('es6-polyfill') but ignore its exports rather than using them.

@tilgovi tilgovi commented on the diff Jul 14, 2015
h/static/scripts/annotator/guest.coffee
TextSelection: {}
- TextRange: {}
- TextPosition: {}
- TextQuote: {}
- FuzzyTextAnchors: {}
- FragmentSelector: {}
+
+ # Anchoring module
+ anchoring: require('./anchoring/html')
@tilgovi
tilgovi Jul 14, 2015 Contributor

I made this a first-class citizen of the the Annotator instance rather than something for the options hash. It feels right. I don't want the options to grow forever.

@tilgovi tilgovi commented on the diff Jul 14, 2015
h/static/scripts/annotator/guest.coffee
@@ -68,7 +70,6 @@ module.exports = class Guest extends Annotator
this.publish(event, args)
formatter: (annotation) =>
formatted = {}
- formatted.uri = @getHref()
@tilgovi
tilgovi Jul 14, 2015 Contributor

This is orthogonal to these changes. I can restore this if you all want. On the backend, we're indexing this as an alias for target.source anyway.

@nickstenning
nickstenning Jul 15, 2015 Member

Seems fine to me. It doesn't make sense for us to keep storing this field when the rest of the model explicitly says we have multiple targets.

@nickstenning
nickstenning Jul 15, 2015 Member

Although, actually -- does the currently client use this field at all? Could removing it cause problems where annotations made with the new client don't work in older deployed versions?

@tilgovi
tilgovi Jul 15, 2015 Contributor

I don't think so. I'll check.

@tilgovi tilgovi commented on the diff Jul 14, 2015
h/static/scripts/annotator/guest.coffee
else
- @plugins.Document.uri()
-
- # Utility function to get a de-hashed form of the document URI
- getHref: -> @_removeHash @getRawHref()
-
- # Utility function to filter metadata and de-hash the URIs
- getMetadata: =>
- metadata = @plugins.Document?.metadata
- metadata.link?.forEach (link) => link.href = @_removeHash link.href
- metadata
+ uriPromise = Promise.reject()
+ metadataPromise = Promise.reject()
+
+ uriPromise = uriPromise.catch(-> decodeURIComponent(window.location.href))
@tilgovi
tilgovi Jul 14, 2015 Contributor

Fallsbacks. There's a sensible result even without any plugin support for document metadata.

@nickstenning
nickstenning Jul 15, 2015 Member

Not sure we want/need to do a decodeURIComponent here in the client, do we? Insofar as this is basically a "normalisation" operation, and let's just do all of that on the server.

@tilgovi
tilgovi Jul 15, 2015 Contributor

I don't know. If a user types a URL with reserved characters into the location bar the browser will encode it and window.location.href returns the encoded version. I thought it made some sense to store the user-facing version. Willing to consider pros and cons.

@tilgovi tilgovi commented on the diff Jul 14, 2015
h/static/scripts/annotator/guest.coffee
@@ -211,21 +161,131 @@ module.exports = class Guest extends Annotator
this.removeEvents()
- setupAnnotation: ->
- annotation = super
- this.plugins.CrossFrame.sync([annotation])
- annotation
+ setupAnnotation: (annotation) ->
@tilgovi
tilgovi Jul 14, 2015 Contributor

Help me out and let me know what needs commenting here.

@tilgovi tilgovi and 1 other commented on an outdated diff Jul 14, 2015
h/static/scripts/annotator/guest.coffee
- event.stopPropagation()
- this.focusAnnotations []
-
- # When clicking on a highlight, tell the sidebar to bring up the viewer for the relevant annotations
- onAnchorClick: (event) =>
- if @visibleHighlights
- event.stopPropagation()
- this.selectAnnotations (event.data.getAnnotations event),
- (event.metaKey or event.ctrlKey)
+ onHighlightMouseover: (event) ->
+ return unless @visibleHighlights
+ annotation = $(event.currentTarget).data('annotation')
+ annotations = event.annotations ?= []
+ annotations.push(annotation)
+ if event.target is event.currentTarget
+ setTimeout => this.focusAnnotations(annotations)
@tilgovi
tilgovi Jul 14, 2015 Contributor

These might need a comment. I'm letting these events bubble now and passing along an Array as event.annotations. The innermost handler fires off an asynchronous task here. By the time it runs, the Array is filled.

In some sense I find this cleaner than walking up the tree. It doesn't stop the event from bubbling, which is always better if possible (less interference with the page).

@nickstenning
nickstenning Jul 15, 2015 Member

Simple and elegant, but definitely needs a comment. I don't think you can assume that everybody knows that setTimeout(fn) executes after all currently pending event handlers. Even people who do know this might take a moment to recall this fact, so yes. Comment needed.

@tilgovi tilgovi commented on the diff Jul 14, 2015
h/static/scripts/annotator/host.coffee
@@ -39,8 +39,6 @@ module.exports = class Host extends Guest
# Host frame dictates the toolbar options.
this.on 'panelReady', =>
- this.anchoring._scan() # Scan the document
@tilgovi
tilgovi Jul 14, 2015 Contributor

๐Ÿ•

@tilgovi tilgovi commented on the diff Jul 14, 2015
h/static/scripts/annotator/main.js
Toolbar: {container: '.annotator-frame'}
};
-// Simple IE autodetect function
-// See for example https://stackoverflow.com/questions/19999388/jquery-check-if-user-is-using-ie/21712356#21712356
-var ua = window.navigator.userAgent;
-if ((ua.indexOf("MSIE ") > 0) || // for IE <=10
- (ua.indexOf('Trident/') > 0) || // for IE 11
- (ua.indexOf('Edge/') > 0)) { // for IE 12
- options["DomTextMapper"] = {"skip": true}
+// Document metadata plugins
+if (window.PDFViewerApplication) {
@tilgovi
tilgovi Jul 14, 2015 Contributor

As much as possible I was trying to pull the document type switches out of the Guest and do it in one place, here. Probably, we can finish that by making documentInfo or something a first class citizen just as I did with the anchoring property.

@tilgovi tilgovi commented on the diff Jul 14, 2015
h/static/scripts/annotator/main.js
Toolbar: {container: '.annotator-frame'}
};
-// Simple IE autodetect function
-// See for example https://stackoverflow.com/questions/19999388/jquery-check-if-user-is-using-ie/21712356#21712356
-var ua = window.navigator.userAgent;
@tilgovi
tilgovi Jul 14, 2015 Contributor

Fuzzy anchoring should work on IE now!

@tilgovi tilgovi commented on the diff Jul 14, 2015
h/static/scripts/annotator/monkey.coffee
@@ -1,153 +0,0 @@
-Annotator = require('annotator')
@tilgovi
tilgovi Jul 14, 2015 Contributor

๐Ÿ’ ๐Ÿ”ซ

@nickstenning
nickstenning Jul 15, 2015 Member

Poor monkey.

@tilgovi tilgovi and 1 other commented on an outdated diff Jul 14, 2015
h/static/scripts/annotator/plugin/pdf.coffee
- # Determine whether a given page has been rendered
- _isPageRendered: (index) ->
- @_viewer.pages[index]?.textLayer?.renderingDone
-
- # Get the root DOM node of a given page
- getRootNodeForPage: (index) ->
- @_viewer.pages[index].textLayer.textLayerDiv
-
- constructor: ->
- # Set references to objects that moved around in different versions
- # of PDF.js, and define a few methods accordingly
- if PDFViewerApplication?
- @_app = PDFViewerApplication
- @_viewer = @_app.pdfViewer
+ pluginInit: ->
+ cache = {
@tilgovi
tilgovi Jul 14, 2015 Contributor

This is less than ideal. The cache is stored here because I didn't want anything stateful in anchoring/pdf.

@tilgovi
tilgovi Jul 14, 2015 Contributor

It's less than ideal in the sense that it splits responsibility across two places.

@nickstenning
nickstenning Jul 15, 2015 Member

I think it would be better -- certainly clearer -- to keep this entirely encapsulated in anchoring/pdf, and possibly expose a reset or clearCache function in the public interface of that module.

@tilgovi tilgovi and 1 other commented on an outdated diff Jul 14, 2015
h/static/scripts/annotator/plugin/pdf.coffee
+ quotePosition: {}
+ }
+
+ anchoring = require('../anchoring/pdf')
+
+ @annotator.anchoring = {
+ anchor: (root, selectors, options = {}) ->
+ options = extend({}, options, {cache})
+ return anchoring.anchor(root, selectors, options)
+ describe: (root, range, options = {}) ->
+ options = extend({}, options, {cache})
+ return anchoring.describe(root, range, options)
+ }
+
+ @pdfViewer = PDFViewerApplication.pdfViewer
+ @pdfViewer.viewer.classList.add('has-transparent-text-layer')
@tilgovi
tilgovi Jul 14, 2015 Contributor

I'm not even sure we still support versions old enough not to have a transparent text layer. We may just want to change our CSS.

Also, we can look into using the annotation layer that PDF.js has, which would make a hell of a lot more sense.

@tilgovi
tilgovi Jul 14, 2015 Contributor

But that would mean a separate highlighter implementation for PDF.js.

@nickstenning
nickstenning Jul 15, 2015 Member

A project for another day.

@tilgovi tilgovi and 1 other commented on an outdated diff Jul 14, 2015
h/static/scripts/annotator/plugin/pdf.coffee
- # OK, we are ready to rock.
- @_pendingScanResolve()
+ switch page.renderingState
@tilgovi
tilgovi Jul 14, 2015 Contributor

This probably needs explanations. It's interacting with the anchoring/pdf code. This code here is ensuring that when pages render/unrender that we're making sure that's detectable later by re-initializing the textLayer variable and yanking out the placeholder div that holds the highlight spans for un-rendered pages. Pulling the highlights out causes the annotations to re-anchor below.

@nickstenning
nickstenning Jul 15, 2015 Member

Yes please comment.

@tilgovi tilgovi and 1 other commented on an outdated diff Jul 14, 2015
h/static/scripts/annotator/plugin/pdf.coffee
- # Do whatever we need to do after scanning
- @_onAfterScan()
+ for anchor in annotator.anchors when anchor.highlights?
+ if anchor.annotation in refreshAnnotations
@tilgovi
tilgovi Jul 14, 2015 Contributor

We might consider having client-side only unique ids for annotation objects. Then we could do this sort of thing with a hash.

@nickstenning
nickstenning Jul 15, 2015 Member

It would also prevent the O(N) scanning that we do for normal pages when removing highlights. Again, though, one for another day.

@tilgovi tilgovi commented on an outdated diff Jul 14, 2015
h/static/scripts/annotator/plugin/pdf.coffee
- metadata.link.push {href: documentURI}
-
- metadata
-
- # Public: Get metadata (when the doc is loaded). Returns a promise.
- getMetaData: =>
- new Promise (resolve, reject) =>
- if @anchoring.document.waitForInit?
- @anchoring.document.waitForInit().then =>
- try
- resolve @_metadata()
- catch error
- reject "Internal error"
- else
- reject "Not a PDF dom mapper."
+ annotator.plugins.BucketBar?.update()
@tilgovi
tilgovi Jul 14, 2015 Contributor

This can maybe be removed since setupAnnotation should take care of it.

@tilgovi
tilgovi Jul 14, 2015 Contributor

Oh, but maybe I wanted the removed highlights to disappear from the bucket bar instantly.

@tilgovi tilgovi commented on the diff Jul 14, 2015
h/static/scripts/annotator/plugin/textselection.coffee
# calls Annotator's onSuccessfulSelection method.
#
# event - The event triggered this. Usually it's a mouseup Event,
- # but that's not necessary. The coordinates will be used,
@tilgovi
tilgovi Jul 14, 2015 Contributor

This comment was simply out of date.

@tilgovi tilgovi commented on the diff Jul 14, 2015
h/static/scripts/cross-frame.coffee
@@ -20,7 +20,10 @@ module.exports = class CrossFrame
new Discovery($window, options)
createAnnotationSync = ->
- whitelist = ['$highlight', '$orphan', 'target', 'document', 'uri']
+ whitelist = [
+ '$anchors', '$highlight', '$orphan',
+ 'target', 'document', 'uri'
@tilgovi
tilgovi Jul 14, 2015 Contributor

I guess if I removed uri in the other place I should remove it here.

@tilgovi tilgovi commented on the diff Jul 14, 2015
h/static/scripts/vendor/diff_match_patch_uncompressed.js
@@ -1,2200 +0,0 @@
-/**
@tilgovi
tilgovi Jul 14, 2015 Contributor

Yay! No more patched vendor libs. We'll have to see about restoring show differences without this.

@tilgovi tilgovi commented on the diff Jul 14, 2015
package.json
@@ -85,9 +91,21 @@
]
},
"angular-mock": "global:angular.mock",
- "diff-match-patch": "diff_match_patch",
- "dom-text-mapper": "DomTextMapper",
- "dom-text-matcher": "DomTextMatcher",
+ "dom-anchor-fragment": "domAnchorFragment",
@tilgovi
tilgovi Jul 14, 2015 Contributor

Gross hackery. These are all UMD modules and we need to make sure they don't get tricked on pages that use AMD.

@tilgovi
tilgovi Jul 14, 2015 Contributor

By that I mean, I made sure that they don't, but it's only through all this nasty stuff.

@nickstenning
nickstenning Jul 15, 2015 Member

Not sure I understand. Aren't we requiring the unbundled versions of these? At which point it's all up to browserify?

@nickstenning
nickstenning Jul 15, 2015 Member

Ah, wait, ok. No, we're not, because they're written in ES6. Hmm. Ok.

@tilgovi tilgovi commented on the diff Jul 14, 2015
package.json
@@ -85,9 +91,21 @@
]
},
"angular-mock": "global:angular.mock",
- "diff-match-patch": "diff_match_patch",
- "dom-text-mapper": "DomTextMapper",
- "dom-text-matcher": "DomTextMatcher",
+ "dom-anchor-fragment": "domAnchorFragment",
+ "dom-anchor-text-position": {
+ "depends": [
+ "dom-seek:seek"
+ ],
+ "exports": "domAnchorTextPosition"
@tilgovi
tilgovi Jul 14, 2015 Contributor

I find this awkward. I should maybe rename the files. Even if the packages and repos are dom-anchor-foo maybe if I make the files FooAnchor.js then babel will make more sensible UMD wrappers that export FooAnchor to the global scope instead of domAnchorFoo.

@tilgovi
Contributor
tilgovi commented Jul 14, 2015

@dwhly yup!!!

It's a little misleading. 2200 lines just moved from a bundled diff-match-patch to one pulled in from npm.
Probably another 600 just moved out to the separate anchoring libraries.

Still, it's probably a net -2300 or so.

@dwhly
Member
dwhly commented Jul 14, 2015

2300 hi5s

@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/anchoring/html.coffee
+
+querySelector = (type, root, selector, options) ->
+ doQuery = (resolve, reject) ->
+ try
+ anchor = type.fromSelector(root, selector, options)
+ range = anchor.toRange(options)
+ resolve(range)
+ catch error
+ reject(error)
+ return new Promise(doQuery)
+
+
+###*
+# Anchor a set of selectors.
+#
+# This function converts a set of selectors into a document range using.
@nickstenning
nickstenning Jul 15, 2015 Member

Minor comment blip here. Might want to complete the sentence!

@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/anchoring/types.coffee
@@ -0,0 +1,55 @@
+Annotator = require('annotator')
+$ = Annotator.$
@nickstenning
nickstenning Jul 15, 2015 Member

Unused AFAICT.

@nickstenning nickstenning commented on the diff Jul 15, 2015
h/static/scripts/annotator/anchoring/html.coffee
+ return querySelector(TextQuoteAnchor, root, quote, options)
+
+ return promise
+
+
+exports.describe = (root, range, options = {}) ->
+ types = [FragmentAnchor, RangeAnchor, TextPositionAnchor, TextQuoteAnchor]
+
+ selectors = for type in types
+ try
+ anchor = type.fromRange(root, range, options)
+ selector = anchor.toSelector(options)
+ catch
+ continue
+
+ return selectors
@nickstenning
nickstenning Jul 15, 2015 Member

Just want to say awesome this file is. In particular, I really like how you've simplified the thing we discussed many times -- dependence of the anchor types on information from other selectors -- into optional additional arguments to toRange/toSelector.

@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/anchoring/pdf.coffee
+ count = (textContent) ->
+ if total + textContent.length >= offset
+ offset = total
+ return Promise.resolve({index, offset, textContent})
+ else
+ index++
+ total += textContent.length
+ return getPageTextContent(index, cache).then(count)
+
+ return getPageTextContent(0, cache).then(count)
+
+
+###*
+# Anchor a set of selectors.
+#
+# This function converts a set of selectors into a document range using.
@nickstenning
nickstenning Jul 15, 2015 Member

Ditto here with the slightly truncated comment.

@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/anchoring/html.coffee
+ position = selector
+ options.hint = position.start # TextQuoteAnchor hint
+ when 'TextQuoteSelector'
+ quote = selector
+ when 'RangeSelector'
+ range = selector
+
+ # Assert the quote matches the stored quote, if applicable
+ maybeAssertQuote = (range) ->
+ if quote?.exact? and range.toString() != quote.exact
+ throw new Error('quote mismatch')
+ else
+ return range
+
+ # Until we successfully anchor, we fail.
+ promise = Promise.reject('unable to anchor')
@nickstenning
nickstenning Jul 15, 2015 Member

When reading this pattern:

promise = Promise.reject('boom')

if foo?
  promise = promise.catch ->
    return myNewPromise()

if bar?
  promise = promise.catch ->
    return myOtherPromise()

return promise

My first thought was: "would this be simpler and clearer as..."

promise = null

if foo?
  promise = myNewPromise()

if bar?
  promise = myOtherPromise()

if not promise?
  return Promise.reject('boom')
return promise

Of course the answer is "no", because the entire point is that you're chaining up the anchoring strategies so that if one of them fails to anchor the next one is tried.

I think adding a comment to this effect here, and pointing out that the chain goes, top to bottom, from "fast but brittle" to "slow but robust", would be useful.

@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/anchoring/pdf.coffee
+
+getPage = (pageIndex) ->
+ return PDFViewerApplication.pdfViewer.pages[pageIndex]
+
+
+getPageTextContent = (pageIndex, cache) ->
+ if cache[pageIndex]?
+ return Promise.resolve(cache[pageIndex])
+ else
+ joinItems = ({items}) ->
+ nonEmpty = (item.str for item in items when /\S/.test(item.str))
+ textContent = nonEmpty.join('')
+ cache[pageIndex] = textContent
+ return textContent
+
+ return PDFViewerApplication.pdfViewer.getPageTextContent(pageIndex)
@nickstenning
nickstenning Jul 15, 2015 Member

It might be worth noting here that pdfViewer.getPageTextContent isn't obviously public API for PDF.js.

@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/anchoring/pdf.coffee
+getNodeTextLayer = (node) ->
+ until node.classList?.contains('page')
+ node = node.parentNode
+ return node.getElementsByClassName('textLayer')[0]
+
+
+getPage = (pageIndex) ->
+ return PDFViewerApplication.pdfViewer.pages[pageIndex]
+
+
+getPageTextContent = (pageIndex, cache) ->
+ if cache[pageIndex]?
+ return Promise.resolve(cache[pageIndex])
+ else
+ joinItems = ({items}) ->
+ nonEmpty = (item.str for item in items when /\S/.test(item.str))
@nickstenning
nickstenning Jul 15, 2015 Member

If you know more about the rationale behind doing this, that would be another candidate for a comment, IMO. It's not immediately clear to me when reading this code why we're doing this normalisation/sanitisation step, and it's not clear that it wouldn't potentially break if we got back something like

<TextItem str="hello"><TextItem str=" "><TextItem str="world">

which would presumably be completely valid from PDF.js's point of view? (I don't know, which is why I'm asking...)

@nickstenning nickstenning commented on the diff Jul 15, 2015
h/static/scripts/annotator/anchoring/pdf.coffee
+ else
+ findInPages(pages)
+
+ return promise
+
+
+exports.describe = (root, range, options = {}) ->
+ cache = options.cache ? {}
+ range = new xpathRange.BrowserRange(range).normalize()
+
+ startTextLayer = getNodeTextLayer(range.start)
+ endTextLayer = getNodeTextLayer(range.end)
+
+ # XXX: range covers only one page
+ if startTextLayer isnt endTextLayer
+ throw new Error('selecting across page breaks is not supported')
@nickstenning
nickstenning Jul 15, 2015 Member

Is this a regression? How is this error relayed to the user?

@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/anchoring/pdf.coffee
+ pages = []
+ while left.length or right.length
+ if right.length
+ pages.push(right.shift())
+ if left.length
+ pages.push(left.pop())
+ return findInPages(pages)
+ else
+ findInPages(pages)
+
+ return promise
+
+
+exports.describe = (root, range, options = {}) ->
+ cache = options.cache ? {}
+ range = new xpathRange.BrowserRange(range).normalize()
@nickstenning
nickstenning Jul 15, 2015 Member

Do we even need to use xpathRange here? Given that a native browser range has startContainer and endContainer, and each of these are guaranteed not to span a text layer boundary, won't that do?

@nickstenning nickstenning commented on the diff Jul 15, 2015
h/static/scripts/annotator/anchoring/types.coffee
+# :param Range range: A range describing the anchor.
+###
+class RangeAnchor
+ constructor: (root, range) ->
+ unless root? then missingParameter('root')
+ unless range? then missingParameter('range')
+ @root = root
+ @range = xpathRange.sniff(range).normalize(@root)
+
+ @fromRange: (root, range) ->
+ return new RangeAnchor(root, range)
+
+ # Create and anchor using the saved Range selector.
+ @fromSelector: (root, selector) ->
+ data = {
+ start: selector.startContainer
@nickstenning
nickstenning Jul 15, 2015 Member

Huh, you learn something every day. I didn't realise we'd diverged from the underlying implementation by renaming {start, end} to {startContainer, endContainer}.

@tilgovi
tilgovi Jul 15, 2015 Contributor

Yeah. That was a totally unnecessary thing I did once upon a time.

@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/plugin/pdf.coffee
- # If we have not yet finished the initial scanning, then we are
- # not interested.
- return unless @pageInfo?
+ @observer = new MutationObserver((mutations) => this.update())
@nickstenning
nickstenning Jul 15, 2015 Member

โค๏ธ๐Ÿ’š๐Ÿ’™

@nickstenning nickstenning commented on the diff Jul 15, 2015
h/static/scripts/annotator/guest.coffee
- # Are going to be able to use the PDF plugin here?
- if window.PDFTextMapper?.applicable()
- # If we can, let's load the PDF plugin.
- @options.PDF = {}
@nickstenning
nickstenning Jul 15, 2015 Member

So what now actually causes the PDF plugin to be loaded on PDF documents? I can't atm see how/where this happens.

@tilgovi
tilgovi Jul 15, 2015 Contributor

main.js

@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/guest.coffee
@@ -211,21 +161,131 @@ module.exports = class Guest extends Annotator
this.removeEvents()
- setupAnnotation: ->
- annotation = super
- this.plugins.CrossFrame.sync([annotation])
- annotation
+ setupAnnotation: (annotation) ->
+ self = this
+ root = @element[0]
+
+ anchors = []
@nickstenning
nickstenning Jul 15, 2015 Member

A brief comment here explaining what these three containers are for ("what is a dead highlight? what is the difference between an anchor and an anchoredTarget?") would be good.

@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/guest.coffee
@@ -211,21 +161,131 @@ module.exports = class Guest extends Annotator
this.removeEvents()
- setupAnnotation: ->
- annotation = super
- this.plugins.CrossFrame.sync([annotation])
- annotation
+ setupAnnotation: (annotation) ->
+ self = this
+ root = @element[0]
+
+ anchors = []
+ anchoredTargets = []
+ deadHighlights = []
+
+ annotation.target ?= []
+
+ locate = (target) ->
@nickstenning
nickstenning Jul 15, 2015 Member

"Docstring" style comments on these three functions.

@nickstenning nickstenning commented on the diff Jul 15, 2015
h/static/scripts/annotator/guest.coffee
annotation
createHighlight: ->
- annotation = $highlight: true
- this.publish 'beforeAnnotationCreated', [annotation]
- this.plugins.CrossFrame.sync([annotation])
- annotation
+ return this.createAnnotation({$highlight: true})
+
+ deleteAnnotation: (annotation) ->
+ anchors = []
+ targets = []
+ unhighlight = []
+
+ for anchor in @anchors
@nickstenning
nickstenning Jul 15, 2015 Member

I think Array.prototype.filter is supported for IE9 and up, so it might be clearer to use that here.

@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/guest.coffee
+ top: rect.top + window.scrollY
+
+ return anchor
+
+ sync = (anchors) ->
+ annotation.$anchors = ({pos} for {pos} in anchors)
+ annotation.$orphan = anchors.length > 0
+ for anchor in anchors
+ if anchor.range?
+ annotation.$orphan = false
+
+ self.anchors = self.anchors.concat(anchors)
+ self.plugins.BucketBar?.update()
+ self.plugins.CrossFrame?.sync([annotation])
+
+ for anchor in self.anchors.splice(0, self.anchors.length)
@nickstenning
nickstenning Jul 15, 2015 Member

I think Array.prototype.filter is supported for IE9 and up, so it might be clearer to use that here.

@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/guest.coffee
+ self.plugins.BucketBar?.update()
+ self.plugins.CrossFrame?.sync([annotation])
+
+ for anchor in self.anchors.splice(0, self.anchors.length)
+ if anchor.annotation is annotation
+ if anchor.range? and anchor.target in annotation.target
+ anchors.push(anchor)
+ anchoredTargets.push(anchor.target)
+ else if anchor.highlights?
+ deadHighlights.push(anchor.highlights)
+ delete anchor.highlights
+ delete anchor.range
+ else
+ self.anchors.push(anchor)
+
+ deadHighlights = Array::concat(deadHighlights...)
@nickstenning
nickstenning Jul 15, 2015 Member

Not sure I understand what the purpose of this line is...

@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/guest.coffee
+ if anchor.annotation is annotation
+ if anchor.range? and anchor.target in annotation.target
+ anchors.push(anchor)
+ anchoredTargets.push(anchor.target)
+ else if anchor.highlights?
+ deadHighlights.push(anchor.highlights)
+ delete anchor.highlights
+ delete anchor.range
+ else
+ self.anchors.push(anchor)
+
+ deadHighlights = Array::concat(deadHighlights...)
+ raf -> highlighter.removeHighlights(deadHighlights)
+
+ for target in annotation.target when target not in anchoredTargets
+ anchor = locate(target).then(highlight)
@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/guest.coffee
+ createAnnotation: (annotation = {}) ->
+ self = this
+ root = @element[0]
+
+ ranges = @selectedRanges ? []
+ @selectedRanges = null
+
+ getSelectors = (range) ->
+ options = {
+ cache: self.anchoringCache
+ ignoreSelector: '[class^="annotator-"]'
+ }
+ return self.anchoring.describe(root, range, options)
+
+ setDocumentInfo = ({metadata, uri}) ->
+ annotation.uri = uri
@nickstenning
nickstenning Jul 15, 2015 Member

So you've removed one place where we set annotation.uri. We should probably be consistent, no?

@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/guest.coffee
+ ignoreSelector: '[class^="annotator-"]'
+ }
+ return self.anchoring.describe(root, range, options)
+
+ setDocumentInfo = ({metadata, uri}) ->
+ annotation.uri = uri
+ if metadata?
+ annotation.document = metadata
+
+ setTargets = ([info, selectors]) ->
+ source = info.uri
+ annotation.target = ({source, selector} for selector in selectors)
+
+ info = this.getDocumentInfo().then(setDocumentInfo)
+ selectors = Promise.all(ranges.map(getSelectors))
+ targets = Promise.all([info, selectors]).then(setTargets)
@nickstenning
nickstenning Jul 15, 2015 Member

Does this work? info is the return value of a .then which doesn't return anything... or rather returns two different things depending on whether there's any metadata. Either an explicit return in setDocumentInfo or simply doing the .then(setDocumentInfo) would make this clearer.

@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/guest.coffee
+ return self.anchoring.describe(root, range, options)
+
+ setDocumentInfo = ({metadata, uri}) ->
+ annotation.uri = uri
+ if metadata?
+ annotation.document = metadata
+
+ setTargets = ([info, selectors]) ->
+ source = info.uri
+ annotation.target = ({source, selector} for selector in selectors)
+
+ info = this.getDocumentInfo().then(setDocumentInfo)
+ selectors = Promise.all(ranges.map(getSelectors))
+ targets = Promise.all([info, selectors]).then(setTargets)
+
+ targets.then(=> this.setupAnnotation(annotation))
@nickstenning
nickstenning Jul 15, 2015 Member

You've got self in scope here. Might as well use it...

@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/highlighter.coffee
+# cssClass - A CSS class to use for the highlight (default: 'annotator-hl')
+#
+# Returns an array of highlight Elements.
+exports.highlightRange = (normedRange, cssClass='annotator-hl') ->
+ white = /^\s*$/
+
+ hl = $("<span class='#{cssClass}'></span>")
+
+ # Ignore text nodes that contain only whitespace characters. This prevents
+ # spans being injected between elements that can only contain a restricted
+ # subset of nodes such as table rows and lists. This does mean that there
+ # may be the odd abandoned whitespace node in a paragraph that is skipped
+ # but better than breaking table layouts.
+
+ nodes = $(normedRange.textNodes()).filter((i) -> not white.test @nodeValue)
+ r = nodes.wrap(hl).parent().show().toArray()
@nickstenning
nickstenning Jul 15, 2015 Member

An explicit return here would be clearer. And what is show() doing here?

@nickstenning nickstenning commented on an outdated diff Jul 15, 2015
h/static/scripts/annotator/plugin/pdf.coffee
- # Save the extracted content to our page information registery
- @pageInfo[pageIndex] =
- index: pageIndex
- content: content
+ update: ->
@nickstenning
nickstenning Jul 15, 2015 Member

A comment here about what exactly is being updated and when this function is called would be great.

@nickstenning
Member

Okay, there's obviously a lot here and I couldn't honestly say I've gone over it all with a fine-tooth comb, but I have looked over most of it now.

Most of my questions are basic comprehension issues and/or suggestions for inline commentary.

In general, it looks great. I'll give the branch a proper end-to-end test tomorrow. I don't see anything massive blocking merge, but I would like to get some of the more obscure parts of the code documented.

The only other reservation I have is about testing some of the functionality in html.coffee and pdf.coffee -- particularly the latter. Ultimately I am prepared to hold my tongue and save this for another day, but if the abstractions over PDF.js could make it into another module with collaboration tests against the PDF.js interface, that would make me double-extra-plus-happy.

@tilgovi
Contributor
tilgovi commented Jul 15, 2015

Handled everything except for the cleanup of anchoring/pdf closures. I'm leaving .uri (though it's set by createAnnotation rather than the formatter now) because it's a supported facet for the client-side search filtering. We can delete it when we clean up or remove that code.

@tilgovi tilgovi Upgrade PDF.js to f6a8110
This is the version in Firefox 39
5cb0501
@nickstenning nickstenning merged commit 47fa018 into master Jul 17, 2015

3 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
coverage/coveralls Coverage remained the same at 58.465%
Details
@nickstenning
Member

I'd still like to see pdf.coffee cleaned up and tested, but ๐ŸŽ‰ ๐ŸŽˆ ๐ŸŽ† โœจ on this work.

@nickstenning nickstenning deleted the anchoring-rewrite branch Jul 17, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment