The Changing Content Problem #115
Comments
One case that's not covered here is when there are some successes and some failures. I guess showing the successes and also showing a link to the most recent stored, annotated page would work, but there could be lots of hidden annotations. Couldn't they span back much further than one revision?
A fair point if you interpreted what I wrote above as an "all-or-nothing" attempt for all the annotations on the page, but I think it's pretty clear that the problem you describe goes away if you treat this as a per-annotation algorithm. In fact, there's a pretty fundamental problem with what I've described above, namely that the Timegate returns a link to the Memento in the form of an HTTP … So, this suggests a required but much nicer UI: if any of the annotations in the page completely fail to load, we display a notification somewhere saying "Some annotations couldn't be loaded on this page, because it's changed since they were added. [Load this page in the history viewer] to see them". That would be a link to a MementoFox/Etherpad-style time-slider viewer hosted on AnnotateIt. This solves a number of problems at once, such as how to reinstantiate the annotator in the Memento, how to know which URL to search the annotator store for, and so on.
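To make the per-annotation idea concrete, here is a minimal Python sketch. The `Annotation` class, `partition_by_anchor`, and `history_notice` are hypothetical names, and "the quote still appears verbatim in the page" stands in for whatever anchoring test Annotator would really use.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    id: str
    quote: str  # the text the annotation was originally attached to


def partition_by_anchor(annotations, page_text):
    """Decide per annotation, not all-or-nothing, whether it can still load."""
    loaded, failed = [], []
    for ann in annotations:
        # Assumption: an annotation "loads" if its quote is still on the page.
        if ann.quote in page_text:
            loaded.append(ann)
        else:
            failed.append(ann)
    return loaded, failed


def history_notice(failed, history_url):
    """Return the notification text, or None when every annotation loaded."""
    if not failed:
        return None
    return (
        "Some annotations couldn't be loaded on this page, because it's "
        "changed since they were added. "
        f"[Load this page in the history viewer]({history_url}) to see them."
    )
```

The annotations that do load are shown as usual; the notice only appears when at least one annotation fails, which is the mixed success/failure case raised above.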
This was my contribution to the debate (on-list): http://lists.okfn.org/pipermail/annotator-dev/2012-March/000263.html
Some links of interest from the Annotator community call:
Noting another heuristic approach to this, taken by @donohoe for Emphasis.js. See the blog post for details, but the principle is to identify paragraphs of text in a page by the first letters of the first N words of the first and last sentences of a paragraph. So this paragraph would be … It occurs to me that extending this idea, by doing a Levenshtein comparison of the set of first letters of all the words in a sentence between paragraphs, is massively cheaper than doing full Levenshtein and probably works pretty well. What I really want to avoid is something which has obvious pathological cases. For example, Emphasis will fail if the first or last sentence is removed, or if sentences are prepended or appended.
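A rough Python sketch of the paragraph-key principle described above, to show where it is sensitive to edits. This is not Emphasis's actual code: the function names, the naive sentence splitter, and the default of `n=3` words are all assumptions made for illustration.

```python
import re


def sentences(paragraph):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return [p for p in parts if p]


def first_letters(sentence, n):
    """Upper-cased first letters of the first n words of a sentence."""
    words = re.findall(r"[A-Za-z0-9']+", sentence)
    return "".join(w[0].upper() for w in words[:n])


def paragraph_key(paragraph, n=3):
    """Key built from the first and last sentences only."""
    sents = sentences(paragraph)
    if not sents:
        return ""
    return first_letters(sents[0], n) + first_letters(sents[-1], n)
```

With this sketch, editing a middle sentence leaves the key untouched, while removing or prepending a first sentence changes half of it, so any recovery has to come from tolerant matching on the remaining half.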
@nickstenning I note I mentioned the matching of sections of text in the list post I linked above: "the other option i know of here is to do hashing of small string sections of the document to generate your identifiers". Agree that you will want to extend this to be more subtle.
No, it will not fail. It splits the key and checks against the first and last sentences. In some cases the paragraph has its leading sentence removed but the last sentence remains the same (and similar variations of modification). Emphasis, within a level of tolerance, finds the best match and takes that as the link. From the blog post:
Sorry, my point was not to "diss" Emphasis, but to point out that there are special cases in which it has no option but to fail. If you remove the first and last sentence of a paragraph, there is no possible way this algorithm can recover. My suggestion was simply that the constraint you have (a need for short keys that will fit nicely in URLs) is not one we share, and so we could employ more robust strategies such as first-letter-of-every-word-in-annotation.
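One way the first-letter-of-every-word idea could look, as a hedged Python sketch: reduce each text to a short signature, then run Levenshtein on the signatures instead of the full strings. The names `signature` and `best_match` and the `max_ratio` tolerance are hypothetical, not part of Annotator or Emphasis.

```python
import re


def signature(text):
    """First letter of every word, lower-cased."""
    return "".join(w[0].lower() for w in re.findall(r"\w+", text))


def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def best_match(quote, candidates, max_ratio=0.3):
    """Index of the candidate whose signature is closest to the quote's,
    or None if even the best one is further away than the tolerance."""
    target = signature(quote)
    if not target:
        return None
    best_index, best_dist = None, None
    for i, cand in enumerate(candidates):
        dist = levenshtein(target, signature(cand))
        if best_dist is None or dist < best_dist:
            best_index, best_dist = i, dist
    if best_dist is None or best_dist > max_ratio * len(target):
        return None
    return best_index
```

Because the signature has one character per word, the quadratic cost of the comparison grows with the word count rather than the character count, which is where the saving over full Levenshtein comes from.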
I didn't take it that way! Sorry if I came across as such. You are correct: if both first and last sentences were removed the link will fail. However, I would argue that is a good thing. If you were linking to a chunk of text that changed significantly then it has probably lost its original intent and your link should not be valid any more.
Sorry if this has already been discussed or if I'm missing something, but why don't you just require the caller to provide the information you're missing, such as the new location?
This research might be helpful for solving this problem: Robust Intra-document Locations
Got a wiki page over here with lots of links to more: https://github.com/hypothesis/h/wiki/robust-anchors I'd like to see more discussion of where Annotator could change to support extensible anchoring methods.
From this point forward, I'm going to be keeping the Annotator issue tracker for bug reports only. Enhancements and feature requests should be made on the mailing list. As this is a feature request, I'm going to close this issue. If you feel that I've miscategorised the discussion, and there is a genuine unaddressed bug, feel free to reopen with an explanation.
So, I've been thinking about this, and it really can't be that hard to get an 80% case working. As I see it, there are two scenarios we have to worry about.
- … `quote` field, and if it found an exact match, it would update the annotation's ranges and save it back to the server.
- … `Accept-Datetime` header set to the value of the annotation `updated` field, and if we get a nonzero list of links back we can pop up a message to the user along the lines of "hey, this page has changed since some annotations were made, [click here] to see them and links to historical versions of this page."
Can someone tell me if I've missed anything obvious?
Really, all we need is a plugin to encapsulate the dumb heuristics side of this (which would potentially include flagging the annotations in the UI as "I tried to automatically reposition myself" and only actually saving them if a human confirms that they make sense) and a plugin to talk to a timegate and display appropriate UI.
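As a sketch of what those two plugins might do at their core, here is some hedged Python. The function names, the dict-shaped annotation, and the `auto_repositioned`/`confirmed` flags are hypothetical; the character-offset range is a simplification and not Annotator's actual range format; and treating a naive `updated` timestamp as UTC is an assumption.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def reposition(annotation, page_text):
    """Dumb heuristic: look for the stored quote verbatim in the current page."""
    start = page_text.find(annotation["quote"])
    if start == -1:
        return None  # no exact match; fall back to asking a timegate
    proposal = dict(annotation)  # leave the stored annotation untouched
    # Simplified character offsets (not Annotator's real range format).
    proposal["ranges"] = [{"start": start, "end": start + len(annotation["quote"])}]
    proposal["auto_repositioned"] = True  # lets the UI flag the guess
    proposal["confirmed"] = False         # save only after a human confirms
    return proposal


def timegate_headers(annotation):
    """Headers for a timegate request dated at the annotation's `updated` field."""
    updated = datetime.fromisoformat(annotation["updated"])
    if updated.tzinfo is None:
        # Assumption: timestamps without an offset are already UTC.
        updated = updated.replace(tzinfo=timezone.utc)
    updated = updated.astimezone(timezone.utc)
    # Accept-Datetime takes an HTTP-date, e.g. "Thu, 01 Mar 2012 12:00:00 GMT".
    return {"Accept-Datetime": format_datetime(updated, usegmt=True)}
```

Keeping the repositioned copy separate from the stored annotation means nothing is written back to the server until the `confirmed` flag is set by a person.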