Search for one URI #2348

Merged
merged 1 commit into from Jul 19, 2015

Projects

None yet

4 participants

@nickstenning
Member

Previously, when loading Hypothesis into a page, the client would make
one search request for each and every URI found on the page, including
those found by the document metadata scanners (in the Annotator
"Document" and "PDF" plugins.)

This is wasteful of network bandwidth, and moreover should not be
necessary in most cases due to the ability to expand to equivalent URIs
on the server side.

The component responsible for coordinating communication across page
frames is the "crossframe" service. This services maintains a registry
of page frames (previously called "providers") and their associated
metadata. This commit reduces the metadata stored to simply the page
URL, or in the case of PDF documents, the fingerprint URN.

@landscape-bot

Code Health
Code quality remained the same when pulling 77ff51b on hypothesis:search-for-one-uri into 561544e on hypothesis:master.

@nickstenning
Member

@tilgovi fancy reviewing this?

@tilgovi
Contributor
tilgovi commented Jul 9, 2015

If it pleases I'd like to wait until I merge anchoring-rewrite. Otherwise, that's going to get messy.

@tilgovi
Contributor
tilgovi commented Jul 9, 2015

I'll get a PR for it open today.

@nickstenning nickstenning Only make one search query per frame on page load
Previously, when loading Hypothesis into a page, the client would make
one search request for each and every URI found on the page, including
those found by the document metadata scanners (in the Annotator
"Document" and "PDF" plugins.)

This is wasteful of network bandwidth, and moreover should not be
necessary in most cases due to the ability to expand to equivalent URIs
on the server side.

The component responsible for coordinating communication across page
frames is the "crossframe" service. This services maintains a registry
of page frames (previously called "providers") and their associated
metadata. This commit reduces the metadata stored to simply the page
URL, or in the case of PDF documents, the fingerprint URN.

It's worth noting that this *does* change behaviour in one important
way: if a page contains document equivalence data that we've never seen
before, and one of the alternate URIs has already been annotated, then
this change will mean we do not load annotations until that page itself
has been annotated (and thus the document equivalence data updated). I
am assuming that this is an extreme edge case.
4f75515
@nickstenning
Member

Rebased.

@landscape-bot

Code Health
Code quality remained the same when pulling 4f75515 on hypothesis:search-for-one-uri into 47fa018 on hypothesis:master.

@tilgovi
Contributor
tilgovi commented Jul 17, 2015

This is fine, but it will cause the first person to visit a new URL to not see any annotations. I'm not sure that's acceptable.

For instance, if one person annotates a page heavily and shares it with a friend, that friend might show up and find themselves redirected to the mobile version and see no annotations.

@tilgovi
Contributor
tilgovi commented Jul 17, 2015

Would love @judell thoughts here.

What's changing is that we won't be returning all the annotations on all the aliases for a page until an annotation is saved that adds the alias to the server-side database.

@tilgovi
Contributor
tilgovi commented Jul 17, 2015

I might prefer to see us send multiple URIs in a post body or something rather than just send one URI.

@nickstenning
Member

So in the long run, I'm imagining that the submission of document metadata will simply be orthogonal. When you make a search query, a prior request will have submitted the metadata for the page you're on.

If we want to hold off on breaking this edge case until then, we'll need a way to submit OR queries for multiple URLs, which we don't currently have. I'm not overly keen to change the public API in this way without some careful thought.

@nickstenning
Member

And just so we're clear, it doesn't necessarily mean the first person visiting that page sees no annotations, it means the first person visiting a page for which we have no equivalence metadata, but where equivalence metadata exists on that page sees no annotations.

So in the example you gave, I'm assuming it's pretty likely the original annotation(s) will have carried equivalence metadata that referenced the mobile site, and thus it will all work just fine.

The clear case is where annotations are first made on a PDF representation -- there the only metadata we can retrieve is the current URL and the PDF fingerprint -- so people who visit the HTML representation (even if that HTML representation contains a link to the PDF) will be missing out.

I'm sure you're aware of all this -- just adding it here for clarity for other readers.

@tilgovi
Contributor
tilgovi commented Jul 17, 2015

Is it typical to have, say, rel=alternate on a site that points at the
mobile version? I didn't think so. Maybe I'm wrong.

On Fri, Jul 17, 2015, 12:06 Nick Stenning notifications@github.com wrote:

And just so we're clear, it doesn't necessarily mean the first person
visiting that page sees no annotations, it means the first person visiting
a page for which we have no equivalence metadata, but where equivalence
metadata exists on that page
sees no annotations.

So in the example you gave, I'm assuming it's pretty likely the original
annotation(s) will have carried equivalence metadata that referenced the
mobile site, and thus it will all work just fine.

The clear case is where annotations are first made on a PDF representation
-- there the only metadata we can retrieve is the current URL and the
PDF fingerprint -- so people who visit the HTML representation (even if
that HTML representation contains a link to the PDF) will be missing out.

I'm sure you're aware of all this -- just adding it here for clarity for
other readers.


Reply to this email directly or view it on GitHub
#2348 (comment).

@judell
Contributor
judell commented Jul 17, 2015

"So in the long run, I'm imagining that the submission of document metadata will simply be orthogonal. When you make a search query, a prior request will have submitted the metadata for the page you're on."

My understanding also.

"It means the first person visiting a page for which we have no equivalence metadata, but where equivalence metadata exists on that page sees no annotations."

Got it (I think), thanks.

"It's pretty likely the original annotation(s) will have carried equivalence metadata that referenced the mobile site"

Checking my understanding: that equivalence metadata was gathered server-side when the original annotation arrived, correct? (Or, perhaps, injected by manual or automated mechanisms as yet undefined.)

@tilgovi
Contributor
tilgovi commented Jul 19, 2015

I'm gonna merge this optimistically. Sites that are serious about SEO should have a rel=alternate to the mobile version just as they have a rel=canonical back to a canonical version.

We'll see how it goes.

@tilgovi tilgovi merged commit 31f8b3e into master Jul 19, 2015

3 checks passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
continuous-integration/travis-ci/push The Travis CI build passed
Details
coverage/coveralls Coverage remained the same at 58.465%
Details
@tilgovi tilgovi deleted the search-for-one-uri branch Jul 19, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment