opt-in way to save annotated pages to internet archive #84

Open
judell opened this Issue Dec 9, 2016 · 7 comments

3 participants

judell commented Dec 9, 2016

Problem you are trying to address with this feature

Not so much a problem as an opportunity. We've long discussed, with folks at the Internet Archive, the idea that annotated pages could be saved to the archive, and that if an annotated page went 404 it could still be retrieved from there and used to anchor annotations.

Your solution

We can proceed in phases.

Phase 1: Save to archive.

Phase 2: Retrieve otherwise lost pages from archive.

Phase 1 alone has value to the user (knowing the targets of your annotations are preserved) and to the mission (helping to make the foundation of the web's annotation layer more durable and reliable).

And Phase 1 is quite simple to do:

http://jonudell.net/h/save-to-wayback.mp4

We've talked amongst ourselves, and with IA folks, about whether this could happen by default or would require user opt-in. I favor the latter, with a setting "Save Annotated Pages to Internet Archive" that defaults to "No" -- but of course that's a discussion to have.

chdorner commented Dec 9, 2016

I couldn't find any documentation on that POST endpoint on their side. Is it not publicly available?

Other than that, it might be interesting to initiate the archiving server-side: one less thing for the client to care about, and we can easily do it in a background worker.

judell commented Dec 9, 2016

Actually, I misspoke in the screencast: it's simply (and, from a REST purist standpoint, politically incorrectly) a GET :-)

Here it is as a bookmarklet: javascript:location.href='http://web.archive.org/save/'+location.href

"might be interesting to initiate the archiving server-side"

Sure, that'd be even better. I only have access to the client so that's how I demoed.

A while back somebody did an academic study of the then-available set of public H annotations and estimated the % of target pages at risk of vanishing from the web. I can't find that paper, but it was -- as you might imagine -- a distressingly high %.
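For reference, the bookmarklet's URL composition can be pulled out into a small function. This is only a sketch: `buildSaveUrl` and `WAYBACK_SAVE_PREFIX` are hypothetical names, and the http(s) check is an added assumption, not something the bookmarklet does.

```javascript
// Prefix taken from the bookmarklet in this thread.
const WAYBACK_SAVE_PREFIX = "http://web.archive.org/save/";

// Hypothetical helper: returns the URL that, when fetched with a GET,
// asks the Wayback Machine to save `pageUrl`.
function buildSaveUrl(pageUrl) {
  // Assumption: only http(s) pages are worth submitting.
  // `new URL` throws a TypeError if pageUrl is not a valid absolute URL.
  const parsed = new URL(pageUrl);
  if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
    throw new Error("Only http(s) pages can be submitted: " + pageUrl);
  }
  // Same plain concatenation as the bookmarklet; no extra encoding.
  return WAYBACK_SAVE_PREFIX + pageUrl;
}
```

For example, `buildSaveUrl("https://example.com/page")` returns `http://web.archive.org/save/https://example.com/page`. The same string could be requested from the client or from a server-side worker.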

ajpeddakotla moved this from Discovery/Spec to Icebox in Feature Inbox on Mar 1, 2017

judell commented Mar 20, 2017

Updating this to show the Save to Wayback option that's now in the investigative toolkit.

image

Member

segdeha commented Mar 20, 2017

Looks extremely simple to implement (at least in a naive way)!

To truly help with durability, this should probably be on by default. Users are not likely to opt in, and I can't think of a good reason they would opt out?

A simple first cut would be to hit the composed URL with every annotation. Future work could have us be a better citizen: keep track of when we last submitted each URL and resubmit it at most once per day.
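The "at most once per day" idea could look roughly like the sketch below. `createSubmissionTracker` is a hypothetical name, and the in-memory `Map` is an assumption made for brevity; a real deployment would presumably persist the timestamps somewhere shared.

```javascript
const ONE_DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical tracker: remembers when each URL was last submitted
// and answers whether enough time has passed to submit it again.
function createSubmissionTracker(minIntervalMs = ONE_DAY_MS) {
  const lastSubmitted = new Map(); // url -> timestamp in ms

  return {
    // True if the URL was never submitted, or was last submitted
    // at least `minIntervalMs` ago.
    shouldSubmit(url, now = Date.now()) {
      const last = lastSubmitted.get(url);
      return last === undefined || now - last >= minIntervalMs;
    },
    // Record a submission at time `now`.
    markSubmitted(url, now = Date.now()) {
      lastSubmitted.set(url, now);
    },
  };
}
```

A caller would check `shouldSubmit(url)` when an annotation is created, request the save URL only if it returns true, and then call `markSubmitted(url)`.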

Member

segdeha commented Mar 20, 2017

And, I suppose, we should be smart about it and only submit canonical URLs, correct?
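On the client, choosing the canonical URL might be sketched as follows. `pickUrlToArchive` is a hypothetical name, and falling back to the page's own address when no `<link rel="canonical">` exists is an assumption rather than an agreed behavior.

```javascript
// Hypothetical helper: prefer the page's declared canonical URL,
// otherwise fall back to the current location.
// `doc` is a DOM Document (or anything with querySelector and location).
function pickUrlToArchive(doc) {
  // rel~= matches "canonical" even when rel lists several tokens.
  const link = doc.querySelector('link[rel~="canonical"]');
  const canonical = link && link.href;
  return canonical || doc.location.href;
}
```

In a browser this would be called as `pickUrlToArchive(document)`, where `link.href` is already resolved to an absolute URL.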

judell commented Mar 20, 2017

"this should probably be on by default" Ideally, though we've wondered about whether/how to explain to a user that it's happening.

"I can't think of a good reason they would opt out?" Me neither, but who knows how somebody who hadn't been aware would react? Maybe an in-app message that explains, the first few times, what's happening, with a pointer to the relevant setting?

"A simple first cut would be to hit the composed URL with every annotation" Yep, trivial.

"only submit canonical URLs, correct?" As we've recently determined, yes, that'd be a good idea!

judell commented Apr 10, 2017

One approach to phase 2 (recover/reanchor): http://jonudell.net/h/wayback-hypothesis-01.mp4
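Independent of whatever the screencast shows, one possible building block for phase 2 is the Internet Archive's Wayback Availability API, which reports the closest archived snapshot for a URL. The sketch below only builds the query and reads the response. The function names are hypothetical, and the response shape (`archived_snapshots.closest` with `available` and `url` fields) is assumed from that API's published examples.

```javascript
const AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available";

// Hypothetical helper: URL to query for the closest snapshot of `pageUrl`.
function buildAvailabilityQuery(pageUrl) {
  return AVAILABILITY_ENDPOINT + "?url=" + encodeURIComponent(pageUrl);
}

// Hypothetical helper: given the parsed JSON response, return the
// snapshot URL, or null when no snapshot is reported as available.
function closestSnapshotUrl(responseJson) {
  const closest =
    responseJson &&
    responseJson.archived_snapshots &&
    responseJson.archived_snapshots.closest;
  return closest && closest.available ? closest.url : null;
}
```

If a target page returns 404, the client could fetch `buildAvailabilityQuery(targetUrl)`, pass the parsed JSON to `closestSnapshotUrl`, and try to anchor annotations against the returned snapshot when it is not null.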
