opt-in way to save annotated pages to internet archive #84
Comments
chdorner
Dec 9, 2016
I couldn't find any documentation on that POST endpoint on their side. Is it not publicly available?
Other than that, it might be interesting to initiate the archiving server-side: one less thing for the client to care about, and we can easily do it in a background worker.
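A minimal sketch of the background-worker idea, not tied to any existing server code: a queue fed by the annotation-creation path and a single daemon thread that drains it. The name `start_archive_worker` and the injected `submit` callable are hypothetical; `submit` stands in for whatever function ends up performing the actual save request.

```python
import queue
import threading
from typing import Callable, Optional


def start_archive_worker(submit: Callable[[str], None]) -> "queue.Queue[Optional[str]]":
    """Start a daemon thread that calls submit(url) for each queued URL.

    Putting None on the returned queue stops the worker.
    """
    jobs: "queue.Queue[Optional[str]]" = queue.Queue()

    def run() -> None:
        while True:
            url = jobs.get()
            try:
                if url is None:
                    return  # sentinel: shut the worker down
                submit(url)
            except Exception as exc:
                # A failed save should not kill the worker or block annotation creation.
                print(f"archive submission failed for {url}: {exc}")
            finally:
                jobs.task_done()

    threading.Thread(target=run, daemon=True).start()
    return jobs
```

The request handler would only call `jobs.put(url)` and return immediately, so a slow or failing archive request never delays saving the annotation itself.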
judell
Dec 9, 2016
Actually I misspoke in the screencast: it's simply (and, from a REST purist standpoint, politically incorrectly) a GET :-)
Here it is as a bookmarklet: javascript:location.href='http://web.archive.org/save/'+location.href
"might be interesting to initiate the archiving server-side"
Sure, that'd be even better. I only have access to the client so that's how I demoed.
A while back somebody did an academic study of the then-available set of public H annotations and estimated the % of target pages at risk of vanishing from the web. I can't find that paper, but it was, as you might imagine, a distressingly high %.
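For a server-side variant of the bookmarklet, the same GET could be composed roughly like this. It is only a sketch: the prefix is the one from the bookmarklet above, while the function names, the scheme check, the `User-Agent` string, and the timeout are assumptions added for illustration.

```python
from urllib.parse import urlsplit
from urllib.request import Request, urlopen

# Prefix taken from the bookmarklet in this comment.
SAVE_PREFIX = "http://web.archive.org/save/"


def build_save_url(page_url: str) -> str:
    """Compose the save URL by plain concatenation, as the bookmarklet does."""
    parts = urlsplit(page_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {page_url!r}")
    return SAVE_PREFIX + page_url


def request_save(page_url: str, timeout: float = 30.0) -> int:
    """Issue the GET and return the HTTP status code (performs a network call)."""
    req = Request(build_save_url(page_url), headers={"User-Agent": "archive-sketch/0.1"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.status
```

For example, `build_save_url("https://example.com/a?b=1")` yields `http://web.archive.org/save/https://example.com/a?b=1`.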
ajpeddakotla moved this from Discovery/Spec to Icebox in Feature Inbox, Mar 1, 2017
judell
commented
Mar 20, 2017
segdeha
Mar 20, 2017
Member
Looks extremely simple to implement (at least in a naive way)!
To truly help with durability, this should probably be on by default. Users are not likely to opt in, and I can't think of a good reason they would opt out?
A simple first cut would be to hit the composed URL with every annotation. Future work could make us a better citizen: keep track of when we last submitted each URL and submit it at most once per day.
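The "at most once per day" idea could look something like the sketch below. `ArchiveThrottle` is a hypothetical name, and the in-memory dict is an assumption; a real deployment would presumably keep the timestamps in a database or cache shared across workers.

```python
import time
from typing import Callable, Dict

ONE_DAY_SECONDS = 24 * 60 * 60


class ArchiveThrottle:
    """Remember when each URL was last submitted; allow one submission per interval."""

    def __init__(self, min_interval: float = ONE_DAY_SECONDS,
                 clock: Callable[[], float] = time.time) -> None:
        self._min_interval = min_interval
        self._clock = clock  # injectable so the behaviour can be tested without waiting
        self._last_submitted: Dict[str, float] = {}

    def should_submit(self, url: str) -> bool:
        """Return True, and record the time, if url is due for submission."""
        now = self._clock()
        last = self._last_submitted.get(url)
        if last is not None and now - last < self._min_interval:
            return False
        self._last_submitted[url] = now
        return True
```

The naive first cut is then the special case `min_interval=0`, which submits on every annotation.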
segdeha
Mar 20, 2017
Member
And, I suppose, we should be smart about it and only submit canonical URLs, correct?
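One way to pick the canonical URL, assuming the page declares it with a `<link rel="canonical">` element and that we fall back to the page's own URL when it does not. The function and class names are hypothetical, and this says nothing about how the client actually determines canonical URLs.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class _CanonicalFinder(HTMLParser):
    """Record the href of the first <link rel="canonical"> element seen."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels and attr.get("href"):
            self.href = attr["href"]


def canonical_url(page_url: str, html: str) -> str:
    """Return the declared canonical URL, or page_url if none is declared."""
    finder = _CanonicalFinder()
    finder.feed(html)
    if finder.href is None:
        return page_url
    # Resolve relative hrefs against the page's own URL.
    return urljoin(page_url, finder.href)
```

Submitting the canonical URL means that variants of the same page (tracking parameters, mobile subdomains) would map to one archived copy rather than several.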
judell
Mar 20, 2017
"this should probably be on by default" Ideally, though we've wondered about whether/how to explain to a user that it's happening.
"I can't think of a good reason they would opt out?" Me neither but who knows how somebody would react who hadn't been aware? Maybe an in-app message that explains, the first few times, what's happening, with a pointer to the relevant setting?
"A simple first cut would be to hit the composed URL with every annotation" Yep, trivial.
"only submit canonical URLs, correct?" As we've recently determined, yes, that'd be a good idea!
judell
Apr 10, 2017
One approach to phase 2 (recover/reanchor): http://jonudell.net/h/wayback-hypothesis-01.mp4

judell commented Dec 9, 2016
Problem you are trying to address with this feature
Not so much a problem as an opportunity. We've long discussed, with folks at the Internet Archive, the idea that annotated pages could be saved to the archive, and that if an annotated page went 404 it could still be retrieved from there and used to anchor annotations.
Your solution
We can proceed in phases.
Phase 1: Save to archive.
Phase 2: Retrieve otherwise lost pages from archive.
Phase 1 alone has value to the user (knowing the targets of your annotations are preserved) and to the mission (helping to make the foundation of the web's annotation layer more durable and reliable).
And Phase 1 is quite simple to do:
http://jonudell.net/h/save-to-wayback.mp4
We've talked amongst ourselves, and with IA folks, about whether this could happen by default or would require user opt-in. I favor the latter, with a setting "Save Annotated Pages to Internet Archive" that defaults to "No", but of course that's a discussion to have.
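For Phase 2, looking up an archived copy of a page that has gone 404 might be sketched as follows. This assumes the Wayback Machine's public availability endpoint and a JSON response carrying an `archived_snapshots.closest` object; the function names are hypothetical, and re-anchoring annotations against the retrieved copy is not covered here.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def closest_snapshot(response_body: str) -> Optional[str]:
    """Extract the closest snapshot URL from an availability response, if any."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest.get("url")


def find_archived_copy(page_url: str, timeout: float = 30.0) -> Optional[str]:
    """Ask the archive for a snapshot of page_url (performs a network call)."""
    query = urlencode({"url": page_url})
    with urlopen(f"{AVAILABILITY_ENDPOINT}?{query}", timeout=timeout) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

A `None` result would mean the page was never archived, which is the case Phase 1 is meant to prevent.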