Added functionality for using savepagenow with authentication #45
Love the patch. Here's my picky list of picky stuff. If we get this stuff in, I'm ready to merge.
… for sphinx napoleon for Python 3.10
…ith -a and keys that are invalid
I've added the unit test (which will only pass once savepagenow is repackaged, because the tests import it as a library and don't have access to the new method yet), changed the user agent back to savepagenow, added documentation (I had to switch to a newer version of sphinx napoleon to do so, which is why so many files show as changed), added custom error messages, and run black and pylint. It should be ready for review.
I made a few tweaks and merged this in. Mainly, I'd like to have more specific environment variable names. The rest of my changes are all gloss. Can you point me to where you sourced the 4 vs. 12 request limit facts?
https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit
Thanks. This should be out as version 1.3.0. Give it a try. Thanks again.
Works great! Thanks again.
That document has been out of date for a while. It was never updated to reflect the change, but around May they lowered the limit for authenticated users to 6 URLs per minute and for anonymous users to 3.
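A client that wants to stay under these limits could throttle itself on the caller's side. The sketch below is a minimal rolling-window limiter, assuming the 6-per-minute (authenticated) and 3-per-minute (anonymous) figures reported here, which the Internet Archive team later confirmed in this thread. `MinuteRateLimiter` is a hypothetical helper, not part of savepagenow, and since the service actually counts active sessions, a purely time-based window is only an approximation.

```python
import collections
import time
from typing import Callable, Deque


class MinuteRateLimiter:
    """Hypothetical helper: allow at most `limit` submissions per rolling window."""

    def __init__(
        self,
        limit: int,
        clock: Callable[[], float] = time.monotonic,
        window: float = 60.0,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self._sent: Deque[float] = collections.deque()

    def seconds_until_free(self) -> float:
        """Return how long to wait before another submission fits in the window."""
        now = self.clock()
        # Forget submissions that have aged out of the rolling window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            return 0.0
        # The oldest submission in the window is the next one to expire.
        return self.window - (now - self._sent[0])

    def record(self) -> None:
        """Note that a submission was just made."""
        self._sent.append(self.clock())

    def wait(self) -> None:
        """Block until a submission is allowed, then record it."""
        delay = self.seconds_until_free()
        if delay > 0:
            time.sleep(delay)
        self.record()
```

Usage would look like `limiter = MinuteRateLimiter(6 if authenticated else 3)` followed by `limiter.wait()` before each capture request, where `authenticated` is whatever flag the caller already tracks.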
@overcast07 I'm not sure where you got those numbers. I've been in direct communication with the Internet Archive folks.
I created, and frequently use, a Bash script that can submit a list of URLs to Save Page Now, both with and without authentication. I haven't been in contact with the Internet Archive about it (I haven't had much reason to), and they have never contacted me. In my testing, it has been impossible to submit more than 6 URLs per minute for several months. The script submits URLs as often as every 3 seconds and has done so for about 2 years, so it was quite noticeable when a long gap suddenly started appearing between successful submissions after every 6th URL. Previously the actual limit was probably 12 URLs, but it wasn't enforced the same way until earlier this year (you could exceed 12 URLs per minute by submitting them rapidly before the first one started processing), and shortly after they fixed this, the limit was reduced to 6. The website provides an endpoint (https://web.archive.org/save/status/user) that tells you whether you have any session slots left. Since May 2023, the Bash script has used the data from that endpoint, when authenticated, to check whether the site would return the "You have already reached the limit of active Save Page Now sessions" message for the next URL, so it avoids repeatedly receiving that error.
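The slot check described above might look roughly like this in Python. It is a sketch under stated assumptions: the `Authorization: LOW <access_key>:<secret>` header follows the SPN2 API document linked earlier in this thread, but the assumption that the status endpoint returns JSON with an integer `available` field is not confirmed here and should be checked against a live response. All function names are hypothetical and not part of savepagenow.

```python
import json
import time
import urllib.request

STATUS_URL = "https://web.archive.org/save/status/user"


def auth_headers(access_key: str, secret: str) -> dict:
    """Build SPN2-style request headers using the 'LOW access:secret' scheme."""
    return {
        "Accept": "application/json",
        "Authorization": f"LOW {access_key}:{secret}",
    }


def has_free_slot(status: dict) -> bool:
    """Return True if the status payload reports at least one free session slot.

    ASSUMPTION: the payload carries an integer "available" field; a missing
    field is treated conservatively as "no slot free".
    """
    return int(status.get("available", 0)) > 0


def fetch_user_status(access_key: str, secret: str) -> dict:
    """Query the status endpoint for the authenticated user (needs network access)."""
    request = urllib.request.Request(STATUS_URL, headers=auth_headers(access_key, secret))
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_for_slot(access_key: str, secret: str, poll_seconds: float = 5.0) -> None:
    """Poll until the account has a free slot, so the next capture is not rejected."""
    while not has_free_slot(fetch_user_status(access_key, secret)):
        time.sleep(poll_seconds)
```

Keeping the parsing in `has_free_slot` separate from the network call makes it easy to adjust if the real payload uses different field names.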
Was originally alerted to this by user overcast07 in this thread: palewire#45. Confirmed the new rate limit with the Internet Archive team.
Hey @overcast07, I've contacted the Internet Archive team and just want you to know that you are correct. I've made a new PR to update the documentation for savepagenow: #48
This PR adds the ability to use authentication for Wayback saves. The user needs to set two local environment variables: 'secret', containing their S3 secret key from the Internet Archive, and 'access_key', containing their Internet Archive access key, as described in the Wayback API spec here: https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit?pli=1
Both variables are optional; if they are not set, savepagenow falls back to the default unauthenticated saves.
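The optional-credentials behavior can be sketched as follows. This is an illustration, not the merged code: it uses the variable names from this PR description ('access_key' and 'secret'), though the maintainer asked for more specific names before release, so the published version may read different variables. The header format follows the SPN2 API document linked above. Raising an error when only one variable is set is a hypothetical design choice made here so that a half-configured setup does not silently fall back to anonymous saves.

```python
import os
from typing import Mapping, Optional, Tuple


def load_credentials(environ: Mapping[str, str] = os.environ) -> Optional[Tuple[str, str]]:
    """Return (access_key, secret) if both are set, or None if neither is set."""
    access_key = environ.get("access_key")
    secret = environ.get("secret")
    if access_key and secret:
        return access_key, secret
    if access_key or secret:
        # Hypothetical choice: report a partial configuration instead of
        # quietly downgrading to unauthenticated saves.
        raise ValueError("Set both 'access_key' and 'secret', or neither.")
    return None


def request_headers(environ: Mapping[str, str] = os.environ) -> dict:
    """Build capture headers, adding SPN2 authentication only when credentials exist."""
    headers = {"Accept": "application/json"}
    credentials = load_credentials(environ)
    if credentials is not None:
        access_key, secret = credentials
        headers["Authorization"] = f"LOW {access_key}:{secret}"
    return headers
```

Passing a plain dict as `environ` keeps the functions testable without touching the real process environment.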