Added functionality for using savepagenow with authentication #45

duckduckgrayduck · 2023-06-30T02:37:29Z

This PR adds the ability to authenticate when doing Wayback saves. The user needs to create two local environment variables: 'secret', which holds their S3 secret key from the Internet Archive, and 'access_key', which holds their access key from the Internet Archive, as described in the Wayback API spec here: https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit?pli=1

Both are optional; if they are not set, it falls back to the default unauthenticated saves.
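
For illustration, the optional-credentials behaviour could look roughly like the sketch below. It is not the code from this PR: `build_auth_headers` is a hypothetical helper, the variable names are the ones given in the description above (they were renamed before release), and the `LOW <access>:<secret>` header format is assumed from the linked API spec.

```python
import os
from typing import Mapping, Optional


def build_auth_headers(env: Optional[Mapping[str, str]] = None) -> dict:
    """Return request headers, adding credentials only when both keys are set.

    Hypothetical helper for illustration; not part of savepagenow.
    """
    env = os.environ if env is None else env
    headers = {}
    access_key = env.get("access_key")
    secret = env.get("secret")
    if access_key and secret:
        # ASSUMPTION: S3-style keys are sent as "LOW <access>:<secret>".
        headers["Authorization"] = f"LOW {access_key}:{secret}"
    # With either key missing, no header is added and the save stays anonymous.
    return headers
```

Passing a plain dict instead of `os.environ` keeps the helper easy to exercise in a unit test without touching the real environment.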

savepagenow/api.py

palewire · 2023-06-30T17:13:05Z

Love the patch. Here's my picky list of picky stuff. If we get this stuff in I'm ready to merge.

Let's get a unit test that does the auth. Please pull from env variables. I can add a login for myself, linked to the GitHub account, so it can run in the cloud.
Let's add a little snippet to the documentation explaining how to do this, and update whatever would be outdated.
Add a custom exception for when a bad user name or password is provided.
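
As a rough sketch of the custom-exception request, something along these lines would separate "keys not set" from "keys rejected". The class and function names are hypothetical, and treating HTTP 401 as the signal for rejected keys is an assumption, not confirmed behaviour of the Wayback API.

```python
from typing import Mapping, Tuple


class MissingCredentialsError(Exception):
    """Raised when authentication is requested but a key is not set."""


class InvalidCredentialsError(Exception):
    """Raised when the server rejects the supplied keys."""


def require_credentials(env: Mapping[str, str]) -> Tuple[str, str]:
    """Return (access_key, secret) or raise if either one is absent."""
    access_key = env.get("access_key")
    secret = env.get("secret")
    if not access_key or not secret:
        raise MissingCredentialsError(
            "Set both 'access_key' and 'secret' to use authenticated saves."
        )
    return access_key, secret


def check_auth_response(status_code: int) -> None:
    """Raise a specific error for a rejected login.

    ASSUMPTION: a 401 status is what signals bad keys.
    """
    if status_code == 401:
        raise InvalidCredentialsError("The access key or secret was rejected.")
```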

duckduckgrayduck · 2023-06-30T21:05:40Z

I've added the unit test (which will only pass once savepagenow is repackaged, because it is imported as a library into the tests and doesn't have access to the new method yet), changed the user agent back to savepagenow, and added documentation (I had to move to a new version of sphinx napoleon to do so, which resulted in a lot of files showing as 'changed'). I also added custom error messages and ran black and pylint. It should be ready for review.

palewire · 2023-07-01T13:12:47Z

I made a few tweaks and merged this in. Mainly I'd like to have more specific env variable names. The rest of my changes are all gloss.

Can you point me to where you sourced the 4 vs 12 request limit facts?

duckduckgrayduck · 2023-07-01T13:51:46Z

https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit
Page 8
Max captures per minute for authenticated users = 12 and for anonymous users = 4.

palewire · 2023-07-01T14:41:45Z

Thanks. We should be out as version 1.3.0. Give it a try. Thanks again.

duckduckgrayduck · 2023-07-01T23:52:51Z

Works great! Thanks again.

overcast07 · 2023-09-26T23:09:16Z

> https://docs.google.com/document/d/1Nsv52MvSjbLb2PCpHlat0gkzw0EvtSgpKHu4mk0MnrA/edit Page 8 Max captures per minute for authenticated users = 12 and for anonymous users = 4.

The document has been out of date for a while. Around May they changed the limit to 6 per minute for authenticated users and 3 per minute for anonymous users, but it seems they never updated the document to reflect that.

duckduckgrayduck · 2023-09-27T09:11:38Z

@overcast07 I'm not sure where you got those numbers. I've been in direct communication with the Internet Archive folks.

overcast07 · 2023-09-27T11:48:15Z

> @overcast07 I'm not sure where you got those numbers. I've been in direct communication with the Internet Archive folks.

I created and frequently use a Bash script that can submit a list of URLs to Save Page Now, both with and without authentication. I haven't been in contact with the Internet Archive about it (I just didn't have much of a reason to) and they have never tried to contact me.

In my testing, it has been impossible to submit more than 6 URLs per minute for several months. The script submits URLs as frequently as every 3 seconds, and has done this for about 2 years, so it was quite noticeable when there suddenly started being a long gap between successful URL submissions after every 6th URL. Previously, the actual limit was probably 12 URLs, but it wasn't calculated in the same way until earlier this year (you could submit more than 12 URLs per minute by submitting them rapidly before the first one started processing), and shortly after they fixed this the limit was reduced to 6.
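
The "no more than 6 submissions in any 60 seconds" behaviour described above can be mirrored on the client side with a sliding-window throttle. This is a minimal sketch in Python rather than the Bash script mentioned; `SlidingWindowLimiter` is a hypothetical name, and the default of 6 per 60 seconds is the observed figure from this comment, not a documented value.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Track recent submissions and report how long to wait before the next."""

    def __init__(self, max_calls=6, period=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so tests can use a fake clock
        self.calls = deque()

    def wait_time(self) -> float:
        """Seconds to wait before another submission fits in the window."""
        now = self.clock()
        # Forget submissions that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.period:
            self.calls.popleft()
        if len(self.calls) < self.max_calls:
            return 0.0
        # The window is full until the oldest submission ages out.
        return self.period - (now - self.calls[0])

    def record(self) -> None:
        """Note that a submission was just made."""
        self.calls.append(self.clock())
```

A caller would sleep for `wait_time()` seconds, submit the URL, then call `record()`.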

The website provides an endpoint (https://web.archive.org/save/status/user) which tells you if you don't have any slots left to use. Since May 2023, the Bash script has used that data, when authenticated, to check whether the site would return the "You have already reached the limit of active Save Page Now sessions" message for the next URL submitted, so that it avoids repeatedly receiving that error message.
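
A Python version of that pre-flight check might look like the sketch below. Only the endpoint URL comes from this comment; the JSON shape (an `available` count of free slots) and the header format are assumptions, and both function names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

STATUS_URL = "https://web.archive.org/save/status/user"


def has_free_slot(payload: dict) -> bool:
    """Decide whether another URL can be submitted right now.

    ASSUMPTION: the response carries an integer 'available' field.
    A missing field is treated as "no slot" to stay on the safe side.
    """
    return int(payload.get("available", 0)) > 0


def fetch_status(auth_header: str) -> dict:
    """Fetch the status JSON; auth_header is e.g. 'LOW <access>:<secret>'."""
    request = Request(
        STATUS_URL,
        headers={"Accept": "application/json", "Authorization": auth_header},
    )
    with urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping the decision in `has_free_slot` separate from the network call means the logic can be checked against sample payloads without contacting the site.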

Was originally alerted to this by user overcast07 in this thread: palewire#45 Confirmed with the Internet Archive team the new rate limit

duckduckgrayduck · 2023-09-29T11:55:46Z

Hey @overcast07, I've contacted the Internet Archive team and just want you to know that you are correct. I've made a new PR to update the documentation for savepagenow: #48

Added functionality for using savepagenow with authentication

89730c2

duckduckgrayduck requested a review from palewire as a code owner June 30, 2023 02:37

palewire reviewed Jun 30, 2023

duckduckgrayduck added 6 commits June 30, 2023 13:40

Change user agent back to savepagenow

60392b5

Add unit test for capture with authentication flag

72f0e36

Added documentation for capture with authentication and a deprecation…

d13126c

… for sphinx napoleon for Python 3.10

Ran pylint and black

5b0bd0a

Added custom error message for both missing keys when trying to run w…

656ec14

…ith -a and keys that are invalid

Ran pylint and black

df4d02f

Fixed a relative import for Python 3.10

3f0b4af

palewire merged commit 73652ed into palewire:main Jul 1, 2023

duckduckgrayduck mentioned this pull request Sep 29, 2023

Update documentation to reflect new rate limit for requests #48

Merged
