Archive a list of URLs using the Wayback Machine

**You need Python 3.10 or later to run this script.**

This script uses the Save Page Now 2 Public API.
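
The script's internals aren't reproduced here, but the basic Save Page Now 2 flow it builds on is roughly: POST a URL to the save endpoint with your S3-like keys (see step 4 below), receive a job ID, and poll the status endpoint until the capture finishes. A minimal sketch of that flow, using the requests library and placeholder keys:

    import time
    import requests

    ACCESS_KEY = "your access key"   # the S3-like keys from step 4 below
    SECRET_KEY = "your secret key"

    headers = {
        "Accept": "application/json",
        "Authorization": f"LOW {ACCESS_KEY}:{SECRET_KEY}",
    }

    # Ask Save Page Now 2 to capture one URL; the response includes a job ID.
    job = requests.post(
        "https://web.archive.org/save",
        headers=headers,
        data={"url": "https://example.com/"},
    ).json()

    # Poll the capture job until it is no longer pending.
    while True:
        status = requests.get(
            f"https://web.archive.org/save/status/{job['job_id']}",
            headers=headers,
        ).json()
        if status.get("status") != "pending":
            break
        time.sleep(5)

    if status.get("status") == "success":
        print(f"https://web.archive.org/web/{status['timestamp']}/{status['original_url']}")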

To use it:

  1. Clone or download and unzip this repository.

  2. Install the required Python libraries. Assuming you cloned or unzipped this repository to the directory path/to/capture-urls/:

    cd path/to/capture-urls/
    make
    
  3. Go to https://archive.org/account/s3.php and get your S3-like API keys.

  4. In path/to/capture-urls/, create a file called secret.py with the following contents:

    ACCESS_KEY = 'your access key'
    SECRET_KEY = 'your secret key'

    (Use the actual values of your access key and secret key, not the literal strings 'your access key' and 'your secret key'.)

  5. Optionally edit config.py to your liking.
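
    The repository's actual config options aren't listed here; as a purely hypothetical illustration, config.py is where tuning such as the delay between status checks would live:

    # Hypothetical option names for illustration only -- check config.py in
    # the repository for the settings it really supports.
    POLL_INTERVAL = 5   # seconds to wait between capture status checks
    MAX_RETRIES = 3     # how many times to retry a failed capture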

  6. Archive your URLs:

    cat urls.txt | ./capture-urls.py > archived-urls.txt
    

    urls.txt should contain a list of URLs to be archived, one on each line.
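
    For example, urls.txt might look like this (illustrative URLs, not ones from the repository):

    https://example.com/
    https://example.org/some/page.html
    https://example.net/another-page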

  7. Archiving URLs can take a long time. You can interrupt the process with Ctrl-C. This will create a file called progress.json with the state of the archiving process so far. If you start the process again, it will pick up where it left off. You can add new URLs to urls.txt before you restart the process.
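
    The script's own resume logic isn't shown here, but the general checkpoint pattern described above (write finished work to progress.json when interrupted, skip it on the next run) looks roughly like this sketch, with hypothetical keys and a stand-in archive() helper rather than the script's real file format:

    import json
    import os

    STATE_FILE = "progress.json"

    def archive(url: str) -> str:
        # Stand-in for the Save Page Now capture sketched earlier;
        # it would return the archived URL for `url`.
        ...

    def load_progress() -> dict:
        # Resume from an earlier interrupted run if a checkpoint exists.
        if os.path.exists(STATE_FILE):
            with open(STATE_FILE) as f:
                return json.load(f)
        return {"archived": {}}   # maps original URL -> archived URL

    def save_progress(state: dict) -> None:
        with open(STATE_FILE, "w") as f:
            json.dump(state, f, indent=2)

    state = load_progress()
    try:
        with open("urls.txt") as f:
            for line in f:
                url = line.strip()
                if not url or url in state["archived"]:
                    continue   # blank line, or already archived on a previous run
                state["archived"][url] = archive(url)
    except KeyboardInterrupt:
        pass
    finally:
        save_progress(state)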

  8. When it finishes running, you should have a list of the archived URLs in archived-urls.txt.