.gitignore
README.md
wayback_downloader.py


wayback-downloader

A simple downloader client for the Wayback Machine written in Python.

Usage:
	python <script.py> {--help|-h}
	python <script.py> [--threads <threads>] [--matchType {exact|prefix|host|domain}] [--from <timestamp>] [--to <timestamp>] [--limit <snapshots>] [--dry] <url>
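The usage line and the options listed in this README can be mirrored with Python's standard `argparse` module. This is a hypothetical reconstruction, not code taken from `wayback_downloader.py`. In particular, the `exact` default for `--matchType` is an assumption (it is the CDX server's own default), because the options list does not state one. The destination `from_` is used because `from` is a Python keyword.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented usage line."""
    parser = argparse.ArgumentParser(
        description="A simple downloader client for the Wayback Machine")
    # argparse adds --help/-h automatically.
    parser.add_argument("--threads", "-T", type=int, default=10,
                        help="number of downloading threads (default: 10)")
    # Assumption: "exact" as the default, matching the CDX server default.
    parser.add_argument("--matchType", "-m", default="exact",
                        choices=["exact", "prefix", "host", "domain"],
                        help="which results are downloaded, relative to <url>")
    parser.add_argument("--from", "-f", dest="from_", metavar="TIMESTAMP",
                        help="inclusive lower bound, a prefix of yyyyMMddhhmmss")
    parser.add_argument("--to", "-t", metavar="TIMESTAMP",
                        help="inclusive upper bound, a prefix of yyyyMMddhhmmss")
    parser.add_argument("--limit", "-l", type=int, metavar="SNAPSHOTS",
                        help="download at most this many snapshots")
    parser.add_argument("--dry", "-d", action="store_true",
                        help="list items to be downloaded without downloading them")
    parser.add_argument("url")
    return parser
```

Short and long forms are interchangeable, so `-m prefix -f 2010` parses the same as `--matchType prefix --from 2010`.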

Options:
	--help, -h		Display this help message and exit

	--threads, -T	Number of downloading threads (default: 10)

	--matchType, -m	Which results are downloaded, relative to <url>
		exact		Download only results that exactly match <url>
		prefix		Download results under the path <url>
		host		Download results from the host of <url>
		domain		Download results from the host of <url> and all of its subhosts

	--from, -f		Download results captured at or after this timestamp
	--to, -t		Download results captured at or before this timestamp
		Both <from> and <to> must be a prefix of "yyyyMMddhhmmss" (e.g. 2010 or 201606)

	--limit, -l		Download at most <snapshots> snapshots

	--dry, -d		List items to be downloaded without downloading them
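To make the `--matchType` and `--from`/`--to` semantics concrete, here is a simplified local sketch of the filtering they describe. The real filtering is presumably done server-side by the CDX server, which canonicalizes URLs (SURT form, dropping `www.`, and more) before matching. Here, stripping the `http(s)://` scheme stands in for that canonicalization. The helper names `url_matches`, `in_range`, and `valid_timestamp` are hypothetical. The CDX documentation describes the date bounds as inclusive prefixes of 1 to 14 digits, and the padding trick below reproduces that with plain string comparison.

```python
from typing import Optional
from urllib.parse import urlsplit


def _strip_scheme(url: str) -> str:
    # Simplification: compare URLs without their http(s):// scheme.
    for scheme in ("http://", "https://"):
        if url.lower().startswith(scheme):
            return url[len(scheme):]
    return url


def _host(url: str) -> str:
    # urlsplit only finds the host when a scheme is present.
    if "://" not in url:
        url = "http://" + url
    return (urlsplit(url).hostname or "").lower()


def url_matches(captured: str, target: str, match_type: str) -> bool:
    """Hypothetical local version of the four --matchType modes."""
    if match_type == "exact":
        return _strip_scheme(captured) == _strip_scheme(target)
    if match_type == "prefix":
        return _strip_scheme(captured).startswith(_strip_scheme(target))
    if match_type == "host":
        return _host(captured) == _host(target)
    if match_type == "domain":
        host, base = _host(captured), _host(target)
        return host == base or host.endswith("." + base)
    raise ValueError(f"unknown matchType: {match_type}")


def valid_timestamp(ts: str) -> bool:
    # A prefix of yyyyMMddhhmmss: 1 to 14 ASCII digits.
    return ts.isascii() and ts.isdigit() and 1 <= len(ts) <= 14


def in_range(timestamp: str, from_: Optional[str] = None,
             to: Optional[str] = None) -> bool:
    """Inclusive check of a 14-digit capture timestamp against prefix bounds."""
    # Padding with "0" gives the earliest value a prefix covers and padding
    # with "9" the latest, so fixed-width string comparison is enough.
    if from_ is not None and timestamp < from_.ljust(14, "0"):
        return False
    if to is not None and timestamp > to.ljust(14, "9"):
        return False
    return True
```

Padding `to` with nines yields impossible dates such as `20160699999999`. That is harmless here because only the string order matters.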

Example:
	Use the following command:
		python <script.py> --matchType prefix --from 2010 --to 201606 --limit 1000 example.org
	to download at most 1000 arbitrary pages under example.org captured between January 2010 and June 2016 (inclusive).
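The example command can be read as a single query to the Wayback CDX server API linked in this README. The following is a minimal sketch. It assumes the script forwards its options as the CDX parameters of the same names (`url`, `matchType`, `from`, `to`, `limit`) and requests JSON output. The `CDX_ENDPOINT` constant and the `cdx_query_url` helper are illustrative and are not taken from the script.

```python
from typing import Optional
from urllib.parse import urlencode

# Public CDX search endpoint (assumed to be what the script queries).
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(url: str, match_type: str = "exact",
                  from_: Optional[str] = None, to: Optional[str] = None,
                  limit: Optional[int] = None) -> str:
    """Build a CDX query URL from downloader-style options (hypothetical)."""
    params = {"url": url, "matchType": match_type, "output": "json"}
    # Only send the optional filters that were actually given.
    if from_ is not None:
        params["from"] = from_
    if to is not None:
        params["to"] = to
    if limit is not None:
        params["limit"] = limit
    return CDX_ENDPOINT + "?" + urlencode(params)


print(cdx_query_url("example.org", "prefix", "2010", "201606", 1000))
# https://web.archive.org/cdx/search/cdx?url=example.org&matchType=prefix&output=json&from=2010&to=201606&limit=1000
```

Opening that URL in a browser is a quick way to preview what a given command would select before downloading anything.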

For more information, see: https://github.com/internetarchive/wayback/blob/master/wayback-cdx-server/README.md
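Once the CDX server has answered, the remaining options (`--threads`, `--limit`, `--dry`) only affect how the resulting snapshots are fetched. The sketch below assumes JSON output, whose first row names the columns. It builds snapshot URLs with the `id_` modifier, which is commonly used to request the archived bytes without the Wayback Machine's banner. Whether `wayback_downloader.py` does the same is an assumption. The names `snapshot_urls`, `fetch`, `download_all`, and `fetcher` are hypothetical.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, List, Optional


def snapshot_urls(cdx_json: str) -> List[str]:
    """Turn a CDX output=json response into raw snapshot URLs."""
    rows = json.loads(cdx_json)
    if not rows:  # no captures matched
        return []
    header, *records = rows
    ts_col = header.index("timestamp")
    url_col = header.index("original")
    # Assumption: "id_" after the timestamp requests the unmodified capture.
    return [f"https://web.archive.org/web/{row[ts_col]}id_/{row[url_col]}"
            for row in records]


def fetch(url: str) -> bytes:
    """Hypothetical fetcher: download one snapshot with the standard library."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_all(urls: Iterable[str], fetcher: Callable[[str], bytes] = fetch,
                 threads: int = 10, limit: Optional[int] = None,
                 dry: bool = False) -> Dict[str, bytes]:
    """Fetch at most `limit` URLs on `threads` worker threads, or just list them."""
    selected = list(urls)[:limit]  # a limit of None keeps every URL
    if dry:
        for url in selected:
            print(url)
        return {}
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() preserves input order, so zip pairs each URL with its body.
        return dict(zip(selected, pool.map(fetcher, selected)))
```

Passing the fetcher in as a parameter keeps the threading and limit logic testable without network access. A real run would call `download_all(snapshot_urls(body), threads=10, limit=1000)`, where `body` is the text the CDX server returned.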