WayBack Downloader

Download any website from the Internet Archive Wayback Machine.

Installation

  1. Clone the repository: git clone https://github.com/pavelnovitsky/wayback-machine-download.git
  2. Grant write permissions on the "websites" folder
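The permissions step can be sketched as a small shell helper. This is only an illustration: `ensure_writable_dir` is a hypothetical name, not part of the project, and it assumes the downloader runs from the command line as the current user.

```shell
# Hypothetical helper (not part of the repository): make sure a directory
# exists and is writable by the current user.
ensure_writable_dir() {
    dir="$1"
    mkdir -p "$dir" || return 1      # create the folder if it is missing
    chmod u+rwx "$dir" || return 1   # give the owner read/write/enter rights
    [ -w "$dir" ]                    # succeed only if it is writable now
}

# Typical use after cloning, from the repository root:
#   git clone https://github.com/pavelnovitsky/wayback-machine-download.git
#   cd wayback-machine-download
#   ensure_writable_dir websites
```

If the script is run by a different user (for example from a scheduled job), that user needs write access instead, so the `chmod` mode may have to be adjusted.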

Basic Usage

Run WayBack Downloader with the base URL of the website you want to retrieve as a parameter (e.g., http://example.com):

php downloader.php -h http://example.com

Downloaded files are saved to the websites/{domain}/ directory. For this example, that is websites/example.com/.
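To locate the output from a script, the target folder can be derived from the base URL. The helper below is a rough sketch: `output_dir_for` is a hypothetical name, and it assumes the domain part of the path is simply the host with the scheme and any path removed. The downloader itself may treat ports or subdomains differently.

```shell
# Hypothetical helper: guess the output folder for a given base URL.
output_dir_for() {
    host=${1#*://}     # drop the scheme (http://, https://), if present
    host=${host%%/*}   # drop everything after the host name
    printf 'websites/%s/\n' "$host"
}

output_dir_for http://example.com
```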

Options

  • -h, --host: mandatory parameter, the base URL of the website to download
  • -t, --timestamp: optional parameter that sets the earliest date of the Web Archive snapshots. WayBack Downloader won't download files added before the specified date. Timestamp format: yyyyMMddhhmmss (for example, 20060716231334)
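A value in the yyyyMMddhhmmss format can be produced from a more readable date string. The sketch below uses a hypothetical helper, `to_wayback_timestamp`, which only strips separators and checks the length. It does not validate that the date itself is real.

```shell
# Hypothetical helper: turn "2006-07-16 23:13:34" into "20060716231334".
to_wayback_timestamp() {
    ts=$(printf '%s' "$1" | tr -cd '0-9')   # keep digits only
    if [ "${#ts}" -ne 14 ]; then
        echo "expected 14 digits (yyyyMMddhhmmss), got: $ts" >&2
        return 1
    fi
    printf '%s\n' "$ts"
}

# Possible use with the -t option:
#   php downloader.php -h http://example.com -t "$(to_wayback_timestamp '2006-07-16 23:13:34')"
```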

Examples

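The timestamp is the 14-digit part of a Wayback Machine snapshot URL, for example 20060716231334 in:

http://web.archive.org/web/20060716231334/http://example.com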

php downloader.php -h http://example.com

php downloader.php --host=http://example.com

php downloader.php -h http://example.com -t 20060716231334

php downloader.php --host=http://example.com --timestamp=20060716231334

TODO

  • Add full test coverage
  • Add separate "from" and "to" timestamp options
  • Add an optional URL filter (e.g., a single directory, *.jpg, etc.)
  • Add a limit on the number of results
  • Add access control support

Resources used

Wayback CDX Server API
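For orientation, a CDX Server query for a site's captures can be assembled as below. This is a sketch under assumptions, not the query the downloader actually sends: `cdx_query_url` is a hypothetical helper, and it uses the API's `url`, `matchType`, `output`, and `from` parameters without URL-encoding the input.

```shell
# Hypothetical helper: build a CDX Server query URL for a host,
# optionally limited to captures from a given timestamp onward.
cdx_query_url() {
    url="http://web.archive.org/cdx/search/cdx?url=$1&matchType=prefix&output=json"
    if [ -n "${2:-}" ]; then
        url="$url&from=$2"   # earliest capture timestamp to include
    fi
    printf '%s\n' "$url"
}

# Possible use (requires network access):
#   curl -s "$(cdx_query_url example.com 20060716231334)"
```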

Contributing

Contributions via pull requests are welcome.

Bug tracking

WayBack Downloader uses GitHub issues. If you find a bug, please create an issue.

License

This library is released under the terms of the MIT License.