ikreymer/pywb-ia

pywb IA Tools

This project contains the setup for running the pywb web archive replay system against the Internet Archive web archives.

It is still in an experimental/alpha phase and should be used only for testing replay.

Installation

Run pip install -r requirements.txt, which installs the latest pywb along with uwsgi and gevent.

Run with uwsgi uwsgi.ini. The current default is gevent+uwsgi, but feel free to modify uwsgi.ini as needed.

Available Tools

Alternate Wayback Machine Replay /web/

  • /web/ -> replays from https://web.archive.org/web/

For example, http://localhost:8080/web/20111231161728//example.com/ will replay the equivalent content from http://web.archive.org/web/20111231161728/http://www.iana.org/domains/example/ using the pywb replay system.
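
The URL layout in the example above can be sketched in Python. This is only an illustration of how the 14-digit timestamp (YYYYMMDDhhmmss) and the target URL combine into a replay URL. The helper name replay_urls and the two prefix constants are assumptions made for this sketch and are not part of pywb-ia. The upstream archive may also redirect to a different capture or URL, as in the example above.

```python
from datetime import datetime
from typing import Tuple

# Assumed prefixes, taken from the URLs shown in this README.
LOCAL_PREFIX = "http://localhost:8080/web/"
UPSTREAM_PREFIX = "https://web.archive.org/web/"


def replay_urls(captured_at: datetime, target_url: str) -> Tuple[str, str]:
    """Return (local_url, upstream_url) for one capture of target_url.

    Hypothetical helper: it only formats strings and makes no requests.
    """
    # Wayback-style timestamps are 14 digits: YYYYMMDDhhmmss.
    timestamp = captured_at.strftime("%Y%m%d%H%M%S")
    local_url = f"{LOCAL_PREFIX}{timestamp}/{target_url}"
    upstream_url = f"{UPSTREAM_PREFIX}{timestamp}/{target_url}"
    return local_url, upstream_url


local_url, upstream_url = replay_urls(
    datetime(2011, 12, 31, 16, 17, 28), "http://example.com/"
)
print(local_url)
print(upstream_url)
```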

Archive-It Service Replay /ait/

  • /ait/ -> replays from http://wayback.archive-it.org/<COLLID>
  • /ait/all/ -> replays from http://wayback.archive-it.org/all/

<COLLID> corresponds to a collection from the http://archive-it.org/ service.
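
As a rough sketch of the mapping above, the following Python helper returns the upstream Archive-It prefix for a given collection, falling back to the all/ endpoint when no collection is given. The function name ait_upstream_prefix and the collection ID "1234" are hypothetical and used only for illustration.

```python
from typing import Optional

# Upstream base URL, as listed in this README.
AIT_BASE = "http://wayback.archive-it.org/"


def ait_upstream_prefix(coll_id: Optional[str] = None) -> str:
    """Return the upstream replay prefix for a collection.

    Hypothetical helper: with no collection ID it returns the "all" prefix.
    """
    if coll_id is None:
        return AIT_BASE + "all/"
    return AIT_BASE + coll_id + "/"


single = ait_upstream_prefix("1234")  # "1234" is a made-up collection ID
everything = ait_upstream_prefix()
print(single)
print(everything)
```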

Single Item Replay /item/

  • /item/<ITEMNAME> -> replays from WARC files stored under http://archive.org/details/<ITEMNAME>

For any public ITEMNAME that has CDX files, this replays content from that item only. The item's .idx file is downloaded locally on first use, while the .cdx.gz and WARC files are accessed remotely. The item's .idx, .cdx.gz and WARC files must all be accessible for this to work.
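
The "download on first use" behavior described above can be illustrated with a minimal caching sketch. This is not pywb-ia's actual implementation: the function ensure_local_idx, the local file name ITEMNAME.idx, and the injected fetch callable are all assumptions made for this example. A stand-in fetcher is used so the sketch runs without network access.

```python
import os
import tempfile
from typing import Callable


def ensure_local_idx(item_name: str, cache_dir: str,
                     fetch: Callable[[str], bytes]) -> str:
    """Return the path of the cached .idx file, fetching it only if absent.

    Hypothetical helper: `fetch` stands in for whatever downloads the
    item's .idx file; the local naming scheme is an assumption.
    """
    os.makedirs(cache_dir, exist_ok=True)
    local_path = os.path.join(cache_dir, item_name + ".idx")
    if not os.path.exists(local_path):
        # First use: download once and store locally.
        data = fetch(item_name)
        with open(local_path, "wb") as fh:
            fh.write(data)
    return local_path


calls = []


def fake_fetch(item_name: str) -> bytes:
    """Stand-in for a real download; records each call."""
    calls.append(item_name)
    return b"placeholder index bytes\n"


cache_dir = tempfile.mkdtemp()
first = ensure_local_idx("EXAMPLE-ITEM", cache_dir, fake_fetch)
second = ensure_local_idx("EXAMPLE-ITEM", cache_dir, fake_fetch)
print(first == second, len(calls))
```

The second call finds the cached file and skips the fetch, which mirrors the idea that only the first request for an item pays the download cost for the index.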
