Please note: this project is no longer being actively developed.
WebArchivePlayer is a new desktop tool which provides a simple point-and-click wrapper for viewing any web archive file (in WARC and ARC format).
To create a web archive (WARC) file of your own, you can use the free https://webrecorder.io/ service to browse any page and then download the recorded WARC file.
The player allows users to pick one or more ARC/WARC files from their local machine and browse the contents from any browser. No internet connection is necessary in order to browse the archive.
Usage (Windows and OS X Apps)
- Download the latest version:
Double click to open. (For OS X, open the .dmg file to mount the volume and extract the player.) You may have to agree to allow opening files downloaded from the internet, and to allow the player to make network connections (Windows only). This is still new software, and other distribution methods may be added in the future.
A file dialog will show up. Browse to one or more existing WARC or ARC files.
You can use https://webrecorder.io to record pages as you browse and then download the WARC file.
A browser will open to http://localhost:8090/ listing all the pages in the archive.
Click on any page listed to view the replay, or enter a URL to search the full archive.
To exit, simply close the WebArchivePlayer window.
Usage for All Platforms -- Running from python source
Currently, executable versions are available only for OS X and Windows.
However, the player should work on any system that has Python 2.7.x, though this requires a little more setup.
On other systems (or to build from source):
Clone this repo:
git clone https://github.com/ikreymer/webarchiveplayer.git; cd webarchiveplayer
Install by running
python setup.py install (optionally using a virtualenv)
If a W/ARC file argument is omitted, the player will attempt to start in GUI mode and show a File Open dialog.
However, in order to run in GUI mode, the wxPython toolkit will also need to be installed separately.
Refer to instructions at wxPython Download page for your platform.
wxPython and virtualenv
wxPython does not work in a virtualenv by default. The simplest way to make it work is to symlink the system
wxredirect.pth into the virtualenv site-packages directory. For example, on OS X, if you've created an environment with virtualenv [myenv]:
ln -s /Library/Python/2.7/site-packages/wxredirect.pth [myenv]/lib/python2.7/site-packages/wxredirect.pth
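The same symlink can be created from Python. This is a minimal sketch, assuming the system wxredirect.pth sits in the directory you pass in and that the virtualenv uses the lib/python2.7/site-packages layout shown above. The helper name link_wxredirect is hypothetical and not part of WebArchivePlayer.

```python
import os


def link_wxredirect(system_site_packages, venv_dir, python_dir="python2.7"):
    """Symlink the system wxredirect.pth into a virtualenv (hypothetical helper)."""
    source = os.path.join(system_site_packages, "wxredirect.pth")
    target_dir = os.path.join(venv_dir, "lib", python_dir, "site-packages")
    target = os.path.join(target_dir, "wxredirect.pth")

    if not os.path.isfile(source):
        raise IOError("wxredirect.pth not found in " + system_site_packages)
    if not os.path.isdir(target_dir):
        raise IOError("not a virtualenv site-packages directory: " + target_dir)

    # Leave an existing link or file untouched so the call is safe to repeat.
    if not os.path.lexists(target):
        os.symlink(source, target)
    return target
```

For the OS X example above, the call would be link_wxredirect("/Library/Python/2.7/site-packages", "myenv").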
If a W/ARC file argument is passed to the player, the player will select that file and skip the File Open dialog. Installation of wxPython is not required when specifying the WARC explicitly via command line.
The OS X and Windows applications also support specifying the file via command line.
Custom Preset Archive Mode
In addition to opening files, WebArchivePlayer can now also be used to provide a point-and-click launcher for any pywb archive.
If a config.yaml file is present in the working directory (same directory as WebArchivePlayer), the specified configuration will be loaded
instead of showing a file prompt.
This can be used to distribute specific archives together with WebArchivePlayer.
Certain aspects of the player can also be modified in config.yaml, including changing the window contents
from 'Web Archive Player' to any custom title and HTML page.
    webarchiveplayer:
        # initial page to load on start-up
        # eg: http://localhost:8090/my_coll/http://example.com/
        start_url: my_coll/http://example.com/

        # set initial width of player window
        width: 400

        # set initial height of player window
        height: 250

        # set window title
        title: My Archive

        # Load custom contents from local HTML
        desc_html: ./desc.html

        # Auto-load WARCs from specified directory (supported from 1.4.6)
        auto_load_dir: ./warcs/
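As the comment on start_url indicates, the value is a path relative to the root of the local server. The following minimal sketch shows how the full address is formed, assuming the default port 8090 mentioned earlier. The replay_url helper is hypothetical and only illustrates the mapping.

```python
def replay_url(start_url, port=8090, host="localhost"):
    """Join a start_url value from config.yaml onto the local server root."""
    # Strip a leading slash so the result has exactly one slash after the port.
    return "http://{0}:{1}/{2}".format(host, port, start_url.lstrip("/"))


print(replay_url("my_coll/http://example.com/"))
# prints: http://localhost:8090/my_coll/http://example.com/
```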
For example, one could distribute a WARC together with the player and provide a custom setup. This includes automatically indexing WARCs on load to allow quick drop in, or configuring a multi-collection archive.
With version 1.4.6, webarchiveplayer supports indexing WARCs automatically from a designated directory. Archive files are indexed on each load to allow for dropping or updating the files more easily.
To set up, all that's needed is a
config.yaml with the following:
    webarchiveplayer:
        auto_load_dir: ./warcs
        title: 'My Archive'
        desc_html: ./desc_page.html
If WebArchivePlayer is placed in the same directory as the config.yaml,
the player will automatically load and index all WARC/ARC files found in the directory given by auto_load_dir.
warcs may also be placed in an
This allows for an archive to be more easily transported (eg. as a tar-ball or zip file).
The last two params allow for customizing the WebArchivePlayer window.
The title param specifies the window title, while the
desc_html param specifies
the contents of the WebArchivePlayer window.
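Before distributing such a setup, it can help to check which files an auto_load_dir directory actually contains. The sketch below lists candidate archive files. It assumes archives are identified by the extensions .warc, .warc.gz, .arc and .arc.gz, which is an illustrative assumption and not the player's documented matching rule. The find_archives helper is hypothetical.

```python
import os

# Assumed extensions for illustration; the player's own matching may differ.
ARCHIVE_EXTS = (".warc", ".warc.gz", ".arc", ".arc.gz")


def find_archives(auto_load_dir):
    """Return sorted paths of WARC/ARC files directly inside auto_load_dir."""
    found = []
    for name in os.listdir(auto_load_dir):
        path = os.path.join(auto_load_dir, name)
        if os.path.isfile(path) and name.lower().endswith(ARCHIVE_EXTS):
            found.append(path)
    return sorted(found)
```

Calling find_archives("./warcs") returns an empty list if the directory holds no matching files, which is a quick way to catch a misplaced WARC before shipping the archive.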
Create multi-collection archive
The following steps describe creating a static archive with preset collections and indexed archive files:
Create a new directory my_archive and switch to it.
Copy the WebArchivePlayer application to my_archive.
wb-manager init my_coll
wb-manager add my_coll <path/to/warc>
Add a config.yaml to my_archive, perhaps with:
    webarchiveplayer:
        start_url: my_coll/http://example.com/
        title: My Archive Demo
Now, when WebArchivePlayer is started in my_archive, it will use the WARC added to my_coll and open
http://localhost:8090/my_coll/http://example.com/ as the starting URL.
The my_archive dir can be distributed as a standalone archive and player.
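The wb-manager steps above can also be scripted. This is a minimal sketch, assuming wb-manager (installed with pywb) is available on the PATH. The build_archive helper and its run parameter are hypothetical. The sketch only creates the directory and runs the two wb-manager commands. Copying the player application and writing config.yaml are left as manual steps.

```python
import os
import subprocess


def build_archive(archive_dir, coll, warc_paths, run=subprocess.check_call):
    """Create archive_dir and index warc_paths into collection coll via wb-manager."""
    if not os.path.isdir(archive_dir):
        os.makedirs(archive_dir)

    commands = [["wb-manager", "init", coll]]
    for warc in warc_paths:
        # The commands run with archive_dir as working directory,
        # so WARC paths are made absolute first.
        commands.append(["wb-manager", "add", coll, os.path.abspath(warc)])

    for cmd in commands:
        run(cmd, cwd=archive_dir)
    return commands
```

For the example in this section, the call would be build_archive("my_archive", "my_coll", ["path/to/warc"]). The run parameter exists so the commands can be inspected or replaced with a stub without invoking wb-manager.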
Building GUI Binaries
The binaries can be built by running the build scripts from the app directory.
Note: wxPython must be installed for this to work. If running in a virtualenv, follow the instructions above. The install script will not run if it can't find wxPython.
OS X: (output written to
cd app
./build-osx.sh
Windows: (output copied to
cd app
build-windows.bat
Ensure config file and desc HTML are read as utf-8
Update to pywb 0.33.1
auto_load_dir option in
archive/config.yaml) which specifies a directory
from which to automatically load WARCs on startup.
Update to pywb 0.32.1
Support Webrecorder collection WARCs, read pages/bookmarks from all
Update to pywb 0.30.1
Support reading of WARC files with non-HTTP response records (which are skipped).
Build using Python 3 and pywb 0.30.0, using latest pyinstaller
page detect: re-enable reading pagelist from
json-metadata if present in WARC
Support multiple instances by picking a random port if 8090 is not available
Ensure HTML 'resource' records are included in page list
Display error dialog before quitting if unable to read and index WARC/ARCs.
Switch to pywb 0.11.1, many improvements in indexing and replay
Custom preset archive support with custom
Use HTML for main window rendering
Switch to pywb 0.10.9.1 for more rewriting improvements
Update to pywb 0.10.8, rewriting improvements, add pywb version display
Update to pywb 0.10.6, significant replay improvements
Fix issue where page listing only lists pages for one WARC/ARC when multiple are selected. Build scripts check for wxPython installation.
Update to use latest pywb release (0.8.3)
Support opening multiple WARC/ARC files at once. Also fix issue with opening files with spaces in filename.
How it Works
WebArchivePlayer is a simple GUI wrapper over the pywb web archiving tools, packaged as a standalone application with pyinstaller. The wxPython toolkit is used to provide the GUI. The wrapper starts a local server which serves content from the selected web archive, using pywb to handle the rest.
Consult the pywb documentation for more info on web archive replay.
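The changelog notes that the player picks a random port if 8090 is not available, so that multiple instances can run at once. The sketch below illustrates one common way to implement that kind of fallback with the standard socket module. It is not the player's actual code, and the pick_port helper is hypothetical.

```python
import socket


def pick_port(preferred=8090, host="localhost"):
    """Return preferred if it can be bound, otherwise an OS-assigned free port."""
    for port in (preferred, 0):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))  # port 0 asks the OS for any free port
            return sock.getsockname()[1]
        except socket.error:
            continue  # preferred port is taken; try port 0 next
        finally:
            sock.close()
    raise RuntimeError("no free port available")
```

Because the probe socket is closed before the real server starts, another process could take the port in between. A server that binds once and keeps the socket open avoids that race.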
Questions / Issues
Please feel free to open an issue on this page for any problems, questions, or concerns regarding this tool. This is brand new software, so feedback is encouraged.
Another project, which in part inspired WebArchivePlayer, is Mat Kelly's excellent WAIL project, which provides a GUI for different web crawling and replay systems.