see if a URL is available in a web archive somewhere on the web

branch: master

Fetching latest commit…

Octocat-spinner-32-eaf2f5

Cannot retrieve the latest commit at this time

Octocat-spinner-32 webarchives
Octocat-spinner-32 .gitignore
Octocat-spinner-32 README.md
Octocat-spinner-32 setup.py
Octocat-spinner-32 test.py
README.md

webarchives

webarchives is a Python module for easily determining whether a given URL is available from a known Web archiving project. It is handy when you have a URL that no longer resolves but would still like to see its content. Web archiving projects are run by national libraries, archives and non-profits that are part of the International Internet Preservation Consortium.

webarchives grew out of the Memento Project's work on the Memento Proxy, which provided the seed for the scraping backend modules that webarchives uses.

Usage

The webarchives module provides a function, lookup, that takes the URL you want to look up in the Web archives. lookup returns a list of (time, url) tuples. Each tuple records when the requested URL was archived and where that archived copy can be retrieved.

```python
import webarchives

# lookup() returns a list of (time, url) tuples, one per archived copy found.
print(webarchives.lookup("http://www.geocities.com/homestead/homedir.html"))
```
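If you only want a single copy, such as the newest one or the one nearest a date you care about, you can post-process the list that lookup returns. The sketch below is a hypothetical helper that is not part of webarchives. It assumes the time values in the tuples are mutually comparable datetime objects, which this README does not state, and it uses made-up sample data in place of a real lookup result.

```python
from datetime import datetime


def latest_snapshot(snapshots):
    """Return the most recent (time, url) tuple, or None if the list is empty.

    Assumption: the time values are datetime objects (or otherwise comparable).
    """
    if not snapshots:
        return None
    return max(snapshots, key=lambda pair: pair[0])


def closest_snapshot(snapshots, target):
    """Return the (time, url) tuple whose time is nearest to target, or None.

    Assumption: subtracting two time values yields a timedelta, as with datetime.
    """
    if not snapshots:
        return None
    return min(snapshots, key=lambda pair: abs(pair[0] - target))


# Hypothetical data standing in for a webarchives.lookup() result.
sample = [
    (datetime(2001, 5, 1), "http://archive.example.org/20010501/homedir.html"),
    (datetime(2009, 10, 26), "http://archive.example.org/20091026/homedir.html"),
    (datetime(2005, 3, 15), "http://archive.example.org/20050315/homedir.html"),
]

print(latest_snapshot(sample))
print(closest_snapshot(sample, datetime(2004, 1, 1)))
```

With real data, pass the list returned by webarchives.lookup(url) instead of the sample list. Both helpers return None rather than raising when no archived copy was found, so the caller can easily fall back to reporting that the URL is not archived.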