webarchives

webarchives is a Python module for checking whether a given URL is available from a known Web archiving project. It is handy when you have a URL that no longer resolves and you would like to see its content. Web archiving projects are run by national libraries, archives, and non-profits that are part of the International Internet Preservation Consortium.

webarchives grew out of work done by the Memento Project on the Memento Proxy, which provided the seed for the scraping backend modules used by webarchives.

Usage

The webarchives module provides a function, lookup, to which you pass the URL that you want to look up in the Web archives. lookup returns a list of (time, url) tuples. Each tuple records when the requested URL was archived and where the archived representation can be retrieved.

import webarchives

print(webarchives.lookup("http://www.geocities.com/homestead/homedir.html"))
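Once you have the list of (time, url) tuples, a common next step is to pick the capture nearest to a date you care about. The following is a minimal sketch, not part of webarchives itself: the helper closest_capture is hypothetical, the sample list is placeholder data standing in for a real lookup result, and it assumes the time element is a datetime object, which this README does not confirm.

```python
from datetime import datetime


def closest_capture(captures, target):
    """Return the (time, url) tuple whose time is nearest to target, or None.

    Hypothetical helper: assumes each time value is a datetime object.
    """
    if not captures:
        return None
    return min(captures, key=lambda capture: abs(capture[0] - target))


# Placeholder data in the documented (time, url) shape; not real lookup output.
sample = [
    (datetime(2001, 5, 1), "http://archive.example.org/2001/homedir.html"),
    (datetime(2005, 3, 9), "http://archive.example.org/2005/homedir.html"),
    (datetime(2009, 10, 26), "http://archive.example.org/2009/homedir.html"),
]

best = closest_capture(sample, datetime(2004, 1, 1))
print(best[1])
```

If the time values turn out to be strings or numeric timestamps instead, convert them to datetime objects before comparing.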
