Internet Archive (internetarchive)
forked from mikemccabe/analyze_ocr
Parse OCR result files for page numbers, tables of contents, etc.
forked from rajbot/autocrop
This package contains a tool for automatically cropping and deskewing images of book pages captured by an Internet Archive Scribe book scanner.
Command-line retrieval of torrents using transmission-daemon (via transmission-remote)
forked from rajbot/CDX-Writer
Python script to create CDX index files from WARC data
Reduce annoying 404 pages by automatically checking for an archived copy in the Wayback Machine. Learn more about this Test Pilot experiment at https://testpilot.firefox.com/
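One way such a check could work is sketched below. It assumes the public Wayback Machine availability endpoint (https://archive.org/wayback/available) and its `archived_snapshots.closest` JSON response shape. The extension's own implementation may differ, and the function names are hypothetical. The network request is left out so that only URL construction and response parsing are shown.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint: the public Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query_url(page_url: str) -> str:
    """Build the availability-API request URL for a page that returned 404 (hypothetical helper)."""
    return AVAILABILITY_ENDPOINT + "?" + urlencode({"url": page_url})


def closest_snapshot(response_body: str) -> Optional[str]:
    """Return the URL of the closest archived copy, or None if none is available."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


if __name__ == "__main__":
    print(availability_query_url("example.com/missing"))
    sample = '{"archived_snapshots": {"closest": {"available": true, "url": "http://web.archive.org/web/20130919044612/http://example.com/", "timestamp": "20130919044612", "status": "200"}}}'
    print(closest_snapshot(sample))
```

If `closest_snapshot` returns a URL, a browser extension could offer it to the user in place of the 404 page. An empty `archived_snapshots` object means no copy was found.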
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
forked from iipc/openwayback-access-control
Web access control (exclusion oracle) tools for optional use with the Wayback Machine