A Chrome browser extension
One webpage for every book ever published!
brozzler - distributed browser-based web crawler
Page diff for the Wayback Machine
Python Client Library for the Archive.org OpenLibrary API
MIRROR of upstream IA repository
Heritrix is the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler project.
The Internet Archive BookReader
WARC-writing MITM HTTP/S proxy
Web access control (exclusion oracle) tools for optional use with the Wayback Machine
A Python 3.4 application that calculates and returns simhash values for Internet Archive's snapshots
Internet Archive utility that converts ABBYY to EPUB3
Internet Archive Decentralized Web Common API
IA's public Wayback Machine (moved from SourceForge)
Decentralized web Gateway for Internet Archive
RethinkDB Python library
Sort-friendly URI Reordering Transform (SURT) Python module
A queue-controlled browser automation tool for improving web crawl quality
A repository of cleanup bots implementing the openlibrary-client
Trough: Big data, small databases.
Archive.org OPDS Bookserver - A standard for digital book distribution
Python script to create CDX index files of WARC data