An Apache Spark framework for easy data processing, extraction, and derivation on Web archives and archival collections, developed by the Internet Archive and L3S Research Center.
An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)
A splittable Hadoop InputFormat for concatenated GZIP files and *.(w)arc.gz
Don't write specs anymore, just save 'em while testing your code interactively. Specs will become a byproduct.
Scripts to transfer archive.org collections, using https://github.com/jjjake/internetarchive
Analyze digitized books from the Internet Archive remotely with ArchiveSpark