An Apache Spark framework for easy data processing, extraction, and derivation for Web archives and archival collections, developed by the Internet Archive and L3S Research Center.
An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)
A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz
Don't write specs anymore, just save 'em while testing your code interactively. Specs will become a byproduct.
Scripts to transfer archive.org collections, using https://github.com/jjjake/internetarchive
Analyze digitized books from the Internet Archive remotely with ArchiveSpark