An Apache Spark framework that facilitates access to Web archives and enables easy data extraction and derivation, developed by the Internet Archive and L3S Research Center.
A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz
Don't write specs anymore, just save 'em while testing your code interactively. Specs will become a byproduct.
An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)
Scripts to transfer archive.org collections, using https://github.com/jjjake/internetarchive
Analyze digitized books from the Internet Archive remotely with ArchiveSpark