L3S Research Center
- Hannover, Germany
An Apache Spark framework that facilitates access to Web archives and enables easy data extraction and derivation. Developed by the Internet Archive and L3S Research Center.
An easy-to-use and highly customizable crawler that enables you to create your own little Web archives (WARC/CDX)
A Splittable Hadoop InputFormat for Concatenated GZIP Files and *.(w)arc.gz
Don't write specs anymore, just save 'em while testing your code interactively. Specs will become a byproduct.
Scripts to transfer archive.org collections, using https://github.com/jjjake/internetarchive
Analyze digitized books from the Internet Archive remotely with ArchiveSpark