Miscellaneous tools for processing WARC files from the Common Crawl
detect-chinese
read-meta
.gitignore
LICENSE
README.md

Warc Tools

A few fairly use-case-specific tools for extracting data from the Common Crawl WARC files hosted on AWS.

License

This code is licensed under the MIT License.

Copyright © 2013 Kevin Bullaughey