Common Crawl Miner

This is a tool for mining parallel web pages from the CommonCrawl data hosted on AWS. It is based on the CommonCrawl example codebase:

This was developed during the 2012 Machine Translation Marathon by:

Herve Saint-Amand (herve@saintamh.org)
Magdalena Plamada (plamada@cl.uzh.ch)
Jason Smith (jrs026@gmail.com)
