This is a tool for mining parallel web pages from the CommonCrawl data hosted on AWS. It is based on the CommonCrawl example codebase:
https://github.com/commoncrawl/commoncrawl-examples
This was developed during the 2012 Machine Translation Marathon by:
Herve Saint-Amand (herve@saintamh.org)
Magdalena Plamada (plamada@cl.uzh.ch)
Jason Smith (jrs026@gmail.com)