
Link Reverse!

Mining of the CommonCrawl Corpus

The CommonCrawl corpus was mined with Spark. My experimental source code is available. I tried several approaches to mining links in documents but, in the end, settled on something relatively simple: just show which pages link to a given URL.
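To make the "which pages link to a given URL" idea concrete, here is a minimal in-memory sketch in plain Python. It is not the Spark job used for this project. The names `LinkExtractor`, `outgoing_links` and `reverse_links` are made up for illustration, and it assumes each crawled page arrives as a `(page_url, html)` pair. A distributed job would presumably do the same grouping per target URL across the cluster.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def outgoing_links(page_url, html):
    """Return the set of absolute URLs that page_url links to."""
    parser = LinkExtractor()
    parser.feed(html)
    # Resolve relative hrefs against the URL of the page they appear on.
    return {urljoin(page_url, href) for href in parser.hrefs}


def reverse_links(pages):
    """Map each target URL to the set of pages that link to it.

    `pages` is an iterable of (page_url, html) pairs (an assumed input shape).
    """
    inbound = defaultdict(set)
    for page_url, html in pages:
        for target in outgoing_links(page_url, html):
            inbound[target].add(page_url)
    return inbound
```

Looking up a URL in the returned mapping gives the set of pages that link to it, which is the kind of result the webapp displays.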


This webapp shows the results. There are two limitations. First, to keep the prototype tractable, I am only including links that point to a single domain. Second, I have only mined the first two valid segments in CommonCrawl.
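The first limitation amounts to a filter on the link target. A rough sketch of such a filter is below, assuming the check is done on the hostname of the target URL. The function name `targets_domain` is hypothetical, and `example.org` in the usage lines is a placeholder, not the domain used by this project.

```python
from urllib.parse import urlsplit


def targets_domain(url, domain):
    """Return True if url points at domain or one of its subdomains."""
    # hostname is None for URLs without a network location (e.g. mailto:).
    host = (urlsplit(url).hostname or "").lower()
    domain = domain.lower()
    return host == domain or host.endswith("." + domain)


# Placeholder domain, for illustration only.
kept = [u for u in ["http://www.example.org/a", "http://other.test/"]
        if targets_domain(u, "example.org")]
```

Matching on the hostname suffix with a leading dot avoids accepting unrelated hosts that merely end in the same characters.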