namin/linkrev

Link Reverse!

Mining of the CommonCrawl Corpus

I mined the CommonCrawl corpus with Spark. My experimental source code is available. I ran various experiments on mining links in documents but in the end settled on something relatively simple: showing which pages link to a given URL.
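To make the "which pages link to a given URL" idea concrete, here is a minimal sketch in plain Python rather than Spark. It is not the repository's code: the names `LinkCollector`, `extract_links`, and `reverse_links` are hypothetical, and the input is assumed to be an iterable of `(page_url, html)` pairs. The loop plays the role that a map over pages followed by a group-by-target would play in a Spark job.

```python
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url, html):
    """Return absolute URLs of all links found in one page."""
    collector = LinkCollector()
    collector.feed(html)
    # Resolve relative links against the URL of the page that contains them.
    return [urljoin(page_url, href) for href in collector.hrefs]


def reverse_links(pages):
    """Map each link target to the sorted list of pages that link to it."""
    index = defaultdict(set)
    for page_url, html in pages:
        for target in extract_links(page_url, html):
            index[target].add(page_url)
    return {target: sorted(sources) for target, sources in index.items()}
```

Calling `reverse_links(pages)["http://web.mit.edu/"]` would then list every page in `pages` that links to that URL. Using a set per target means a page that links to the same URL several times is counted once.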

Results

This webapp shows the results. There are two limitations. First, to keep the prototype tractable, I am only including links to the domain mit.edu. Second, I've only mined the first two valid segments in CommonCrawl.
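The restriction to mit.edu can be sketched as a simple filter on link targets. This is an illustration under assumptions, not the webapp's actual rule: the function name `is_in_domain` is hypothetical, and treating subdomains such as web.mit.edu as part of mit.edu is an assumption the text above does not confirm.

```python
from urllib.parse import urlparse


def is_in_domain(url, domain="mit.edu"):
    """Return True if the URL's host is the domain or one of its subdomains.

    Counting subdomains as matches is an assumption made for this sketch.
    """
    host = (urlparse(url).hostname or "").lower()
    # Match on a dot boundary so that "notmit.edu" does not count as "mit.edu".
    return host == domain or host.endswith("." + domain)
```

Comparing on the parsed host name, rather than searching the whole URL for the string "mit.edu", avoids false matches on hosts like mit.edu.example.com or on paths and query strings that merely mention the domain.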
