Code to download and produce a datafile of Google's top 1000 sites

Google's list of the top 1000 sites on the web, based on data gathered by DoubleClick.

Result data can be found in top_1000_sites.tsv

To run, just redirect the script's output to a file. In my case, I ran

python -i > top_1000_sites.tsv

to produce top_1000_sites.tsv
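
Because the script is run by redirecting its output, it presumably prints tab-separated rows to standard output. The sketch below shows one way such output could be written and read back using Python's csv module. It is not the repository's actual code: the function names write_tsv and read_tsv, and the example columns (rank, site), are assumptions made for illustration only, since the real column layout is not described here.

```python
import csv
import io
import sys


def write_tsv(rows, out=sys.stdout):
    """Write an iterable of row sequences to `out` as tab-separated text."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow(row)


def read_tsv(text):
    """Parse tab-separated text back into a list of rows (lists of strings)."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))


if __name__ == "__main__":
    # Hypothetical placeholder rows; the real columns of
    # top_1000_sites.tsv are not documented in this README.
    example_rows = [("rank", "site"), (1, "example.com"), (2, "example.org")]
    write_tsv(example_rows)
```

Writing to standard output keeps the script free of hard-coded file names, so the shell redirection decides where the data ends up.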

data sourced from

Inspired by a HuffPo article

This list is not exhaustive. According to Google's description:
"Keep in mind that the list excludes adult sites, ad networks, domains that don't have publicly visible content or don't load properly, and certain Google sites."