Code to download and produce a datafile of Google's top 1000 sites

Google's list of the top 1000 sites on the web, based on data gathered by DoubleClick.

The resulting data can be found in top_1000_sites.tsv.
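The README does not document the column layout of top_1000_sites.tsv. The sketch below loads the file from Python. It assumes the file is plain UTF-8, tab-separated text with no quoting, and it returns every line as a row, including any header. The function name `read_tsv` is hypothetical.

```python
import csv
from typing import List


def read_tsv(path: str) -> List[List[str]]:
    """Return every line of a tab-separated file as a list of cell strings.

    Assumes UTF-8 text; no header row is interpreted, since the column
    layout of top_1000_sites.tsv is not documented.
    """
    with open(path, newline="", encoding="utf-8") as f:
        # QUOTE_NONE treats quote characters as ordinary text (plain TSV assumption).
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        return [row for row in reader]
```

For example, `read_tsv("top_1000_sites.tsv")` returns a list of rows, and `len()` of that list gives the line count. `QUOTE_NONE` is used so that a stray `"` inside a site title cannot swallow the following tabs.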

To run the script, redirect the output of top_1000_sites.py to a file. For example, to produce top_1000_sites.tsv:

python top_1000_sites.py > top_1000_sites.tsv
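Redirection works because the script writes its rows to standard output, and the shell's `>` sends that stream to the named file. Below is a minimal sketch of that pattern. It assumes rows are emitted as tab-separated lines. `write_tsv` and the example rows are hypothetical and are not taken from top_1000_sites.py.

```python
import csv
import sys
from typing import Iterable, Sequence, TextIO


def write_tsv(rows: Iterable[Sequence[object]], out: TextIO = sys.stdout) -> None:
    """Write rows as tab-separated lines to `out` (stdout by default).

    Writing to stdout is what lets `python script.py > file.tsv` capture the data.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in rows:
        writer.writerow(row)


if __name__ == "__main__":
    # Hypothetical rows for illustration only; the real script's columns may differ.
    write_tsv([("rank", "site"), (1, "example.com")])
```

Passing an explicit `out` stream (for example an open file or `io.StringIO`) makes the same function usable without shell redirection.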

Data sourced from http://www.google.com/adplanner/static/top1000/
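This README does not describe how top_1000_sites.py parses that page. Below is a rough sketch that assumes the sites were listed in an HTML `<table>`. Under that assumption, the standard library's `urllib.request` and `html.parser` can fetch the page and collect the cell text row by row. `fetch`, `TableRowParser`, and `parse_table_rows` are hypothetical names, and the page may no longer be online.

```python
from html.parser import HTMLParser
from typing import List, Optional
from urllib.request import urlopen


def fetch(url: str) -> str:
    """Download a page and decode it as UTF-8 (encoding is an assumption)."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: Optional[List[str]] = None   # cells of the current <tr>
        self._cell: Optional[List[str]] = None  # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags (e.g. <a>) still lands in the open cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table_rows(html: str) -> List[List[str]]:
    """Return the table rows found in `html` as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For example, `parse_table_rows(fetch(url))` would return one list of cell strings per table row. Those rows could then be printed as tab-separated lines. The parser does not handle nested tables, which is acceptable only under the single-table assumption above.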

Inspired by a HuffPo article: http://www.huffingtonpost.com/2011/06/24/most-visited-sites-2011_n_883756.html#s297574&title=17__Youkucom


This list is not exhaustive. According to Google's description:
"Keep in mind that the list excludes adult sites, ad networks, domains that don't have publicly visible content or don't load properly, and certain Google sites."