[DEPRECATED] Scripts to get the aggregated page views for all articles of a WM project (before pageview API was released)
Note: This requires a lot of disk space. The raw data uses about 1 GB per day of data. The output for 40 days of data takes about 16 GB.

1. Download the raw data

If you're running on Tool Labs, call link-month.sh with each year and month you want to index. This will create links in source/ for each of the files in that month.

Otherwise, download the hourly pagecounts-*.gz files from http://dumps.wikimedia.org/other/pagecounts-raw/ (projectcounts aren't needed).

2. Make the list of average daily hitcounts, which will be created in the file hitcounts.raw.gz.

Run

    sh make-raw.sh

On Tool Labs, the grid engine should be used:

    jsub -cwd -j y ./make-raw.sh

This will send the output of the command to ~/make-raw.out; you can monitor this file with

    tail -f ~/make-raw.out
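For orientation, the kind of aggregation done in step 2 can be sketched as below. This is not the contents of make-raw.sh, only a minimal illustration. It assumes each line of an hourly pagecounts file has the form "project page_title hourly_count bytes", separated by spaces. The function name average_hitcounts and its arguments are hypothetical.

```shell
# Hypothetical helper, NOT the repository's make-raw.sh.
# Usage: average_hitcounts PROJECT DAYS FILE.gz...
# Assumes each input line is: project page_title hourly_count bytes
average_hitcounts() {
    project=$1   # project code to keep, e.g. "en"
    days=$2      # number of days covered by the input files
    shift 2
    # Decompress all hourly files as one stream, sum the counts per page
    # for the chosen project, then divide each total by the number of days.
    gzip -dc -- "$@" |
        awk -v project="$project" -v days="$days" '
            $1 == project { total[$2] += $3 }
            END {
                for (page in total)
                    printf "%s %.2f\n", page, total[page] / days
            }
        ' |
        LC_ALL=C sort
}
```

It could be called, for example, as `average_hitcounts en 40 source/pagecounts-*.gz | gzip > hitcounts.sketch.gz`, where the output name hitcounts.sketch.gz is made up here to avoid clashing with the real hitcounts.raw.gz. Holding every page title in an awk array is simple but memory-hungry for a large project, so treat this as a way to understand the data rather than a replacement for the provided script.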