Scripts to get the aggregated page views for all articles of a WM project (before the pageview API was released)
Note: This requires a lot of disk space. The raw data uses about 1 GB
per day of data; the output for 40 days of data takes about 16 GB.

1. Download the raw data
If you're running on Tool Labs, call link-month.sh with each year and month
you want to index. This will create links in source/ for each of the files in
that month.
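For example, to link July and August 2015 (the exact argument format is
an assumption here; check the script itself for the form it expects):
	sh link-month.sh 2015 07
	sh link-month.sh 2015 08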

Otherwise, download the hourly pagecounts-*.gz files from
http://dumps.wikimedia.org/other/pagecounts-raw/ (the projectcounts files
aren't needed).
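One way to fetch a whole month into source/ (where make-raw.sh expects its
input, judging from the link-month.sh step) is a recursive wget; the
year/year-month directory layout assumed below matches the pagecounts-raw
archive but is worth double-checking:
	wget -r -np -nd -P source -A 'pagecounts-*.gz' \
	    http://dumps.wikimedia.org/other/pagecounts-raw/2015/2015-07/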

2. Make the list of average daily hitcounts, which will be created
in the file hitcounts.raw.gz. Run
	sh make-raw.sh

On Tool Labs, the grid engine should be used:
	jsub -cwd -j y ./make-raw.sh
This will send the output of the command to ~/make-raw.out; you can monitor
this file with
	tail -f ~/make-raw.out
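
For reference, the aggregation that make-raw.sh performs amounts to
something like the following (a minimal sketch, not the actual script: it
assumes the standard pagecounts line format "project title count bytes",
takes the number of days covered as its first argument, and ignores the
memory pressure a real 40-day run would create):
	# Sum the hourly hit counts per (project, title) pair, then divide
	# by the number of days to get an average daily hitcount.
	zcat source/pagecounts-*.gz \
	    | awk -v days="$1" '{ sum[$1 " " $2] += $3 }
	        END { for (k in sum) print k, sum[k] / days }' \
	    | gzip > hitcounts.raw.gz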