[DEPRECATED] Scripts to get the aggregated page views for all articles of a WM project (before pageview API was released)

README

Note: This requires a lot of disk space. The raw data uses about 1 GB
per day of data; the output for 40 days of data takes about 16 GB.
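
Before starting, it is worth confirming the target filesystem has
enough free space; standard df does this (run from the directory
where the data will be stored):
	df -h .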

1. Download the raw data
If you're running on Tool Labs, call link-month.sh with each year and month
you want to index. This will create links in source/ for each of the files in
that month.
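
For example, to index January and February 2015 (assuming
link-month.sh takes the year and the month as two separate arguments;
check the script itself for the exact form):
	sh link-month.sh 2015 01
	sh link-month.sh 2015 02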

Otherwise, download the hourly pagecounts-*.gz files from
http://dumps.wikimedia.org/other/pagecounts-raw/ (the projectcounts
files aren't needed).
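
One way to fetch all hourly files for a month is with wget. The
year/year-month directory layout below matches the dump site's
structure at the time of writing, and downloading into source/ to
mirror the Tool Labs layout is an assumption; verify both before a
long download:
	wget -r -np -nd -A 'pagecounts-*.gz' -P source/ \
	    http://dumps.wikimedia.org/other/pagecounts-raw/2015/2015-01/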

2. Make the list of average daily hitcounts, which will be created
in the file hitcounts.raw.gz. Run
	sh make-raw.sh

On Tool Labs, the grid engine should be used:
	jsub -cwd -j y ./make-raw.sh
This will send the output of the command to ~/make-raw.out; you can
monitor this file with
	tail -f ~/make-raw.out
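
To see whether the job is still queued or running, the grid engine's
qstat command lists your submitted jobs:
	qstat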