Create selections with the best articles of a WM project

This tool makes it easy to build non-topic-centric selections of
Wikipedia articles.

== Requirements ==

To run it, you need:
* MANDATORY: a GNU/Linux system
* MANDATORY: Internet access
* MANDATORY: access to a Wikipedia database
* OPTIONAL: access to the enwp10 rating database for Wikipedia in English

== Context ==

Many Wikipedias, in different languages, have over 500,000 articles.
Even if we can provide offline versions of a reasonable size, this
is still too much for many devices. That's why we need to build
offline versions containing only a selection of the best articles.

== Principle ==

This tool builds lists of key values (pageviews, links, ...) about
Wikipedia articles and puts them in a directory. These key values are
all the input we have to build smart selection algorithms. To
get more details about the lists, read the README in the language based

== Tools ==

* gives you the list of all
  Wikipedias/languages with more than 500,000 entries.

* takes a language code ('en' for example) as first
  argument and creates the directory with all the key values.

* to build/upload lists for all Wikipedias with
  more than 500,000 pages.

* generates on the output a copy of
  pagelinks with the link target page ID in the last column.

* generates the list of vital articles of
  Wikipedia in English.

* generates the list for projects with
  articles sorted (reverse order) by scores.
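
The "sorted (reverse order) by scores" step in the last item can be
illustrated with a minimal sketch. It assumes a tab-separated input with
the article title in the first column and a numeric score in the second.
That layout and the function name sort_by_score are hypothetical and are
not taken from the actual scripts.

```shell
# Sort "title<TAB>score" lines from standard input by score, highest first.
# The two-column layout is an assumption made for this example only.
sort_by_score() {
    # -t sets the tab as field separator; -k2,2nr sorts on the second
    # field only, numerically, in reverse (descending) order.
    LC_ALL=C sort -t "$(printf '\t')" -k2,2nr
}
```

For example, `sort_by_score < scores.tsv` would print the highest-scored
articles first (scores.tsv being a placeholder file name).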

== Download ==

You can download the output of these scripts directly from using FTP, HTTP(s) or rsync.

You might be interested in downloading only the latest version. Here is
a small command (based on rsync) to retrieve the right directory names.

for ENTRY in `rsync --recursive --list-only \
| tr -s ' ' | cut -d ' ' -f5 | grep wiki | grep -v '/' | sort -r` ; \
do RADICAL=`echo $ENTRY | sed 's/_20[0-9][0-9]-[0-9][0-9]//g'`; \
if [[ $LAST != $RADICAL ]] ; then echo $ENTRY ; LAST=$RADICAL ; fi ; done
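
The filtering logic of the loop above can be tried without network
access. The following is a minimal sketch that applies the same idea to
names read from standard input. The function name latest_entries and the
sample directory names are hypothetical; the _YYYY-MM suffix is assumed
from the sed pattern used above.

```shell
# Print only the newest entry for each wiki.
# Assumes names carry a _YYYY-MM date suffix (e.g. enwiki_2019-05).
latest_entries() {
    # Reverse sort puts the newest date first within each wiki.
    LC_ALL=C sort -r | {
        last=""
        while IFS= read -r entry; do
            # Strip the date suffix to get the wiki name.
            radical=$(printf '%s\n' "$entry" | sed 's/_20[0-9][0-9]-[0-9][0-9]//')
            # Only the first (newest) entry of each wiki is printed.
            if [ "$radical" != "$last" ]; then
                printf '%s\n' "$entry"
                last=$radical
            fi
        done
    }
}
```

For example, feeding it enwiki_2019-04, enwiki_2019-05 and frwiki_2019-05
(one per line) keeps only the two 2019-05 entries.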