Browse Wikipedia offline using only perl scripts
Perl
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
README
berkeleyify
wiki

README

This project aims at allowing people to browse Wikipedia offline.

HOWTO:

- download a database dump at http://download.wikimedia.org (you want the "pages-articles" one)
- run "$ perl berkeleyify [lang]wikipedia-page-articles-YYYYMMDD.xml.bz2" into your CGI directory

Requires:  a local webserver (I recommend thttpd), a recent version of Perl (5.14 will do),
Text::Mediawiki, and probably other stuff.

The script berkeleyify will create three files:

- [lang]wikipedia-pages-articles-YYYYMMDD.db:
  This is the actual database.  Articles are stored in compressed blocks of 256 articles.
  This is a read-only database.  It is not designed to be modified.
- [lang]wikipedia-pages-articles-YYYYMMDD.index:
  This berkeley database associates each article title to the position in the .db file.
- [lang]wikipedia-pages-articles-YYYYMMDD.titles:
  This is a plain text list of titles.  It is used with grep to search for titles.

LICENSE:

License information is repeated in each individual files.