This project aims to let people browse Wikipedia offline.
- download a Wikipedia database dump (you want the "pages-articles" one)
- run "$ perl berkeleyify [lang]wikipedia-pages-articles-YYYYMMDD.xml.bz2" in your CGI directory
Requires: a local webserver (I recommend thttpd), a recent version of Perl (5.14 will do),
Text::Mediawiki, and probably other stuff.
The script berkeleyify will create three files:
- [lang]wikipedia-pages-articles-YYYYMMDD.db:
This is the actual database. Articles are stored in compressed blocks of 256 articles.
This is a read-only database. It is not designed to be modified.
- [lang]wikipedia-pages-articles-YYYYMMDD.index:
This Berkeley DB database maps each article title to its position in the .db file.
- [lang]wikipedia-pages-articles-YYYYMMDD.titles:
This is a plain text list of titles. It is used with grep to search for titles.
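To make the relationship between the .db and .index files more concrete, here is a minimal sketch of the general idea, not the actual berkeleyify code. It assumes gzip compression, a length-prefixed block layout, and an index value of the form "offset slot"; all of these, along with the names build_store and fetch_article, are hypothetical. A plain Perl hash stands in for the Berkeley DB index, and article text is assumed to be already-encoded bytes.

```perl
use strict;
use warnings;
use IO::Compress::Gzip qw(gzip $GzipError);
use IO::Uncompress::Gunzip qw(gunzip $GunzipError);

my $BLOCK_SIZE = 256;    # articles per compressed block, as described above

# Pack [title, text] pairs into compressed blocks.
# Returns the packed data and a title => "offset slot" index (hypothetical format).
sub build_store {
    my @articles = @_;
    my $db = '';
    my %index;
    while (my @chunk = splice(@articles, 0, $BLOCK_SIZE)) {
        my $offset = length $db;    # where this block starts in the data
        # Each text gets a 4-byte length prefix so it can be split out again.
        my $plain = join '', map { pack('N/a*', $_->[1]) } @chunk;
        my $packed;
        gzip(\$plain => \$packed) or die "gzip failed: $GzipError";
        $db .= pack('N/a*', $packed);
        $index{ $chunk[$_][0] } = "$offset $_" for 0 .. $#chunk;
    }
    return ($db, \%index);
}

# Look up a title, decompress only the block that holds it, return the text.
sub fetch_article {
    my ($db, $index, $title) = @_;
    return unless exists $index->{$title};
    my ($offset, $slot) = split ' ', $index->{$title};
    my $len    = unpack 'N', substr($db, $offset, 4);
    my $packed = substr($db, $offset + 4, $len);
    my $plain;
    gunzip(\$packed => \$plain) or die "gunzip failed: $GunzipError";
    my @texts = unpack '(N/a*)*', $plain;
    return $texts[$slot];
}

# Made-up sample data: 600 articles, so three blocks (256 + 256 + 88).
my @articles = map { ["Article $_", "Text of article $_"] } 1 .. 600;
my ($db, $index) = build_store(@articles);
print fetch_article($db, $index, 'Article 300'), "\n";
```

Grouping articles into blocks trades a little read time (one block has to be decompressed per lookup) for much better compression than compressing each article on its own, which is presumably why the .db file is laid out this way.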
License information is repeated in each individual file.
