Skip to content
perl port of scy's levitation
Perl Shell
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


This is a Perl port of scy's levitation. It reads MediaWiki dump files
revision by revision and writes a data stream to stdout suitable for 
git fast-import.

The first 1000 pages of the german Wikipedia and all their revisions
(about 390000) can be dumped in about 15 min on relatively moderate


You need at least Perl 5.10. The Perl interpreter has to be compiled
with threads support.

You also need a working C compiler for the inline SHA1 C function.
Currently this _must_ be gcc 4.3 callable as 'gcc-4.3'. This will be
fixed soon.

You need the following modules and their dependencies from CPAN:

- Regexp::Common
- Inline
- Compress::Raw::Zlib
- Carp::Assert

- CDB_File
- XML::Bare      >= 0.44
- Deep::Hash::Utils

Some Linux distributions will already have the first set.
Under Debian / Ubuntu the following command should set you:

  sudo apt-get install libregexp-common-perl \
                       libinline-perl libjson-xs-perl \
                       libcompress-raw-zlib-perl libcarp-assert-perl


First, initialize a git repository:

  cd /tmp
  mkdir blawiki
  cd blawiki
  git init

Then, "levitate". This is a three-step process:

  cat /path/to/blawiki-dump.xml | /path/to/levitation-perl/
  LC_ALL=C sort rev-table.txt > rev-sorted.txt
  /path/to/levitation-perl/ | /path/to/levitation-perl/

Alternatively, you can just change to an empty directory and call the
"levitate" helper script with a path to a dump as parameter (may be 
7z, bz2, gz or xml):

  mkdir /tmp/blawiki
  cd /tmp/blawiki
  /path/to/levitation-perl/levitate /path/to/blawiki-dump...

Lots of progress information is printed to standard error, so it may be
best to redirect that to a file.

Have fun.

Something went wrong with that request. Please try again.