perl port of scy's levitation
Perl Shell
Switch branches/tags
Nothing to show
Pull request Compare This branch is 8 commits ahead of sbober:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Failed to load latest commit information.


This is a Perl port of scy's levitation. It reads MediaWiki dump files
revision by revision and writes a data stream to stdout suitable for 
git fast-import.

The first 1000 pages of the german Wikipedia and all their revisions
(about 390000) can be dumped in about 15 min on relatively moderate


You need at least Perl 5.10. The Perl interpreter has to be compiled
with threads support.

You also need a working C compiler for the inline SHA1 C function.
Although gcc 4.3 was specified by old versions of levitation-perl
(circa May 2010), gcc 4.5.2 also appears to work (at least for some
small wiki dumps).

You need zlib, for example the following should work under Debian/Ubuntu:
  sudo apt-get install zlib1g-dev

You need the following modules and their dependencies from CPAN:

- Regexp::Common
- Inline
- Compress::Raw::Zlib
- Carp::Assert
- Devel::Size

- CDB_File
- XML::Bare      >= 0.44
- Deep::Hash::Utils
- File::Path     > 2.04 ? (2.08 is fine; it has to export make_path)

Some Linux distributions will already have the first set.
Under Debian / Ubuntu the following command should set you:

  sudo apt-get install libregexp-common-perl \
                       libinline-perl libjson-xs-perl \
                       libcompress-raw-zlib-perl libcarp-assert-perl \

For Fedora, the equivalent is
  sudo yum install perl-Regexp-Common \
                   perl-Inline perl-JSON-XS \
                   perl-Compress-Raw-Zlib perl-Carp-Assert

For the second set, you may be able to just run:
  sudo cpan -i CDB_File
  sudo cpan -i XML::Bare
  sudo cpan -i Deep::Hash::Utils


First, initialize a git repository:

  cd /tmp
  mkdir blawiki
  cd blawiki
  git init

(Alternately, go to the working directory of an existing git repository where you want to import the dump, such as one from a previous run of levitation-perl).

Then, "levitate". This is a three-step process:

  cat /path/to/blawiki-dump.xml | /path/to/levitation-perl/
  LC_ALL=C sort rev-table.txt > rev-sorted.txt
  /path/to/levitation-perl/ | /path/to/levitation-perl/

Alternatively, you can just change to an empty directory and call the
"levitate" helper script with a path to a dump as parameter (may be 
7z, bz2, gz or xml):

  mkdir /tmp/blawiki
  cd /tmp/blawiki
  /path/to/levitation-perl/levitate /path/to/blawiki-dump...

Lots of progress information is printed to standard error, so it may be
best to redirect that to a file.

Have fun.