README
This is a Perl port of scy's levitation. It reads MediaWiki dump files
revision by revision and writes a data stream suitable for
git fast-import to stdout.

The first 1000 pages of the German Wikipedia and all their revisions
(about 390,000) can be dumped in about 15 minutes on moderate
hardware.


Dependencies
------------

You need at least Perl 5.10. The Perl interpreter has to be compiled
with thread support.

You also need a working C compiler for the inline SHA1 C function.
Currently this _must_ be gcc 4.3 callable as 'gcc-4.3'. This will be
fixed soon.
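
Whether a compiler is reachable under exactly that name can be checked
before starting a long run. A minimal sketch; have_cmd is a made-up
helper name:

```shell
# Hypothetical helper: succeeds if the named command can be found on PATH.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

if have_cmd gcc-4.3; then
  echo "gcc-4.3 found"
else
  echo "gcc-4.3 not found on PATH" >&2
fi
```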

You need the following modules and their dependencies from CPAN:

- Regexp::Common
- Inline
- JSON::XS
- Compress::Raw::Zlib
- Carp::Assert

- CDB_File
- XML::Bare      >= 0.44
- Deep::Hash::Utils

Some Linux distributions already package the first set.
Under Debian / Ubuntu the following command should install it:

  sudo apt-get install libregexp-common-perl \
                       libinline-perl libjson-xs-perl \
                       libcompress-raw-zlib-perl libcarp-assert-perl
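
To see which of the modules listed above are still missing, something
like the following sketch may help. check_module is a made-up helper, not
part of levitation-perl; it only tries to load each module and, for
XML::Bare, compares the installed version against 0.44.

```shell
# Hypothetical helper, not part of levitation-perl.
# Succeeds if module $1 loads and, when $2 is given, is at least version $2.
check_module() {
  perl -e '
    my ($m, $v) = @ARGV;
    eval "require $m; 1" or exit 1;                # module must load
    $m->VERSION($v) if defined $v && length $v;    # dies if too old
  ' "$1" "${2:-}" 2>/dev/null
}

for m in Regexp::Common Inline JSON::XS Compress::Raw::Zlib Carp::Assert \
         CDB_File Deep::Hash::Utils; do
  check_module "$m" || echo "missing: $m"
done
check_module XML::Bare 0.44 || echo "missing or too old: XML::Bare (need >= 0.44)"
```

Anything reported as missing can then be installed with a CPAN client
such as cpan or cpanm.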


Usage
-----

First, initialize a git repository:

  cd /tmp
  mkdir blawiki
  cd blawiki
  git init


Then, "levitate". This is a three-step process:

  cat /path/to/blawiki-dump.xml | /path/to/levitation-perl/step1.pl
  LC_ALL=C sort rev-table.txt > rev-sorted.txt
  /path/to/levitation-perl/step2.pl | /path/to/levitation-perl/gfi.pl
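
The LC_ALL=C in the second step presumably forces a byte-wise sort order
that does not depend on the locale of the machine (this reasoning is an
assumption, it is not stated here). A small demonstration with made-up
input lines:

```shell
# Made-up sample lines, not real rev-table.txt content.
sample='b
A
a
B'

# Byte-wise order: uppercase ASCII letters sort before lowercase ones.
c_sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort)
printf '%s\n' "$c_sorted"   # prints A, B, a, b (one per line)
```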


Alternatively, you can just change to an empty directory and call the
"levitate" helper script with the path to a dump as its parameter (the
dump may be 7z, bz2, gz or plain xml):

  mkdir /tmp/blawiki
  cd /tmp/blawiki
  /path/to/levitation-perl/levitate /path/to/blawiki-dump...
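
How the helper handles the different formats is not described here. When
running the three steps by hand, one plausible approach (an assumption,
and it requires the usual decompression tools to be installed) is to pick
a decompressor by file extension and pipe its output into step1.pl. The
function name and the paths are placeholders.

```shell
# Hypothetical helper: print a command that writes the decompressed
# dump to stdout, chosen by file extension.
dump_reader() {
  case "$1" in
    *.7z)  echo "7z x -so" ;;
    *.bz2) echo "bzip2 -dc" ;;
    *.gz)  echo "gzip -dc" ;;
    *)     echo "cat" ;;       # plain xml: pass through unchanged
  esac
}

# Example use (placeholder paths):
#   $(dump_reader /path/to/dump.xml.bz2) /path/to/dump.xml.bz2 \
#     | /path/to/levitation-perl/step1.pl
```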

Lots of progress information is printed to standard error, so it may be
best to redirect that to a file.
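
A minimal sketch of such a redirect. fake_levitate is a stand-in so the
example runs without a dump; the log file is a temporary placeholder.

```shell
# Stand-in for the real levitate call (hypothetical): data goes to
# stdout, progress messages go to stderr.
fake_levitate() {
  echo "progress: 1000 revisions" >&2
  echo "data"
}

log=$(mktemp)
out=$(fake_levitate 2> "$log")   # 2> sends only stderr to the log file

# With the real script (placeholder paths) this would be:
#   /path/to/levitation-perl/levitate /path/to/dump 2> levitate.log
```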

Have fun.
