Migrate a MoinMoin wiki as a Git repository
Switch branches/tags
Nothing to show
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Type Name Latest commit message Commit time
Failed to load latest commit information.
moin2rst @ 74cbf9a
.gitignore
.gitmodules
LICENSE
README.rst
moin2git.py
requirements.txt
wikiconfig.py

README.rst

moin2git

A tool to migrate the content of a MoinMoin wiki to a Git backed wiki engine like Gollum, Realms, Waliki or similar.

Install

git clone --recursive https://github.com/mgaitan/moin2git.git
[sudo] pip install -r requirements.txt

If you also want to convert each page to reStructuredPage format, (see --convert-to-rst) you will need to install MoinMoin:

[sudo] pip install moin

Usage

tin@morochita:~$ python moin2git.py --help
moin2git.py

A tool to migrate the content of a MoinMoin wiki to a Git based system
like Waliki, Gollum or similar.

Usage:
  moin2git.py migrate <data_dir> <git_repo> [--convert-to-rst] [--users-file <users_file>]
  moin2git.py users <data_dir>
  moin2git.py attachments <data_dir> <dest_dir>

Arguments:
    data_dir  Path where your MoinMoin content is
    git_repo  Path to the target repo (created if it doesn't exist)
    dest_dir  Path to copy attachments (created if it doesn't exist)

Options:
    --convert-to-rst    After migrate, convert to reStructuredText
    --users-file        Use users_file to map wiki user to git commit author

Workarounds

If you need to convert the markup to rst, you will need a working moinmoin instance. For a fast and dirty configuration, put your data in a directory named wiki, and copy wikiconfig.py in the same level:

wikiconfig.py
wiki/
├── data/

Then copy moin2git/moin2rst/text_x-rst.py to wiki/data/plugins/formatters/

How it works

MoinMoin is a wiki engine powered by Python that store its content (including pages, history of changes and users) as flat files under the directory /data.

An overview of the structure of this tree is this:

data/
├── cache
│   │     ...
│
├── pages
│   │
│   ├── AdoptaUnNewbie
│   │   ├── cache
│   │   │   ├── hitcounts
│   │   │   ├── pagelinks
│   │   │   └── text_html
│   │   ├── current
│   │   ├── edit-lock
│   │   ├── edit-log
│   │   └── revisions
│   │       ├── 00000001
│   │       ├── 00000002
│   │
│   ├── AlejandroJCura
│   │   ├── cache
│   │   │   ├── pagelinks
│   │   │   └── text_html
│   │   ├── current
│   │   ├── edit-lock
│   │   ├── edit-log
│   │   └── revisions
│   │       ├── 00000001
│   │       ├── 00000002
│   │       └── 00000003
│   │
│   ├── AlejandroJCura(2f)ClassDec(c3b3)
│   │   ├── cache
│   │   │   ├── pagelinks
│   │   │   └── text_html
│   │   ├── current
│   │   ├── edit-lock
│   │   ├── edit-log
│   │   └── revisions
│   │       ├── 00000001
│   │       ├── 00000002
│   │       └── 00000003
 ...
│   └── YynubJakyfe
│       ├── edit-lock
│       └── edit-log
│
└── user
    ├── 1137591729.59.35593
    ├── 1137611536.06.62624
    ├── 1138297101.79.62731
    ├── 1138912320.61.21990
    ├── 1138912840.93.11353
    ...
  • Each wiki page (no matter how deep its url be) is stored in a directory /data/pages/<URL>. For example in our example the url /AlejandroJCura/ClassDec%C3%B3 [1] is data/pages/AlejandroJCura(2f)ClassDec(c3b3)

  • The content itself is in the directory /revisions, describing the history of a page. Each file in this directory is a full version of a the page (not a diff).

  • The file /data/pages/<URL>/current works as a pointer to the current revision (in general, the more recent one, but a page could be "restored" to an older revision). For example:

    tin@morochita:~/lab/moin$ cat data/pages/AlejandroJCura/current
    00000003
  • The edit-log file describes who, when and (if there is a log a message) why:

    tin@morochita:~/lab/moin$ cat data/pages/AlejandroJCura/edit-log
      1141363609000000    00000001    SAVENEW AlejandroJCura  201.235.8.161   161-8-235-201.fibertel.com.ar   1140672427.37.17771     Una pagina para mi?
      1155690306000000    00000002    SAVE    AlejandroJCura  201.231.181.174 174-181-231-201.fibertel.com.ar 1140672427.37.17771
      1218483772000000    00000003    SAVE    AlejandroJCura  201.250.38.50   201-250-38-50.speedy.com.ar 1140672427.37.17771

    The data logged is (in this order, separated by tabs):

    EDITION_TIMESTAMP, REVISION, ACTION, PAGE, IP, HOST, USER_ID, ATTACHMENTS, LOG_MESSAGE

  • The USER_ID point to a file under the directory /data/user contained a lot of information related to the user. For example:

    (preciosa)tin@morochita:~/lab/moin$ cat data/user/1140549890.71.33402
    remember_me=1
    theme_name=pyar
    editor_default=text
    show_page_trail=1
    disabled=0
    quicklinks[]=Noticias
    css_url=
    edit_rows=20
    show_nonexist_qm=0
    show_fancy_diff=1
    tz_offset=-10800
    subscribed_pages[]=
    aliasname=
    remember_last_visit=0
    enc_password={SHA}5kXNi+HjaTCGItkg6yTPNRtSDGE=
    email=mautuc@yahoo(....)
    show_topbottom=0
    editor_ui=freechoice
    datetime_fmt=
    want_trivial=0
    last_saved=1219176737.74
    wikiname_add_spaces=0
    name=MauricioFerrari
    language=
    show_toolbar=1
    edit_on_doubleclick=0
    date_fmt=
    mailto_author=0
    bookmarks{}=

Solving the puzzle

moin2git.py uses git (via the wonderful sh) to handle the history, so don't need multiples files to track differents revision of a page

For instance, in the root of our target directory (the git repo) we should get a file AlejandroJCura:

  • 3 revisions (commits), from revisions/00000001 until revisions/00000003
  • the author name/nickname and email (if available) is parsed from the user file of each revision. To know who and when made what version, moin2git.py parses the edit-log file of each page.

We should also get a file AlejandroJCura/ClassDecó [2] where, in this case, AlejandroJCura/ is a directory.

Commit authors

The option --users-file acepts a file that will be used to map wiki users to git commit authors.

The output of the command moin2git.py users <data_dir> can be used as input. For each users the required fields are name and email.

[1]http://python.org.ar/AlejandroJCura/ClassDec%C3%B3
[2]Note we should parse the ugly escaping. (2f) is / and determines the left part is a directory. (c3b3) means %C3%B3, i.e. ó