Skip to content

jirutka/dokuwiki2adoc

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DokuWiki to AsciiDoc Converter

A study in “insanity”.

This project is a converter from DokuWiki to AsciiDoc markup. It’s the most complete converter between these markups you can find.[1]

What’s especially interesting about this converter is the implementation. It’s completely written in sed! Moreover, using just BRE (Basic Regular Expression) with few GNU extensions and some glue in (POSIX) shell. That’s quite insane, isn’t it? I’d say that it’s like trying to dig a hole with just teaspoon, but it works surprisingly good! Hence the subtitle: “A study in insanity”.[2] Read more in But Why Sed?.

Requirements

  • Linux system with common userland (Busybox, GNU coreutils, …​)

  • POSIX-sh compatible shell (e.g. Busybox ash, dash, ZSH, Bash, …​)

  • sed supporting GNU extensions: \?, \+, \|, ^ inside subexpressions, and q with exit code

  • Python for rewriting internal links

Features and Limitations

Most syntactic DokuWiki constructs can be converted to AsciiDoc.

Known limitations in the conversion from DokuWiki to AsciiDoc:

  • No syntax checking is done of the input file.

  • Appending a downloadable file (code snippet) to a Code block is not available (the link to the file is silently discarded by the converter).

  • Anchors (links to a location on the same page) may not work due to different naming conventions.

  • The vertical spanning (over several rows) of table cells is not provided.

  • Text decorations and “text not to be parsed” marks are only converted if opening and closing marks are on the same line.

  • Windows shares and Interwiki links are not converted.

  • Embedding HTML and PHP is not available.

  • RSS/Atom feed integration is not available.

  • Text to image conversion is not available.

Supported Plugins

In addition to the basic DokuWiki syntax, the following plugins are supported:

Plugin Limitations Notes

cli

All attributes (prompt, comment, continue) are ignored.

Converted to a source block with lang “sh”.

code

No limitations.

Code title is converted to a block title.

code2

Line number is ignored, no difference between header and footer.

Header/footer is converted to a block title.

comment

If comment is not properly terminated, the converter terminates with an error.

Content between the marks is discarded from the output.

htmlcomment

Recognized only when opening and closing tags are on the same line.

Content between <!-- and --> tags is discarded from the output.

latex, mathjax

Latex delimiters are not recognized in a nowiki area.

$...$ and $$...$$ are not recognized when opening and closing delimiters are not on the same line.

$...$ is converted only when there’s no alphanumeric character before/after opening/closing $, no space after/before opening/closing $, and no character with a special meaning after opening $.

  • $...$ is converted to stem:[...],

  • $$...$$ is converted to a STEM block,

  • <latex>...</latex> is converted to stem:[...] if opened/closed tags are on the same line, otherwise to a STEM block.

note

No limitations.

Note is converted to a corresponding admonition block, or admonition paragraph if

yalist

Only (un)ordered list item with single extra paragraph denoted by .. on the line immediately after the list item (** or --) is supported.

But Why Sed?

You may ask why I wrote this converter in sed, despite I consider it a bit insane. Well, I’m not the original author.

Back in winter 2017, I needed some converter from DokuWiki to AsciiDoc. The only ready-made solution I found was convtags – a bidirectional (!) converter between DokuWiki and AsciiDoc. It was originally developed for the Slackware Documentation Project that used to run on DokuWiki (see this page). I tried it on few pages and it worked very well, even on complicated tables!

However, convtags was written for an older AsciiDoc syntax and didn’t support some features I needed. It appeared to be quicker to improve convtags, even though it’s written completely in sed, than write a proper converter from scratch. So I started digging into it.

At first it went well, but as I tried it on more pages, more and more bugs (in the original code) started to appear. Not just minor bugs, but even quite fundamental. Eventually I fixed or rewrote most of the convtags’ code.

Now it works very well for wide variety of DokuWiki pages, but it took me much more time than I expected. I regret a little that I didn’t wrote a proper, modular converter from scratch instead. However, it was really very interesting experience (although a bit masochist). I really like regular expressions, sed and challenges… and this was hell challenge! I’ve learned a lot thanks to it.

Credits

dokuwiki2adoc is a fork of convtags created by Didier Spaier for the Slackware Linux Project. The most essential sed magic in this converter is Didier’s work. I’ve fixed many bugs and improved it a lot, yet I still don’t understand how exactly some of the tricks with hold space work. Didier is a real sed master!

License

This project is licensed under BSD-1-Clause License. For the full text of the license, see the LICENSE file.


1. If you know about better, more complete converter from DokuWiki to AsciiDoc, please let me know!
2. This phrase is inspired by Prosody – A study in simplicity.