confluence2html - Generate HTML from a subset of Confluence wiki markup
$ confluence2html < input.txt > output.html
confluence2html is a command-line filter that takes in a stream of text formatted with a subset of Confluence wiki markup and prints it out as HTML. The goal of this project is to provide a reasonable "publish your writing to HTML" experience for users of Confluence wiki syntax.
Unlike some other markup languages, this syntax provides features that are important for technical documents. For example:
- Plaintext Table Syntax
You can use Confluence's table syntax instead of having to write HTML tables by hand. This matters a lot if you create and maintain technical documents. (See "Tables" below.)
- Automatic Tables of Contents
You can use the toc macro to have a table of contents generated for your document. (See "Tables of Contents" below.)
Note that syntax highlighting plugins are available for Vim, Emacs, and other text editors.
The subset of Confluence markup that we support is defined as follows:
- Headers
Two types of header syntax are supported: the classic Confluence wiki markup style, e.g., 'h2. My Cool Heading', and an Emacs Outline Mode style, e.g., '** My Cool Heading'.
No other formatting inside the header text is supported. For example,
    ** Introduction to {{confluence2html}}
will not work.
- Links
Standard links are supported, e.g.,
    [Link to some other page on this wiki]
This will be rewritten to link to a local file named link-to-some-other-page-on-this-wiki.html. If no such file exists, this will be a broken link until the file is created. The easiest (but not the only) way to create it is to keep another file of Confluence markup in the same directory, named link-to-some-other-page-on-this-wiki.txt, and convert it to HTML at the same time.
Note: the HTML output from the "standard link" syntax is still subject to change. A more flexible design is needed. One possibility is to design several different output formats the user can choose from; another is to allow the user to pass in a custom formatting tag as an argument, which would allow them to do their own additional processing on the output, e.g.,
    $ confluence2html --link-tag=LINK < page.txt | MOAR_FILTERING
External links are supported:
    [Perl home page|http://www.perl.org]
    [http://www.example.org]
Confluence space keys are not supported, since the concept of "spaces" has no meaning in terms of processing a stream of text. A space feature could be added by a more sophisticated application built using this script.
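To make the standard-link rewriting concrete, here is a minimal Python sketch of how link text might map to a local file name. It is not taken from confluence2html itself: the function name link_target is hypothetical, and the rule (lowercase the text, join words with hyphens, drop punctuation) is an assumption inferred from the single example above.

```python
import re


def link_target(link_text: str) -> str:
    """Guess the local HTML file name for a standard [link text] link.

    Assumption: words are lowercased and joined with hyphens, which
    matches the one example in this document. Dropping punctuation is
    a guess, not documented behavior.
    """
    words = re.findall(r"[a-z0-9]+", link_text.lower())
    return "-".join(words) + ".html"


print(link_target("Link to some other page on this wiki"))
```

Under these assumptions the example link text maps to link-to-some-other-page-on-this-wiki.html, the file name given above.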
- Macros
We support only a few of Confluence's many macros -- mainly those that are required for a reasonable "publish your writing to HTML" experience. Here's the list:
code
info
tip
note
warning
htmlcomment
table
toc
Other than toc, these macros can only be written as single tags on their own line that delimit blocks of text. For example:
    {info}
    Have some informative text!
    {info}
No arguments of the form {info:title=I am the Title} are supported. If you want to add a title to your info block for readers, try something like:
    {info}
    *Important Information!*
    Your coffee is ready.
    {info}
Note that the info, tip, note, warning, and toc macros are output as div tags with class names that correspond to the macro name, for easy CSS styling. Example output:
    <div class="info">
    <p>These instructions assume that you have already installed and correctly configured... For more information, see ... </p>
    </div>
This makes it trivial to add background colors for added emphasis.
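As an illustration of the block-macro convention, the following Python sketch wraps text between a pair of matching macro lines in a div whose class is the macro name. It is only a sketch of the idea, not the tool's implementation: render_block_macros and MACROS are hypothetical names, and paragraph wrapping (the p tags in the example output) is deliberately left out.

```python
# Macros that delimit a block with the same tag on its own line.
MACROS = {"info", "tip", "note", "warning"}


def render_block_macros(text: str) -> str:
    """Turn paired {macro} lines into <div class="macro"> ... </div>."""
    out = []
    open_macro = None  # name of the macro block we are inside, if any
    for line in text.splitlines():
        tag = line.strip()
        is_tag = tag.startswith("{") and tag.endswith("}")
        name = tag[1:-1] if is_tag else None
        if name in MACROS and open_macro is None:
            out.append(f'<div class="{name}">')   # opening tag
            open_macro = name
        elif name is not None and name == open_macro:
            out.append("</div>")                  # matching closing tag
            open_macro = None
        else:
            out.append(line)                      # ordinary text
    return "\n".join(out)


print(render_block_macros("{info}\nYour coffee is ready.\n{info}"))
```

Because the opening and closing tags are identical, the sketch has to track whether it is currently inside a block; that is one plausible reason the tags must sit on their own lines.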
- Tables of Contents
The toc macro prints a table of contents with links to some or all headers on the page. It can be used with no arguments, or with the arguments minlevel, maxlevel, and exclude, which must be supplied in that order. In other words, you must use the toc macro in one of the following ways:
  - {toc}
    Prints a table of contents using all headers on the page.
  - {toc:minlevel=$N}
    $N must be an integer between 1 and 6. This prints a table of contents with a minimum header level of $N.
  - {toc:minlevel=$N|maxlevel=$M}
    $N and $M must be integers between 1 and 6. This prints a table of contents with a minimum header level of $N and a maximum header level of $M.
  - {toc:minlevel=$N|maxlevel=$M|exclude=$REGEX}
    $N and $M must be integers between 1 and 6, and $REGEX is a Perl regular expression (note that the regular expression is not surrounded by quotes). This prints a table of contents with a minimum header level of $N and a maximum header level of $M, excluding any headers that match $REGEX.
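The fixed argument order can be captured with a single nested regular expression. The Python sketch below shows one way to recognize the four accepted forms and reject everything else. The names TOC_RE and parse_toc are hypothetical, and the defaults of 1 and 6 for missing levels are an assumption based on the bare {toc} form covering all headers. The exclude pattern is kept as a plain string, since it is a Perl regular expression and may not be valid in Python's re module.

```python
import re

# Each optional group nests inside the previous one, so maxlevel can only
# appear after minlevel, and exclude only after maxlevel.
TOC_RE = re.compile(
    r"^\{toc"
    r"(?::minlevel=(?P<min>[1-6])"
    r"(?:\|maxlevel=(?P<max>[1-6])"
    r"(?:\|exclude=(?P<exclude>.+))?"
    r")?)?\}$"
)


def parse_toc(line: str):
    """Return the toc arguments as a dict, or None if the form is invalid."""
    m = TOC_RE.match(line.strip())
    if m is None:
        return None
    return {
        "minlevel": int(m.group("min") or 1),  # assumed default
        "maxlevel": int(m.group("max") or 6),  # assumed default
        "exclude": m.group("exclude"),         # raw Perl regex, or None
    }


print(parse_toc("{toc:minlevel=2|maxlevel=4|exclude=^Appendix}"))
print(parse_toc("{toc:maxlevel=3}"))  # out of order, so not recognized
```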
- Lists
Ordered and unordered lists are supported. Example:
    + Apple
    + Banana
    + Cherry

    1. Rhubarb
    2. Tomato
    3. Pomegranate
- Line breaks
The line breaks that appear in the HTML output are those that appear in the text file. There is no support for the Confluence forced line break \\.
- Tables
Tables are supported. The only requirement is that you must wrap the table itself in the {table} macro, e.g.:
    {table}
    || Name || Rank || Serial Number ||
    | Steven Fluffernutter | Sergeant | 314159 |
    | Christopher Crunch | Captain | 271828 |
    | ... | ... | ... |
    | ... | ... | ... |
    {table}
Note: Having to wrap tables in the table macro is probably the most error-prone (and thus annoying) requirement from the perspective of an experienced Confluence user. This limitation should be removed in the future.
- Images
Images are supported, with the following syntax:
    !foo.png:450!
This will insert foo.png into the output, at 450 pixels in width. In order for confluence2html to know where to find foo.png, you must pass the --image-directory flag, e.g.,
    $ confluence2html --image-directory /path/to/images < in > out.html
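The image syntax is simple enough to recognize with one regular expression. The Python sketch below shows how !file:width! could be turned into an img tag. IMAGE_RE and render_images are hypothetical names, and the way the image directory is combined with the file name (a plain path join used as the src attribute) is an assumption; the actual tool may resolve or copy images differently.

```python
import posixpath
import re
from html import escape

# Matches !name:width!, e.g. !foo.png:450!
IMAGE_RE = re.compile(r"!(?P<file>[^!:\s]+):(?P<width>\d+)!")


def render_images(text: str, image_directory: str) -> str:
    """Replace each !file:width! with an <img> tag of that pixel width."""
    def to_img(match):
        # Assumption: src is the image directory joined with the file name.
        src = posixpath.join(image_directory, match.group("file"))
        width = match.group("width")
        return f'<img src="{escape(src)}" width="{width}">'
    return IMAGE_RE.sub(to_img, text)


print(render_images("!foo.png:450!", "/path/to/images"))
```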
Many bugs are lurking in this code; it's a total hack. On the roadmap: more tests, refactoring, and perhaps even real parsing.
Rich Loveland, mailto:r@rmloveland.com