Use pandoc to create XML suitable for xml2rfc
Pull request Compare This branch is 316 commits behind miekg:master.
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
ref
vim/syntax
.gitignore
AUTHORS
CONTRIBUTORS
ChangeLog
Makefile
README.mkd
back.pdc
draft.example.txt
middle.pdc
template.xml
transform.xsl
xml-single
xml-wrap

README.mkd

Introduction

This document presents a technique for using Pandoc syntax as a source format for documents in the Internet-Drafts (I-Ds) and Request for Comments (RFC) series.

Pandoc is an "almost plain text" format and therefor particularly well suited for editing RFC-like documents.

Note: this document is typeset in Pandoc and does not render completely correct when reading it on github.

Pandoc to RFC

Pandoc2rfc -- designed to do the right thing, until it doesn't.

When writing we directly wrote the XML. Needless to say it was tedious even thought the XML of xml2rfc is very "light".

During the last few years people have been developing markup languages that are very easy to remember and type. These languages have become known as almost plain text-markup languages. One of the first was the Markdown syntax. One that was developed later and also has found a good balance between the power of expression and its syntax is Pandoc. The power of Pandoc also comes from the fact that it can be translated to numerous output formats, including, but not limited to: HTML, Markdown and docbook XML.

So for my next RFC (if ever!) I decided I wanted to use Pandoc. As xml2rfc uses XML I thought the easiest way would be to create docbook XML and transform that using XSLT. Pandoc2rfc does just that. The conversions are, in some way amusing, as we start off with (almost) plain text, use elaborate XML and end up with plain text again.

+-------------------+   pandoc   +---------+  
| ALMOST PLAIN TEXT |   ------>  | DOCBOOK |  
+-------------------+            +---------+  
              |                       |
non-existent  |                       | xsltproc
 quicker way  |                       |
              v                       v
      +------------+    xml2rfc  +---------+ 
      | PLAIN TEXT |  <--------  | XML2RFC | 
      +------------+             +---------+ 
Figure: Attempt to justify Pandoc2rfc.

The XML generated (the output after the xsltproc step in ) is suitable for inclusion in either the middle or back section of an RFC. The easiest way is to create a template XML file and include the appropriate XML:

<?xml version='1.0' ?>
<!DOCTYPE rfc SYSTEM 'rfc2629.dtd'>

<rfc ipr='trust200902' docName='draft-gieben-pandoc-rfcs-01'>
 <front>
    <title>Writing I-Ds and RFCs using Pandoc</title>
</front>

<middle>
    <?rfc include="middle.xml"?>
</middle>

<back>
    <?rfc include="back.xml"?>
</back>

</rfc>
Figure: A minimal template.xml.

See the Makefile for an example of this. In this case you need to edit 3 documents:

  1. middle.pdc;
  2. back.pdc;
  3. template.xml (probably a fairly static file).

The draft (draft.txt) is automatically created when you call make. This also works for this README.mkd.

It needs xsltproc and pandoc to be installed. See the Pandoc user manual for the details on how to type in Pandoc style.

When using Pandoc2rfc consider adding the following sentence to an Acknowledgements section:

 This document was prepared using Pandoc2rfc.

Starting a new project

When starting a new project with pandoc2rfc you'll need to copy the following files:

  • Makefile
  • xml-single (for creating a single .xml file)
  • xml-wrap (wraps differences between xml2rfc 1.3.x and xml2rfc 2.x)
  • transform.xslt
  • And the above mentioned files:
    • middle.pdc
    • back.pdc
    • template.xml

After that you can start editing.

Supported Features

  • Sections with an anchor and title attributes;
  • Tables with an anchor and postamble;
  • Lists
    • style=symbols;
    • style=numbers;
    • style=letters (lower- and uppercase);
    • style=format %i, roman lowercase numerals;
    • style=format (%d), roman uppercase numerals;
    • style=hanging;
    • style=empty;
  • Figure/artwork with an anchor and postamble;
  • Block quote - not supported by xml2rfc, this is converted to <list style="hanging> paragraph;
  • References
    • external (eref);
    • internal (xref), you can refer to:
      • sections;
      • tables;
      • figures.
  • Citations, by using interal references;
  • Spanx style=verb, style=emph and style=strong.
  • Indexes, by using footnotes;

Unsupported Features

  • Lists inside a table (xml2rfc doesn't handle this);
  • crefs: for comments (no input syntax available), use HTML comments: <!-- ... -->;

The heavy lifting is done by transform.xsl that transforms the XML. The homepage of Pandoc2rfc is this github repository.

Acknowledgements

The following people have helped to make Pandoc2rfc what it is today:

  • Benno Overeinder
  • Erlend Hamnaberg
  • Matthijs Mekking
  • Trygve Laugstøl

This document was prepared using Pandoc2rfc.

Pandoc Constructs

So, what syntax do you need to use to get the correct output? Well, it is just Pandoc. The best introduction to the Pandoc style is given in this README from Pandoc itself.

For convenience we list the most important ones in the following sections.

Paragraph

Paragraphs are separated with an empty line.

Section

Just use the normal sectioning commands available in Pandoc, for instance:

# Section1 One
Bla

Converts to: <section title="Section1 One" anchor="section1-one"> If you have another section that is also named "Section1 One", that anchor will be called "section1-one-1", but only when the sections are in the same source file!

Referencing the section is done with see [](#section1-one), as in see .

List Styles

A good number of styles are supported.

Symbol

A symbol list.

* Item one;
* Item two.

Converts to: <list style="symbol">

Number

A numbered list.

1. Item one;
2. Item two.

Converts to: <list style="numbers">

Empty

Using the default list markers from Pandoc:

A list using the default list markers.

#. First item;
#. Second item.

Converts to: <list style="empty">

Roman

Use the supported Pandoc syntax:

ii. First item;
ii. Second item.

Converts to: <list style="format %i.">. If you use uppercase Roman numerals, they convert to a different style:

II. First item;
II. Second item.

yields: <list style="format (%d) ">

Letter

A numbered list.

a. Item one;
b. Item two.

Converts to: <list style="letters">, uppercasing the letters works too.

Hanging

This is more like a description list, so we need to use:

First item that needs clarification:

:   Explanation one
More stuff, because item is difficult to explain.
* item1
* item2

Second item that needs clarification:

:   Explanation two

Converts to: <list style="hanging"> and <t hangText="First item that...">

If you want a newline after the hangTexts, search for the string OPTION in transform.xsl and uncomment it.

Figure/Artwork

Indent the paragraph with 4 spaces.

Like this

Converts to: <figure><artwork> ... Note that xml2rfc supports a caption with <artwork>. Pandoc does not support this, but Pandoc2rfc does. If you add a Figure: some text as the last line, the artwork gets a <postamble> with the text after Figure:. It will also be possible to reference the artwork. If a caption is supplied the artwork will be centered.

The reference anchor attribute will be: fig: + first 10 (normalized) characters from the caption. Where normalized means:

  • Take the first 10 characters of the caption (i.e. this is the text after the string Figure:);
  • Spaces are translated to a minus -;
  • Uppercase letters translated to lowercase.

So the first artwork with a caption will get fig:a-minimal- as a reference. See for instance .

This anchoring is completely handled from within the xslt, so cool Pandoc stuff, like adding sequence numbers in case of duplicates isn't supported.

Block Quote

This is not supported by xml2rfc, but any paragraph like:

> quoted text

Converts to: <t><list style="hanging" hangIndent="3"> ... paragraph.

References

External

Any reference like:

[Click here](URI)

Converts to: <ulink target="URI">Click here ...

Internal

Any reference like:

[Click here](#localid)

Converts to: <link target="localid">Click here ...

For referring to RFCs (for which you manually need add the reference source in the template, use a include refs.xml or something), you can just use:

[](#RFC2119)

And it does the right thing. Referencing sections is done with:

See [](#pandoc-constructs)

The word 'Section' is inserted automatically: ... see ... For referencing figures/artworks see . For referencing tables see .

Spanx Styles

The verb style can be selected with back-tics: `text` Converts to: <spanx style="verb"> ...

And the emphasis style with asterisks: *text* or underscores: _text_ Converts to: <spanx style="emph"> ...

And the emphasis style with double asterisks: **text** Converts to: <spanx style="strong"> ...

Table

A table can be entered as:

  Right     Left     Center     Default                                                                  
  -------   ------ ----------   -------                                                                  
       12     12        12        12                                                                   
      123     123       123       123                                                                   
        1     1         1         1       

  Table: A caption describing the table.
  Figure: A caption describing the figure describing the table.

Is translated to <texttable> element in xml2rfc. You can choose multiple styles as input, but they all are converted to the same style (plain <texttable>) table in xml2rfc. The column alignment is copied over to the generated XML.

The caption is always translated to a <postamble>. The <preamble> tag isn't supported. If a table has a caption, it will also get a reference. The reference anchor attribute will be: tab: + first 10 (normalized) characters from the caption. Where normalized means:

  • Take the first 10 characters of the caption (i.e. this is the text after the string Table:);
  • Spaces are translated to a minus -;
  • Uppercase letters translated to lowercase.

So the first table with a caption will get tab:a-caption- for reference use. See for instance .

This anchoring is completely handled from within the xslt, so cool Pandoc stuff, like adding sequence numbers in case of duplicates isn't supported.

Indexes

The footnote syntax of Pandoc is slightly abused to support an index. Footnotes are entered in two steps, you have a marker in the text, and later you give actual footnote text. Like this:

[^1]

[^1]: footnote text

We re-use this syntax for the <iref> tag. The above text translates to:

<iref item="footnote text"/>

Sub items are also supported. Use an exclamation mark (!) to separate them:

[^1]: item!sub item

Security Considerations

This document raises no security issues.

IANA Considerations

This document has no actions for IANA.