How do I create a MultiMarkdown document?

dphenderson edited this page Oct 13, 2011 · 19 revisions

The general concept in MultiMarkdown is that it should be easy for someone to type a plain text file that is human-readable, and then use the MultiMarkdown program to convert that text file into a more complicated computer language such as HTML or LaTeX. This allows you to create high-quality output without having to spend hours fiddling with font sizes, margins, etc.

The first step in learning to use MultiMarkdown is to learn how to use Markdown. MultiMarkdown is an extension to Markdown and builds on its fundamentals.

I recommend starting by familiarizing yourself with the Markdown basics and syntax pages.

Once you're familiar with the basics of Markdown, it will be relatively easy to pick up the advanced features included in MultiMarkdown.

Metadata

It is possible to include special metadata at the top of a MultiMarkdown document, such as title, author, etc. This information can then be used to control how MultiMarkdown processes the document, or can be used in certain output formats in special ways.

Title:   A Sample MultiMarkdown Document  
Author:  Fletcher T. Penney  
Date:    February 9, 2011  
Comment: This is a comment intended to demonstrate  
         metadata that spans multiple lines, yet  
         is treated as a single value.  
Test:    And this is a new key-value pair  

The syntax for including metadata is simple.

  • The metadata must begin at the very top of the document - no blank lines can precede it.

  • Metadata consists of two parts: the key and the value.

  • The metadata key must begin at the beginning of the line. It must start with a letter or number, then the following characters can consist of letters, numbers, spaces, hyphens, or underscore characters.

  • The end of the metadata key is specified with a colon (':')

  • After the colon comes the metadata value, which can consist of pretty much any characters (including new lines). To keep multiline metadata values from being confused with additional metadata, I recommend indenting each new line of metadata. If your metadata value includes a colon, it must be indented to keep it from being treated as a new key-value pair.

  • While not required, I recommend using two spaces at the end of each line of metadata. This will improve the appearance of the metadata section if your document is processed by Markdown instead of MultiMarkdown.

  • Metadata keys are case insensitive and stripped of all spaces during processing. This means that Base Header Level, base headerlevel, and baseheaderlevel are all the same.

  • Metadata is processed as plain text, so it should not include MultiMarkdown markup. It is possible to create customized XSLT files that apply certain processing to the metadata value, but this is not the default behavior.

  • After the metadata is finished, a blank line triggers the beginning of the rest of the document.
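As a rough illustration of the rules above, here is a minimal Python sketch of a metadata reader. It is not MultiMarkdown's actual parser, and the names `parse_metadata` and `normalize_key` are made up for this example. It assumes that any line in the metadata block that does not look like a new key (for instance because it is indented) continues the previous value.

```python
import re

# A key starts with a letter or digit, may continue with letters, digits,
# spaces, hyphens, or underscores, and ends at the first colon.
KEY_LINE = re.compile(r"^([A-Za-z0-9][A-Za-z0-9 _-]*):\s*(.*)$")


def normalize_key(key):
    """Lower-case the key and strip all spaces, as described above."""
    return key.replace(" ", "").lower()


def parse_metadata(text):
    """Return (metadata dict, body text) for a MultiMarkdown-style document."""
    lines = text.split("\n")
    metadata = {}
    current = None
    # Metadata must begin on the very first line of the document.
    if not lines or not KEY_LINE.match(lines[0]):
        return metadata, text
    index = 0
    while index < len(lines):
        line = lines[index]
        if line.strip() == "":
            index += 1  # the blank line ends the metadata block
            break
        match = KEY_LINE.match(line)
        if match:
            current = normalize_key(match.group(1))
            metadata[current] = match.group(2).strip()
        else:
            # Assumption: anything else continues the previous value.
            metadata[current] = (metadata[current] + " " + line.strip()).strip()
        index += 1
    return metadata, "\n".join(lines[index:])
```

With this sketch, `Base Header Level: 2` and `baseheaderlevel: 2` both end up under the key `baseheaderlevel`, and an indented continuation line is folded into the value of the key above it.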

Complete documents vs "snippets"

In order to include metadata information such as a title, the HTML document created by MultiMarkdown must be "complete." This means that it starts with something that looks like:

<?xml version="1.0" encoding="UTF-8" standalone="yes" ?>
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml">
<head>

If you include metadata in your document (with two exceptions), then you will generate a complete document. If you don't include metadata, then you will instead generate a "snippet." The snippet will just include the relevant portion of HTML, but will not include the <head> or <body> elements.

Metadata that is only intended to affect the way MultiMarkdown processes the output will not trigger a complete document. Currently, this means you can use Base Header Level or Quotes Language and still output a snippet if you don't include any other metadata.

Document formats other than HTML don't have distinctions between complete documents and snippets. Any included and relevant metadata will be included if present.
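The decision described above can be sketched in a few lines of Python. This is only an illustration under the assumption that Base Header Level and Quotes Language are the only two exceptions, as the text currently states; the function name `is_complete_document` is hypothetical.

```python
# Keys that only steer processing; per the text above they do not, by
# themselves, trigger a complete document.
PROCESSING_ONLY_KEYS = {"baseheaderlevel", "quoteslanguage"}


def normalize_key(key):
    """Metadata keys are case insensitive and ignore spaces."""
    return key.replace(" ", "").lower()


def is_complete_document(metadata_keys):
    """True if any key other than the processing-only ones is present."""
    return any(normalize_key(key) not in PROCESSING_ONLY_KEYS
               for key in metadata_keys)
```

For example, a document whose only metadata is `Base Header Level` would still be a snippet under this sketch, while adding a `Title` would make it complete.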

"Standard" Metadata keys

There are a few metadata keys that are standardized in MultiMarkdown. You can use any other keys that you desire, but you have to make use of them yourself.

My goal is to keep the list of "standard" metadata keys as short as possible.

Author

This value represents the author of the document and is used in LaTeX documents to generate the title information.

Affiliation

This is used to enter further information about the author --- a link to a website, an employer, academic affiliation, etc.

Base Header Level

This is used to change the top level of organization of the document. For example:

Base Header Level: 2

# Introduction #

Normally, the Introduction would be output as <h1> in HTML, or \part{} in LaTeX. If you're writing a shorter document, you may wish for the largest division in the document to be <h2> or \chapter{}. The Base Header Level metadata tells MultiMarkdown to change the largest division level to the specified value.

This can also be useful when combining multiple documents.

Base Header Level does not trigger a complete document.

Additionally, there are "flavors" of this metadata key for various output formats so that you can specify a different header level for different output formats --- i.e. LaTeX Header Level, HTML Header Level, and ODF Header Level.
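To make the effect of Base Header Level concrete, here is a small Python sketch for HTML output only. It is an illustration, not MultiMarkdown's code: the function names are invented, only atx-style headers are handled, and capping the result at `<h6>` is an assumption of this example.

```python
import re

# An atx-style header: one to six leading hashes, optional trailing hashes.
ATX_HEADER = re.compile(r"^(#{1,6})\s*(.*?)\s*#*\s*$")


def output_level(source_level, base_header_level=1):
    """Map a header's level in the source to its level in the output."""
    return source_level + base_header_level - 1


def html_heading(line, base_header_level=1):
    """Render one atx-style header line as an HTML heading (sketch only)."""
    match = ATX_HEADER.match(line)
    if match is None:
        return None
    # Assumption: HTML has no heading below <h6>, so cap the level there.
    level = min(output_level(len(match.group(1)), base_header_level), 6)
    return "<h{0}>{1}</h{0}>".format(level, match.group(2))
```

With a base header level of 2, the `# Introduction #` line from the example above comes out as an `<h2>` element instead of `<h1>`.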

Biblio Style

This metadata specifies the name of the BibTeX style to be used, if you are not using natbib.

BibTeX

This metadata specifies the name of the BibTeX file used to store citation information. Do not include the trailing '.bib'.

Copyright

This can be used to provide a copyright string.

CSS

This metadata specifies a URL to be used as a CSS file for the produced document. Obviously, this is only useful when outputting to HTML.

Date

Specify a date to be associated with the document.

HTML Header

You can include raw HTML information to be included in the header. MultiMarkdown doesn't perform any validation on this data --- it just copies it as is.

As an example, this can be useful to link your document to a working MathJax installation (not provided by me):

HTML header:  <script type="text/javascript"
    src="http://example.net/mathjax/MathJax.js">
    </script>

Quotes Language

This is used to specify which style of "smart" quotes to use in the output document. The available options are:

  • dutch
  • english
  • french
  • german
  • germanguillemets
  • swedish

The default is english if not specified. This affects HTML output. Changing the language of a LaTeX document is up to the individual.

Quotes Language does not trigger a complete document.

LaTeX Footer

A special case of the LaTeX Input metadata below. This file will be linked to at the very end of the document.

LaTeX Input

When outputting a LaTeX document it is necessary to include various directions that specify how the document should be formatted. These are not included in the MultiMarkdown document itself --- instead they should be stored separately and linked to with \input{file} commands.

These links can be included in the metadata section. The metadata is processed in order, so I generally break my directives into a group that needs to go before my metadata, a group that goes after the metadata but before the document itself, and a separate group that goes at the end of the document, for example:

latex input:        mmd-memoir-header
Title:              MultiMarkdown Example
Base Header Level:  2
latex mode:         memoir
latex input:        mmd-memoir-begin-doc
latex footer:       mmd-memoir-footer

You can download the LaTeX Support files if you want to output documents using the default MultiMarkdown styles. You can then use these as examples to create your own customized LaTeX output.

This function should allow you to do almost anything you could do using the XSLT features from MultiMarkdown 2.0. More importantly, it means that advanced LaTeX users do not have to learn XSLT to customize their code as desired.

LaTeX Mode

When outputting a document to LaTeX, there are two special options that change the output slightly --- memoir and beamer. These options are designed to be compatible with the LaTeX classes of the same names.

ODF Header

You can include raw XML to be included in the header of a file output in OpenDocument format. It's up to you to properly format your XML and get it working --- MultiMarkdown just copies it verbatim to the output.

Title

Self-explanatory.

Cross-References

An oft-requested feature was the ability to have Markdown automatically handle within-document links as easily as it handled external links. To this aim, I added the ability to interpret [Some Text][] as a cross-link, if a header named "Some Text" exists.

As an example, [Metadata][] will take you to the [section describing metadata][Metadata].

Alternatively, you can include an optional label of your choosing to help disambiguate cases where multiple headers have the same title:

### Overview [MultiMarkdownOverview] ##

This allows you to use [MultiMarkdownOverview] to refer to this section specifically, and not another section named Overview. This works with atx- or setext-style headers.

If you have already defined an anchor using the same id that is used by a header, then the defined anchor takes precedence.

In addition to headers within the document, you can provide labels for images and tables which can then be used for cross-references as well.
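The lookup behind cross-references can be sketched roughly as follows. This Python example is a simplification with several assumptions that are not taken from MultiMarkdown itself: it only looks at atx-style headers, it compares labels while ignoring case and whitespace, and the helper names are invented.

```python
import re

# atx-style header with an optional trailing [label].
HEADER = re.compile(r"^#{1,6}\s*(.*?)\s*(?:\[([^\]]+)\])?\s*#*\s*$")


def normalize(label):
    """Assumed matching rule: ignore case and whitespace."""
    return re.sub(r"\s+", "", label).lower()


def collect_targets(lines):
    """Map normalized labels to header titles for atx-style headers."""
    targets = {}
    for line in lines:
        match = HEADER.match(line)
        if match:
            title, label = match.group(1), match.group(2)
            # An explicit label wins; otherwise the title is the label.
            targets.setdefault(normalize(label or title), title)
    return targets


def resolve(reference, targets):
    """Return the header title a reference points at, or None."""
    return targets.get(normalize(reference))
```

Under these assumptions, `[Metadata][]` resolves to a header named Metadata, and `[MultiMarkdownOverview]` resolves to the Overview header that carries that label.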

Image and Link Attributes

Adding attributes to links and images has been requested for a long time on the Markdown discussion list. I was fairly opposed to this, as most of the proposals really disrupted the readability of the syntax. I consider myself a "Markdown purist", meaning that I took John's introduction to heart:

The overriding design goal for Markdown's formatting syntax is to make it as readable as possible. The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it's been marked up with tags or formatting instructions. While Markdown's syntax has been influenced by several existing text-to-HTML filters, the single biggest source of inspiration for Markdown's syntax is the format of plain text email.

Because there was not a syntax proposal that I felt fit this goal, I was generally opposed to the idea.

Then, Choan C. Gálvez proposed a brilliantly simple syntax that stayed out of the way. By simply appending the attributes to the link reference information, which is already removed from the text itself, it doesn't disturb the readability.

For example:

This is a formatted ![image][] and a [link][] with attributes.

[image]: http://path.to/image "Image title" width=40px height=400px
[link]:  http://path.to/link.html "Some Link" class=external
         style="border: solid black 1px;"

This will generate width and height attributes for the image, and a border around the link. And while it can be argued that it does look "like it's been marked up with tags [and] formatting instructions", even I can't argue too strongly against it. The link and the title in quotes already look like some form of markup, and the additional tags are hardly that intrusive, and they offer a great deal of functionality. They might even be useful in further functions (citations?).

The attributes must continue after the other link/image data, and may contain newlines, but must start at the beginning of the line. The format is attribute=value or attribute="multi word value". Currently, MultiMarkdown does not attempt to interpret or make any use of any of these attributes. Also, you can't have a multiword attribute span a newline.
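The attribute format can be illustrated with a short Python sketch. It assumes the input is only the part of the reference definition that follows the URL and the quoted title; the function name `parse_attributes` is hypothetical and this is not how MultiMarkdown itself is implemented.

```python
import re

# attribute=value or attribute="multi word value"
ATTRIBUTE = re.compile(r'([A-Za-z][\w-]*)=(?:"([^"]*)"|(\S+))')


def parse_attributes(text):
    """Parse attribute=value and attribute="multi word value" pairs."""
    attributes = {}
    for name, quoted, bare in ATTRIBUTE.findall(text):
        # Exactly one of the two value groups is non-empty per match.
        attributes[name] = quoted if quoted else bare
    return attributes
```

Applied to the tail of the `[link]` definition above, this yields `class` with the value `external` and `style` with the value `border: solid black 1px;`.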

Tables

I have implemented a syntax for tables similar to that used by Michael Fortin's PHP Markdown Extra.

Basically, it allows you to turn:

|             |          Grouping           ||
First Header  | Second Header | Third Header |
 ------------ | :-----------: | -----------: |
Content       |          *Long Cell*        ||
Content       |   **Cell**    |         Cell |

New section   |     More      |         Data |
And more      |            And more         ||
[Prototype table]

into a [table][Prototype Table].

|              |          Grouping            ||
| First Header | Second Header | Third Header |
| ------------ | :-----------: | -----------: |
| Content      |          Long Cell           ||
| Content      |     Cell      |         Cell |

| New section  |     More      |         Data |
| And more     |          And more            ||
[Prototype table]

The requirements are:

  • There must be at least one | per line
  • The "separator" line must contain only |, -, :, ., or spaces
  • Cell content must be on one line only
  • Columns are separated by |
  • The first line of the table, and the alignment/divider line, must start at the beginning of the line

Other notes:

  • It is optional whether you have |'s at the beginning and end of lines.

  • To set alignment, you can use a colon to designate left or right alignment, or a colon at each end to designate center alignment, as above. If no colon is present, the default alignment of your system is selected (left in most cases).

  • To indicate that a cell should span multiple columns, simply add additional pipes (|) at the end of the cell, as shown in the example. If the cell in question is at the end of the row, then the pipes at the end of that row are not optional.

  • You can use normal Markdown markup within the table cells.

  • Captions are optional, but if present must be at the beginning of the line immediately preceding or following the table, start with [, and end with ]. If you have a caption before and after the table, only the first match will be used.

  • If you have a caption, you can also have a label, allowing you to create anchors pointing to the table. If there is no label, then the caption acts as the label.

  • Cells can be empty.

  • You can create multiple <tbody> tags within a table by having a single empty line between rows of the table. This allows your CSS to place horizontal borders to emphasize different sections of the table.
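The alignment rule for the separator line is easy to show in code. The following Python sketch reads one separator line and reports the alignment of each column; the function name and the label `default` for columns without a colon are choices made for this example.

```python
def parse_alignments(separator_line):
    """Return one alignment per column from a table separator line."""
    # Leading and trailing pipes are optional, so drop them first.
    cells = separator_line.strip().strip("|").split("|")
    alignments = []
    for cell in cells:
        cell = cell.strip()
        left, right = cell.startswith(":"), cell.endswith(":")
        if left and right:
            alignments.append("center")
        elif right:
            alignments.append("right")
        elif left:
            alignments.append("left")
        else:
            alignments.append("default")
    return alignments
```

For the separator line in the prototype table, this gives `default`, `center`, and `right` for the three columns.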

Footnotes

I have added support for footnotes to MultiMarkdown, using the syntax proposed by John Gruber. Note that there is no official support for footnotes yet, so the output format may change, but the input format seems fairly stable.

To create a footnote, enter something like the following:

Here is some text containing a footnote.[^somesamplefootnote]

[^somesamplefootnote]: Here is the text of the footnote itself.

[somelink]:http://somelink.com

The footnote itself must be at the start of a line, just like links by reference. If you want a footnote to have multiple paragraphs, lists, etc., then the subsequent paragraphs need an extra tab preceding them. You may have to experiment to get this just right, and please let me know of any issues you find.

This is what the final result looks like:

Here is some text containing a footnote.[^somesamplefootnote]

[^somesamplefootnote]: Here is the text of the footnote itself.
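To show how the two halves of a footnote relate, here is a minimal Python sketch that separates footnote definitions from the references to them. It only handles single-line definitions, which is a simplification, and the name `collect_footnotes` is invented for this example.

```python
import re

# A definition starts at the beginning of a line: [^label]: text
FOOTNOTE_DEF = re.compile(r"^\[\^([^\]]+)\]:\s*(.*)$")
# A reference can appear anywhere in the running text: [^label]
FOOTNOTE_REF = re.compile(r"\[\^([^\]]+)\]")


def collect_footnotes(text):
    """Return (definitions dict, list of referenced labels in order)."""
    definitions, body_lines = {}, []
    for line in text.split("\n"):
        match = FOOTNOTE_DEF.match(line)
        if match:
            definitions[match.group(1)] = match.group(2)
        else:
            body_lines.append(line)
    references = FOOTNOTE_REF.findall("\n".join(body_lines))
    return definitions, references
```

A converter could then walk the references in order, number them, and look up each label in the definitions.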

Citations

I have included support for basic bibliography features in this version of MultiMarkdown. Please give me feedback on ways to improve this but keep the following in mind:

  1. Bibliography support in MultiMarkdown is rudimentary. The goal is to offer a basic standalone feature, that can be changed using the tool of your choice to a more robust format (e.g. BibTeX, CiteProc). My XSLT files demonstrate how to make this format compatible with BibTeX, but I am not planning on personally providing compatibility with other tools. Feel free to post your ideas and tools to the wiki.

  2. Those needing more detailed function sets for their bibliographies may need customized tools to provide those services. This is a basic tool that should work for most people. Reference librarians will probably not be satisfied however.

To use citations in MultiMarkdown, you use a syntax much like that for anchors:

This is a statement that should be attributed to
its source[p. 23][#Doe:2006].

And following is the description of the reference to be
used in the bibliography.

[#Doe:2006]: John Doe. *Some Big Fancy Book*.  Vanity Press, 2006.

In HTML output, citations are indistinguishable from footnotes.

You are not required to use a locator (e.g. p. 23), and there are no special rules on what can be used as a locator if you choose to use one. If you prefer to omit the locator, just use an empty set of square brackets before the citation:

This is a statement that should be attributed to its 
source[][#Doe:2006].

There are no rules on the citation key format that you use (e.g. Doe:2006), but it must be preceded by a #, just like footnotes use ^.

As for the reference description, you can use Markdown markup within this section, and I recommend leaving a blank line afterwards to prevent concatenation of several references. Note that there is no way to reformat these references in different bibliography styles; for this you need a program designed for that purpose (e.g. BibTeX).

If you want to include a source in your bibliography that was not cited, you may use the following:

[Not cited][#citekey]

The Not cited bit is not case sensitive.

If you are creating a LaTeX document, the citations will be included, and natbib will be used by default. If you are not using BibTeX and are getting errors about your citations not being compatible with 'Author-Year', you can add the following to your document's metadata:

latex input:        mmd-natbib-plain

This changes the citation style in natbib to avoid these errors, and is useful when you include your citations in the MultiMarkdown document itself.

BibTeX

If you are creating a LaTeX document, and need a bibliography, then you should definitely look into BibTeX and natbib. It is beyond the scope of this document to describe how these two packages work, but it is possible to combine them with MultiMarkdown.

To use BibTeX in a MultiMarkdown document, you need to use the [BibTeX metadata][BibTeX] to specify where your citations are stored.

Since natbib is enabled by default, you have a choice between using the \citep and \citet commands. The following shows how this relates to the MultiMarkdown syntax used.

[#citekey]    => ~\citep{citekey}
[#citekey][]  => ~\citep{citekey}

[foo][#citekey] => ~\citep[foo]{citekey}

[foo\]\[bar][#citekey] => ~\citep[foo][bar]{citekey}


[#citekey;]    => \citet{citekey}
[#citekey;][]  => \citet{citekey}

[foo][#citekey;] => \citet[foo]{citekey}

[foo\]\[bar][#citekey;] => \citet[foo][bar]{citekey}
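The mapping above is regular enough to express as a small function. This Python sketch reproduces the rows of the table for a single citation string; it is an illustration of the mapping, not the converter MultiMarkdown uses, and the name `to_natbib` is hypothetical.

```python
import re

# Optional [locator] (which may contain escaped "\]" and "\["), then
# [#key], then an optional empty [] pair.
CITATION = re.compile(r"(?:\[((?:\\.|[^\]\\])*)\])?\[#([^\]]+)\](?:\[\])?")


def to_natbib(citation):
    """Translate one MultiMarkdown citation into a natbib command."""
    match = CITATION.fullmatch(citation)
    if match is None:
        raise ValueError("not a citation: " + citation)
    locator, key = match.group(1), match.group(2)
    # A trailing semicolon on the key selects \citet instead of \citep.
    if key.endswith(";"):
        command, key = "\\citet", key[:-1]
    else:
        command = "~\\citep"
    options = ""
    if locator:
        # An escaped "\]\[" separates natbib's two optional arguments.
        options = "[" + locator.replace("\\]\\[", "][") + "]"
    return command + options + "{" + key + "}"
```

For instance, `[foo][#citekey]` becomes `~\citep[foo]{citekey}`, and the same citation with `#citekey;` becomes `\citet[foo]{citekey}`.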

Definition Lists

MultiMarkdown has support for definition lists using the same syntax used in PHP Markdown Extra. Specifically:

Apple
:   Pomaceous fruit of plants of the genus Malus in 
    the family Rosaceae.
:   An American computer company.

Orange
:   The fruit of an evergreen tree of the genus Citrus.

becomes:

Apple
    Pomaceous fruit of plants of the genus Malus in the family Rosaceae.
    An American computer company.

Orange
    The fruit of an evergreen tree of the genus Citrus.

You can have more than one term per definition by placing each term on a separate line. Each definition starts with a colon, and you can have more than one definition per term. You may optionally have a blank line between the last term and the first definition.

Definitions may contain other block level elements, such as lists, blockquotes, or other definition lists.
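The term and definition rules can be sketched as a tiny parser. This Python example is deliberately simplified: it treats indented lines as continuations of the current definition and does not handle nested block-level elements. The function name `parse_definition_list` is invented for this example.

```python
def parse_definition_list(text):
    """Return a list of (terms, definitions) pairs."""
    entries = []
    terms, definitions = [], []
    for line in text.split("\n"):
        if line.startswith(":"):
            # Each definition starts with a colon.
            definitions.append(line[1:].strip())
        elif line.strip() == "":
            continue
        elif line[0] in " \t" and definitions:
            # Indented line: continuation of the current definition.
            definitions[-1] += " " + line.strip()
        else:
            if definitions:
                # A term that follows definitions starts a new entry.
                entries.append((terms, definitions))
                terms, definitions = [], []
            terms.append(line.strip())
    if terms:
        entries.append((terms, definitions))
    return entries
```

Run on the Apple and Orange example, this produces two entries, the first with one term and two definitions, the second with one term and one definition.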

Unlike PHP Markdown Extra, all definitions are wrapped in <p> tags. First, I was unable to get Markdown not to create paragraphs. Second, I didn't see where it mattered - the only difference seems to be aesthetic, and I actually prefer the <p> tags in place. Let me know if this is a problem.

See the PHP Markdown Extra page for more information.

Math

MultiMarkdown 2.0 used ASCIIMathML to typeset mathematical equations. There were benefits to using ASCIIMathML, but also some disadvantages.

When rewriting for MultiMarkdown 3.0, there was no straightforward way to implement ASCIIMathML, which led me to look for alternatives. I settled on using MathJax. The advantage here is that the same syntax is supported by MathJax in browsers, and in LaTeX.

This does mean that math will need to be entered into MultiMarkdown documents using the LaTeX syntax, rather than ASCIIMathML.

To enable MathJax support in web pages, you have to include a link to an active MathJax installation --- setting this up is beyond the scope of this document, but it's not too hard.

Here's an example of the metadata setup, and some math:

latex input:    mmd-article-header  
Title:          MultiMarkdown Math Example  
latex input:    mmd-article-begin-doc  
latex footer:   mmd-memoir-footer  
HTML header:    <script type="text/javascript"
    src="http://example.net/mathjax/MathJax.js">
    </script>


An example of math within a paragraph --- \\({e}^{i\pi }+1=0\\)
--- easy enough.

And an equation on its own:

\\[ {x}_{1,2}=\frac{-b\pm \sqrt{{b}^{2}-4ac}}{2a} \\]

That's it.

Here's what it looks like in action (if you're viewing this document in a supported format):

An example of math within a paragraph --- \({e}^{i\pi }+1=0\) --- easy enough.

And an equation on its own:

\[ {x}_{1,2}=\frac{-b\pm \sqrt{{b}^{2}-4ac}}{2a} \]

That's it.

Glossary

MultiMarkdown has a feature that allows footnotes to be specified as glossary terms. It doesn't do much for XHTML documents, but the XSLT file that converts the document into LaTeX is designed to convert these special footnotes into glossary entries.

The glossary format for the footnotes is:

[^glossaryfootnote]: glossary: term (optional sort key)
    The actual definition belongs on a new line, and can continue on
    just as other footnotes.

The term is the item that belongs in the glossary. The sort key is optional, and is used to specify that the term should appear somewhere else in the glossary (which is sorted in alphabetical order).
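The first line of a glossary footnote can be taken apart with a regular expression. The following Python sketch assumes the `glossary:` marker is written exactly as shown above; the function name is hypothetical and the sketch ignores the definition lines that follow.

```python
import re

# [^label]: glossary: term (optional sort key)
GLOSSARY = re.compile(
    r"^\[\^([^\]]+)\]:\s*glossary:\s*(.*?)\s*(?:\(([^)]*)\))?\s*$"
)


def parse_glossary_header(line):
    """Return (label, term, sort key or None), or None for other lines."""
    match = GLOSSARY.match(line)
    if match is None:
        return None
    return match.group(1), match.group(2), match.group(3)
```

An ordinary footnote definition returns `None`, so a converter could use this check to decide whether a footnote should become a glossary entry.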

Unfortunately, it takes an extra step to generate the glossary when creating a PDF from a LaTeX file:

  1. You need to have the basic.gst file installed, which comes with the memoir class.

  2. You need to run a special makeindex command to generate the .gls file from the .glo file: makeindex -s `kpsewhich basic.gst` -o "filename.gls" "filename.glo"

  3. Then you run the usual pdflatex command again a few times.
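If you would rather script the makeindex step, the same command can be assembled from Python. This is a sketch that assumes `kpsewhich` and `makeindex` are on your PATH; the function names are made up for this example, and it only covers step 2, not the pdflatex runs.

```python
import subprocess


def makeindex_command(basefile, style_path):
    """Build the makeindex call that turns basefile.glo into basefile.gls."""
    return ["makeindex", "-s", style_path,
            "-o", basefile + ".gls", basefile + ".glo"]


def build_glossary(basefile):
    """Locate basic.gst with kpsewhich, then run makeindex (needs a TeX install)."""
    style_path = subprocess.check_output(
        ["kpsewhich", "basic.gst"], text=True
    ).strip()
    subprocess.check_call(makeindex_command(basefile, style_path))
```

Calling `build_glossary("filename")` in the directory that holds `filename.glo` would then correspond to the command shown in step 2.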

Alternatively, you can use the code below to create an engine file for TeXShop (it belongs in ~/Library/TeXShop/Engines). You can name it something like MemoirGlossary.engine. Then, when processing a file that needs a glossary, you typeset your document once with this engine, and then continue to process it normally with the usual LaTeX engine. Your glossary should be compiled appropriately. If you use TeXShop, this is the way to go.

Note: Getting glossaries to work is a slightly more advanced LaTeX feature, and might take some trial and error the first few times.

#!/bin/ 

set path = ($path /usr/local/teTeX/bin/powerpc-apple-darwin-current /usr/local/bin)

set basefile = `basename "$1" .tex`

makeindex -s `kpsewhich basic.gst` -o "${basefile}.gls" "${basefile}.glo"

Raw HTML

You can include raw (X)HTML within your document. Exactly what happens with these portions depends on the output format. You can also use the markdown attribute to indicate that MultiMarkdown processing should be applied within the block level HTML tag. This is in addition to the --process-html command line option that causes MultiMarkdown processing to occur within all block level HTML tags.

For example:

<div>This is *not* MultiMarkdown</div>

<div markdown=1>This *is* MultiMarkdown</div>

will produce the following without --process-html:

<div>This is *not* MultiMarkdown</div>

<div>This <em>is</em> MultiMarkdown</div>

and with --process-html:

<div>This is <em>not</em> MultiMarkdown</div>

<div>This <em>is</em> MultiMarkdown</div>

However, the results may be different than anticipated when outputting to LaTeX or other formats. Normally, block level HTML will be ignored when outputting to LaTeX or ODF. The example above would produce the following, leaving out the first <div> entirely:

This \emph{is} MultiMarkdown

And this with --process-html:

This is \emph{not} MultiMarkdown
This \emph{is} MultiMarkdown

You will also notice that the line breaks are different when outputting to LaTeX or ODF, and this can cause the contents of two <div> tags to be placed into a single paragraph.
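The rule for when processing happens inside a block-level HTML tag can be summarized in a short Python sketch. It only models the decision and the removal of the markdown attribute seen in the HTML output above; the function names are invented, and only the `markdown=1` form from the example is recognized.

```python
import re

# The markdown=1 attribute, with or without quotes around the value.
MARKDOWN_ATTR = re.compile(r"""\smarkdown\s*=\s*["']?1["']?(?=[\s>/])""")


def should_process(opening_tag, process_html=False):
    """True if MultiMarkdown processing applies inside this tag."""
    return process_html or MARKDOWN_ATTR.search(opening_tag) is not None


def strip_markdown_attr(opening_tag):
    """Drop the markdown attribute, as in the HTML output shown above."""
    return MARKDOWN_ATTR.sub("", opening_tag)
```

Here `process_html=True` stands in for the `--process-html` command line option, which makes the attribute unnecessary.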