
Addition of an unfinished README (it only lacks details about internals)

1 parent 3ec4196 commit b795385ddcae0c3b120c6c2435eb9adcf23c8c8c Natacha Porté committed Oct 23, 2009
Showing with 281 additions and 0 deletions.
  1. +281 −0 README
@@ -0,0 +1,281 @@
+1. Introduction
+2. Usage
+3. Internals
+For some projects of mine, I wanted a lightweight C library that can parse
+John Gruber's [markdown](
+format into whatever I want, and that is easily extendable.
+The only C implementations of markdown that I know of are [Discount]
+( and [PEG-markdown]
+( Discount seemed a little
+bit too integrated and focused on HTML output for my taste, and
+PEG-markdown seemed to have a lot of dependencies and stuff. So I wrote my own.
+I like to keep things simple, so I wrote a function that performs *only*
+markdown parsing: no file reading or writing, no (X)HTML considerations,
+etc. The actual output is performed by a set of dedicated callback
+functions, called here a renderer. Some example renderers are provided, but
+you are free to use your own to output in any format you like.
+This callback mechanism makes libupskirt so flexible that it does not need
+any flag or external information besides input text and renderer to operate.
+### Library function call
+The only exported function in libupskirt is `markdown()`:
+ void markdown(struct buf *ob, struct buf *ib, const struct mkd_renderer *rndr);
+- `ob` is the output buffer, where the renderer will append data,
+- `ib` is the input buffer, where the markdown text should be stored prior
+ to the `markdown()` call,
+- `rndr` is a pointer to the renderer structure.
+How to use these structures is explained in the following sections.
+### Buffers: struct buf
+I use `struct buf` extensively in input and output buffers. The initial
+idea was constructing a Pascal-string like structure, to be able to store
+both text and binary data. Hence the members `data`, a char pointer to the
+buffer data, and `size` containing the data length.
+When using a `struct buf` as an output buffer, it is useful to pre-allocate
+the memory area before filling it, so I added an `asize` member containing
+the allocated size of the memory pointed to by `data`.
+When accumulating data in a growing memory area, there is a tradeoff
+between memory usage and speed: the more bytes are added each time, the
+less often `realloc()` is called, which means potentially fewer `memcpy()`
+to a new zone, so faster code, but more memory allocated for nothing. To
+set the tradeoff on a case-by-case basis, there is a `unit` member in the
+structure: when more memory is needed, `asize` is increased by a multiple
+of `unit`. So the larger `unit`, the more memory is allocated at once, and
+the less often `realloc()` is called.
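
The growth policy described above can be sketched as follows. This is only an illustration, not the library's code: `struct sketch_buf`, `sketch_round_up()` and `sketch_grow()` are hypothetical names, and the member types are assumptions. The allocated size is rounded up to the next multiple of `unit`, so a larger `unit` means fewer `realloc()` calls at the cost of more unused memory.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the README's struct buf; only the members
 * named in the text are reproduced, and their types are assumptions. */
struct sketch_buf {
    char   *data;   /* buffer contents, not zero-terminated */
    size_t  size;   /* number of bytes in use */
    size_t  asize;  /* number of bytes allocated */
    size_t  unit;   /* allocation granularity */
};

/* Round `needed` up to the next multiple of `unit` (unit must be > 0).
 * Overflow for huge values is ignored in this sketch. */
static size_t sketch_round_up(size_t needed, size_t unit)
{
    return ((needed + unit - 1) / unit) * unit;
}

/* Ensure at least `needed` bytes are allocated.
 * Returns 1 on success, 0 on failure (buffer left untouched). */
static int sketch_grow(struct sketch_buf *b, size_t needed)
{
    size_t new_asize;
    char *p;

    if (b == NULL || b->unit == 0)
        return 0;
    if (needed <= b->asize)
        return 1;                       /* already large enough */
    new_asize = sketch_round_up(needed, b->unit);
    p = realloc(b->data, new_asize);    /* realloc(NULL, n) acts as malloc */
    if (p == NULL)
        return 0;
    b->data = p;
    b->asize = new_asize;
    return 1;
}
```

With `unit` set to 64, asking for 100 bytes allocates 128, and a later request for 10 bytes does not call `realloc()` at all.
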
+To further improve code efficiency by removing unneeded memcpy, I added a
+reference count to the structure: the `ref` member.
+Buffers are created using `bufnew()` whose only argument is the value for
+`unit`. `bufrelease()` decreases the reference count of a buffer, and frees
+it when this count is zero. `bufset()` is used to set a `struct buf`
+pointer to point to the given buffer, increasing reference count and
+dealing with special cases like volatile buffers.
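
A minimal sketch of the reference-counting idea, under stated assumptions: the names `sketch_rcbuf`, `sketch_rcnew()`, `sketch_rcshare()` and `sketch_rcrelease()` are hypothetical, and the sketch assumes a new buffer starts with a count of one, which the text above does not state. It does not model `unit` or volatile buffers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reference-counted buffer; not the library's struct buf. */
struct sketch_rcbuf {
    char   *data;   /* owned storage, may be NULL */
    size_t  size;   /* bytes in use */
    int     ref;    /* number of live references */
};

/* Create an empty buffer holding one reference (assumption). */
static struct sketch_rcbuf *sketch_rcnew(void)
{
    struct sketch_rcbuf *b = calloc(1, sizeof *b);
    if (b != NULL)
        b->ref = 1;
    return b;
}

/* Take an extra reference instead of copying the data. */
static struct sketch_rcbuf *sketch_rcshare(struct sketch_rcbuf *b)
{
    if (b != NULL)
        b->ref += 1;
    return b;
}

/* Drop one reference; free the buffer when none is left.
 * Returns 1 if the buffer was freed, 0 otherwise. */
static int sketch_rcrelease(struct sketch_rcbuf *b)
{
    if (b == NULL)
        return 0;
    b->ref -= 1;
    if (b->ref > 0)
        return 0;
    free(b->data);
    free(b);
    return 1;
}
```

Sharing a buffer this way replaces a copy of the data with an increment, which is where the saved `memcpy()` comes from.
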
+Usually data from `struct buf` are read through direct access of its
+members `data` and `size`. One interesting trick which might not be widely
+known is how to printf a buffer (or any kind of non-zero-terminated
+string) that doesn't contain any zero byte, using the `%.*s` conversion. For example:
+ printf("Buffer string: \"%.*s\"\n", (int)buf->size, buf->data);
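
The same `%.*s` trick works with any function of the printf family. The sketch below uses only standard C; `sketch_format_counted()` is a hypothetical helper name, and it takes a plain pointer and length rather than a `struct buf`. The precision given through `*` caps how many bytes are read, so the source does not need a terminating zero.

```c
#include <stddef.h>
#include <stdio.h>

/* Format a counted, non-zero-terminated string into `out`.
 * Returns what snprintf returns: the length the full result would have. */
static int sketch_format_counted(char *out, size_t outsz,
                                 const char *data, size_t size)
{
    /* The precision must be an int, hence the cast. */
    return snprintf(out, outsz, "Buffer string: \"%.*s\"", (int)size, data);
}
```

Note that the cast to `int` means this is only safe for lengths that fit in an `int`.
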
+In case you really need a zero-terminated string, you can call
+`bufnullterm()`, which appends a zero character without changing `size`.
+The buffer is therefore virtually unchanged (and will no longer be
+zero-terminated after the next data append), but `data` can be used as
+a regular C string.
+The most common functions to append data into buffers are:
+- `bufprintf()` which behaves like any \*printf function,
+- `bufput()` which is similar to `memcpy()`,
+- `bufputs()` which appends a zero-terminated string to a buffer,
+- `BUFPUTSL()` which is a macro to replace `bufputs()` when using string
+ literals: the data size is then known at compile time, which
+ saves a call to `strlen()`,
+- `bufputc()` for single-character appends.
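
To illustrate why a literal-only macro can skip `strlen()`, here is a sketch with hypothetical names (`sketch_sink`, `sketch_put()`, `sketch_puts()`, `SKETCH_PUTSL()`); it is not the library's implementation, and it uses a fixed-size sink instead of a growable buffer. For a string literal, `sizeof` yields the array size including the terminating zero, so the length is a compile-time constant.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size sink standing in for a growable buffer. */
struct sketch_sink {
    char   data[64];
    size_t size;
};

/* Append `len` bytes; returns 1 on success, 0 if they do not fit. */
static int sketch_put(struct sketch_sink *s, const char *src, size_t len)
{
    if (len > sizeof s->data - s->size)
        return 0;
    memcpy(s->data + s->size, src, len);
    s->size += len;
    return 1;
}

/* Zero-terminated string: the length is found at run time. */
static int sketch_puts(struct sketch_sink *s, const char *str)
{
    return sketch_put(s, str, strlen(str));
}

/* String literals only: sizeof gives the length at compile time
 * (minus one for the terminating zero), so strlen() is not called. */
#define SKETCH_PUTSL(sink, literal) \
    sketch_put((sink), (literal), sizeof (literal) - 1)
```

Such a macro silently gives a wrong length if it is passed a `char *` instead of a literal, which is why the two entry points are kept separate.
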
+Modification of existing data in a buffer is also performed through direct
+access of structure members.
+This covers the basics to handle my `struct buf`, but there might still be
+some interesting stuff to be learned from the header.
+### Renderer: struct mkd_renderer
+Libupskirt only performs the parsing of markdown input; the construction of
+the output is left to a *renderer*, which is a set of callback functions
+called when markdown elements are encountered. Pointers to these functions
+are gathered into a `struct mkd_renderer` along with some renderer-related
+data. I think the struct declaration is pretty obvious:
+ struct mkd_renderer {
+ /* block level callbacks - NULL skips the block */
+ void (*blockcode)(struct buf *ob, struct buf *text, void *opaque);
+ void (*blockquote)(struct buf *ob, struct buf *text, void *opaque);
+ void (*blockhtml)(struct buf *ob, struct buf *text, void *opaque);
+ void (*header)(struct buf *ob, struct buf *text,
+ int level, void *opaque);
+ void (*hrule)(struct buf *ob, void *opaque);
+ void (*list)(struct buf *ob, struct buf *text, int flags, void *opaque);
+ void (*listitem)(struct buf *ob, struct buf *text,
+ int flags, void *opaque);
+ void (*paragraph)(struct buf *ob, struct buf *text, void *opaque);
+ /* span level callbacks - NULL or return 0 prints the span verbatim */
+ int (*autolink)(struct buf *ob, struct buf *link,
+ enum mkd_autolink type, void *opaque);
+ int (*codespan)(struct buf *ob, struct buf *text, void *opaque);
+ int (*double_emphasis)(struct buf *ob, struct buf *text,
+ char c, void *opaque);
+ int (*emphasis)(struct buf *ob, struct buf *text, char c, void *opaque);
+ int (*image)(struct buf *ob, struct buf *link, struct buf *title,
+ struct buf *alt, void *opaque);
+ int (*linebreak)(struct buf *ob, void *opaque);
+ int (*link)(struct buf *ob, struct buf *link, struct buf *title,
+ struct buf *content, void *opaque);
+ int (*raw_html_tag)(struct buf *ob, struct buf *tag, void *opaque);
+ int (*triple_emphasis)(struct buf *ob, struct buf *text,
+ char c, void *opaque);
+ /* renderer data */
+ const char *emph_chars; /* chars that trigger emphasis rendering */
+ void *opaque; /* opaque data sent to every rendering callback */
+ };
+The first argument of a renderer function is always the output buffer,
+where the function is supposed to write its output. It's not necessarily
+related to the output buffer given to `markdown()` because in some cases
+rendering into a temporary buffer is needed.
+The last argument of a renderer function is always an opaque pointer, which
+is equal to the `opaque` member of `struct mkd_renderer`. The name
+"opaque" might not be well-chosen, but it means a pointer *opaque for the
+parser, **not** for the renderer*. It means that my parser blindly passes
+around the pointer, which contains data you know about, in case you need to
+store an internal state or whatever. I have not found anything to put in
+this pointer in my example renderers, so it is set to NULL in the structure
+and never looked at in the callbacks.
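
As an illustration of what the opaque pointer could carry, here is a sketch of a header callback that numbers headers using renderer-owned state. Everything in it is an assumption made for the example: `sketch_buf` is a fixed-size stand-in for `struct buf`, and `sketch_state` and `sketch_header()` are hypothetical names. Only the argument order follows the `header` member shown above.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical fixed-size stand-in for the library's struct buf. */
struct sketch_buf {
    char   data[128];
    size_t size;
};

/* State owned by the renderer; the parser never looks inside it. */
struct sketch_state {
    int header_count;
};

/* Same argument order as the `header` callback, with sketch_buf in
 * place of struct buf. Output that does not fit is dropped. */
static void sketch_header(struct sketch_buf *ob, struct sketch_buf *text,
                          int level, void *opaque)
{
    struct sketch_state *st = opaque;   /* recover the renderer state */
    size_t room = sizeof ob->data - ob->size;
    int n;

    st->header_count += 1;
    n = snprintf(ob->data + ob->size, room, "<h%d id=\"h%d\">%.*s</h%d>\n",
                 level, st->header_count,
                 (int)text->size, text->data, level);
    if (n > 0 && (size_t)n < room)
        ob->size += (size_t)n;          /* keep only complete output */
}
```

The parser would simply hand back whatever pointer was stored in the `opaque` member, so the state can be any type the renderer chooses.
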
+`emph_chars` is a zero-terminated string which contains the set of
+characters that trigger emphasis. In regular markdown, emphasis is only
+triggered by '\_' and '\*', but in some extensions it might be useful to
+add other characters to this list. For example in my extension to handle
+`<ins>` and `<del>` spans, delimited respectively by "++" and "--", I have
+added '+' and '-' to `emph_chars`. The character that triggered the
+emphasis is then passed to `emphasis`, `double_emphasis` and
+`triple_emphasis` through the parameter `c`.
+Function pointers in `struct mkd_renderer` can be NULL, but the meaning
+differs depending on whether the callback is block-level or span-level. A null
+block-level callback will make the corresponding block disappear from the
+output, as if the callback was an empty function. A null span-level
+callback will cause the corresponding element to be treated as normal
+characters, copied verbatim to the output.
+So for example, to disable links and images (e.g. because you consider them
+dangerous), just put a null pointer in `rndr.link` and `rndr.image` and
+the bracketed stuff will be present as-is in the output, while a null
+pointer in `header` will remove all header-looking blocks. If you want an
+otherwise standard markdown-to-XHTML conversion, you can take the example
+`mkd_xhtml` struct, copy it into your own `struct mkd_renderer` and then
+assign NULL to `link` and `image` members.
+Moreover, span-level callbacks return an integer, which tells whether the
+renderer agrees to render the item (non-zero return value) or whether it
+should be copied verbatim (zero return value). This allows you to only
+accept some specific inputs. For example, my extension for `<ins>` and
+`<del>` spans requires *exactly* two '-' or '+' as delimiters, so when
+`emphasis` and `triple_emphasis` are called with '-' or '+', they return 0.
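
The span-level rules (a null callback or a zero return value means verbatim copy) can be sketched like this. It is not the parser's actual code: `sketch_buf`, `sketch_span_cb`, `sketch_append()`, `sketch_em()` and `sketch_emit_span()` are hypothetical, the callback takes a pointer and length instead of a `struct buf`, and the sketch assumes a callback writes nothing before returning 0.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size stand-in for the library's struct buf. */
struct sketch_buf {
    char   data[128];
    size_t size;
};

/* Simplified span callback: inner text, delimiter character, opaque. */
typedef int (*sketch_span_cb)(struct sketch_buf *ob, const char *text,
                              size_t len, char c, void *opaque);

/* Append bytes, truncating silently if the sink is full. */
static void sketch_append(struct sketch_buf *ob, const char *src, size_t len)
{
    size_t room = sizeof ob->data - ob->size;
    if (len > room)
        len = room;
    memcpy(ob->data + ob->size, src, len);
    ob->size += len;
}

/* Renders <em> for '*' and '_' only; declines any other delimiter. */
static int sketch_em(struct sketch_buf *ob, const char *text,
                     size_t len, char c, void *opaque)
{
    (void)opaque;
    if (c != '*' && c != '_')
        return 0;               /* declined: caller copies verbatim */
    sketch_append(ob, "<em>", 4);
    sketch_append(ob, text, len);
    sketch_append(ob, "</em>", 5);
    return 1;
}

/* `raw` is the span with its delimiters, `text` the inner part. */
static void sketch_emit_span(struct sketch_buf *ob, sketch_span_cb cb,
                             const char *raw, size_t raw_len,
                             const char *text, size_t text_len,
                             char c, void *opaque)
{
    if (cb == NULL || cb(ob, text, text_len, c, opaque) == 0)
        sketch_append(ob, raw, raw_len);    /* verbatim fallback */
}
```

With this shape, disabling a span type and rejecting one particular input go through the same fallback path, so a renderer can be as selective as it needs to be.
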
+### Renderer examples
+While libupskirt is designed to perform only the parsing of markdown files,
+and to let you provide the renderer callbacks, a few renderers have been
+included, both to illustrate how to write a set of renderer functions and
+to allow anybody who does not need special extensions to use libupskirt
+without hassle.
+All the examples provided here come in two flavors, `_html` producing
+HTML code (self-closing tags are rendered like this: `<hr>`), and `_xhtml`
+producing XHTML code (self-closing tags like `<hr />`).
+#### Standard markdown renderer
+`mkd_html` and `mkd_xhtml` implement standard Markdown to (X)HTML
+translation without any extension.
+#### Discount-ish renderer
+`discount_html` and `discount_xhtml` implement on top of the standard
+markdown *some* of the extensions found in Discount.
+Actually, all Discount extensions that are not provided here cannot be
+easily implemented in libupskirt without touching the parsing code,
+hence they do not belong strictly to the renderer realm. However some
+(maybe all, not sure about tables) extensions can be implemented fairly
+easily with libupskirt by using both a dedicated renderer and some
+preprocessing to make the extension look like something closer to the
+original markdown syntax.
+Here is a list of all extensions included in these renderers:
+ - image size specification, by appending " =(width)x(height)" to the link,
+ - pseudo-protocols in links:
+ * abbr:_description_ for `<abbr title="`_description_`">...</abbr>`
+ * class:_name_ for `<span class="`_name_`">...</span>`
+ * id:_name_ for `<a id="`_name_`">...</a>`
+ * raw:_text_ for verbatim unprocessed _text_ inclusion
+ - class blocks: blockquotes beginning with %_class_% will be rendered as a
+ `div` of the given class(es).
+#### Natasha's own extensions
+`nat_html` and `nat_xhtml` implement on top of Discount extensions some
+things that I need to losslessly convert my existing HTML into extended markdown.
+Here is a list of these extensions:
+ - id attribute for headers, using the syntax _id_#_Header text_
+ - class attribute for paragraphs, by putting class name(s) between
+ parentheses at the very beginning of the paragraph
+ - `<ins>` and `<del>` spans, using respectively `++` and `--` as
+ delimiters (with emphasis-like restrictions, i.e. an opening delimiter
+ cannot be followed by a whitespace, and a closing delimiter cannot be
+ preceded by a whitespace).
+ - plain `<span>` without attribute, using emphasis-like delimiter `|`
+Here is an example using all of them:
+ ###atx_id#ID was chosen to look nice in atx-style headers ###
+ setext_id#Though it will also work in setext-style headers
+ ----------------------------------------------------------
+ Here is a paragraph with --deleted-- and ++inserted++ text.
+ I use CSS rules to render poetry and other verses, using a plain
+ `<span>` for each verse, and enclosing each group of verses in
+ a `<p class="verse">`. Here is how it would look like:
+ (verse)|And on the pedestal these words appear:|
+ |"My name is Ozymandias, king of kings:|
+ |Look on my works, ye Mighty, and despair!"|
