
Embedding Parser: Calling MD4C

Adeepa Gunathilake edited this page Feb 2, 2022 · 5 revisions

This page explains how to call MD4C. md4c.h is the header file that the caller's code should #include. The header is thoroughly documented, so read it as well.


MD4C exposes a function called md_parse in the md4c.h header file. Users of the library call this function to parse Markdown text. The caller provides the Markdown text and several callback functions. MD4C calls the callbacks on events such as entering or leaving a Markdown block.

Here's the function prototype:

int md_parse(const MD_CHAR* text, MD_SIZE size, const MD_PARSER* parser, void* userdata);
  • text - Pointer to the beginning of the Markdown text to parse.
  • size - Size of the string text, counted in MD_CHAR units.
  • parser - Pointer to an MD_PARSER struct which contains information about the callback functions.
  • userdata - MD4C does not use this parameter. It simply passes it to the callbacks as is. The application can use it to transfer data to the callbacks. Set it to NULL if you don't need it.

Note: MD_CHAR is WCHAR if the macro MD4C_USE_UTF16 is defined and the platform is Windows; otherwise it is char. MD_SIZE is a typedef for unsigned.

MD_PARSER struct

The third parameter of md_parse is a pointer to a variable of type MD_PARSER. This struct holds the information about the caller-provided callback functions for rendering.

MD_PARSER is defined as follows:

typedef struct MD_PARSER {
    unsigned abi_version; // Reserved. Set to zero.
    unsigned flags; // Dialect options. Bitmask of MD_FLAG_xxxx values.

    // Caller-provided rendering callbacks.

    int (*enter_block)(MD_BLOCKTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
    int (*leave_block)(MD_BLOCKTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
    int (*enter_span)(MD_SPANTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
    int (*leave_span)(MD_SPANTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
    int (*text)(MD_TEXTTYPE /*type*/, const MD_CHAR* /*text*/, MD_SIZE /*size*/, void* /*userdata*/);

    void (*debug_log)(const char* /*msg*/, void* /*userdata*/);
    void (*syntax)(void); // Reserved. Set to NULL.
} MD_PARSER;
  • abi_version: Reserved. Set this to zero.
  • flags: Bitmask of dialect flags. Refer to md4c.h for more details.
  • enter_block: Function pointer. This should point to the function MD4C should call when entering a block.
  • leave_block: This should point to the function MD4C should call when leaving a block.
  • enter_span: This should point to the function MD4C should call when entering a span.
  • leave_span: This should point to the function MD4C should call when leaving a span.
  • text: This should point to the function MD4C should call when reading actual text content (discussed below).
  • debug_log: Optional (may be NULL). If provided, MD4C calls this function when something goes wrong. Note that it is intended for debugging and problem diagnosis by developers; it is not suitable for producing error messages for end users.
  • syntax: Reserved. Set this to NULL.

md_parse takes a void* parameter (the last one) called userdata. As mentioned above, MD4C does not use this value itself; it simply passes it on as the last parameter (userdata) of the caller-provided rendering callbacks. MD4C passes the type of the block, span, or text as the first parameter of each rendering callback, referred to as type in the code block above (see below). The parameter referred to as detail receives additional information about the relevant block or span (see below). The text callback receives a pointer to the beginning of the actual text content, together with its size.

Note that any strings passed to the callbacks, whether as arguments or as members of a detail structure, are generally not zero-terminated. The application has to take the respective size information into account.
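
To make the point about missing zero terminators concrete, here is a minimal sketch of two ways to consume a (pointer, size) pair safely. The helper names copy_sized_text and print_sized_text are hypothetical and not part of MD4C. The two typedefs are stand-ins so the sketch compiles without md4c.h; they assume the char build, not the UTF-16 one. In real code, include md4c.h and drop them.

```c
#include <stdio.h>
#include <string.h>

/* Stand-ins so this sketch compiles on its own (assumption: char build).
 * With the real header, #include "md4c.h" and remove these two lines. */
typedef char MD_CHAR;
typedef unsigned MD_SIZE;

/* Hypothetical helper: copy a (pointer, size) pair into dst as a
 * zero-terminated C string. Truncates if dst is too small.
 * Returns the number of characters copied (terminator not counted). */
size_t copy_sized_text(char *dst, size_t dst_cap,
                       const MD_CHAR *text, MD_SIZE size)
{
    size_t n;

    if (dst_cap == 0)
        return 0;
    n = (size < dst_cap - 1) ? size : dst_cap - 1;
    memcpy(dst, text, n);
    dst[n] = '\0';
    return n;
}

/* Hypothetical helper: print a (pointer, size) pair without copying.
 * The precision given via "%.*s" limits how many characters are read,
 * so no terminator is required. */
void print_sized_text(FILE *out, const MD_CHAR *text, MD_SIZE size)
{
    fprintf(out, "%.*s", (int)size, text);
}
```

Calling strlen, strcpy, or printf("%s") directly on such a pointer would read past the end of the piece MD4C handed over, because the pointer typically refers into the middle of the original input.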

How and when the callbacks are called

The callbacks enter_block, leave_block, enter_span, and leave_span do not receive any renderable textual content. They notify the application that MD4C has encountered a certain block or span, so that the application can render the text it receives in the text callback with the desired style. Only the text callback receives the actual content to be rendered.

For example, MD4C calls enter_span when it encounters bold text. Once the callback has identified the span as bold from the parameter values MD4C passes, the application can do something like open a <b> tag. When MD4C then calls the text callback, the application knows that the text it receives should be bold. When MD4C leaves the bold text, it calls leave_span to signal this, so the callback can finish the styling, for example by closing the opened <b> tag.
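
The bold-text flow described above could be sketched roughly as follows. This is an illustration under stated assumptions, not MD4C's own renderer. The OutBuf struct, out_append, and the on_* function names are hypothetical. The typedefs and enums are small stand-ins that reuse names from md4c.h so the sketch compiles on its own; in real code, include md4c.h instead and remove them.

```c
#include <string.h>

/* Stand-ins reusing names from md4c.h (assumption: char build, and only
 * the few enumerators this sketch needs). Remove when using the header. */
typedef char MD_CHAR;
typedef unsigned MD_SIZE;
typedef enum { MD_SPAN_EM, MD_SPAN_STRONG } MD_SPANTYPE;
typedef enum { MD_TEXT_NORMAL } MD_TEXTTYPE;

/* Hypothetical output buffer, handed to the callbacks through userdata. */
typedef struct {
    char   data[256];
    size_t len;
} OutBuf;

/* Append n characters to the buffer, truncating when it is full. */
static void out_append(OutBuf *out, const char *s, size_t n)
{
    size_t room = sizeof out->data - 1 - out->len;

    if (n > room)
        n = room;
    memcpy(out->data + out->len, s, n);
    out->len += n;
    out->data[out->len] = '\0';
}

/* Entering a span: open a tag if the span is bold. */
int on_enter_span(MD_SPANTYPE type, void *detail, void *userdata)
{
    (void)detail; /* not needed for bold spans */
    if (type == MD_SPAN_STRONG)
        out_append((OutBuf *)userdata, "<b>", 3);
    return 0; /* zero: let parsing continue */
}

/* Leaving a span: close the tag opened above. */
int on_leave_span(MD_SPANTYPE type, void *detail, void *userdata)
{
    (void)detail;
    if (type == MD_SPAN_STRONG)
        out_append((OutBuf *)userdata, "</b>", 4);
    return 0;
}

/* Text: the only callback that carries renderable content.
 * The text is not zero-terminated, so size must be used. */
int on_text(MD_TEXTTYPE type, const MD_CHAR *text, MD_SIZE size,
            void *userdata)
{
    (void)type;
    out_append((OutBuf *)userdata, text, size);
    return 0;
}
```

With the real header, these functions would be assigned to the enter_span, leave_span, and text members of MD_PARSER, and a pointer to an OutBuf would be passed as the userdata argument of md_parse. A real renderer would also have to escape characters such as < and & in on_text, which this sketch skips.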

Blocks and spans

MD4C arranges the Markdown content in a tree-like structure. Text content is contained in spans. A sequence of spans forms a block such as a paragraph or a list item.

Quoting from a comment in md4c.h:

Span represents an in-line piece of a document which should be rendered with the same font, color and other attributes. A sequence of spans forms a block like paragraph or list item.

Note, however, that plain text without any styling, such as ordinary text in a paragraph, is not wrapped in a span. MD4C therefore calls the text callback to hand such "normal" text to the caller without calling enter_span for it.

MD4C tells the client application the type of the block or span it has just entered or left via the type parameter.

For illustration, here are some examples of block types defined in md4c.h:

  • MD_BLOCK_DOC (Document body)
  • MD_BLOCK_QUOTE (Quotes)
  • MD_BLOCK_UL (Unordered list)

Here are some of the span types:

  • MD_SPAN_EM for emphasis (<em>)
  • MD_SPAN_STRONG for bold (<strong>)
  • MD_SPAN_IMG for images

Note that the lists above are not complete. They are only meant to convey the idea of blocks and spans. See the md4c.h header for the full set.

Detail

There is a block type for headings, MD_BLOCK_H, which is passed to the callbacks when MD4C enters or leaves a heading block. But how is the caller supposed to know which level of heading it is? Is it a # Big heading or a ### Not so big heading? To solve this problem, MD4C passes a void pointer to the rendering callbacks as the second parameter (referred to as detail in the code block above). This pointer points to a struct containing additional information about the block or span. md4c.h includes several definitions of such "detail" structs.

Several examples are:

  • MD_BLOCK_UL_DETAIL
  • MD_BLOCK_H_DETAIL
  • MD_SPAN_A_DETAIL
  • MD_SPAN_IMG_DETAIL

Here's the definition of MD_BLOCK_H_DETAIL:

typedef struct MD_BLOCK_H_DETAIL {
    unsigned level; // Header level (1 - 6)
} MD_BLOCK_H_DETAIL;

The examples above are not complete. Refer to md4c.h to see all the structs and how they are defined.

Because there are many kinds of "detail" structs and they are defined differently, the application must decide which struct to cast the void pointer to based on the block or span type.
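
As a minimal sketch of that cast, the enter_block callback below inspects type first and only then interprets detail as an MD_BLOCK_H_DETAIL. The HeadingStats struct and the on_enter_block name are hypothetical. The enum is a stand-in subset that reuses names from md4c.h, and the struct repeats the definition quoted above, so that the sketch compiles on its own; in real code, include md4c.h instead and remove both.

```c
#include <stddef.h>

/* Stand-ins reusing names from md4c.h (assumption: only the block types
 * mentioned on this page). Remove when using the real header. */
typedef enum {
    MD_BLOCK_DOC,
    MD_BLOCK_QUOTE,
    MD_BLOCK_UL,
    MD_BLOCK_H
} MD_BLOCKTYPE;

typedef struct MD_BLOCK_H_DETAIL {
    unsigned level; /* Header level (1 - 6) */
} MD_BLOCK_H_DETAIL;

/* Hypothetical application state, passed through userdata:
 * count[n] is the number of level-n headings seen (index 0 unused). */
typedef struct {
    unsigned count[7];
} HeadingStats;

int on_enter_block(MD_BLOCKTYPE type, void *detail, void *userdata)
{
    HeadingStats *stats = (HeadingStats *)userdata;

    switch (type) {
    case MD_BLOCK_H: {
        /* The block type tells us which struct detail points to. */
        const MD_BLOCK_H_DETAIL *h = (const MD_BLOCK_H_DETAIL *)detail;

        if (h != NULL && h->level >= 1 && h->level <= 6)
            stats->count[h->level]++;
        break;
    }
    default:
        /* Other block types: detail is not an MD_BLOCK_H_DETAIL here,
         * so it must not be cast to one. */
        break;
    }
    return 0; /* zero: let parsing continue */
}
```

The same pattern extends to the other detail structs: add one case per block or span type the application cares about, and cast detail only inside the matching case.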
