
Embedding Parser: Calling MD4C

Adeepa Gunathilake edited this page Feb 1, 2022 · 5 revisions

This page of the wiki explains how to call MD4C. The caller's code should #include the header file md4c.h. This header is very well documented, so be sure to read it too.


MD4C exposes a function called md_parse in the md4c.h header file. Users of the library call this function to parse Markdown text. The caller provides the Markdown text and several callback functions, and MD4C calls those callbacks on events such as entering or leaving a Markdown block.

Here's the function prototype:

int md_parse(const MD_CHAR* text, MD_SIZE size, const MD_PARSER* parser, void* userdata);
  • text - Pointer to the beginning of the Markdown text to parse.
  • size - Size of the string that text points to.
  • parser - Pointer to an MD_PARSER struct which contains information about the callback functions.
  • userdata - MD4C does not use this parameter for parsing. It simply passes it to the callback functions as is. The caller can use it to transfer data to the callback functions.

Note: MD_CHAR is WCHAR if the macro MD4C_USE_UTF16 is defined and the platform is Windows; otherwise it is char. MD_SIZE is a typedef for unsigned.

MD_PARSER struct

The 3rd parameter of the md_parse function is a pointer to a variable of the MD_PARSER type. It holds the caller-provided callback functions used for rendering. MD_PARSER is a struct defined in the md4c.h header file.

MD_PARSER is defined as follows:

typedef struct MD_PARSER {
    unsigned abi_version; // Reserved. Set to zero.
    unsigned flags; // Dialect options. Bitmask of MD_FLAG_xxxx values.

    // Caller-provided rendering callbacks.

    int (*enter_block)(MD_BLOCKTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
    int (*leave_block)(MD_BLOCKTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
    int (*enter_span)(MD_SPANTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
    int (*leave_span)(MD_SPANTYPE /*type*/, void* /*detail*/, void* /*userdata*/);
    int (*text)(MD_TEXTTYPE /*type*/, const MD_CHAR* /*text*/, MD_SIZE /*size*/, void* /*userdata*/);

    void (*debug_log)(const char* /*msg*/, void* /*userdata*/);
    void (*syntax)(void); // Reserved. Set to NULL.
} MD_PARSER;
  • abi_version: Set this to zero.
  • flags : Bitmask of flags. Discussed later.
  • enter_block : Function pointer. This should point to the function MD4C should call when entering a block.
  • leave_block : This should point to the function MD4C should call when leaving a block.
  • enter_span : This should point to the function MD4C should call when entering a span.
  • leave_span : This should point to the function MD4C should call when leaving a span.
  • text : Function pointer. This should point to the function MD4C should call when it reads actual text content (discussed below).
  • debug_log : Optional (may be NULL). If provided, MD4C calls this function when something goes wrong. Note that it is intended for debugging and problem diagnosis by developers; it is not suitable for producing error messages shown to end users.
  • syntax : Set this to NULL.
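
As a rough sketch of how these members might be filled in, the snippet below builds an MD_PARSER whose callbacks ignore every event. So that it compiles without the library, it repeats the struct shown above and uses placeholder enums for MD_BLOCKTYPE, MD_SPANTYPE and MD_TEXTTYPE; these stand-ins are assumptions for illustration only, and real code would include md4c.h instead. The names on_block, on_span, on_text and make_parser are hypothetical and not part of MD4C.

```c
#include <stddef.h>

/* Stand-in declarations so this sketch compiles without the library.
 * Real code would #include "md4c.h" and delete this section. The enums
 * are placeholder subsets; the struct repeats the definition above. */
typedef char MD_CHAR;
typedef unsigned MD_SIZE;
typedef enum { MD_BLOCK_DOC } MD_BLOCKTYPE;
typedef enum { MD_SPAN_EM } MD_SPANTYPE;
typedef enum { MD_TEXT_NORMAL } MD_TEXTTYPE;

typedef struct MD_PARSER {
    unsigned abi_version;
    unsigned flags;
    int (*enter_block)(MD_BLOCKTYPE, void*, void*);
    int (*leave_block)(MD_BLOCKTYPE, void*, void*);
    int (*enter_span)(MD_SPANTYPE, void*, void*);
    int (*leave_span)(MD_SPANTYPE, void*, void*);
    int (*text)(MD_TEXTTYPE, const MD_CHAR*, MD_SIZE, void*);
    void (*debug_log)(const char*, void*);
    void (*syntax)(void);
} MD_PARSER;

/* Hypothetical callbacks that ignore every event. They return 0, which
 * md4c.h documents as "keep parsing"; non-zero asks MD4C to abort. */
static int on_block(MD_BLOCKTYPE type, void* detail, void* userdata)
{
    (void) type; (void) detail; (void) userdata;
    return 0;
}

static int on_span(MD_SPANTYPE type, void* detail, void* userdata)
{
    (void) type; (void) detail; (void) userdata;
    return 0;
}

static int on_text(MD_TEXTTYPE type, const MD_CHAR* text, MD_SIZE size,
                   void* userdata)
{
    (void) type; (void) text; (void) size; (void) userdata;
    return 0;
}

/* Hypothetical helper: returns a fully initialised parser description. */
MD_PARSER make_parser(void)
{
    MD_PARSER parser;
    parser.abi_version = 0;        /* reserved, must be zero */
    parser.flags = 0;              /* no dialect flags in this sketch */
    parser.enter_block = on_block; /* same handler for enter and leave */
    parser.leave_block = on_block;
    parser.enter_span = on_span;
    parser.leave_span = on_span;
    parser.text = on_text;
    parser.debug_log = NULL;       /* optional */
    parser.syntax = NULL;          /* reserved, must be NULL */
    return parser;
}
```

With the real header, a pointer to the returned struct would be the third argument of md_parse, next to the text, its size, and whatever userdata the callbacks need.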

The md_parse function takes a void* parameter called userdata as its last parameter. As mentioned above, MD4C does not use this parameter itself; it simply passes it on as the last parameter of every caller-provided rendering callback. MD4C passes the type of the block, span or text as the 1st parameter of each rendering callback, referred to as type in the code block above (see below). In the block and span callbacks, the parameter referred to as detail receives additional information about the relevant block or span (see below). The text callback receives a pointer to the beginning of the actual text content, together with its size.

Note well that any strings provided to the callbacks as their arguments or as members of any detail structure are generally not zero-terminated. The application has to take the respective size information into account.
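
One way to respect this, sketched below, is to copy exactly size characters into a buffer and add the terminator yourself. The typedefs stand in for those in md4c.h (assuming the default char build without MD4C_USE_UTF16), and the helper name md_str_dup is hypothetical; it is not part of MD4C.

```c
#include <stdlib.h>
#include <string.h>

/* Stand-ins for the md4c.h typedefs (default build, no MD4C_USE_UTF16).
 * Real code would #include "md4c.h" instead. */
typedef char MD_CHAR;
typedef unsigned MD_SIZE;

/* Hypothetical helper (not part of MD4C): returns a zero-terminated heap
 * copy of the first `size` characters of `text`, or NULL if allocation
 * fails. The caller releases the result with free(). */
char* md_str_dup(const MD_CHAR* text, MD_SIZE size)
{
    char* copy = malloc((size_t) size + 1);
    if (copy == NULL)
        return NULL;
    if (size > 0)
        memcpy(copy, text, size);  /* copy only the bytes MD4C reported */
    copy[size] = '\0';             /* add the terminator ourselves */
    return copy;
}
```

When the text only needs to be printed, printf("%.*s", (int) size, text) writes exactly size characters without making a copy.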

How and when the callbacks are called

The enter_block, leave_block, enter_span and leave_span callbacks never receive renderable text. They are called to notify the caller that MD4C has encountered a certain block or span, so that the caller can render the text it receives in the text callback with the desired style. Only the text callback receives the actual content to be rendered.

For example, when MD4C encounters bold text it calls enter_span, passing the corresponding values in the parameters. Once the callback has identified, from those values, that the span MD4C just entered is bold text, the caller can do something like open a <b> tag. When MD4C then calls the text callback, the caller knows the text it receives should be bold. When MD4C leaves the bold text, it calls leave_span to signal this to the caller, so the callback can do the matching work, such as closing the opened <b> tag.
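
The bold-text flow described above could be sketched as follows. The typedefs and enums are stand-ins for the real ones in md4c.h (only a subset, with placeholder values) so that the snippet compiles by itself; the OutBuf type, out_append and the on_... callback names are hypothetical. The output buffer reaches the callbacks through userdata.

```c
#include <string.h>

/* Stand-ins so the sketch compiles alone; real code includes md4c.h. */
typedef char MD_CHAR;
typedef unsigned MD_SIZE;
typedef enum { MD_SPAN_EM, MD_SPAN_STRONG } MD_SPANTYPE;  /* subset */
typedef enum { MD_TEXT_NORMAL } MD_TEXTTYPE;              /* subset */

/* Hypothetical output buffer handed to the callbacks via userdata. */
typedef struct {
    char data[256];
    size_t len;
} OutBuf;

/* Append n bytes, truncating when the buffer is full. The buffer is
 * kept zero-terminated after every call. */
static void out_append(OutBuf* out, const char* s, size_t n)
{
    size_t room = sizeof(out->data) - 1 - out->len;
    if (n > room)
        n = room;
    memcpy(out->data + out->len, s, n);
    out->len += n;
    out->data[out->len] = '\0';
}

/* Open a <b> tag when MD4C enters a bold span. */
static int on_enter_span(MD_SPANTYPE type, void* detail, void* userdata)
{
    (void) detail;
    if (type == MD_SPAN_STRONG)
        out_append((OutBuf*) userdata, "<b>", 3);
    return 0;
}

/* Close the tag again when MD4C leaves the bold span. */
static int on_leave_span(MD_SPANTYPE type, void* detail, void* userdata)
{
    (void) detail;
    if (type == MD_SPAN_STRONG)
        out_append((OutBuf*) userdata, "</b>", 4);
    return 0;
}

/* Copy the text itself; `text` is not zero-terminated, so use `size`. */
static int on_text(MD_TEXTTYPE type, const MD_CHAR* text, MD_SIZE size,
                   void* userdata)
{
    (void) type;
    out_append((OutBuf*) userdata, text, size);
    return 0;
}
```

The text callback does not need to know about the bold state here, because the tags written by the span callbacks already surround whatever it appends.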

Blocks and spans

MD4C arranges the Markdown content in a tree-like structure. Text content is contained in spans. A sequence of spans forms a block such as a paragraph or list item.

Quoting from a comment in md4c.h:

Span represents an in-line piece of a document which should be rendered with the same font, color and other attributes. A sequence of spans forms a block like paragraph or list item.

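Note, however, that completely normal text without any styling, such as ordinary paragraph text, is not wrapped in a span. As a result, MD4C calls the text callback to hand the caller this "normal" text without calling enter_span for it. For example, in the paragraph plain **bold**, the word plain arrives through the text callback with no enter_span call around it, whereas bold arrives between enter_span and leave_span.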

To illustrate, here are some examples of block types defined in md4c.h:

  • MD_BLOCK_DOC (Document body)
  • MD_BLOCK_QUOTE (Quotes)
  • MD_BLOCK_UL (Unordered list)

Here are some examples of span types:

  • MD_SPAN_EM for emphasis (<em>)
  • MD_SPAN_STRONG for bold (<strong>)
  • MD_SPAN_IMG for images (<img>)

Note that the lists above are not complete. They are given in the hope that the reader can get the idea of blocks and spans from them. Please see the md4c.h header file, where these types are defined as enums. The header is well documented.

Detail

md4c.h defines a block type for headings, MD_BLOCK_H, which is passed to the callback function when MD4C enters or leaves a "heading" block. But how is the caller supposed to know which kind of heading it is? Is it a # Big heading or a ### Not so big heading? To solve this problem, MD4C passes a void pointer to the block and span callbacks as the second parameter (referred to as detail in the code block above). This pointer points to a struct containing additional information about the block or span. md4c.h includes several definitions of "detail" structs.

Several examples are:

  • MD_BLOCK_UL_DETAIL
  • MD_BLOCK_H_DETAIL
  • MD_SPAN_A_DETAIL
  • MD_SPAN_IMG_DETAIL

Here's the definition of MD_BLOCK_H_DETAIL:

typedef struct MD_BLOCK_H_DETAIL {
    unsigned level; // Header level (1 - 6)
} MD_BLOCK_H_DETAIL;

These examples aren't complete. Please refer to md4c.h to see all the structs and how they are defined.

Because there are many types of "detail" structs and they are defined differently, the caller should determine which struct to cast the void pointer to based on the block or span type.
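
A minimal sketch of that decision, assuming an enter_block callback that only cares about headings, might look like this. The enum is a stand-in subset of MD_BLOCKTYPE with placeholder values, and the struct repeats the MD_BLOCK_H_DETAIL definition shown above, so that the snippet compiles without md4c.h. RenderState and on_enter_block are hypothetical names.

```c
#include <stddef.h>

/* Stand-ins so the sketch compiles alone; real code includes md4c.h. */
typedef enum {
    MD_BLOCK_DOC, MD_BLOCK_QUOTE, MD_BLOCK_UL, MD_BLOCK_H  /* subset */
} MD_BLOCKTYPE;

typedef struct MD_BLOCK_H_DETAIL {
    unsigned level;  /* Header level (1 - 6) */
} MD_BLOCK_H_DETAIL;

/* Hypothetical caller state, passed to the callback through userdata. */
typedef struct {
    unsigned last_heading_level;  /* 0 until a heading has been seen */
} RenderState;

static int on_enter_block(MD_BLOCKTYPE type, void* detail, void* userdata)
{
    RenderState* state = (RenderState*) userdata;

    switch (type) {
    case MD_BLOCK_H: {
        /* The block type says which struct `detail` points to. */
        const MD_BLOCK_H_DETAIL* h = (const MD_BLOCK_H_DETAIL*) detail;
        state->last_heading_level = h->level;
        break;
    }
    default:
        /* Other block types carry a different detail struct, or none,
         * so the pointer must not be read as MD_BLOCK_H_DETAIL here. */
        break;
    }
    return 0;
}
```

A renderer would typically use the level to pick the output element, for instance choosing between <h1> and <h3>.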
