Org-Mode Javascript Parser

This project aims to provide a parser and easily customizable renderer for Org-Mode files in JavaScript.

Org : the Main object

The global context is extended with only one object, named Org.


Org.Config : configuration

URL protocols

Tab width

Specifies how many spaces each tab character is expanded to. Defaults to 4.
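As a rough illustration of what the tab width setting controls, here is a minimal sketch. The helper name expandTabs and its tabWidth argument are hypothetical and are not part of Org.Config.

```javascript
// Hypothetical helper: replaces each tab character with `tabWidth` spaces.
// `tabWidth` stands in for the configured tab width (4 when omitted).
function expandTabs(line, tabWidth) {
  var width = (typeof tabWidth === "number") ? tabWidth : 4;
  var spaces = new Array(width + 1).join(" ");
  return line.replace(/\t/g, spaces);
}

var expanded = expandTabs("\t- item", 4);
```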

Org.Parser : the general parser

This section describes the general Org document parser.

Parser : the object to be returned by Org.getParser

The parser includes the referenced external files and builds a tree of Org Nodes, each of them recursively parsed with the Content parser.

Including external files

This section deals with the #+INCLUDE: lines, which load another Org file into the current file.

There are basically two strategies to include a file:

  • GET request: if we detect that we’re in a browser with jQuery, we use it to fetch the content of the included file with a GET request to the server, treating the path in the include tag as relative to the file currently being processed.
  • File system read: if we detect that we’re in Node.js (presence of the ‘fs’ module), we read the file from disk, treating the path in the include tag as relative to the current Org file.

This behaviour is not coded here, though; it relies on the behaviour of the _U.get() function.
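The choice between the two strategies could be sketched as follows. This is not the code of _U.get(): the function names and the env argument (an object describing what was detected) are assumptions made for the example.

```javascript
// Hypothetical sketch, not the actual _U.get() implementation.
// `env` is an assumed object describing what was detected at load time.
function chooseStrategy(env) {
  if (env.jQuery) { return "ajax"; } // browser with jQuery: GET request
  if (env.fs) { return "fs"; }       // Node.js: file system read
  return null;                       // nothing usable was detected
}

// Resolves the path given in the include tag against the current Org file.
function resolveInclude(currentFile, includePath) {
  var slash = currentFile.lastIndexOf("/");
  var dir = slash === -1 ? "" : currentFile.slice(0, slash + 1);
  return dir + includePath;
}

var strategy = chooseStrategy({ fs: {} });
var includedPath = resolveInclude("docs/main.org", "parts/intro.org");
```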

Include object

Parsing the include lines

Rendering the included content

  • Loading the content from the location
  • Modifying the headlines levels (if :minlevel has been set)
  • Generating the included content from the fetched lines
  • Enclosing in a BEGIN/END block if needed
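The headline-level step can be pictured with a short sketch. It assumes that :minlevel only demotes headlines, so that the shallowest included headline ends up at the requested level; the function name shiftHeadlines is hypothetical and this is not the parser's own code.

```javascript
// Hypothetical sketch of the ":minlevel" step (assumption: demote only).
// A headline is a run of stars at the start of a line, followed by whitespace.
function shiftHeadlines(lines, minLevel) {
  var headline = /^(\*+)\s/;
  var lowest = Infinity;
  lines.forEach(function (line) {
    var m = headline.exec(line);
    if (m) { lowest = Math.min(lowest, m[1].length); }
  });
  // No headline found, or already deep enough: return an unchanged copy.
  if (lowest === Infinity || lowest >= minLevel) { return lines.slice(); }
  var extra = new Array(minLevel - lowest + 1).join("*");
  return lines.map(function (line) {
    return headline.test(line) ? extra + line : line;
  });
}

var shifted = shiftHeadlines(["* A", "some text", "** B"], 2);
```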

Org.Outline : the outline/headlines parser

This section describes the outline parser.


Objects representing the headlines and their associated content (including sub-nodes)


Counting the documents generated in this page. Helps to generate an ID for the nodes when no docid is given in the root node.


  • parseContent()
  • repr() provides a representation of the node’s path
  • addFootnoteDef

Parsing nodes


Headline embeds the parsing of a heading line (without the subcontent).


Parsing a whole section

  • Returns the heading object for this node
  • Returns the map of headers (defined by “#+META: …” line definitions)
  • Returns the properties as defined in the :PROPERTIES: field
  • Returns the whole content without the heading nor the subitems
  • Returns the content only : no heading, no properties, no subitems, no clock, etc.
  • Extracts all the “#+HEADER: Content” lines at the beginning of the given text, and returns a map of HEADER => Content
  • Returns the given text without the “#+HEADER: Content” lines at the beginning
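The last two items can be sketched together. The function name parseHeaders, the returned object shape, and the regular expression are assumptions for illustration; the real expressions live in Org.Regexps.

```javascript
// Hypothetical sketch: reads the leading "#+KEY: value" lines of a text.
var metaLine = /^#\+([A-Z_]+):\s*(.*)$/i;

function parseHeaders(text) {
  var lines = text.split(/\r?\n/);
  var headers = {};
  var i = 0;
  // Stop at the first line which is not a "#+KEY: value" line.
  for (; i < lines.length; i++) {
    var m = metaLine.exec(lines[i]);
    if (!m) { break; }
    headers[m[1].toUpperCase()] = m[2];
  }
  return { headers: headers, rest: lines.slice(i).join("\n") };
}

var parsed = parseHeaders("#+TITLE: The title\n#+AUTHOR: Someone\nFirst paragraph");
```

Here parsed.headers maps TITLE and AUTHOR to their values, and parsed.rest holds the text without the leading header lines.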

The returned object

Org.Content : the content parser

This section describes the parser for the actual content within the sections of the org file.

Content is the object returned by this function.

Types of lines

LineDef is the object containing line definitions. All lines of the Org file will be treated sequentially, and their type will determine what to do with them. Line types are given an id property: a number identifying them.

  • Function which determines the type from the given line. A minimal caching system is provided, since the function will be called several times for the same line, so we keep the result of the last call for a given input. The function will only compare the line with regexps.
  • Function which determines the level of indentation of a line.
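A minimal sketch of these two functions, including the last-call cache, might look like this. The type ids, the rule list, and the function names are hypothetical, and only a handful of line types are covered.

```javascript
// Hypothetical ids and rules: a tiny subset of the real line definitions.
var LineType = { BLANK: 0, ITEM: 1, EXAMPLE: 2, IGNORED: 3, PARA: 4 };
var rules = [
  { id: LineType.BLANK,   rgx: /^\s*$/ },
  { id: LineType.EXAMPLE, rgx: /^\s*:(\s|$)/ },
  { id: LineType.ITEM,    rgx: /^\s*[-+]\s/ },
  { id: LineType.IGNORED, rgx: /^\s*#/ }
];

// Cache of the last call: the same line is often tested several times in a row.
var lastLine = null;
var lastType = null;

function getLineType(line) {
  if (line === lastLine) { return lastType; }
  var type = LineType.PARA; // default when no regexp matches
  for (var i = 0; i < rules.length; i++) {
    if (rules[i].rgx.test(line)) { type = rules[i].id; break; }
  }
  lastLine = line;
  lastType = type;
  return type;
}

// Indentation level: number of leading whitespace characters.
function getIndent(line) {
  return /^\s*/.exec(line)[0].length;
}
```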


Container block

This kind of block is abstract: many other blocks inherit from it, and it will not be used as is. It provides functionality for blocks which contain other sub-blocks, and holds these child blocks in its children array.

Root block

This block represents the root content under a headline of the document. It is the highest container directly under the headline node.

Generic content block

Generic content with markup block

Paragraph block

Simple example blocks

These are blocks with lines prepended with a colon:

: This is a simple example.
: <- here are the colons...

Ignored line (starting with a hash)

Footnote definition block

Generic Begin/End block

Quote block

Verse block

Centered-text block

Example block

Source code block

HTML block

Comment block

Generic List Item block

Unordered List block

A new list block is created when we encounter a list item line. Logically, the line should create a list item, but a list item needs a list block as its container. So the line actually triggers a list block, which is in charge of creating its first list item child and of consuming all the following items.

Unordered List Item block

Ordered List block

Ordered list item block

Definition List block

DlistItem block

Parsing the content

Markup parser

This file describes the OrgMode wiki-style markup parsing.

The parsing strategy differs in some ways from the original Org:

  • emphasis markup (bold, italic, underline, strike-through) is recursive: these items can be embedded in one another (they can also contain code/verbatim inline items)
  • the delimiting characters for the emphasis/code/verbatim markup are not configurable as they are in the OrgMode implementation
  • subscript and superscript must be written with curly braces

Link management

Link type definitions

Link object

Footnote references

Footnotes have definitions as blocks in the Content section. This section deals only with footnote references from within the markup.

Sub/sup markup

Timestamp markup

Typographic markup

EmphMarkers : emphasis marker abstract object

Inline nodes containing either inline nodes or raw textual content


Creates an inline node object, given:

  • the constructor for the object to build; it should build an object with a consume() property
  • the parent of the node to build
  • the textual content the new inline node has to parse as subnodes

EmphInline : abstract high-level inline node

End-point node types

Basic inline types containing raw text content. They cannot contain anything other than text content.

EmphRaw : basic text

Recursing nodes

These nodes contain other sub-nodes (either EmphRaw, other EmphInline subtypes, Links, etc.).

EmphItalic : recursing node

EmphBold : recursing node

EmphUnderline : recursing node

EmphStrike : recursing node

LaTeXInline : non-recursing node

EmphCode : code example

EmphVerbatim : unedited content

Parsing the paragraph content

Replacing code/verbatim parts with unique tokens

Before dealing with emphasis markup, we replace the code/verbatim parts with textual tokens which will be replaced in the end by their corresponding tree item. These tokens are stored in the tokens local variable.
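The token mechanism can be sketched as below. This is an illustration under assumptions, not the parser's code: the function names, the token format, and the shape of the stored items are hypothetical. The delimiters follow this document (= for code, ~ for verbatim).

```javascript
// Hypothetical sketch of the token replacement step.
function protectCode(text) {
  var tokens = {};
  var counter = 0;
  var prefix;
  // Pick a prefix absent from the text, so tokens cannot collide with content.
  do {
    prefix = "TOKEN" + Math.random().toString(36).slice(2, 8) + "_";
  } while (text.indexOf(prefix) !== -1);
  var out = text.replace(/=([^=\n]+)=|~([^~\n]+)~/g, function (match, code, verbatim) {
    // The trailing ";" keeps token 1 from being a prefix of token 10.
    var key = prefix + (counter++) + ";";
    tokens[key] = (code !== undefined)
      ? { type: "code", content: code }
      : { type: "verbatim", content: verbatim };
    return key;
  });
  return { text: out, tokens: tokens };
}

// Replaces each token with whatever `render` returns for the stored item.
function reinject(text, tokens, render) {
  Object.keys(tokens).forEach(function (key) {
    text = text.split(key).join(render(tokens[key]));
  });
  return text;
}

var safe = protectCode("use =a*b= and ~c_d~ here");
var restored = reinject(safe.text, safe.tokens, function (t) { return t.content; });
```

Between the two calls, the emphasis parser can run on safe.text without seeing the stars or underscores hidden inside the code/verbatim parts.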

Replacing LaTeX inline markup

These inline items are possibly:

  • enclosed in dollar signs ($)
  • enclosed in backslash-parens (\(...\))
  • enclosed in backslash-brackets (\[...\])

Replacing code/verbatim markup

These inline items are possibly:

  • for code: enclosed in = signs
  • for verbatim: enclosed in ~ signs

Replacing timestamp markup

These items are possibly:

  • <yyyy-MM-dd (weekday.)? (hh:mm)?>
  • [yyyy-MM-dd (weekday.)? (hh:mm)?]

Replacing sub/sup markup

These items are possibly:

  • for sub: defined by underscore and curly braces (_{...})
  • for sup: defined by caret and curly braces (^{...})

This behaviour should evolve to handle the possibility of omitting the curly braces. For now, since this may conflict with the underscore markup, this part is left for later. Consider the Org option #+OPTIONS: ^:{} to be mandatory.
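To show why the curly braces make this step unambiguous, here is a small sketch. The function name markSubSup is hypothetical, and the HTML output is only an assumption for the example; the parser itself builds tree items.

```javascript
// Hypothetical sketch: only the braced forms _{...} and ^{...} are recognized,
// so a bare underscore (as in snake_case) is left untouched.
function markSubSup(text) {
  return text.replace(/([_^])\{([^{}]*)\}/g, function (match, marker, body) {
    var tag = (marker === "_") ? "sub" : "sup";
    return "<" + tag + ">" + body + "</" + tag + ">";
  });
}

var marked = markSubSup("H_{2}O and x^{n+1}, but snake_case stays");
```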

Replacing links
Replacing footnote definitions
Processing emphasis markup (bold, italic, etc.)
Reinjecting saved tokens

Org.Regexps : the regexp bank

The parser needs a lot of regular expressions. Non-trivial regexps are defined in the file org.regexps.js and are accessible under the object Org.Regexps.

  • A new line declaration, either windows or unix-like
  • Captures the first line of the string
  • Selects anything in the given string until the next heading, or the end. Example :
    some content
    * next heading

    would match “some content\n\n*”. It captures everything except the star of the following heading.

  • Parses a heading line, capturing :
    • the stars
    • the TODO status
    • the priority
    • the heading title
    • the tags, if any, separated by colons
  • How a meta information line begins (#+META_KEY:)
  • A meta information line, capturing:
    • the meta key,
    • the meta value


    #+TITLE: The title

    captures: “TITLE”, “The title”

  • The property section. Captures the content of the section.
  • Property line. Captures the KEY and the value.
  • Clock section when several clock lines are defined.
  • Matches a clock line, either started only, or finished. Captures:
    • start date (yyyy-MM-dd)
    • start time (hh:mm)
    • end date (yyyy-MM-dd)
    • end time (hh:mm)
    • duration (hh:mm)
  • Scheduled
  • Deadline
  • The different kinds of lines encountered when parsing the content
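As an illustration of the heading-line captures listed above, here is a simplified sketch. It is not the expression found in org.regexps.js: the TODO keywords are assumed to be TODO and DONE only, and the function name parseHeading is hypothetical.

```javascript
// Hypothetical, simplified heading regexp.
// Captures: stars, TODO status, priority, title, tags.
var headingLine = /^(\*+)\s+(?:(TODO|DONE)\s+)?(?:\[#([A-Z])\]\s+)?(.*?)(?:\s+:([\w@:]+):)?\s*$/;

function parseHeading(line) {
  var m = headingLine.exec(line);
  if (!m) { return null; }
  return {
    level: m[1].length,
    todo: m[2] || null,
    priority: m[3] || null,
    title: m[4],
    tags: m[5] ? m[5].split(":") : []
  };
}

var heading = parseHeading("** TODO [#A] Write the parser :code:js:");
```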

Org.Utils : useful functions

Many functionalities are used throughout the parser, mainly to process strings. The Org.Utils object contains these functions.

Testing for presence of Node fs module

Built-in object modifications

We try to remain as light as possible, only adding functionalities that may already be present in certain versions of Javascript.

Object.create implementation if not present

Array.prototype.indexOf implementation if not present

Utils object to be returned, aliased as _U.

  • extend() is a function to be attached to prototypes, for example, to allow easy addition of features.
    var Type = function(){};
    Type.prototype.extend = _U.extend;
    // Add new features to the prototype through extend()
    Type.prototype.extend({
      some: function(){},
      neet: function(){}
    });
  • merge() resembles extend() but allows to merge several objects into a brand new one.
    var one   = {a:1, b:1};
    var two   = {a:2, c:3};
    var three = _U.merge(one, two);
    assertEquals(2, three.a);
    assertEquals(1, three.b);
    assertEquals(3, three.c);
  • array(o) makes an “official” Array out of an array-like object (like function arguments)
  • range() returns an array of numbers, built depending on the arguments
    • 1 argument : 0 to the argument, incrementing if positive, decrementing if negative
    • 2 arguments : arg[0] to arg[1], incrementing or decrementing,
    • 3 arguments : arg[0] to arg[1], incrementing by arg[2]
  • trim(str) : trimming a string, always returning a string (never return null or unusable output)
  • unquote(str) : if the input is enclosed in single quotes (') or double quotes ("), remove them ; return the input unchanged if no enclosing quotes are found.
  • empty(o) tells if a given string or array is empty (more exactly, tells if the length property of the argument is falsy)
  • notEmpty(o) is the inverse of empty
  • blank(str) tells if the given string has only blank characters
  • notBlank(str) is the inverse of blank
  • repeat(str, times) repeats the given string the given number of times
  • each(arr, fn) applies a function for each element of the given array or object
  • map(arr, fn) applies the given function for each element of the given array or object, and returns the array of results
  • filter(arr, fn) applies the given function for each element of the given array or object, and returns the array of filtered results
  • log(obj) logs the given argument (relies on console.log, does nothing if not present)
  • firstLine(str) returns the first line of the given string
  • lines(str) splits the given string in lines, returns the array of lines without the trailing line feed
  • randomStr(length, chars) returns a random string of given length
  • keys(obj) returns an array of the keys of the given object
  • returns the keys of the given object joined with the given delimiter
  • getAbsentToken(str, prefix) returns a random token not present in the given string
  • URI-style path utilities
  • parent(path) gets the parent of the given path
  • concat concatenates path pieces into a valid path (normalizing path separators)
  • get() gets the content from a given location :
    • through AJAX if jQuery is detected,
    • through node.js filesystem if node.js is detected,
    • returning null if nothing found
  • _U.noop() is (slightly) shorter to write than function(){}
  • incrementor() provides an incrementor function, starting from 0 or the given argument
  • id() returns a unique identifier
  • bind() mimics the Function.bind
  • incr is the default incrementor
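A few of the string and object helpers above can be sketched as follows. These are illustrative re-implementations written from the descriptions in this list, not the code of Org.Utils; the container name U is hypothetical.

```javascript
// Hypothetical re-implementations, written from the descriptions above.
var U = {
  // Merges several objects into a brand new one; later arguments win.
  merge: function () {
    var result = {};
    for (var i = 0; i < arguments.length; i++) {
      var source = arguments[i];
      for (var key in source) {
        if (Object.prototype.hasOwnProperty.call(source, key)) {
          result[key] = source[key];
        }
      }
    }
    return result;
  },
  // Always returns a string, even for null or undefined input.
  trim: function (str) {
    if (str === null || str === undefined) { return ""; }
    return String(str).replace(/^\s+|\s+$/g, "");
  },
  // Removes matching enclosing quotes; returns the input when there are none.
  unquote: function (str) {
    var m = /^(['"])([\s\S]*)\1$/.exec(str);
    return m ? m[2] : str;
  },
  // Repeats the given string the given number of times.
  repeat: function (str, times) {
    return new Array(times + 1).join(str);
  }
};

var one = { a: 1, b: 1 };
var two = { a: 2, c: 3 };
var three = U.merge(one, two);
```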

_U.TreeNode is the basic type for the items in the tree of the parsed documents

Access the parent with the .parent property.

Access the children with the .children property.

Helper functions to manipulate / navigate through the tree.

  • ancestors() provides the array of the ancestors of the current node, closest first
  • root() provides the root of the tree (last of ancestors)
  • leaf() tells if the node has children or not
  • siblings() provides all the siblings (this node excluded)
  • siblingsAll() provides all the siblings (this node included)
  • prev() provides the previous item, or null
  • prevAll() provides all the previous items (in the same order as siblings, closest last)
  • next() provides the next item, or null
  • lastAll() provides all the next items (in the same order as siblings, closest first)
  • append() adds a new child at the end of the children array
  • prepend() adds a new child at the beginning of the children array
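The navigation helpers can be pictured with a minimal stand-in for _U.TreeNode. This sketch is an assumption-laden illustration: the constructor takes no arguments, root() returns the node itself when it has no ancestors, and leaf() returns true when there are no children.

```javascript
// Hypothetical stand-in for _U.TreeNode, covering a few of the helpers.
function TreeNode() {
  this.parent = null;
  this.children = [];
}
TreeNode.prototype.append = function (child) {
  child.parent = this;
  this.children.push(child);
  return this;
};
TreeNode.prototype.prepend = function (child) {
  child.parent = this;
  this.children.unshift(child);
  return this;
};
// Ancestors, closest first.
TreeNode.prototype.ancestors = function () {
  var result = [];
  for (var node = this.parent; node; node = node.parent) { result.push(node); }
  return result;
};
// Last of the ancestors (assumption: the node itself when it has none).
TreeNode.prototype.root = function () {
  var all = this.ancestors();
  return all.length ? all[all.length - 1] : this;
};
TreeNode.prototype.leaf = function () {
  return this.children.length === 0;
};
TreeNode.prototype.siblingsAll = function () {
  return this.parent ? this.parent.children : [this];
};
TreeNode.prototype.prev = function () {
  var all = this.siblingsAll();
  var i = all.indexOf(this);
  return i > 0 ? all[i - 1] : null;
};
TreeNode.prototype.next = function () {
  var all = this.siblingsAll();
  var i = all.indexOf(this);
  return i < all.length - 1 ? all[i + 1] : null;
};

var top = new TreeNode();
var a = new TreeNode(), b = new TreeNode(), c = new TreeNode(), deep = new TreeNode();
top.append(a).append(b).prepend(c); // children are now [c, a, b]
a.append(deep);
```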

_U.Timestamp : wrapper around Javascript Date

This object parses and formats dates. Only the parameters actually provided by the Org timestamps are parsed/formatted for now, and only as numbers (no locale management for textual output of weekdays or months).

Add configuration entry to deal with textual repr. of weekdays and months

Add text-formatting options for weekdays and months

Wrapper around date

This object is a wrapper around the Javascript Date object. Access the Date instance through the date property.


the corresponding Javascript date
the year
the month (1-12)
the day (1-31)
the hour (0-23)
the minute (0-59)

Prototype functions


Parses a timestamp in the Org format (for instance 2010-01-30 12:34). This function is called by the constructor.
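Parsing such a timestamp could look like the sketch below. The function name, the regular expression, and the returned object are assumptions for illustration; a missing time is assumed to default to 00:00, and the weekday is skipped since only numbers are handled.

```javascript
// Hypothetical sketch: extracts date, optional weekday and optional time.
var orgTimestamp = /(\d{4})-(\d{2})-(\d{2})(?:\s+[^\d\s>\]]+)?(?:\s+(\d{1,2}):(\d{2}))?/;

function parseOrgTimestamp(str) {
  var m = orgTimestamp.exec(str);
  if (!m) { return null; }
  var ts = {
    year: +m[1],
    month: +m[2],                            // 1-12
    day: +m[3],                              // 1-31
    hour: m[4] === undefined ? 0 : +m[4],    // assumption: 0 when absent
    minute: m[5] === undefined ? 0 : +m[5]   // assumption: 0 when absent
  };
  // The JavaScript Date constructor expects a 0-based month.
  ts.date = new Date(ts.year, ts.month - 1, ts.day, ts.hour, ts.minute);
  return ts;
}

var stamp = parseOrgTimestamp("<2010-01-30 Sat. 12:34>");
```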


Formats the timestamp in the Unix-date fashion. Only a few flags are supported.

  • %H : the 2-digit hour (00-23)
  • %k : the hour (0-23)
  • %I : the 2-digit hour (01-12)
  • %l : the hour (1-12)
  • %M : the 2-digit minutes (00-59)
  • %S : the 2-digit seconds (00-59)
  • %y : the 2-digit year
  • %Y : the 4-digit year
  • %m : the 2-digit month (01-12)
  • %d : the 2-digit day (01-31)
  • %e : the day (1-31)
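The flags above can be sketched with a small formatter working on a plain JavaScript Date. The function names are hypothetical and this is not the prototype function of _U.Timestamp; unknown flags are left untouched here, which is an assumption.

```javascript
// Hypothetical sketch of the supported flags, applied to a JavaScript Date.
function pad2(n) {
  return (n < 10 ? "0" : "") + n;
}

function formatDate(date, pattern) {
  var h24 = date.getHours();
  var h12 = (h24 % 12 === 0) ? 12 : h24 % 12;
  var flags = {
    H: pad2(h24), k: String(h24),
    I: pad2(h12), l: String(h12),
    M: pad2(date.getMinutes()), S: pad2(date.getSeconds()),
    y: pad2(date.getFullYear() % 100), Y: String(date.getFullYear()),
    m: pad2(date.getMonth() + 1),
    d: pad2(date.getDate()), e: String(date.getDate())
  };
  // Unknown flags are left as they are (assumption).
  return pattern.replace(/%([A-Za-z])/g, function (match, flag) {
    return Object.prototype.hasOwnProperty.call(flags, flag) ? flags[flag] : match;
  });
}

var formatted = formatDate(new Date(2010, 0, 30, 14, 5, 9), "%Y-%m-%d %H:%M:%S");
```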



An XPath-like language to select items in the Org document tree.

This provides a selection mechanism for applying templates to nodes at rendering time.

Path examples

Just to give a feeling of the selecting language, here are a few examples:

any item whatsoever
node, node{*}
any node, at any level
n{*}, n
any node, ‘n’ being shortcut for ‘node’
n3, n{3}
any node of level 3
n{1-3}, n3[level~1-3]
any node of level 1 to 3
any node of level 3 with a tag “tag” (possibly implied by parents)
any node of level 3 with a tag “tag” defined at this node
any second node of level 3 within its parent
any second node of level 3 within its parent
any node of level 3 with a “DONE” todo-marker
n3/src1, n3/src{1}, n3/src[level~1-3]
any BEGIN_SRC item right under a node of level 3
any BEGIN_SRC item within the content of a node of level 3
any BEGIN_SRC item anywhere under a node of level 3
any BEGIN_SRC item anywhere
any BEGIN_SRC item anywhere with language set as ‘js’
first paragraph following a BEGIN_SRC item
any paragraph following a BEGIN_SRC item
first paragraph preceding a BEGIN_SRC item
any paragraph preceding a BEGIN_SRC item
parent of a BEGIN_SRC item

Default Rendering

This section provides a default HTML renderer for the parsed tree.

It is intended as an example of how to attach rendering functions to the prototypes of Outline.Node and of the different Content.Block types.


Working in the context of the Org object. We will need, as usual, some shortcuts to the Utils, and to Org.Content and Org.Outline.


Provides a utility function to render all the children of a Node or a Block.
Arguments: node, renderer.
It must be called with .call(obj) to provide the value for this; this must have an enumerable children property.


Provides a utility function to render a node with the given renderer.
Arguments: node, renderer.
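The .call(obj) convention can be illustrated with a short sketch. The signatures here are simplified assumptions (a single render callback, nodes carrying a text property) and do not reproduce the renderer's actual functions.

```javascript
// Hypothetical sketch: `this` is the node or block whose children are rendered.
function renderChildren(render) {
  var parts = [];
  var children = this.children;
  for (var i = 0; i < children.length; i++) {
    parts.push(render(children[i]));
  }
  return parts.join("");
}

// Renders containers recursively and leaves as paragraphs (assumed shapes).
function renderNode(node) {
  if (node.children && node.children.length) {
    return "<div>" + renderChildren.call(node, renderNode) + "</div>";
  }
  return "<p>" + node.text + "</p>";
}

var tree = { children: [{ text: "a" }, { children: [{ text: "b" }] }] };
var output = renderNode(tree);
```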

Rendering inline items



Should not be used, EmphInline is abstract…














