Skip to content
/ orgjs Public

A javascript library to parse (and render) org-mode files

Notifications You must be signed in to change notification settings

glmxndr/orgjs

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

42 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Org-Mode Javascript Parser

This project aims to provide a parser and easily customizable renderer for Org-Mode files in JavaScript.

Org : the Main object

The global context is extended with only one object, named Org.

Org.API : API

Org.Config : configuration

URL protocols

Tab width

Specifies how much single spaces for each tabulation character. 4 by default.

Org.Parser : the general parser

This section describes the general Org document parser.

Parser : the object to be returned by Org.getParser

The parser creates a tree of Org =Node=s. It includes the referenced external files and generates a tree of nodes, each of them recursively parsed with the Content parser.

Including external files

This section deals with the #\+INCLUDE: tags, which allow to load another Org file into the current file.

There are basically two strategies to include a file:

HTTP GET
if we detect that we’re in a browser with jQuery, we use that to get the content from the included file with a GET request to the server, using the path in the include tag as a relative path to the current file being processed.
File system read
if we detect that we’re in Node.js (presence of the ‘fs’ module), we read the file having a relative path to the current Org file given in the include tag.

This behaviour is not coded here, though, it relies on the behaviour of the _U.get() function.

Include object

Parsing the include lines

Rendering the included content

  • Loading the content from the location
  • Modifying the headlines levels (if :minlevel has been set)
  • Generating the included content from the fetched lines
  • Enclosing in a BEGIN/END block if needed

Org.Outline : the outline/headlines parser

This section describes the outline parser.

Node

Objects representing the headlines and their associated content (including sub-nodes)

Node.tocnum

Counting the documents generated in this page. Helps to generate an ID for the nodes when no docid is given in the root node.

Node.prototype

  • parseContent()
  • repr() provides a representation of the node’s path
  • addFootnoteDef

Parsing nodes

Headline

Headline embeds the parsing of a heading line (without the subcontent).

NodeParser

Parsing a whole section

  • Returns the heading object for this node
  • Returns the map of headers (defined by “#+META: …” line definitions)
  • Returns the properties as defined in the :PROPERTIES: field
  • Returns the whole content without the heading nor the subitems
  • Returns the content only : no heading, no properties, no subitems, no clock, etc.
  • Extracts all the “”#+HEADER: Content” lines
    • at the beginning of the given text, and returns a map
    • of HEADER => Content
  • Returns the given text without the “#+HEADER: Content” lines at the beginning

The returned object

Org.Content : the content parser

This section describes the parser for the actual content within the sections of the org file.

Content is the object returned by this function.

Types of lines

LineDef is the object containing line definitions. All lines of the Org file will be treated sequencially, and their type will determine what to do with it. Line types are given an id property: a number identifying them.

  • Function which determines the type from the given line. A minimal caching system is provided, since the function will be called several times for the same line, so we keep the result of the last call for a given input. The function will only compare the line with regexps.
  • Function which determines the level of indentation of a line.

Blocks

Container block

This kind of block is abstract: many other blocks inherit from it, and it will not be used as is. It provides functionality for blocks which contain other sub-blocks. It contains an array of children, containing the children blocks.

Root block

This block represents the root content under a headline of the document. It is the highest container directly under the headline node.

Generic content block

Generic content with markup block

Paragraph block

Simple example blocks

These are blocks with lines prepended with a colon:

: This is a simple example.
: <- here are the colons...

Ignored line (starting with a hash)

Footnote definition block

Generic Begin/End block

Quote block

Verse block

Centered-text block

Example block

Source code block

HTML block

Comment block

Generic List Item block

Unordered List block

A new list block is created when we encounter a list item line. The logic would be that a list item be created instead, but the list item needs a list block container. So that’s actually a list block that the line triggers, and the block is in charge to create a first list item child, and to consume all the other items.

Unoredered List Item block

Ordered List block

Ordered list item block

Definition List block

DlistItem block

Parsing the content

Markup parser

This file describes the OrgMode wiki-style markup parsing.

The parsing strategy differs in some ways from the original Org:

  • emphasis markup (bold, italic, underline, strike-through) are recursive, and can be embedded one in (they can also contain code/verbatim inline items)
  • the delimiting characters for the emphasis/code/verbatim markup are not configurable as they are in the OrgMode implementation
  • subscript and superscript are mandatorily used with curly braces

Link management

Link type definitions

Link object

Footnote references

Footnotes have definitions as blocks in the Content section. This section deals only with footnote references from within the markup.

Sub/sup markup

Timestamp markup

Typographic markup

EmphMarkers : emphasis marker abstract object

Inline nodes containing either inline nodes or raw textual content

makeInline

Purpose
Creates an inline node object
Arguments
constr
constructor for the object to build ; should build an object with a consume() property
parent
parent of the node to build
inner
textual content the new inline node has to parse as subnodes

EmphInline : abstract high-level inline node

End-point node types

Basic inline types containing raw text content. Can not contain anything else than text content.

EmphRaw : basic text

Recursing nodes

These nodes contain other sub nodes (either EmphRaw, other EmphInline subtypes, =Link=s, etc.).

EmphItalic : recursing node

EmphBold : recursing node

EmphUnderline : recursing node

EmphStrike : recursing node

LaTeXInline : non-recursing node

EmphCode : code example

EmphVerbatim : unedited content

Parsing the paragraph content

Replacing code/verbatim parts with unique tokens

Before dealing with emphasis markup, we replace the code/verbatim parts with textual tokens which will be replaced in the end by their corresponding tree item. These tokens are stored in the tokens local variable.

Replacing \LaTeX inline markup

These inline items are possibly:

  • enclosed in dollar signs (\$)
  • enclosed in backslash-parens (\\(...\\))
  • enclosed in backslash-brackets (\\[...\\])
Replacing code/verbatim markup

These inline items are possibly:

for code
enclosed in \= signs
for verbatim
enclosed in \~ signs
Replacing timestamp markup

These items are possibly:

activated
<yyyy-MM-dd (weekday.)? (hh:mm)?>
deactivated
[yyyy-MM-dd (weekday.)? (hh:mm)?]
Replacing sub/sup markup

These items are possibly:

for sub
defined by underscore and cury braces (\_{...})
for sup
defined by caret and cury braces (\^{...})

This behaviour should evolve to deal with the possiblity to skip the curly braces. For now, since it may conflict with the underscore markup, this part is left for later. Consider the org-option #+OPTIONS: ^:{} to be mandatory.

Replacing links
Replacing footnote definitions
Processing emphasis markup (bold, italic, etc.)
Reinjecting saved tokens

Org.Regexps : the regexp bank

The parser needs a lot of regular expressions. Non trivial regexps will be found in the file org.regexps.js, and accessible under the object Org.Regexps.

  • A new line declaration, either windows or unix-like
  • Captures the first line of the string
  • Selects anything in the given string until the next heading, or the end. Example :
    some content
    * next heading
        

    would match “some content\n\n*” Captures everything except the star of the following heading.

  • Parses a heading line, capturing :
    • the stars
    • the TODO status
    • the priority
    • the heading title
    • the tags, if any, separated by colons
  • How a meta information begins ( #\+META_KEY: )
  • A meta information line, capturing:
    • the meta key,
    • the meta value

    Example:

    #+TITLE: The title
        

    captures: “TITLE”, “The title”

  • The property section. Captures the content of the section.
  • Property line. Captures the KEY and the value.
  • Clock section when several clock lines are defined.
  • Matches a clock line, either started only, or finished. Captures:
    • start date (yyyy-MM-dd)
    • start time (hh:mm)
    • end date (yyyy-MM-dd)
    • end time (hh:mm)
    • duration (hh:mm)
  • Scheduled
  • Deadline
  • The different kinds of lines encountered when parsing the content

Org.Utils : useful functions

Many functionalities are used throughout the parser, mainly to process strings. The Org.Utils object contains these functions.

Testing for presence of Node fs module

Built-in object modifications

We try to remain as light as possible, only adding functionalities that may already be present in certain versions of Javascript.

Object.create implementation if not present

Array.prototype.indexOf implementation if not present

Utils object to be returnedn aliased as _U.

  • extend() is a function to be attached to prototypes, for example, to allow easy addition of features.
    var Type = function(){};
    Type.prototype.extend = _U.extend;
    Type.prototype.extend({
      some: function(){},
      neet: function(){}
    });
        
  • merge() resembles extend() but allows to merge several objects into a brand new one.
    var one   = {a:1, b:1};
    var two   = {a:2, c:3};
    var three = _U.merge(one, two);
    
    assertEquals(2, three.a);
    assertEquals(1, three.b);
    assertEquals(3, three.c);
        
  • array(o) makes an “official” Array out of an array-like object (like function arguments)
  • range() returns an array of numbers, built depending on the arguments
    • 1 argument : 0 to the argument, incrementing if positive, decrementing if negative
    • 2 arguments : arg[0] to arg[1], incrementing or decrementing,
    • 3 arguments: arg[0] to arg[1], incrementing by arg[3]
  • trim(str) : trimming a string, always returning a string (never return null or unusable output)
  • unquote(str) : if the input is inserted in quotes (=’) or double quotes (”=), remove them ; return input if enclosing quotes not found.
  • empty(o) tells if a given string or array is empty (more exactly, tells if the length property of the argument is falsy)
  • notEmpty(o) is the inverse of empty
  • blank(str) tells if the given string has only blank characters
  • notBlank(str) is the inverse of blank
  • repeat(str, times) repeats the given string n times
  • =each(arr, fn)=applies a function for each element of the given array or object
  • =map(arr, fn)=applies the given function for each element of the given array or object, and returns the array of results
  • filter(arr, fn) applies the given function for each element of the given array or object, and returns the array of filtered results
  • log(obj) logs the given argument (relies on console.log, does nothing if not present)
  • firstLine(str) returns the first line of the given string
  • lines(str) splits the given string in lines, returns the array of lines without the trailing line feed
  • randomStr(length, chars) returns a random string of given length
  • keys(obj) returns an array of the keys of the given object
  • returns the keys of the given object joined with the given delimiter
  • getAbsentToken(str, prefix) returns a random token not present in the given string
  • URI-style path utilities
  • parent(path) gets the parent of the given path
  • concat concatenates path pieces into a valid path (normalizing path separators)
  • get() gets the content from a given location :
    • through AJAX if jQuery is detected,
    • through node.js filesystem if node.js is detected,
    • returning null if nothing found
  • _U.noop() is (slightly) shorter to write than function(){}
  • incrementor() provides an incrementor function, starting from 0 or the given argument
  • id() returns a unique identifier
  • bind() mimics the Function.bind
  • incr is the default incrementor

_U.TreeNode is the basic type for the items in the tree of the parsed documents

Access the parent with the .parent property.

Access the children with the .children property.

Helper functions to manipulate / navigate through the tree.

  • ancestors() provides the array of the ancestors of the current node, closest first
  • root() provides the root of the tree (last of ancestors)
  • leaf() tells if the node has children or not
  • siblings() provides all the siblings (this node excluded)
  • siblingsAll() provides all the siblings (this node included)
  • prev() provides the previous item, or null
  • prevAll() provides all the previous items (in the same order as siblings, closest last)
  • next() provides the next item, or null
  • lastAll() provides all the next items (in the same order as siblings, closest first)
  • append() adds a new child at the end of the children array
  • prepend() adds a new child at the beginning of the children array

_U.Timestamp : wrapper around Javascript Date

This object allows to parse and format dates. Only the parameters actually provided by the Org timestamps are parsed/formatted for now, and only as numbers (no locale management for textual output of weekdays or months).

Add configuration entry to deal with textual repr. of weekdays and months

Add text-formatting options for weekdays and months

Wrapper around date

This object is a wrapper around the Javascript Date object. Access the Date instance through the date property.

Proprieties

date
the corresponding Javascript date
year
the year
month
the month (1-12)
day
the day (1-31)
hour
the hour (0-23)
minute
the minute (0-59)

Prototype functions

parse()

Parses a timestamp at the Org format (for instance 2010-01-30 12:34). This function is called by the constructor.

format()

Formats the timestamp in the Unix-date fashion. Only a few flags are supported.

  • %H : the 2-digit hour (00-23)
  • %k : the hour (0-23)
  • %I : the 2-digit hour (01-12)
  • %l : the hour (1-12)
  • %M : the 2-digit minutes (00-59)
  • %S : the 2-digit seconds (00-59)
  • %y : the 2-digit year
  • %Y : the 4-digit year
  • %m : the 2-digit month (01-12)
  • %d : the 2-digit day (01-31)
  • %e : the day (1-31)

—orgdoc*

OrgPath

An XPath-like language to select items in the Org document tree.

This allows to provide a selection mechanism to apply templates to nodes at rendering time.

Path examples

Just to give a feeling of the selecting language, here are a few examples:

*
any item whatsoever
node, node{*}
any node, an any level
n{*}, n
any node, ‘n’ being shortcut for ‘node’
n3, n{3}
any node of level 3
n{1-3}, n3[level~1-3]
any node of level 1 to 3
n3:tag
any node of level 3 with a tag “tag” (possibly implied by parents)
n3!tag
any node of level 3 with a tag “tag” defined at this node
n3[position\=2]
any second node of level 3 within its parent
n3[2]
any second node of level 3 within its parent
n3[todo\=DONE]
any node of level 3 with a “DONE” todo-marker
n3/src1, n3/src{1}, n3/src[level~1-3]
any BEGIN_SRC item right under a node of level 3
n3/src
any BEGIN_SRC item within the content a node of level 3
n3//src
any BEGIN_SRC item anywhere under a node of level 3
src
any BEGIN_SRC item anywhere
src[lang\=js]
any BEGIN_SRC item anywhere whith language set as ‘js’
src>p
first paragraph following a BEGIN_SRC item
src>>p
any paragraph following a BEGIN_SRC item
src<p
first paragraph preceding a BEGIN_SRC item
src<<p
any paragraph preceding a BEGIN_SRC item
src/..
parent of a BEGIN_SRC item

Default Rendering

This section provides a default HTML renderer for the parsed tree.

It is intended to provide an example of how to attach rendering functions to the Outline.Node’s and the different Content.Block’s prototypes.

Initialisations

Working in the context of the Org object. We will need, as usual, some shortcuts to the Utils, and to Org.Content and Org.Outline.

renderChildren

Purpose
provides a utility function to render all the children of a Node or a Block.
Arguments
node, renderer
Usage
must be called with .call(obj) to provide the value for this. this must have an enumerable children property.

render

Purpose
provides a utility function to renders a node with the given renderer
Arguments
node, renderer

Rendering inline items

IgnoredLine

EmphInline

Should not be used, EmphInline is abstract…

EmphRaw

EmphCode

EmphVerbatim

EmphItalic

EmphBold

EmphUnderline

EmphStrike

LaTeXInline

Link

FootNoteRef

SubInline

SupInline

TimestampInline

Rendering blocks

This sections contains the code for the different types of instanciable blocks defined in

We will attach a, until now undefined, render property to these block prototypes. None of these function take any argument, all the information they need being in the block object they will act upon through the this value.

The container blocks (those whose constructor calls the ContainerBlock constructor) all use the renderChildren function.

The content blocks (those whose constructor calls the ContentBlock constructor) should use their this.lines array.

Rendering RootBlock

RootBlock=s are rendered with a =div tag, with class org_content.

Rendering UlistBlock

UlistBlock=s are rendered with a simple =ul tag.

Rendering OlistBlock

OlistBlock=s are rendered with a simple =ol tag.

If the block has a start property different from 1, it is inserted in the start attribute of the tag.

Rendering DlistBlock

DlistBlock=s are rendered with a =dl tag.

DlistItemBlock=s will have to use =dt=/=dd structure accordingly.

Rendering UlistItemBlock and OlistItemBlocks

UlistItemBlock=s and =0listItemBlocks are rendered with a #simple li tag.

Rendering DlistItemBlock

DlistItemBlock=s are rendered with a =dt=/=dl tag structure.

The content of the dt is the title attribute of the block.

The content of the dd is the rendering of this block’s children.

Rendering ParaBlock

ParaBlock=s are rendered with a =p tag.

The content of the tag is the concatenation of this block’s this.lines, passed to the renderMarkup function.

Rendering VerseBlock

VerseBlock=s are rendered with a =p tag, with class verse.

All spaces are converted to unbreakable spaces.

All new lines are replaced by a br tag.

Rendering QuoteBlock

QuoteBlock=s are rendered with a =blockquote tag.

If the quote contains an author declaration (after a double dash), this declaration is put on a new line.

Rendering CenterBlock

CenterBlock=s are rendered with a simple =center tag.

Rendering ExampleBlock

ExampleBlock=s are rendered with a simple =pre tag.

The content is not processed with the renderMarkup function, only with the escapeHtml function.

Rendering SrcBlock

SrcBlock=s are rendered with a =pre.src tag with a code tag within. The code tag may have a class attribute if the language of the block is known. In case there is, the class would take the language identifier.

The content is not processed with the renderMarkup function, only with the escapeHtml function.

Rendering HtmlBlock

=HtmlBlock=s are rendered by simply outputting the HTML content verbatim, with no modification whatsoever.

Rendering CommentBlock

=CommentBlock=s are ignored.

Rendering headlines

Here we render headlines, represented by Outline.Node objects.

A section tag is used, with class orgnode, and a level. The id attribute is the computed id corresponding to a unique TOC identifier.

The title is in a div.title element. Each tag is represented at the end of this element by a span.tag element.

The content of the node (the RootBlock associated to this headline) is rendered.

Then the subheadlines are rendered using the renderChildren function.

Utility functions

escapeHtml(str)

Purpose
The escapeHtml function escapes the forbidden characters in HTML/XML: &, >, <, =’= and =”=, which are all translated to their corresponding entity.
Arguments
str
any value, converted into a string at the beginning of the function.

unBackslash(str)

Purpose
Utility to unescape the characters of the given raw content
Arguments
str
any value, converted into a string at the beginning of the function.

htmlize(str, renderer)

Purpose
Unbackslash after having escaped HTML
Arguments
str
any value, converted into a string at the beginning of the function.

Rendering inline items

IgnoredLine

EmphInline

Should not be used, EmphInline is abstract…

EmphRaw

EmphCode

EmphVerbatim

EmphItalic

EmphBold

EmphUnderline

EmphStrike

LaTeXInline

Link

FootNoteRef

SubInline

SupInline

TimestampInline

Rendering blocks

This sections contains the code for the different types of instanciable blocks defined in

We will attach a, until now undefined, render property to these block prototypes. None of these function take any argument, all the information they need being in the block object they will act upon through the this value.

The container blocks (those whose constructor calls the ContainerBlock constructor) all use the renderChildren function.

The content blocks (those whose constructor calls the ContentBlock constructor) should use their this.lines array.

Rendering RootBlock

RootBlock=s are rendered with a =div tag, with class org_content.

Rendering UlistBlock

UlistBlock=s are rendered with a simple =ul tag.

Rendering OlistBlock

OlistBlock=s are rendered with a simple =ol tag.

If the block has a start property different from 1, it is inserted in the start attribute of the tag.

Rendering DlistBlock

DlistBlock=s are rendered with a =dl tag.

DlistItemBlock=s will have to use =dt=/=dd structure accordingly.

Rendering UlistItemBlock and OlistItemBlocks

UlistItemBlock=s and =0listItemBlocks are rendered with a #simple li tag.

Rendering DlistItemBlock

DlistItemBlock=s are rendered with a =dt=/=dl tag structure.

The content of the dt is the title attribute of the block.

The content of the dd is the rendering of this block’s children.

Rendering ParaBlock

ParaBlock=s are rendered with a =p tag.

The content of the tag is the concatenation of this block’s this.lines, passed to the renderMarkup function.

Rendering VerseBlock

VerseBlock=s are rendered with a =p tag, with class verse.

All spaces are converted to unbreakable spaces.

All new lines are replaced by a br tag.

Rendering QuoteBlock

QuoteBlock=s are rendered with a =blockquote tag.

If the quote contains an author declaration (after a double dash), this declaration is put on a new line.

Rendering CenterBlock

CenterBlock=s are rendered with a simple =center tag.

Rendering ExampleBlock

ExampleBlock=s are rendered with a simple =pre tag.

The content is not processed with the renderMarkup function, only with the escapeHtml function.

Rendering SrcBlock

SrcBlock=s are rendered with a =pre.src tag with a code tag within. The code tag may have a class attribute if the language of the block is known. In case there is, the class would take the language identifier.

The content is not processed with the renderMarkup function, only with the escapeHtml function.

Rendering HtmlBlock

=HtmlBlock=s are rendered by simply outputting the HTML content verbatim, with no modification whatsoever.

Rendering CommentBlock

=CommentBlock=s are ignored.

Rendering headlines

Here we render headlines, represented by Outline.Node objects.

A section tag is used, with class orgnode, and a level. The id attribute is the computed id corresponding to a unique TOC identifier.

The title is in a div.title element. Each tag is represented at the end of this element by a span.tag element.

The content of the node (the RootBlock associated to this headline) is rendered.

Then the subheadlines are rendered using the renderChildren function.

About

A javascript library to parse (and render) org-mode files

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages