This project aims to provide a parser and easily customizable renderer for Org-Mode files in JavaScript.
The global context is extended with only one object, named Org.
Specifies how many spaces replace each tab character. Defaults to 4.
This section describes the general Org document parser.
The parser creates a tree of Org Node objects. It includes the referenced external files and generates a tree of nodes, each of them recursively parsed with the Content parser.
This section deals with the #+INCLUDE: tags, which load another Org file into the current file.
There are basically two strategies to include a file:
- HTTP GET: if we detect that we’re in a browser with jQuery, we use it to fetch the content of the included file with a GET request to the server, using the path in the include tag as a path relative to the current file being processed.
- File system read: if we detect that we’re in Node.js (presence of the ‘fs’ module), we read the file, using the path in the include tag as a path relative to the current Org file.
This behaviour is not coded here, though: it relies on the behaviour of the _U.get() function.
- Loading the content from the location
- Modifying the headline levels (if :minlevel has been set)
- Generating the included content from the fetched lines
- Enclosing in a BEGIN/END block if needed
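The level-adjustment step of an include can be sketched as follows. This is only an illustration under stated assumptions, not the project's code: shiftHeadlines is a hypothetical helper, and it assumes that :minlevel means "the shallowest headline of the included file ends up at that level".

```javascript
// Hypothetical helper (not part of the Org API): shift the headlines of an
// included file so that the shallowest one lands on `minlevel`.
function shiftHeadlines(lines, minlevel) {
  var headline = /^(\*+)\s/;
  var shallowest = Infinity;
  lines.forEach(function (line) {
    var m = headline.exec(line);
    if (m) { shallowest = Math.min(shallowest, m[1].length); }
  });
  // No :minlevel given, or no headline found: return an unchanged copy.
  if (!minlevel || shallowest === Infinity) { return lines.slice(); }
  var delta = minlevel - shallowest;
  return lines.map(function (line) {
    var m = headline.exec(line);
    if (!m) { return line; }
    var level = m[1].length + delta;
    // Rebuild the line with the new number of stars.
    return new Array(level + 1).join("*") + line.slice(m[1].length);
  });
}
```

For instance, including a file whose top headlines are at level 1 with :minlevel 2 would push every headline one level down, leaving the other lines untouched.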
This section describes the outline parser.
Objects representing the headlines and their associated content (including sub-nodes)
Counting the documents generated in this page. Helps to generate an ID for the nodes when no docid is given in the root node.
parseContent()
repr() provides a representation of the node’s path
addFootnoteDef
Headline embeds the parsing of a heading line (without the subcontent).
Parsing a whole section
- Returns the heading object for this node
- Returns the map of headers (defined by “#+META: …” line definitions)
- Returns the properties as defined in the :PROPERTIES: field
- Returns the whole content without the heading nor the subitems
- Returns the content only : no heading, no properties, no subitems, no clock, etc.
- Extracts all the “#+HEADER: Content” lines at the beginning of the given text, and returns a map of HEADER => Content
- Returns the given text without the “#+HEADER: Content” lines at the beginning
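The last two operations (extracting the leading “#+HEADER: Content” lines, and returning the text without them) could look like the sketch below. The names extractHeaders, stripHeaders and META_LINE are hypothetical, and the regexp is a simplification that only accepts letters and underscores in the key.

```javascript
// Hypothetical helpers, not the library's implementation.
var META_LINE = /^#\+([A-Za-z_]+):[ \t]*(.*)$/;

// Builds a map HEADER => Content from the header lines that open the text.
function extractHeaders(text) {
  var headers = {};
  var lines = text.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var m = META_LINE.exec(lines[i]);
    if (!m) { break; }            // stop at the first non-header line
    headers[m[1]] = m[2];         // assumption: a repeated key overwrites
  }
  return headers;
}

// Returns the text without the header lines found at its beginning.
function stripHeaders(text) {
  var lines = text.split(/\r?\n/);
  var i = 0;
  while (i < lines.length && META_LINE.test(lines[i])) { i++; }
  return lines.slice(i).join("\n");
}
```

Header-like lines that appear after the first content line are deliberately left alone, since the description only covers those at the beginning of the text.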
This section describes the parser for the actual content within the sections of the Org file.
Content is the object returned by this function.
LineDef is the object containing line definitions. All lines of the Org file will be treated sequentially, and their type will determine what to do with them.
Line types are given an id property: a number identifying them.
- Function which determines the type from the given line. A minimal caching system is provided, since the function will be called several times for the same line, so we keep the result of the last call for a given input. The function will only compare the line with regexps.
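The "keep the result of the last call" caching can be illustrated with the sketch below. The line types, their regexps and the names LINE_TYPES, getLineType and regexpRuns are all made up for the example; only the caching idea comes from the description above.

```javascript
// Hypothetical line definitions: each type has a numeric id and a regexp.
var LINE_TYPES = [
  { id: 1, name: "BLANK",   rgx: /^\s*$/ },
  { id: 2, name: "EXAMPLE", rgx: /^\s*:(\s|$)/ },
  { id: 3, name: "ULITEM",  rgx: /^\s*[-+]\s+/ },
  { id: 4, name: "PARA",    rgx: /./ }
];

var lastLine = null;  // input of the previous call
var lastType = null;  // result of the previous call
var regexpRuns = 0;   // only here to make the cache observable

function getLineType(line) {
  // Same input as last time: skip the regexps entirely.
  if (line === lastLine) { return lastType; }
  regexpRuns++;
  lastLine = line;
  lastType = null;
  for (var i = 0; i < LINE_TYPES.length; i++) {
    if (LINE_TYPES[i].rgx.test(line)) {
      lastType = LINE_TYPES[i];
      break;
    }
  }
  return lastType;
}
```

A one-entry cache is enough here because the repeated calls concern the line currently being processed.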
- Function which determines the level of indentation of a line.
This kind of block is abstract: many other blocks inherit from it, and it will not be used as is.
It provides functionality for blocks which contain other sub-blocks.
It holds its child blocks in the children array.
This block represents the root content under a headline of the document. It is the highest container directly under the headline node.
These are blocks with lines prepended with a colon:
    : This is a simple example.
    : <- here are the colons...
A new list block is created when we encounter a list item line. Logically, a list item would be created instead, but a list item needs a list block as its container. So the line actually triggers a list block, which is in charge of creating its first list item child and of consuming all the following items.
This file describes the OrgMode wiki-style markup parsing.
The parsing strategy differs in some ways from the original Org:
- emphasis markups (bold, italic, underline, strike-through) are recursive, and can be embedded in one another (they can also contain code/verbatim inline items)
- the delimiting characters for the emphasis/code/verbatim markup are not configurable as they are in the OrgMode implementation
- subscript and superscript must be written with curly braces
Footnotes have definitions as blocks in the Content section. This section deals only with footnote references from within the markup.
- Purpose
- Creates an inline node object
- Arguments
- constr: constructor for the object to build; should build an object with a consume() property
- parent: parent of the node to build
- inner: textual content the new inline node has to parse as subnodes
Basic inline types containing raw text content. They cannot contain anything other than text.
These nodes contain other sub nodes (either EmphRaw, other EmphInline subtypes, Links, etc.).
Before dealing with emphasis markup, we replace the code/verbatim parts with textual tokens, which are replaced at the end by their corresponding tree item. These tokens are stored in the tokens local variable.
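The token substitution can be pictured with the following sketch. It is a simplified stand-in, not the parser's code: absentToken, protectCode and restoreCode are hypothetical names, the token generator is deterministic rather than random, and the regexp ignores the finer delimiter rules.

```javascript
// Hypothetical stand-in for "a token not present in the given string".
function absentToken(str, prefix, taken) {
  var n = 0;
  var token = prefix + n + "_";
  while (str.indexOf(token) !== -1 || taken.hasOwnProperty(token)) {
    n++;
    token = prefix + n + "_";
  }
  return token;
}

// Replaces =code= and ~verbatim~ spans with tokens and remembers them.
function protectCode(text) {
  var tokens = {};
  var masked = text.replace(/([=~])([^=~\n]+)\1/g, function (m, delim, body) {
    var token = absentToken(text, "ORGTOKEN", tokens);
    tokens[token] = { type: delim === "=" ? "code" : "verbatim", content: body };
    return token;
  });
  return { text: masked, tokens: tokens };
}

// Puts the protected items back, rendered by the given function.
function restoreCode(text, tokens, render) {
  Object.keys(tokens).forEach(function (token) {
    text = text.split(token).join(render(tokens[token]));
  });
  return text;
}
```

The point of the detour is that a star inside a code span, as in `*bold =x*y= b*`, can no longer be mistaken for the end of the bold markup while emphasis is being parsed.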
These inline items are possibly:
- enclosed in dollar signs ($)
- enclosed in backslash-parens (\(...\))
- enclosed in backslash-brackets (\[...\])
These inline items are possibly:
- for code: enclosed in = signs
- for verbatim: enclosed in ~ signs
These items are possibly:
- active: <yyyy-MM-dd (weekday.)? (hh:mm)?>
- inactive: [yyyy-MM-dd (weekday.)? (hh:mm)?]
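As an illustration of the two timestamp shapes above, here is a small recognizer. TIMESTAMP and parseTimestamp are hypothetical names and the regexp is an approximation written for this example; the actual expressions live in the project's regexps file.

```javascript
// Approximate pattern: opening delimiter, date, optional weekday,
// optional time, closing delimiter.
var TIMESTAMP = /([<\[])(\d{4})-(\d{2})-(\d{2})(?: ([^\s\d>\]]+))?(?: (\d{1,2}):(\d{2}))?([>\]])/;

function parseTimestamp(str) {
  var m = TIMESTAMP.exec(str);
  if (!m) { return null; }
  var open = m[1], close = m[8];
  // "<" must be closed by ">", and "[" by "]".
  if ((open === "<") !== (close === ">")) { return null; }
  return {
    active: open === "<",
    year: Number(m[2]),
    month: Number(m[3]),
    day: Number(m[4]),
    weekday: m[5] || null,
    hour: m[6] === undefined ? null : Number(m[6]),
    minute: m[7] === undefined ? null : Number(m[7])
  };
}
```

Both the weekday and the time are optional, so `[2010-01-30]` is accepted and simply yields null for the missing parts.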
These items are possibly:
- for sub: defined by underscore and curly braces (_{...})
- for sup: defined by caret and curly braces (^{...})
This behaviour should evolve to deal with the possibility of skipping the curly braces. For now, since it may conflict with the underscore markup, this part is left for later. Consider the Org option #+OPTIONS: ^:{} to be mandatory.
The parser needs a lot of regular expressions.
Non-trivial regexps are found in the file org.regexps.js, and are accessible under the object Org.Regexps.
- A new line declaration, either windows or unix-like
- Captures the first line of the string
- Selects anything in the given string until the next heading, or the end. Captures everything except the star of the following heading.
  Example:
      some content

      * next heading
  would match “some content\n\n*”.
- Parses a heading line, capturing :
- the stars
- the TODO status
- the priority
- the heading title
- the tags, if any, separated by colons
- How a meta information begins (#+META_KEY:)
- A meta information line, capturing:
- the meta key,
- the meta value
Example: “#+TITLE: The title” captures “TITLE”, “The title”
- The property section. Captures the content of the section.
- Property line. Captures the KEY and the value.
- Clock section when several clock lines are defined.
- Matches a clock line, either started only, or finished.
Captures:
- start date (yyyy-MM-dd)
- start time (hh:mm)
- end date (yyyy-MM-dd)
- end time (hh:mm)
- duration (hh:mm)
- Scheduled
- Deadline
- The different kinds of lines encountered when parsing the content
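To give an idea of what the heading-line expression in the list above has to capture, here is a simplified sketch. HEADING and parseHeading are hypothetical names, and the pattern assumes that TODO and DONE are the only status keywords; the real expression in org.regexps.js may differ.

```javascript
// Simplified heading pattern: stars, optional TODO status, optional
// priority cookie, title, optional colon-separated tags.
var HEADING = /^(\*+)\s+(?:(TODO|DONE)\s+)?(?:\[#([A-Z])\]\s+)?(.*?)(?:\s+:([^\s]+):)?\s*$/;

function parseHeading(line) {
  var m = HEADING.exec(line);
  if (!m) { return null; }
  return {
    level: m[1].length,                 // number of stars
    todo: m[2] || null,
    priority: m[3] || null,
    title: m[4],
    tags: m[5] ? m[5].split(":") : []
  };
}
```

The title group is lazy so that a trailing `:tag1:tag2:` group, when present, is captured as tags rather than swallowed by the title.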
Many functionalities are used throughout the parser, mainly to process strings. The Org.Utils object contains these functions.
We try to remain as light as possible, only adding functionalities that may already be present in certain versions of JavaScript.
extend() is a function to be attached to prototypes, for example, to allow easy addition of features.

    var Type = function(){};
    Type.prototype.extend = _U.extend;
    Type.prototype.extend({
      some: function(){},
      neet: function(){}
    });

merge() resembles extend() but merges several objects into a brand new one.

    var one = {a:1, b:1};
    var two = {a:2, c:3};
    var three = _U.merge(one, two);
    assertEquals(2, three.a);
    assertEquals(1, three.b);
    assertEquals(3, three.c);
array(o) makes an “official” Array out of an array-like object (like function arguments)
range() returns an array of numbers, built depending on the arguments:
- 1 argument: 0 to the argument, incrementing if positive, decrementing if negative
- 2 arguments: arg[0] to arg[1], incrementing or decrementing
- 3 arguments: arg[0] to arg[1], incrementing by arg[2]
trim(str): trims a string, always returning a string (never null or unusable output)
unquote(str): if the input is enclosed in single quotes (') or double quotes ("), removes them; returns the input if enclosing quotes are not found
empty(o) tells if a given string or array is empty (more exactly, tells if the length property of the argument is falsy)
notEmpty(o) is the inverse of empty
blank(str) tells if the given string has only blank characters
notBlank(str) is the inverse of blank
repeat(str, times) repeats the given string n times
each(arr, fn) applies a function for each element of the given array or object
map(arr, fn) applies the given function for each element of the given array or object, and returns the array of results
filter(arr, fn) applies the given function for each element of the given array or object, and returns the array of filtered results
log(obj) logs the given argument (relies on console.log, does nothing if not present)
firstLine(str) returns the first line of the given string
lines(str) splits the given string in lines, returns the array of lines without the trailing line feed
randomStr(length, chars) returns a random string of given length
keys(obj) returns an array of the keys of the given object
- returns the keys of the given object joined with the given delimiter
getAbsentToken(str, prefix) returns a random token not present in the given string
- URI-style path utilities
parent(path) gets the parent of the given path
concat concatenates path pieces into a valid path (normalizing path separators)
get() gets the content from a given location:
- through AJAX if jQuery is detected,
- through node.js filesystem if node.js is detected,
- returning null if nothing found
_U.noop() is (slightly) shorter to write than function(){}…
incrementor() provides an incrementor function, starting from 0 or the given argument
id() returns a unique identifier
bind() mimics Function.bind
incr is the default incrementor
Access the parent with the .parent property.
Access the children with the .children property.
ancestors() provides the array of the ancestors of the current node, closest first
root() provides the root of the tree (last of ancestors)
leaf() tells if the node has children or not
siblings() provides all the siblings (this node excluded)
siblingsAll() provides all the siblings (this node included)
prev() provides the previous item, or null
prevAll() provides all the previous items (in the same order as siblings, closest last)
next() provides the next item, or null
lastAll() provides all the next items (in the same order as siblings, closest first)
append() adds a new child at the end of the children array
prepend() adds a new child at the beginning of the children array
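The navigation helpers above all derive from the two properties parent and children. A minimal sketch, assuming a hypothetical TreeNode constructor that registers itself in its parent, and assuming that root() returns the node itself when it has no ancestors:

```javascript
// Hypothetical tree node, written for illustration only.
function TreeNode(parent) {
  this.parent = parent || null;
  this.children = [];
  if (parent) { parent.children.push(this); }
}

// Ancestors, closest first.
TreeNode.prototype.ancestors = function () {
  var result = [], p = this.parent;
  while (p) { result.push(p); p = p.parent; }
  return result;
};

// Last of the ancestors (assumption: the node itself if it has none).
TreeNode.prototype.root = function () {
  var a = this.ancestors();
  return a.length ? a[a.length - 1] : this;
};

// All siblings, this node included.
TreeNode.prototype.siblingsAll = function () {
  return this.parent ? this.parent.children.slice() : [this];
};

// All siblings, this node excluded.
TreeNode.prototype.siblings = function () {
  var self = this;
  return this.siblingsAll().filter(function (n) { return n !== self; });
};

TreeNode.prototype.prev = function () {
  var all = this.siblingsAll(), i = all.indexOf(this);
  return i > 0 ? all[i - 1] : null;
};

TreeNode.prototype.next = function () {
  var all = this.siblingsAll(), i = all.indexOf(this);
  return i < all.length - 1 ? all[i + 1] : null;
};
```

Copying the children array in siblingsAll() keeps callers from accidentally mutating the tree through the returned array.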
This object parses and formats dates. Only the parameters actually provided by the Org timestamps are parsed/formatted for now, and only as numbers (no locale management for textual output of weekdays or months).
This object is a wrapper around the JavaScript Date object. Access the Date instance through the date property.
date
- the corresponding Javascript date
year
- the year
month
- the month (1-12)
day
- the day (1-31)
hour
- the hour (0-23)
minute
- the minute (0-59)
Parses a timestamp in the Org format (for instance 2010-01-30 12:34).
This function is called by the constructor.
Formats the timestamp in the Unix-date fashion. Only a few flags are supported.
%H: the 2-digit hour (00-23)
%k: the hour (0-23)
%I: the 2-digit hour (01-12)
%l: the hour (1-12)
%M: the 2-digit minutes (00-59)
%S: the 2-digit seconds (00-59)
%y: the 2-digit year
%Y: the 4-digit year
%m: the 2-digit month (01-12)
%d: the 2-digit day (01-31)
%e: the day (1-31)
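The flag table above maps naturally onto a lookup object and a single replace call. The sketch below works on a plain JavaScript Date; formatDate and pad2 are hypothetical names, not the library's method, and unsupported flags are simply left untouched.

```javascript
// Left-pads a number below 10 with a zero.
function pad2(n) { return (n < 10 ? "0" : "") + n; }

// Hypothetical formatter supporting only the flags listed above.
function formatDate(date, fmt) {
  var h = date.getHours();
  var h12 = h % 12 === 0 ? 12 : h % 12;   // 0 and 12 both display as 12
  var flags = {
    H: pad2(h),
    k: String(h),
    I: pad2(h12),
    l: String(h12),
    M: pad2(date.getMinutes()),
    S: pad2(date.getSeconds()),
    y: pad2(date.getFullYear() % 100),
    Y: String(date.getFullYear()),
    m: pad2(date.getMonth() + 1),         // JavaScript months are 0-based
    d: pad2(date.getDate()),
    e: String(date.getDate())
  };
  return fmt.replace(/%([HkIlMSyYmde])/g, function (match, flag) {
    return flags[flag];
  });
}
```

For example, `formatDate(new Date(2010, 0, 30, 12, 34, 5), "%Y-%m-%d %H:%M")` builds the string from the local-time fields of the date.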
An XPath-like language to select items in the Org
document tree.
It provides a selection mechanism to apply templates to nodes at rendering time.
Just to give a feeling of the selecting language, here are a few examples:
*
- any item whatsoever
node, node{*}
- any node, at any level
n{*}, n
- any node, ‘n’ being a shortcut for ‘node’
n3, n{3}
- any node of level 3
n{1-3}, n3[level~1-3]
- any node of level 1 to 3
n3:tag
- any node of level 3 with a tag “tag” (possibly implied by parents)
n3!tag
- any node of level 3 with a tag “tag” defined at this node
n3[position=2]
- any second node of level 3 within its parent
n3[2]
- any second node of level 3 within its parent
n3[todo=DONE]
- any node of level 3 with a “DONE” todo-marker
n3/src1, n3/src{1}, n3/src[level~1-3]
- any BEGIN_SRC item right under a node of level 3
n3/src
- any BEGIN_SRC item within the content of a node of level 3
n3//src
- any BEGIN_SRC item anywhere under a node of level 3
src
- any BEGIN_SRC item anywhere
src[lang=js]
- any BEGIN_SRC item anywhere with language set to ‘js’
src>p
- first paragraph following a BEGIN_SRC item
src>>p
- any paragraph following a BEGIN_SRC item
src<p
- first paragraph preceding a BEGIN_SRC item
src<<p
- any paragraph preceding a BEGIN_SRC item
src/..
- parent of a BEGIN_SRC item
This section provides a default HTML renderer for the parsed tree.
It is intended to provide an example of how to attach rendering functions to the prototypes of Outline.Node and of the different Content.Block types.
Working in the context of the Org object. We will need, as usual, some shortcuts to the Utils, and to Org.Content and Org.Outline.
- Purpose
- provides a utility function to render all the children of a Node or a Block.
- Arguments
- node, renderer
- Usage
- must be called with .call(obj) to provide the value for this. this must have an enumerable children property.
- Purpose
- provides a utility function to render a node with the given renderer
- Arguments
- node, renderer
Should not be used, EmphInline is abstract…
This section contains the code for the different types of instantiable blocks defined in
We will attach a render property, undefined until now, to these block prototypes. None of these functions takes any argument: all the information they need is in the block object they act upon, through the this value.
The container blocks (those whose constructor calls the ContainerBlock constructor) all use the renderChildren function.
The content blocks (those whose constructor calls the ContentBlock constructor) should use their this.lines array.
RootBlocks are rendered with a div tag, with class org_content.
UlistBlocks are rendered with a simple ul tag.
OlistBlocks are rendered with a simple ol tag. If the block has a start property different from 1, it is inserted in the start attribute of the tag.
DlistBlocks are rendered with a dl tag. DlistItemBlocks will have to use the dt/dd structure accordingly.
UlistItemBlocks and OlistItemBlocks are rendered with a simple li tag.
DlistItemBlocks are rendered with a dt/dd tag structure. The content of the dt is the title attribute of the block. The content of the dd is the rendering of this block’s children.
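The pattern of attaching a render function to a block prototype, with the children rendered through a shared helper invoked with .call(this), can be sketched as below. OlistBlock and ItemBlock here are simplified stand-ins written for this example, and this renderChildren takes no argument, unlike the project's helper.

```javascript
// Simplified stand-in: renders every child and concatenates the results.
// Must be invoked with .call(obj), where obj has a children array.
function renderChildren() {
  return this.children.map(function (child) {
    return child.render();
  }).join("");
}

// Hypothetical ordered-list block with an optional start number.
function OlistBlock(start) {
  this.start = start;
  this.children = [];
}
OlistBlock.prototype.render = function () {
  // The start attribute is only emitted when it differs from 1.
  var start = (this.start !== undefined && this.start !== 1)
    ? ' start="' + this.start + '"'
    : "";
  return "<ol" + start + ">" + renderChildren.call(this) + "</ol>";
};

// Hypothetical list item holding already-rendered text.
function ItemBlock(text) {
  this.text = text;
  this.children = [];
}
ItemBlock.prototype.render = function () {
  return "<li>" + this.text + renderChildren.call(this) + "</li>";
};
```

Because every render function reads its data from this, a different renderer can be plugged in by overriding the render property on the same prototypes.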
ParaBlocks are rendered with a p tag. The content of the tag is the concatenation of this block’s this.lines, passed to the renderMarkup function.
VerseBlocks are rendered with a p tag, with class verse. All spaces are converted to non-breaking spaces. All new lines are replaced by a br tag.
QuoteBlocks are rendered with a blockquote tag. If the quote contains an author declaration (after a double dash), this declaration is put on a new line.
CenterBlocks are rendered with a simple center tag.
ExampleBlocks are rendered with a simple pre tag. The content is not processed with the renderMarkup function, only with the escapeHtml function.
SrcBlocks are rendered with a pre.src tag with a code tag within. The code tag may have a class attribute if the language of the block is known; in that case, the class is the language identifier. The content is not processed with the renderMarkup function, only with the escapeHtml function.
HtmlBlocks are rendered by simply outputting the HTML content verbatim, with no modification whatsoever.
CommentBlocks are ignored.
Here we render headlines, represented by Outline.Node objects.
A section tag is used, with class orgnode, and a level.
The id attribute is the computed id corresponding to a unique TOC identifier.
The title is in a div.title element. Each tag is represented at the end of this element by a span.tag element.
The content of the node (the RootBlock associated to this headline) is rendered.
Then the subheadlines are rendered using the renderChildren function.
- Purpose
- The escapeHtml function escapes the forbidden characters in HTML/XML: &, >, <, ' and ", which are all translated to their corresponding entity.
- Arguments
- str: any value, converted into a string at the beginning of the function.
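A function matching this description could be written as follows. This is a sketch rather than the project's code, and the entity chosen for the apostrophe (a numeric reference) is an assumption.

```javascript
// Sketch of an HTML escaping function for the five characters above.
function escapeHtml(str) {
  var entities = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    "'": "&#39;",   // assumption: numeric reference for the apostrophe
    '"': "&quot;"
  };
  // Convert the argument to a string first, then escape in a single pass.
  return String(str).replace(/[&<>'"]/g, function (c) {
    return entities[c];
  });
}
```

Escaping in a single pass avoids the classic mistake of replacing & after the other characters, which would double-escape the entities just produced.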
- Purpose
- Utility to unescape the characters of the given raw content
- Arguments
- str: any value, converted into a string at the beginning of the function.
- Purpose
- Unbackslash after having escaped HTML
- Arguments
- str: any value, converted into a string at the beginning of the function.