My JavaScript parser
Type Name Latest commit message Commit time
.gitignore Add gitignore Oct 25, 2011
.npmignore [fix] Don't bundle tests when deploying package. Oct 16, 2012
LICENSE First commit! Oct 1, 2011
Tokenizer.js Better error detection in tag parsing... Nov 5, 2012
ZeParser.js Fixed a case where a leading unfinished tag literal would cause infin… Nov 5, 2012
interactive.html Added an interactive console for the parser which shows you the token… Mar 25, 2012
test-tokenizer.html Fixed case for files Dec 23, 2011
tests.js Add unit tests for determining the size of the parse tree. Dec 23, 2011


This is a JavaScript parser.
(c) Peter van der Zee


The Tokenizer is used by the parser. The parser tells the tokenizer whether the next token may be a regular expression or not. Without the parser, the tokenizer will fail if regular expression literals are used in the input.
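To see why the tokenizer needs that hint: a "/" can start a regular expression literal or be a division operator, depending on what came before it. The sketch below is not ZeParser's code. regexAllowedAfter is a hypothetical helper using a common, simplified heuristic; a real parser answers the question from the grammar instead.

```javascript
// Hypothetical helper, not part of ZeParser: decides whether a "/" that
// follows `prev` (the text of the previous non-whitespace token) may start
// a regular expression literal. Simplified heuristic, not a full grammar.
function regexAllowedAfter(prev) {
	if (prev === undefined) return true; // start of input
	if (prev === ')' || prev === ']') return false; // "(a) / b", "a[0] / b"
	if (/^\.?[0-9]/.test(prev)) return false; // "1 / 2", ".5 / 2"
	// these keywords are followed by an expression, so a regex may start here
	if (/^(return|typeof|case|in|of|delete|void|throw|new|do|else)$/.test(prev)) return true;
	if (/^[A-Za-z_$][A-Za-z0-9_$]*$/.test(prev)) return false; // "a / b"
	return true; // after operators and punctuators such as "=", "(", ","
}

console.log(regexAllowedAfter('='));      // true:  x = /b/.test(s)
console.log(regexAllowedAfter('a'));      // false: a / b / c
console.log(regexAllowedAfter('return')); // true:  return /b/.test(s)
```

Cases such as a "/" after "}" stay ambiguous under this heuristic, which is one reason the decision is left to the parser.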


The parser returns a "parse tree": a tree of nested arrays with tokens (regular objects) as leaves. Meta information is embedded as properties of the arrays and the tokens.
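A minimal sketch of what walking such a tree could look like. The tree below is written by hand for illustration: the exact nesting and the isStatement meta property are assumptions, not what ZeParser actually produces. Only the .value property on tokens is taken from this README.

```javascript
// Hypothetical tree for the input "a=b;": nested arrays with plain token
// objects as leaves. Nesting and `isStatement` are made up for illustration.
var tree = [
	[
		[{value: 'a'}],
		{value: '='},
		[{value: 'b'}],
		{value: ';'}
	]
];
tree[0].isStatement = true; // meta information lives on the arrays themselves

// Collect all leaf tokens depth-first, in source order.
function collectLeaves(node, out) {
	out = out || [];
	if (Array.isArray(node)) {
		node.forEach(function(child){ collectLeaves(child, out); });
	} else {
		out.push(node);
	}
	return out;
}

var leaves = collectLeaves(tree);
console.log(leaves.map(function(t){ return t.value; }).join('')); // "a=b;"
```

Because the arrays are ordinary arrays, properties attached to them do not show up as children during the walk.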


ZeParser.createParser(input) returns a new ZeParser instance which has already parsed the input. Among others, the ZeParser instance will have the properties .tree, .wtree and .btree.

.tree is the parse tree mentioned above.
.wtree ("white" tree) is a regular array with all the tokens encountered (including whitespace, line terminators and comments).
.btree ("black" tree) is just like .wtree but without the whitespace, line terminators and comments. This is what the specification would call the "token stream".
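The relation between the two streams can be sketched as a simple filter. The tokens below are written by hand, and the kind property is an assumption used only in this example; it is not how ZeParser labels its tokens.

```javascript
// Hypothetical token objects; `kind` is an assumed property used only here.
var wtree = [
	{value: 'var', kind: 'identifier'},
	{value: ' ', kind: 'whitespace'},
	{value: 'a', kind: 'identifier'},
	{value: ';', kind: 'punctuator'},
	{value: ' ', kind: 'whitespace'},
	{value: '// done', kind: 'comment'},
	{value: '\n', kind: 'lineterminator'}
];

// The "black" stream is the "white" stream minus whitespace, line
// terminators and comments.
var WHITE = {whitespace: true, lineterminator: true, comment: true};
var btree = wtree.filter(function(t){ return !WHITE[t.kind]; });

console.log(wtree.length); // 7
console.log(btree.map(function(t){ return t.value; })); // [ 'var', 'a', ';' ]
```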

I'm aware that the naming convention is a bit awkward. It's a tradeoff between short and descriptive. The streams are used quite often in the analysis.

Tokens are regular objects with several properties. Amongst them are .tokposw and .tokposb, which correspond with the token's own position in the .wtree and .btree respectively.
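A small sketch of why such position properties are handy, assuming they are named .tokposw (index in the white stream) and .tokposb (index in the black stream). The streams are built by hand here, the white flag is an assumed helper property, and whether ZeParser sets .tokposb on whitespace tokens is not covered by this example.

```javascript
// Hypothetical stream for the input "a = b"; `white` is an assumed flag.
var wtree = [
	{value: 'a'},
	{value: ' ', white: true},
	{value: '='},
	{value: ' ', white: true},
	{value: 'b'}
];
var btree = [];

// Give each token its index in the white stream and, for non-white
// tokens, its index in the black stream.
wtree.forEach(function(t, i){
	t.tokposw = i;
	if (!t.white) {
		t.tokposb = btree.length;
		btree.push(t);
	}
});

// With the positions in place, neighbours are a direct array lookup.
var eq = btree[1]; // the "=" token
var prevBlack = btree[eq.tokposb - 1]; // previous significant token
var prevWhite = wtree[eq.tokposw - 1]; // previous token of any kind
console.log(eq.tokposw, eq.tokposb); // 2 1
console.log(prevBlack.value, JSON.stringify(prevWhite.value)); // a " "
```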

The parser has two modes for parsing: simple and extended. Simple mode is mainly for just parsing and returning the streams and a simple parse tree. There's not so much meta information here and this mode is mainly built for speed. The other mode has everything required for Zeon to do its job. This mode is toggled by the instance property .ast, which is true by default :)

Non-factory example:

var input = "foo";
var tree = []; // this should probably be refactored away some day
var tokenizer = new Tokenizer(input); // ditto
var parser = new ZeParser(input, tokenizer, tree);
parser.parse(); // returns tree..., should never throw errors
parser.tokenizer.fixValues(); // makes sure all tokens have a .value property

Highlighting example:

var parser = ZeParser.createParser(textarea.value); // textarea.value:input
parser.tokenizer.fixValues(); // makes sure all tokens have a .value property
var wtree = parser.tokenizer.wtree; // all the tokens ("token stream", including whitespace)
textarea.className = '';
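// NOTE: parts of this snippet were lost. `t.name` (the numeric token type) and the
// `13` check are assumptions; only the `14` (error) check survives from the original.
var tokenstrings = wtree.map(function(t){
	if (t.name == 14) textarea.className = 'error';
	var text = t.name == 13 ? '\u29e6' : (t.name == 14 ? '\u292c' : t.value);
	return '<span class="t'+t.name+'">'+text.replace(/&/g,'&amp;').replace(/</g,'&lt;').replace(/>/g,'&gt;')+'</span>';
});
// the string that would contain highlighted code
// tokenstrings.join('');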