parse

Matt Bierner edited this page Jan 10, 2015 · 24 revisions
Running

Operations that run a parser against a state and extract the success or failure result.

parse.run(p, input, [userData])

Run parser p against an array-like object input. An optional user data object can be provided.

run returns success results and throws error results.

parse.run(
    parse_text.string('ab'),
    'abc'); // returns 'ab'

parse.run(
    parse_text.string('ab'),
    'x'); // throws Expected 'a' found 'x'

parse.runStream(p, s, [userData])

Run parser p against a potentially infinite Nu stream s. An optional user data object can be provided.

Returns success results, throws error results.

parse.runStream(
    parse_text.string('ab'),
    stream.from('abc')); // returns 'ab'

// Run against infinite, lazy stream
parse.runStream(
    parse_text.string('aaaaa'),
    gen.repeat(Infinity, 'a')); // returns 'aaaaa'

parse.runState(p, state)

Run parser p with state. This allows you to use a custom parser state or use a [custom position][position].

Returns success results, throws error results.

parse.runState(
    parse_text.string('ab'),
    new parse.ParserState(
         parse.Position.initial,
         stream.from('abc'),
         {})); // returns 'ab'

parse.test(p, input, [userData])

Same as parse.run but returns a boolean that indicates if parser p succeeded or failed.

parse.test(
    parse_text.string('ab'),
    'abc'); // returns true

parse.test(
    parse_text.string('ab'),
    'x'); // returns false

parse.testStream(p, s, [userData])

Same as parse.runStream but returns a boolean that indicates if parser p succeeded or failed.

parse.testState(p, state)

Same as parse.runState but returns a boolean that indicates if parser p succeeded or failed.

parse.parse(p, input, userData, ok, err)

Runs parser p against an array-like input. A user data object must be provided, but may be null. Calls ok if the parser succeeds and err if it fails.

Returns the result of the callback that was called.

parse.parse(
    parse_text.string('ab'),
    'abc',
    null,
    \x -> console.log("suc" + x),
    \x -> console.log("error" + x));
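The relationship between the callback form and the run and test helpers can be pictured with a toy model. The sketch below is plain JavaScript and does not use Bennu: `parseWith`, `run`, and `test` are hypothetical stand-ins that match a fixed prefix, and they only illustrate how a throwing runner and a boolean runner could both be derived from ok and err callbacks.

```javascript
// Toy model only: parseWith is a hypothetical stand-in, not Bennu's parse.parse.
// It matches a fixed prefix and reports the outcome through callbacks.
function parseWith(prefix, input, ok, err) {
  return input.startsWith(prefix)
    ? ok(prefix)
    : err(new Error(`Expected '${prefix}'`));
}

// run: return the success value, throw the failure value.
const run = (prefix, input) =>
  parseWith(prefix, input, (x) => x, (e) => { throw e; });

// test: collapse both outcomes to a boolean.
const test = (prefix, input) =>
  parseWith(prefix, input, () => true, () => false);

console.log(run('ab', 'abc'));  // ab
console.log(test('ab', 'abc')); // true
console.log(test('ab', 'x'));   // false
```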

parse.parseStream(p, s, userData, ok, err)

Same as parse.parse but takes a Nu stream s as input.

parse.parseState(p, state, ok, err)

Same as parse.parse but takes an explicit parser state state.

Parsers

Basic Parsers

parse.always(x), parse.of(x)

Parser that always succeeds with value x.

var p = parse.always('3');

parse.run(p, ''); // '3'
parse.run(p, 'abc'); // '3'

parse.never(x)

Parser that always fails with value x.

parse.run(parse.never(5), 'abc'); // throws 5
parse.run(parse.never('Error'), 'abc'); // throws 'Error'

parse.bind(p, f), parse.chain(p, f)

Parser that parses p, passing the result to function f which returns a parser to continue the computation.

var p = parse.bind(parse.always(1), \x -> parse.always(x + 1));
parse.run(p, ''); // 2
    
var err = parse.bind(parse.never(), \x -> parse.always(x + 1));
parse.run(err, ''); // throws UnknownError, 'f' never called.

parse.fail([message])

Parser that always fails with an error.

If message is not provided, fails with UnknownError. Otherwise, fails with a ParseError using message.

parse.run(parse.fail(), 'abc'); // throws UnknownError
parse.run(parse.fail('Error'), 'abc'); // throws ParseError with error message "Error"

Sequencing

parse.next(p, q), parse.concat(p, q)

Consumes p then q. Returns the result from q.

var p = parse.next(
    parse_text.character('a'),
    parse_text.character('b'));

parse.run(p, 'ab'); // b 
parse.run(p, 'abc'); // b
parse.run(p, 'a'); // Error!
parse.run(p, 'aab'); // Error!
parse.run(p, 'ba'); // Error!

// If `p` fails, `q` is never run.
var p = parse.next(
    parse.never(),
    parse_text.character('b'));

parse.run(p, 'b'); // Error
parse.run(p, ''); // Error

parse.sequence(...parsers)

Parser that runs each parser from parsers in order, returning the last result.

var p = parse.sequence(
    parse_text.character('a'),
    parse_text.character('b'),
    parse_text.character('c'));

parse.run(p, 'a'); // Error, expected b
parse.run(p, 'ab'); // Error expected c
parse.run(p, 'abc'); // c
parse.run(p, 'abcd'); // c

parse.sequencea(parsers)

Same as parse.sequence but takes parsers as an array.

var p = parse.sequencea(['a', 'b', 'c'].map(parse_text.character));

parse.run(p, 'a'); // Error, expected b
parse.run(p, 'ab'); // Error expected c
parse.run(p, 'abc'); // c

parse.sequences(parsers)

Same as parse.sequence but takes parsers as a Nu stream. This allows running infinite, lazy sequences of parsers.

var tenA = parse.sequences(
    stream.repeat(10, parse_text.character('a')));

parse.run(tenA, 'aaaaaaaaaaaa'); // a

Choice

parse.either(p, q)

Parser that succeeds with either parser p or q. Attempts p first and if p fails attempts q.

If both fail, fails with a MultipleError with errors from p and q combined.

var p = parse.either(
    parse_text.character('a'),
    parse_text.character('b'));

parse.run(p, 'a'); // a 
parse.run(p, 'b'); // b
parse.run(p, 'c'); // Error! MultipleError

Either does not automatically backtrack. Use parse.attempt to add backtracking.

var p = parse.either(
    parse.next(
        parse_text.character('a'),
        parse_text.character('b')),
    parse.next(
        parse_text.character('a'),
        parse_text.character('c')));
parse.run(p, 'ab'); // b
parse.run(p, 'ac'); // Error! First parser succeeded on 'a' then failed on 'c'.
// Returned error is the error from the first parser

parse.attempt(p)

Parser that attempts to parse p and can backtrack if needed.

Useful with parse.either parsers.

// Modified example from 'parse.either'
var p = parse.either(
    parse.attempt(parse.next(
        parse_text.character('a'),
        parse_text.character('b'))),
    parse.next(
        parse_text.character('a'),
        parse_text.character('c')));

parse.run(p, 'ab'); // b
parse.run(p, 'ac'); // c
parse.run(p, 'z'); // Error! MultipleError 
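Why either needs attempt can be pictured with a toy model that tracks whether a failing parser consumed input. The sketch below is plain JavaScript and is not Bennu's implementation: `character`, `next`, `either`, and `attempt` are hypothetical stand-ins named after the functions above, and the assumption that the second alternative is only tried after a non-consuming failure follows the Parsec-style behavior described in this section.

```javascript
// Toy model (not Bennu's implementation): a parser is a function
// (input, pos) -> { ok, value | error, pos, consumed }.
const character = (c) => (input, pos) =>
  input[pos] === c
    ? { ok: true, value: c, pos: pos + 1, consumed: true }
    : { ok: false, error: `Expected '${c}'`, pos, consumed: false };

// Run p then q; the pair has consumed input if either side did.
const next = (p, q) => (input, pos) => {
  const r1 = p(input, pos);
  if (!r1.ok) return r1;
  const r2 = q(input, r1.pos);
  return { ...r2, consumed: r1.consumed || r2.consumed };
};

// Try q only when p failed WITHOUT consuming input.
const either = (p, q) => (input, pos) => {
  const r = p(input, pos);
  return r.ok || r.consumed ? r : q(input, pos);
};

// Turn a consuming failure into a non-consuming one at the start position.
const attempt = (p) => (input, pos) => {
  const r = p(input, pos);
  return r.ok ? r : { ...r, pos, consumed: false };
};

const ab = next(character('a'), character('b'));
const ac = next(character('a'), character('c'));

const noBacktrack = either(ab, ac)('ac', 0);
const withBacktrack = either(attempt(ab), ac)('ac', 0);
console.log(noBacktrack.ok);      // false
console.log(withBacktrack.value); // c
```

In this model, attempt does nothing except reset the position and the consumed flag on failure, which is enough for either to move on to its second alternative.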

parse.look(p)

Parse p, but don't consume any input if it succeeds. Returns the result from p. This is the same as Parsec's lookahead.

var p = parse.sequence(
    parse_text.character('a'),
    parse.look(parse_text.character('b')));

parse.run(p, 'ab'); // 'b'
parse.run(p, 'ax'); // Error
parse.run(p, 'a'); // Error

// When look succeeds, it consumes no input
parse.run(parse.next(p, parse.anyToken), 'ab'); // 'b'

parse.lookahead(p)

Like parse.look, but preserves user state after parsing p. Position and input are still reverted.

parse.choice(...choices)

Attempts a variable number of parsers in order until one succeeds, or all fail. Returns result of first to succeed.

When all fail, fails with a MultipleError containing all errors from choices.

var p = parse.choice(
    parse_text.character('a'),
    parse_text.character('b'),
    parse_text.character('c')); 
parse.run(p, 'a'); // a
parse.run(p, 'b'); // b
parse.run(p, 'c'); // c
parse.run(p, 'z'); // Error! MultipleError

parse.choicea(choices)

Same as parse.choice but takes choices as an array:

var p = parse.choicea(
    ['a', 'b', 'c']
        .map(parse_text.character));

parse.run(p, 'a'); // a
parse.run(p, 'b'); // b
parse.run(p, 'c'); // c
parse.run(p, 'z'); // Error! MultipleError

parse.choices(choices)

Same as parse.choice but takes parsers as a Nu stream. This allows choosing from infinite, lazy sequences of parsers.

parse.optional(default, p)

Parses p once, or returns default if p fails.

var p = parse.optional('def', parse_text.character('b'));
parse.run(p, 'b'); // b
parse.run(p, ''); // 'def'
parse.run(p, 'z'); // 'def'

parse.expected(message, p)

Run p and, if it fails without consuming input, produce an ExpectError with message.

var twoBs = parse.next(
    parse_text.character('b'),
    parse_text.character('b'));

// Standard error message
parse.run(twoBs, 'x'); // ERROR, expected 'b' found 'x'


// Using expected
parse.run(
    parse.expected("two Bs", twoBs),
    'x'); // ERROR, expected "two Bs" found 'x'

parse.not(p)

Run p, swapping success and failure of the result and never consuming any input.

Enumeration

parse.many(p)

Consumes p zero or more times and succeeds with a Nu stream of results.

It is a ParserError to run many with a parser that succeeds but consumes no input.

var p = parse.many(parse_text.character('a'));
parse.run(p, ''); // empty stream 
parse.run(p, 'z'); // empty stream
parse.run(p, 'a'); // stream of ['a'] 
parse.run(p, 'aaa'); // stream of ['a', 'a', 'a'] 
parse.run(p, 'aabaa'); // stream of ['a', 'a'] 
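The restriction on non-consuming parsers exists because repeating a parser that succeeds without advancing would never terminate. The sketch below is a toy model in plain JavaScript, not Bennu's implementation: `character`, `always`, and `many` are hypothetical stand-ins, results are collected in an array rather than a Nu stream, and the error is a plain `Error` rather than a ParserError.

```javascript
// Toy model (not Bennu's implementation): a parser is a function
// (input, pos) -> { ok, value, pos }.
const character = (c) => (input, pos) =>
  input[pos] === c
    ? { ok: true, value: c, pos: pos + 1 }
    : { ok: false, pos };

// Always succeeds with x and never advances the position.
const always = (x) => (input, pos) => ({ ok: true, value: x, pos });

// Apply p until it fails, collecting results in an array.
const many = (p) => (input, pos) => {
  const values = [];
  for (;;) {
    const r = p(input, pos);
    if (!r.ok) return { ok: true, value: values, pos };
    if (r.pos === pos) {
      // Without this guard the loop would spin forever.
      throw new Error('many: parser succeeded without consuming input');
    }
    values.push(r.value);
    pos = r.pos;
  }
};

console.log(many(character('a'))('aabaa', 0).value); // [ 'a', 'a' ]
console.log(many(character('a'))('z', 0).value);     // []
```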

parse.many1(p)

Parser that consumes p one or more times and succeeds with a Nu stream of results.

var p = parse.many1(parse_text.character('a'));
parse.run(p, ''); // Error!
parse.run(p, 'z'); // Error!
parse.run(p, 'a'); // stream of ['a'] 
parse.run(p, 'aaa'); // stream of ['a', 'a', 'a'] 
parse.run(p, 'aabaa'); // stream of ['a', 'a'] 

parse.manyTill(p, end)

Parser that consumes p until end matches. Does not consume end. Succeeds with a Nu stream of results.

var p = parse.manyTill(parse.anyToken, parse_text.character('!'));
parse.run(p, ''); // stream of []
parse.run(p, '!'); // stream of []
parse.run(p, 'a!'); // stream of ['a'] 
parse.run(p, 'ab!c'); // stream of ['a', 'b']

parse.cons(p, q)

Parser that cons the result of p onto result of q. q must return a Nu stream.

var p = parse.cons(
   parse_text.character('a'),
   parse.enumeration(
      parse_text.character('b'),
      parse_text.character('c')));

parse.run(p, ''); // Error!
parse.run(p, 'z'); // Error!
parse.run(p, 'ab'); // Error!
parse.run(p, 'abc'); // stream of ['a', 'b', 'c'] 
parse.run(p, 'abcxyz'); // stream of ['a', 'b', 'c'] 

parse.append(p, q)

Parser that joins the result of p onto result of q. Both p and q must return Nu streams.

var p = parse.append(
   parse.enumeration(
      parse_text.character('a'),
      parse_text.character('b')),
   parse.enumeration(
      parse_text.character('c'),
      parse_text.character('d')));

parse.run(p, ''); // Error!
parse.run(p, 'z'); // Error!
parse.run(p, 'ab'); // Error!
parse.run(p, 'abc'); // Error!
parse.run(p, 'abcd'); // stream of ['a', 'b', 'c', 'd'] 
parse.run(p, 'abcdefg'); // stream of ['a', 'b', 'c', 'd'] 

parse.enumeration(...parsers)

Consume parsers in order, building a Nu stream of results.

var p = parse.enumeration(
    parse_text.character('a'),
    parse_text.character('b'));
        
parse.run(p, 'ab'); // stream of ['a', 'b'] 
parse.run(p, 'ax'); // Error, expected b found x. No backtracking

parse.enumerationa(parsers)

Same as parse.enumeration but takes parsers as an array.

parse.enumerations(parsers)

Same as parse.enumeration but takes parsers as a potentially infinite Nu stream.

parse.eager(p)

Flattens the results of p to a JavaScript array.

var p = parse.eager(parse.many(parse_text.character('a')));
parse.run(p, ''); // []
parse.run(p, 'z'); // []
parse.run(p, 'a'); //  ['a'] 
parse.run(p, 'aaa'); //  ['a', 'a', 'a'] 
parse.run(p, 'aabaa'); //  ['a', 'a'] 

parse.binds(p, f)

Same operation as bind, but p succeeds with a Nu stream and f is called with stream values as arguments.

var seq = parse.enumeration(parse.always(1), parse.always(2));

var p = parse.binds(seq, \x y -> parse.always(x + y));
parse.run(p, ''); // 3

var err = parse.binds(parse.never(), \x y -> parse.always(x + y));
parse.run(err, ''); // throws UnknownError, 'f' never called.

Tokens

parse.token(consume, [err])

Parser that consumes a single item from the head of input if the function consume returns true for that item.

Fails without consuming input if consume returns false, or if there is no more input.

When consume succeeds, advances the input.

var p = parse.token(\x -> x === 'a');
parse.run(p, ''); // Error!
parse.run(p, 'b'); // Error!
parse.run(p, 'a'); // 'a'
parse.run(p, 'abc'); // 'a'

var p = parse.next(
    parse.token(\x -> x === 'a'),
    parse.token(\x -> x === 'b'));
parse.run(p, ''); // Error!
parse.run(p, 'b'); // Error!
parse.run(p, 'a'); // Error!
parse.run(p, 'ab'); // 'b'
parse.run(p, 'abc'); // 'b'

err is an optional function that is called to get the error object when consume fails. Defaults to returning an UnexpectError.

var p = parse.token(
    \x -> x === 'a',
    \pos found -> new parse.ExpectError(pos, 'a', found));
    
parse.run(p, ''); // Error! ExpectError for expected 'a'
parse.run(p, 'b'); // Error!  ExpectError for expected 'a'

parse.anyToken

Parser that consumes any token.

Fails on end of input (EOF).

var p = parse.anyToken;
parse.run(p, ''); // Error! Unexpected eof
parse.run(p, 'b'); // 'b'
parse.run(p, 'a'); // 'a'

parse.eof

Parser that matches the end of input.

State Interaction

parse.getParserState

Succeeds with the current parser state.

parse.run(
     parse.next(
          parse_text.character('a'),
          parse.getParserState),
     'abc'); // returns a ParserState(Position(1), input, ud)

parse.setParserState(s)

Sets the parser state to s. Succeeds with the state s.

parse.modifyParserState(f)

Modify the parser state using function f, setting the new state as the result of f.

parse.run(
     parse.modifyParserState(\s ->
           s.setPosition(parse.Position.initial)),
     'abc'); // returns the parserState

parse.extract(f)

Parser that extracts a value from the parser state by calling function f with state.

parse.getState

Succeeds with the current [user state][user-state].

parse.run(
    parse.getState,
    "abc",
    'user state'); // returns 'user state'

parse.setState(s)

Sets the [user state][user-state] to s. Succeeds with s.

parse.run(
    parse.setState('new user state'),
    "abc",
    'user state'); // returns 'new user state'

parse.modifyState(f)

Modify the [user state][user-state] using function f, succeeding with the result and setting the state to be the result.

parse.run(
    parse.sequence(
         parse.modifyState(\x -> x + 10),
         parse.modifyState(\x -> x / 2),
         parse.getState),
    "abc",
    0); // returns 5
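How the user state is threaded from one parser to the next can be pictured with a toy model. The sketch below is plain JavaScript and not Bennu's implementation: here a parser is a hypothetical function from a state object to a result plus a new state, and `modifyState`, `getState`, and `sequence` are stand-ins named after the functions above.

```javascript
// Toy model (not Bennu's implementation): a parser is a function
// (state) -> { value, state }, where state = { userState }.

// Apply f to the user state; succeed with the new value and store it.
const modifyState = (f) => (state) => {
  const updated = f(state.userState);
  return { value: updated, state: { ...state, userState: updated } };
};

// Succeed with the current user state, leaving it unchanged.
const getState = (state) => ({ value: state.userState, state });

// Run parsers in order, passing each one the state left by the previous.
const sequence = (...parsers) => (state) => {
  let result;
  for (const p of parsers) {
    result = p(state);
    state = result.state;
  }
  return result;
};

const p = sequence(
  modifyState((x) => x + 10),
  modifyState((x) => x / 2),
  getState);

console.log(p({ userState: 0 }).value); // 5
```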

parse.getPosition

Get the current position.

parse.run(
    parse.next(
         parse_text.character('a'),
         parse.getPosition),
    "abc"); // returns Position(1)

parse.setPosition(pos)

Set the current position.

parse.getInput

Get the current input. Returns a Nu stream of remaining input.

parse.run(
    parse.next(
         parse_text.character('a'),
         parse.bind(parse.getInput, \inputStream ->
              parse.always(stream.toArray(inputStream)))),
    "abc"); // returns ['b', 'c']

parse.setInput(input)

Set the current input to the Nu stream input.

Parsing continues on new input.

parse.run(
    parse.sequence(
         parse_text.character('a'),
         parse.setInput(stream.NIL),
         parse_text.character('b')),
    "abc"); // throws error, found eof expected 'b'

Memoization

parse.memo(p)

Memoizes the results of parser p using the current parser state as the key. This can be extremely important for performance, especially with heavily backtracking parsers.

The resulting parser is a transparent wrapper of p.
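The idea of memoizing on the parser state can be pictured with a toy model. The sketch below is plain JavaScript and not Bennu's implementation: `digits` and `memo` are hypothetical stand-ins, the cache is keyed on the input string and position only, and where Bennu actually stores its memo table is not something this sketch tries to reproduce.

```javascript
// Toy model (not Bennu's implementation): a parser is a function
// (input, pos) -> { ok, value, pos }.
let calls = 0;

// Matches one or more decimal digits and counts how often it really runs.
const digits = (input, pos) => {
  calls++;
  let end = pos;
  while (end < input.length && input[end] >= '0' && input[end] <= '9') end++;
  return end > pos
    ? { ok: true, value: input.slice(pos, end), pos: end }
    : { ok: false, pos };
};

// Cache results per (input, position). Bennu keys on the whole parser state.
const memo = (p) => {
  const cache = new Map();
  return (input, pos) => {
    if (!cache.has(input)) cache.set(input, new Map());
    const byPos = cache.get(input);
    if (!byPos.has(pos)) byPos.set(pos, p(input, pos));
    return byPos.get(pos);
  };
};

const memoDigits = memo(digits);
const first = memoDigits('123+4', 0);
const second = memoDigits('123+4', 0); // served from the cache
console.log(first.value, calls); // 123 1
```

A backtracking parser that retries the same alternative at the same position benefits in the same way: the wrapped parser runs once and later attempts reuse the stored result.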

Objects

parse.Position

Default object used to track the parser's position in the input stream. 'parse.Position' simply keeps track of the index in the stream.

Also defines a parse.Position.initial constant as the position at the start of parsing.

parse.ParserState

Default [parser state][parse-state] object. Keeps track of the input stream, position, and a [user state][user-state].

parse.Parser(impl)

Base parser type. All Bennu parsers are instances of this object.

You will probably never need to use this object directly, but all base parsers are instances of this object and it implements the Fantasy Land methods.

Parser Creation

parse.rec(def)

Creates a parser using the factory function def to allow self references.

def is a function that is passed a reference to the parser being created, and returns the parser.

For example, using a traditional definition the self reference to 'b' evaluates to undefined:

var b = parse.either(parse_text.character('b'), b)

// Really this is equivalent to:
var b = parse.either(parse_text.character('b'), undefined)

Using rec, we fix this.

var b = parse.rec(\self ->
    parse.either(parse_text.character('b'), self));

and now 'b' correctly references itself.
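One way such a factory could tie the knot is to hand def a thin forwarding function that looks up the real parser only when it is run. The sketch below is a toy model in plain JavaScript, not Bennu's implementation: parsers are hypothetical functions of `(input, pos)`, and `character`, `always`, `next`, `either`, and `rec` are stand-ins named after the functions above. The grammar consumes a run of 'a' characters so that the recursion terminates.

```javascript
// Toy model (not Bennu's implementation): a parser is a function
// (input, pos) -> { ok, value, pos }.
const character = (c) => (input, pos) =>
  input[pos] === c
    ? { ok: true, value: c, pos: pos + 1 }
    : { ok: false, pos };
const always = (x) => (input, pos) => ({ ok: true, value: x, pos });
const next = (p, q) => (input, pos) => {
  const r = p(input, pos);
  return r.ok ? q(input, r.pos) : r;
};
const either = (p, q) => (input, pos) => {
  const r = p(input, pos);
  return r.ok ? r : q(input, pos);
};

// Pass def a forwarder; the real parser is looked up at run time,
// after def has returned and `resolved` has been assigned.
const rec = (def) => {
  let resolved;
  const self = (input, pos) => resolved(input, pos);
  resolved = def(self);
  return resolved;
};

// Zero or more 'a' characters, defined in terms of itself.
const as = rec((self) =>
  either(next(character('a'), self), always('done')));

const result = as('aaab', 0);
console.log(result.value, result.pos); // done 3
```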

parse.late(def)

Creates a placeholder for a parser that is resolved later. def is a function that, when called, resolves a parser. Will be called once.

This is useful for circular references when wrapping the entire thing in parse.rec would be annoying.
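Deferred resolution can be pictured with a toy model in which def is called on first use and its result is cached. The sketch below is plain JavaScript and not Bennu's implementation: parsers are hypothetical functions of `(input, pos)`, and `character`, `next`, `either`, and `late` are stand-ins named after the functions above. The two parsers `value` and `list` refer to each other, which is the circular case the text describes.

```javascript
// Toy model (not Bennu's implementation): a parser is a function
// (input, pos) -> { ok, value, pos }.
const character = (c) => (input, pos) =>
  input[pos] === c
    ? { ok: true, value: c, pos: pos + 1 }
    : { ok: false, pos };
const next = (p, q) => (input, pos) => {
  const r = p(input, pos);
  return r.ok ? q(input, r.pos) : r;
};
const either = (p, q) => (input, pos) => {
  const r = p(input, pos);
  return r.ok ? r : q(input, pos);
};

// Placeholder parser: def runs once, the first time the parser is used.
const late = (def) => {
  let resolved;
  return (input, pos) => {
    if (!resolved) resolved = def();
    return resolved(input, pos);
  };
};

let defCalls = 0;

// value refers to list before list is defined; late defers the lookup.
const value = late(() => {
  defCalls++;
  return either(character('x'), list);
});
const list = next(character('['), next(value, character(']')));

const nested = list('[[x]]', 0);
console.log(nested.ok, nested.pos, defCalls); // true 5 1
```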

parse.label(name, impl)

Create a parser with display name name and implementation impl. Display names help with debugging.

Errors

ParserError(message)

Error thrown when there is an error with the parser definition itself (for example, calling many on a parser that succeeds and consumes no input).

ParseError(position, message)

Base type of error thrown during parsing.

The position property gets the location where the error occurred.

The message property gets the complete error description (you can also use toString).

The errorMessage property gets the description of just the error, without the position information.

MultipleError(...errors)

Merges one or more ParseErrors into a single error.

The position is the position of the first error. The message combines the messages from errors.

UnknownError(pos)

Error whose exact cause is unknown.

UnexpectError(pos, unexpected)

ParseError when an unexpected token 'unexpected' is encountered at position 'pos'.

ExpectError(pos, expected, [found])

ParseError when an unexpected token 'found' is encountered at position 'pos' when 'expected' was expected.