An extension of LPeg that supports labeled failures
C Lua Makefile
Permalink
Failed to load latest commit information.
examples Adding to README an example about farthest failure Jan 4, 2017
rockspecs Updating HISTORY and adding rockspec to version 1.2.0-1 Jan 4, 2017
HISTORY Updating HISTORY and adding rockspec to version 1.2.0-1 Jan 4, 2017
LICENSE Updating lpeglabel to lpeg-1.0 Jun 27, 2016
README.md Adding to README an example about farthest failure Jan 4, 2017
lpcap.c Updating lpeglabel to lpeg-1.0 Jun 27, 2016
lpcap.h Updating to lpeg 0.12.2 Mar 23, 2015
lpcode.c Removing labeled choice, updating testlabel, and disabling an optmiza… Nov 10, 2016
lpcode.h Updating lpeglabel to lpeg-1.0 Jun 27, 2016
lpeglabel-logo.png Adjusting things to improve documentation Jun 28, 2016
lpprint.c Minor bug when printing a recovery pattern Dec 9, 2016
lpprint.h Updating lpeglabel to lpeg-1.0 Jun 27, 2016
lptree.c Renaming "lpeglabelrec" to "lpeglabel", and "relabelrec" to "relabel" Dec 16, 2016
lptree.h Removing labeled choice, updating testlabel, and disabling an optmiza… Nov 10, 2016
lptypes.h Tracking the farthest failure position in case of a regular fail Jan 3, 2017
lpvm.c Tracking the farthest failure position in case of a regular fail Jan 3, 2017
lpvm.h Removing labeled choice, updating testlabel, and disabling an optmiza… Nov 10, 2016
makefile Renaming "lpeglabelrec" to "lpeglabel", and "relabelrec" to "relabel" Dec 16, 2016
relabel.lua Renaming "lpeglabelrec" to "lpeglabel", and "relabelrec" to "relabel" Dec 16, 2016
test.lua Renaming "lpeglabelrec" to "lpeglabel", and "relabelrec" to "relabel" Dec 16, 2016
testlabel.lua Tracking the farthest failure position in case of a regular fail Jan 3, 2017
testrelabelparser.lua Renaming "lpeglabelrec" to "lpeglabel", and "relabelrec" to "relabel" Dec 16, 2016

README.md

LPegLabel

LPegLabel - Parsing Expression Grammars (with Labels) for Lua


Introduction

LPegLabel is a conservative extension of the LPeg library that provides an implementation of Parsing Expression Grammars (PEGs) with labeled failures. Labels can be used to signal different kinds of errors and to specify which recovery pattern should handle a given label. Labels can also be combined with the standard patterns of LPeg.

Besides that, LPegLabel also reports the farthest failure position in case of an ordinary failure (which is represented by label 0).

This document describes the new functions available in LpegLabel and presents some examples of usage.

With labeled failures it is possible to distinguish between an ordinary failure and an error. Usually, an ordinary failure is produced when the matching of a character fails, and this failure is caught by ordered choice. An error (a non-ordinary failure), by its turn, is produced by the throw operator and may be caught by the recovery operator.

In LPegLabel, the result of an unsuccessful matching is a triple nil, lab, sfail, where lab is the label associated with the failure, and sfail is the suffix input being matched when lab was thrown.

When lab is an ordinary failure and no error was thrown before, sfail is formed according to the farthest position where an ordinary failure occurred. In case lab is an ordinary failure and an error was thrown before, sfail is the farthest suffix where an ordinary failure occurred after the last error.

Below there is a brief summary of the new functions provided by LpegLabel:

FunctionDescription
lpeglabel.T (l) Throws a label l to signal an error
lpeglabel.Rec (p1, p2, l1 [, l2, ..., ln]) Specifies a recovery pattern p2 for p1, when the matching of p1 gives one of the labels l1, ..., ln.
%{l} Syntax of relabel module. Equivalent to lpeglabel.T(l)
p1 //{l1 [, l2, ..., ln} p2 Syntax of relabel module. Equivalent to lpeglabel.Rec(p1, p2, l1, ..., ln)
relabel.calcline(subject, i) Calculates line and column information regarding position i of the subject
relabel.setlabels (tlabel) Allows to specicify a table with mnemonic labels.

Functions

lpeglabel.T(l)

Returns a pattern that throws the label l. A label must be an integer between 1 and 255.

This pattern always causes a failure, whose associated position will be used to set sfail, no matter whether this is the farthest failure position or not.

lpeglabel.Rec(p1, p2, l1, ..., ln)

Returns a recovery pattern. If the matching of p1 gives one of the labels l1, ..., ln, then the matching of p2 is tried from the failure position of p1. Otherwise, the result of the matching of p1 is the pattern's result.

%{l}

Syntax of relabel module. Equivalent to lpeg.T(l).

p1 //{l1, ..., ln} p2

Syntax of relabel module. Equivalent to lpeglabel.Rec(p1, p2, l1, ..., ln).

The //{} operator is left-associative.

relabel.calcline (subject, i)

Returns line and column information regarding position i of the subject.

relabel.setlabels (tlabel)

Allows to specicify a table with labels. They keys of tlabel must be integers between 1 and 255, and the associated values should be strings.

Examples

Below there a few examples of usage of LPegLabel. The code of these and of other examples is available in the examples directory.

Reporting the farthest failure

This example illustrates the new values returned by the match function in case of an unsuccessful matching. As no error is thrown, when the matching fails sfail represents the farthest suffix where an ordinary failure occurred.

local m = require'lpeglabel'

function matchPrint(p, s)
  local r, lab, sfail = p:match(s)
  print("r: ", r, "lab: ", lab, "sfail: ", sfail)
end

local p = m.P"a"^0 * m.P"b" + m.P"c"
matchPrint(p, "abc")  --> r:    3   lab:    nil sfail:  nil
matchPrint(p, "c")    --> r:    2   lab:    nil sfail:  nil
matchPrint(p, "aac")  --> r:    nil lab:    0   sfail:  c
matchPrint(p, "xxc")  --> r:    nil lab:    0   sfail:  xxc

Matching a list of identifiers separated by commas

The following example defines a grammar that matches a list of identifiers separated by commas. A label is thrown when there is an error matching an identifier or a comma.

We use function newError to store error messages in a table and to return the index associated with each error message.

local m = require'lpeglabel'
local re = require'relabel'

local terror = {}

local function newError(s)
  table.insert(terror, s)
  return #terror
end

local errUndef = newError("undefined")
local errId = newError("expecting an identifier")
local errComma = newError("expecting ','")

local g = m.P{
  "S",
  S = m.V"Id" * m.V"List",
  List = -m.P(1) + (m.V"Comma" + m.T(errComma)) * (m.V"Id" + m.T(errId)) * m.V"List",
  Id = m.V"Sp" * m.R'az'^1,
  Comma = m.V"Sp" * ",",
  Sp = m.S" \n\t"^0,
}

function mymatch (g, s)
  local r, e, sfail = g:match(s)
  if not r then
    local line, col = re.calcline(s, #s - #sfail)
    local msg = "Error at line " .. line .. " (col " .. col .. "): "
    return r, msg .. terror[e] .. " before '" .. sfail .. "'"
  end
  return r
end

print(mymatch(g, "one,two"))              --> 8
print(mymatch(g, "one two"))              --> nil Error at line 1 (col 3): expecting ',' before ' two'
print(mymatch(g, "one,\n two,\nthree,"))  --> nil Error at line 3 (col 6): expecting an identifier before ''

In this example we could think about writing rule List as follows:

List = ((m.V"Comma" + m.T(errComma)) * (m.V"Id" + m.T(errId)))^0,

but when matching this expression against the end of input we would get a failure whose associated label would be errComma, and this would cause the failure of the whole repetition.

Error Recovery

By using the Rec function we can specify a recovery pattern that should be matched when a label is thrown. After matching the recovery pattern, and possibly recording the error, the parser will resume the regular matching. For example, in the example below we expect to match rule A, but when a failure occur the label 42 is thrown and then we will try to match the recovery pattern recp:

local m = require'lpeglabel'

local recp = m.P"oast"

local g = m.P{
  "S",
  S = m.Rec(m.V"A", recp, 42) * ".",
  A = m.P"t" * (m.P"est" + m.T(42))
}

print(g:match("test."))   --> 6
print(g:match("toast."))  --> 7
print(g:match("oast."))   --> nil  0  oast.
print(g:match("toward."))   --> nil  0  ward.

When trying to match subject 'toast.', in rule A the first 't' is matched, then the matching of m.P"est" fails and label 42 is thrown, with the associated inpux suffix 'oast.'. In rule S label 42 is caught and the recovery pattern matches 'oast', so pattern '.' matches the rest of the input.

When matching subject 'oast.', pattern m.P"t" fails, and the result of the matching is nil, 0, oast..

When matching 'toward.', label 42 is thrown after matching 't', with the associated input suffix 'oward.'. As the matching of the recovery pattern fails, the result is nil, 0, ward..

Usually, the recovery pattern is an expression that does not fail. In the previous example, we could have used (m.P(1) - m.P".")^0 as the recovery pattern.

Below we rewrite the grammar that describes a list of identifiers to use a recovery strategy. Grammar g remains the same, but we add a recovery grammar grec that handles the labels thrown by g.

In grammar grec we use functions record and sync. Function record, plus function recorderror, will help us to save the input position where a label was thrown, while function sync will give us a synchronization pattern, that consumes the input while is not possible to match a given pattern p.

When the matching of an identifier fails, a defaul value ('NONE') is provided.

local m = require'lpeglabel'
local re = require'relabel'

local terror = {}

local function newError(s)
  table.insert(terror, s)
  return #terror
end

local errUndef = newError("undefined")
local errId = newError("expecting an identifier")
local errComma = newError("expecting ','")

local id = m.R'az'^1

local g = m.P{
  "S",
  S = m.V"Id" * m.V"List",
  List = -m.P(1) + m.V"Comma" * m.V"Id" * m.V"List",
  Id = m.V"Sp" * id + m.T(errId),
  Comma = m.V"Sp" * "," + m.T(errComma),
  Sp = m.S" \n\t"^0,
}

local subject, errors

function recorderror(pos, lab)
  local line, col = re.calcline(subject, pos)
  table.insert(errors, { line = line, col = col, msg = terror[lab] })
end

function record (lab)
  return (m.Cp() * m.Cc(lab)) / recorderror
end

function sync (p)
  return (-p * m.P(1))^0
end

local grec = m.P{
  "S",
  S = m.Rec(m.Rec(g, m.V"ErrComma", errComma), m.V"ErrId", errId),
  ErrComma = record(errComma) * sync(id),
  ErrId = record(errId) * sync(m.P",")
}


function mymatch (g, s)
  errors = {}
  subject = s  
  local r, e, sfail = g:match(s)
  if #errors > 0 then
    local out = {}
    for i, err in ipairs(errors) do
      local msg = "Error at line " .. err.line .. " (col " .. err.col .. "): " .. err.msg
      table.insert(out,  msg)
    end
    return nil, table.concat(out, "\n") .. "\n"
  end
  return r
end

print(mymatch(grec, "one,two"))
-- Captures (separated by ';'): one; two; 
-- Syntactic errors found: 0

print(mymatch(grec, "one two three"))
-- Captures (separated by ';'): one; two; three; 
-- Syntactic errors found: 2
-- Error at line 1 (col 4): expecting ','
-- Error at line 1 (col 8): expecting ','

print(mymatch(grec, "1,\n two, \n3,"))
-- Captures (separated by ';'): NONE; two; NONE; NONE; 
-- Syntactic errors found: 3
-- Error at line 1 (col 1): expecting an identifier
-- Error at line 2 (col 6): expecting an identifier
-- Error at line 3 (col 2): expecting an identifier

print(mymatch(grec, "one\n two123, \nthree,"))
-- Captures (separated by ';'): one; two; three; NONE; 
-- Syntactic errors found: 3
-- Error at line 2 (col 1): expecting ','
-- Error at line 2 (col 5): expecting ','
-- Error at line 3 (col 6): expecting an identifier
relabel syntax

Below we describe again a grammar that matches a list of identifiers, now using the syntax supported by relabel, where //{} is the recovery operator, and %{} is the throw operator:

local re = require 'relabel' 

local errinfo = {
  {"errUndef",  "undefined"},
  {"errId",     "expecting an identifier"},
  {"errComma",  "expecting ','"},
}

local errmsgs = {}
local labels = {}

for i, err in ipairs(errinfo) do
  errmsgs[i] = err[2]
  labels[err[1]] = i
end

re.setlabels(labels)

local g = re.compile[[
  S      <- Id List
  List   <- !.  /  Comma Id List
  Id     <- Sp {[a-z]+} / %{errId}
  Comma  <- Sp ',' / %{errComma}
  Sp     <- %s*
]]

local errors

function recorderror (subject, pos, label)
  local line, col = re.calcline(subject, pos)
  table.insert(errors, { line = line, col = col, msg = errmsgs[labels[label]] })
  return true 
end

function sync (p)
  return '( !(' .. p .. ') .)*'
end

local grec = re.compile(
  "S         <- %g //{errComma} ErrComma //{errId} ErrId" .. "\n" ..
  "ErrComma  <-  ('' -> 'errComma' => recorderror) " .. sync('[a-z]+') .. "\n" ..
  "ErrId     <-  ('' -> 'errId' => recorderror) " .. sync('","') .. "-> default" 
  , {g = g, recorderror  = recorderror, default = "NONE"}
)

function mymatch (g, s)
  errors = {}
  subject = s  
  io.write("Input: ", s, "\n")
  local r = { g:match(s) }
  io.write("Captures (separated by ';'): ")
  for k, v in pairs(r) do
    io.write(v .. "; ")
  end
  io.write("\nSyntactic errors found: " .. #errors)
  if #errors > 0 then
    io.write("\n")
    local out = {}
    for i, err in ipairs(errors) do
      local msg = "Error at line " .. err.line .. " (col " .. err.col .. "): " .. err.msg
      table.insert(out,  msg)
    end
    io.write(table.concat(out, "\n"))
  end
  print("\n")
  return r
end

print(mymatch(grec, "one,two"))
-- Captures (separated by ';'): one; two; 
-- Syntactic errors found: 0

print(mymatch(grec, "one two three"))
-- Captures (separated by ';'): one; two; three; 
-- Syntactic errors found: 2
-- Error at line 1 (col 4): expecting ','
-- Error at line 1 (col 8): expecting ','

print(mymatch(grec, "1,\n two, \n3,"))
-- Captures (separated by ';'): NONE; two; NONE; NONE; 
-- Syntactic errors found: 3
-- Error at line 1 (col 1): expecting an identifier
-- Error at line 2 (col 6): expecting an identifier
-- Error at line 3 (col 2): expecting an identifier

print(mymatch(grec, "one\n two123, \nthree,"))
-- Captures (separated by ';'): one; two; three; NONE; 
-- Syntactic errors found: 3
-- Error at line 2 (col 1): expecting ','
-- Error at line 2 (col 5): expecting ','
-- Error at line 3 (col 6): expecting an identifier

Arithmetic Expressions

Here's an example of an LPegLabel grammar that matches an expression. We have used a function expect, that takes a pattern patt and a label as parameters and builds a new pattern that throws this label when patt fails.

When a subexpression is syntactically invalid, a default value of 1000 is provided by the recovery pattern, so the evaluation of an expression should always produce a numeric value.

In this example, we can see that it may be a tedious and error prone task to build manually the recovery grammar grec. In the next example we will show how to build the recovery grammar in a more automatic way.

local m = require"lpeglabel"
local re = require"relabel"

local labels = {
  {"ExpTermFirst",  "expected an expression"},
  {"ExpTermOp",   "expected a term after the operator"},
  {"MisClose",  "missing a closing ')' after the expression"},
}

local function labelindex(labname)
  for i, elem in ipairs(labels) do
    if elem[1] == labname then
      return i
    end
  end
  error("could not find label: " .. labname)
end

local errors, subject

local function expect(patt, labname)
  local i = labelindex(labname)
  return patt + m.T(i)
end


local num = m.R("09")^1 / tonumber
local op = m.S("+-")

local function compute(tokens)
  local result = tokens[1]
  for i = 2, #tokens, 2 do
    if tokens[i] == '+' then
      result = result + tokens[i+1]
    elseif tokens[i] == '-' then
      result = result - tokens[i+1]
    else
      error('unknown operation: ' .. tokens[i])
    end
  end
  return result
end

local g = m.P {
  "Exp",
  Exp = m.Ct(m.V"OperandFirst" * (m.C(op) * m.V"Operand")^0) / compute,
  OperandFirst = expect(m.V"Term", "ExpTermFirst"),
  Operand = expect(m.V"Term", "ExpTermOp"),
  Term = num + m.V"Group",
  Group = "(" * m.V"Exp" * expect(")", "MisClose"),
}

function recorderror(pos, lab)
  local line, col = re.calcline(subject, pos)
  table.insert(errors, { line = line, col = col, msg = labels[lab][2] })
end

function record (labname)
  return (m.Cp() * m.Cc(labelindex(labname))) / recorderror
end

function sync (p)
  return (-p * m.P(1))^0
end

function defaultValue (p)
  return p or m.Cc(1000) 
end

local grec = m.P {
  "S",
  S = m.Rec(m.V"A", m.V"ErrExpTermFirst", labelindex("ExpTermFirst")), 
  A = m.Rec(m.V"Sg", m.V"ErrExpTermOp", labelindex("ExpTermOp")),
  Sg = m.Rec(g, m.V"ErrMisClose", labelindex("MisClose")),
  ErrExpTermFirst = record("ExpTermFirst") * sync(op + ")") * defaultValue(),
  ErrExpTermOp = record("ExpTermOp") * sync(op + ")") * defaultValue(),
  ErrMisClose = record("MisClose") * sync(m.P")") * defaultValue(m.P""),
}

local function eval(input)
  errors = {}
  io.write("Input: ", input, "\n")
  subject = input
  local result, label, suffix = grec:match(input)
  io.write("Syntactic errors found: " .. #errors, "\n")
  if #errors > 0 then
    local out = {}
    for i, err in ipairs(errors) do
      local pos = err.col
      local msg = err.msg
      table.insert(out, "syntax error: " .. msg .. " (at index " .. pos .. ")")
    end
    print(table.concat(out, "\n"))
  end
  io.write("Result = ")
  return result  
end

print(eval "90-70-(5)+3")
-- Syntactic errors found: 0
-- Result = 18

print(eval "15+")
-- Syntactic errors found: 1
-- syntax error: expected a term after the operator (at index 3)
-- Result = 1015

print(eval "-2")
-- Syntactic errors found: 1
-- syntax error: expected an expression (at index 1)
-- Result = 998

print(eval "1+()+")
-- Syntactic errors found: 2
-- syntax error: expected an expression (at index 4)
-- syntax error: expected a term after the operator (at index 5)
-- Result = 2001

print(eval "1+(")
-- Syntactic errors found: 2
-- syntax error: expected an expression (at index 3)
-- syntax error: missing a closing ')' after the expression (at index 3)
-- Result = 1001

print(eval "3)")
-- Syntactic errors found: 0
-- Result = 3

Automatically Building the Recovery Grammar

Below we rewrite the previous example to automatically build the recovery grammar based on information provided by the user for each label (error message, recovery pattern, etc). In the example below we also throw an error when the grammar does not match the whole subject.

local m = require"lpeglabel"
local re = require"relabel"

local num = m.R("09")^1 / tonumber
local op = m.S("+-")

local labels = {}
local nlabels = 0

local function newError(lab, msg, psync, pcap)
  nlabels = nlabels + 1
  psync = psync or m.P(-1)
  pcap = pcap or m.P""
  labels[lab] = { id = nlabels, msg = msg, psync = psync, pcap = pcap }
end

newError("ExpTermFirst", "expected an expression", op + ")", m.Cc(1000)) 
newError("ExpTermOp", "expected a term after the operator", op + ")", m.Cc(1000))
newError("MisClose",  "missing a closing ')' after the expression",  m.P")")
newError("Extra", "extra characters found after the expression") 

local errors, subject

local function expect(patt, labname)
  local i = labels[labname].id
  return patt + m.T(i)
end

local function compute(tokens)
  local result = tokens[1]
  for i = 2, #tokens, 2 do
    if tokens[i] == '+' then
      result = result + tokens[i+1]
    elseif tokens[i] == '-' then
      result = result - tokens[i+1]
    else
      error('unknown operation: ' .. tokens[i])
    end
  end
  return result
end

local g = m.P {
  "Exp",
  Exp = m.Ct(m.V"OperandFirst" * (m.C(op) * m.V"Operand")^0) / compute,
  OperandFirst = expect(m.V"Term", "ExpTermFirst"),
  Operand = expect(m.V"Term", "ExpTermOp"),
  Term = num + m.V"Group",
  Group = "(" * m.V"Exp" * expect(")", "MisClose"),
}

function recorderror(pos, lab)
  local line, col = re.calcline(subject, pos)
  table.insert(errors, { line = line, col = col, msg = labels[lab].msg })
end

function record (labname)
  return (m.Cp() * m.Cc(labname)) / recorderror
end

function sync (p)
  return (-p * m.P(1))^0
end

function defaultValue (p)
  return p or m.Cc(1000) 
end

local grec = g * expect(m.P(-1), "Extra")
for k, v in pairs(labels) do
  grec = m.Rec(grec, record(k) * sync(v.psync) * v.pcap, v.id)
end

local function eval(input)
  errors = {}
  io.write("Input: ", input, "\n")
  subject = input
  local result, label, suffix = grec:match(input)
  io.write("Syntactic errors found: " .. #errors, "\n")
  if #errors > 0 then
    local out = {}
    for i, err in ipairs(errors) do
      local pos = err.col
      local msg = err.msg
      table.insert(out, "syntax error: " .. msg .. " (at index " .. pos .. ")")
    end
    print(table.concat(out, "\n"))
  end
  io.write("Result = ")
  return result  
end

print(eval "90-70-(5)+3")
-- Syntactic errors found: 0
-- Result = 18

print(eval "15+")
-- Syntactic errors found: 1
-- syntax error: expected a term after the operator (at index 3)
-- Result = 1015

print(eval "-2")
-- Syntactic errors found: 1
-- syntax error: expected an expression (at index 1)
-- Result = 998

print(eval "1+()+")
-- Syntactic errors found: 2
-- syntax error: expected an expression (at index 4)
-- syntax error: expected a term after the operator (at index 5)
-- Result = 2001

print(eval "1+(")
-- Syntactic errors found: 2
-- syntax error: expected an expression (at index 3)
-- syntax error: missing a closing ')' after the expression (at index 3)
-- Result = 1001

print(eval "3)")
-- Syntactic errors found: 1
-- syntax error: extra characters found after the expression (at index 2)
-- Result = 3