This is the code for the "middle layer" of Kollos. Below it is Libmarpa, a library written in the C language which contains the actual parse engine.
The interface to this layer from above are a set of Lua calls, called the PLIF (Pure Lua InterFace). We expect to add a layer above this, which will parse from a DSL to the PLIF.
The PLIF will be a documented interface. while not as convenient as a DSL, the PLIF can be used for programming.
The mid-level implements the logic
-
translates precedences rules to BNF.
-
translates sequence rules to BNF.
-
performs a set of normalizations on the BNF, to allow Libmarpa to be more efficient.
-
performs the Libmarpa calls that create a Libmarpa grammar.
Rewriting grammars prior to parsing is a technique long known and discuesed. A major obstacle is that parsing is almost always a means to an end -- the application is parsing as a first step to applying a semantics, and that semantics is defined in terms of the original grammar. Many of the most tempting rewrites undo the relationship between the original grammar and its semantics, to the extent that recovering it is so costly that the rewrite is counter-productive.
Marpa/Kollos uses only "semantics-safe" rewrites -- rewrite which can be undone cheaply. This approach has been tested in Marpa::R2, which translates back and forth between an pre-rewrite grammar and a post-rewrite grammar. In Marpa::R2, "semantics-safe" rewrites can be undone cheaply not just in post-processing, but also "on the fly".
We now proceed to define "semantics-safe". We consider two grammars. The external grammar is the one before rewriting. The internal grammar is the one after rewriting.
A partial function,
call it brick()
,
maps symbols in the internal
grammar to the external grammar.
Let isym
be an internal symbol.
isym
is called a brick symbol iff
brick(isym)
is defined.
isym
is called a mortar symbol iff
isym
is not a brick symbol.
In a parse tree, a brick symbol is a terminal iff it has no brick descendents. A brick symbol is the brick root iff it has no brick ancestors. Note that a brick terminal may have mortar descendants, and a brick root may have mortar ancestors.
A brick symbol is a brick interior symbol iff it is not a brick root symbol and it is not a brick terminal symbol. A brick symbol is a brick non-terminal iff it is not a brick terminal symbol. A brick non-terminal may be either the brick root or a brick interior symbol.
In a parse tree using the external grammar, consider the subtree formed by taking a brick non-terminal as the root, and traversing the tree left-to-tree, stopping at the first brick node encountered. In the subtree formed by such a traversal, a node will be a brick iff it is a terminal of the subtree or the root node of the subtree. We say that the terminals of this subtree are the brick children of its root. The order of the brick children is the order in which they are encountered in the left-to-right traversal described above.
Let i
be an input, whose n
'th position
we write as i[n]
.
In a parse tree, each node can also be seen
as an "instance" of a symbol.
We can write the instance as the triple
<sym,start,length>
,
where sym
is a symbol of the grammar,
start
is the location in i
where the
instance starts,
and length
is the instance's length.
The first input location of the instnace will therefore
be i[start]
and the last location will be
i[start+length-1]
.
Let i
be an input,
and itree
be a parse tree that results
from parsing i
with the internal grammar.
Let xtree
be a parse tree that results
from parsing i
with the external grammar.
We say that itree
and xtree
are
brick-consistent,
if two conditons hold:
The Brick Terminal Condition:
<xsym,start,length>
is an instance in xtree
iff
<isym,start,length>
is an brick instance in itree
,
where brick(isym) == xsym
.
The Brick Non-terminal Condition:
<xsym,start,length>
is a non-terminal instance
whose children are
`<xsym1,start1,length1>`, `<xsym2,start2,length2>` ... `<xsym-n,start-n,length-n>`
in
xtree
iff
<isym,start,length>
is a non-terminal instance in
itree
whose brick children are
`<isym1,start1,length1>`, `<isym2,start2,length2>` ... `<isym-n,start-n,length-n>`,
where
`brick(isym1) == xsym1`, `brick(isym2) == xsym2` ... `brick(isym-n) == xsym-n`.
We're now in a position to define
a semantically-safe rewrite.
A rewrite is semantically-safe
iff, for every application of the rewrite,
and for every input i
,
when both the external grammar
and internal grammar is used to parse i
,
-
every parse tree produced by the internal grammar is brick-consistent with some parse tree produced by the external grammar,
-
and vice versa, every parse tree produced by the external grammar is brick-consistent with some parse tree produced by the internal grammar.
The previous definition is perhaps clearer
when rephrased so that applies only to
the special case in which both the internal
and external grammars are unambiguous.
A rewrite is semantically-safe
iff, for every application of the rewrite,
and for every input i
,
when both the external grammar
and internal grammar is used to parse i
,
the parse tree produced by the internal grammar
is brick-consistent with the
parse tree produced by the external grammar.
Staying semantically-safe limits the kinds of rewrites that can be applied. But, with caution, many simple and highly useful rewrite techniques be used:
-
Rules can be split into alternatives.
-
Intermediate rules with mortar symbols can be added.
-
Rules can be split up or joined.
-
Multiple brick symbols can be mapped to the same external symbol.
-
In particular, rules can be binarized.
Many of the rewrites are of non-BNF rules (sequences and precedenced rules) to BNF. Others are transformations of the BNF. All of these techniques find use in the Kollos code.
The rewrites use 3 grammars -- external (x), working (w) and internal (i). Each has a one letter abbreviation, as shown.
The External grammar is either entered directly via a PLIF (Pure Lua InterFace) or indirectly, when the LUIF (LUa InterFace) is parsed into PLIF form As the name suggests, it matches the external form closely. The external form never changes, and is to kept for use by those features which must work in terms of the user's original grammar. These features include debugging, tracing, events and semantics. The external rules and symbols are what a LUIF or PLIF user sees.
To do the rewrite, I create a working grammar. This is highly changeable, and the rewrites take place using it.
Once the rewrites are done a Libmarpa grammar is created. The Libmarpa grammar's rules and symbols are tracked in the rules and symbols of Kollos's internal grammar.
Once the Libmarpa grammar is created, the working grammar is no longer needed and can be garbage collected. The internal grammar has links back to the external grammar. These links allow translation from internal to external, and this can be done even "on the fly".
The transformations are not complex enough to justify the overhead of having an external, working and internal version for each lexeme. An external lexeme is created for every lexeme in the external representation.
Some lexemes do result from transformations. These are created directly as internal lexemes. Internal lexemes do not map to external lexemes, they are independent.
Lexemes are in a sense, free form. Each lexeme has a "type", and zero or more values, but it up to the lexer to interpret what the type and spec means.
Two type are reserved: cc
and string
.
A lexer can give them any meaning it likes,
but the standard meaning is that cc
is a
character class,
and that string
is a string -- a
sequence of characters.
Lexemes can also be open -- simply a symbol name, whose meaning is completely up to the lexer.
When using "seamless" parsing,
Kollos imposes specific requirements on the
lexemes in a grammar.
The only lexemes allowed are those of the
two reserved types: cc
and string
.
The spec of string
lexeme
is a Lua string which is interpreted in the
Lua standard way, as a sequence of characters.
The spec of a cc
lexeme
is also a Lua string,
but in the cc
case the string
specifies a Lua character class.
Internal and external lexemes are kept in
a single table, referenced by
type
and spec
.
A given type
and spec
references at most
one lexeme, which is either internal or
external, not both.
Create a RHS instance of type 'xlexeme' These should only be called inside of a call to the alternative_new() method. 'throw' is always set by the caller, which catches any error.
-- luatangle: section xlexeme constructors
local function mt_xlexeme(table, key)
-- print('Looking in xlexeme for key', key)
if key == 'type' then return 'xlexeme'
elseif key == 'nullable' then return false
elseif key == 'productive' then return true
-- eventually put a default semantics here
elseif key == 'semantics' then return {}
elseif key == 'name' then
local lexeme_type = table.lexeme_type
local name =
table.name_base
.. ':'
.. table.line
.. ':' .. lexeme_type
.. '-' .. table.id
table.name = name
return name
end
-- print('Did not find key in xlexeme', key)
return nil
end
function grammar_class.lexeme(grammar, type, spec)
if type:match('[^%w]') then
grammar:development_error(
'Lexeme type "' .. type .. '" is not allowed\n'
.. ' It contains a non-alphanumeric character\n',
grammar.name_base,
grammar.line
)
end
local lexeme_by_spec = grammar.lexemes_by_type[type]
if not lexeme_by_spec then
lexeme_by_spec = {}
grammar.lexemes_by_type = lexeme_by_spec
end
local old_lexeme = lexeme_by_spec[spec]
if old_lexeme then return old_lexeme end
-- used to form the name of the lexeme
local lexeme_id = grammar.next_lexeme_id + 1
grammar.next_lexeme_id = lexeme_id
local new_lexeme = {
lexeme_type = type,
id = lexeme_id,
line = grammar.line,
name_base = grammar.name_base,
spec = spec
}
setmetatable(new_lexeme, { __index = mt_xlexeme })
lexeme_by_spec[spec] = new_lexeme
return new_lexeme
end
function grammar_class.string(grammar, value)
return grammar_class.lexeme(grammar, 'string', value)
end
function grammar_class.cc(grammar, value)
return grammar_class.lexeme(grammar, 'cc', value)
end
Currently, subrules always return nil.
This object "carries" that semantics.
It is analogous to the xlexeme
and xalt
objects, which carry the semantics.
-- luatangle: section xnone object
local xnone = { -- luacheck: ignore xnone
semantics = {}
}
-- luatangle: section ilexeme constructor
local function mt_ilexeme(table, key)
if key == 'type' then return 'ilexeme'
elseif key == 'name_base' then return table.source.name_base
elseif key == 'line' then return table.source.line
elseif key == 'nullable' then return false
elseif key == 'productive' then return true
elseif key == 'name' then
local lexeme_type = table.lexeme_type
local name =
table.name_base
.. ':'
.. table.line
.. ':' .. lexeme_type
.. '-' .. table.id
table.name = name
return name
end
return nil
end
local function i_cc_lexeme_new(spec, source)
local lexeme_by_spec = grammar.lexemes_by_type.cc
if not lexeme_by_spec then
lexeme_by_spec = {}
grammar.lexemes_by_type = lexeme_by_spec
end
local old_lexeme = lexeme_by_spec[spec]
if old_lexeme then return old_lexeme end
-- used to form the name of the lexeme
local lexeme_id = grammar.next_lexeme_id + 1
grammar.next_lexeme_id = lexeme_id
local new_lexeme = {
lexeme_type = 'cc',
spec = spec,
source = source,
id = lexeme_id,
}
setmetatable(new_lexeme, { __index = mt_ilexeme })
lexeme_by_spec[spec] = new_lexeme
return new_lexeme
end
Selected Lua tables are regarded as "objects" with "fields". This is OO but, in keeping with Lua's approach, there is a definite emphasis on the pragmatic, "quick and dirty" over the theoretical and "clean".
For development purposes, after the working and internal grammars are created, I census the fields in the important tables. It's a cheap substitute for the strict typing I neither use or, in this context, want.
-- luatangle: section Census fields
local wrule_field_census = {}
local wrule_field_census_expected = {
id = true,
lhs = true,
max = true,
min = true,
nullable = true,
rh_instances = true,
separator = true,
separation = true,
source = true,
xalt = true,
}
for rule_id = 1,#wrule_by_id do
local wrule = wrule_by_id[rule_id]
if wrule then
if not wrule.name_base then
print("missing 'name_base' in wrule:", wrule.desc)
end
if not wrule.line then
print("missing 'line' in wrule:", wrule.desc)
end
local is_nullable_per_rhs = nullable_per_rhs(wrule)
local nullable_per_property = wrule.nullable
if is_nullable_per_rhs ~= nullable_per_property then
print("Nullable discrepancy in wrule, rhs vs. direct: ",
is_nullable_per_rhs,
nullable_per_property, wrule.desc)
end
for field,_ in pairs(wrule) do
if not wrule_field_census_expected[field] then
wrule_field_census[field] = true
end
end
end
end
for field,_ in pairs(wrule_field_census) do
print("unexpected wrule field:", field)
end
With a nil position, it also works for non-dotted rules. Works with both internal and working rules.
-- luatangle: section declare show_dotted_rule()
local function show_dotted_rule(rule_props, dot_position)
local pieces = {
rule_props.lhs.name,
"::="}
for dot_ix = 1,#rule_props.rh_instances do
pieces[#pieces+1] = rule_props.rh_instances[dot_ix].element.name
end
if dot_position then
if dot_position >= 0 then
table.insert(pieces, dot_position+3, '.')
else
pieces[#pieces+1] = '.'
end
end
return table.concat(pieces, ' ')
end
### Working rule constructor
-- luatangle: section wrule constructor
local function nullable_per_rhs(wrule)
local rh_instances = wrule.rh_instances
for rh_ix = 1,#rh_instances do
local rh_instance = rh_instances[rh_ix]
if not rh_instance.nullable then return end
end
return true
end
local function wrule_new(rule_args)
local max = rule_args.max or 1
local min = rule_args.min or 1
local separator = rule_args.separator
local lhs = rule_args.lhs
local rh_instances = rule_args.rh_instances
local wrule = {
lhs = lhs,
max = max,
min = min,
nullable = rule_args.nullable,
rh_instances = rh_instances,
separation = rule_args.separation,
separator = separator,
source = rule_args.source,
xalt = rule_args.xalt,
}
setmetatable(wrule, {
__index = function (table, key)
if key == 'type' then return 'wrule'
elseif key == 'source' then return table.xalt or table.lhs
elseif key == 'line' then return table.source.line
elseif key == 'name_base' then return table.source.name_base
elseif key == 'desc' then return show_dotted_rule(table)
else return end
end
})
wrule_by_id[#wrule_by_id+1] = wrule
wrule.id = #wrule_by_id
return wrule
end
Assumes it is passed a wrule
whose symbols have
already been "stolen" by the internal grammar.
rhs
is the new RHS, if defined.
Otherwise, the RHS is stolen from wrule
.
-- luatangle: section Define irule constructor
local mt_irule = {
__index = function (table, key)
if key == 'type' then return 'irule'
elseif key == 'source' then return table.xalt or table.lhs
elseif key == 'line' then return table.source.line
elseif key == 'name_base' then return table.source.name_base
elseif key == 'desc' then return show_dotted_rule(table)
elseif key == 'show_dotted' then return show_dotted_rule
else return end
end
}
local function irule_from_wrule(wrule, rh_instance1, rh_instance2)
local lhs = wrule.lhs
local rhs = { rh_instance1 }
if rh_instance2 then
rhs[2] = rh_instance2
end
local irule = {
lhs = lhs,
nullable = wrule.nullable,
rh_instances = rhs,
source = wrule.source,
xalt = wrule.xalt,
}
setmetatable(irule, mt_irule)
local rhs1_mxid = rh_instance1.mxid
local rhs2_mxid = rh_instance2 and rh_instance2.mxid or nil
local rule_mxid = grammar:_rule_new(lhs.mxid, rhs1_mxid, rhs2_mxid)
if rule_mxid < 0 then
local error_code = grammar:error()
print('Problem with rule', show_dotted_rule(irule))
if error_code == luif_err_duplicate_rule then
print('Duplicate rule -- non-fatal')
else
kollos_c.error_throw(error_code, 'problem with rule_new()')
end
end
irule.mxid = rule_mxid
irule_by_mxid[rule_mxid] = irule
-- print('Adding irule', irule, rule_mxid)
return irule
end
-- luatangle: section declare wrule_from_xalt_new()
local function wrule_from_xalt_new(xalt, hidden)
-- at top, use xrule.lhs
-- otherwise, new internal lhs
local new_lhs
local precedence_level = xalt.precedence_level
if xalt.parent_instance then
new_lhs = lh_wsym_ensure(xalt)
else
local old_lhs = xalt.xprec.xrule.lhs
if xalt.precedence_level then
new_lhs =
precedenced_wsym_ensure(old_lhs, precedence_level)
else
new_lhs = cloned_wsym_ensure(old_lhs)
end
end
local rh_instances = xalt.rh_instances
-- just skip empty alternatives
if #rh_instances == 0 then return end
-- for now don't process precedences
local work_rh_instances = {}
-- print("compiling RHS ", new_lhs.name, #rh_instances)
for rh_ix = 1,#rh_instances do
local x_rh_instance = rh_instances[rh_ix]
local x_element = x_rh_instance.element
local element_type = x_element.type
-- print("RHS element", rh_ix, x_element.name, x_element.nulling)
-- Do nothing for a nulling instance
if not x_element.nulling then
if element_type == 'xalt' then
-- This is always a wsym, because an xsym
-- LHS occurs only for a top level alternative,
-- and, if we are here, we are dealing with
-- a subalternative
-- Skip a nulling rh instance.
-- While the xalt cannot be nulling, a subalt
-- can be.
-- Because of these skips, a working rule may have
-- a right side shorter than the external alternative
-- from which it is derived.
-- But it will *never* be zero length, because the caller
-- made sure the external alternative is not nulling
if not x_rh_instance.nulling then
local subalt_wrule = wrule_from_xalt_new(x_element, true)
local subalt_lhs = subalt_wrule.lhs
local new_work_instance = winstance_new(
subalt_lhs,
hidden and xalt or nil,
hidden and rh_ix or nil)
work_rh_instances[#work_rh_instances+1] = new_work_instance
end
elseif element_type == 'xsym' then
local new_element
local level = x_rh_instance.precedence_level
if level ~= nil then
new_element =
precedenced_wsym_ensure(x_element, level)
else
new_element = cloned_wsym_ensure(x_element)
end
local new_work_instance = winstance_new(
new_element,
hidden and xalt or nil,
hidden and rh_ix or nil)
work_rh_instances[#work_rh_instances+1] = new_work_instance
elseif element_type == 'xlexeme' then
local new_element = wsym_from_xlexeme_new(x_element)
local new_work_instance = winstance_new(
new_element,
hidden and xalt or nil,
hidden and rh_ix or nil)
work_rh_instances[#work_rh_instances+1] = new_work_instance
else assert(0) -- TODO remove after developement
end
end
end
-- I don't think it's possible for there to be an empty
-- RHS, because the caller has ensured that the xalt is
-- not nulling
assert( #work_rh_instances > 0 ) -- TODO remove after development
local separator_xsym = xalt.separator
local separator_wsym
if separator_xsym then
separator_wsym = cloned_wsym_ensure(separator_xsym)
end
assert(not xalt.separation or separator_wsym)
local new_wrule = wrule_new{
lhs = new_lhs,
rh_instances = work_rh_instances,
min = xalt.min,
max = xalt.max,
nullable = xalt.nullable or nil,
separator = separator_wsym,
separation = xalt.separation,
xalt = xalt,
}
return new_wrule
end
External rules are complicated.
An xrule
contains one or more precedence
levels, called xprec
's.
An xprec
contains one or more alternatives,
and these alternatives may contain others.
The "subalternative" object is used both for
all alternatives -- both top-level and those below.
-- luatangle: section+ Census fields
-- census subalt fields
-- TODO: remove after development
local xsubalt_field_census = {}
local xsubalt_field_census_expected = {
action = true,
id = true,
id_within_top_alternative = true,
line = true,
max = true,
min = true,
name_base = true,
name = true,
nullable = true,
nulling = true,
parent_instance = true,
precedence_level = true,
productive = true,
rh_instances = true,
separation = true,
separator = true,
subname = true,
xprec = true,
}
for xsubalt_id = 1,#xsubalt_by_id do
local xsubalt = xsubalt_by_id[xsubalt_id]
for field,_ in pairs(xsubalt) do
if not xsubalt_field_census_expected[field] then
xsubalt_field_census[field] = true
end
end
end
for field,_ in pairs(xsubalt_field_census) do
print("unexpected xsubalt field:", field)
end
It is useful in Kollos to uniquely identify each RHS location of every rule. These are called RHS instance objects, or more often just instances.
Instances serve two purposes: First, not every RHS item is a symbol, and some object is needed to identify these non-symbol items.
Second, while semantics is defined in terms of rules and symbols, knowing the ID of a symbol is not always sufficient to know its semantics, which can vary by rule and even by position within rule. We can find the semantics by going back to the rule, but this will not always be convenient. Instances provide a convenient place to put information about the role a RHS item plays in the rule's semantics.
-- luatangle: section+ Census fields
local xinstance_field_census = {}
local xinstance_field_census_expected = {
associator = true,
element = true,
precedence_level = true,
rh_ix = true,
xalt = true,
}
for xsubalt_id = 1,#xsubalt_by_id do
local xsubalt = xsubalt_by_id[xsubalt_id]
local rh_instances = xsubalt.rh_instances
for rh_ix = 1,#rh_instances do
local rh_instance = rh_instances[rh_ix]
for field,_ in pairs(rh_instance) do
if not xinstance_field_census_expected[field] then
xinstance_field_census[field] = true
end
end
end
end
for field,_ in pairs(xinstance_field_census) do
print("unexpected xinstance field:", field)
end
Working instances currently contain nothing
except their element
(the RHS item)
and, for brick instances,
a <xalt, rh_ix>
duple that allows their
semantics to be found quickly.
Currently, working instances never change after
creation.
Separators will usually (always?) be
a brick element.
They do not have the
<xalt, rh_ix>
duple defined.
Instead, the xalt
is the value of
their separates
named field.
-- luatangle: section+ Census fields
local winstance_field_census = {}
local winstance_field_census_expected = {
element = true,
separates = true,
rh_ix = true,
xalt = true,
}
for rule_id = 1,#wrule_by_id do
local wrule = wrule_by_id[rule_id]
if wrule then
local rh_instances = wrule.rh_instances
for rh_ix = 1,#rh_instances do
local winstance = rh_instances[rh_ix]
if not winstance.name_base then
print("missing 'name_base' in winstance:", winstance.name)
end
if not winstance.line then
print("missing 'line' in winstance:", winstance.name)
end
if winstance.rh_ix and not winstance.semantics then
-- print(inspect(winstance))
print("instance has rh_ix but no semantics:", winstance.name)
end
if winstance.xalt and not winstance.semantics then
print("instance has xalt but no semantics:", winstance.name)
end
for field,_ in pairs(winstance) do
if not winstance_field_census_expected[field] then
winstance_field_census[field] = true
end
end
end
end
end
for field,_ in pairs(winstance_field_census) do
print("unexpected winstance field:", field)
end
Since
working instances do not change after creation,
they
can be used to represent what are actually different
instances.
This is tempting when the instance contains only
an element
field, so that the instance adds
essentially no information to the underlying element
.
And the elements
are of course reused at different RHS instances.
This reuse somewhat muddies the idea of instance objects, but it is convenient, and it is used in certain places in the working grammar.
-- luatangle: section+ Census fields
-- census xsym fields
-- TODO: remove after development
local xsym_field_census = {}
local xsym_field_census_expected = {
id = true,
lhs_xrules = true,
line = true,
name_base = true,
name = true,
nullable = true,
nulling = true,
productive = true,
semantics = true,
top_precedence_level = true,
type = true,
}
for xsym_id = 1,#xsym_by_id do
local xsym = xsym_by_id[xsym_id]
for field,_ in pairs(xsym) do
if not xsym_field_census_expected[field] then
xsym_field_census[field] = true
end
end
end
for field,_ in pairs(xsym_field_census) do
print("unexpected xsym field:", field)
end
An internal method for creating external symbols.
-- luatangle: section xsym constructor
local function xsym_new(grammar, args)
local name = args.name
if not name then
return nil, [[symbol must have a name]]
end
if type(name) ~= 'string' then
return nil, [[symbol 'name' is type ']]
.. type(name)
.. [['; it must be a string]]
end
-- decimal 055 is hyphen (or minus sign)
-- strip initial angle bracket and whitespace
name = name:gsub('^[<]%s*', '')
-- strip find angle bracket and whitespace
name = name:gsub('%s*[>]$', '')
local charclass = '[^a-zA-Z0-9_%s\055]'
if name:find(charclass) then
return nil, [[symbol 'name' characters must be in ]] .. charclass
end
-- normalize internal whitespace
name = name:gsub('%s+', ' ')
if name:sub(1, 1):find('[_\055]') then
return nil, [[symbol 'name' first character may not be '-' or '_']]
end
local xsym_by_name = grammar.xsym_by_name
local props = xsym_by_name[name]
if props then return props end
props = {
name = name,
type = 'xsym',
lhs_xrules = {},
-- am not trying to be very accurate about the line
-- it should be the line of an alternative containing that symbol
name_base = grammar.name_base,
-- the actual semantics is associated with the rule
semantics = {},
line = grammar.line,
}
xsym_by_name[name] = props
local xsym_by_id = grammar.xsym_by_id
xsym_by_id[#xsym_by_id+1] = props
props.id = #xsym_by_id
return props
end
terminal
: I am not sure this field is necessary.
-- luatangle: section+ Census fields
-- TODO: remove after development
local wsym_field_census = {}
local wsym_field_census_expected = {
id = true,
ilexeme = true,
line = true,
miid = true,
mxid = true,
name_base = true,
name = true,
nullable = true,
precedence_level = true,
source = true,
terminal = true,
xlexeme = true,
xnone = true,
xsym = true,
}
for _,wsym in pairs(wsym_by_name) do
if not wsym.name_base then
print("missing 'name_base' in wsym:", wsym.name)
end
if not wsym.line then
print("missing 'line' in wsym:", wsym.name)
end
local specifiers = {}
if wsym.ilexeme then
specifiers[#specifiers+1]
= 'ilexeme=' .. wsym.ilexeme.name
end
if wsym.xlexeme then
specifiers[#specifiers+1]
= 'xlexeme=' .. wsym.xlexeme.name
end
local semantics = wsym.semantics
if wsym.xnone then
if not semantics then
print("xnone defined, but not brick:", wsym.name)
end
specifiers[#specifiers+1]
= 'xnone'
end
if wsym.xsym then
if not semantics then
print("xsym defined, but not brick:", wsym.name)
end
specifiers[#specifiers+1]
= 'xsym=' .. wsym.xsym.name
end
if #specifiers > 1 then
print("More than 1 semantic for:", wsym.name, table.concat(specifiers, ' '))
end
if semantics then
if not wsym.xsym and not wsym.xnone and not wsym.xlexeme
then
print("brick, but no semantics specified", wsym.name)
end
end
for field,_ in pairs(wsym) do
if not wsym_field_census_expected[field] then
wsym_field_census[field] = true
end
end
end
for field,_ in pairs(wsym_field_census) do
print("unexpected wsym field:", field)
end
These wsym functions
are internal to the compile()
method
because they use up-values internal
to the compile()
method.
Having these function be internal
to compile()
also makes it easy for the
"working data" to be cleaned up -- it will just be garbage collected
when compile() returns. If these functions were top-level, the data
would have to be top-level as well.
-- luatangle: section wsym constructors
-- 2nd return value is true if this is
-- a new symbol
local function wsym_ensure(name)
local wsym_props = wsym_by_name[name]
if wsym_props then return wsym_props end
wsym_props = {
name = name,
}
setmetatable(wsym_props, {
__index = function (table, key)
if key == 'type' then return table.mxid and 'isym' or 'wsym'
elseif key == 'line' then return table.source.line
elseif key == 'name_base' then return table.source.name_base
elseif key == 'source' then
return table.xsym or table.xlexeme or table.ilexeme
elseif key == 'xsym' then return nil
elseif key == 'ilexeme' then return nil
elseif key == 'xlexeme' then return nil
elseif key == 'xnone' then return nil
else
local parent_object
= table.xsym or table.xlexeme or table.xnone or table.ilexeme
if parent_object then
-- if table.name == 'number' then
-- print("table.xsym:", table.xsym)
-- print("table.xnone:", table.xnone)
-- print("table.xlexeme:", table.xlexeme)
-- print("parent_object:", parent_object)
-- print("Looking in wsym for key:", key)
-- end
return parent_object[key]
end
return nil
end
end
}
)
wsym_by_name[name] = wsym_props
return wsym_props,true
end
-- Create a new *internal* lhs for this
-- alt
local function lh_wsym_ensure(alt)
local alt_name = alt.name
local name = 'lhs!' .. alt_name
local wsym_props,is_new = wsym_ensure(name)
if is_new then
wsym_props.nullable = alt.nullable
wsym_props.source = alt
end
return wsym_props
end
-- Create a new *internal* lhs for this
-- alt
local function lh_of_wrule_new(name, wrule)
local new_wsym,is_new = wsym_ensure(name)
-- TODO: remove after development
assert(is_new)
new_wsym.nullable = wrule.nullable
new_wsym.source = wrule
return new_wsym
end
local function precedenced_wsym_ensure(base_wsym, precedence)
local name = base_wsym.name .. '!prec' .. precedence
local new_wsym,is_new = wsym_ensure(name)
if is_new then
new_wsym.nullable = nil
new_wsym.xsym = base_wsym.xsym
end
return new_wsym
end
local function wsym_from_xlexeme_new(xlexeme)
local name = xlexeme.name .. '-sym'
local new_wsym,is_new = wsym_ensure(name)
assert(is_new) -- TODO delete after development
new_wsym.xlexeme = xlexeme
new_wsym.terminal = true
return new_wsym
end
local function cloned_wsym_ensure(xsym)
local name = xsym.name
local new_wsym,is_new = wsym_ensure(name)
if is_new then
new_wsym.nullable = xsym.nullable
new_wsym.terminal = xsym.terminal
new_wsym.xsym = xsym
end
return new_wsym
end
The RHS transitive closure is Jeffrey's coinage, to describe a kind of property useful in Marpa.
Let P
be a symbol property.
We will write P(sym)
if symbol sym
has property P.
We say that the symbol property holds of a rule r
,
or P(r)
,
if r
is of the form
LHS ::= RHS
,
where RHS
is is a series
of zero or more RHS symbols,
and P(Rsym)
for every Rsym
in RHS
.
A property P
is RHS transitive if and only if
when r = LHS ::= RHS
and P(r)
,
then P(LHS)
.
Note that the definition of a RHS transitive property implies that every LHS of an empty rule has that property. This is because, in the case of an empty rule, it is vacuously true that all the RHS symbols have the RHS transitive property.
Also note the definition only describes the transitivity of the
property, not which symbols have it.
That is, while P
is a RHS transitive property,
a symbol must have property P
if it appears on the LHS
of a rule with property P
.
the converse is not necessarily true:
A symbol may have property P
even if it never appears on the LHS
of a rule with property P
.
In Marpa, "being productive" and "being nullable" are RHS transitive properties
-- luatangle: section RHS transitive closure function
local function xrhs_transitive_closure(grammar, property)
local changes_made
-- ok to shadow upvalue property, I think
local function property_of_instance_element(element,
property) -- luacheck: ignore property
if element[property] ~= nil then
-- print('Already has', property, element[property])
return element[property]
end
if element.type == 'xsym' then
-- If a symbol with the property still not set,
-- return nil and the symbol id
return nil, element.id
end
-- If we are here, the element must be an xalt
local rh_instances = element.rh_instances
-- assume true, unless found false
local element_has_property = true
-- print('#rh_instances', #rh_instances)
for rh_ix = 0,#rh_instances do
-- print('rh_ix', rh_ix)
local child_element
-- As a bit of a hack,
-- An rh_ix of 0 has a special meaning:
-- check the separator
if rh_ix == 0 then
-- Check only if the separator is always used
-- by this sequence. There is always an internal
-- separator is min>2; and there is always a
-- terminating separator, if the separation
-- type is 'terminating'
if element.separation == 'terminating' or element.min>2 then
child_element = element.separator
end
else
local rh_instance = rh_instances[rh_ix]
child_element = rh_instance.element
end
-- Child element may be nil, if rh_ix==0 and we did not have
-- or are not checking the separator
if child_element then
local has_property = property_of_instance_element(child_element, property)
if has_property == nil then
return nil
elseif has_property == false then
element_has_property = false
break
end
end
end
element[property] = element_has_property
-- print("Setting" .. element.name .. " " .. property .. " to", element_has_property)
changes_made = true
-- If instance is top level
if element.is_top then
local lhs = element.lhs_of_top
-- print("Setting " .. lhs.name .. " " .. property .. " to ", element_has_property)
lhs[property] = element_has_property
end
return element_has_property
end
local xsubalt_by_id = grammar.xsubalt_by_id
-- Make LHS symbols consist with subalternatives
for xsubalt_id = 1,#xsubalt_by_id do
local xsubalt = xsubalt_by_id[xsubalt_id]
if xsubalt.is_top then
local has_property = xsubalt[property]
if has_property ~= nil then
local lhs = xsubalt.lhs_of_top
lhs[property] = has_property
end
end
end
changes_made = true
while changes_made do
changes_made = false
for xsubalt_id = 1,#xsubalt_by_id do
local xsubalt = xsubalt_by_id[xsubalt_id]
property_of_instance_element(xsubalt, property)
end
end
end
-- luatangle: section nullable semantics functions
-- Right now empty table indicates default semantics
-- Maybe do something more explicit?
local default_nullable_semantics = { }
-- Use the alternative, which the caller has ensured is
-- nullable, as the source of a nullable semantics
local function nullable_semantics_create(alternative)
local action = alternative.action
if action then
return { action = action }
end
-- return default semantics
return default_nullable_semantics
end
--[[
--
-- Find and return the nullable semantics of <symbol>.
-- On error, return nil and the error object, or else
-- throw the error, as specified by the grammar
--
-- Semantics are
--
-- 1) Those of the only nullable alternative, when there
-- is only one.
--
-- 2) Those of the explicit empty rule, if there is one
--
-- 3) The default semantics, if that is the semantics of
-- all the nullable alternatives
--
-- If none of the above are true, it is a fatal error
--
--]]
local function find_nullable_semantics(grammar, symbol)
local xrules = symbol.lhs_xrules
local nullable_alternatives = {}
local has_only_default_semantics = true
for xrule_ix = 1, #xrules do
local precedences = xrules[xrule_ix].precedences
for prec_ix = 1, #precedences do
local alternatives = precedences[prec_ix].top_alternatives
for alt_ix = 1, #alternatives do
local alternative = alternatives[alt_ix]
if alternative.nullable then
local rhs = alternative.rh_instances
if #rhs <= 0 then
return nullable_semantics_create(grammar, alternative)
end
nullable_alternatives[#nullable_alternatives+1] = alternative
if alternative.action then
has_only_default_semantics = false
end
end
end
end
end
if #nullable_alternatives == 1 then
return nullable_semantics_create(grammar, nullable_alternatives[1])
end
if has_only_default_semantics then
return default_nullable_semantics
end
-- If here, we consider the semantics ambiguous, and report
-- an error
local error_table = {
'grammar_new():' .. 'Ambiguous nullable semantics',
' That is not allowed',
' An explicit empty rule is one solution ...',
' The nullable semantics of an empty rule override all other choices.',
' The symbol with ambiguous semantics was <' .. symbol.name .. '>',
}
error_table[#error_table+1]
= ' Nullable alternative #1 is ' .. nullable_alternatives[1].name
error_table[#error_table+1]
= ' Nullable alternative #2 is ' .. nullable_alternatives[2].name
if #nullable_alternatives > 2 then
error_table[#error_table+1]
= ' Nullable alternative #'
.. #nullable_alternatives
.. ' is '
.. nullable_alternatives[#nullable_alternatives].name
end
-- For now, report just the rule.
-- At some point, find one of the alternatives
-- which was nullable, and report that
return nil,
grammar:development_error(
table.concat(error_table, '\n'),
nullable_alternatives[1].name_base,
nullable_alternatives[1].line
)
end
This code rewrites sequences rules to
eliminate the counts -- that is,
so that, in effect,
min = 1
and max = 1
.
The min
and max
fields are not actually changed
but will no longer be meaningful.
It is assumed at this point that
-
previous rewrites have eliminated
liberal
andterminating
separation, so that onlyproper
separation sequences and unseparated sequences remain. -
min > 0
. -
the sequence rule's RHS has length of exactly 1.
-
the sequence is non-trivial, that either
min ~= 1
ormax ~= 1
.
Per the above assumptions, there is exactly one
RHS symbol.
The RHS symbol is called the repetend.
It is the first (and only) symbol in working_rule
.
-- luatangle: section Create the repetend instance
local repetend_instance = working_wrule.rh_instances[1]
For separators, we reuse the same instance object for for multiple RHS instances. Working instances are never changed, and the same semantic information can be used in every case, so we can get away with this.
-- luatangle: section Create the separator instance if needed
local separator_instance
if separator then
separator_instance = winstance_new(separator)
separator_instance.separates = working_wrule.xalt
end
I call a block a sequence of fixed length.
I call a block a sequence of fixed length.
I call any other sequence a range.
If a range has no maximum length, I call it open.
If a range is not open, then it is closed.
If, in a sequence, min == 1
, then I say the
sequence is 1-based.
The following code works first dividing the sequence into one or two 1-based sequences. If there is only one, the sequece may be a block or a range. If there are two, the block always comes first. If there is a range, it may be either open or closed.
We start by determining what sequences we have:
-- luatangle: section Rewrite the sequence counts
local block_size
local range_size
if min == max then
block_size = max
elseif min == 1 then
range_size = max
else
block_size = min-1
range_size = max == -1 and -1 or max-block_size
end
At this point,
we've divided the range into at most two
1-based blocks.
block_size
, if non-nil, contains the size
of a block.
range_size
, if non-nil, contains the size
of a range.
We create a new RHS,
composed of
-
the block, if it is non-nil;
-
a separator, if it is needed; and
-
the range, if it is non-nil;
and we change the working_rule
accordingly.
-- luatangle: section+ Rewrite the sequence counts
-- luatangle: insert Function to rewrite 1-based blocks
-- luatangle: insert Function to rewrite 1-based ranges
local new_rhs = {}
if block_size then
local block_lhs = blk_lhs(block_size)
new_rhs[#new_rhs+1] = winstance_new(block_lhs)
end
if range_size then
local new_lhs = range_lhs(range_size)
if #new_rhs > 0 and separator_instance then
new_rhs[#new_rhs+1] = separator_instance
end
new_rhs[#new_rhs+1] = winstance_new(new_lhs)
end
working_wrule.rh_instances = new_rhs
-- luatangle: section disallow nulling separator
if separator and separator.nulling then
grammar:development_error(
who
.. 'Separator ' .. separator.name .. ' is nulling\n'
.. ' That is not allowed\n',
working_wrule.name_base,
working_wrule.line
)
end
We force a singleton RHS, by creating a new rule. Since we want a new, unique, symbol for the repetend, we do this even if the rule is already an singleton. Each sequence has a unique semantics, and we will use the repetend symbol name as a unique ID for this sequence.
-- luatangle: section Force singleton RHS in working_rule
local new_sym
= lh_of_wrule_new('rh1!' .. unique_number, working_wrule)
unique_number = unique_number + 1
local new_winstance = winstance_new(new_sym)
working_wrule.rh_instances = {new_winstance}
wrule_new{
lhs = new_sym,
rh_instances = rh_instances,
}
Elminate terminating
and liberal
separation
by rewriting them in terms of proper
separation.
After this rewrite all rules will either have no
separator, or proper
separation.
-- luatangle: section Normalize separation
if separation == 'terminating' or
separation == 'liberal'
then
local middle_sym
= lh_of_wrule_new('term!' .. unique_number, working_wrule)
unique_number = unique_number + 1
local middle_winstance = winstance_new(middle_sym)
unique_number = unique_number + 1
-- The old LHS of the wrule with the actual sequence,
-- which we are going to replace
local previous_lhs = working_wrule.lhs
wrule_new{
lhs = previous_lhs,
rh_instances = {
middle_winstance,
separator_instance,
}
}
working_wrule.lhs = middle_sym
-- If liberal separation, also add an
-- unterminated variant
if separation == 'liberal' then
wrule_new{
lhs = previous_lhs,
rh_instances = {
middle_winstance,
}
}
end
end
-- luatangle: section Check and expand lexemes
if at_bottom then
for rule_id = 1,#wrule_by_id do
local working_wrule = wrule_by_id[rule_id]
if working_wrule then
local working_rh_instances = working_wrule.rh_instances
for rh_ix = 1,#working_rh_instances do
local rh_instance = working_rh_instances[rh_ix]
local lexeme_type = rh_instance.lexeme_type
local spec = rh_instance.spec
if lexeme_type == 'cc' then
if type(spec) ~= 'string' then
grammar:development_error(
[[charclass spec is type ']]
.. type(spec)
.. [['; the spec must be a string]],
rh_instance.name_base,
rh_instance.line
)
end
if not spec:match('^%[.+%]$') then
grammar:development_error(
[[charclass in alternate must be in square brackets]])
end
elseif lexeme_type == 'string' then
if type(spec) ~= 'string' then
grammar:development_error(
[[spec in lexeme of type 'string' is type ']]
.. type(spec)
.. [['; the spec must be a string]],
rh_instance.name_base,
rh_instance.line
)
end
-- luatangle: insert expand string into internal 'cc' lexemes
elseif lexeme_type ~= nil then
grammar:development_error(
[[lexeme is of type ']]
.. type(spec)
.. [['; in 'at bottom' grammars, the type must be ]]
.. [['cc' or 'string']],
rh_instance.name_base,
rh_instance.line
)
end
end
end
end
end
If the grammar is an at_bottom
grammar
(one which reads and interprets its own characters
instead of passing this on to a lower layer),
then strings are broken into character classes.
-- luatangle: section expand string into internal 'cc' lexemes
local string_rhs = {}
local string_lhs = rh_instance.element
for string_ix = 1,#spec do
local char = spec:sub(string_ix,string_ix)
local cc_spec
if char:match('[%w]') then
cc_spec = '[' .. char .. ']'
else
cc_spec = string.format('[\\%d]', char:byte())
end
local ilexeme_new = i_cc_lexeme_new(cc_spec, string_lhs)
local new_wsym, is_new
new_wsym,is_new = wsym_ensure(ilexeme_new.name .. '-ilex')
assert(is_new) -- TODO delete after development
new_wsym.ilexeme = ilexeme_new
new_wsym.terminal = true
new_wsym.source = string_lhs
local new_instance = winstance_new(new_wsym)
string_rhs[#string_rhs+1] = new_instance
end
wrule_new(
{
lhs = string_lhs,
rh_instances = string_rhs
}
)
-- luatangle: section Binarize the working grammar
for rule_id = 1,#wrule_by_id do
local working_wrule = wrule_by_id[rule_id]
if working_wrule then
local working_rh_instances = working_wrule.rh_instances
local original_rule_length = #working_rh_instances
local final_wrule
local rule_source = working_wrule.source
if original_rule_length > 2 then
local last_lhs
-- luatangle: insert Binarize: Create the final rule
-- luatangle: insert Binarize: Create the medial rules
-- luatangle: insert Binarize: Replace original rule with initial rule
else final_wrule = working_wrule
end
-- luatangle: insert Binarize: Final rule hack
end
end
-- luatangle: section Binarize: Create the final rule
do
local pos = original_rule_length - 1
local is_new
last_lhs, is_new = wsym_ensure('chaf' .. rule_id .. '@' .. pos)
assert(is_new) -- TODO: remove after development
last_lhs.source = rule_source
local instance1 = working_rh_instances[pos]
local instance2 = working_rh_instances[pos+1]
local rhs = { instance1, instance2 }
if instance1.nullable and instance2.nullable then
last_lhs.nullable = true
end
final_wrule = wrule_new(
{
lhs = last_lhs,
rh_instances = rhs,
nullable = last_lhs.nullable
}
)
end
There will not be any medial rules
unless #rh_instances >= 4
.
On the first pass, last_lhs
was set
when the "final" rule of the binarization
was created.
-- luatangle: section Binarize: Create the medial rules
for pos = original_rule_length - 2, 2, -1 do
local lhs, is_new
lhs, is_new = wsym_ensure('chaf' .. rule_id .. '@' .. pos)
assert(is_new) -- TODO: remove after development
lhs.source = rule_source
local instance1 = working_rh_instances[pos]
local instance2 = winstance_new(last_lhs)
local rhs = { instance1, instance2 }
if instance1.nullable and instance2.nullable then
lhs.nullable = true
end
wrule_new(
{
lhs = lhs,
rh_instances = rhs,
nullable = lhs.nullable
}
)
last_lhs = lhs
end
-- luatangle: section Binarize: Replace original rule with initial rule
do
local instance1 = working_rh_instances[1]
local instance2 = winstance_new(last_lhs)
working_wrule.rh_instances = { instance1, instance2 }
end
This hack covers a special case.
It is required because, for every symbol, we need
to determine its place in an external rule, in order
to apply the semantics.
The xalt
and rh_ix
fields in the winstances do
this in most cases.
But what if a binarized rule contains two instances of the same nullable symbol. When we do the rewrite for nullables, we eliminate first one, then the other. But that results in two rules which are identical from the Libmarpa point of view. (Libmarpa does not see winstances.)
Various hacks will solve this -- for example we could allow ambiguous winstances, so that having only one rule is fine. That way the two cases -- nulling the first instance and nulling the second instance -- look different from the Libmarpa point of view.
final_wrule
was set previously
We apply the hack only if final_wrule
is nullable
and has two symbols on its RHS.
-- luatangle: section Binarize: Final rule hack
local rh_instances = final_wrule.rh_instances
if #rh_instances >= 2 then
local instance1 = rh_instances[1]
local instance2 = rh_instances[2]
if instance1.nullable and instance2.nullable then
-- the new LHS is "at" the final symbol of the original rule
-- We will not have used this position in any of the LHS
-- symbol names above
local new_lhs, is_new
new_lhs, is_new = wsym_ensure('chaf' .. rule_id .. '@' .. original_rule_length)
assert(is_new) -- TODO remove after development
new_lhs.source = rule_source
new_lhs.nullable = true
-- Create a new final rule for the binarization
wrule_new(
{
lhs = new_lhs,
rh_instances = { instance2 },
nullable = true
}
)
-- Replace the RHS in the existing "final rule",
-- which will not longer be final
final_wrule.rh_instances = { instance1, winstance_new(new_lhs) }
end
end
-- luatangle: section Augment the working grammar
local augment_symbol
if at_top then
augment_symbol = wsym_ensure('!augment')
local start_wsym = wsym_by_name[start_symbol_name]
local nullable = start_wsym.nullable or nil
augment_symbol.nullable = nullable
augment_symbol.terminal = start_wsym.terminal
augment_symbol.name_base = grammar.name_base
augment_symbol.line = compile_call_line
wrule_new(
{
lhs = augment_symbol,
rh_instances = { winstance_new(
start_wsym,
grammar.name_base,
compile_call_line
) },
nullable = nullable
}
)
else
-- TODO
return nil,
grammar:development_error(who .. [[L0 grammars not yet implemented]])
end
Right now, cycles are dealt with in Libmarpa. In the future, I should deal with them here. That will involve splitting apart the elimination of nullables from the creation of Libmarpa rules and symbols, because cycles are easiest to dealt with in-between the two. It is easiest to spot unit rules in a grammar that is free of nulling and nullable symbols.
-- luatangle: section Create the internal grammar
grammar= kollos_c.grammar_new(grammar)
local irule_by_miid = {}
local irule_by_mxid = {}
-- luatangle: insert Define irule constructor
-- TODO: Add logic to deal with cycles
for rule_id = 1,#wrule_by_id do
local working_wrule = wrule_by_id[rule_id]
-- TODO: Not sure that I ever delete a rule, but
-- just in case.
if working_wrule then
local rh_instances = working_wrule.rh_instances
for rh_ix = 1,#rh_instances do
local rh_instance = rh_instances[rh_ix]
local element = rh_instance.element
if element.type == 'wsym' then
-- "steal" the wsym, making it an isym
element.mxid = grammar:_symbol_new()
end
-- TODO: delete after development
assert(rh_instance.type == 'isym')
end
local lhs = working_wrule.lhs
-- TODO: delete after development
if lhs.type == 'wsym' then
lhs.mxid = grammar:_symbol_new()
end
-- TODO: delete after development
assert(lhs.type == 'isym')
if #rh_instances == 1 then
irule_from_wrule(working_wrule, rh_instances[1])
else
local rh_instance1 = rh_instances[1]
local rh_instance2 = rh_instances[2]
irule_from_wrule(working_wrule, rh_instance1, rh_instance2)
if rh_instance1.nullable then
irule_from_wrule(working_wrule, rh_instance2)
end
if rh_instance2.nullable then
irule_from_wrule(working_wrule, rh_instance1)
end
end
end
end
-- TODO: At this point, all symbols are "stolen"
-- Will this always be the case? Don't know.
-- When I do know, I should either delete this
-- check or convert it to nil out the `wsym_by_name`
-- entry so that unstolen `wsym`'s may be
-- garbage-collected.
for _,wsym in pairs(wsym_by_name) do
assert(wsym.type == 'isym')
end
The next code
performs the rewrite for a block of size n
,
without separation.
Assumed to be available as an upvalue are
-
working_wrule
, the current sequence rule. -
repetend_instance
, the instance for the repetend.
-- luatangle: section Function to rewrite 1-based blocks
-- For memoizing blocks by cont
local blocks = {}
local function blk_lhs(n)
local rhs
local lhs = blocks[n]
if lhs then return lhs end
if n == 1 then
rhs = { repetend_instance }
elseif n == 2 then
if separator_instance then
rhs = { repetend_instance, separator_instance,
repetend_instance }
else
rhs = { repetend_instance, repetend_instance }
end
else
local n1 = pow2(n)
local n2 = n - n1
local lhs1 = blk_lhs(n1)
local lhs2 = blk_lhs(n2)
if separator_instance then
rhs = { winstance_new(lhs1), separator_instance,
winstance_new(lhs2) }
else
rhs = { winstance_new(lhs1), winstance_new(lhs2) }
end
end
local lhs_name = 'blk' .. n .. '!' .. repetend_instance.name
local is_new
lhs, is_new = wsym_ensure(lhs_name)
assert(is_new) -- TODO: remove after development
lhs.source = working_wrule.source
wrule_new(
{
lhs = lhs,
rh_instances = rhs
}
)
blocks[n] = lhs
return lhs
end
The next code
performs the rewrite for a range of size n
.
Assumed to be available as an upvalue are
-
working_wrule
, the current sequence rule. -
repetend_instance
, the instance for the repetend.
-- luatangle: section Function to rewrite 1-based ranges
-- For memoizing ranges by cont
local ranges = {}
local function range_lhs(n)
local lhs = ranges[n]
if lhs then return lhs end
local short_rhs, long_rhs
if n == 1 then
lhs = blk_lhs(1)
ranges[n] = lhs
return lhs
end
local lhs_name = 'rng' .. n .. '!' .. repetend_instance.name
local is_new
lhs, is_new = wsym_ensure(lhs_name)
assert(is_new) -- TODO: remove after development
lhs.source = working_wrule.source
if n == -1 then
short_rhs = { repetend_instance }
if separator_instance then
long_rhs = { winstance_new(lhs), separator_instance,
repetend_instance }
else
long_rhs = { winstance_new(lhs), repetend_instance }
end
-- n == 0 not allowed
-- n == 1 was handled above
elseif n == 2 then
short_rhs = { repetend_instance }
if separator_instance then
long_rhs = { repetend_instance, separator_instance,
repetend_instance }
else
long_rhs = { repetend_instance, repetend_instance }
end
else
local n1 = pow2(n)
local n2 = n - n1
local range_lhs1 = range_lhs(n1)
local block_lhs1 = blk_lhs(n1)
local lhs2 = range_lhs(n2)
short_rhs = { winstance_new(range_lhs1) }
if separator_instance then
long_rhs = { winstance_new(block_lhs1), separator_instance,
winstance_new(lhs2) }
else
long_rhs = { winstance_new(block_lhs1), winstance_new(lhs2) }
end
end
wrule_new(
{
lhs = lhs,
rh_instances = short_rhs
}
)
wrule_new(
{
lhs = lhs,
rh_instances = long_rhs
}
)
ranges[n] = lhs
return lhs
end
Many of the documented interfaces share
the line
, file
and throw
named argument,
and therefore also
some of the argument processing.
This code is in-lined in several places.
It expects the who
, grammar
, and args
variables to be set,
and it sets the line
and file
variables.
-- luatangle: section Common PLIF argument processing
if type(args) ~= 'table' then
return nil, grammar:development_error(who .. [[ must be called with a table of named arguments]])
end
local file = args.file
if file == nil then
file = grammar.file
end
if type(file) ~= 'string' then
return nil,
grammar:development_error(
who .. [[ 'file' named argument is ']]
.. type(file)
.. [['; it should be 'string']]
)
end
grammar.file = file
args.file = nil
local line = args.line
if line == nil then
if type(grammar.line) ~= 'number' then
return nil,
grammar:development_error(
who .. [[ line is not numeric for grammar ']]
.. grammar.name
.. [['; a numeric line number is required]]
)
end
line = grammar.line + 1
end
grammar.line = line
args.line = nil
I changed to literate programming in mid-project, and am "making my code literate" as I go. For earlier code, this means I get to it when there's a rewrite, or otherwise as occasion demands.
-- luatangle: section main
-- Kollos top level grammar routines
-- luacheck: std lua51
-- luacheck: globals bit
-- luacheck: globals __FILE__ __LINE__
local inspect = require "kollos.inspect" -- luacheck: ignore
local kollos_c = require "kollos_c"
local luif_err_development = kollos_c.error_code_by_name['LUIF_ERR_DEVELOPMENT']
local luif_err_none -- luacheck: ignore
= kollos_c.error_code_by_name['LUIF_ERR_NONE'] -- luacheck: ignore
local luif_err_duplicate_rule -- luacheck: ignore
= kollos_c.error_code_by_name['LUIF_ERR_DUPLICATE_RULE'] -- luacheck: ignore
local matrix = require "kollos.matrix"
local recce = require "kollos.recce"
local a8lex = require "kollos.a8lex"
local function here() return -- luacheck: ignore here
debug.getinfo(2,'S').source .. debug.getinfo(2, 'l').currentline
end
local rshift = bit.rshift
local lshift = bit.lshift
-- Return highest power of 2 less than its argument
-- Valid only for n>=2
local function pow2(n)
local pow = 1
while pow < n do
pow = lshift(pow, 1)
end
return rshift(pow, 1)
end
local grammar_class = { }
function grammar_class.file_set(grammar, file_name)
grammar.file = file_name or debug.getinfo(2,'S').source
end
function grammar_class.line_set(grammar, line_number)
grammar.line = line_number or debug.getinfo(2, 'l').currentline
end
-- note that a throw_flag of nil sets throw to *true*
-- returns the previous throw value
function grammar_class.throw_set(grammar, throw_flag)
local throw = true -- default is true
local old_throw_value = grammar.throw
if throw_flag == false then throw = false end
grammar.throw = throw
return old_throw_value
end
local function development_error_stringize(error_object)
return
"Grammar error at line "
.. error_object.line
.. " of "
.. error_object.file
.. ":\n "
.. error_object.string
end
function grammar_class.development_error(grammar, string, file, line)
local error_object
= kollos_c.error_new{
stringize = development_error_stringize,
code = luif_err_development,
string = string,
file = file or grammar.file,
line = line or grammar.line,
}
if grammar.throw then error(tostring(error_object)) end
return error_object
end
-- luatangle: insert declare show_dotted_rule()
-- luatangle: insert xsym constructor
-- luatangle: insert xlexeme constructors
function grammar_class.rule_new(grammar, args)
local who = 'rule_new()'
-- luatangle: insert Common PLIF argument processing
-- if line is nil, the "file" is actually an error object
if line == nil then return line, file end
local lhs = args[1]
args[1] = nil
if not lhs then
return nil, grammar:development_error([[rule must have a lhs]])
end
local field_name = next(args)
if field_name ~= nil then
return nil, grammar:development_error(who .. [[: unacceptable named argument ]] .. field_name)
end
local xrule_by_id = grammar.xrule_by_id
local new_xrule_id = #xrule_by_id + 1
local new_xrule = {
id = new_xrule_id,
line = grammar.line,
name_base = grammar.name_base,
}
setmetatable(new_xrule, {
__index = function (table, key)
if key == 'type' then return 'xrule'
elseif key == 'subname' then
local subname =
'r'
.. table.id
table.subname = subname
return subname
elseif key == 'name' then
local name =
table.name_base
.. ':'
.. table.line
.. table.subname
table.name = name
return name
end
return nil
end
})
xrule_by_id[new_xrule_id] = new_xrule
new_xrule.id = new_xrule_id
local symbol_props, symbol_error = xsym_new(grammar, { name = lhs })
if not symbol_props then
return nil, grammar:development_error(symbol_error)
end
new_xrule.lhs = symbol_props
local lhs_xrules = symbol_props.lhs_xrules
lhs_xrules[#lhs_xrules+1] = new_xrule
local current_xprec = {
level = 0,
xrule = new_xrule,
top_alternatives = {},
line = grammar.line,
name_base = grammar.name_base,
}
setmetatable(current_xprec, {
__index = function (table, key)
if key == 'type' then return 'xprec'
-- 'name' and 'subname' are computed "just in time"
-- and then memoized
elseif key == 'subname' then
local subname =
'p'
.. table.level
.. table.xrule.subname
table.subname = subname
return subname
elseif key == 'name' then
local name =
table.name_base
.. ':'
.. table.line
.. table.subname
table.name = name
return name
end
return nil
end
})
local xprec_by_id = grammar.xprec_by_id
xprec_by_id[#xprec_by_id+1] = current_xprec
grammar.current_xprec = current_xprec
new_xrule.precedences = { current_xprec }
end
function grammar_class.precedence_new(grammar, args)
local who = 'precedence_new()'
-- luatangle: insert Common PLIF argument processing
-- if line is nil, the "file" is actually an error object
if line == nil then return line, file end
local xrule_by_id = grammar.xrule_by_id
if #xrule_by_id < 1 then
return nil, grammar:development_error(who .. [[ called, but no current rule]])
end
local last_xprec = grammar.current_xprec
local new_level = last_xprec.level + 1
local current_xrule = xrule_by_id[#xrule_by_id]
local xrule_precedences = current_xrule.precedences
local new_xprec = {
xrule = current_xrule,
top_alternatives = {},
line = grammar.line,
name_base = grammar.name_base,
level = new_level,
}
setmetatable(new_xprec, getmetatable(last_xprec))
xrule_precedences[#xrule_precedences+1] = new_xprec
local field_name = next(args)
if field_name ~= nil then
return nil, grammar:development_error(who .. [[: unacceptable named argument ]] .. field_name)
end
if #xrule_by_id < 1 then
return nil, grammar:development_error(who .. [[ called, but no current rule]])
end
local xprec_by_id = grammar.xprec_by_id
xprec_by_id[#xprec_by_id+1] = new_xprec
grammar.current_xprec = new_xprec
end
local function xinstance_new(element, xalt, rh_ix)
local new_instance = {
xalt = xalt,
rh_ix = rh_ix,
element = element
}
setmetatable(new_instance, {
__index = function (table, key)
if key == 'line' then return element.line
elseif key == 'name_base' then return element.name_base
else return table.element[key] end
end
})
return new_instance
end
local function winstance_new(element, xalt, rh_ix)
assert(element)
local new_instance = {
xalt = xalt,
rh_ix = rh_ix,
element = element
}
setmetatable(new_instance, {
__index = function (table, key)
if key == 'line' then return element.line
elseif key == 'name_base' then return element.name_base
elseif key == 'element' then return nil
else return table.element[key] end
end
})
return new_instance
end
for k,v in pairs(kollos_c) do
if k:match('^[_]?grammar_') then
local c_wrapper_name = k:gsub('grammar', '', 1)
grammar_class[c_wrapper_name] = v
end
end
-- luatangle: insert RHS transitive closure function
-- luatangle: insert nullable semantics functions
-- luatangle: insert subalternative_new() internal method
-- luatangle: insert Grammar alternative_new() method
-- luatangle: insert Grammar compile() method
-- luatangle: insert Grammar constructor
Throw is always set for this method. The error is caught by the caller and re-thrown or not, as needed.
-- luatangle: section subalternative_new() internal method
local function _top_xalt_lhs(top_xalt)
local xprec = top_xalt.xprec
local xrule = xprec.xrule
return xrule.lhs
end
local function _xalt_lhs(xalt)
-- parent_instance field is only defined
-- when the xalt is the child of *exactly*
-- one instance.
local parent = xalt
while true do
local grandparent = parent.parent_instance
if not grandparent then break end
parent = grandparent.xalt
end
return _top_xalt_lhs(parent)
end
local function subalternative_new(grammar, src_subalternative)
-- use name of caller
local who = 'alternative_new()'
local new_rh_instances = {}
local current_xprec = grammar.current_xprec
local current_xrule = current_xprec.xrule
local xlhs_by_rhs = grammar.xlhs_by_rhs
-- used to form the name of the subalternative
local id_within_top_alternative = grammar.alt_id_within_top_alternative
id_within_top_alternative = id_within_top_alternative + 1
grammar.alt_id_within_top_alternative = id_within_top_alternative
local new_subalternative = {
xprec = current_xprec,
line = grammar.line,
id_within_top_alternative =
id_within_top_alternative,
name_base = grammar.name_base
}
setmetatable(new_subalternative, {
__index = function (table, key)
if key == 'type' then return 'xalt'
elseif key == 'lhs' then return _xalt_lhs(table)
elseif key == 'is_top' then return not table.parent_instance
elseif key == 'lhs_of_top' then
return _top_xalt_lhs(table)
elseif key == 'subname' then
local subname =
'a'
.. table.id_within_top_alternative
.. table.xprec.subname
table.subname = subname
return subname
elseif key == 'name' then
local name =
table.name_base
.. ':'
.. table.line
.. table.subname
table.name = name
return name
end
return nil
end
})
local xsubalt_by_id = grammar.xsubalt_by_id
xsubalt_by_id[#xsubalt_by_id+1] = new_subalternative
local new_subalternative_id = #xsubalt_by_id
new_subalternative.id = new_subalternative_id
-- maxn() is used, because the src_alternative's are
-- part of a user interface, so that there might be nil's
-- mixed in the series anywhere.
-- These are fatal errors, and we have to make sure
-- we can detect them.
for rh_ix = 1, table.maxn(src_subalternative) do
local src_rh_instance = src_subalternative[rh_ix]
if not src_rh_instance then
grammar:development_error(
[[Problem with rule rhs item #]] .. rh_ix .. ' '
.. "wrong type: "
.. type(src_rh_instance)
)
end
local new_rh_instance
if type(src_rh_instance) == 'table' then
local instance_type = src_rh_instance.type
if not instance_type then
local new_rhs_xalt = subalternative_new(grammar, src_rh_instance)
new_rh_instance = xinstance_new(new_rhs_xalt, new_subalternative, rh_ix)
new_rhs_xalt.parent_instance = new_rh_instance
else
new_rh_instance = xinstance_new(src_rh_instance, new_subalternative, rh_ix)
end
else
local error_string
local new_rhs_sym
new_rhs_sym, error_string = xsym_new(grammar, { name = src_rh_instance })
if not new_rhs_sym then
-- using return statements even for thrown errors is the
-- standard idiom, but in this case, I think it is clearer
-- without the return
grammar:development_error(
[[Problem with rule rhs item #]] .. rh_ix .. ' ' .. error_string
)
end
new_rh_instance = xinstance_new(new_rhs_sym, new_subalternative, rh_ix)
xlhs_by_rhs[new_rhs_sym.id] = current_xrule.lhs.id
end
new_rh_instances[#new_rh_instances+1] = new_rh_instance
end
new_subalternative.rh_instances = new_rh_instances
local action = src_subalternative.action
if action then
if type(action) ~= 'function' then
grammar:development_error(
who
.. [[: action must be of type function; actual type is ']]
.. type(action)
.. [[']]
)
end
new_subalternative.action = action
src_subalternative.action = nil
end
local min = src_subalternative.min
local max = src_subalternative.max
if min ~= nil and type(min) ~= 'number' then
grammar:development_error(
who
.. [[: min must be of type 'number'; actual type is ']]
.. type(min)
.. [[']]
)
end
if max ~= nil and type(max) ~= 'number' then
grammar:development_error(
who
.. [[: max must be of type 'number'; actual type is ']]
.. type(min)
.. [[']]
)
end
if min == nil then
min = 1
if max == nil then max = 1 end
elseif max == nil then max = -1 end
new_subalternative.min = min
src_subalternative.min = nil
new_subalternative.max = max
src_subalternative.max = nil
local is_sequence = min ~= 1 or max ~= 1
local separator_symbol
local separator = src_subalternative.separator
if separator ~= nil then
if not is_sequence then
grammar:development_error(
who
.. ': separator specified, but alternative is not sequence\n'
)
end
local separator_type = type(separator)
if separator_type ~= 'string' then
grammar:development_error(
who
.. ': separator is type "'
.. separator_type
.. '"; it needs to be a string'
)
end
local symbol_error
separator_symbol, symbol_error
= xsym_new(grammar, { name = separator })
if not separator_symbol then
grammar:development_error(symbol_error)
end
new_subalternative.separator = separator_symbol
src_subalternative.separator = nil
end
local separation = src_subalternative.separation
if separation ~= nil then
if not separator_symbol then
grammar:development_error(
who
.. ': separation style '
.. [["]] .. separation .. [["]]
.. ' specified, but no separator\n'
)
end
if separation == 'proper' then
separation = nil -- 'proper' is the default
elseif separation ~= 'liberal'
and separation ~= 'terminating' then
grammar:development_error(
who
.. ': unknown separation style '
.. [["]] .. separation .. [["]]
.. ' specified\n'
)
end
new_subalternative.separation = separation
src_subalternative.separation = nil
end
assert(not new_subalternative.separation or
new_subalternative.separator)
if min == 0 then
if max == 0 then
grammar:development_error(
who
.. [[: a nulling sequence, where min == max == 0, is not allowed ]]
)
end
new_subalternative.productive = true
new_subalternative.nullable = true
end
for field_name,_ in pairs(src_subalternative) do
if type(field_name) ~= 'number' then
grammar:development_error(who .. [[: unacceptable named argument ]] .. field_name)
end
end
return new_subalternative
end
-- luatangle: section Grammar alternative_new() method
function grammar_class.alternative_new(grammar, args)
local who = 'alternative_new()'
-- luatangle: insert Common PLIF argument processing
-- if line is nil, the "file" is actually an error object
if line == nil then return line, file end
grammar.alt_id_within_top_alternative = 0
local old_throw_value = grammar:throw_set(true)
local ok, new_alternative = pcall(function () return subalternative_new(grammar, args) end)
-- if ok is false, then new_alterative is actually an error object
if not ok then
if old_throw_value then error(new_alternative, 2)
else return nil, new_alternative end
end
grammar:throw_set(old_throw_value)
new_alternative.xprec = grammar.current_xprec
local xprec_top_alternatives = grammar.current_xprec.top_alternatives
xprec_top_alternatives[#xprec_top_alternatives+1] = new_alternative
local xtopalt_by_ix = grammar.xtopalt_by_ix
xtopalt_by_ix[#xtopalt_by_ix+1] = new_alternative
end
-- luatangle: section grammar miid_name() method
function grammar_class.miid_name(grammar, miid)
local isym_by_miid = grammar.isym_by_miid
local isym = isym_by_miid[miid]
if not isym then
return '[MIID' .. miid .. ']'
end
return isym.name
end
-- luatangle: section grammar show_dotted_irl() method
function grammar_class.show_dotted_irl(grammar, irl_id, dot_position)
local lhs_id = grammar:__irl_lhs(irl_id)
local irl_length = grammar:__irl_length(irl_id)
local pieces = { grammar:miid_name(lhs_id), '::=' }
if dot_position < 0 then
dot_position = irl_length
end
for ix = 0, irl_length - 1 do
local rhs_nsy_id = grammar:__irl_rhs(irl_id, ix)
local rhs_nsy_name = grammar:miid_name(rhs_nsy_id)
pieces[#pieces+1] = rhs_nsy_name
end
if dot_position then
if dot_position >= 0 then
table.insert(pieces, dot_position+3, '.')
else
pieces[#pieces+1] = '.'
end
end
return table.concat(pieces, ' ')
end
-- luatangle: section Grammar compile() method
function grammar_class.compile(grammar, args)
local who = 'grammar.compile()'
-- luatangle: insert Common PLIF argument processing
local compile_call_line = line
local at_top = false -- luacheck: ignore at_top
local at_bottom = false -- luacheck: ignore at_bottom
local start_xsym
local start_symbol_name
if args.seamless then
at_top = true -- luacheck: ignore at_top
at_bottom = true -- luacheck: ignore at_bottom
start_symbol_name = args.seamless
args.seamless = nil
start_xsym
= grammar.xsym_by_name[start_symbol_name]
if not start_xsym then
return nil, grammar:development_error(
who
.. [[ value of 'seamless' named argument must be the start symbol]]
)
end
elseif args.start then
at_top = true -- luacheck: ignore at_top
args.start = nil
return nil, grammar:development_error(
who
.. [[ 'start' named argument not yet implemented]]
)
elseif args.lexer then
at_bottom = true -- luacheck: ignore at_bottom
args.lexer = nil
return nil, grammar:development_error(
who
.. [[ 'lexer' named argument not yet implemented]]
)
else
return nil, grammar:development_error(
who
.. [[ must have 'seamless', 'start' or 'lexer' named argument]]
)
end
local field_name = next(args)
if field_name ~= nil then
return nil, grammar:development_error(who .. [[: unacceptable named argument ]] .. field_name)
end
local xtopalt_by_ix = grammar.xtopalt_by_ix
-- luatangle: insert Error routines internal to compile()
-- Check for duplicate topalt's
-- ignore min,max,action, etc.
do
local sorted_table = {}
for ix = 1,#xtopalt_by_ix do
sorted_table[ix] = xtopalt_by_ix[ix]
end
-- return true if alt1 < alt2, nil if ==, otherwise true
local function comparator(alt1, alt2)
local type = alt1.type
if type ~= alt2.type then
return type < alt2.type
end
if type == 'xlexeme' then
if alt1.spec == alt2.spec then return nil end
return alt1.spec < alt2.spec
elseif type == 'xsym' then
if alt1.name == alt2.name then return nil end
return alt1.name < alt2.name
elseif type == 'xalt' then
-- Only an xalt can be the top, so only here do we
-- worry about the LHS
local lhs_name1 = alt1.xprec.xrule.lhs.name
local lhs_name2 = alt2.xprec.xrule.lhs.name
if lhs_name1 ~= lhs_name2 then
return lhs_name1 < lhs_name2
end
local rh_instance1 = alt1.rh_instances
local rh_instance2 = alt2.rh_instances
local rhs_length = #rh_instance1
if rhs_length ~= #rh_instance2 then
return rhs_length < #rh_instance2
end
for rh_ix = 1, rhs_length do
local result = comparator(
rh_instance1[rh_ix].element,
rh_instance2[rh_ix].element
)
if result ~= nil then return result end
end
return nil
else
-- Should never happen
error("Unknown type " .. type .. " in table.sort comparator")
end
end
table.sort(sorted_table, comparator)
for ix = 1, #sorted_table-1 do
if comparator(sorted_table[ix], sorted_table[ix+1]) == nil then
return nil,
grammar:development_error(
who
.. [[ Duplicate alternatives: ]]
.. sorted_table[ix].name
.. ' and '
.. sorted_table[ix+1].name
.. '\n'
)
end
end
end
local xsym_by_id = grammar.xsym_by_id
local matrix_size = #xsym_by_id+2
-- Not the real augment symbol, but a temporary that
-- "fakes" it
local augment_symbol_id = #xsym_by_id + 1
local terminal_sink_id = #xsym_by_id + 2
local reach_matrix = matrix.init(matrix_size)
if at_top then
matrix.bit_set(reach_matrix, augment_symbol_id, start_xsym.id)
end
xrhs_transitive_closure(grammar, 'nullable')
xrhs_transitive_closure(grammar, 'productive')
-- Start symbol must be a LHS
if #start_xsym.lhs_xrules <= 0 then
return nil,
grammar:development_error(
who
.. [[ start symbol must be LHS]] .. '\n'
.. [[ start symbol is <]] .. start_xsym.name '>\n'
)
end
-- Ban unproductive symbols (and therefore rules)
-- If we allow them, we must make sure that they, all
-- all symbols and rule they recursively make
-- unproductive are not used in what follows.
-- Much of the logic requires that all symbols be
-- productive
--
-- Also, we must always make sure that the start symbol
-- is productive
for symbol_id = 1,#xsym_by_id do
local symbol_props = xsym_by_id[symbol_id]
if not symbol_props.productive then
return nil,
grammar:development_error(
who
.. [[ unproductive symbol: ]]
.. symbol_props.name
)
end
end
-- Ban nullable precedenced rules.
-- They are just too confusing. Tf the user
-- *really* is sure they want it, and that she
-- knows what she is doing, she can write it out
-- in BNF.
-- Also, ensure that precedenced LHS is not shared
-- with any other rule. Again, this reduces confusion.
-- There is no loss of generality. Any grammar which
-- breaks this rule can be rewritten
-- by adding a dedicated LHS symbol for the
-- precedenced rule.
-- This can be done while preserving the semantics.
-- Also, this loop labels all precedenced alternatives
-- with their precedence level, in preparation for later
-- processing
local xrule_by_id = grammar.xrule_by_id
for xrule_id = 1,#xrule_by_id do
local xrule = xrule_by_id[xrule_id]
local precedences = xrule.precedences
-- If it is a rule with multiple precedences
if #precedences > 1 then
local lhs = xrule.lhs
if lhs.nullable then
return report_nullable_precedenced_xrule(grammar, xrule)
end
if #lhs.lhs_xrules > 1 then
return report_shared_precedenced_lhs(grammar, xrule, lhs)
end
end
end
-- Note -- using the nullability
-- of the subalternative is not very useful,
-- even as a clue, because the subalternative
-- may be nullable because min=0, or
-- non-nullable because of the separator
--
-- If the user really wants sequences with nullable
-- repetends, and knows what she is doing,
-- then she can write them out in BNF.
local xsubalt_by_id = grammar.xsubalt_by_id
for subalt_id = 1,#xsubalt_by_id do
local xsubalt = xsubalt_by_id[subalt_id]
if xsubalt.min ~= 1 or xsubalt.max ~= 1
then
-- Check to see if there are any
-- indelible (= non-nullable) elements in the
-- repetend
local item_is_indelible = false
local rh_instances = xsubalt.rh_instances
for rh_ix = 1,#rh_instances do
local rhs_instance = rh_instances[rh_ix]
if not rhs_instance.element.nullable then
item_is_indelible = true
break
end
end
if not item_is_indelible then
return report_nullable_repetend(grammar, xsubalt)
end
end
end
-- Nullable semantics is unique
for symbol_id = 1,#xsym_by_id do
local symbol_props = xsym_by_id[symbol_id]
if symbol_props.nullable then
local semantics, error_object
= find_nullable_semantics(grammar, symbol_props)
if not semantics then
-- the lower level did not throw the error, so it
-- should not be thrown
return nil, error_object
end
symbol_props.semantics = semantics
end
end
local xlhs_by_rhs = grammar.xlhs_by_rhs
for symbol_id = 1,#xsym_by_id do
local symbol_props = xsym_by_id[symbol_id]
-- every symbol reaches itself
matrix.bit_set(reach_matrix, symbol_id, symbol_id)
for _,lhs_id in pairs(xlhs_by_rhs) do
matrix.bit_set(reach_matrix, lhs_id, symbol_id)
end
if #symbol_props.lhs_xrules <= 0 then
matrix.bit_set(reach_matrix, symbol_id, terminal_sink_id)
symbol_props.productive = true
end
if symbol_props.lexeme then
matrix.bit_set(reach_matrix, augment_symbol_id, symbol_id)
end
end
--[[
All symbols are assumed to be productive, so if any reaches an indelible
element, then we mark it as reaching a terminal. This means no symbol
which derives it can be nulling. Indelible (or terminal) elements
include not just non-LHS symbols, but also charclasses and strings --
--]]
for xsubalt_id = 1,#xsubalt_by_id do
local xsubalt = xsubalt_by_id[xsubalt_id]
local separator = xsubalt.separator
local lhs_reaches_terminal = false
if separator and not separator.nullable then
lhs_reaches_terminal = true
else
local rh_instances = xsubalt.rh_instances
for rh_ix = 1,#rh_instances do
local rhs_instance = rh_instances[rh_ix]
if not rhs_instance.element.nullable then
lhs_reaches_terminal = true
break
end
end
end
if lhs_reaches_terminal then
local lhs = xsubalt.lhs
matrix.bit_set(reach_matrix, lhs.id, terminal_sink_id)
lhs.productive = true
end
end
matrix.transitive_closure(reach_matrix)
for symbol_id = 1,#xsym_by_id do
local symbol_props = xsym_by_id[symbol_id]
-- Later, make it so some symbols can be set to be "inaccessible ok"
if not matrix.bit_test(reach_matrix, augment_symbol_id, symbol_id) then
grammar:development_error(
who
.. "Symbol " .. symbol_props.name .. " is not accessible",
symbol_props.name_base,
symbol_props.line
)
end
-- Since all symbols are now productive, a symbol is nulling iff
-- it is nullable and does NOT reach a terminal
if symbol_props.nullable and
not matrix.bit_test(reach_matrix, symbol_id, terminal_sink_id)
then
symbol_props.nulling = true
end
-- A nullable lexeme is a fatal error
if #symbol_props.lhs_xrules <= 0 then
if symbol_props.nullable then
grammar:development_error(
who
"Symbol " .. symbol_props.name .. " is a nullable lexeme",
symbol_props.name_base,
symbol_props.line
)
end
symbol_props.terminal = true
end
end
--[[ COMMENTED OUT
for from_symbol_id,from_symbol_props in ipairs(xsym_by_id) do
for to_symbol_id,to_symbol_props in ipairs(xsym_by_id) do
if matrix.bit_test(reach_matrix, from_symbol_id, to_symbol_id) then
print( from_symbol_props.name, "reaches", to_symbol_props.name)
end
end
end
--]]
if start_xsym.nulling then
print(inspect(start_xsym))
grammar:development_error(
who
.. "Start symbol " .. start_xsym.name .. " is nulling\n"
.. " This is not yet implemented",
start_xsym.name_base,
start_xsym.line
)
end
local wrule_by_id = {}
local wsym_by_name = {}
-- luatangle: insert wsym constructors
-- luatangle: insert wrule constructor
-- luatangle: insert xnone object
-- luatangle: insert ilexeme constructor
-- luatangle: insert declare wrule_from_xalt_new()
local precedenced_instances = {}
local function gather_precedenced_instances(xalt, lhs_id, in_seq)
local rh_instances = xalt.rh_instances
for rh_ix = 1,#rh_instances do
local rh_instance = rh_instances[rh_ix]
local element = rh_instance.element
local type = element.type
if type == 'xsym' then
if element.id == lhs_id then
if in_seq then rh_instance.in_seq = true end
precedenced_instances[#precedenced_instances+1] = rh_instance
end
elseif type == 'xalt' then
local element_is_seq = element.min ~= 1 or element.max ~= 1
gather_precedenced_instances(element, lhs_id, in_seq or element_is_seq)
end
end
end
-- This logic
-- relies on a precedenced xrule having a dedicated LHS
for xrule_id = 1,#xrule_by_id do
local xrule = xrule_by_id[xrule_id]
local precedences = xrule.precedences
local lhs = xrule.lhs
-- If it is a rule with multiple precedences
-- Create "precedence ladder" of wrules,
-- and precedenced symbols
if #precedences > 1 then
local top_precedence_level = precedences[#precedences].level
lhs.top_precedence_level = top_precedence_level
for prec_ix = 1, #precedences do
local xprec = precedences[prec_ix]
local level = xprec.level
local alternatives = xprec.top_alternatives
for alt_ix = 1, #alternatives do
local alternative = alternatives[alt_ix]
alternative.precedence_level = level
precedenced_instances = {}
gather_precedenced_instances(
alternative, lhs.id,
false)
if #precedenced_instances > 0 then
if level == 0 then
-- Need a more precise explanation of what kind of
-- this *is* OK at precedence 0
grammar:development_error(
who
.. "Recursive alternative " .. alternative.name .. " at precedence 0\n"
.. " That is not allowed\n"
.. " Precedence 0 is the bottom level\n",
alternative.name_base,
alternative.line
)
end
local assoc = alternative.assoc or 'left'
-- First set them all to the next lower precedence
local associator_ix
-- Default is top level, so we do nothing for 'group' precedence
if assoc == 'left' then associator_ix = 1
elseif assoc == 'right' then associator_ix = #precedenced_instances
end
-- Second, check, correct and mark the associator
if associator_ix then
--
for ix = 1,#precedenced_instances do
precedenced_instances[ix].precedence_level = level-1
end
local associator_instance = precedenced_instances[associator_ix]
if associator_instance.in_seq then
-- At some point, I might add logic to allow associators
-- in sequences with min>=1, automatically rewriting to
-- break out the singleton. But the automatic
-- rewrite would be complicated, what with the various
-- sequence-separation and -termination options,
-- and it is quite possible the user is writing the rule
-- that way because he has not thought the matter through.
-- If the user does know what he is doing,
-- it may be best to require him to write out explicitly what
-- he wants.
grammar:development_error(
who
.. 'Precedence ' .. assoc .. '-associator is inside a sequence\n'
.. ' That is not allowed\n'
.. ' Marpa must be able to find a unique '
.. assoc
.. '-associator\n'
.. ' Possible solution: rewrite so that an unique '
.. assoc
.. ' is outside the sequence\n',
associator_instance.name_base,
associator_instance.line
)
end
associator_instance.precedence_level = level
associator_instance.associator = assoc
end
end
end
end
local next_ladder_lhs = cloned_wsym_ensure(lhs)
do
-- We need a symbol for the top precedence level
-- in addition to the original symbol
local wsym_props = wsym_ensure(lhs.name)
wsym_props.xsym = lhs
wsym_props.precedence_level = top_precedence_level
for level = top_precedence_level,0,-1 do
wsym_props = precedenced_wsym_ensure(lhs, level)
wsym_props.xsym = lhs
wsym_props.precedence_level = level
wrule_new{lhs = next_ladder_lhs,
rh_instances = {winstance_new(wsym_props)},
}
next_ladder_lhs = wsym_props
end
end
end
end
for topalt_ix = 1,#xtopalt_by_ix do
local xtopalt = xtopalt_by_ix[topalt_ix]
wrule_from_xalt_new(xtopalt)
end
-- will be used to ensure names are unique
local unique_number = 0
-- We change wrule_by_id as we proceed, and
-- so do cannot use ipairs
for rule_id = 1,#wrule_by_id do
local working_wrule = wrule_by_id[rule_id]
-- As of this writing, no wrules should be
-- deleted at this point,
-- but we are being careful
if working_wrule then
-- TODO -- split rules with internal nulling
-- events, and convert that event to
-- a completion event
local min = working_wrule.min
local separator = working_wrule.separator
local max = working_wrule.max
-- luatangle: insert disallow nulling separator
if min <= 0 then
min = 1
working_wrule.nullable = nil
end
-- Do not need to fix working_wrule.min
-- because we will no longer use it
-- Rewrite sequence rules
if min ~= 1 or max ~= 1 then
local rh_instances = working_wrule.rh_instances
local separation = working_wrule.separation
-- luatangle: insert Force singleton RHS in working_rule
-- luatangle: insert Create the repetend instance
-- luatangle: insert Create the separator instance if needed
-- luatangle: insert Normalize separation
-- luatangle: insert Rewrite the sequence counts
end
end
end
-- luatangle: insert Check and expand lexemes
-- luatangle: insert Binarize the working grammar
-- luatangle: insert Augment the working grammar
-- luatangle: insert Create the internal grammar
-- pairs(), because 0 is a valid id
for _,irule in pairs(irule_by_mxid) do
if irule then print(irule.desc) end
end
-- luatangle: insert Census fields
local start_mxid = wsym_by_name['!augment'].mxid
grammar:_start_symbol_set(start_mxid)
local status = grammar:_precompute()
if not status then
return nil, kollos_c.error_new{
"grammar precomputation failed",
debug.getinfo(2,'S').source,
debug.getinfo(2, 'l').currentline
}
end
local isym_by_mxid = {}
local isym_by_miid = {}
local mxids_by_cc = {}
for _,isym in pairs(wsym_by_name) do
local mxid = isym.mxid
local miid = kollos_c._grammar_xsy_nsy(grammar, isym.mxid)
isym_by_mxid[mxid] = isym
isym.miid = miid
isym_by_miid[miid] = isym
if isym.lexeme_type == 'cc' then
local cc_spec = isym.spec
local mxids = mxids_by_cc[cc_spec]
if not mxids then
mxids_by_cc[cc_spec] = { mxid }
else
mxids[#mxids+1] = mxid
end
end
end
local start_miid
local nsy_count = kollos_c._grammar_nsy_count(grammar)
-- print('NSY count', nsy_count)
for nsy_id = 0,nsy_count-1 do
if kollos_c._grammar_nsy_is_nulling(grammar, nsy_id) ~= 0 then
error('nulling NSY: ' .. nsy_id)
end
if kollos_c._grammar_nsy_is_start(grammar, nsy_id) ~= 0
then
start_miid = nsy_id
end
-- if start_miid ~= nsy_id and not isym_by_miid[nsy_id]
-- then
-- print('start NSY: ', nsy_id)
-- end
end
local irl_count = kollos_c._grammar_irl_count(grammar)
-- print('IRL count', irl_count)
for miid = 0,irl_count-1 do
local mxid = kollos_c._grammar_source_xrl(grammar, miid)
if not mxid then
local lhs = kollos_c._grammar_irl_lhs(grammar, miid)
if lhs ~= start_miid then
error('no-XRL IRL: lhs is ' .. lhs)
end
else
local irule = irule_by_mxid[mxid]
irule.miid = miid
irule_by_miid[miid] = irule
end
end
grammar.isym_by_name = wsym_by_name
grammar.isym_by_miid = isym_by_miid
grammar.isym_by_mxid = isym_by_mxid
grammar.irule_by_miid = irule_by_miid
grammar.irule_by_mxid = irule_by_mxid
grammar.mxids_by_cc = mxids_by_cc
grammar.inner_g = inner_g
grammar.default_lexer_factory = a8lex.factory
return grammar
end
-- luatangle: section Grammar constructor
-- this will actually become a method of the config object
local function grammar_new(config, args) -- luacheck: ignore config
local who = 'grammar_new()'
local grammar = {
throw = true,
name = '[NEW]',
name_base = '[NEW]',
xrule_by_id = {},
xprec_by_id = {},
xtopalt_by_ix = {},
xsubalt_by_id = {},
_type = 'grammar',
lexemes_by_type = {},
next_lexeme_id = 0,
xsym_by_id = {},
xsym_by_name = {},
-- maps LHS id to RHS id
xlhs_by_rhs = {},
}
setmetatable(grammar, {
__index = grammar_class,
})
if not args.file then
return nil, grammar:development_error(
who .. [[ requires 'file' named argument]],
debug.getinfo(2,'S').source,
debug.getinfo(2, 'l').currentline
)
end
if not args.line then
return nil, grammar:development_error(
who .. [[ requires 'line' named argument]],
debug.getinfo(2,'S').source,
debug.getinfo(2, 'l').currentline
)
end
-- luatangle: insert Common PLIF argument processing
-- if line is nil, the "file" is actually an error object
if line == nil then return line, file end
local name = args.name
if not name then
return nil, grammar:development_error([[grammar must have a name]])
end
if type(name) ~= 'string' then
return nil, grammar:development_error([[grammar 'name' must be a string]])
end
if name:find('[^a-zA-Z0-9_]') then
return nil, grammar:development_error(
[[grammar 'name' characters must be ASCII-7 alphanumeric plus '_']]
)
end
if name:byte(1) == '_' then
return nil, grammar:development_error([[grammar 'name' first character may not be '_']])
end
args.name = nil
grammar.name = name
-- This is used to name child objects of the grammar
-- For now, it is just the name of the grammar.
-- Someday I may create a method that allows it to be changed.
grammar.name_base = name
local field_name = next(args)
if field_name ~= nil then
return nil, grammar:development_error([[grammar_new(): unacceptable named argument ]] .. field_name)
end
return grammar
end
-- luatangle: section+ main
-- luatangle: insert grammar miid_name() method
-- luatangle: insert grammar show_dotted_irl() method
grammar_class.recce_new = recce.new
grammar_class.a8lex_new = a8lex.new
local grammar_static_class = {
new = grammar_new,
show_dotted_rule = show_dotted_rule
}
return grammar_static_class
--luatangle: write stdout main
-- luatangle: section Error routines internal to compile()
local function report_nullable_precedenced_xrule(grammar_arg, xrule)
local precedences = xrule.precedences
local nullable_alternatives = {}
for prec_ix = 1, #precedences do
local xprec = precedences[prec_ix]
local alternatives = xprec.top_alternatives
for alt_ix = 1, #alternatives do
local alternative = alternatives[alt_ix]
nullable_alternatives[#nullable_alternatives+1] = alternative
if #nullable_alternatives >= 3 then break end
end
if #nullable_alternatives >= 3 then break end
end
local error_table = {
'grammar_new():' .. 'precedenced rule is nullable',
' That is not allowed',
[[ The rule is ]] .. xrule.name
}
for ix = 1, #nullable_alternatives do
error_table[#error_table+1]
= ' Alternative ' .. nullable_alternatives[ix].name .. " is nullable"
end
-- For now, report just the rule.
-- At some point, find one of the alternatives
-- which was nullable, and report that
return nil,
grammar_arg:development_error(
table.concat(error_table, '\n'),
xrule.name_base,
xrule.line
)
end
local function report_nullable_repetend(grammar_arg, xsubalt)
local error_table = {
'grammar.compile():' .. 'sequence repetend is nullable',
' That is not allowed',
[[ The sequence is ]] .. xsubalt.name
}
return nil,
grammar_arg:development_error(
table.concat(error_table, '\n'),
xsubalt.name_base,
xsubalt.line
)
end
local function report_shared_precedenced_lhs(grammar_arg, precedenced_xrule, lhs)
local error_table = {
'grammar.compile():' .. 'precedenced rule shares LHS with another rule',
' That is not allowed: a precedenced rule must have a dedicated LHS',
[[ The precedenced rule is ]] .. precedenced_xrule.name,
[[ The LHS is ]] .. lhs.name,
[[ This LHS is shared with the rule ]] .. lhs.name,
}
-- just show at most 3 other rules
local xrules = lhs.xrules
local shown_count = 0
while shown_count >= 3 do
local other_xrule = xrules[shown_count+1]
if other_xrule == nil then break end
if other_xrule ~= precedenced_xrule then
error_table[#error_table+1] =
[[ This LHS is shared with the rule ]] .. other_xrule.name
shown_count = shown_count + 1
end
end
return nil,
grammar_arg:development_error(
table.concat(error_table, '\n'),
precedenced_xrule.name_base,
precedenced_xrule.line
)
end