Writing concise grammar specifications

Table of Contents

Shorthand affix domain declarations
Left side affix in rule alternatives
EBNF operators

The previous tutorials have used quite verbose grammar specification syntax to let the reader concentrate on the new approach of the Extended Affix Grammars while knowing how context-free grammars are noted in a BNF like syntax.

This, however, comes with some superfluous repetitions and requires additional non-terminals which can be omitted by using EBNF like hyper grammar specifications.

But first some other valuable short-hands for concise grammar specifications.

Shorthand affix domain declarations

At affix signature declarations (e.g., at the left hand side of a rule definition) almost the meta non-terminal domain must be given, separated by a semicolon. It’s very similar to a type definition in a programming language. E.g.

DeclAppl <+ Table: Table>:
    Statements<, Table>
.

Here, the output affix on the left hand side of the DeclAppl rule declares that the synthesized affix form must be in the Table domain. Means, the affix form used at that affix position must be derivable by the Table meta rule.

However, if the affix form is just an affix variable and not a real sentence form, what is quite often the case in EAG grammar specifications, the affix domain declaration can be omitted. It will be automatically derived from the affix variable name. So, the equivalent short-hand for the rule above is

DeclAppl <+ Table>:
    Statements<, Table>
.

This also works for affix variables with an index:

DeclAppl <+ Table1>:
    Statements<, Table1>
.

Left side affix in rule alternatives

In the DECL/APPL language the Statements rules were originated from the following BNF grammar rules.

Statements:
  "DECL" id Statements.
Statements:
  "APPL" id Statements.
Statements: .

Making it a hyper grammar rule by adding affix parameters, it became

Statements<- Table, + Table1>:
    "DECL" id<id> Statements<id ";" Table, Table1>.
Statements<- Table, + Table1>:
    "APPL" id<id> Find<id, Table> Statements<Table, Table1>.
Statements <- Table, + Table>: .

as seen in the example/bnf/decl-appl.eag grammar file.

The original BNF rules can also be noted in BNF as alternatives in one rule, what is more concise:

Statements:
    "DECL" id Statements.
  | "APPL" id Statements.
  |
  .

For a corresponding hyper grammar rule with alternatives the left hand side affix parameters before the colon needs to be added to the alternative specifications. The EAG language syntax at Gamma defines, they must be at the first position of each alternative. So, the 3 Statements hyper rules from above are equivalent to

Statements:
    | <- Table, + Table1>
        "DECL" id<id> Statements<id ";" Table, Table1>.
    | <- Table, + Table1>
        "APPL" id<id> Statements<Table, Table1>.
    | <- Table, + Table>
    .

EBNF operators

But there is an even more concise grammar specification possible if grammars are specified in the Extended Backus Naur Form (EBNF).

As hyper rules with EBNF operators are initially hard to cover, especially due to the affix parameters, we try to explain it her in two different approaches: First, by an example where the semantic of the specified language is quite clear and the reader may derive the semantic of EBNF operators and their associated affix parameters from accompanying explanation. Second, defining the meaning of an EBNF hyper rule example by the lowering in a corresponding BNF hyper rule example.

For the second approach, the EBNF rules may be found at example/ebnf.eag. Whereas the corresponding, equally named BNF hyper rule may be found at example/bnf/ebnf.eag.

Note	EBNF operator are only available for hyper grammar rules, not for meta grammar rules!

Repetitions

Repetitions are used quite often in formal language grammars. They become quite handy where a list of an undefined length needs to be defined, e.g. parameter lists or statement lists.

The challenge for hyper rules is to get all the needed affix parameters in and to understand their referential semantics.

Repetitions explained by examples

The BNF rules for the DECL/APPL language are as follows, including the Statements rules from above.

DeclAppl:
  Statements.

Statements:
  "DECL" id Statements.
Statements:
  "APPL" id Statements.
Statements: .

In EBNF The Statements rules can be written down in a more concise way as one rule:

Statements:
        { "DECL" id
        | "APPL" id
        }
        .

When looking into the BNF-like hyper rule, there are a lot of affix parameters to be distributed if we want to annotate that EBNF rule with affix parameters. Here are the original BNF-like rules, including the start non-terminal DeclAppl but excluding the Find predicate (which is not important in this context):

DeclAppl <+ Table>:
     Statements<, Table>
.

Statements<- Table, + Table1>:
   "DECL" id<id> Statements<id ";" Table, Table1>.
Statements<- Table, + Table1>:
   "APPL" id<id> Statements<Table, Table1>.
Statements <- Table, + Table>: .

First, every affix parameter list from the BNF-like hyper rules will occur somewhere in the EBNF-like hyper rule. The affix parameters of the hyper rule’s left hand sides get moved into the front of every alternative, means, behind the leading { or behind each |. The affix parameters after the recursive Statements application gets moved to the end of each EBNF alternative before the subsequent | or the trailing }. The single affix parameter at the final, empty word deriving Statement rule which finalizes the recursion gets moved behind the final trailing }. Here is the resulting EBNF hyper rule:

Statements<- Table, + Table1>:
    <Table, Table1>
        { <- Table, + Table1>
            'DECL' id<id>
            <id ';' Table, Table1>
        | <- Table, + Table1>
            'APPL' id<id>
            <Table, Table1>
        } <- Table, + Table>.

Try to think of the affix parameters in the sense of affix data flows as we know it from method parameters: The value of the first Table affix parameter at Statements non-terminal at left hand side of the rule is flowing to the first Table parameter at affix parameter list before the { as both are defined as input affix (-). The value of Table1 affix variable is flowing from the second affix parameter in front of { to the output (+) affix parameter Table1 at the Statements affix parameter list.

The curly braces {} however open new name spaces for affix variables, one per alternative. This mean, first name space starts from the opening curly brace { and ends at the subsequent pipe alternative operator |. The second name space starts at this pipe alternative operator and goes until the final closing brace }. The last name space covers the affix parameter list behind the closing brace. Hence, the Table1 affix variable at the first alternative is not the same as the Table1 affix variable at the second alternative, especially they are not required to hold the same affix value. Both could have an different index and the semantic of the whole EBNF rule would be the same.

E.g. the DECL alternative:

        { <- Table, + Table1>
            'DECL' id<id>
            <id ';' Table, Table1>
        |

Here, the Table's affix value flows into the affix form of the first affix parameter id ';' Table just before the pipe operator synthesizing a new affix value. Secondly, the value of the id affix variable is flowing from id hyper non-terminal to the same affix form Table is flowing in. Finally, the value of the Table1 affix variable coming out from recursion affix parameter before the pipe alternative operator is flowing upwards into the output affix parameter Table1 at the affix parameter declaration for this alternative behind the { opening brace.

The affix value flow for the APPL alternative is very similar. While at the final recursion closure affix parameters < - Table, + Table> behind the closing brace is just passing the incoming Table affix value into the outgoing Table affix variable.

Of course, the rule’s left hand side affix parameter and the first affix parameter before the { look redundant. And in fact, they are. It can be written as follows:

Statements:
    { <- Table, + Table1>
        'DECL' id <id>
        NotAlreadyDeclared  <id, Table>
        <id ';' Table, Table1>
    | <- Table, + Table1>
        'APPL' id <id>
        Declared <id, Table>
        <Table, Table1>
    } <- Table, + Table>.

This construct is called a named EBNF operator. Sometimes it make sense having it explicitly specified, e.g in case of naming a concept explicitly or, if it is reused at several rules. But if not, like here, it can be written even more concise by eliminating the Statements non-terminal completely and make it an anonymous non-terminal at the referencing rule, right at the DeclAppl rule:

DeclAppl <+ Table>:
    <, Table>
        { <- Table, + Table1>
            'DECL' id<id>
            <id ';' Table, Table1>
        | <- Table, + Table1>
            'APPL' id<id>
            <Table, Table1>
        } <- Table, + Table>.

Here, the initial value originally passed as affix to the Statements non-terminal at the DeclAppl rule becomes the input affix to the anonymous non-terminal representing the EBNF operation.

An example of the DECL/APPL language can be examined at the following Gitpod workspace: https://gitpod.io/#https://github.com/linkrope/gamma/blob/wiki-1.0.0/example/decl-appl.eag. Please not, that it contains the additional predicates Declared, NotAlreadyDeclared_, and Find for semantic checks which have been let out here for tutorial reasons.

Note	The generation of the compiler runs automatically when the according EAG file has been opened. Normally, the deep Gitpod link from above should open it implicitly. The generated and compiled compiler executable is accessible under the `build` directory.

Repetition explained by lowering to BNF

As said before, repetitions, as an EBNF operation, can be defined and may be understood through their BNF counterparts. E.g., the EBNF hyper rule

Repetition:
    { <+ 'a' L: L> 'a' <L>
    | <+ 'b' L: L> 'b' <L>
    } <+ : L>.

is equal to the BNF hyper rules

Repetition <+ 'a' L: L>:
    'a' Repetition <L>.
Repetition <+ 'b' L: L>:
    'b' Repetition <L>.
Repetition <+ : L>:
    .

The affix parameters are the same in both specifications but appear on different positions. However, their semantic is the same. The alternatives in the EBNF repetition hyper rule map directly to the BNF-like hyper rules with the same affix parameters.

Repetitions as an anonymous non-terminal

If the BNF-like Repetition non-terminal from above is used embedded in another rule (the leading and trailing '"' represent arbitrary left and right context), e.g.

RepetitionSubrule <+L>:
    '"'
    Repetition <L>
    '"'.

it may be replaced by its EBNF counter part:

RepetitionSubrule <+L>:
    '"'
    <L>
        { <+ 'a' L: L> 'a' <L>
        | <+ 'b' L: L> 'b' <L>
        } <+ : L>
    '"'.

In that case it is called an anonymous repetition non-terminal. The affix parameter list of that anonymous non-terminal is the <L> in front of the repetition operation.

Options

There are similar concise notations for options, instead of braces {}, brackets [] are used. Option is a special case of a repetition with one or no repetition. The affix positions are used similar. Just the recursion closure affix before each | and the trailing ] is omitted.

Options explained by example

Here is an example for an optional sign, intended to be used in front of a term:

SimpleExpr <+ sign>:
    <sign> [ <+ 'plus': sign> '+' | <+ 'minus': sign> '-' ] <+ 'plus': sign>
    Term.

sign = 'plus' | 'minus'.

Term: 'x'.

The language accepts only the terms + x, - x, and x and returns how the term is signed as compilation result. If no sign is given, the term is treated as a positive, c.f. a plus is returned as compilation result. This is specified in the empty optional alternative behind the closing ] operator. Term stands for any, normally more complex right context of the sign symbol.

For a working example try it out at the following Gitpod workspace: https://gitpod.io/#https://github.com/linkrope/gamma/blob/wiki-1.0.0/example/ebnf-option.eag

Note	The generation of the compiler runs automatically when the according EAG file has been opened. Normally, the deep Gitpod link from above should open it implicitly. The generated and compiled compiler executable is accessible under the `build` directory.

Here is an example output from that workspace:

gitpod /workspace/gamma/build (master) $ echo - x | ./SimpleExpr
info: SimpleExpr compiler (generated with epsilon)
minus
gitpod /workspace/gamma/build (master) $ echo + x | ./SimpleExpr
info: SimpleExpr compiler (generated with epsilon)
plus
gitpod /workspace/gamma/build (master) $ echo x | ./SimpleExpr
info: SimpleExpr compiler (generated with epsilon)
plus
gitpod /workspace/gamma/build (master) $

Options explained by lowering to BNF

As before, we want to define the meaning of an EBNF optional operator by giving the semantically equivalent BNF rules.

The following hyper rule with an optional operation

Option:
    [ <+ 'a': L> 'a'
    | <+ 'b': L> 'b'
    ] <+ : L>.

is equivalent the following set of BNF-like hyper rules:

Option <+ 'a': L>:
    'a'.
Option <+ 'b': L>:
    'b'.
Option <+ : L>:
    .

Again all affix parameters on both representations have the same meaning, the alternatives in the EBNF option hyper rule map directly to the BNF-like hyper rules with the same affix parameters.

Options as an anonymous non-terminal

If the BNF-like Option non-terminal from above is used embedded in another rule (the leading and trailing '"' represent arbitrary left and right context), e.g.

OptionSubrule <+ L>:
    '"'
    Option <L>
    '"'.

it may be replaced by its EBNF counter part:

OptionSubrule <+ L>:
    '"'
    <L>
        [ <+ 'a': L> 'a'
        | <+ 'a': L> 'b'
        ] <+ : L>
    '"'.

In that case it is called an anonymous option non-terminal. The affix parameter list of that anonymous non-terminal is the <L> in front of the option operation.

Grouping

Grouping is especially intended to group alternatives in more complex rules, where one of them is mandatory to be chosen.

Grouping explained by example

As an example, let’s specify a reduced expression language where terms can be combined by an operation.

Expr <+ op>:
    Term
    <op> ( <+ 'addition': op> '+' | <+ 'subtraction': op> '-' )
    Term.

op = 'addition' | 'subtraction'.

Term: 'x'.

The language accepts only the expressions x + x and x - x and returns as compilation result whether the expression is an addition or a subtraction. Term stands for any, normally more complex left and right context for the specified operation rule.

Again, you may try this out in a working Gitpod workspace: https://gitpod.io/#https://github.com/linkrope/gamma/blob/wiki-1.0.0/example/ebnf-grouping.eag.

Note	The generation of the compiler runs automatically when the according EAG file has been opened. Normally, the deep Gitpod link from above should open it implicitly. The generated and compiled compiler executable is accessible under the `build` directory.

Here is an example output from that workspace:

gitpod /workspace/gamma/build (master) $ echo x + x | ./Expr
info: Expr compiler (generated with epsilon)
addition
gitpod /workspace/gamma/build (master) $ echo x - x | ./Expr
info: Expr compiler (generated with epsilon)
subtraction
gitpod /workspace/gamma/build (master) $

Grouping explained by lowering to BNF

Also here, we want to defined the semantics of the EBNF grouping operator by lowering it to the corresponding BNF-like hyper rule.

The EBNF group rule

Group:
    ( <+ 'a': L> 'a'
    | <+ 'b': L> 'b'
    ).

is equivalent to

Group <+ 'a': L>:
    'a'.
Group <+ 'b': L>:
    'b'.

Grouping as an anonymous non-terminal

If the BNF-like Group nonterminal from above is used embedded in another rule (the leading and trailing '"' represent arbitrary left and right context), e.g.

GroupSubrule <+ L>:
    '"'
    Group <L>
    '"'.

it may be replaced by its EBNF counter part:

GroupSubrule <+ L>:
    '"'
    <L>
        ( <+ 'a': L> 'a'
        | <+ 'b': L> 'b'
        )
    '"'.

In that case it is called an anonymous group non-terminal. The affix parameter list of that anonymous non-terminal is the <L> in front of the grouping operation.