Skip to content

How to Specify Compilation Results

Denis Kuniss edited this page Jan 24, 2023 · 12 revisions

From the introduction to Extended Affix Grammars (EAG) it should become clear that the formalism of EAG in general consist of 3 conceptional parts:

  • a meta grammar

  • a hyper grammar annotated with sentence forms of the language defined by the meta grammar, called affix forms

  • the principle of consistent substitutions

So, it’s obvious that the compilation result must be expressed using these conceptional parts. Mainly the first two are intended to do that.

Compilation Result Example

The simple example we want to go trough here is the recognition of an infix expression language which is mapped to an equivalent sentence of the Reversed Polish Notation (RPN). The RPN is often used in stack languages, e.g. Forth.

Let’s first show an example of an intended compilation. E.g., the add expression of the input language

1 + 0 + 1

would be equal to the RPN expression

1 ENTER 0 ENTER ADD 1 ENTER ADD

Think of the RPN expression as a stack processing machine: First 1 is entered on the stack, then 0 is entered on the stack. Now the ADD operation is performed on the 2 top most stack elements and the result is again put on the stack. After that 1 is entered on the stack. And again the 2 top most elements on the stack are added by applying the ADD operation.

But the semantic of the stack machine is not of interest here. We want to compile the first expression into the second. This is very similar what real compilers do.

First we need a meta grammar for our output language, the target language. It is given as

The meta grammar of the target language

Code = Code Code "ADD" | Number "ENTER".
Number = "0" | "1" .

To let the example stay small, we only allow 0 and 1 as terminals.

Now the context-free grammar for the input infix expression language, the source language, needs to be specified.

Expr: Term ExprTail.
Term: number.
ExprTail: "+" Term ExprTail | .
number: "0" | "1".

Again we restrict the terminals to 0 and 1 to get the example small.

Now we want to combine both grammars into an EAG grammar to specify how sentences of the source language are mapped to sentences of the target language.

The Expr start symbol and the mandatory output affix

We start with the start symbol of the context-free grammar and enrich it by an affix parameter.

Expr <+ Code2: Code>:
	Term <Code1> ExprTail<Code1, Code2>

The start symbol Expr of the context-free grammar is annotated by the affix parameter <+Code2: Code>. As mentioned in the introduction the start symbol must always have exactly one output affix parameter. The plus sign specifies: it is an output affix parameter. It is followed by an affix variable of the Code domain. It is a variable because it has the index 2 behind the meta rule non-terminal Code implying it is representing a sentence of the language defined by the Code rule from the meta grammar. The index is used to differentiate potentially different sentences of the same language domain.

That way, the sentences of the Code language domain represented by Code1 at Term<Code1> is potentially different than that represented by Code2 at the start symbol affix. At least we do not request by this EAG grammar specification that they have to be equal.

Let’s examine, how the hyper non-terminal Term is defined for understanding how it is used in the context of the start symbol rule. It is defined as follows.

Term <+ Number "ENTER": Code>:
    number <Number>.

At our example grammar a number is recognized as Term. It is written in lower case, means, it is a lexical rule. So, no white-spaces would be expected between terminals specified in this rule. Any white-space characters would end the number terminals recognition.

However, in our simple grammar this is not important as number recognizes only one character, either "0" or "1" as seen in the number rule.

number<+ "0": Number>: "0".
number<+ "1": Number>: "1".

In case the hyper terminal symbol "0" is recognized — that one on the right hand side of the rule, right from the 2nd colon — the meta terminal "0" — the "0" at the affix behind the "+" sign — is "created", we say synthesized. The meta terminal "0" is part of the Number domain which is defined as meta grammar rule

Number = "0" | "1".

The meta terminal "0" is synthesized because the affix at the number hyper non-terminal is an output affix, means, their affix values are consumed in the context of the occurrences of the non-terminal number in other rules, e.g. at the Term rule in our case.

The hyper rule for number can also be specified in a shorter form which will be especially helpful if the number of alternatives rises:

number:
      <+ "0": Number> "0"
    | <+ "1": Number> "1".

Now, as we know how number is defined, we can understand how it is applied in the context of the Term rule.

The Term hyper rule

Term <+ Number "ENTER": Code>:
    number <Number>.

Every Term hyper rule application recognizes a number by applying the number hyper rule which provides an affix value from the Number domain. This affix value is used at the output affix of the Term hyper non-terminal to synthesize the affix form Number "ENTER" which consists of the value represented by the affix variable Number and the meta terminal "ENTER". This affix form is a sentence form of the meta non-terminal rule Code, we saw already above. Here the second alternative:

Code = Code Code "ADD" | Number "ENTER".

Again we step up in the grammar, now knowing, how the hyper non-terminal Term is defined, and what values its affix provides to the context of its application in other rules, e.g. in the Expr hyper rule.

Expr <+ Code2: Code>:
	Term <Code1> ExprTail<Code1, Code2>

The affix variable Code1 at the Term hyper non-terminal in this rule is provided by an output affix. If it occurs a second time in this rule it is either consumed or needs to be compared for equality according to the principle of consistent substitution we talked about in the introduction.

The ExprTail hyper rule

So, to understand what is applied, consumption or comparison for equality, we need to examine how the hyper non-terminal ExprTail is defined at which affix the variable Code1 appears the second time. ExprTail is defined as

ExprTail<- Code1, + Code3>:
    "+" Term<Code2> ExprTail<Code1 Code2 "ADD", Code3>.
ExprTail<- Code, + Code>: .

The first affix of the ExprTail hyper non-terminal is an input affix due to the "-" sign. This means the value of the affix is consumed inside the rule. You can imagine this like the affix value is "passed in" into the rule from the context of the ExprTail's application in other rules. From that we know that in the hyper rule Expr the value of the affix Code1 is consumed when ExprTail get applied. Whereas, the affix variable Code3 gets "returned" to a rule context when ExprTail is applied inside a rule.

Back to the Expr rule.

Expr <+ Code2: Code>:
	Term <Code1> ExprTail<Code1, Code2>

From examining the affix signature of the ExprTail hyper non-terminal we know now, that the affix form represented by Code1 is provided to the rule context of Expr by the hyper non-terminal Term. This affix form gets "passed in", means consumed by the hyper-nonterminal ExprTail which itself provides the affix form represented by Code2 to this rule context.

As the output affix of the Expr hyper non-nonterminal on the rule’s left hand side is referencing the affix variable Code2 too, the affix form provided by ExprTail under Code2 is passed up to Expr. And because Expr is our start symbol, the affix form represented by Code2 is the result of the compilation.

The result of the compilation is always written out to the console.

Now, while we have understood, how the affix are handled at the start symbol rule, it is valuable step down in grammar and examine the last remaining rule - ExprTail.

ExprTail<- Code1, + Code3>:
    "+" Term<Code2> ExprTail<Code1 Code2 "ADD", Code3>.
ExprTail<- Code, + Code>: .

Before, we examined only the affix signature of that hyper non-terminal to understand what gets "passed in" to that hyper non-terminal and what "comes out".

The rule ExprTail consists of 2 alternatives. The first alternative first recognizes a plus "+" character at our source language followed by a Term recognition (please remember, Term provides an affix form in Code2 to this rule context). Subsequently, followed by a recursive reference to the ExprTail hyper non-terminal. This reference has 2 affix: <Code1 Code2 "ADD", Code3>.

The first one synthesizes an affix form which consists of the affix form represented by Code1 passed in to this rule context from ExprTail's application context. The second part of the synthesized affix form is the affix variable Code2 provided by Term before. The synthesized affix form is finished with the ADD meta terminal. This affix form matches the first alternative of the intially mentioned meta rule for Code

Code = Code Code "ADD" | Number "ENTER".

at our meta grammar. Because, the affix domain (you may call it - the affix type) of the first affix of ExprTail is Code. So, any synthesized affix form on that position must be a sentence form of the language created by Code. This is checked by the compiler generator as a compiler would check the type in static programming languages.

The affix form represented by the Code3 affix variable, at the ExprTail reference on the right hand side of the rule, is provided to this rule context. And, by referencing it at the affix of the left hand side of the rule, the affix form is passed up to the application context of the hyper non-terminal ExprTail at other rules.

As the first alternative of the ExprTail rule is specified right-recursive, a termination rule for that recursion is needed, repeated here:

ExprTail<- Code, + Code>: .

This rule recognizes the empty word at our source language. If so, it just returns the affix from passed in and represented by the Code affix variable back to its context, unchanged.

The affix data flow graphs

So, at the end the affix data flow applied at the ExprTail hyper rule is like this:

ExprTail rule affix data flow

And finally, we can examine the affix data flow at the Expr hyper rule as follows:

Expr rule affix data flow

The the affix data flow per hyper rule form an affix dependency graph which is input to the Evaluator Generator mentioned in the introduction chapter. By sorting the affix dependencies topologically and in case they are cycle free an affix graph traversal can be computed during generation time which makes the compiler run very effectively.

However, the graphs may be more sophisticated than here which requires classes of generators which are able to deal with that graphs. The arbitrary class defines the type of EAG grammar it is able to deal with. The Gamma compiler generator implements 3 different classes of such evaluator generators:

  • SLEAG – Strong Left Defined EAGs
    For generating single pass compilers, recursive decent parser with integrated affix evaluation restricted to grammars with left defined affixes.

  • Single Sweep EAGs
    Like SLEAG but analysis of affixes in any permutation, only one visit down stream.

  • SOAG - Sequential Orientable EAGs
    Most sophisticated affix analysis technique.
    More than a “multi pass” compiler.

Running the compiler generator on the example

Time to run compiler gnerator…​

Just click the following link. It will open a full blown VS code editor in your browser.

DISCLAIMER
A free registration at gitpod.io is required when following the link!

The compiler generator itself will already be pre-compiled. Just go to the console and run the compiler generator on the expr-bnf.eag example source as shown in the following console snapshot.

gitpod /workspace/gamma (master) $ ./gamma -s example/bnf/expr.eag
info: Epsilon 1.02   JoDe/SteWe  22.11.96
info: SOAG-Evaluatorgenerator 1.06 dk 14.03.98
info: Analysing Expr
info: Expr grammar is valid
info: Analyser: no warnings on Expr's hyper-nonterminals
info: predicates in Expr: 0
info: ELL(1) testing Expr
info: Expr grammar is ELL(1)
info: SLAG testing Expr
info: Expr grammar is SLAG
info: LexGen writing Expr
info: ELL(1) writing Expr
info: +rc +ct
dmd Expr.d ExprLex.d -g include/runtime.d src/io.d src/log.d src/epsilon/soag/listacks.d
gitpod /workspace/gamma (master) $

Now, the generated and already compiled compiler is available as ./Expr executable. It behaves like a Unix command such that input source can just be redirect to it as an input stream:

gitpod /workspace/gamma (master) $ echo 1 + 0 + 1 | ./Expr
info: Expr compiler (generated with epsilon)
1 ENTER 0 ENTER ADD 1 ENTER ADD
gitpod /workspace/gamma (master) $

The compiler produces the specified output in the RPN target language:

    1 ENTER 0 ENTER ADD 1 ENTER ADD

As an exercise, you may try to add further digits to the example. Don’t forget to specify them at the meta grammar as well as at the hyper grammmar! The editor will show warnings and errors in case the specification is wrong or incomplete helping to get it right.