-
Notifications
You must be signed in to change notification settings - Fork 1
Introduction to Extended Affix Grammars
Parser generators are quite well known: They process a context free grammar specification to generate a parser.
The generated parser processes an arbitrary input text deciding whether this text is a sentence of the language described by the context free grammar the parser generator has generated the parser for. And if so, it almost generate a parse tree representing the input structurally.
A compiler generator, however, consist of an parser generator accompanied with an evaluator generator. The constructed parstree built by the generated parser is passed to the generated evaluator which performs semantic checks and generates the output in the specified target language.
However, what is the input to a compiler generator?
It must be partly a context free grammar specification. But this is not sufficient! Almost all real compilers are written for languages (like all general purpose programming languages) which are not context free, they are context-sensitive.
The specification for an compiler generator must belong to the class of context-sensitive grammars to be powerfull enough to define a context-sensitive language.
A quite famous context-sensitive language example which is not context-free is
E.g., words of this language are "abc", "aabbcc", "aaabbbccc" and so on. The number of "a", "b" and "c" characters is always equal in a word that belongs to that language.
This cannot be specified by a context-free grammar!
Let’s try to describe this language using a context-free grammar. We will use formalism similar to the BNF grammar formalism.
A: "a" A | .
B: "b" B | .
C: "c" C | .
Possible word derivation using that grammar are
S →* aaabbc
Both words are part of the language defined by our context-free grammar. However, the second word is not part of our context-sensitive language { anbncn | n ≥ 1 } — the number of "a", "b" and "c" characters are not equal! The language defined by this context free grammar has "more" words than the context-sensitive one we are looking for.
So, the the context-free grammar from above is not sufficient to describe our language. We somehow need to "count" the "a", "b" and "c" characters and compare the counts.