In [1]:
from IPython.core.display import HTML
with open ("../../style.css", "r") as file:
    css = file.read()
HTML(css)

# Converting a Grammar into <span style="font-variant:small-caps;">Html</span>

Store your grammar in the file `Grammar.g4`.  This grammar should describe the structure of the grammar file for the language 
`C` that is contained in the file 
<a href="https://github.com/karlstroetmann/Formal-Languages/blob/master/Exercises/Grammar2HTML-Antlr/c-grammar.g"><tt>c-grammar.g</tt></a>.

Your grammar <b style="color:red">must not</b> use the string `rule` as a variable name.  The reason is that `rule` is a variable that is already used in the parser generated by 
<span style="font-variant:small-caps;">Antlr</span>.

Your grammar should generate an abstract syntax tree that conforms to the following type specification:
```
Grammar: List<Rule>
Rule:    Pair<String, List<Body>>
Body:    List<Item>
Item:    Pair<'var', String> + Pair<'token', String> + Pair<'literal', String>
```
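
To make the type specification concrete, here is a small sketch of a value that conforms to it, together with a checker.  It assumes that `Pair` is represented as a Python tuple and `List` as a Python list, and that literals keep their quotes; the helper names `is_item`, `is_rule`, and `is_grammar` are hypothetical and not part of the exercise.

```python
ITEM_KINDS = {'var', 'token', 'literal'}

def is_item(x):
    """Item: a pair whose first component is 'var', 'token', or 'literal'."""
    return (isinstance(x, tuple) and len(x) == 2
            and x[0] in ITEM_KINDS and isinstance(x[1], str))

def is_rule(x):
    """Rule: a pair of a variable name and a list of bodies (lists of items)."""
    if not (isinstance(x, tuple) and len(x) == 2):
        return False
    head, bodies = x
    return (isinstance(head, str) and isinstance(bodies, list)
            and all(isinstance(b, list) and all(is_item(i) for i in b)
                    for b in bodies))

def is_grammar(x):
    """Grammar: a list of rules."""
    return isinstance(x, list) and all(is_rule(r) for r in x)

# Assumed encoding of the rule for primary_expression from c-grammar.g.
example_grammar = [
    ('primary_expression',
     [[('token', 'IDENTIFIER')],
      [('token', 'CONSTANT')],
      [('token', 'STRING_LITERAL')],
      [('literal', "'('"), ('var', 'expression'), ('literal', "')'")]])
]

print(is_grammar(example_grammar))
```

A checker like this can be applied to `grammar.result` after parsing to confirm that the actions in `Grammar.g4` build the expected structure.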

In [3]:
!cat Grammar.g4

grammar Grammar;

start returns [result]
    : g=grammatik {$result = $g.result}
    ;



In [None]:
!type Grammar.g4

The file `c-grammar.g` contains a context-free grammar for the language `C`.

In [2]:
!cat c-grammar.g

primary_expression
	: IDENTIFIER
	| CONSTANT
	| STRING_LITERAL
	| '(' expression ')'
	;

postfix_expression
	: primary_expression
	| postfix_expression '[' expression ']'
	| postfix_expression '(' ')'
	| postfix_expression '(' argument_expression_list ')'
	| postfix_expression '.' IDENTIFIER
	| postfix_expression '->' IDENTIFIER
	| postfix_expression '++'
	| postfix_expression '--'
	;

argument_expression_list
	: assignment_expression
	| argument_expression_list ',' assignment_expression
	;

unary_expression
	: postfix_expression
	| '++' unary_expression
	| '--' unary_expression
	| unary_operator cast_expression
	| 'sizeof' unary_expression
	| 'sizeof' '(' type_name ')'
	;

unary_operator
	: '&'
	| '*'
	| '+'
	| '-'
	| '~'
	| '!'
	;

cast_expression
	: unary_expression
	| '(' type_name ')' cast_expression
	;

multiplicative_expression
	: cast_expression
	| multiplicative_expression '*' cast_expression
	| multiplicative_expression '/' cas

In [None]:
!type c-grammar.g

Our goal is to convert this grammar into an <span style="font-variant:small-caps;">Html</span> <a href="c-grammar.html">file</a>.

We start by generating both scanner and parser.  

In [None]:
!antlr4 -Dlanguage=Python3 Grammar.g4

In [None]:
from GrammarLexer  import GrammarLexer
from GrammarParser import GrammarParser
import antlr4

The function `grammar_2_string` takes a list of grammar rules as its input and renders these rules as an <span style="font-variant:small-caps;">Html</span> file. 

In [None]:
def grammar_2_string(grammar):
    result  = ''
    result += '<html>\n'
    result += '<head>\n'
    result += '<title>Grammar</title>\n'
    result += '</head>\n'
    result += '<body>\n'
    result += '<table>\n'
    for rule in grammar:
        result += rule_2_string(rule)
    result += '</table>\n'
    result += '</body>\n'
    result += '</html>\n'
    return result

The function `rule_2_string` takes a grammar rule $r$ as its input and transforms this rule into an <span style="font-variant:small-caps;">Html</span> 
string.  Here the grammar rule $r$ has the form
$$ r = (V, L) $$
where $V$ is the name of the variable defined by $r$ and $L$ is a list of <em style="color:blue">grammar rule bodies</em>.  A single grammar rule
body is a list of <em style="color:blue">grammar items</em>.  A grammar item is either a non-terminal, a token or a literal.

In [None]:
def rule_2_string(rule):
    head, body = rule
    result  = ''
    result += '<tr>\n'
    result += '<td style="text-align:right"><a name="' + head + '"><em>' + head + '</em></a></td>\n'
    result += '<td><code>:</code></td>\n'
    result += '<td>' +  body_2_string(body[0]) + '</td>'
    result += '</tr>\n'
    for i in range(1, len(body)):
        result += '<tr><td></td><td><code>|</code></td><td>'
        result += body_2_string(body[i])
        result += '</td></tr>\n'
    result += '<tr><td></td><td><code>;</code></td></tr>\n\n'
    return result
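
The table layout produced for a single rule can be illustrated without any markup.  The sketch below uses a hypothetical helper `rule_rows` (not part of the notebook) that returns one triple per table row: the first body is introduced by `:`, every further body by `|`, and the rule is closed by a row containing `;`.  The encoding of the example rule, including the quoted literals, is an assumption.

```python
def rule_rows(rule):
    """Return the rows of a rule as (name, separator, body text) triples."""
    head, bodies = rule
    rows = []
    for index, body in enumerate(bodies):
        # An empty body is shown as a comment, mirroring body_2_string.
        text = ' '.join(name for _kind, name in body) or '/* empty */'
        if index == 0:
            rows.append((head, ':', text))
        else:
            rows.append(('', '|', text))
    rows.append(('', ';', ''))
    return rows

# Assumed encoding of the rule for cast_expression from c-grammar.g.
example_rule = ('cast_expression',
                [[('var', 'unary_expression')],
                 [('literal', "'('"), ('var', 'type_name'),
                  ('literal', "')'"), ('var', 'cast_expression')]])

for row in rule_rows(example_rule):
    print(row)
```

Each triple corresponds to one `<tr>` element with three `<td>` cells in the output of `rule_2_string`.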

The function `body_2_string` takes a list of grammar items as its input and turns them into an <span style="font-variant:small-caps;">Html</span> string.

In [None]:
def body_2_string(body):
    result = ''
    if len(body) > 0:
        for item in body:
            result += item_2_string(item) + ' '
    else:
        result += '<code>/* empty */</code>'
    return result

The function `item_2_string` takes a grammar item as its input and turns the item into an <span style="font-variant:small-caps;">Html</span> string.
An item represents either a non-terminal or a terminal.  If it represents a non-terminal it has the form
$$(\texttt{'var'}, \textrm{name}) $$
where $\textrm{name}$ is the name of the variable. Otherwise it has the form
$$(\textrm{kind}, \textrm{name}), $$
where $\textrm{kind}$ is either `token` or `literal`.

In [None]:
def item_2_string(item):
    kind, content = item
    if kind == 'var':
        return '<a href="#' + content + '"><em>' + content + '</em></a>'
    else:
        return '<code>' + content + '</code>'
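
The grammar in `c-grammar.g` contains literals such as `'&'` and `'->'`, whose characters have a special meaning in <span style="font-variant:small-caps;">Html</span>.  The following sketch shows one possible way to escape them with `html.escape` from the Python standard library.  The function name `item_2_string_escaped` is hypothetical, and the sketch assumes that a literal is stored together with its quotes.

```python
import html

def item_2_string_escaped(item):
    """Variant of item_2_string that escapes the characters &, <, and >."""
    kind, name = item
    # quote=False leaves the single quotes around literals untouched.
    safe = html.escape(name, quote=False)
    if kind == 'var':
        return '<a href="#' + safe + '"><em>' + safe + '</em></a>'
    return '<code>' + safe + '</code>'

print(item_2_string_escaped(('literal', "'&'")))
print(item_2_string_escaped(('literal', "'->'")))
print(item_2_string_escaped(('var', 'expression')))
```

Browsers usually tolerate unescaped characters inside `<code>` elements, so the escaping mainly matters if the generated file should be valid <span style="font-variant:small-caps;">Html</span>.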

In [None]:
def main():
    input_stream  = antlr4.FileStream('c-grammar.g')
    lexer         = GrammarLexer(input_stream)
    token_stream  = antlr4.CommonTokenStream(lexer)
    parser        = GrammarParser(token_stream)
    grammar       = parser.start()
    result        = grammar_2_string(grammar.result)
    # The with statement closes the file, so the output is flushed to disk.
    with open('c-grammar.html', 'w') as file:
        file.write(result)

In [None]:
main()

In [None]:
!open c-grammar.html

In [None]:
!explorer c-grammar.html

The command below cleans the directory.  If you are running Windows, you have to replace `rm` with `del`.

In [None]:
!rm *.py *.tokens *.interp