In [None]:
%load_ext nb_mypy

# Parsing Regular Expressions

The grammar for regular expressions is stored in the file `RegularExpressions.g4`. 

In [None]:
!cat -n RegularExpressions.g4

We start by generating both the scanner and the parser.  

In [None]:
!antlr4 -Dlanguage=Python3 RegularExpressions.g4

The files `RegularExpressionsLexer.py` and `RegularExpressionsParser.py` contain the generated scanner and parser, respectively.  We have to import the classes defined in these files.  Furthermore, the runtime of 
<span style="font-variant:small-caps;">Antlr</span>
needs to be imported.

In [None]:
from RegularExpressionsLexer  import RegularExpressionsLexer
from RegularExpressionsParser import RegularExpressionsParser
import antlr4

In [None]:
# A nested tuple is either a string or a tuple whose components are nested tuples.
# The quoted name is a forward reference that makes the alias recursive.
NestedTuple = str | tuple['NestedTuple', ...]
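
The type `NestedTuple` is recursive: a value of this type is either a string or a tuple whose components are themselves nested tuples.  To illustrate which values fit this description, here is a minimal sketch of a runtime check.  The helper name `is_nested_tuple` is hypothetical, and the example values are made up for illustration; they are not actual output of the parser.

```python
def is_nested_tuple(x: object) -> bool:
    """Return True if x is a string or a tuple built only from nested tuples."""
    if isinstance(x, str):
        return True
    if isinstance(x, tuple):
        # every component has to be a nested tuple itself
        return all(is_nested_tuple(y) for y in x)
    return False

# Made-up values for illustration; these are not actual parser output.
print(is_nested_tuple('a'))                     # a plain string
print(is_nested_tuple(('+', 'a', ('*', 'b'))))  # a tuple containing a tuple
print(is_nested_tuple(('+', 'a', 42)))          # 42 is neither a string nor a tuple
```

The first two calls should yield `True`, while the last one should yield `False`.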

The function `ast` takes a string `s` as input.  This string is parsed as a regular expression and the resulting abstract syntax tree is returned as a nested tuple. 

In [None]:
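def ast(s: str) -> NestedTuple:
    "Parse the string `s` and return its abstract syntax tree."
    input_stream  = antlr4.InputStream(s)                     # wrap the string as a character stream
    lexer         = RegularExpressionsLexer(input_stream)     # the scanner splits the characters into tokens
    token_stream  = antlr4.CommonTokenStream(lexer)           # buffer the tokens for the parser
    parser        = RegularExpressionsParser(token_stream)
    # invoke the grammar rule `regExp` and read its `result` attribute
    return parser.regExp().result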

In [None]:
%nb_mypy Off

In [None]:
ast('a+b*⋅(a+b⋅b*)')
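
Since the result of `ast` is a nested tuple, it can be processed by simple recursive functions.  The following is a minimal sketch that does not depend on the grammar: the helpers `depth` and `leaves` are hypothetical names, and the tuple `example` is made up for illustration.  The actual shape of the tuples returned by `ast` is determined by the actions in `RegularExpressions.g4`.

```python
def depth(t) -> int:
    """Return the nesting depth of a nested tuple; a plain string has depth 0."""
    if isinstance(t, str):
        return 0
    # an empty tuple has no components, hence default=0
    return 1 + max((depth(c) for c in t), default=0)

def leaves(t) -> list[str]:
    """Return all strings occurring in a nested tuple, from left to right."""
    if isinstance(t, str):
        return [t]
    return [s for c in t for s in leaves(c)]

# A made-up nested tuple; the real output of `ast` may be shaped differently.
example = ('+', 'a', ('⋅', ('*', 'b'), 'c'))
print(depth(example))
print(leaves(example))
```

For the made-up tuple above, `depth` should return `3` and `leaves` should return the six strings in their order of appearance.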

In [None]:
!rm *.py *.tokens *.interp
!rm -r __pycache__/

In [None]:
!ls -l