In [None]:
%%HTML
<style>
.container { width: 100% }
</style>

This example has been extracted from the official documentation of Ply.

### A tokenizer for numbers and the arithmetical operators `+`, `-`, `*`, `/`.

The module `ply.lex` contains the code that is necessary to create a scanner.

In [None]:
import ply.lex as lex

We start with a definition of the <em style="color:blue">token names</em>.  By convention, token names are written in
upper case.  Every name in this list has to match the name of one of the rules `t_NAME` defined below.

In [None]:
tokens = [
   'NUMBER',
   'PLUS',
   'MINUS',
   'TIMES',
   'DIVIDE',
   'LPAREN',
   'RPAREN'
]

Next, we define the regular expressions that specify the tokens to be recognized.  Each definition is named `t_`
followed by the token name.  Note that the characters `+`, `*`, `(`, and `)` have to be prefixed with a backslash
since these characters are also used as operators in regular expressions.

In [None]:
t_PLUS    = r'\+'
t_MINUS   = r'-'
t_TIMES   = r'\*'
t_DIVIDE  = r'/'
t_LPAREN  = r'\('
t_RPAREN  = r'\)'
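
The need for the backslash can be checked with Python's own `re` module, independently of Ply.  The following is a minimal sketch; the names `bare_plus_ok` and `escaped_plus` are chosen only for this illustration.

```python
import re

# A bare '+' is a quantifier with nothing to repeat, so it is not a valid pattern.
try:
    re.compile(r'+')
    bare_plus_ok = True
except re.error:
    bare_plus_ok = False

# The escaped form matches the literal character '+'.
escaped_plus = re.compile(r'\+')

print(bare_plus_ok)                             # False
print(escaped_plus.fullmatch('+') is not None)  # True

# '-' and '/' have no special meaning at this position and need no backslash.
print(re.fullmatch(r'-', '-') is not None)      # True
print(re.fullmatch(r'/', '/') is not None)      # True
```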

If we need to transform a token, we can define the token via a function.  In that case, the first statement of the
function has to be a string containing a regular expression, i.e. the docstring of the function.  This regular
expression then defines the token.  After that, we can add code to transform the token.  The string that makes up the
token is stored in `t.value`.  Below, this string is transformed into an integer.  The function has to return `t`,
since otherwise the token is discarded.

In [None]:
def t_NUMBER(t):
    r'0|[1-9][0-9]*'
    t.value = int(t.value)
    return t
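
To see which strings the regular expression `0|[1-9][0-9]*` accepts, it can be tried out with the `re` module.  The sketch below is only a stand-in for the scanner: the helper `split_numbers` is hypothetical and repeatedly matches the pattern at the current position, which is assumed to mirror what the scanner does for this single rule.

```python
import re

NUMBER = re.compile(r'0|[1-9][0-9]*')

def split_numbers(text):
    """Split a string of digits into the integers matched one after another."""
    pos = 0
    values = []
    while pos < len(text):
        match = NUMBER.match(text, pos)
        if match is None:
            raise ValueError(f'no number at position {pos}')
        values.append(int(match.group()))
        pos = match.end()
    return values

print(split_numbers('10'))   # [10]
print(split_numbers('007'))  # [0, 0, 7]
```

A number with leading zeros is therefore not read as one number: each leading `0` is matched on its own.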

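The rule below is used to keep track of line numbers.  The matched string `t.value` consists only of newline characters,
so its length is the number of lines that have been passed.  Since the function does not return a token, the newlines
themselves are discarded.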

In [None]:
def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)
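
The counting idea can be sketched without Ply.  The function `line_of` below is a hypothetical helper for this illustration: it adds the length of every run of newlines that precedes a given index, just as the rule above adds `len(t.value)` to the line counter.

```python
import re

NEWLINES = re.compile(r'\n+')

def line_of(text, index):
    """Return the 1-based line number of the character text[index]."""
    lineno = 1
    # Only runs of newlines located before `index` are taken into account.
    for match in NEWLINES.finditer(text, 0, index):
        lineno += len(match.group())
    return lineno

text = 'a\n\nb\nc'
print(line_of(text, text.index('b')))  # 3
print(line_of(text, text.index('c')))  # 4
```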

The variable `t_ignore` specifies those characters that should be discarded.
In this case, spaces and tabs are ignored.

In [None]:
t_ignore  = ' \t'

All characters not recognized by any of the defined tokens are handled by the function `t_error`.
The call `t.lexer.skip(1)` skips the character that has not been recognized, so that scanning resumes
with the next character.

In [None]:
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)
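
How ignored characters and illegal characters interact with the token rules can be sketched as a simple loop over the input.  This is an assumption-laden stand-in and not Ply's actual implementation: the names `TOKEN_RE`, `IGNORE`, and `scan` are hypothetical, and only three of the token rules are included.

```python
import re

# One named group per token rule (only a subset of the rules above).
TOKEN_RE = re.compile(r'(?P<NUMBER>0|[1-9][0-9]*)|(?P<PLUS>\+)|(?P<MINUS>-)')
IGNORE = ' \t'

def scan(text):
    """Return the recognized (type, string) pairs and the illegal characters."""
    tokens, errors = [], []
    pos = 0
    while pos < len(text):
        if text[pos] in IGNORE:       # corresponds to t_ignore
            pos += 1
            continue
        match = TOKEN_RE.match(text, pos)
        if match is None:             # corresponds to t_error followed by skip(1)
            errors.append(text[pos])
            pos += 1
            continue
        tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens, errors

tokens, errors = scan('3 + $4')
print(tokens)  # [('NUMBER', '3'), ('PLUS', '+'), ('NUMBER', '4')]
print(errors)  # ['$']
```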

Below, the function `lex.lex()` creates the lexer specified above.  This function expects the code to be part of
some Python file.  Since the code is located in a Jupyter notebook instead, we have to set the variable `__file__`
manually to fool the system into believing that the code given above is located in a file called `hugo`.

In [None]:
__file__ = 'hugo'
lexer = lex.lex()

Let's test it with the following string:

In [None]:
data = '3 + 4 * 10 + 007 + (-20) * 2'

Let us feed the lexer with the string `data`.

In [None]:
lexer.input(data)

Put the lexer to work.  Iterating over the lexer yields the tokens one by one.

In [None]:
for tok in lexer:
    print(tok)