In [1]:
%%HTML
<style>
.container { width: 100% }
</style>

### A tokenizer for numbers and the arithmetical operators `+`, `-`, `*`, `/`.

In [2]:
import ply.lex as lex

We start by defining the *token names*.  PLY requires that they are stored in a variable named `tokens`.

In [3]:
tokens = (
   'NUMBER',
   'PLUS',
   'MINUS',
   'TIMES',
   'DIVIDE',
   'LPAREN',
   'RPAREN',
)

Next, we define the regular expressions that specify the tokens to be recognized.
A variable named `t_X` holds the regular expression for the token `X`.
Note that the characters `+`, `*`, `(`, and `)` have to be prefixed with a backslash since they
are also operators in the syntax of regular expressions.

In [4]:
t_PLUS    = r'\+'
t_MINUS   = r'-'
t_TIMES   = r'\*'
t_DIVIDE  = r'/'
t_LPAREN  = r'\('
t_RPAREN  = r'\)'
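
The need for the backslashes can be checked with Python's standard `re` module alone. The following is a minimal sketch that does not use PLY; the helper `is_valid_pattern` is a hypothetical name introduced only for this illustration.

```python
import re

def is_valid_pattern(pattern):
    """Return True if `pattern` compiles as a regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# '-' and '/' have no special meaning outside a character class,
# so they are valid patterns without a backslash.
print([op for op in '-/' if is_valid_pattern(op)])

# '+', '*', '(' and ')' are regex operators: on their own they are
# not valid patterns, but the escaped versions match the literal character.
for op in '+*()':
    print(op, is_valid_pattern(op), re.fullmatch('\\' + op, op) is not None)
```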

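The token `NUMBER` is defined by a function so that the rule can carry some action code.
The regular expression is given as the docstring of the function, and the action converts the matched string into an integer.
Note that this regular expression does not admit leading zeros.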

In [5]:
def t_NUMBER(t):
    r'0|[1-9][0-9]*'
    t.value = int(t.value)
    return t
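
To see what excluding leading zeros means in practice, the regular expression can be applied repeatedly to a string with the standard `re` module. This is only a sketch of the matching behaviour and does not use PLY; `split_numbers` is a hypothetical helper.

```python
import re

NUMBER = re.compile(r'0|[1-9][0-9]*')

def split_numbers(text):
    """Match NUMBER again and again, each time starting where the last match ended."""
    pos, values = 0, []
    while pos < len(text):
        m = NUMBER.match(text, pos)
        if m is None:
            break  # the character at `pos` cannot start a number
        values.append(int(m.group()))
        pos = m.end()
    return values

print(split_numbers('10'))   # [10]
print(split_numbers('007'))  # [0, 0, 7]
```

The string `007` is therefore not read as the number 7 but as the three numbers 0, 0, and 7.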

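We define a rule for newlines so that the lexer can keep track of line numbers.
Since the function does not return anything, the newlines are discarded and no token is produced.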

In [6]:
def t_newline(t):
    r'\n+'
    t.lexer.lineno += len(t.value)
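
A token printed by PLY has the form `LexToken(type, value, lineno, lexpos)`, where `lexpos` is the offset of the token in the whole input and not a column number. If a column is needed, it can be computed from the input text and `lexpos`. The sketch below uses plain string methods; the function names `find_column` and `find_line` are chosen for this illustration.

```python
def find_column(text, lexpos):
    """Return the 1-based column of the character at offset `lexpos`."""
    # rfind returns -1 if there is no newline before lexpos,
    # so line_start is 0 for the first line.
    line_start = text.rfind('\n', 0, lexpos) + 1
    return lexpos - line_start + 1

def find_line(text, lexpos):
    """Return the 1-based line number of the character at offset `lexpos`."""
    return text.count('\n', 0, lexpos) + 1

text = '\n3 + 4 * 10 + 007\n  + (-20)*2 abc\n'
print(find_line(text, 1), find_column(text, 1))    # 2 1
print(find_line(text, 20), find_column(text, 20))  # 3 3
```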

The variable `t_ignore` is a string containing the characters that are skipped (spaces and tabs).

In [7]:
t_ignore  = ' \t'

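The function `t_error` handles errors: it is called when no rule matches, reports the offending character, and skips it.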

In [8]:
def t_error(t):
    print("Illegal character '%s'" % t.value[0])
    t.lexer.skip(1)
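
The interplay of ignored characters and the skip-one-character error recovery can be imitated with the standard `re` module. The following is a simplified sketch and not how PLY is implemented: the names `TOKEN`, `IGNORE`, and `scan` are hypothetical, all operators are lumped into a single group `OP`, and newlines are simply ignored instead of being counted.

```python
import re

# One combined pattern with a named group per kind of token.
TOKEN = re.compile(r'(?P<NUMBER>0|[1-9][0-9]*)|(?P<OP>[-+*/()])')
IGNORE = ' \t\n'

def scan(text):
    """Return the list of (kind, lexeme) pairs and the list of illegal characters."""
    tokens, errors, pos = [], [], 0
    while pos < len(text):
        if text[pos] in IGNORE:
            pos += 1            # skip ignored characters silently
            continue
        m = TOKEN.match(text, pos)
        if m is None:
            errors.append(text[pos])
            pos += 1            # error recovery: skip one character
            continue
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens, errors

tokens, errors = scan('2 abc * 3')
print(tokens)  # [('NUMBER', '2'), ('OP', '*'), ('NUMBER', '3')]
print(errors)  # ['a', 'b', 'c']
```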

Build the lexer.  This code is expected to be part of a Python file, but in a notebook it is not.
Therefore, we have to set the variable `__file__` manually to fool the system into believing that the code
given above is located in a file called `hugo`.

In [9]:
__file__ = 'hugo'
lexer = lex.lex()

Let's test it with the following string:

In [10]:
data = \
'''
3 + 4 * 10 + 007
  + (-20)*2 abc
'''

Let us feed the lexer with the string `data`.

In [11]:
lexer.input(data)

Put the lexer to work.  Each token is printed in the form `LexToken(type, value, lineno, lexpos)`.

In [12]:
for tok in lexer:
    print(tok)

LexToken(NUMBER,3,2,1)
Illegal character ' '
LexToken(PLUS,'+',2,3)
Illegal character ' '
LexToken(NUMBER,4,2,5)
Illegal character ' '
LexToken(TIMES,'*',2,7)
Illegal character ' '
LexToken(NUMBER,10,2,9)
Illegal character ' '
LexToken(PLUS,'+',2,12)
Illegal character ' '
LexToken(NUMBER,0,2,14)
LexToken(NUMBER,0,2,15)
LexToken(NUMBER,7,2,16)
Illegal character ' '
Illegal character ' '
LexToken(PLUS,'+',3,20)
Illegal character ' '
LexToken(LPAREN,'(',3,22)
LexToken(MINUS,'-',3,23)
LexToken(NUMBER,20,3,24)
LexToken(RPAREN,')',3,26)
LexToken(TIMES,'*',3,27)
LexToken(NUMBER,2,3,28)
Illegal character ' '
Illegal character 'a'
Illegal character 'b'
Illegal character 'c'
