# Episode 17: Parsing with SLY

In our last heroic effort, we used an approach to top-down parsing called recursive descent.  Today, we'll be using a parser generator called SLY (https://github.com/dabeaz/sly) to create a bottom-up parsing state machine.

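SLY handles a lot of the parsing machinery that we had to build by hand in episode 16.  To use SLY, we'll have to refactor our calculator grammar a little.  First, we have to remove all our regular-expression shorthands (such as `?` for an optional clause, or `()` for a sub-clause) and spell out each alternative as its own rule.  Second, we switch our recursive definitions from right recursion to left recursion.  For recursive descent we had to avoid left-recursive definitions, because a top-down parser would keep calling itself without consuming a token.  The bottom-up parser that SLY generates has no such problem, and left recursion is the preferred form: right-recursive rules are still accepted, but they force the parser to push the whole list onto its stack before it can reduce anything.  For example: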

```
statements : statements statement
           | EMPTY
statement  : assign '\n'
           | expression '\n'
assign     : ID '=' expression
expression : expression op atom
           | atom
atom       : ID | NUM
op         : '+' | '-'
```
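The direction of recursion also decides how operators group.  As a rough illustration in plain Python (no SLY involved; `group_left`, `group_right`, and `evaluate` are hypothetical helpers made up for this sketch), we can nest the pieces of `1 - 2 - 3` the way the left-recursive rule `expression : expression op atom` would, then the way a right-recursive rule `expression : atom op expression` would, and compare the results:

```python
def group_left(atoms, ops):
    """Nest as ((a0 op0 a1) op1 a2), mirroring `expression : expression op atom`."""
    tree = atoms[0]
    for op, atom in zip(ops, atoms[1:]):
        tree = (tree, op, atom)
    return tree


def group_right(atoms, ops):
    """Nest as (a0 op0 (a1 op1 a2)), mirroring `expression : atom op expression`."""
    if not ops:
        return atoms[0]
    return (atoms[0], ops[0], group_right(atoms[1:], ops[1:]))


def evaluate(tree):
    """Evaluate a nested (left, op, right) tuple whose leaves are ints."""
    if isinstance(tree, int):
        return tree
    left, op, right = tree
    lhs, rhs = evaluate(left), evaluate(right)
    return lhs + rhs if op == '+' else lhs - rhs


atoms, ops = [1, 2, 3], ['-', '-']
left_tree = group_left(atoms, ops)
right_tree = group_right(atoms, ops)
print(left_tree, '=', evaluate(left_tree))
print(right_tree, '=', evaluate(right_tree))
```

Only the left-nested grouping gives the usual arithmetic answer for subtraction, so the left-recursive form of `expression` is also the one whose tree shape matches the meaning we want.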

In [1]:
!pip install sly
# `conda install sly` also works thanks to Conda-Forge.
import nbimport
import Episode16
from sly import Lexer, Parser



## Step 1: Lexical Analysis

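Writing a lexer in SLY is easier than doing it by hand.  We list the token names in `tokens`, give each one a regular expression, and add a method with the same name as a token when matching it needs a side effect (here, `NL` advances the line counter):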

In [2]:
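class CalcLexer(Lexer):
    # Token names; SLY lets us write them as bare names inside the class body.
    tokens = { NL, ID, EQ, NUM, PLUS, MINUS }
    # Characters that are skipped between tokens.
    ignore = ' \t'

    # One regular expression per token.
    NL = r'\n'
    ID = r'[a-zA-Z_][a-zA-Z0-9_]*'
    EQ = r'='
    NUM = r'\d+'
    PLUS = r'\+'
    MINUS = r'-'

    # Action for NL: count the line, then still pass the token on to the parser.
    def NL(self, tok):
        self.lineno += 1
        return tok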
    
lexer = CalcLexer()
list(lexer.tokenize('1 + 2 - 3\nA = 5\n'))

[Token(type='NUM', value='1', lineno=1, index=0),
 Token(type='PLUS', value='+', lineno=1, index=2),
 Token(type='NUM', value='2', lineno=1, index=4),
 Token(type='MINUS', value='-', lineno=1, index=6),
 Token(type='NUM', value='3', lineno=1, index=8),
 Token(type='NL', value='\n', lineno=1, index=9),
 Token(type='ID', value='A', lineno=2, index=10),
 Token(type='EQ', value='=', lineno=2, index=12),
 Token(type='NUM', value='5', lineno=2, index=14),
 Token(type='NL', value='\n', lineno=2, index=15)]
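For comparison, here is roughly what the same job looks like by hand with the standard `re` module.  This is only a sketch, not SLY's implementation: the `Tok` tuple, `TOKEN_SPEC`, `MASTER`, and `tokenize` are names invented for the illustration, and the `SKIP` entry stands in for the lexer's `ignore` setting.

```python
import re
from typing import Iterator, NamedTuple


class Tok(NamedTuple):
    type: str
    value: str
    lineno: int
    index: int


# The same patterns as CalcLexer, combined into one alternation of named groups.
TOKEN_SPEC = [
    ('NL', r'\n'),
    ('ID', r'[a-zA-Z_][a-zA-Z0-9_]*'),
    ('EQ', r'='),
    ('NUM', r'\d+'),
    ('PLUS', r'\+'),
    ('MINUS', r'-'),
    ('SKIP', r'[ \t]+'),  # plays the role of `ignore`
]
MASTER = re.compile('|'.join(f'(?P<{name}>{pattern})' for name, pattern in TOKEN_SPEC))


def tokenize(text: str) -> Iterator[Tok]:
    lineno, pos = 1, 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f'Illegal character {text[pos]!r} at index {pos}')
        kind = match.lastgroup  # name of the alternative that matched
        if kind != 'SKIP':
            yield Tok(kind, match.group(), lineno, pos)
        if kind == 'NL':
            lineno += 1  # bump the line only after emitting the newline token
        pos = match.end()


tokens = list(tokenize('1 + 2 - 3\nA = 5\n'))
for tok in tokens:
    print(tok)
```

The hand-written version has to manage the scan position, the line counter, and error reporting itself, which is exactly the bookkeeping the SLY class hides.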

## Step 2: Syntactic Analysis

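In SLY we use decorators to embed the grammar definition into the parser.  The parsing class no longer has to drive the lexer or call the methods for other non-terminal symbols; SLY's state machine does that for us.  What we have to do instead is tell SLY what to build from the child values it has assembled.  We write one method per alternative in the grammar, named after the non-terminal on the left-hand side, so a non-terminal with two alternatives gets two methods of the same name.  Inside a method, `children.NAME` gives the value of the symbol `NAME` from the right-hand side of the rule...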

In [7]:
Tree = Episode16.Tree

class CalcParser(Parser):
    tokens = CalcLexer.tokens

    @_('statements statement')
    def statements(self, children):
        '''Recognize rule `statements : statements statement`.'''
        return Tree('statements', [children.statements, children.statement])
    
    @_('') # EMPTY!
    def statements(self, children):
        return Tree('statements', [])

    @_('assign NL')
    def statement(self, children):
        return Tree('statement', [children.assign])
    
    @_('expression NL')
    def statement(self, children):
        return Tree('statement', [children.expression])

    @_('ID EQ expression')
    def assign(self, children):
        return Tree('assign', [Tree(children.ID, []), Tree(children.EQ, []), children.expression])

    @_('expression op atom')
    def expression(self, children):
        return Tree('expression', [children.expression, children.op, children.atom])

    @_('atom')
    def expression(self, children):
        return Tree('expression', [children.atom])

    @_('PLUS')
    def op(self, children):
        return Tree('op', [Tree(children.PLUS, [])])

    @_('MINUS')
    def op(self, children):
        return Tree('op', [Tree(children.MINUS, [])])

    @_('ID')
    def atom(self, children):
        return Tree('atom', [Tree(children.ID, [])])

    @_('NUM')
    def atom(self, children):
        return Tree('atom', [Tree(children.NUM, [])])
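It may look odd that the class defines `statements`, `statement`, and friends several times each, and that `_` is never imported.  In an ordinary class the later definition would simply replace the earlier one.  The sketch below shows one way a library can make this work, assuming a metaclass that supplies a custom class-body namespace.  It is not SLY's source code: `rule`, `RuleNamespace`, `GrammarMeta`, and `MiniGrammar` are hypothetical names for illustration only.

```python
def rule(*rules):
    """Hypothetical stand-in for the `_` decorator: tag a method with grammar text."""
    def decorate(func):
        func.rules = list(rules)
        func.previous = None  # filled in if an earlier method shares this name
        return func
    return decorate


class RuleNamespace(dict):
    """Class-body namespace that remembers earlier rule methods of the same name."""
    def __setitem__(self, key, value):
        earlier = self.get(key)
        if hasattr(value, 'rules') and hasattr(earlier, 'rules'):
            value.previous = earlier  # chain instead of silently overwriting
        super().__setitem__(key, value)


class GrammarMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases, **kwargs):
        namespace = RuleNamespace()
        namespace['_'] = rule  # makes `@_(...)` usable inside the class body
        return namespace

    def __new__(mcls, name, bases, namespace, **kwargs):
        attrs = dict(namespace)
        attrs.pop('_', None)
        cls = super().__new__(mcls, name, bases, attrs)
        cls.productions = []
        for attr, func in namespace.items():
            chain = []
            while hasattr(func, 'rules'):
                chain.append(func)
                func = func.previous
            for method in reversed(chain):  # oldest definition first
                cls.productions.extend((attr, text) for text in method.rules)
        return cls


class MiniGrammar(metaclass=GrammarMeta):
    @_('statements statement')
    def statements(self, children): ...

    @_('')
    def statements(self, children): ...

    @_('ID')
    def atom(self, children): ...

    @_('NUM')
    def atom(self, children): ...


print(MiniGrammar.productions)
```

Every decorated definition survives as a separate production, which is the behaviour `CalcParser` relies on when it defines two `statements` methods.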


## Step 3: Tie Everything Together

In [9]:
class CalcFrontend:
    def __init__(self):
        self.lexer = CalcLexer()
        self.parser = CalcParser()

    def __call__(self, source):
        return self.parser.parse(self.lexer.tokenize(source))


calc_frontend = CalcFrontend()
ep17_test_result = calc_frontend('1 + 2 - 3\nA = 5\n')

In [10]:
Episode16.tree_to_tuple(ep17_test_result)

('statements',
 [('statements',
   [('statements', []),
    ('statement',
     [('expression',
       [('expression',
         [('expression', [('atom', [('1', [])])]),
          ('op', [('+', [])]),
          ('atom', [('2', [])])]),
        ('op', [('-', [])]),
        ('atom', [('3', [])])])])]),
  ('statement',
   [('assign',
     [('A', []), ('=', []), ('expression', [('atom', [('5', [])])])])])])

In [12]:
Episode16.tree_to_tuple(Episode16.quick_test_result)

('statements',
 [('statement',
   [('expression',
     [('atom', [((4, '1'), [])]),
      ('op', [((5, '+'), [])]),
      ('expression',
       [('atom', [((4, '3'), [])]),
        ('op', [((6, '-'), [])]),
        ('atom', [((4, '5'), [])])])]),
    ((1, '\n'), [])]),
  ((0, ''), [])])
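Comparing the two outputs, the SLY tree nests `statements` to the left: each `statements` node holds the earlier statements first and the newest `statement` last, with the EMPTY alternative at the very bottom.  If a later stage would rather see a flat list, the chain can be unwound.  The sketch below rebuilds the tuple from `In [10]` by hand so that it runs on its own; `flatten_statements` is a hypothetical helper, not part of Episode 16 or SLY.

```python
# The tuple printed by In [10], rebuilt piece by piece.
expr_1 = ('expression', [('atom', [('1', [])])])
expr_12 = ('expression', [expr_1, ('op', [('+', [])]), ('atom', [('2', [])])])
expr_123 = ('expression', [expr_12, ('op', [('-', [])]), ('atom', [('3', [])])])
stmt_1 = ('statement', [expr_123])
stmt_2 = ('statement', [
    ('assign', [
        ('A', []),
        ('=', []),
        ('expression', [('atom', [('5', [])])]),
    ]),
])
ep17_tuple = ('statements', [('statements', [('statements', []), stmt_1]), stmt_2])


def flatten_statements(node):
    """Return the `statement` subtrees of a left-nested `statements` chain, in source order."""
    found = []
    _label, children = node
    while children:               # an empty child list is the EMPTY alternative
        node, last = children     # `statements : statements statement`
        found.append(last)
        _label, children = node
    found.reverse()               # we walked from the last statement to the first
    return found


flat = flatten_statements(ep17_tuple)
for statement in flat:
    print(statement[1][0][0])
```

The loop walks down the left spine instead of recursing, so its depth does not grow with the number of statements.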

In [16]:
class SLYTestParser(Parser):
    tokens = CalcLexer.tokens
    
    @_('spine ID')
    def spine(self, children):
        print(children)
        return children, children.spine, children.ID
    
    @_('')
    def spine(self, children):
        return children

lexer = CalcLexer()
parser = SLYTestParser()
result = parser.parse(lexer.tokenize('A b C D'))
result

<sly.yacc.YaccProduction object at 0x10c967c00>
<sly.yacc.YaccProduction object at 0x10c967c00>
<sly.yacc.YaccProduction object at 0x10c967c00>
<sly.yacc.YaccProduction object at 0x10c967c00>




(<sly.yacc.YaccProduction at 0x10c967c00>,
 (<sly.yacc.YaccProduction at 0x10c967c00>,
  (<sly.yacc.YaccProduction at 0x10c967c00>,
   (<sly.yacc.YaccProduction at 0x10c967c00>, ('spine',), 'A'),
   'b'),
  'C'),
 'D')

In [21]:
list(result[0]), result[0].spine

([(<sly.yacc.YaccProduction at 0x10c967c00>,
   (<sly.yacc.YaccProduction at 0x10c967c00>,
    (<sly.yacc.YaccProduction at 0x10c967c00>, ('spine',), 'A'),
    'b'),
   'C'),
  'D'],
 (<sly.yacc.YaccProduction at 0x10c967c00>,
  (<sly.yacc.YaccProduction at 0x10c967c00>,
   (<sly.yacc.YaccProduction at 0x10c967c00>, ('spine',), 'A'),
   'b'),
  'C'))
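Every `YaccProduction` printed above has the same address, and `list(result[0])` shows the contents of the last reduction rather than the first.  That suggests the parser refills a single production object on every reduction, so holding on to `children` itself is risky, while values read from it at the time (such as `children.spine` and `children.ID`) are safe to keep.  The sketch below imitates that situation in plain Python; `ReusedProduction` is a hypothetical stand-in and makes no claim about SLY's internals.

```python
class ReusedProduction:
    """Hypothetical stand-in for a production object that is refilled on each reduction."""

    def __init__(self):
        self._values = []

    def fill(self, values):
        self._values = list(values)
        return self

    def __len__(self):
        return len(self._values)

    def __getitem__(self, index):
        return self._values[index]


shared = ReusedProduction()
kept_objects = []   # what we get if we store the production itself
kept_values = []    # what we get if we copy the values out immediately

for symbols in (['A'], ['A', 'b'], ['A', 'b', 'C']):
    production = shared.fill(symbols)
    kept_objects.append(production)
    kept_values.append(list(production))

print([list(p) for p in kept_objects])
print(kept_values)
```

All the stored objects end up showing the final reduction, while the copied values keep the history, which is why the parser methods in `CalcParser` build their `Tree` nodes from the child values right away.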