# Episode 18: Parsing with PyParsing

Battered by Imperial Algebraic Formalisms, we now seek shelter in the caves of declarative code using the PyParsing library (https://github.com/pyparsing/pyparsing/).

Having already struggled with parsing by hand and with the soon-to-be-infamous SLY tooling, our rag-tag band may find some relief in recognizing the calculator language with composable parser expressions instead of imperative code or grammar definitions!

In [1]:
!pip install pyparsing
# Once more `conda install pyparsing` would also work...
import nbimport
import Episode17



## Step 1: Lexical Analysis

In [2]:
import pyparsing
from pyparsing import Word, LineEnd, alphas, nums
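# Skip only spaces and tabs, so that newlines stay significant for LineEnd().
# This has to run before any parser elements are created.
pyparsing.ParserElement.setDefaultWhitespaceChars(' \t')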

In [3]:
ID = pyparsing.Combine(pyparsing.Char(alphas) + pyparsing.ZeroOrMore(pyparsing.Char(pyparsing.alphanums)))
ID.parseString('ABC123 1234')

(['ABC123'], {})
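As a hedged aside, the `Combine(...)` wrapper is what glues the single characters back into one token and forbids whitespace between them. The sketch below compares the definition above with the same expression minus `Combine`, and with `Word(alphas, alphanums)`, which we assume to be an equivalent shorthand. The names `pieces` and `shorthand` are illustrative only and do not appear elsewhere in this notebook.

```python
from pyparsing import Char, Combine, Word, ZeroOrMore, alphanums, alphas

# The definition used above: one letter, then any number of letters or digits.
ID = Combine(Char(alphas) + ZeroOrMore(Char(alphanums)))

# The same expression without Combine: every character is its own token.
pieces = Char(alphas) + ZeroOrMore(Char(alphanums))

# Assumed shorthand: Word(initial_characters, body_characters).
shorthand = Word(alphas, alphanums)

print(list(ID.parseString('ABC123 1234')))         # ['ABC123']
print(list(pieces.parseString('AB1')))             # ['A', 'B', '1']
print(list(shorthand.parseString('ABC123 1234')))  # ['ABC123']
```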

In [4]:
NUM = Word(nums)
NUM.parseString('1234')

(['1234'], {})

In [5]:
atom = ID ^ NUM
atom.parseString('1234'), atom.parseString('abC33')

((['1234'], {}), (['abC33'], {}))

In [6]:
op = pyparsing.Char('+-')
op.parseString('+-+-'), op.parseString('- 42')

((['+'], {}), (['-'], {}))

In [8]:
token_test = (LineEnd() ^ atom ^ op ^ '=')[...]
test_src = '1 + 2 - 3\nA = 42\n'
token_test.parseString(test_src)

(['1', '+', '2', '-', '3', '\n', 'A', '=', '42', '\n'], {})

In [9]:
token_test2 = (LineEnd() | atom | op | '=')[...]
token_test2.parseString(test_src)

(['1', '+', '2', '-', '3', '\n', 'A', '=', '42', '\n'], {})
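Both token tests give the same result because the alternatives never compete for the same text. The difference between `^` (longest match wins) and `|` (first match wins) only shows when one alternative is a prefix of another. A minimal sketch, using a hypothetical `==` operator that is not part of our calculator language:

```python
from pyparsing import Literal

# Hypothetical pair of operators where one is a prefix of the other.
first_match = Literal('=') | Literal('==')    # MatchFirst: first alternative that matches wins
longest_match = Literal('=') ^ Literal('==')  # Or: the longest matching alternative wins

print(list(first_match.parseString('==')))    # ['=']
print(list(longest_match.parseString('==')))  # ['==']
```

With `|`, ordering the alternatives from most to least specific would avoid the surprise; `^` trades that care for trying every alternative.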

## Step 2: Syntactic Analysis

In [10]:
expression = pyparsing.Forward()
expression <<= atom + pyparsing.Optional(op + expression)
assign = ID + '=' + expression
statement = (assign | expression) + LineEnd()
statements = statement[...]
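The `Forward()` placeholder is what lets `expression` refer to itself before it is fully defined. As a sketch only, here is the recursive form next to an assumed iterative alternative built with `[...]`; the names `recursive` and `iterative` are ours, and the simplified `atom` uses `Word(alphas, alphanums)` in place of the `Combine` version above.

```python
from pyparsing import Char, Forward, Optional, Word, alphanums, alphas, nums

atom = Word(alphas, alphanums) | Word(nums)
op = Char('+-')

# Right-recursive form, as in the cell above.
recursive = Forward()
recursive <<= atom + Optional(op + recursive)

# Assumed equivalent without recursion: an atom, then zero or more (op atom) pairs.
iterative = atom + (op + atom)[...]

print(list(recursive.parseString('1 + 2 - 3')))  # ['1', '+', '2', '-', '3']
print(list(iterative.parseString('1 + 2 - 3')))  # ['1', '+', '2', '-', '3']
```

Both produce the same flat token list here; the iterative form simply avoids one level of recursion per operator.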

## Step 3: There is NO Step 3...

Or is there?

In [11]:
statements.parseString(test_src)

(['1', '+', '2', '-', '3', '\n', 'A', '=', '42', '\n'], {})

In [12]:
statements.parseString(test_src, parseAll=True)

(['1', '+', '2', '-', '3', '\n', 'A', '=', '42', '\n'], {})
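The two calls above return the same tokens because the grammar consumes the whole of `test_src`. `parseAll=True` only makes a difference when some trailing input is left unmatched. A small sketch with a deliberately tiny grammar (the names `numbers`, `partial` and `strict_ok` are illustrative):

```python
from pyparsing import ParseException, Word, nums

numbers = Word(nums)[...]

# Without parseAll, parsing stops quietly at the first thing that does not match.
partial = list(numbers.parseString('1 2 three'))
print(partial)  # ['1', '2']

# With parseAll=True, leftover input is reported as an error.
try:
    numbers.parseString('1 2 three', parseAll=True)
    strict_ok = True
except ParseException as err:
    strict_ok = False
    print('rejected:', err)
```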

In [14]:
statement2 = pyparsing.Group((assign ^ expression) + LineEnd())
statements2 = statement2[...]
statements2.setDebug()
statements2.parseString(test_src, parseAll=True)

Match [Group:({{{Combine:({W:(ABCD...) [W:(ABCD...)]...}) "=" Forward: {{Combine:({W:(ABCD...) [W:(ABCD...)]...}) ^ W:(0123...)} [{W:(+-) : ...}]}} ^ Forward: {{Combine:({W:(ABCD...) [W:(ABCD...)]...}) ^ W:(0123...)} [{W:(+-) : ...}]}} LineEnd})]... at loc 0(1,1)
Matched [Group:({{{Combine:({W:(ABCD...) [W:(ABCD...)]...}) "=" Forward: {{Combine:({W:(ABCD...) [W:(ABCD...)]...}) ^ W:(0123...)} [{W:(+-) : ...}]}} ^ Forward: {{Combine:({W:(ABCD...) [W:(ABCD...)]...}) ^ W:(0123...)} [{W:(+-) : ...}]}} LineEnd})]... -> [['1', '+', '2', '-', '3', '\n'], ['A', '=', '42', '\n']]


([(['1', '+', '2', '-', '3', '\n'], {}), (['A', '=', '42', '\n'], {})], {})
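One reason to want one group per statement is that each line can then be handled on its own. The sketch below is only one possible consumer, not part of PyParsing or of the earlier episodes: the helpers `evaluate` and `run` are hypothetical, they assume the groups look exactly like the output above, and they apply `+` and `-` strictly left to right.

```python
def evaluate(tokens, env):
    """Evaluate a flat token list such as ['1', '+', '2', '-', '3']."""
    def value(tok):
        # Digits are literals; anything else is looked up as a variable
        # (an undefined name raises KeyError in this sketch).
        return int(tok) if tok.isdigit() else env[tok]

    result = value(tokens[0])
    # Tokens alternate operand, operator, operand, ...
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        if op == '+':
            result += value(operand)
        else:
            result -= value(operand)
    return result


def run(groups):
    """Walk the grouped statements, collecting expression values and assignments."""
    env, results = {}, []
    for group in groups:
        toks = [t for t in group if t != '\n']  # drop the LineEnd token
        if len(toks) > 1 and toks[1] == '=':
            env[toks[0]] = evaluate(toks[2:], env)
        else:
            results.append(evaluate(toks, env))
    return results, env


# Plain lists standing in for the grouped parse result shown above.
groups = [['1', '+', '2', '-', '3', '\n'], ['A', '=', '42', '\n']]
results, env = run(groups)
print(results, env)  # [0] {'A': 42}
```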

### Cruft...

...from an abortive attempt to use `setDefaultWhitespaceChars()` after the fact, that is, after the parser elements had already been built.

In [None]:
statements.setDebug()
statements.parseString(test_src, parseAll=True)

In [None]:
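# Attaching the function to the instance does not make it instance-specific:
# the call still only changes the global default for elements created later.
statements.setDefaultWhitespaceChars = pyparsing.ParserElement.setDefaultWhitespaceChars
statements.setDefaultWhitespaceChars(' \t')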
statements.parseString(test_src, parseAll=True)
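Why the after-the-fact call does not help can be sketched as follows. As far as we can tell, each parser element copies the default whitespace characters when it is constructed, so changing the default later leaves existing elements untouched. The names `early`, `late` and `saved` are illustrative, and the sketch restores the default at the end so it does not disturb other cells.

```python
from pyparsing import ParserElement, Word, nums

saved = ParserElement.DEFAULT_WHITE_CHARS        # remember the library default

early = Word(nums)[...]                          # built BEFORE the call
ParserElement.setDefaultWhitespaceChars(' \t')
late = Word(nums)[...]                           # built AFTER the call

early_tokens = list(early.parseString('1\n2'))
late_tokens = list(late.parseString('1\n2'))

ParserElement.setDefaultWhitespaceChars(saved)   # restore the default

print(early_tokens)  # ['1', '2']: the newline is still skipped as whitespace
print(late_tokens)   # ['1']: the newline now stops the match
```

This is consistent with the next cell, which makes the call first and only then rebuilds every element of the grammar.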

In [None]:
pyparsing.ParserElement.setDefaultWhitespaceChars(' \t')
ID = Word(alphas)
atom = ID | Word(nums)
expression = pyparsing.Forward()
expression <<= atom + pyparsing.Optional(op + expression)
assign = ID + '=' + expression
statement = (assign | expression) + LineEnd()
statements = statement[...]

In [None]:
statements.parseString(test_src, parseAll=True)