### A tokenizer for Boolean logic

See [the tokenizer example in the Python `re` documentation](https://docs.python.org/3/library/re.html#writing-a-tokenizer).

In [6]:
import re

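# Each token type is a named group, so match.lastgroup reports which one matched.
# The keywords are spelled out case-insensitively ("and", "AND", "And", ...).
AND = r'(?P<AND>[Aa][Nn][Dd])'
OR = r'(?P<OR>[Oo][Rr])'
NOT = r'(?P<NOT>[Nn][Oo][Tt])'
# Identifiers: a letter or underscore followed by letters, digits, or underscores.
VAR = r'(?P<VAR>[a-zA-Z_][a-zA-Z_0-9]*)'
LP = r'(?P<LP>\()'
RP = r'(?P<RP>\))'
WS = r'(?P<WS>\s+)'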

In [7]:
logic1 = "a and b or (not c)"

In [8]:
# Join the token patterns into one master pattern; alternatives are tried left to right.
pat = re.compile('|'.join([AND, OR, NOT, VAR, LP, RP, WS]))

In [9]:
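# scanner() is an undocumented method of compiled patterns; each match()
# call is anchored at the position where the previous match ended.
sc = pat.scanner(logic1)
a = sc.match()
while a:
    # lastgroup is the name of the alternative that matched
    print(a.lastgroup + " : " + a.group())
    a = sc.match()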

VAR : a
WS :  
AND : and
WS :  
VAR : b
WS :  
OR : or
WS :  
LP : (
NOT : not
WS :  
VAR : c
RP : )
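The scanning loop above can be wrapped in a reusable generator. The following is a minimal sketch, not part of the original notebook: the names `Token`, `MASTER`, and `tokenize` are hypothetical, and it assumes that whitespace tokens are not needed downstream, so it drops them.

```python
import re
from typing import Iterator, NamedTuple


class Token(NamedTuple):
    kind: str  # name of the regex group that matched, e.g. "AND"
    text: str  # the matched substring


# Same patterns as in the notebook; the keywords are listed before VAR on purpose.
MASTER = re.compile("|".join([
    r"(?P<AND>[Aa][Nn][Dd])",
    r"(?P<OR>[Oo][Rr])",
    r"(?P<NOT>[Nn][Oo][Tt])",
    r"(?P<VAR>[a-zA-Z_][a-zA-Z_0-9]*)",
    r"(?P<LP>\()",
    r"(?P<RP>\))",
    r"(?P<WS>\s+)",
]))


def tokenize(text: str) -> Iterator[Token]:
    """Yield tokens from text, skipping whitespace (hypothetical helper)."""
    scanner = MASTER.scanner(text)
    # iter(callable, sentinel) calls scanner.match() until it returns None
    for m in iter(scanner.match, None):
        if m.lastgroup != "WS":
            yield Token(m.lastgroup, m.group())


tokens = list(tokenize("a and b or (not c)"))
print([t.kind for t in tokens])
# ['VAR', 'AND', 'VAR', 'OR', 'LP', 'NOT', 'VAR', 'RP']
```

Whitespace still has to be matched by the master pattern so that scanning continues past it; the sketch only chooses not to emit it.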


### Without handling whitespace

> If any nonmatching text is found, scanning simply stops. This is why it was
> necessary to specify the whitespace (WS) token in the example.

In [10]:
pat_without_ws = re.compile('|'.join([AND,OR,NOT,VAR,LP,RP]))
sc2 = pat_without_ws.scanner(logic1)
a = sc2.match()
while a:
    print(a.lastgroup + " : " + a.group())
    a=sc2.match()

VAR : a
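Because scanning stops silently, truncated output like the above is easy to miss. One way to surface the problem is to track the position explicitly and raise an error when nothing matches. This is a sketch under assumptions: `PAT_NO_WS` and `tokenize_strict` are hypothetical names, and it uses the documented `Pattern.match(string, pos)` instead of `scanner()`.

```python
import re

# The same token patterns as above, deliberately without a whitespace token.
PAT_NO_WS = re.compile("|".join([
    r"(?P<AND>[Aa][Nn][Dd])",
    r"(?P<OR>[Oo][Rr])",
    r"(?P<NOT>[Nn][Oo][Tt])",
    r"(?P<VAR>[a-zA-Z_][a-zA-Z_0-9]*)",
    r"(?P<LP>\()",
    r"(?P<RP>\))",
]))


def tokenize_strict(text):
    """Return (kind, text) pairs; raise ValueError on text no pattern matches."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = PAT_NO_WS.match(text, pos)  # anchored match starting at pos
        if m is None:
            raise ValueError(
                f"unexpected character {text[pos]!r} at position {pos}"
            )
        tokens.append((m.lastgroup, m.group()))
        pos = m.end()  # no pattern can match the empty string, so pos advances
    return tokens


try:
    tokenize_strict("a and b or (not c)")
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
# unexpected character ' ' at position 1
```

With this variant, the missing WS token shows up as an explicit error at the first space rather than as a short token list.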


> The order of tokens in the master regular expression also matters.

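In the example below, VAR is placed first in the alternation. Because the regex engine takes the first alternative that matches, the logical operators "and", "or", and "not" are all recognized as variables (VAR).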

In [12]:
pat_order = re.compile('|'.join([VAR,AND,OR,NOT,LP,RP,WS]))
sc3 = pat_order.scanner(logic1)
a = sc3.match()
while a:
    print(a.lastgroup + " : " + a.group())
    a=sc3.match()

VAR : a
WS :  
VAR : and
WS :  
VAR : b
WS :  
VAR : or
WS :  
LP : (
VAR : not
WS :  
VAR : c
RP : )
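Ordering the keywords first avoids this, but it has its own edge case: the keyword patterns have no word boundary, so an identifier such as `android` would be split into AND (`and`) followed by VAR (`roid`). An alternative, similar in spirit to the keyword check in the Python `re` documentation's tokenizer example, is to match every word as VAR and then reclassify keywords with a lookup. The sketch below is an assumption-laden illustration; `KEYWORDS`, `PAT_WORDS`, and `tokenize_keywords` are hypothetical names.

```python
import re

# Lowercased keyword -> token kind; anything else stays a VAR.
KEYWORDS = {"and": "AND", "or": "OR", "not": "NOT"}

# No separate keyword patterns: every word is first matched as VAR.
PAT_WORDS = re.compile("|".join([
    r"(?P<VAR>[a-zA-Z_][a-zA-Z_0-9]*)",
    r"(?P<LP>\()",
    r"(?P<RP>\))",
    r"(?P<WS>\s+)",
]))


def tokenize_keywords(text):
    """Return (kind, text) pairs, reclassifying whole words that are keywords."""
    scanner = PAT_WORDS.scanner(text)
    result = []
    for m in iter(scanner.match, None):
        kind, value = m.lastgroup, m.group()
        if kind == "WS":
            continue  # whitespace is matched but not reported
        if kind == "VAR":
            # Only a complete word can be a keyword, so "android" stays a VAR.
            kind = KEYWORDS.get(value.lower(), "VAR")
        result.append((kind, value))
    return result


print(tokenize_keywords("android AND b or (Not c)"))
# [('VAR', 'android'), ('AND', 'AND'), ('VAR', 'b'), ('OR', 'or'),
#  ('LP', '('), ('NOT', 'Not'), ('VAR', 'c'), ('RP', ')')]
```

With this design the order of alternatives no longer decides whether a word is a keyword, at the cost of one dictionary lookup per identifier.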
