# Pyparsing Tutorial for Parsing the ML-SQL Language

## Authors

Written by: Neeraj Asthana (under Professor Robert Brunner)

University of Illinois at Urbana-Champaign

Summer 2016

## Acknowledgements

Followed the tutorial at: http://www.onlamp.com/lpt/a/6435

## Description

This notebook experiments with pyparsing as groundwork for parsing the ML-SQL language. The goal is to parse ML-SQL syntax and translate its commands into actionable directives in Python.

___

### Libraries

In [7]:
from pyparsing import Word, Literal, alphas, Optional, OneOrMore, Group, ParseException

___
### Phone number parser

Mentioned in the tutorial

Grammar:
- number      :: '0'..'9'+
- phoneNumber :: [ '(' number ')' ] number '-' number

In [2]:
#Definitions of literals
dash   = Literal( "-" )
lparen = Literal( "(" )
rparen = Literal( ")" )

#Variable lengths and patterns of number => Word token
digits = "0123456789"
number = Word( digits )

#Define a phone number by chaining expressions with + (this builds an And)
#Plain strings in a + chain are converted to Literals automatically
phoneNumber = lparen + number + rparen + number + dash + number

#Create a results name for easy access
areacode = number.setResultsName("areacode")

#Make the area code optional
phoneNumber = Optional( "(" + areacode + ")" ) + number + "-" + number

#List of phone numbers
phoneNumberList = OneOrMore( phoneNumber )

In [4]:
#Using the grammar
inputString = "(978) 844-0961"
data = phoneNumber.parseString( inputString )

data.areacode

'978'
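
The results name also makes it easy to tell whether the optional area code was present. The following is a minimal sketch rather than part of the original tutorial: it rebuilds the grammar so that it runs on its own, and the second phone number (`555-1234`) is made up for illustration. Each number is wrapped in `Group` so that a list of numbers keeps one sub-list per number.

```python
from pyparsing import Word, Optional, OneOrMore, Group, nums

# Same grammar as above, rebuilt so this cell runs on its own
number = Word(nums)
areacode = number.setResultsName("areacode")
phoneNumber = Optional("(" + areacode + ")") + number + "-" + number

with_area = phoneNumber.parseString("(978) 844-0961")
without_area = phoneNumber.parseString("844-0961")

# All matched tokens, punctuation included
print(with_area.asList())
print(repr(with_area.areacode))

# A results name that was never matched reads back as an empty string
print(repr(without_area.areacode))

# Group keeps the tokens of each phone number in their own sub-list
groupedList = OneOrMore(Group(phoneNumber))
numbers = groupedList.parseString("(978) 844-0961 555-1234")
print(len(numbers))
```

Without `Group`, `OneOrMore( phoneNumber )` returns one flat token list, which makes it harder to tell where one number ends and the next begins.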

In [5]:
#Bad input
inputStringBad = "978) 844-0961"

data2 = phoneNumber.parseString( inputStringBad )

ParseException: Expected "-" (at char 3), (line:1, col:4)
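
In a script, the failure can be handled instead of letting the traceback end the run. This is a minimal sketch, assuming the same grammar as above (rebuilt here so the cell runs on its own). `try_parse` is a hypothetical helper name, not something from the tutorial. `ParseException` exposes the failure position through its `loc`, `lineno` and `col` attributes.

```python
from pyparsing import Word, Optional, ParseException, nums

# Same grammar as above, rebuilt so this cell runs on its own
number = Word(nums)
areacode = number.setResultsName("areacode")
phoneNumber = Optional("(" + areacode + ")") + number + "-" + number

def try_parse(text):
    """Return the parsed tokens as a list, or None if the text does not match."""
    try:
        return phoneNumber.parseString(text).asList()
    except ParseException as pe:
        # loc is the 0-based character offset; lineno and col are 1-based
        print("could not parse %r: char %d (line %d, col %d)"
              % (text, pe.loc, pe.lineno, pe.col))
        return None

good = try_parse("(978) 844-0961")
bad = try_parse("978) 844-0961")
```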

___

### Chemical Formula parser

Mentioned in the tutorial

Grammar:
- integer       :: '0'..'9'+
- cap           :: 'A'..'Z'
- lower         :: 'a'..'z'
- elementSymbol :: cap lower*
- elementRef    :: elementSymbol [ integer ]
- formula       :: elementRef+

In [13]:
#Define grammar
caps       = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
lowers     = caps.lower()
digits     = "0123456789"

element    = Word( caps, lowers )

#Groups elements so that element and numbers appear together
elementRef = Group( element + Optional( Word( digits ), default="1" ) )
formula    = OneOrMore( elementRef )

testString = "CO2"
elements   = formula.parseString( testString )
print(elements)

[['C', '1'], ['O', '2']]
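
The quantities above come back as strings. One possible variation, not taken from the tutorial, is to attach a parse action that converts them to integers while parsing. This sketch uses an integer default (`default=1`) for a missing count, because parse actions are not applied to the default value of an `Optional`. The names `integer`, `intElementRef` and `intFormula` are chosen here for illustration.

```python
import string
from pyparsing import Word, Optional, OneOrMore, Group, nums

element = Word(string.ascii_uppercase, string.ascii_lowercase)

# The parse action replaces the matched digit string with an int
integer = Word(nums).setParseAction(lambda tokens: int(tokens[0]))

# default=1 (an int) keeps the type consistent when no count is given
intElementRef = Group(element + Optional(integer, default=1))
intFormula = OneOrMore(intElementRef)

parsed = intFormula.parseString("C6H5OH").asList()
print(parsed)
```

With this variant, later code can use the quantities directly without calling `int()` on each one.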


In [22]:
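#Atomic weights for the elements used below
#(values chosen to match the weights printed in the output)
atomicWeight = {
    "H": 1.00794, "C": 12.0107, "O": 15.9994, "Na": 22.9897, "Cl": 35.4527
}

tests = [ "H2O", "C6H5OH", "NaCl" ]
for t in tests:
    try:
        results = formula.parseString( t )
        print(t, "->", results)
    except ParseException as pe:
        print(pe)
    else:
        #Each group is [symbol, quantity]; quantity is a string, so convert it
        wt = sum( atomicWeight[elem]*int(qty) for elem, qty in results )
        print("(%.3f)" % wt)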

H2O -> [['H', '2'], ['O', '1']]
(18.015)
C6H5OH -> [['C', '6'], ['H', '5'], ['O', '1'], ['H', '1']]
(94.111)
NaCl -> [['Na', '1'], ['Cl', '1']]
(58.442)
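
Note that `C6H5OH` yields two separate `H` entries. If a total per element is needed, the groups can be summed after parsing. This is a minimal sketch, assuming the same grammar as above (rebuilt here so the cell runs on its own). `element_counts` is a hypothetical helper name, and `parseAll=True` is added so that trailing text that is not part of a formula raises a `ParseException` instead of being silently ignored.

```python
import string
from collections import Counter
from pyparsing import Word, Optional, OneOrMore, Group, nums

# Same grammar as above, rebuilt so this cell runs on its own
element = Word(string.ascii_uppercase, string.ascii_lowercase)
elementRef = Group(element + Optional(Word(nums), default="1"))
formula = OneOrMore(elementRef)

def element_counts(text):
    """Return a dict mapping each element symbol to its total count."""
    counts = Counter()
    for symbol, qty in formula.parseString(text, parseAll=True):
        counts[symbol] += int(qty)
    return dict(counts)

phenol = element_counts("C6H5OH")
print(phenol)
```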
