Skip to content

GSoC 2014 Application Aditya Shah SymPy Parsing Framework

Aditya Shah edited this page Mar 9, 2014 · 18 revisions

#Personal Details

Name: Aditya Shah

University: Birla Institute of Technology and Science-Pilani

Email and Google Code: adityashah30@gmail.com

IRC Handle: adityashah30

GitHub: adityashah30

##Bio I am a third year undergraduate pursuing B.E. in Computer Science at Birla Institute of Technology and Science-Pilani, Goa Campus. I have covered a wide range of subjects ranging from Data Structures and Algorithms, Object oriented programming, Operating systems, Computer Architecture. I am currently pursuing Compiler construction which led to my interest in this particular project.

#Programming Details

##Platform and Editor Details I use an Ubuntu 12.04 LTS box as my primary work machine. I am quite familiar with the concept of version control especially Git. I use Windows 8.1 occasionally for developmental purposes (especially to test cross platform capabilities). I generally use Sublime Text 2 as my primary editor because of the immense repository of add-ons that enhance the capabilities of the editor greatly.

##Programming Experience I am quite proficient in C, C++, Java and Python for programming. I generally use Qt and PyQt for GUI development. As a part of my course work, I have used Java for Object Oriented Programming and Data Structures and Algorithms. I have used C/C++ for Operating Systems and Computer Networks. I have mostly used Python for academic research projects (done under the guidance of the faculty here at BITS). My first project was creation of a system that facilitates data migration between 2 incompatible database systems, where I used python for data pattern matching and data extraction. Regular expressions were extensively used in the project. My second project (which I am currently pursuing) is the creation of a NLP (Natural Language Processing) enables Linux Super Shell, which uses feature extraction and vectorization of queries to generate the relevant output regarding to Linux command to be used. It allows any Linux newbie to learn Linux efficiently and quickly.

##Python and SymPy I have been using Python for over a year now and I have become quite proficient at it. I would say that the feature of Python that I miss the most in the other languages I use is the lambda statements borrowed from the functional language paradigm. Lambda statements allow very efficient code construction by locally defining non-recurring functions. I was introduced to SymPy by a friend, who too is working with Sympy. I admit that I am quite new to SymPy, but I am quite impressed by the range of functions that it can execute. I am drawn in particular to the parsing module of SymPy, having done projects on the subject myself.

###Contributions to SymPy All my past work for SymPy has been regarding improvement of the Parsing module present in SymPy, under the guidance of Aaron Meurer (@asmeurer) and Sergey Kirpichev (@skirpichev). Following of my PRs were merged-

  • Fixed a bug in mathematica parser module regarding its inability to parse functions when arithmetic expressions are passed as arguments. Issue: 1160 . PR: https://github.com/sympy/sympy/pull/2955
  • Added functionality to sympy_parser.py module to allow for conversion (“Sympification”) of standard Python lambda statements to their Lambda equivalents. Issue: 3051. PR: https://github.com/sympy/sympy/pull/2965

#The Project

##Project Abstract The project aims at standardizing the current parsing module present in SymPy. It allows the developer and the user to extend the immense computational power of SymPy by inclusion of code snippets and even entire programs (in time) written in contemporary CAS such as Mathematica or Math Spec Languages such as Latex, by developing a standard framework by which parsers for the aforementioned languages can be written efficiently. The project also aims at developing working parsers for Mathematica and Latex using the created framework. The last part of the project consists of a rudimentary natural language (English) parser for SymPy, which allows SymPy to gather functionality similar to that of exhibited by WolphramAlpha in the area of natural language query processing and producing relevant outputs.

##Why this project I am deeply impressed with the flawless code organization and implementation of SymPy, all except the parsing module. The current parsers for Mathematica and Maxima are more of a hack, as they don’t cover all the cases possible and they can parse and convert small expressions at best. So, in order to change this situation, according to me, a standard has to be enforced which will enforce completeness in the parsers and also enable rapid development of existing and new parsers.

##Qualifications I have completed a course on Theory of Computation, in which I learned all about Regular languages, Finite State Automata, Context Free Languages, Push Down Automata and Turing machines. So, I have a deep understanding of how languages are defined and how corresponding matching machines are created. I am currently pursuing a course on Compiler Construction, where I have learnt about the various parsing techniques used by the compilers and how they make use of standard algorithms to compile programs.

##Implementation Details Here I present the details of how the project will be implemented. The project comprises mainly of 3 parts.

  1. Language Spec File
  2. Spec to Grammar Converter
  3. Parser Generator Framework (PGF)

###Language Spec File

  • This file contains the details of the grammar to be processed by the PGF.
  • It doesn’t contain the grammar in its entirety. The reason behind this is that I observed that most of the Math-Spec Languages tend to have similarities in their grammar which leads to unnecessary repetition of generic rules.
  • So here we define only the rules that differ from the predefined rules which are open to inspection and modulation whenever necessary. So the conflicting rules in the predefined file can be overridden and new rules can be added to the grammar.
  • This architecture allows for a flexible approach to grammar design by implementing the concept of code reusability.

###Spec to Grammar Converter

  • Most of the PGFs use python code to create the final parser. So we need to convert the simple spec file into equivalent python grammar.
  • This module also serves to analyze the spec file and combine the user’s spec file with the predefined spec file, while taking care of the overridden portions of the grammar as defined by the user.

###Parser Generator Framework

  • This module uses a preexisting parser generator such as “modgrammar” which takes grammar in form of python code and generates the parser that recognizes the specified grammar.
  • Here, I choose modgrammar because of its ability to recognize ambiguous grammars in case such a grammar is provided. (Modgrammar is a GLR grammar parser).

###AST Processor

  • This module processed the AST generates by the Parser and convert it to relevant tokens to pass to sympify module to be converted to SymPy expression.

###Final Block Diagram

Language Spec File ==> Spec to Grammar Converter + Predefined Spec File ==> Parser Generator Framework ==> Parser

###Parser in Action

Program ==> Parser ==> AST Processor ==> Sympify ==> SymPy Code

###Prototype API Here I demonstrate the essence of the framework using a very simple example.

####Predefined Spec File

@R1
E → E + T | T
@R2
T → T * F | F
@R3
F → (E) | id

####User's Language Spec File

@Name Sample_Parser
@Output grammar1.py
@Override R3
F → (E) | alphabet | num

####Spec To Grammar Converter The program combines both the grammar to yield the final grammar

@Name Sample_Parser
@Output grammar1.py
@R1
E → E + T | T
@R2
T → T * F | F
@R3
F → (E) | alphabet | num

The program actually generates the final grammar as “grammar1.py” file to be inputted to modgrammar.

####Parser Generator Framework A PGF such a modgrammar takes in the “grammar1.py” and outputs a parser named “Sample_Parser.py” which recognizes the strings belonging to the grammar.

####Parser in Action

  • Suppose the string “(x*2) + 3” is given to the parser to convert to equivalent SymPy expression.
  • The parser generates the following AST (Abstract Syntax Tree):
					E
				E	+	T
				T		F
				F		num
						3
			(	E	)
				T
			T	*	F
			F		num
		alphabet		2
			x
  • This tree is then processed to finally generate the Sympy tokens by sympify.
(OP, (), (Alphbet, 'x'), (OP, *), (num, 2), (OP, )), (OP, +), (num, 3)
  • These tokens are finally processed to generate the final SymPy expression
(Symbol('x')*2) + 3

###Final Implementations Using the above mentioned framework, I intend to create parsers for following

  1. Mathematica
  2. MathML
  3. Latex
  4. Natural Langauge (English)

##Timeline

##Post GSoC I intend to continue working in the field of parsing with SymPy after GSoC. I am particularly drawn to the study of Natural Language Processing and Natural Language Query analysis. I intend to contribute to the Natural Language Parser that I design during the GSoC project to cover more possibilities and take care of as many contingencies as possible.

##References

Clone this wiki locally