GSoC 2013 Application Angus Griffith: Parsing
Name: Angus Griffith
Enrolment: 4th year Mathematics at the Australian National University.
IRC Nick: sn6uv
Email is probably the best way to contact me since I'm not always on IRC. Living in Australia, I'll also probably be asleep when the rest of the world is awake.
Short Bio / Background overview
The majority of my university courses are in mathematics, with a focus on real/complex analysis and computational maths. I've also done some courses and research projects in theoretical physics and computational chemistry. I'm currently the main developer and maintainer of Mathics (an open source Mathematica clone written in Python).
Me the Person
- I've been using Linux as my primary desktop OS for 5 years now. Arch Linux is my distro of choice.
- Editor: vim, because I find it much faster to code with and it's preinstalled on most of the platforms I use. In particular I like the syntax highlighting and macros (see my vimrc).
- I've contributed almost 30K lines of (mostly Python) code (see here) to Mathics including the Mathics parser, PyPy compatibility, and various other improvements.
- One feature of Python I really like is list comprehension.
- Cool example: Sympy's integrate function allows Mathics to perform an integral that Mathematica can't (live demo).
- I've used svn, cvs and git. I prefer git over the others.
Me and the Project
- I hope to drastically improve Sympy's parsing module. In particular I want to implement a fast parser for Mathematica along with some subset (up for debate) of C, LaTeX, Fortran, Matlab, Python and MathML. Time permitting, I'd also like to improve `sympy_parser` so that all code generated by Sympy can be parsed.
- Allowing Sympy to parse the Mathematica Language (Wolfram are yet to come up with a good name for it) in particular would be very useful for projects such as Sympy Gamma.
- I recently wrote a LALR(1) Mathematica parser for Mathics (available here) using python lex/yacc (ply) because the previous Earley parser (using spark) was too slow. This parser will be deployed with the next version of Mathics. I've also contributed to the `TeXForm` function in Mathics (converts Mathics expressions to LaTeX code). In particular I wrote the code that converts arbitrary 3D graphics to LaTeX.
- I'll be moving house in mid-July, which may impact my progress for that week only. I will be returning to university part-time from the 27th of July. From prior experience (working 20+ hrs/week on Mathics while studying full-time) I am confident that I will be able to maintain the 40 hours a week required of me for the entire duration of the project.
- Rough Timeline: I've divided the project into 4 sections and expect to make a pull request at the end of each.
- Week 1 : Refactor the parser framework so that it is more usable (plan the API in conjunction with the Sympy|Gamma developers and other stakeholders).
- Weeks 2-5 : Adapt Mathics' Mathematica parser for Sympy. (See Implementation details) This will take longer depending on how much of the Mathematica language we intend to support.
- Weeks 6-8 : Implement a LaTeX parser. It might be possible to reuse mathtex for some of this (in particular the list of binary_operators etc.)
- Weeks 9-12 : Implement a MathML parser (this will essentially be the same process as the LaTeX parser).
This timeline will probably change. In particular, I'm open to discussion on which languages would be most beneficial for Sympy to parse. The LaTeX/MathML parsers may be replaced by a single more complex parser, e.g. a Fortran parser.
- The first step will be to convert all the Mathics expressions to their corresponding Sympy counterparts in the Mathics parser. This will involve compiling a map of function conversions (again, some Mathics code can probably be recycled). For example, `LaguerreL[n, a, x]` becomes `laguerre_poly(n, x, a)`.
- Some language forms, in particular Mathematica's patterns (here's an introduction), will potentially be difficult to convert to Sympy.
- Many of the Mathics tests can be adapted when writing tests for Sympy's Mathematica parser.
- Implementing the full Mathematica language will require many additions to Sympy. The parser should fail gracefully when given an expression that has no Sympy counterpart (raising `NotImplementedError` is a possibility).
- Licensing (non)issues: Mathics is released under the GPL. As the author of the Mathics parser I'm more than happy to relicense it for use by Sympy.
- If I'm writing a parser from scratch I'll probably use `ply`, since it's fast and it's what I'm most familiar with. My preference would be to write a MathML parser, since that will probably be of most use for the web applications using Sympy (SympyLive and Sympy|Gamma).
- I'll aim to keep the dependencies to a minimum (hopefully just `ply`). If needed this could be an optional dependency, since the majority of users are probably not interested in parsing.
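The conversion map and the graceful failure described above could be sketched roughly as follows. This is a minimal illustration and not the Mathics or Sympy implementation: the names `CONVERSIONS`, `convert_call` and `render` are hypothetical, only the `LaguerreL` entry comes from this proposal, and the output is a plain string rather than a Sympy expression so that the sketch has no dependencies.

```python
# Hypothetical conversion table: Mathematica head -> (Sympy name, argument order).
# Only the LaguerreL entry is taken from the proposal; Sin/Cos are illustrative.
CONVERSIONS = {
    # LaguerreL[n, a, x] -> laguerre_poly(n, x, a)
    "LaguerreL": ("laguerre_poly", (0, 2, 1)),
    "Sin": ("sin", (0,)),
    "Cos": ("cos", (0,)),
}


def convert_call(head, args):
    """Map a Mathematica call head[args...] to (sympy_name, reordered_args)."""
    if head not in CONVERSIONS:
        # Fail gracefully: no Sympy counterpart is known for this head.
        raise NotImplementedError("no Sympy counterpart for %s" % head)
    name, order = CONVERSIONS[head]
    if len(args) != len(order):
        raise NotImplementedError(
            "%s with %d argument(s) is not supported" % (head, len(args)))
    # order[j] is the index of the Mathematica argument placed at position j.
    return name, tuple(args[i] for i in order)


def render(head, args):
    """Render the converted call as a Sympy-style source string."""
    name, new_args = convert_call(head, args)
    return "%s(%s)" % (name, ", ".join(new_args))
```

With this table, `render("LaguerreL", ["n", "a", "x"])` returns the string `laguerre_poly(n, x, a)`, while an unknown head such as `Foo` raises `NotImplementedError`. Storing the argument order next to the target name keeps each entry to one line, which matters if the table grows to cover a large part of the Mathematica language.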
What languages to parse
- Mathematica - This will be the easiest since I'm already familiar with the Mathics parser. (see Matlab/Maxima).
- Matlab/Maxima - Of most use to Sympy|Gamma and anyone wanting to run simple Matlab/Maxima scripts with Sympy. Potentially time-consuming because these languages are quite large and there are many functions to convert.
- LaTeX - very useful for Sympy|Gamma. Parsing a subset will probably not be too difficult.
- C/Fortran - possibly difficult. I'm not fully convinced this is a good idea; we're probably better off just using
- MathML - probably not too difficult. Potentially very useful for Sympy|Gamma.
- Python - should be easy since we only need to convert math functions and numbers.
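For the Python case, converting only math functions and numbers might look something like the sketch below. It uses the standard library `ast` module (Python 3.8+) and emits a Sympy-style expression string. The `FUNCTIONS` whitelist, the helper names, and the choice to wrap numbers as `Integer(...)` / `Float(...)` are assumptions made for illustration, not an agreed design.

```python
import ast

# Hypothetical whitelist: Python/math function names -> Sympy names.
FUNCTIONS = {"sin": "sin", "cos": "cos", "sqrt": "sqrt", "fabs": "Abs"}
OPERATORS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*",
             ast.Div: "/", ast.Pow: "**"}


def convert(source):
    """Convert a Python arithmetic expression to a Sympy-style string."""
    return _walk(ast.parse(source, mode="eval").body)


def _function_name(func):
    """Return the bare name for f(...) or math.f(...), else None."""
    if isinstance(func, ast.Name):
        return func.id
    if (isinstance(func, ast.Attribute) and isinstance(func.value, ast.Name)
            and func.value.id == "math"):
        return func.attr
    return None


def _walk(node):
    if isinstance(node, ast.Constant):
        # Wrap numbers so that 1/2 can stay exact instead of becoming 0.5.
        if type(node.value) is int:
            return "Integer(%d)" % node.value
        if type(node.value) is float:
            return "Float(%r)" % node.value
    elif isinstance(node, ast.Name):
        return node.id  # treated as a symbol
    elif isinstance(node, ast.BinOp) and type(node.op) in OPERATORS:
        return "(%s %s %s)" % (
            _walk(node.left), OPERATORS[type(node.op)], _walk(node.right))
    elif isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return "(-%s)" % _walk(node.operand)
    elif isinstance(node, ast.Call) and not node.keywords:
        name = _function_name(node.func)
        if name in FUNCTIONS:
            args = ", ".join(_walk(arg) for arg in node.args)
            return "%s(%s)" % (FUNCTIONS[name], args)
    # Anything else (strings, attribute access, unknown functions, ...)
    raise NotImplementedError("cannot convert %s" % ast.dump(node))
```

For example, `convert("math.sqrt(x) + 1/2")` returns `(sqrt(x) + (Integer(1) / Integer(2)))`, and a call to a function outside the whitelist, such as `open('f')`, raises `NotImplementedError`. Walking the syntax tree with an explicit whitelist means arbitrary Python is rejected rather than evaluated.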
#1582, #1512, #1555. These are old, so I made #2070, which hasn't been merged yet.