GSoC 2013 Application Angus Griffith: Parsing

Sean Vig edited this page Jul 18, 2013

Name: Angus Griffith

Enrolment: 4th year Mathematics at the Australian National University.

Contact details:

Email: 16sn6uv@gmail.com

IRC Nick: sn6uv

Github: sn6uv

Email is probably the best way to contact me, since I'm not always on IRC. Living in Australia, I'll also probably be asleep while the rest of the world is awake.

Short Bio / Background overview

The majority of my university courses are in mathematics, with a focus on real/complex analysis and computational maths. I've also done some courses and research projects in theoretical physics and computational chemistry. I'm currently the main developer and maintainer of Mathics (an open source Mathematica clone written in Python).

Me the Person

  • I've been using Linux as my primary desktop OS for 5 years now. Arch Linux is my distro of choice.
  • Editor: vim, because I find it much faster to code with and it's preinstalled on most of the platforms I use. In particular, I like its syntax highlighting and macros (see my vimrc).
  • My programming knowledge is mostly self-taught. I'm familiar with C/C++, JavaScript, Fortran 90/95 and Mathematica, but Python is by far my favourite language. One project I'm proud of is the plotting in Mathics (live demo), in particular the adaptive sampling algorithm and the interactive 3D plot.
  • I've contributed almost 30K lines of (mostly Python) code to Mathics, including the Mathics parser, PyPy compatibility, and various other improvements.
  • One feature of Python I really like is list comprehensions.
  • A cool example: Sympy's integrate function allows Mathics to perform an integral that Mathematica can't (live demo).
  • I've used svn, cvs and git. I prefer git over the others.

Me and the Project

  • I hope to drastically improve Sympy's parsing module. In particular, I want to implement a fast parser for Mathematica, along with parsers for some subset (up for debate) of C, LaTeX, Fortran, Matlab, Python and MathML. Time permitting, I'd also like to make some improvements to sympy_parser so that all code generated by Sympy can be parsed.
  • Allowing Sympy to parse the Mathematica language (Wolfram are yet to come up with a good name for it) would be particularly useful for projects such as Sympy Gamma.
  • I recently wrote an LALR(1) Mathematica parser for Mathics using Python Lex-Yacc (ply), because the previous Earley parser (using spark) was too slow. This parser will be deployed with the next version of Mathics. I've also contributed to the TeXForm function in Mathics (which converts Mathics expressions to LaTeX code); in particular, I wrote the code that converts arbitrary 3D graphics to LaTeX.
  • I'll be moving house in mid-July, which may affect my progress for that week only. I will be returning to university part-time from the 27th of July. From prior experience (working 20+ hours/week on Mathics while studying full-time), I am confident that I can maintain the required 40 hours a week for the entire duration of the project.
  • Rough timeline: I've divided the project into 4 sections and expect to make a pull request at the end of each.
    • Week 1: Refactor the parser framework to make it more usable (planning the API in conjunction with the Sympy|Gamma developers and other stakeholders).
    • Weeks 2-5: Adapt Mathics' Mathematica parser for Sympy (see Implementation details). How long this takes will depend on how much of the Mathematica language we intend to support.
    • Weeks 6-8: Implement a LaTeX parser. It might be possible to reuse mathtex for some of this (in particular, its list of binary_operators, etc.).
    • Weeks 9-12: Implement a MathML parser (this will essentially be the same process as the LaTeX parser).

This timeline will probably change; in particular, I'm open to discussion on which languages would be most beneficial for Sympy to parse. The LaTeX/MathML parsers may be replaced by a single, more complex parser, e.g. a Fortran parser.
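The first job in weeks 2-5 is getting Mathematica input into a tree that a conversion step can walk. As a rough illustration only, here is a minimal sketch that parses FullForm-style input (`Head[arg, ...]` with symbols and integers, no infix operators or patterns) into nested tuples. It is a hand-written recursive-descent parser, not the ply-based LALR(1) Mathics parser, and the nested-tuple representation and the names `tokenize` and `parse_fullform` are hypothetical.

```python
import re

# One token per match: an identifier, an integer, or one of the characters [ ] ,
TOKEN_RE = re.compile(r"\s*(?:([A-Za-z$][A-Za-z0-9$]*)|(\d+)|([\[\],]))")
PUNCTUATION = ("[", "]", ",")


def tokenize(text):
    """Split FullForm-style text into identifier, integer and punctuation tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        tokens.append(match.group(match.lastindex))
        pos = match.end()
    return tokens


def parse_fullform(text):
    """Parse input such as 'f[x, g[y]]' into nested tuples like ('f', 'x', ('g', 'y'))."""
    tokens = tokenize(text)
    expr, pos = _parse_expr(tokens, 0)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return expr


def _parse_expr(tokens, pos):
    """Parse one expression starting at tokens[pos]; return (expr, next_pos)."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    atom = tokens[pos]
    if atom in PUNCTUATION:
        raise SyntaxError(f"unexpected token {atom!r}")
    pos += 1
    if pos >= len(tokens) or tokens[pos] != "[":
        return atom, pos  # a bare symbol or number, kept as a string
    # Head[arg1, arg2, ...] becomes (Head, arg1, arg2, ...)
    pos += 1
    if pos < len(tokens) and tokens[pos] == "]":
        return (atom,), pos + 1
    args = []
    while True:
        arg, pos = _parse_expr(tokens, pos)
        args.append(arg)
        if pos >= len(tokens):
            raise SyntaxError("missing ']'")
        if tokens[pos] == "]":
            return (atom, *args), pos + 1
        if tokens[pos] != ",":
            raise SyntaxError(f"expected ',' or ']', got {tokens[pos]!r}")
        pos += 1
```

For example, `parse_fullform("LaguerreL[n, a, x]")` gives `("LaguerreL", "n", "a", "x")`. Infix input such as `a + b*c` is where a generated LALR(1) parser earns its keep, since operator precedence can be declared in the grammar instead of being hand-coded.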

Implementation details

Mathematica parser

  • The first step will be to convert all the Mathics expressions in the Mathics parser to their corresponding Sympy counterparts. This will involve compiling a map of function conversions (again, some Mathics code can probably be recycled). For example, LaguerreL[n, a, x] becomes laguerre_poly(n, x, a).
  • Some language forms, in particular Mathematica's patterns (e.g. f[x_] := x^2), may be difficult to convert to Sympy.
  • Many of the Mathics tests can be adapted when writing tests for Sympy's Mathematica parser.
  • Implementing the full Mathematica language will require many additions to Sympy. The parser should fail gracefully when given an expression that has no Sympy counterpart (raising NotImplementedError is one possibility).
  • Licensing (non)issues: Mathics is released under the GPL. As the author of the Mathics parser, I'm more than happy to relicense it for use by Sympy.
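The function-conversion map and the graceful-failure behaviour described above could look roughly like the following minimal sketch. It assumes the parser yields nested tuples such as `("LaguerreL", "n", "a", "x")` and that the target is SymPy source text (which could then be handed to `sympify`); the table contents and the name `to_sympy_source` are illustrative only, not the actual Mathics conversion code.

```python
# Hypothetical conversion table: Mathematica head -> (SymPy name, argument order).
# An order of None keeps all arguments as given (variadic heads such as Plus).
CONVERSIONS = {
    "Sin": ("sin", (0,)),
    "Cos": ("cos", (0,)),
    "Plus": ("Add", None),
    "Times": ("Mul", None),
    # LaguerreL[n, a, x] -> laguerre_poly(n, x, a)
    "LaguerreL": ("laguerre_poly", (0, 2, 1)),
}


def to_sympy_source(expr):
    """Convert a nested-tuple Mathematica expression to SymPy source text."""
    if isinstance(expr, str):
        return expr  # symbols and numbers pass through unchanged
    head, *args = expr
    if head not in CONVERSIONS:
        # Fail gracefully instead of guessing a translation
        raise NotImplementedError(f"{head} has no SymPy counterpart")
    name, order = CONVERSIONS[head]
    if order is not None:
        if len(args) != len(order):
            raise NotImplementedError(f"{head} with {len(args)} arguments is not supported")
        args = [args[i] for i in order]  # reorder to match the SymPy signature
    return f"{name}({', '.join(to_sympy_source(arg) for arg in args)})"
```

Keeping the argument permutation in the table, rather than in per-function code, means most new conversions are one-line additions; heads whose semantics differ more deeply (patterns, for instance) would still need dedicated handling.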

Other parsers

  • If I'm writing a parser from scratch, I'll probably use ply, since it's fast and it's what I'm most familiar with. My preference would be to write a MathML parser, since that will probably be of most use to the web applications using Sympy (SympyLive and Sympy|Gamma).
  • I'll aim to keep dependencies to a minimum (hopefully just ply). If needed, this could be an optional dependency, since the majority of users are probably not interested in parsing.
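One way to keep ply optional is to import it only when a parser is first used, so that importing Sympy itself never requires it. A minimal sketch, where `require_optional` is a hypothetical helper (SymPy also ships its own `sympy.external.import_module` for optional imports):

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional dependency, failing with a message that names the feature."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:  # also covers ModuleNotFoundError
        raise ImportError(
            f"{feature} requires the optional package '{module_name}'; "
            f"install it to enable this feature"
        ) from exc
```

A parser module would then call something like `yacc = require_optional("ply.yacc", "Mathematica parsing")` inside its entry point rather than at import time.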

What languages to parse

  • Mathematica - This will be the easiest, since I'm already familiar with the Mathics parser (but see the caveat under Matlab/Maxima).
  • Matlab/Maxima - Of most use to Sympy|Gamma and to anyone wanting to run simple Matlab/Maxima scripts with Sympy. Potentially time consuming, because these languages are quite large and there are many functions to convert.
  • LaTeX - very useful for Sympy|Gamma. Parsing a subset will probably not be too difficult.
  • C/Fortran - possibly difficult. I'm not fully convinced this is a good idea; we're probably better off just using autowrap.
  • MathML - probably not too difficult. Potentially very useful for Sympy|Gamma.
  • Python - should be easy since we only need to convert math functions and numbers.
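Content MathML already encodes the expression tree, so a parser for it can largely be a tree walk rather than a grammar. A minimal sketch using Python's standard `xml.etree.ElementTree`, assuming Content MathML input (Presentation MathML describes layout, so operator structure would have to be recovered separately) and covering only a handful of operators; `mathml_to_sympy_source` and the `OPERATORS` table are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical table: Content MathML operator element -> SymPy callable name.
OPERATORS = {"plus": "Add", "times": "Mul", "power": "Pow", "sin": "sin", "cos": "cos"}


def _local_name(tag):
    """Drop an XML namespace prefix such as '{http://www.w3.org/1998/Math/MathML}'."""
    return tag.rsplit("}", 1)[-1]


def mathml_to_sympy_source(element):
    """Translate a Content MathML element tree into SymPy source text."""
    tag = _local_name(element.tag)
    if tag == "math":
        children = list(element)
        if len(children) != 1:
            raise ValueError("expected exactly one child inside <math>")
        return mathml_to_sympy_source(children[0])
    if tag in ("ci", "cn"):
        # <ci> holds an identifier, <cn> a number
        return (element.text or "").strip()
    if tag == "apply":
        # The first child names the operator; the rest are its operands
        op, *operands = list(element)
        name = _local_name(op.tag)
        if name not in OPERATORS:
            raise NotImplementedError(f"<{name}/> has no SymPy mapping yet")
        args = ", ".join(mathml_to_sympy_source(child) for child in operands)
        return f"{OPERATORS[name]}({args})"
    raise NotImplementedError(f"<{tag}> is not supported")
```

For instance, `<apply><plus/><apply><power/><ci>x</ci><cn>2</cn></apply><cn>1</cn></apply>` translates to `Add(Pow(x, 2), 1)`.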

Patch requirement:

#1582, #1512, #1555. These are old, so I also made #2070, which hasn't been merged yet.