Sympy : Parsing and codegen

Satya Prakash Dwibedi edited this page Apr 3, 2017 · 8 revisions

Proposal

Abstract

Sympy a Computer Algebra System written in python has parsing as sub modules inside it, which converts python like expressions to instances of Sympy. My proposal aims to add some functionalities in parsing module by adding parsers to parse C-like expressions to instances of Sympy as well as a parser to parse text-based functions to instances of Sympy.
Sympy also has a submodule printing and utilities.codegen which interacts with each other and generate routines in different languages from a given Sympy expression.My Proposal aims to generate optimized and enhanced code by the use of some optimization tricks and add some supports which will contribute to the efficiency of generated code.

Deliverables

  1. Parsing of C-like expressions to instances of sympy
    I would like to add a parser inside parsing module to parse C-like expressions to the instances of Sympy, which will help C programmers to acess features of Sympy without worrying about expression structures of Sympy or Python.Parser will parse C-like expression to the instances of sympy thus will be able to calculate from given C-like expression and generate corrsponding routines in C using codegen.
    i.e
>>> expr =  sympify('x++ + ++x + x++',type='c')# here to pass a type of expression 
>>> expr.subs(x,5)#here it converts the C-like expression to instances of sympy and returns result
19  
>>> x
8
>>> [(c,d),(e,f)]=codegen(('complex_sin',S(sin(sin(x++ + ++x + x++)),type='c')),'c',header=False,empty=True)
>>> print d
#include "complex_sin.h"
#include <math.h>

double complex_sin(double x) {

   double f_result;
   f_result = sin(sin(x++ + ++x + x++));#here it generates equivalent C code containing C-like expressions.
   return f_result;

}

above I demonstrated the simple use of C-like expressions and similarly, we can also manipulate complex C-like expressions.
The addition of this feature will help C programmers to manipulate previously written C files and Sympy can also read and write routine generated by it.
In code specified above I have used an attribute type inside sympify() which I plan to implement will help sympify function to call corresponding parsers.Here by specifying type='c', it will call parser for C, which can parse C-like expressions to instances of Sympy.
2. Implementation of Text-based functions
Implementation of this will help regular users a lot i.e they can take benefit of features of sympy by writing complex functions written in textual format, e.g to perform integration of x^2 w.r.t x he won't have to know sympy syntax for calculating integration which is integrate(x**2,x); he can calculate it by writing S('integrate x^2 w.r.t x',type='TextBased').
Here also I would like to introduce a type attribute inside sympify function which will specify it to call specific parsers.
i.e.to integrate

>>>  integrate(cos(x),x)
sin(x)

the user can also use:

>>>  S('integrate cos(x) w.r.t. x',type='TextBased')
sin(x)
  1. Resolving the anomaly between Logical operators and Bitwise operators
    consider:
>>> ccode('x & y')
'x && y'
>>> ccode('And(x,y)')
'x && y'

The above 2 executed statement shows that either the use of bit wise operator & or logical operator And(x,y) leads to same code in C which is an ambiguity.The main issue is when a programmer would like to generate C routine containing bit wise operator. The code should be executed like this:

>>> ccode('x & y')
'x & y'
>>> ccode('And(x,y)')
'x && y'

It is related to issue 5353, as discussed there the error lies inside the sympy_parser which parses x & y as And(x,y). So I would like to work on resolving this issue.
4. Generated C codes should have support for expressions involving Complex variables

>>> x=symbols('x',complex=True)
>>> ccode('x**2')
'pow(x, 2)'

should be cpow(x,2) because pow() in C supports for real variable types only and cpow() is function defined specifically for calculation of complex powers. Similarly:

>>> x=symbols('x',complex=True)
>>> [(c,d),(e,f)]=codegen(("f",x**2),"c",header=False,empty=True)
>>> print d
#include "f.h"
#include <math.h>

double f(double x) {

   double f_result;
   f_result = pow(x, 2);
   return f_result;

}

here it includes header file math.h whereas it should be a combination of header file complex.h, inbuilt function cpow() inside it, the function return type should be double complex and the result i.e. f_result should also be of double complex type as the result of complex power calculation is complex in most of the cases. This also involves solving the issue 12453; which is actually issued by me.
5. Rewrite expressions using fma() i.e. fused multiplication addition
Consider the expression x*y+z, calculating the expression value with integers is easy but if floating point values are involved then accuracy in the result is low because according to IEEE standard; after multiplication, the multiplied value should be rounded and checked for exceptions and then addition should be performed.Which may lead to loosing accuracy.
What fma() does is, it skips the rounding part and calculates the total value and then returns the result thus saving circuit time and gives us more accuracy.
6. Implementing evaluation of polynomials using horner's rule

>>> [(c,d),(e,f)]=codegen(("f",poly(x*(x**2 + x - 1)**2).eval(y)),"c",header=Fal
se,empty=True)
>>> print d
#include "f.h"
#include <math.h>

double f(double y) {

   double f_result;
   f_result = pow(y, 5) + 2*pow(y, 4) - pow(y, 3) - 2*pow(y, 2) + y;
   return f_result;

}

From above-generated routine, it can be seen that the polynomial is evaluated using the naive algorithm or traditional algorithm which has multiple calls to pow and a time complexity of O(n^2) but using horner's rule the polynomial can be evaluated easily with a time complexity of O(n).
should be:

#include "f.h"
#include <math.h>

double f(double y) {

   double f_result;
   f_result = y*(y*(y*(y*(y+2)-1)-2)+1);
   return f_result;

}
  1. Implementing Sum in code printers
    Consider the following code snippet:
>>> ccode(Sum(x**2,(x,0,9)))
'// Not supported in C:\n// tuple\nSum(pow(x, 2), (x, 0, 9))'
>>> octave_code(Sum(x**2,(x,0,9)))
'Sum(x.^2, {x, 0, 9})'

As seen above code printers can't print Sumand if they are printing they are printing wrong as in case of octave. This is regarding the issue 10194 opened by asmeurer.Here main issue arises when it is needed to included inside another expression i.e:

>>> [(c,d),(e,f)]=codegen(('f',sin(Sum(x,(x,0,9)))),'c',header=False,empty=True)
>>> print d
#include "f.h"
#include <math.h>

double f() {

   double f_result;
   f_result = sin(Sum(x, (x, 0, 9)));
   return f_result;

}

As you can see above in generated C routine it uses a function Sum() which is neither defined inside math.h nor an inbuilt in C, which is a bug and I would like to work on this as a part of my project.
8. Implementing exp2 in place of pow(2,x)

>>> ccode('2**x')
'pow(2, x)'

Though above-generated code works well, but using exp2() instead of pow() will be efficient because the power of 2 can be easily calculated by the use of bit shift operator and that is implemented in exp2() function defined in math.h header file.
Similarly, I would like to add functions:
exp2f() - for floating point function evaluation.
expm1(x) - for calculating exp(x)-1
9. Making Sympy more interactive using NLP techniques
Note:- If time permits
Here my idea is to make sympy more interactive using NLP techniques:
for e.g.

>>> sympify('(x+2)**2').subs(x,8)
100
>>> why
8 + 2 = 10 
10**2 = 100

i.e user can interact with Sympy using natural languages and can also know the flow of calculation.

Planning

I am willing to give approx. 4 hrs daily during the pre-GSoC period to get in touch with the mentors and get deeper into parsing codebase and pycparser codebase.During GSoC period I am willing to give 8hrs. daily.
Before May 4
* Familiarize me completely with the parsing module's code base.
* Revise Theory of Computation and Compiler Design courses.
* Study of implementation of parsing techniques.
May 5 - May 30 (Before Official coding time)
* self-coding and get my doubts cleared from gitter.
* Solve some parsing issues.
* With the help of my mentor I will get my goals and implementation techniques finalized.
May 30 - June 6 (Official coding period starts)
Start working on following issues, as I have already worked with code gen codebase implementing this will be relatively easy for me.
* Implementing exp2() instead of pow() inside generated C routine.
* Implementing printer for printing fma() inside generated codebase.
* implementing printers for addingheader file complex.h inside C codebase involving the complex variables and
printing of corresponding functions for complex values defined inside complex.h header file.
* Implementing Sum() printing ability for code printers using functions.
June 6 - June 26
* Implementing horner's rule for Polynomials evaluation inside generated code base.
* Here I will switch to working on parsing module and work on resolving parsing issue of bitwise and logical
operators.
MID-TERM EVALUATION
June 27 - August 15
As working on parsing module requires a great deal of thought, So I would like to use most time on it.
* Start working on adding the parser for C-like expressions inside parsing module.
* Start working on developing the parser for text based functions.
August 15 - August 21
Start work on debugging and documentation.
If some time remains I will start working on interactive sympy.

Related Work

Related works include PLY which implements lex and Yacc tools for parsing purpose in python and pycparser a parser to parse C language implementing this tool by the implementation of LR parsing technique.
My work in pre-GSoC period will be to get into their source code and getting an idea of how it is implemented in python and with reference to that I will implement the parser for C expressions only and convert to instances of Sympy and after completion of them, I will work on the parser for Text based function implementation.

About me

Name : - Satya Prakash Dwibedi
E-mail :- akash581050@gmail.com
GitHub :- SatyaPrakashDwibedi
Ph.No :- +917751844623

Biographical Information

I am a 3rd. year student at College of Engineering and Technology, BBSR, India in Information Technology. I am an intermediate python programmer. I am passionate about making things work fast, efficient and make them as general as possible.
Note: - I belong to TimeZone GMT+5.30.So I would request assigned mentor to give me complete details of his schedule so that I would cope up with him and adjust my time accordingly.

How do I fit in

Me as the programmer

  • Platform and Editor
    I work on windows platform and prefer IDLE(Python GUI) as primary editor for python and sometimes Sublimetext3. I prefer IDLE because it gives python type look and I prefer it over other interpreters because it's light and doesn't take much CPU space.
  • Programming Experience
    I started coding in C in my first year, then took C++ in the second year and I love C because it has the powerful thing called pointers using which we can manipulate hardware and memory blocks.Then in my third year I just tried to learn this new language called Python but after writing 3-4 lines of code I fell in love with it because as it's said: "SIMPLICITY IS BEAUTY". I also have done projects in python and the projects include:
    1. https://github.com/SatyaPrakashDwibedi/File_Reader: - Reads a text file choose by user; designed using Python
    2. https://github.com/SatyaPrakashDwibedi/TEXTEDITOR: - A simple Text Editor
    3. https://github.com/SatyaPrakashDwibedi/Proposal_card: - A proposal card to ask a girl out.
    I love open source coding as it helps to know other developers and improve one’s coding abilities as well as innovating ideas to solve problems and lets students and developers like us to manipulate codes according to our requirement.
  • Flavours of sympy
    Consider the following code snippet
>>> x,y=symbols('x y',integer=True)
>>> from sympy.utilities.codegen import codegen
>>> [(c,d),(e,f)]=codegen(("f",x**y),"C",header=False,empty=True)
>>> print d
#include "f.h"
#include <math.h>

double f(int x, int y) {

   double f_result;
   f_result = pow(x, y);
   return f_result;

}

this is cool because instead of 6 lines in C, I just had to write 4 line in python using Sympy package. This is just a glimpse of how marvelously it can help to solve complicated mathematical statements. This is not just one and It can solve complicated mathematical statements

>>> i=symbols('i')
>>> expr=Sum(sin(cos(sin(i)))**2,(i,0,9))
>>> expr.doit()
sin(cos(sin(8)))**2 + sin(cos(sin(5)))**2 + sin(cos(sin(2)))**2 + sin(cos(sin(1)
))**2 + sin(cos(sin(4)))**2 + sin(cos(sin(7)))**2 + sin(cos(sin(9)))**2 + sin(co
s(sin(6)))**2 + sin(cos(sin(3)))**2 + sin(1)**2

So, basically it's awsome.

  • Qualifications I do have to implement the idea
    I have done programming in python and made myself familiarized with code-printer codebase and codegen codebase and solved some issues:
  1. https://github.com/sympy/sympy/pull/12298 (merged)
  2. https://github.com/sympy/sympy/pull/12433 (to be merged)
    also Opened some issues:
  3. https://github.com/sympy/sympy/issues/12423 (solved)
  4. https://github.com/sympy/sympy/issues/12453 (work to be carried on during GSoC period)
  5. https://github.com/sympy/sympy/issues/12356 (found duplicate and work to be crried on during GSoC period)
    So I will have less problem in fixing issues I have mentioned and as I have some experience in C,So I can easily detect my bugs while fixing it.
    I also have thorough knowledge about how Git, Gitter, Mailing list works and also the code of conduct.
    As I have made myself familiarized with reading source code, so it won't be difficult for me to read other's source codes so I will face less problem while reading parsing source code.
    I have taken the compiler design and Theory of computation course at the college level and will get deeper into it during the pre-GSoC period and focus mainly on the application part so, knowledge of Theory of Computation and Compiler design will help me to understand parsers deeply and implement them.
    The courses I have taken include:
    1. Compiler Dsign
    2. Theory of Computation
      Working on developing parsers will help me a lot as I am planning on designing a compiler for my semester project.
      I am willing to complete my project successfully and promise not to leave in between by disappointing you.
      I am promising to put my best.

References

  1. Codegen Ideas
  2. Codegen Notes
  3. Parsing Wiki
  4. Aaron Meurer GSoC application 2009
  5. Cpp manuals.
Clone this wiki locally
You can’t perform that action at this time.
You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session.
Press h to open a hovercard with more details.