Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP fix codegen #14079

Closed
wants to merge 31 commits into from
Closed

WIP fix codegen #14079

wants to merge 31 commits into from

Conversation

gouarin
Copy link

@gouarin gouarin commented Feb 4, 2018

This PR is a fix and improvement of codegen.

  • codegen
    • fix the recursion when you have mutliple CodeBlocks
    • support other Matrix types
    • remove datatype function and define each datatype inside codegen classes for more flexibility
    • add settings in Routine to manage the printer for each routine when you have multiple
    • add idx_vars in Routine that stores the indexes contained in For (they need particular treatment sometimes)
  • utilities
    • autowrap: wrap several routines in one go and export the module
    • autowrap: add NumPy and Pure Cython wrapper

This short example shows the issue for the recursion of For

x, y = symbols('x,y', real=True) 
i = Idx('i') 
test = For(i, Range(10), [Assignment(x, y)]) 
routine = make_routine('test', test) 
code_gen = C89CodeGen() 
source = get_string(code_gen.dump_c, [routine]) 
print(routine) 
print(source)

The output is

Routine('test', [InputArgument(x), InputArgument(y)], [Result(For(i, Range(0, 10, 1), CodeBlock(Assignment(x, y))), result_6824502057046278987, result_6824502057046278987)], {i}, set())
#include "file.h"
#include <math.h>
double test(double x, double y) {
   double test_result;
   test_result = for (i = 0; i < 10; i += 1) {
      x = y;
   };
   return test_result;
}

The problem is that For expression has a CodeBlock (list of statements) which can contain important information for output parameters and return values.

So, the idea is to inspect output parameters and result values recursively and not to see a routine as a result but as a list of statements where you can have results, input, output, input/output.

Now, the main issue is to change the sympy expression where you have a result parameter like in this expression

For(i, Range(10), [x*y])

The idea would be to replace x*y by Assignment(result.name, x*y). In that way, all the work done for printing in a target language should be the same.

I don't know how to do this correctly because each method subs, replace, ... returns a copy and we need to change tuple element of CodeBlock for example.

If you have some ideas ...

@bjodah
Copy link
Member

bjodah commented Feb 4, 2018

I guess this should have a test added for the new capability?
Fixing sympy.utilities.codegen is of course great since sympy.codegen.ast doesn't offer a polished
interface yet. I just pushed a commit to #13100 where a test case builds and calls into a C-function with in-/output arguments. But it still requires a bit of boilerplate since .codegen.ast is mostly building blocks.

@gouarin
Copy link
Author

gouarin commented Feb 4, 2018

@bjodah the codegen utilities code will be remove after your PR will be merged ? Is that the idea ?

@bjodah
Copy link
Member

bjodah commented Feb 4, 2018

No, not in the foreseeable future, first sympy.codegen would need to reach feature completeness, and then there would be a long deprecation cycle, and even then it might not even be removed if the APIs differ so much that users cannot be expected to easily transition.

@gouarin
Copy link
Author

gouarin commented Feb 4, 2018

Ok. For now, I have something which works for pyLBM (http://pylbm.readthedocs.io/en/develop/) but with an old version of sympy and I would like all the stuff I added to be in sympy to not have multiple versions and restart from the beginning.

I have to improve the packaging of pyLBM and for that I will need to have a dependency of a release of sympy with a working codegen.

@gouarin
Copy link
Author

gouarin commented Feb 4, 2018

I also have a lot of numerical examples which can improve the idea of using sympy to build numerical codes and be able to have a robust codegen for multiple languages.

@asmeurer
Copy link
Member

asmeurer commented Feb 5, 2018

I have no comments on the changes here, but as a general comment, it would be useful for the codegen ast nodes to be able to specify how they should be interpreted as the rhs of an assignment. I guess this has to be done at the printer level, since different languages do different things. For a CodeBlock you probably want to assign the last line. For For you may want something similar (or maybe it should be an error, I don't know).

Forgive me if we already discussed this before, I can't remember.

@gouarin
Copy link
Author

gouarin commented Feb 5, 2018

I'm not sure I understand what you mean. The assignment is not necessary after the last line of a CodeBlock. For example, you can have

For(i, Range(10), [x*y, For(j, ... do something on x)])

where x could be a local variable. I had a lot of problem with this result term when I hacked for the first time codegen. So, I decided to remove it but I know it is not the solution.

The best solution for me is to ask to the user to declare the result variables as Result and then, doing the assignment like any other variables. Codegen doesn't have to do that since it can be difficult to catch it for every cases.
@asmeurer @bjodah tell me if you are agree with that.

The other problem that I encountered was the index variables in a For loop: they must be local variables which is not the case for now. I can also fix that.

For me, ast should be like an intermediate representation (like LLVM does) where it is easy to add any printing language class on it.

@gouarin
Copy link
Author

gouarin commented Feb 7, 2018

@asmeurer @bjodah could you give me your opinion about let user to define Result and not to try to predict it as mentioned in my last message ?

@bjodah
Copy link
Member

bjodah commented Feb 7, 2018

@gouarin you are right that we have quite a bit of degree of freedom to assume things in codegen.
But your example:

x, y = symbols('x,y', real=True) 
i = Idx('i') 
test = For(i, Range(10), [Assignment(x, y)]) 

in this case I believe inferring x and y being arrays is taking those assumptions one step too far.
We have an Indexed class in sympy.tensor.indexed that maps quite well to C/Fortran-arrays.

@gouarin
Copy link
Author

gouarin commented Feb 8, 2018

@bjodah my example was not a good one.

I will modify codegen with the assumption that a user must write if he want a Result and not to give just an expression without assignment.

Then, I will let you tell me what do you think about this modifications.

@gouarin
Copy link
Author

gouarin commented Mar 22, 2018

@asmeurer @bjodah I updated this PR to have codegen more readable and more flexible.

Now, I will add tests for loop code using For but it should work easily with what I have done.

Tell me if you are ok with theses changes.

@Abdullahjavednesar
Copy link
Member

ping @bjodah.

@moorepants
Copy link
Member

What is the status on this PR? It is labeled as WIP, so maybe still there.

@gouarin
Copy link
Author

gouarin commented Oct 7, 2018 via email

@moorepants
Copy link
Member

@gouarin Thanks for the update. We will leave it open.

@moorepants moorepants added PR: author's turn The PR has been reviewed and the author needs to submit more changes. codegen labels Oct 7, 2018
@sympy-bot
Copy link

sympy-bot commented Oct 29, 2018

Hi, I am the SymPy bot (v160). I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

Your release notes are in good order.

Here is what the release notes will look like:

  • codegen

    • fix the recursion when you have mutliple CodeBlocks (#14079 by @gouarin)

    • support other Matrix types (#14079 by @gouarin)

    • remove datatype function and define each datatype inside codegen classes for more flexibility (#14079 by @gouarin)

    • add settings in Routine to manage the printer for each routine when you have multiple (#14079 by @gouarin)

    • add idx_vars in Routine that stores the indexes contained in For (they need particular treatment sometimes) (#14079 by @gouarin)

  • utilities

    • autowrap: wrap several routines in one go and export the module (#14079 by @gouarin)

    • autowrap: add NumPy and Pure Cython wrapper (#14079 by @gouarin)

This will be added to https://github.com/sympy/sympy/wiki/Release-Notes-for-1.7.

Note: This comment will be updated with the latest check if you edit the pull request. You need to reload the page to see it.

Click here to see the pull request description that was parsed.

This PR is a fix and improvement of codegen.

<!-- BEGIN RELEASE NOTES -->
* codegen
  * fix the recursion when you have mutliple CodeBlocks
  * support other Matrix types
  * remove datatype function and define each datatype inside codegen classes for more flexibility
  * add settings in Routine to manage the printer for each routine when you have multiple
  * add idx_vars in Routine that stores the indexes contained in For (they need particular treatment sometimes)
* utilities
    * autowrap: wrap several routines in one go and export the module
    * autowrap: add NumPy and Pure Cython wrapper
<!-- END RELEASE NOTES -->

This short example shows the issue for the recursion of For

```
x, y = symbols('x,y', real=True) 
i = Idx('i') 
test = For(i, Range(10), [Assignment(x, y)]) 
routine = make_routine('test', test) 
code_gen = C89CodeGen() 
source = get_string(code_gen.dump_c, [routine]) 
print(routine) 
print(source)
```

The output is

```
Routine('test', [InputArgument(x), InputArgument(y)], [Result(For(i, Range(0, 10, 1), CodeBlock(Assignment(x, y))), result_6824502057046278987, result_6824502057046278987)], {i}, set())
#include "file.h"
#include <math.h>
double test(double x, double y) {
   double test_result;
   test_result = for (i = 0; i < 10; i += 1) {
      x = y;
   };
   return test_result;
}
```

The problem is that For expression has a CodeBlock (list of statements) which can contain important information for output parameters and return values.

So, the idea is to inspect output parameters and result values recursively and not to see a routine as a result but as a list of statements where you can have results, input, output, input/output.

Now, the main issue is to change the sympy expression where you have a result parameter like in this expression

```
For(i, Range(10), [x*y])
```

The idea would be to replace `x*y` by `Assignment(result.name, x*y)`. In that way, all the work done for printing in a target language should be the same. 

I don't know how to do this correctly because each method `subs`, `replace`, ... returns a copy and we need to change tuple element of CodeBlock for example.

If you have some ideas ...

@gouarin
Copy link
Author

gouarin commented Oct 29, 2018

@bjodah @asmeurer @moorepants

I think the improvement of codegen is done for me. Before I go further, I have some questions

  • codegen and autowrap are now difficult to read and I would like to split them, what do you think ?
  • I don't like these lines

https://github.com/sympy/sympy/blob/master/sympy/utilities/autowrap.py#L504-L507
https://github.com/sympy/sympy/blob/master/sympy/utilities/autowrap.py#L611-L615
https://github.com/sympy/sympy/blob/master/sympy/utilities/codegen.py#L1956-L1959

because we loose the flexibility that we could have with codegen to add other wrappers. I have no idea at this time to have a better format.

and I will probably add Pythran and Numba wrappers since I started to put NumPy wrapper. Do I add these benchmarks in SymPy repo or do you prefer that I create a new one ?

I made a lot of changes so tell me also if it's ok for you. I think we can do a lot of things now with codegen.

Copy link
Member

@bjodah bjodah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments on things that stood out to me. In general I think it's fine to refactor these modules heavily given that we don't break users code, i.e. we shouldn't modify existing tests (I hope that the test suite has sufficient coverage but that's another issue).

@@ -571,7 +571,7 @@ def test_ccode_Assignment():

def test_ccode_For():
f = For(x, Range(0, 10, 2), [aug_assign(y, '*', x)])
assert ccode(f) == ("for (x = 0; x < 10; x += 2) {\n"
assert ccode(f) == ("for (int x = 0; x < 10; x += 2) {\n"
Copy link
Member

@bjodah bjodah Oct 29, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a breaking change (in general we shouldn't change existing tests, they are not only there to ensure correctness, but also to guarantee a stable API for end users). Besides, I don't think that generating code for For should not imply declaration of the loop variable in C (it's not even valid C89, which is still what e.g. msvc targets and what most embedded systems use). You'll have to generate the declaration and initialization outside the printer...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is still a breaking change...

def autowrap(expr, language=None, backend='f2py', tempdir=None, args=None,
flags=None, verbose=False, helpers=None, code_gen=None, **kwargs):
def autowrap(expr_or_routines, language=None, backend='f2py', tempdir=None, args=None,
flags=None, verbose=False, helpers=None, code_gen=None, user_local_vars=None, settings={}, **kwargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should not use mutable default values keyword arguments (they are essentially global variables in python, unfortunately). You can set de default to None and then upon first access use the idiom settings or {}.

sympy/utilities/tests/test_autowrap.py Show resolved Hide resolved
@gouarin
Copy link
Author

gouarin commented Oct 29, 2018

I know that this PR begins to be huge and probably difficult to review. All I have done here, I use it in a real application https://github.com/pylbm/pylbm where the numerical code is generated by codegen.

Tell me how we can make progress to add these features in SymPy. We can merge (when you will be agree) and open an other PR that will split the codegen files.

I can also illustrate what I have done with some examples.

@bjodah
Copy link
Member

bjodah commented Nov 10, 2018

You're not really just splitting files here (which is what I thought you were asking about on gitter), you're moving whole submodules. That might be ok, but I'd rather review such changes in a separate PR. I don't know what @asmeurer thinks though.

@sympy-bot
Copy link

sympy-bot commented Jun 12, 2020

🟠

Hi, I am the SymPy bot (v160). I've noticed that some of your commits add or delete files. Since this is sometimes done unintentionally, I wanted to alert you about it.

This is an experimental feature of SymPy Bot. If you have any feedback on it, please comment at sympy/sympy-bot#75.

The following commits add new files:

  • b255705:
    • sympy/printing/cython.py

If these files were added/deleted on purpose, you can ignore this message.

@gouarin
Copy link
Author

gouarin commented Jan 30, 2021

I finally decided to take a new start and split this PR into smallest to facilitate the review.

I plan to make the following PR

  • fix codegen for nested blocks (Codegen nested blocks #20877)
  • add the possibility to have several routines constructed by autowrap
  • add NumPy and pure Cython codegen
  • add tests into external directory to test both NumPy and pure Cython generator

@gouarin gouarin closed this Jan 30, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
codegen PR: author's turn The PR has been reviewed and the author needs to submit more changes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants