Use sum/fsum & prod/fprod in some python related printers #24685

bjodah · 2023-02-08T09:16:16Z

Tentative fix for recusion error when using lambdify.

References to other Issues or PRs

Brief description of what is fixed or changed

Other comments

Perhaps this needs to be opt-in with some printer setting for backwards compatibility. (lambdify can then use that setting by default). Also, I have found functools.reduce(operators.add, [...]) to be a better replacement for sum in generic python code (but of course I can't remember an example right now).

Release Notes

NO ENTRY

sympy-bot · 2023-02-08T09:16:18Z

✅

Hi, I am the SymPy bot. I'm here to help you write a release notes entry. Please read the guide on how to write release notes.

No release notes entry will be added for this pull request.

Click here to see the pull request description that was parsed.

Tentative fix for recusion error when using lambdify.

#### References to other Issues or PRs
Fixes #24673

#### Brief description of what is fixed or changed


#### Other comments
Perhaps this needs to be opt-in with some printer setting for backwards compatibility. (`lambdify` can then use that setting by default). Also, I have found `functools.reduce(operators.add, [...])` to be a better replacement for `sum` in generic python code (but of course I can't remember an example right now).

#### Release Notes

<!-- Write the release notes for this release below between the BEGIN and END
statements. The basic format is a bulleted list with the name of the subpackage
and the release note for this PR. For example:

* solvers
  * Added a new solver for logarithmic equations.

* functions
  * Fixed a bug with log of integers. Formerly, `log(-x)` incorrectly gave `-log(x)`.

* physics.units
  * Corrected a semantical error in the conversion between volt and statvolt which
    reported the volt as being larger than the statvolt.

or if no release note(s) should be included use:

NO ENTRY

See https://github.com/sympy/sympy/wiki/Writing-Release-Notes for more
information on how to write release notes. The bot will check your release
notes automatically to see if they are formatted correctly. -->

<!-- BEGIN RELEASE NOTES -->
NO ENTRY
<!-- END RELEASE NOTES -->

asmeurer · 2023-02-08T09:19:39Z

sympy/printing/numpy.py

@@ -63,6 +63,14 @@ def _print_seq(self, seq):
        delimiter=', '
        return '({},)'.format(delimiter.join(self._print(item) for item in seq))

+    def _print_Add(self, e):
+        return '{}({})'.format(self._module_format(self._module + '.sum'),


Doesn't this have the problem with #5642?

maybe? would reduce(add, ...)) make a difference?

Not for the particular issue

@asmeurer I tried to figure out what a regression test would look like. You don't happen to able to construct one?

oscarbenjamin · 2023-02-08T11:21:33Z

Also, I have found functools.reduce(operators.add, [...]) to be a better replacement for sum in generic python code (but of course I can't remember an example right now).

In a situation where x + 0 would give a TypeError you would also get that error from sum([x, ...]) because it implicitly adds everything starting from zero. Using reduce is equivalent to sum(items[1:], items[0]) so it does not try to add the first item to 0.

ThePauliPrinciple · 2023-02-08T11:38:50Z

I think it would make sense if a setting is added to the python printer which holds the value of the recursion limit and then the behaviour is switched based on whether you would hit that limit. (then you can change that limit depending on what you want)

Not sure how easy that would be though. E.g. is the limit reached also when you have a combination of multiplications and additions?

As a side node, sum also seems to work correctly on something like this:

>>> import numpy as np
>>> A = np.array([1,2])
>>> sum([1, A])
array([2, 3])
>>> 1 + A
array([2, 3])

bjodah · 2023-02-08T11:55:18Z

In a situation where x + 0 would give a TypeError you would also get that error from sum([x, ...]) because it implicitly adds everything starting from zero. Using reduce is equivalent to sum(items[1:], items[0]) so it does not try to add the first item to 0.

Right, my point is that we cannot replace our current generated x + y + z with sum([x, y, z]). So we either need to use reduce(add, [x, y, z]) or like you propose(?): sum([y, z], x)

Here's an example adapted from SO:

In [1]: class p():
   ...:     def __init__(self, x, y):
   ...:         self.x=x ; self.y=y
   ...:     def __repr__(self):
   ...:         return "(%r,%r)"%(self.x,self.y)
   ...:     def __add__(self, P):
   ...:         return p(self.x+P.x, self.y+P.y)
   ...:     

In [2]: pts=[p(1,0), p(2,1), p(-3,4)]

In [3]: pts[0] + pts[1] + pts[2]
Out[3]: (0,5)

In [4]: sum(pts)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-45922890d470> in <module>()
----> 1 sum(pts)

TypeError: unsupported operand type(s) for +: 'int' and 'instance'

In [5]: import operator

In [6]: import functools

In [7]: functools.reduce(operator.add, pts)
Out[7]: (0,5)

In [8]: sum(pts[1:], pts[0])
Out[8]: (0,5)

bjodah · 2023-02-08T12:00:17Z

I think it would make sense if a setting is added to the python printer which holds the value of the recursion limit and then the behaviour is switched based on whether you would hit that limit. (then you can change that limit depending on what you want)

It sounds expensive to try/catch a RecursionError, no?

Another option might be to add a limit on the number of arguments, above which Add is printed using sum(...) instead of a + b + .... The question is then what a reasonable default for this value would be? 2? 3? 10? 100?

oscarbenjamin · 2023-02-08T12:06:41Z

I don't think RecursionError should be caught. Using reduce seems fine to me though.

oscarbenjamin · 2023-02-08T12:19:52Z

The question is then what a reasonable default for this value would be?

Timings with numpy arrays:

In [32]: a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z = [np.ones(100)] * 26

In [33]: %timeit a+b+c+d+e+f+g+h+i+j+k+l+m+n+o+p+q+r+s+t+u+v+w+x+y+z
20.2 µs ± 280 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [34]: %timeit reduce(add, [a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z])
22.6 µs ± 217 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [35]: %timeit sum([b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z], a)
20 µs ± 190 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [36]: %timeit a+b+c
1.71 µs ± 4.94 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [37]: %timeit reduce(add, [a,b,c])
2.38 µs ± 21.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

In [38]: %timeit sum([b,c], a)
1.99 µs ± 20.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

With individual floats:

In [39]: a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z = [1.0] * 26

In [40]: %timeit a+b+c+d+e+f+g+h+i+j+k+l+m+n+o+p+q+r+s+t+u+v+w+x+y+z
903 ns ± 5.48 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [41]: %timeit reduce(add, [a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z])
1.59 µs ± 8.42 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [42]: %timeit sum([b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u,v,w,x,y,z], a)
829 ns ± 3.05 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [43]: %timeit a+b+c
117 ns ± 0.29 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

In [44]: %timeit reduce(add, [a,b,c])
361 ns ± 1.96 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [45]: %timeit sum([b,c], a)
233 ns ± 1.07 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

I guess that with enough terms sum becomes faster than just using +. It looks like reduce is always slower.

asmeurer · 2023-02-08T21:21:13Z

This is the sort of issue you will run into with the NumPy printer:

>>> np.array([1, 2]) + np.array([[1, 2], [1, 3]])
array([[2, 4],
       [2, 5]])
>>> np.sum([np.array([1, 2]), np.array([[1, 2], [1, 3]])])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 180, in sum
  File "/Users/aaronmeurer/anaconda3/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2296, in sum
    return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
  File "/Users/aaronmeurer/anaconda3/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: could not broadcast input array from shape (2,2) into shape (2,)

NumPy does implicitly treat a list of arrays as a larger array, but it doesn't implicitly broadcast those arrays. This is essentially the same problem as #5642 where we do this with matrices. It most often occurs with scalar values because those sort of happen automatically when an expression doesn't have any variables in it, but it applies to any broadcasting.

Even when it sometimes does work, you get VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray. from NumPy.

By the way, the code for NumPy should be passing axis=0 to get the equivalent result as +.

>>> np.array([1, 2]) + np.array([1, 2])
array([2, 4])
>>> np.sum((np.array([1, 2]), np.array([1, 2])))
6

I think we can do this in this case by calling broadcast_arrays on the input. That's been an issue for #5642 because it's overhead to do that in every single lambdify call, but here we would only be doing it for sums that are above a certain threshold of terms.

asmeurer · 2023-02-08T21:23:46Z

sympy/utilities/tests/test_lambdify.py

+        def __exit__(self, exc_t, exc_v, exc_tb):
+            sys.setrecursionlimit(self._ori_limit)
+
+    # TODO, reproduce using a smaller expression. (this takes too long...)


Something like this works:

lambdify(x, Add(*[x**i for i in range(10000)]))

asmeurer · 2023-02-08T21:25:51Z

mpmath has fsum https://mpmath.org/doc/current/general.html#fsum. According to its docstring it's faster and more accurate for more than 2 terms.

ThePauliPrinciple · 2023-02-08T21:37:49Z

This is the sort of issue you will run into with the NumPy printer:

Can't you avoid this by simply using the python buildin sum instead of the numpy one?

>>> sum([np.array([1, 2]), np.array([[1, 2], [1, 3]])])
array([[2, 4],
       [2, 5]])

ThePauliPrinciple · 2023-02-08T21:39:36Z

sympy/printing/pycode.py

@@ -94,6 +94,8 @@ class AbstractPythonCodePrinter(CodePrinter):
        fully_qualified_modules=True,
        contract=False,
        standard='python3',
+        num_terms_sum=42,


honestly, 42 sounds pretty reasonable here, but it could also be just the actual limit such that the printer writes exactly the same code now as it did before except when that code would not be valid

asmeurer · 2023-02-08T21:47:04Z

Can't you avoid this by simply using the python buildin sum instead of the numpy one?

Sure but sum is just going to call + a bunch under the hood. np.sum actually is more efficient, but it just needs to be done in a way that properly mimics + in all cases.

ThePauliPrinciple · 2023-02-08T21:47:18Z

sympy/printing/pycode.py

+        # is already better served by the mpmath printer. And finally, subclassing and
+        # overriding this remains an option.
+        if len(expr.args) >= self._settings['num_terms_sum']:
+            return 'sum([%s], start=%s)' % (


this should use module_format and builtins.sum like here:

sympy/sympy/printing/pycode.py

Lines 271 to 280 in 677fc6b

def _print_Sum(self, expr):

loops = (

'for {i} in range({a}, {b}+1)'.format(

i=self._print(i),

a=self._print(a),

b=self._print(b))

for i, a, b in expr.limits)

return '(builtins.sum({function} {loops}))'.format(

function=self._print(expr.function),

loops=' '.join(loops))

(e.g. clashing import names/definitions)

Actually, now that I look at it, that code looks bugged since it doesn't use module_format :)

asmeurer · 2023-02-08T21:52:40Z

Can't you avoid this by simply using the python buildin sum instead of the numpy one?

Sure but sum is just going to call + a bunch under the hood. np.sum actually is more efficient, but it just needs to be done in a way that properly mimics + in all cases.

Although to be honest, the difference isn't as much as I expected, even for a huge number of arrays

In [15]: arrs = [np.empty((100,)) for i in range(10000)]

In [16]: %timeit sum(arrs)
4.91 ms ± 11.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [17]: %timeit np.sum(arrs, axis=0)
3.66 ms ± 52 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

bjodah · 2023-02-13T09:09:17Z

~~I'm not sure what to do with numpy.sum:~~

>>> (arr_5_3_4 + arr_4 + arr_3_4).shape
(5, 3, 4)
>>> numpy.sum([arr_5_3_4, arr_4, arr_3_4], axis=0).shape
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (3,) + inhomogeneous part.

~~I tried wrapping the arguments with a numpy.broadcast_arrays call inside numpy.sum(...) but I didn't manage to reproduce the operator.add behavior.~~

EDIT: numpy.sum(numpy.broadcast_arrays(...)) seems to work in cd83991.

bjodah · 2023-02-13T09:10:38Z

mpmath has fsum https://mpmath.org/doc/current/general.html#fsum. According to its docstring it's faster and more accurate for more than 2 terms.

It's already there, unless I missed something?

bjodah · 2023-02-20T15:35:00Z

I won't have the time to pursue fixing this PR for the foreseeable future.

asmeurer reviewed Feb 8, 2023

View reviewed changes

Use sum/fsum in some python related printers

4a2db98

bjodah force-pushed the printing-code-sum branch from f48a7c5 to 4a2db98 Compare February 8, 2023 09:38

New python printer settings: num_terms_sum, num_factors_prod

677fc6b

bjodah changed the title ~~Use sum/fsum in some python related printers~~ Use sum/fsum & prod/fprod in some python related printers Feb 8, 2023

asmeurer reviewed Feb 8, 2023

View reviewed changes

ThePauliPrinciple reviewed Feb 8, 2023

View reviewed changes

bjodah mentioned this pull request Feb 9, 2023

Recursion error when lambdifying an expression with 108k operations #24673

Open

Prefer builtins.sum over sum

836209b

bjodah added 2 commits February 14, 2023 20:54

use numpy.broadcast_arrays in numpy.sum(...) call

cd83991

drop unused import

6ec3ea1

bjodah added the Please take over label Feb 20, 2023

github-actions bot added the Merge conflict label Feb 22, 2023

bjodah closed this May 8, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Use sum/fsum & prod/fprod in some python related printers #24685

Use sum/fsum & prod/fprod in some python related printers #24685

bjodah commented Feb 8, 2023 •

edited

sympy-bot commented Feb 8, 2023 •

edited

asmeurer Feb 8, 2023

bjodah Feb 8, 2023

oscarbenjamin Feb 8, 2023

bjodah Feb 8, 2023

oscarbenjamin commented Feb 8, 2023

ThePauliPrinciple commented Feb 8, 2023

bjodah commented Feb 8, 2023 •

edited

bjodah commented Feb 8, 2023

oscarbenjamin commented Feb 8, 2023

oscarbenjamin commented Feb 8, 2023

asmeurer commented Feb 8, 2023

asmeurer Feb 8, 2023

asmeurer commented Feb 8, 2023

ThePauliPrinciple commented Feb 8, 2023

ThePauliPrinciple Feb 8, 2023

asmeurer commented Feb 8, 2023

ThePauliPrinciple Feb 8, 2023

asmeurer commented Feb 8, 2023

bjodah commented Feb 13, 2023 •

edited

bjodah commented Feb 13, 2023

bjodah commented Feb 20, 2023

	def _print_Sum(self, expr):
	loops = (
	'for {i} in range({a}, {b}+1)'.format(
	i=self._print(i),
	a=self._print(a),
	b=self._print(b))
	for i, a, b in expr.limits)
	return '(builtins.sum({function} {loops}))'.format(
	function=self._print(expr.function),
	loops=' '.join(loops))

Use sum/fsum & prod/fprod in some python related printers #24685

Use sum/fsum & prod/fprod in some python related printers #24685

Conversation

bjodah commented Feb 8, 2023 • edited

References to other Issues or PRs

Brief description of what is fixed or changed

Other comments

Release Notes

sympy-bot commented Feb 8, 2023 • edited

asmeurer Feb 8, 2023

Choose a reason for hiding this comment

bjodah Feb 8, 2023

Choose a reason for hiding this comment

oscarbenjamin Feb 8, 2023

Choose a reason for hiding this comment

bjodah Feb 8, 2023

Choose a reason for hiding this comment

oscarbenjamin commented Feb 8, 2023

ThePauliPrinciple commented Feb 8, 2023

bjodah commented Feb 8, 2023 • edited

bjodah commented Feb 8, 2023

oscarbenjamin commented Feb 8, 2023

oscarbenjamin commented Feb 8, 2023

asmeurer commented Feb 8, 2023

asmeurer Feb 8, 2023

Choose a reason for hiding this comment

asmeurer commented Feb 8, 2023

ThePauliPrinciple commented Feb 8, 2023

ThePauliPrinciple Feb 8, 2023

Choose a reason for hiding this comment

asmeurer commented Feb 8, 2023

ThePauliPrinciple Feb 8, 2023

Choose a reason for hiding this comment

asmeurer commented Feb 8, 2023

bjodah commented Feb 13, 2023 • edited

bjodah commented Feb 13, 2023

bjodah commented Feb 20, 2023

bjodah commented Feb 8, 2023 •

edited

sympy-bot commented Feb 8, 2023 •

edited

bjodah commented Feb 8, 2023 •

edited

bjodah commented Feb 13, 2023 •

edited