Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

numpy lambdify of piecewise doesn't work for invalid values #11306

Closed
asmeurer opened this issue Jun 27, 2016 · 32 comments
Closed

numpy lambdify of piecewise doesn't work for invalid values #11306

asmeurer opened this issue Jun 27, 2016 · 32 comments

Comments

@asmeurer
Copy link
Member

This example is given at https://stackoverflow.com/questions/38040846/error-with-sympy-lambdify-for-piecewise-functions-and-numpy-module

In [16]: p = Piecewise((1/x, y < -1), (x, y <= 1), (1/x, True))

In [17]: lambdify([x, y], p, modules='numpy')(0, 1)
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-17-4c592d95c008> in <module>()
----> 1 lambdify([x, y], p, modules='numpy')(0, 1)

/Users/aaronmeurer/anaconda3/lib/python3.5/site-packages/numpy/__init__.py in <lambda>(_Dummy_24, _Dummy_25)

ZeroDivisionError: division by zero

The Piecewise is designed to avoid zero division for (0, 1), but the problem is that the numpy lambdify printer uses select

In [18]: from sympy.printing.lambdarepr import *

In [19]: NumPyPrinter().doprint(Piecewise((1/x, y < -1), (x, y <= 1), (1/x, True)))
Out[19]: 'select([less(y, -1),less_equal(y, 1),True], [1/x,x,1/x], default=nan)'

which evaluates all values, and then picks the one corresponding to the first true selection.

ZeroDivisionErrors aside, this is also somewhat inefficient, and one could see this causing an unreasonable slowdown for a large piecewise, where all values are evaluated instead of just the one for the first true condition.

@richardotis you changed it to use select at #9749. Any thoughts?

@asmeurer
Copy link
Member Author

Wrong PR. It was changed at #9657.

@asmeurer
Copy link
Member Author

Why was it changed to use select instead of if/then/else chains, which are the default from LambdaPrinter? Those work just fine, because they don't evaluate the value unless the condition is true

In [31]: class MyNumPyPrinter(NumPyPrinter):
   ....:     def _print_Piecewise(self, expr):
   ....:         return LambdaPrinter._print_Piecewise(self, expr)
   ....:

In [33]: lambdify([x, y], p, modules='numpy', printer=MyNumPyPrinter)(0, 1)
Out[33]: 0

@richardotis
Copy link
Contributor

The reason we don't use if/then/else is that it breaks as soon as you try to pass array arguments:

lambdify([x, y], p, modules='numpy', printer=MyNumPyPrinter)(0, np.array([1, 0]))
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

However you are correct to point out that using select is inefficient, which is why in my fork of NumPyPrinter I changed it to use where, which evaluates conditions lazily, not too long ago:

    def _print_Piecewise(self, expr):
        "Piecewise function printer"
        exprs = [self._print(arg.expr) for arg in expr.args]
        conds = [self._print(arg.cond) for arg in expr.args]
        if (len(conds) == 1) and exprs[0] == '0':
            return '0'
        # We expect exprs and conds to be in order here
        # First condition to evaluate to True is returned
        # Default result: float zero
        result = 'where({0}, {1}, 0.)'.format(conds[-1], exprs[-1])
        for expr, cond in reversed(list(zip(exprs[:-1], conds[:-1]))):
            result = 'where({0}, {1}, {2})'.format(cond, expr, result)

        return result

I don't remember exactly why I chose to use select; the original motivation may have had to do with compatibility with numba, but that is moot since numba natively supports where now.

I'll put together the PR to fix this.

@richardotis
Copy link
Contributor

(Yes, it had to do with trying to make piecewise broadcasting work in numba, before I fully understood how numba.vectorize worked. Anyway, it doesn't matter now.)

@asmeurer
Copy link
Member Author

Your printer that uses where doesn't fix the problem. It prints this Piecewise as

where(less(y, -1), 1/x, where(less_equal(y, 1), x, where(True, 1/x, 0.)))

which still evaluates every instance of 1/x.

@asmeurer
Copy link
Member Author

In fact, given that Python is an eagerly evaluated language, I don't see how it's more efficient either.

@richardotis
Copy link
Contributor

Of course you're correct that all the arguments to where will still get evaluated before it gets called. Let me think about this for a bit.

@richardotis
Copy link
Contributor

(The lazy evaluation misconception must have come from something I was doing with numba.)

I can't think of a good solution to this until we have the ability to do array allocations in the code generator. NumPy, like Python, evaluates all function arguments eagerly. The only way to prevent a Piecewise represented in NumPy from evaluating an expression which will raise is to use Python's built-in branching (if/else), but that branching is not going to handle array argument broadcasting.

The NumPy docs seem to indicate that you should be able to use np.seterr to turn off raising on floatdiv-by-zero, but I haven't been able to get it to work locally yet.

@asmeurer
Copy link
Member Author

Are we already setting that flag? The default numpy behavior is

>>> 1/numpy.array([0])
__main__:1: RuntimeWarning: divide by zero encountered in true_divide
array([ inf])

@asmeurer
Copy link
Member Author

Oh, it evaluates 1/0 in Python. So is the solution to wrap any divided expressions with numpy.array?

@asmeurer
Copy link
Member Author

In other words, this works:

>>> from numpy import *
>>> l = lambda x, y: select([less(y, -1),less_equal(y, 1),True], [1/array(x),x,1/array(x)], default=nan)
>>> l(0, 1)
array(0.0)
>>> l(0, array([0, 1]))
array([ 0.,  0.])

@asmeurer
Copy link
Member Author

That doesn't change the inefficiency issue, though, but as far as I know no one has complained about that yet.

@asmeurer
Copy link
Member Author

I guess array(array) does a copy. So we want 1/array(x, copy=False).

@richardotis
Copy link
Contributor

Probably asarray is a better choice so we don't make an unnecessary copy, but otherwise I think that's the best call.

(I thought it was confusing that numpy's default behavior would be to raise on division by zero...)

@richardotis
Copy link
Contributor

Looks like I have this commented out in my fork, where apparently I ran into this problem before:

    def _print_Integer(self, expr):
        # Upcast to work around an integer overflow bug
        return 'asarray({0}, dtype=float)'.format(expr)

@richardotis
Copy link
Contributor

But the overflow issue is a good reminder. If something like Pow gets evaluated in pure Python, we could overflow and get this same issue. Would it be sufficient to wrap _print_Integer, _print_Float, and _print_Symbol?

@asmeurer
Copy link
Member Author

What was the integer overflow bug?

The right place to put this is in _print_Mul, so that denominators get wrapped. If there are other places where Python ops raise exceptions but numpy ops don't, we should wrap those as well.

@asmeurer
Copy link
Member Author

Interesting behavior:

In [22]: numpy.array(0)**-1
./bin/isympy:1: RuntimeWarning: divide by zero encountered in power
  #!/usr/bin/env python
./bin/isympy:1: RuntimeWarning: invalid value encountered in power
  #!/usr/bin/env python
Out[22]: -9223372036854775808

In [23]: 1/numpy.array(0)
./bin/isympy:1: RuntimeWarning: divide by zero encountered in true_divide
  #!/usr/bin/env python
Out[23]: inf

@asmeurer
Copy link
Member Author

Wrapping every symbol would do it, as then everything would be an array, but it's probably overkill. We really only need to protect denominators.

@richardotis
Copy link
Contributor

p = x**y
lambdify([x, y], p, modules='numpy', printer=NumPyPrinter)(50, 10e40)
OverflowError: (34, 'Numerical result out of range')

@asmeurer
Copy link
Member Author

Actually, a much cleaner solution is to wrap the input values as arrays:

In [27]: lambdify([x, y], p, modules='numpy')(array(0), array(1))
/Users/aaronmeurer/anaconda3/lib/python3.5/site-packages/numpy/__init__.py:1: RuntimeWarning: divide by zero encountered in true_divide
  """
Out[27]: array(0.0)

So we should make it so that lambdify(x, expr, 'numpy')(value) is the same as lambdify(x, expr, 'numpy')(asarray(value)). NumPy does this implicitly automatically, but it can't when the operations are Python.

An issue here is that doing this automatically would be a backwards compatibility break with the current way lambdify works, because currently it lets you mix and match modules, like lambdify(x, expr, modules=['numpy', 'sympy']). IMHO, this behavior is already broken, and not really useful. But it's worth noting.

@richardotis
Copy link
Contributor

My thought is we should probably do both (wrap inputs and arguments to Mul and Pow), even though it's overkill, in case users try to pass in SymPy objects containing constant expressions which raise when represented in pure Python. But perhaps this is too much of an edge case for us to be worth complicating the way NumPyPrinter works.

@asmeurer
Copy link
Member Author

Can you give an example of that? I'm unclear what you mean.

@richardotis
Copy link
Contributor

Oddly, I can't seem to create an example. I don't see why this works:

p = Piecewise((1/x, y < -1), (Pow(Integer(50), Integer(10e40), evaluate=False), y <= 1), (1/x, True))
lambdify([x, y], p, modules='numpy', printer=NumPyPrinter)(50, 0)
inf

@richardotis
Copy link
Contributor

Never mind, it doesn't actually work. I just had the Integer-wrapping thing in there by accident. If you run it normally, it just hangs.

@richardotis
Copy link
Contributor

richardotis commented Jun 27, 2016

To get back to the main point, do you think we should just add a decorator to the lambdify'd function when called with numpy, like

def array_wrap(func):
    def wrapper(*args, **kwargs):
        return func(*[np.asarray(i) for i in args], **kwargs)
    return wrapper

@asmeurer
Copy link
Member Author

Yes (or the equivalent).

Regarding the integer overflow thing, I'd consider that to be a separate issue. If it comes up in real problems, we can look at fixing it. The example you gave feels contrived, since evaluating 10**10e40 is going to either hang or overflow no matter how it is evaluated. But maybe there are real uses.

@richardotis
Copy link
Contributor

Agreed. I'll write the PR to address this issue like you suggested, unless you have another idea.

@caley
Copy link
Contributor

caley commented Jun 19, 2018

I was looking at this in relation to #14671, and found the following fails (assumes numpy is available):

>>> p = Piecewise((1/x, y < -1), (x, y <= 1), (1/x, True))
>>> lambdify([x, y], p)(0, 1)
...
ZeroDivisionError: division by zero

The problem is that numpy is used by default, but the input argument wrapping is only enabled if numpy is explicitly requested:

if module_provided and _module_present('numpy',namespaces):
if any(isinstance(arg, str) for arg in args):
apply_numpy_decorator = True
else:
wrap_numpy_args = True

Is there a reason for the module_provided part of the test?

@asmeurer
Copy link
Member Author

There's probably not a good reason. I think that change predates the inclusion of numpy as the default.

@caley
Copy link
Contributor

caley commented Jun 21, 2018

I tried removing the module_provided part of the test, but then this test case fails:

lam = lambdify(x, {f(x): x})
assert lam(3) == {103: 3}

with

TypeError: unhashable type: 'numpy.ndarray'

@asmeurer
Copy link
Member Author

I guess let's just leave it there for now. If we implement my suggestion from #14671 it won't matter because it would only affect the printing of Piecewise in the NumPyPrinter.

caley added a commit to caley/sympy that referenced this issue Jun 22, 2018
The automatic wrapping of input arguments was to fix the behaviour of
Piecewise expressions when passed to lambdify with the numpy module
(issue sympy#11306).  However this resulted in performance regressions for
all code using lambdify with numpy (issue sympy#14671).

This commit removes the automatic wrapping code, and updates the lambdify
documentation to explain how to handle Piecewise with numpy.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants