lambdify of constant functions #5642

Open
raoulb opened this issue Jul 6, 2011 · 37 comments

@raoulb

raoulb commented Jul 6, 2011

Using sympy.lambdify together with numpy arrays has a severe issue when the lambdified function is a constant. Let me show an example; first, the behavior I would like to have in any case:

x = sympy.Symbol("x")
f = sympy.lambdify(x, x**2, "numpy")

Now assume we want to evaluate f on a grid u:

u = numpy.linspace(-2,2,10)

where u.shape is (10,) and the result f(u).shape is (10,) too. This is what I expect when doing numerics. Now the example where it goes wrong:

g = sympy.lambdify(x, sympy.diff(x**2,x,2), "numpy")

So the function is essentially a constant independent of x.
And this is the cause for the issue:

g(u) returns a single float 2 and not a numpy array anymore!

This is not what I expect in numerics, although it's obvious from a programmer's point of view. (The same issue is present in plain Python lambda expressions.)

Possible solutions:

  • use numpy.vectorize:

  h = numpy.vectorize(sympy.lambdify(x, sympy.diff(x**2,x,2), "numpy"))

  h(u) then gives
  array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

  • k = sympy.lambdify(x, sympy.diff(x**2,x,2) + 0*x, "numpy")

  This does not work because sympy simplifies the 0*x away.

  • A possible starting point for a patch:

import numpy as np
import sympy as sp

def lambdify(symbols, expression):
    """Specialized lambdify routine avoiding the unused argument issue.
    A common sympy/python lambda expression 'f = lambda x: 2' has the
    problem that f(x) returns the float 2 even if x was a numpy array.
    Thus we lose all information on the shape of x.
    """
    f = sp.lambdify(symbols, expression, "numpy")

    # vectorize only if some argument symbol does not occur in the expression
    if not all(i in expression for i in symbols):
        f = np.vectorize(f)

    return f

Original issue for #5642: http://code.google.com/p/sympy/issues/detail?id=2543
Original author: https://code.google.com/u/107490137238222069432/

@asmeurer
Member

asmeurer commented Jul 5, 2011

I can confirm this issue.

I think we could fix the issue better than the ways you suggest. Remember that you can edit the lambdify code to fix this (that's why you reported the issue to us, isn't it?). lambdify(x, 2) basically returns a wrapper around lambda x: 2 (see lambdastr in sympy.utilities.lambdify). Thus, I think we could add vectorization in the constant case to the lambda expression itself. Something like

>>> lambdastr(x, x**2, printer=numpy_lambdastr)
lambda x: (x**2)
>>> lambdastr(x, 2, printer=numpy_lambdastr)
lambda x: numpy.vectorize(lambda x: 2)(x)

where printer=numpy_lambdastr would be sent to lambdastr when numpy is used in lambdify. I hope I am using vectorize properly here. It seems to work:

In [45]: (lambda x: numpy.vectorize(lambda x: 2)(x))(u)
Out[45]: [2 2 2 2 2 2 2 2 2 2]

numpy_lambdastr would recursively call the normal lambdastr for the internal lambda. I don't know much about numpy, so you or someone else should look to see if this looks like a good solution.

Also, I admit I don't know much about lambdify (I basically just read the code for the first time just now, and I've never used it in practice). Would it be better to have a lambda inside the vectorize (outer) lambda, or to have a lambdified function in there? The latter would involve recursively calling lambdify() instead of recursively calling lambdastr().

Status: Accepted

Original comment: http://code.google.com/p/sympy/issues/detail?id=2543#c1
Original author: https://code.google.com/u/asmeurer@gmail.com/

@asmeurer
Member

asmeurer commented Jul 17, 2011

So I had a little chat with some people here at the SciPy conference, one of whom is a NumPy developer, and their opinion is that you should take advantage of this trivial case and of broadcasting, and use the scalar answer that is returned to your advantage (it uses less memory).
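A minimal sketch of that broadcasting point (illustrative only; the array u is assumed to be the grid from the first post):

import numpy as np

u = np.linspace(-2, 2, 10)
g_val = 2                       # scalar result of the constant lambdified function
print(g_val * np.ones_like(u))  # shape (10,): the scalar broadcasts against the grid
print(g_val + u)                # shape (10,): ordinary arithmetic broadcasts too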

Original comment: http://code.google.com/p/sympy/issues/detail?id=2543#c2
Original author: https://code.google.com/u/asmeurer@gmail.com/

@asmeurer
Member

Status: Valid

Original comment: http://code.google.com/p/sympy/issues/detail?id=2543#c3
Original author: https://code.google.com/u/asmeurer@gmail.com/

@cknoll
Contributor

cknoll commented Jun 30, 2017

I have been affected by this behavior for years. Should I try to submit a PR?

I documented my perspective on the problem:
https://github.com/cknoll/demo-notebooks/blob/master/sympy-lambdify-vectorization-problem.ipynb

@asmeurer
Member

Interesting. How would you propose to fix the problem? I wonder if there is a better numpy function you could use than the array constructor to merge the two arrays with broadcasting.
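One candidate worth considering (an illustrative sketch, not a confirmed choice) is numpy.broadcast_arrays, which expands mixed scalar/array entries to a common shape where the plain array constructor produces a ragged object array:

import numpy as np

a = np.linspace(-2, 2, 10)
np.array([a, 2], dtype=object)  # ragged: a length-2 object array
np.broadcast_arrays(a, 2)       # [a, an array of 2s with shape (10,)]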

@asmeurer
Member

This looks relevant.

@cknoll
Contributor

cknoll commented Jul 4, 2017

That link solved my problem. Thank you! Still, I consider this a workaround. IMHO it would be cleaner if lambdify could (optionally) return a function which is vectorized independently of the expression (which might depend on runtime data).

Idea (not verified):
lambdify(args, expr, vectorized=True) could do the following:

  • store original shape of expr
  • convert expr into flat list
  • apply lambdify -> orig_lambda_fnc
  • define wrapper:


def lambdify_wrapper(*args):
    tmp_res = orig_lambda_fnc(*args)  # returns a flat list (see the steps above)
    # use args as reference for broadcasting
    tmp_res.extend(args)
    # use the explicit broadcasting trick from above
    bc_res = numpy.array(numpy.broadcast_arrays(*tmp_res))
    # remove the appended args again
    bc_res = bc_res[:len(bc_res) - len(args)]
    # use reshape to restore the original shape of expr (stored in step 1 above)
    # plus extra dimensions due to array-valued arguments
    reshaped_res = bc_res.reshape(expr_shape + bc_res.shape[1:])
    return reshaped_res

@asmeurer
Member

asmeurer commented Jul 4, 2017

@bjodah @moorepants what do you think? Have you ever been affected by this issue?

@bjodah
Member

bjodah commented Jul 5, 2017

Kind of. If lambdify were only used to generate functions returning numpy arrays, it would be easier to fix in a general manner.

But consider also that expr in lambdify may be multiple expressions with varying shape (e.g. a tuple of a scalar, a vector and a matrix).
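For example (an illustrative sketch of that point; the expressions are made up), a single lambdify call can already return a mix of shapes:

import sympy as sp

x = sp.Symbol("x")
f = sp.lambdify(x, (x**2, sp.Matrix([x, 1])), "numpy")
f(3)   # (9, array([[3], [1]])): one scalar and one 2x1 array from the same call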

@a-price

a-price commented Apr 2, 2018

Is there a recommended workaround for this issue? I'm attempting to plot some vector fields, and lambdify dies on fields with one dimension of constant value. (Both args and expr arguments to lambdify are sympy.Matrix-valued.)

@asmeurer
Member

asmeurer commented Apr 2, 2018

Here is a simple example, for reference

>>> lambdify(Matrix([x, y]), Matrix([x, y]))(np.array([1, 2]), np.array([2, 3]))
array([[[1, 2]],

       [[2, 3]]])
>>> lambdify(Matrix([x, y]), Matrix([x, 2]))(np.array([1, 2]), np.array([2, 3]))
array([[array([1, 2])],
       [2]], dtype=object)

@asmeurer
Member

asmeurer commented Apr 2, 2018

I think to fix this we just need to modify the NumPyPrinter for Matrix. Right now it just prints it directly as an array:

def _print_MatrixBase(self, expr):
    func = self.known_functions.get(expr.__class__.__name__, None)
    if func is None:
        func = self._module_format('numpy.array')
    return "%s(%s)" % (func, self._print(expr.tolist()))

But it needs to be smarter. I'm not sure what the proper way to print this in general should be.

As a workaround, you might be able to use a wrapper. See the above comments.

@a-price

a-price commented Apr 2, 2018

Unfortunately, I'm not able to get a jagged array out of lambdify in order to do the broadcasting recommended earlier, as it crashes during the evaluation. Here's a short working example:

import sympy as sp
import numpy as np
import matplotlib.pyplot as plt

def draw_vector_field(f, x, **kwargs):
    sim_dim = 20j
    X2, X1 = np.mgrid[-2:2:sim_dim, -2:2:sim_dim]

    # Create a computational function from the symbolic one
    f_comp = sp.lambdify(x, f)

    # Compute the numerical values of the function
    F = f_comp(X1, X2)
    print(F.shape)
    F1, F2 = F[0], F[1]

    # Eliminate excess dimension of size 1
    F1 = np.squeeze(F1, axis=0)
    F2 = np.squeeze(F2, axis=0)
   
    fig0, ax0 = plt.subplots()
    ax0.quiver(X1, X2, F1, F2, **kwargs)

x = sp.Matrix(sp.symbols('x_0, x_1'))
f1 = sp.Matrix([[x[0]],[x[1]]])
f2 = sp.Matrix([[0],[-x[1]]])

# This works
draw_vector_field(f1, x)
plt.show()

# This fails
draw_vector_field(f2, x)
plt.show()

Outputs:

(2, 1, 20, 20)
Traceback (most recent call last):
  File "test_code.py", line 33, in <module>
    draw_vector_field(f2, x)
  File "test_code.py", line 13, in draw_vector_field
    F = f_comp(X1, X2)
  File "<string>", line 1, in <lambda>
ValueError: setting an array element with a sequence.

I guess one workaround might be to loop through all entries of f and lambdify them individually?
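A rough sketch of that idea (the helper name and details below are hypothetical and untested against the example above):

import numpy as np
import sympy as sp

def lambdify_elementwise(args, M):
    """Lambdify each entry of a sympy Matrix separately, then broadcast every
    entry to the common shape of the numeric inputs so constants don't collapse."""
    funcs = [[sp.lambdify(args, M[i, j], "numpy") for j in range(M.cols)]
             for i in range(M.rows)]

    def wrapped(*xs):
        shape = np.broadcast_arrays(*xs)[0].shape  # common broadcast shape of the inputs
        return np.array([[np.broadcast_to(f(*xs), shape) for f in row]
                         for row in funcs])

    return wrapped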

@retsyo

retsyo commented Aug 27, 2018

I met this problem today, and the search led me here. Is there any conclusion? Thanks.
My demo is a circle in 3D, where z should be an array of zeros, but the following code

import numpy as np
import sympy as sp


def lambdify(symbols, expression):
    """Specialized lambdify routine avoiding the unused argument issue.
    A common sympy/python lambda expression 'f = lambda x: 2' has the
    problem that f(x) returns the float 2 even if x was a numpy array.
    Thus we lose all information on the shape of x.
    """
    if not isinstance(symbols, (tuple, list)):
        symbols = [symbols]

    f = sp.lambdify(symbols, expression, "numpy")

    if not all(i in expression for i in symbols):
        f = np.vectorize(f)

    return f


t = sp.symbols('t', real=True)

x = sp.cos(t)
y = sp.sin(t)
z = t - t
R = sp.Matrix([x, y, z])


lambdaR = lambdify(t, R)

s = np.arange(0, 2*np.pi, np.pi/4)

cor = lambdaR(s)

print(cor)

says

Traceback (most recent call last):
  File "test_lambdify.py", line 34, in <module>
    cor = lambdaR(s)
  File "E:\prg\py\Anaconda3_64\lib\site-packages\numpy\lib\function_base.py", line 2576, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "E:\prg\py\Anaconda3_64\lib\site-packages\numpy\lib\function_base.py", line 2655, in _vectorize_call
    res = array(outputs, copy=False, subok=True, dtype=otypes[0])
ValueError: setting an array element with a sequence.

@jjramsey

I got caught by this issue recently myself. I used lambdify on a symbolic expression of a gradient vector; that is, I did something like this:

my_grad = sympy.zeros(len(params), 1)
for i in range(len(params)):
    my_grad[i] = sympy.diff(my_func, params[i])

If all elements of my_grad are functions, then lambdifying my_grad leads to a function that, when given vector inputs of length N, returns an array of dimensions len(params) x 1 x N. However, if one of the elements is a constant, then lambdifying my_grad leads to a function that, when given vector inputs of length N, returns a ragged array where the "row" corresponding to the constant is a single value, while the other rows are arrays of length N. This is painfully inconsistent.
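A minimal reconstruction of that situation (my_func and the parameters here are made-up placeholders):

import numpy as np
import sympy

x, y = sympy.symbols("x y")
my_func = x**2 + y              # d/dy is the constant 1
params = [x, y]
my_grad = sympy.Matrix([sympy.diff(my_func, p) for p in params])
g = sympy.lambdify(params, my_grad, "numpy")
g(np.arange(3.0), np.arange(3.0))
# ragged result: the 2*x row has length 3 while the constant row is a bare 1
# (an object array with a deprecation warning on older NumPy, an error on newer NumPy)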

@oscarbenjamin
Contributor

oscarbenjamin commented Apr 23, 2019

Judging from the comments here (over many years) it seems that this is an ongoing problem for many users.

I don't really know what lambdify is or what it is used for though. The docs say

The primary purpose of this function is to provide a bridge from SymPy expressions
to numerical libraries such as NumPy, SciPy, NumExpr, mpmath, and tensorflow

https://docs.sympy.org/latest/modules/utilities/lambdify.html#sympy.utilities.lambdify.lambdify

However, @bjodah's comment above makes it seem like we can't just assume we are dealing with numpy, which makes it difficult to implement a numpy-specific fix.

Does anyone have a clear idea of what the fix should be?

@asmeurer
Member

asmeurer commented Nov 9, 2020

This is also related to #18824

@asmeurer
Member

asmeurer commented Nov 9, 2020

I wonder if lambdify should use a wrapper like the one given by @rouckas automatically whenever numpy is a module. lambdify is supposed to create a NumPy-like function, and NumPy functions are expected to broadcast their return value based on the input shapes. My concern is that it would probably affect performance. We already had performance issues in the past by putting a wrapper in lambdify to automatically convert the input into an array, which we had to revert. So we should check the lambdify benchmarks in the benchmark suite. If the slowdown is too much, then I don't think it's worth changing. Actually, the wrapper would only need to go on functions that are constant, but I think performance is still something to consider.

@EmilMadsen90

I got caught by this issue as well. With

a, b = sympy.symbols("a b")
M = sympy.Matrix([[a, a], [b, b]])
N = sympy.Matrix([[a, a], [b, 0]])
f = sympy.lambdify([a, b], M, 'numpy')
g = sympy.lambdify([a, b], N, 'numpy')
a_num = numpy.array([1.0, 2.0])
b_num = numpy.array([1.0, 2.0])

doing f(a_num, b_num) gives

array([[[1., 2.],
        [1., 2.]],
       [[1., 2.],
        [1., 2.]]])

while doing g(a_num, b_num) gives

<lambdifygenerated-3>:2: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return (array([[a, a], [b, 0]]))
array([[array([1., 2.]), array([1., 2.])],
       [array([1., 2.]), 0]], dtype=object)

Any suggested workaround?
The broadcast workaround by @rouckas unfortunately does not work - it gives the same error as above.

@rouckas
Contributor

rouckas commented Feb 3, 2021

@EmilMadsen90 The problem in your case is in the definition of N

N = sympy.Matrix([[a, a], [b, 0]])

I assume that by 0 you mean a matrix with the same size as b filled with zeros? In numpy, I would just use zeros_like(b) or b*0, but sympy doesn't know that symbol b will be a matrix and b*0 gets evaluated to scalar 0. To prevent this evaluation, you can write b*0 as sympy.Mul(b, 0, evaluate=False). So, my solution would be to define N as
N = sympy.Matrix([[a, a], [b, sympy.Mul(b, 0, evaluate=False)]])

@EmilMadsen90

EmilMadsen90 commented Feb 3, 2021

Thanks for the fast answer. I see that my example was maybe a bit too simple compared to what I'm actually doing. Sorry for that.

I assume that by 0 you mean a matrix with the same size as b filled with zeros?

b is a symbol/scalar, not a matrix. I want to evaluate g (the lambdified N) in a set of numerical values for a and b, namely a_num and b_num. Then I want to concatenate the numerical matrices for each entry/value in a_num and b_num along axis 0 in a big matrix.
In my actual work/use case, N is a matrix of (very long trigonometric) expressions obtained via partial differentiation of another matrix, and the location of zeros/constants in the matrix could change at runtime.

@rouckas
Contributor

rouckas commented Feb 3, 2021

b is a symbol/scalar, not a matrix. I want to evaluate g (the lambdified N) in a set of numerical values for a and b, namely a_num and b_num. Then I want to concatenate the numerical matrices for each entry/value in a_num and b_num along axis 0 in a big matrix.

If I understand your description, you want to evaluate g(a_num[0], b_num[0]), g(a_num[1], b_num[1]).... and concatenate these matrices? Your code is actually creating a "block" matrix res = [[a_num, a_num], [b_num, b_num]], where a and b are arrays, but numpy is clever and when it sees that a_num and b_num have the same dimension, it turns res into a 3D array, so you can get the first matrix for a_num[0], b_num[0] as res[:,:,0], etc...

I see two possible solutions: first, you can apply the "hack" I suggested before to each element of N as follows:
N = N.applyfunc(lambda x:x+sympy.Mul(b, 0, evaluate=False))

or, I would suggest reorganizing the arrays so that f and g are always applied to pairs of values and use numpy.vectorize to do the vectorization.

a, b = sympy.symbols("a b")
M = sympy.Matrix([[a, a], [b, b]])
N = sympy.Matrix([[a, a], [b, 0]])

f = sympy.lambdify([(a, b)], M, 'numpy')
g = sympy.lambdify([(a, b)], N, 'numpy')

fv = numpy.vectorize(f, signature="(m)->(m,n)")
gv = numpy.vectorize(g, signature="(m)->(m,n)")

a_num = numpy.array([1.0, 2.0, 3.0])
b_num = numpy.array([1.0, 2.0, 3.0])
ab = numpy.vstack((a_num, b_num)).T

now gv(ab) gives

array([[[1., 1.],
        [1., 0.]],

       [[2., 2.],
        [2., 0.]],

       [[3., 3.],
        [3., 0.]]])

so the first matrix is at index gv(ab)[0], which seems more logical to me.

@oscarbenjamin
Contributor

I guess that the reason this issue remains unresolved is that it is not generally possible to apply numpy's broadcasting rules in lambdify. Otherwise the broadcast suggestion above would be the obvious approach:

def broadcast(fun):
    return lambda *x: numpy.broadcast_arrays(fun(*x), *x)[0]

That should solve all the simple cases where lambdify is used to apply a scalar function to an array of values efficiently, which I think is what most complaints above are about.
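For example (an illustrative use of the wrapper above, reusing the constant second derivative from the first post):

import numpy
import sympy

x = sympy.Symbol("x")
u = numpy.linspace(-2, 2, 10)
g = broadcast(sympy.lambdify(x, sympy.diff(x**2, x, 2), "numpy"))
g(u)   # an array of 2s with shape (10,) instead of the bare scalar 2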

The problem though is that lambdify is not always used to broadcast a scalar function and we can have things like:

In [17]: X = IndexedBase('X')

In [18]: expr = X[1] + X[2]

In [19]: expr
Out[19]: X[1] + X[2]

In [20]: f = lambdify(X, expr, modules='numpy')

In [21]: A = np.array([1, 2, 3])

In [22]: f(A)
Out[22]: 5

Here, although the input is an array of shape (3,), the output is a scalar. Likewise we also have things like:

In [31]: B = MatrixSymbol('B', 3, 1)

In [32]: expr = B.T * B

In [33]: f = lambdify(B, expr, modules='numpy')

In [34]: f(np.array([1, 2, 3]))
Out[34]: 14

In [35]: f(np.array([[1], [2], [3]]))
Out[35]: array([[14]])

This means that where we allow an array to be substituted for an IndexedBase or a MatrixSymbol, it is no longer possible for the lambdified function to infer the shape of the outputs purely from the shape of the inputs.

This mixed interpretation of array inputs as sometimes being for broadcasting a scalar expression and other times to replace a non-scalar component of the expression seems quite poorly defined to me. I'm sure that there must be other edge cases where it isn't really possible to come up with a unique sensible choice of output. To me it would seem cleaner if there were separate functions for broadcasting scalar expressions vs substituting arrays for non-scalars rather than having both actions in the same function lambdify.

@asmeurer
Member

asmeurer commented Feb 3, 2021

The matrix question has come up before #19259. I wonder if there is something we can do to improve that, or at least document the correct way to achieve it.

@rlaugier

With my use of lambdify over the past years, I'm surprised not to have run into this issue sooner. Now that I have, and find no practical workaround (for my application), I am extremely frustrated.
I agree with the conclusion of @oscarbenjamin that this seems to call for separate functions, or at least an option in the lambdify call.
In the ideal case, to get the best of both worlds, it would leave the user with the choice of which variables would broadcast/vectorize and which are instead subscriptable in the expression... To be honest I am (almost) surprised that this subscripting possibility exists, because I find that broadcasting your mathematical expression to datasets is the main appeal of lambdify.
This was just my input and suggestion as a user.

@oscarbenjamin
Contributor

Does the broadcast workaround not work?

def broadcast(fun):
    return lambda *x: numpy.broadcast_arrays(fun(*x), *x)[0]

@oscarbenjamin
Contributor

Maybe if all symbols given to lambdify are ordinary symbols then a broadcast could be used.

@akirakyle
Contributor

Since I've been living with this issue for a little bit, I thought I'd share one of my work-arounds for the matrix case as a third alternative to the two already shared by @rouckas

import numpy as np
from sympy import Matrix, lambdify

def lambdify_matrix(M, syms, xs):
    assert isinstance(M, Matrix)
    return np.block([[[np.broadcast_to(lambdify(syms, M[i,j])(xs), xs.shape)]
                      for j in range(M.cols)] for i in range(M.rows)])

The second of @rouckas's suggestions isn't really usable for me due to the performance impact of using np.vectorize, especially since in my case the sympy matrix dimensions are usually small while the data size is large. This issue became a lot less frustrating when I realized that sympy essentially just calls lambdify on each matrix element and then returns np.array() of the result, so it's not so different from the above function and should have similar performance.
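A hypothetical usage example of lambdify_matrix (the matrix and grid are borrowed from the example further down the thread):

import numpy as np
import sympy as sp

sigma = sp.Symbol("sigma")
mymatrix = sp.Matrix([[1, 0, sigma],
                      [0, 1, 1 + sigma]])
sigmas = np.linspace(-10, 10, 100)
out = lambdify_matrix(mymatrix, sigma, sigmas)
out.shape   # should be (2, 3, 100): constant entries are broadcast to the grid shape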

@rlaugier

Thanks @oscarbenjamin for this quick answer. Here is a minimal example for my application:

import numpy as np
import sympy as sp
sigma = sp.symbols("sigma")
mymatrix = sp.Matrix([[1, 0, sigma],
                     [0, 1, 1+sigma]])
f = sp.lambdify(sigma, mymatrix, modules="numpy")
sigmas = np.linspace(-10,10, 100)
b = f(sigmas)

Here it just builds a collection of parametric matrices, which I would ultimately want in an (n, m, nx) or (nx, n, m) array. As written here, I end up with an array of objects containing some integers and some arrays of floats. The simple broadcast workaround fails when the function is called because of a dimension mismatch.
Luckily @akirakyle seems to have read my mind, and his workaround works in my case. Yet, since the lambdified function for each cell is built, applied, and discarded at each call, I'm not sure it would scale nicely for multiple calls on complicated expressions.
A while back, I built a little class that did a similar thing but stored the functions, and provided a __call__ method that would build the appropriate result. Back then it was to smooth out some discrepancies between the behaviors of "numpy" and "numexpr" lambdified functions when called on 1D sympy matrices. I suspect that if I recycle this structure, upgrade it to 2D, and use @akirakyle's solution inside, it may work, and scalability should be improved. I'll try to share this here when (if) I have the opportunity to do it.
Thanks for the support.

@asmeurer
Member

It should be possible to make this work by modifying the lambdify printer for Matrix so it gives @akirakyle's code (except just include the recursive lambdified printed code for each element).
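Roughly, the generated code might then look something like this (a hypothetical sketch of possible printer output, not actual SymPy behavior, using the 2x3 matrix example from earlier in the thread):

def _lambdifygenerated(sigma):
    _shape = numpy.shape(sigma)
    return numpy.array([[numpy.broadcast_to(1, _shape), numpy.broadcast_to(0, _shape),
                         numpy.broadcast_to(sigma, _shape)],
                        [numpy.broadcast_to(0, _shape), numpy.broadcast_to(1, _shape),
                         numpy.broadcast_to(1 + sigma, _shape)]])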

@rlaugier

I tried to implement a class like I suggested 20 days ago, but the result is disappointing. It doesn't really scale to something with more than one variable, and may become complicated.
However, I would like to suggest a different avenue for a workaround. The whole problem originates from the fact that sympy automatically simplifies the expressions, leading to the disappearance of the symbols that need to be broadcasted. Instead of that, maybe we should try to find clever ways to force sympy to preserve the mention of those symbols in the expression throughout the process. I have tweaked my example to demonstrate a way to do that "from the outside" by adding a free symbol:

import numpy as np
import sympy as sp
sigmas = np.linspace(-2,2, 100)
sigma = sp.symbols("sigma")
mymatrix = sp.Matrix([[1, 0, sigma],
                     [0, 1, 1+sigma]])
#What we want to do:
f = sp.lambdify(sigma, mymatrix, modules="numpy")
b = f(sigmas)
#What we do instead:
z = sp.symbols("z")
mymatrixz = mymatrix + z*sp.prod(mymatrix.free_symbols)*sp.ones(mymatrix.shape[0], mymatrix.shape[1])
fz = sp.lambdify((sigma, z), mymatrixz, modules="numpy")
bz = fz(sigmas, 0)
bz

This approach seems to work and appears to be easier to implement in a flexible context.
I wonder if there could be a less barbaric way to force sympy to preserve some (or all) free symbols in specific situations?

@oscarbenjamin
Contributor

The whole problem originates from the fact that sympy automatically simplifies the expressions, leading to the disappearance of the symbols that need to be broadcasted.

I'm not sure I understand what you mean. The title of the issue is about constant functions like lambdify((x,), 2).

@asmeurer
Member

I think the point is that in practice, people are using lambdify(x, expr) where expr is generated in some way; it is usually nonconstant, but it can sometimes cancel out and become constant.

It's true that if you do something like lambdify(x, Add(expr, x, -x, evaluate=False)), or something similar when expr is a matrix, then it will always produce the same value, but the presence of the cancelled x in the lambdified function will force the function to broadcast to the input shape, the same as if it weren't constant. This is probably a viable workaround for many user codes, assuming the extra x - x computation doesn't bother you. I wouldn't really consider it a good fix for the problem to implement in lambdify itself, however.
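Concretely (an illustrative sketch reusing the constant second derivative from the first post):

import numpy as np
import sympy
from sympy import Add

x = sympy.Symbol("x")
expr = sympy.diff(x**2, x, 2)     # the constant 2
f = sympy.lambdify(x, Add(expr, x, -x, evaluate=False), "numpy")
f(np.linspace(-2, 2, 10)).shape   # (10,): the cancelled x forces broadcasting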

@asmeurer
Member

The primary issue here, as I understand it, is in lambdifying matrices where the variables in the matrix are themselves arrays. When array() gets an input that is a list of lists of arrays, if the arrays are all the same shape, it treats them the same as if it were a normal list of lists. But if the arrays have different dimensions, it creates a ragged dtype=object array, and also spits out a deprecation warning. For example:

>>> import numpy as np
>>> np.array([np.array([1, 2]), np.array([3, 4])])
array([[1, 2],
       [3, 4]])
>>> np.array([np.array([1, 2]), 3])
<mypython-5>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  np.array([np.array([1, 2]), 3])
array([array([1, 2]), 3], dtype=object)

When SymPy lambdifies something like Matrix([[x, 1], [1, 0]]), it just replaces "Matrix" with "array":

>>> import inspect
>>> print(inspect.getsource(lambdify(x, Matrix([[x, 1], [1, 0]]))))
def _lambdifygenerated(x):
    return (array([[x, 1], [1, 0]]))

What we really want is to broadcast each list element so that it treats it in the first way.

I think the issue here is the array that is used for the lambdification of Matrix. The NumPy array constructor doesn't work in the way that we want it to. We really want all the array inputs to be broadcast together, so that the function works just as well for array and constant inputs, regardless of whether some entries are broadcast.

So we should either:

  • modify the Matrix printer to print something more complex with broadcast_to so that it does this, or
  • replace array in the NumPy lambdify namespace with something that does this.

Personally I think the former option is better for a fix in SymPy, although for a user workaround until this is fixed, the latter is the best approach. SymPy is also capable of detecting whether entries are constant with respect to the input variables, so it can limit the broadcasting calls to only those entries that need it. For example, for Matrix([[x, 1], [1, 0]]), the only input variable is x, and it knows the 1 and 0 entries are constants, so it can print something like

array([[x, full_like(x, 1)], [full_like(x, 1), full_like(x, 0)]])

In general, it would need to broadcast the inputs to get the shape for the constant expressions like

shape = broadcast(x1, x2, ...).shape  # x1, x2, ... are the inputs to the lambdified function
return array(...)  # where ... uses full(shape, ...) for constant entries, or broadcast_to for entries that use only a subset of the input variables

Things get more complicated when you have multiple variables, because some expressions might have only a subset of them, but those expressions should still be broadcast to the shape for all the input variables.

One potential issue I see with this is that it assumes that the input always must be broadcast to a common shape. But I'm not sure if there might be cases when you wouldn't want this. For example, if the input is a completely constant matrix, it might be surprising if f(k-d array) adds k dimensions to the output instead of just returning the n x m matrix. Then again, in that case, things actually do broadcast, so maybe it isn't a huge deal (*). We may need to make this option configurable somehow.

(*) Except for the n x m matrices created in this way, you end up with shapes like (n, m, ..., k), where for broadcasting, as well as operations like dot, to work, you'd generally want to have the matrix dimensions last, like (k, ..., n, m). I'm a little curious how the people using lambdify like this handle this.

@rlaugier

Hi @asmeurer, and thank you for all the effort you spend on this issue.
Indeed this deprecation warning is pushing us to find a solution for this issue (which has been open for a while now...). Yet even though it is most visible in Matrix form, I think it is also a problem when the expression is not a matrix. Consider the following example, where the expression is scalar but the inputs are of arbitrary shapes:

x,y,z = sp.symbols("x, y, z")
xs = np.arange(10)
ys = np.arange(20)
zs = 10.
myfunction_xz = sp.lambdify((x,y,z), x*z, modules="numpy")
myfunction_xyz = sp.lambdify((x,y,z), x*y*z, modules="numpy")
a =  myfunction_xz(xs[:,None,],ys[None,:],zs)
b = myfunction_xyz(xs[:,None,],ys[None,:],zs)

This currently works great as long as all symbols appear in the expression. But in this situation a numpy user would expect a and b to have the same shape. Knowing the signature of the ufunc and the shape of the inputs you give it should be sufficient to know the shape of the output. This should be true regardless of the details of the expression and whether it is - or is not - simplified.
This is the whole appeal of sympy for me: I want to be able to manipulate my expression in all sorts of crazy ways, yet when I finally lambdify it I want it to behave in a predictable and consistent manner.
Ideally I think the output of the printer should behave just as if all the args provided to sp.lambdify() did exist in the expression, because this is what defines the signature of the ufunc. For this purpose (following my previous suggestion), maybe it (the printer) could add a + 0*y or + y-y to its output iff y is not one of the free symbols of the expression (or cell). From the little I know about the printer, I think this could fix the whole issue in the least invasive way, because everything seems to already work just fine when all the arguments are present as free symbols in the expression. (For information, + 0*y seems a bit faster in numpy for some reason.) Incidentally, this also applies when lambdifying to numexpr, and the same fix would solve that too.

I hope this helps.
P.S.

(*) Except for the n x m matrices created in this way, you end up with shapes like (n, m, ..., k), where for broadcasting, as well as operations like dot, to work, you'd generally want to have the matrix dimensions last, like (k, ..., n, m). I'm a little curious how the people using lambdify like this handle this.

You make a good point here. I think this currently ends up as (n, m, ..., k), while I think the expected result in numpy would be (k, ..., n, m). I have been checking the shapes of my outputs and using np.swapaxes in such situations. It is annoying, but I'm guessing it is a different issue. Regardless, the numpy user does expect the k dimension somewhere, and not just (n, m), even if it is constant over that new dimension.

@asmeurer
Member

This currently works great as long as all symbols appear in the expression. But in this situation a numpy user would expect a and b to have the same shape. Knowing the signature of the ufunc and the shape of the inputs you give it should be sufficient to know the shape of the output. This should be true regardless of the details of the expression and whether it is - or is not - simplified.

The reason I don't see this as an issue is that in this case, things should broadcast correctly. The resulting shapes of a and b in your example are (10, 1) and (10, 20), resp. Those shapes are broadcastable together, so any NumPy operation that supports broadcasting should work just fine with a and b as they are. If you really need them to be the same shape, you can use broadcast_arrays on them.

I suppose you are expecting here for lambdify to act like a ufunc and always broadcast its inputs. If you use ufuncify, it does work like this:

>>> from sympy.utilities.autowrap import ufuncify
>>> ufunc = ufuncify((x, y, z), x*z)
>>> ufunc(xs[:,None,],ys[None,:],zs).shape
(10, 20)

Ideally I think the output of the printer should behave just as if all the args provided to sp.lambdify() did exist in the expression, because this is what defines the signature of the ufunc. For this purpose (following my previous suggestion), maybe it (the printer) could add a + 0*y or + y-y to its output iff y is not one of the free symbols of the expression (or cell). From the little I know about the printer, I think this could fix the whole issue in the least invasive way, because everything seems to already work just fine when all the arguments are present as free symbols in the expression. (For information, + 0*y seems a bit faster in numpy for some reason.) Incidentally, this also applies when lambdifying to numexpr, and the same fix would solve that too.

I would expect printing the appropriate broadcast operations to be faster. Note that performance in lambdify does matter. Something that can slow down every lambdify call by a significant amount should not be considered lightly. See for instance #14671, where we had at one point called array on all lambdify inputs to fix some other issues, but removed it because of the performance issues it caused. That's why this might need to be considered an optional feature, or at best, something that is only automatically enabled when the printer detects it is really needed, as in the matrix case.

@smichr smichr removed the Valid label Nov 19, 2021
wshanks added a commit to wshanks/qiskit-terra that referenced this issue Mar 17, 2022
This works around an issue with sympy where its lambda returns a scalar
instead of an array when the expression evaluates to a scalar:

sympy/sympy#5642
mergify bot pushed a commit to Qiskit/qiskit that referenced this issue Jun 15, 2022
* Symbolic parametric pulse

* Implement Constant pulse using Piecewise

This works around an issue with sympy where its lambda returns a scalar
instead of an array when the expression evaluates to a scalar:

sympy/sympy#5642