lambdify of constant functions #5642

Open
raoulb opened this issue Jul 6, 2011 · 37 comments

@raoulb

raoulb commented Jul 6, 2011

Using sympy.lambdify together with numpy arrays has a severe issue when the lambdified function is a constant. Let me show an example; first, the behavior I would like to have in any case:

x = sympy.Symbol("x")
f = sympy.lambdify(x, x**2, "numpy")

Now assume we want to evaluate f on a grid u:

u = numpy.linspace(-2,2,10)

where u.shape is (10,) and the result f(u).shape is (10,) too. This is what I expect when doing numerics. Now the example where it goes wrong:

g = sympy.lambdify(x, sympy.diff(x**2,x,2), "numpy")

So the function is essentially a constant independent of x.
And this is the cause for the issue:

g(u) returns a single float 2 and not a numpy array anymore!

This is not what I expect in numerics, although it's obvious from a programmer's point of view. (The same issue is present in plain Python lambda expressions.)

Possible solutions:

  • use numpy.vectorize:

  h = numpy.vectorize(sympy.lambdify(x, sympy.diff(x**2,x,2), "numpy"))

  h(u) then gives
  array([2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

  • k = sympy.lambdify(x, sympy.diff(x**2,x,2) + 0*x, "numpy")

  This does not work because sympy simplifies the 0*x away.

  • A possible starting point for a patch:

import numpy as np
import sympy as sp

def lambdify(symbols, expression):
    """Specialized lambdify routine avoiding the unused argument issue.
    A common sympy/python lambda expression 'f = lambda x: 2' has the
    problem that f(x) returns the float 2 even if x was a numpy array.
    Thus we lose all information on the shape of x.
    """
    f = sp.lambdify(symbols, expression, "numpy")

    # vectorize only if some argument symbol does not occur in the expression
    if not all(i in expression for i in symbols):
        f = np.vectorize(f)

    return f

Original issue for #5642: http://code.google.com/p/sympy/issues/detail?id=2543
Original author: https://code.google.com/u/107490137238222069432/

@asmeurer
Member

asmeurer commented Jul 5, 2011

I can confirm this issue.

I think we could fix the issue better than the ways you suggest. Remember that you can edit the lambdify code to fix this (that's why you reported the issue to us, isn't it?). lambdify(x, 2) basically returns a wrapper around lambda x: 2 (see lambdastr in sympy.utilities.lambdify). Thus, I think we could add vectorization in the constant case to the lambda expression itself. Something like

>>> lambdastr(x, x**2, printer=numpy_lambdastr)
lambda x: (x**2)
>>> lambdastr(x, 2, printer=numpy_lambdastr)
lambda x: numpy.vectorize(lambda x: 2)(x)

where printer=numpy_lambdastr would be sent to lambdastr when numpy is used in lambdify. I hope I am using vectorize properly here. It seems to work:

In [45]: (lambda x: numpy.vectorize(lambda x: 2)(x))(u)
Out[45]: [2 2 2 2 2 2 2 2 2 2]

numpy_lambdastr would recursively call the normal lambdastr for the internal lambda. I don't know much about numpy, so you or someone else should look to see if this looks like a good solution.

Also, I admit I don't know much about lambdify (I basically just read the code for the first time just now, and I've never used it in practice). Would it be better to have a lambda inside the vectorize (outer) lambda, or to have a lambdified function in there? The latter would involve recursively calling lambdify() instead of recursively calling lambdastr().

Status: Accepted

Original comment: http://code.google.com/p/sympy/issues/detail?id=2543#c1
Original author: https://code.google.com/u/asmeurer@gmail.com/

@asmeurer
Member

asmeurer commented Jul 17, 2011

So I had a little chat with some people here at the SciPy conference, one of whom is a NumPy developer, and their opinion is that you should take advantage of this trivial case and of broadcasting, and use the scalar answer that is returned to your advantage (it uses less memory).
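A minimal sketch of that broadcasting point (illustrative only; the array u is assumed to be the grid from the first post):

import numpy as np

u = np.linspace(-2, 2, 10)
g_val = 2                       # scalar result of the constant lambdified function
print(g_val * np.ones_like(u))  # shape (10,): the scalar broadcasts against the grid
print(g_val + u)                # shape (10,): ordinary arithmetic broadcasts too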

Original comment: http://code.google.com/p/sympy/issues/detail?id=2543#c2
Original author: https://code.google.com/u/asmeurer@gmail.com/

@asmeurer
Member

Status: Valid

Original comment: http://code.google.com/p/sympy/issues/detail?id=2543#c3
Original author: https://code.google.com/u/asmeurer@gmail.com/

@cknoll
Contributor

cknoll commented Jun 30, 2017

I have been affected by this behavior for years. Should I try to submit a PR?

I documented my perspective on the problem:
https://github.com/cknoll/demo-notebooks/blob/master/sympy-lambdify-vectorization-problem.ipynb

@asmeurer
Member

Interesting. How would you propose to fix the problem? I wonder if there is a better numpy function you could use than the array constructor to merge the two arrays with broadcasting.
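One candidate worth considering (an illustrative sketch, not a confirmed choice) is numpy.broadcast_arrays, which expands mixed scalar/array entries to a common shape where the plain array constructor produces a ragged object array:

import numpy as np

a = np.linspace(-2, 2, 10)
np.array([a, 2], dtype=object)  # ragged: a length-2 object array
np.broadcast_arrays(a, 2)       # [a, an array of 2s with shape (10,)]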

@asmeurer
Member

This looks relevant.

@cknoll
Contributor

cknoll commented Jul 4, 2017

That link solved my problem. Thank you! Still, I consider this a workaround. IMHO it would be cleaner if lambdify could (optionally) return a function which is vectorized independently of the expression (which might depend on runtime data).

Idea (not verified):
lambdify(args, expr, vectorized=True) could do the following:

  • store original shape of expr
  • convert expr into flat list
  • apply lambdify -> orig_lambda_fnc
  • define wrapper:


def lambdify_wrapper(*args):
    tmp_res = orig_lambda_fnc(*args)  # returns a flat list (see the steps above)
    # use args as reference for broadcasting
    tmp_res.extend(args)
    # use the explicit broadcasting trick from above
    bc_res = numpy.array(numpy.broadcast_arrays(*tmp_res))
    # remove the appended args again
    bc_res = bc_res[:len(bc_res) - len(args)]
    # use reshape to restore the original shape of expr (stored in step 1 above)
    # plus extra dimensions due to array-valued arguments
    reshaped_res = bc_res.reshape(expr_shape + bc_res.shape[1:])
    return reshaped_res

@asmeurer
Member

asmeurer commented Jul 4, 2017

@bjodah @moorepants what do you think? Have you ever been affected by this issue?

@bjodah
Member

bjodah commented Jul 5, 2017

Kind of. If lambdify were only used to generate functions returning numpy arrays, it would be easier to fix in a general manner.

But consider also that expr in lambdify may be multiple expressions with varying shape (e.g. a tuple of a scalar, a vector and a matrix).
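For example (an illustrative sketch of that point; the expressions are made up), a single lambdify call can already return a mix of shapes:

import sympy as sp

x = sp.Symbol("x")
f = sp.lambdify(x, (x**2, sp.Matrix([x, 1])), "numpy")
f(3)   # (9, array([[3], [1]])): one scalar and one 2x1 array from the same call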

@a-price

a-price commented Apr 2, 2018

Is there a recommended workaround for this issue? I'm attempting to plot some vector fields, and lambdify dies on fields with one dimension of constant value. (Both args and expr arguments to lambdify are sympy.Matrix-valued.)

@asmeurer
Member

asmeurer commented Apr 2, 2018

Here is a simple example, for reference

>>> lambdify(Matrix([x, y]), Matrix([x, y]))(np.array([1, 2]), np.array([2, 3]))
array([[[1, 2]],

       [[2, 3]]])
>>> lambdify(Matrix([x, y]), Matrix([x, 2]))(np.array([1, 2]), np.array([2, 3]))
array([[array([1, 2])],
       [2]], dtype=object)

@asmeurer
Member

asmeurer commented Apr 2, 2018

I think to fix this we just need to modify the NumPyPrinter for Matrix. Right now it just prints it directly as an array:

def _print_MatrixBase(self, expr):
    func = self.known_functions.get(expr.__class__.__name__, None)
    if func is None:
        func = self._module_format('numpy.array')
    return "%s(%s)" % (func, self._print(expr.tolist()))

But it needs to be smarter. I'm not sure what the proper way to print this in general should be.

As a workaround, you might be able to use a wrapper. See the above comments.

@a-price

a-price commented Apr 2, 2018

Unfortunately, I'm not able to get a jagged array out of lambdify in order to do the broadcasting recommended earlier, as it crashes during the evaluation. Here's a short working example:

import sympy as sp
import numpy as np
import matplotlib.pyplot as plt

def draw_vector_field(f, x, **kwargs):
    sim_dim = 20j
    X2, X1 = np.mgrid[-2:2:sim_dim, -2:2:sim_dim]

    # Create a computational function from the symbolic one
    f_comp = sp.lambdify(x, f)

    # Compute the numerical values of the function
    F = f_comp(X1, X2)
    print(F.shape)
    F1, F2 = F[0], F[1]

    # Eliminate excess dimension of size 1
    F1 = np.squeeze(F1, axis=0)
    F2 = np.squeeze(F2, axis=0)
   
    fig0, ax0 = plt.subplots()
    ax0.quiver(X1, X2, F1, F2, **kwargs)

x = sp.Matrix(sp.symbols('x_0, x_1'))
f1 = sp.Matrix([[x[0]],[x[1]]])
f2 = sp.Matrix([[0],[-x[1]]])

# This works
draw_vector_field(f1, x)
plt.show()

# This fails
draw_vector_field(f2, x)
plt.show()

Outputs:

(2, 1, 20, 20)
Traceback (most recent call last):
  File "test_code.py", line 33, in <module>
    draw_vector_field(f2, x)
  File "test_code.py", line 13, in draw_vector_field
    F = f_comp(X1, X2)
  File "<string>", line 1, in <lambda>
ValueError: setting an array element with a sequence.

I guess one workaround might be to loop through all entries of f and lambdify them individually?
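A rough sketch of that idea (the helper name and details below are hypothetical and untested against the example above):

import numpy as np
import sympy as sp

def lambdify_elementwise(args, M):
    """Lambdify each entry of a sympy Matrix separately, then broadcast every
    entry to the common shape of the numeric inputs so constants don't collapse."""
    funcs = [[sp.lambdify(args, M[i, j], "numpy") for j in range(M.cols)]
             for i in range(M.rows)]

    def wrapped(*xs):
        shape = np.broadcast_arrays(*xs)[0].shape  # common broadcast shape of the inputs
        return np.array([[np.broadcast_to(f(*xs), shape) for f in row]
                         for row in funcs])

    return wrapped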

@retsyo

retsyo commented Aug 27, 2018

I met this problem today, and the search led me here. Is there any conclusion? Thanks.
My demo is a circle in 3D, where z should be an array of zeros, but the following code

import numpy as np
import sympy as sp


def lambdify(symbols, expression):
    """Specialized lambdify routine avoiding the unused argument issue.
    A common sympy/python lambda expression 'f = lambda x: 2' has the
    problem that f(x) returns the float 2 even if x was a numpy array.
    Thus we lose all information on the shape of x.
    """
    if not isinstance(symbols, (tuple, list)):
        symbols = [symbols]

    f = sp.lambdify(symbols, expression, "numpy")

    if not all(i in expression for i in symbols):
        f = np.vectorize(f)

    return f


t = sp.symbols('t', real=True)

x = sp.cos(t)
y = sp.sin(t)
z = t - t
R = sp.Matrix([x, y, z])


lambdaR = lambdify(t, R)

s = np.arange(0, 2*np.pi, np.pi/4)

cor = lambdaR(s)

print(cor)

says

Traceback (most recent call last):
  File "test_lambdify.py", line 34, in <module>
    cor = lambdaR(s)
  File "E:\prg\py\Anaconda3_64\lib\site-packages\numpy\lib\function_base.py", line 2576, in __call__
    return self._vectorize_call(func=func, args=vargs)
  File "E:\prg\py\Anaconda3_64\lib\site-packages\numpy\lib\function_base.py", line 2655, in _vectorize_call
    res = array(outputs, copy=False, subok=True, dtype=otypes[0])
ValueError: setting an array element with a sequence.

@jjramsey

I got caught by this issue recently myself. I used lambdify on a symbolic expression of a gradient vector; that is, I did something like this:

my_grad = sympy.zeros(len(params), 1)
for i in range(len(params)):
    my_grad[i] = sympy.diff(my_func, params[i])

If all elements of my_grad are functions, then lambdifying my_grad leads to a function that, when given vector inputs of length N, returns an array of dimensions len(params) x 1 x N. However, if one of the elements is a constant, then lambdifying my_grad leads to a function that, when given vector inputs of length N, returns a ragged array where the "row" corresponding to the constant is a single value, while the other rows are arrays of length N. This is painfully inconsistent.
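A minimal reconstruction of that situation (my_func and the parameters here are made-up placeholders):

import numpy as np
import sympy

x, y = sympy.symbols("x y")
my_func = x**2 + y              # d/dy is the constant 1
params = [x, y]
my_grad = sympy.Matrix([sympy.diff(my_func, p) for p in params])
g = sympy.lambdify(params, my_grad, "numpy")
g(np.arange(3.0), np.arange(3.0))
# ragged result: the 2*x row has length 3 while the constant row is a bare 1
# (an object array with a deprecation warning on older NumPy, an error on newer NumPy)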

@oscarbenjamin
Contributor

oscarbenjamin commented Apr 23, 2019

Judging from the comments here (over many years) it seems that this is an ongoing problem for many users.

I don't really know what lambdify is or what it is used for though. The docs say

The primary purpose of this function is to provide a bridge from SymPy expressions
to numerical libraries such as NumPy, SciPy, NumExpr, mpmath, and tensorflow

https://docs.sympy.org/latest/modules/utilities/lambdify.html#sympy.utilities.lambdify.lambdify

However, @bjodah's comment above makes it seem like we can't just assume we are dealing with numpy, which makes it difficult to implement a numpy-specific fix.

Does anyone have a clear idea of what the fix should be?

@asmeurer
Member

asmeurer commented Nov 9, 2020

This is also related to #18824

@asmeurer
Member

asmeurer commented Nov 9, 2020

I wonder if lambdify should use a wrapper like the one given by @rouckas automatically whenever numpy is a module. lambdify is supposed to create a NumPy-like function, and NumPy functions are expected to broadcast their return value based on the input shapes. My concern is that it would probably affect performance. We already had performance issues in the past by putting a wrapper in lambdify to automatically convert the input into an array, which we had to revert. So we should check the lambdify benchmarks in the benchmark suite. If the slowdown is too much, then I don't think it's worth changing. Actually, the wrapper would only need to go on functions that are constant, but I think performance is still something to consider.

@EmilMadsen90

I got caught by this issue as well. With

a, b = sympy.symbols("a b")
M = sympy.Matrix([[a, a], [b, b]])
N = sympy.Matrix([[a, a], [b, 0]])
f = sympy.lambdify([a, b], M, 'numpy')
g = sympy.lambdify([a, b], N, 'numpy')
a_num = numpy.array([1.0, 2.0])
b_num = numpy.array([1.0, 2.0])

doing f(a_num, b_num) gives

array([[[1., 2.],
        [1., 2.]],
       [[1., 2.],
        [1., 2.]]])

while doing g(a_num, b_num) gives

<lambdifygenerated-3>:2: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return (array([[a, a], [b, 0]]))
array([[array([1., 2.]), array([1., 2.])],
       [array([1., 2.]), 0]], dtype=object)

Any suggested workaround?
The broadcast workaround by @rouckas unfortunately does not work - it gives the same error as above.

@rouckas
Contributor

rouckas commented Feb 3, 2021

@EmilMadsen90 The problem in your case is in the definition of N

N = sympy.Matrix([[a, a], [b, 0]])

I assume that by 0 you mean a matrix with the same size as b filled with zeros? In numpy, I would just use zeros_like(b) or b*0, but sympy doesn't know that symbol b will be a matrix and b*0 gets evaluated to scalar 0. To prevent this evaluation, you can write b*0 as sympy.Mul(b, 0, evaluate=False). So, my solution would be to define N as
N = sympy.Matrix([[a, a], [b, sympy.Mul(b, 0, evaluate=False)]])

@EmilMadsen90

EmilMadsen90 commented Feb 3, 2021

Thanks for the fast answer. I see that my example was maybe a bit too simple compared to what I'm actually doing. Sorry for that.

I assume that by 0 you mean a matrix with the same size as b filled with zeros?

b is a symbol/scalar, not a matrix. I want to evaluate g (the lambdified N) in a set of numerical values for a and b, namely a_num and b_num. Then I want to concatenate the numerical matrices for each entry/value in a_num and b_num along axis 0 in a big matrix.
In my actual work/use case, N is a matrix of (very long trigonometric) expressions obtained via partial differentiation of another matrix, and the location of zeros/constants in the matrix could change at runtime.

@rouckas
Contributor

rouckas commented Feb 3, 2021

b is a symbol/scalar, not a matrix. I want to evaluate g (the lambdified N) in a set of numerical values for a and b, namely a_num and b_num. Then I want to concatenate the numerical matrices for each entry/value in a_num and b_num along axis 0 in a big matrix.

If I understand your description, you want to evaluate g(a_num[0], b_num[0]), g(a_num[1], b_num[1]).... and concatenate these matrices? Your code is actually creating a "block" matrix res = [[a_num, a_num], [b_num, b_num]], where a and b are arrays, but numpy is clever and when it sees that a_num and b_num have the same dimension, it turns res into a 3D array, so you can get the first matrix for a_num[0], b_num[0] as res[:,:,0], etc...

I see two possible solutions: first, you can apply the "hack" I suggested before to each element of N as follows:
N = N.applyfunc(lambda x:x+sympy.Mul(b, 0, evaluate=False))

or, I would suggest reorganizing the arrays so that f and g are always applied to pairs of values and use numpy.vectorize to do the vectorization.

a, b = sympy.symbols("a b")
M = sympy.Matrix([[a, a], [b, b]])
N = sympy.Matrix([[a, a], [b, 0]])

f = sympy.lambdify([(a, b)], M, 'numpy')
g = sympy.lambdify([(a, b)], N, 'numpy')

fv = numpy.vectorize(f, signature="(m)->(m,n)")
gv = numpy.vectorize(g, signature="(m)->(m,n)")

a_num = numpy.array([1.0, 2.0, 3.0])
b_num = numpy.array([1.0, 2.0, 3.0])
ab = numpy.vstack((a_num, b_num)).T

now gv(ab) gives

array([[[1., 1.],
        [1., 0.]],

       [[2., 2.],
        [2., 0.]],

       [[3., 3.],
        [3., 0.]]])

so the first matrix is at index gv(ab)[0], which seems more logical to me.

@oscarbenjamin
Contributor

I guess that the reason this issue remains unresolved is that it is not generally possible to apply numpy's broadcasting rules in lambdify. Otherwise the broadcast suggestion above would be the obvious approach:

def broadcast(fun):
    return lambda *x: numpy.broadcast_arrays(fun(*x), *x)[0]

That should solve all the simple cases where lambdify is used to apply a scalar function to an array of values efficiently, which I think is what most complaints above are about.
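For example (an illustrative use of the wrapper above, reusing the constant second derivative from the first post):

import numpy
import sympy

x = sympy.Symbol("x")
u = numpy.linspace(-2, 2, 10)
g = broadcast(sympy.lambdify(x, sympy.diff(x**2, x, 2), "numpy"))
g(u)   # an array of 2s with shape (10,) instead of the bare scalar 2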

The problem though is that lambdify is not always used to broadcast a scalar function and we can have things like:

In [17]: X = IndexedBase('X')

In [18]: expr = X[1] + X[2]

In [19]: expr
Out[19]: X[1] + X[2]

In [20]: f = lambdify(X, expr, modules='numpy')

In [21]: A = np.array([1, 2, 3])

In [22]: f(A)
Out[22]: 5

Here, although the input is an array of shape (3,), the output is a scalar. Likewise we also have things like:

In [31]: B = MatrixSymbol('B', 3, 1)

In [32]: expr = B.T * B

In [33]: f = lambdify(B, expr, modules='numpy')

In [34]: f(np.array([1, 2, 3]))
Out[34]: 14

In [35]: f(np.array([[1], [2], [3]]))
Out[35]: array([[14]])

This means that where we allow an array to be substituted for an IndexedBase or a MatrixSymbol, it is no longer possible for the lambdified function to infer the shape of the outputs purely from the shape of the inputs.

This mixed interpretation of array inputs as sometimes being for broadcasting a scalar expression and other times to replace a non-scalar component of the expression seems quite poorly defined to me. I'm sure that there must be other edge cases where it isn't really possible to come up with a unique sensible choice of output. To me it would seem cleaner if there were separate functions for broadcasting scalar expressions vs substituting arrays for non-scalars rather than having both actions in the same function lambdify.

@asmeurer
Member

asmeurer commented Feb 3, 2021

The matrix question has come up before #19259. I wonder if there is something we can do to improve that, or at least document the correct way to achieve it.

@rlaugier

With my use of lambdify over the past years, I'm surprised not to have run into this issue sooner. Now that I have, and find no practical workaround (for my application), I am extremely frustrated.
I agree with the conclusion of @oscarbenjamin that this seems to call for separate functions, or at least an option in the lambdify call.
In the ideal case, to get the best of both worlds, it would leave the user with the choice of which variables would broadcast/vectorize and which are instead subscriptable in the expression... To be honest I am (almost) surprised that this subscripting possibility exists, because I find that broadcasting your mathematical expression to datasets is the main appeal of lambdify.
This was just my input and suggestion as a user.

@oscarbenjamin
Contributor

Does the broadcast workaround not work?

def broadcast(fun):
    return lambda *x: numpy.broadcast_arrays(fun(*x), *x)[0]

@oscarbenjamin
Contributor

Maybe if all symbols given to lambdify are ordinary symbols then a broadcast could be used.

@akirakyle
Contributor

Since I've been living with this issue for a little bit, I thought I'd share one of my work-arounds for the matrix case as a third alternative to the two already shared by @rouckas

import numpy as np
from sympy import Matrix, lambdify

def lambdify_matrix(M, syms, xs):
    assert isinstance(M, Matrix)
    return np.block([[[np.broadcast_to(lambdify(syms, M[i,j])(xs), xs.shape)]
                      for j in range(M.cols)] for i in range(M.rows)])

The second of @rouckas's suggestions isn't really usable for me due to the performance impact of using np.vectorize, especially since in my case the sympy matrix dimensions are usually small while the data size is large. This issue became a lot less frustrating when I realized that sympy essentially just calls lambdify on each matrix element and then returns np.array() of the result, so it's not so different from the above function and should have similar performance.
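A hypothetical usage example of lambdify_matrix (the matrix and grid are borrowed from the example further down the thread):

import numpy as np
import sympy as sp

sigma = sp.Symbol("sigma")
mymatrix = sp.Matrix([[1, 0, sigma],
                      [0, 1, 1 + sigma]])
sigmas = np.linspace(-10, 10, 100)
out = lambdify_matrix(mymatrix, sigma, sigmas)
out.shape   # should be (2, 3, 100): constant entries are broadcast to the grid shape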

@rlaugier

Thanks @oscarbenjamin for this quick answer. Here is a minimal example for my application:

import numpy as np
import sympy as sp
sigma = sp.symbols("sigma")
mymatrix = sp.Matrix([[1, 0, sigma],
                     [0, 1, 1+sigma]])
f = sp.lambdify(sigma, mymatrix, modules="numpy")
sigmas = np.linspace(-10,10, 100)
b = f(sigmas)

Here it just builds a collection of parametric matrices, which I would ultimately want in an (n, m, nx) or (nx, n, m) array. As written here, I end up with an array of objects containing some integers and some arrays of floats. The simple broadcast workaround fails when the function is called because of a dimension mismatch.
Luckily @akirakyle seems to have read my mind, and his workaround works in my case. Yet, since the lambdified function for each cell is built, applied, and discarded at each call, I'm not sure it would scale nicely for multiple calls on complicated expressions.
A while back, I built a little class that did a similar thing but stored the functions, and provided a __call__ method that would build the appropriate result. Back then it was to smooth out some discrepancies between the behaviors of "numpy" and "numexpr" lambdified functions when called on 1D sympy matrices. I suspect that if I recycle this structure, upgrade it to 2D, and use @akirakyle's solution inside, it may work, and scalability should be improved. I'll try to share this here when (if) I have the opportunity to do it.
Thanks for the support.

@asmeurer
Member

It should be possible to make this work by modifying the lambdify printer for Matrix so it gives @akirakyle's code (except just include the recursive lambdified printed code for each element).
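Roughly, the generated code might then look something like this (a hypothetical sketch of possible printer output, not actual SymPy behavior, using the 2x3 matrix example from earlier in the thread):

def _lambdifygenerated(sigma):
    _shape = numpy.shape(sigma)
    return numpy.array([[numpy.broadcast_to(1, _shape), numpy.broadcast_to(0, _shape),
                         numpy.broadcast_to(sigma, _shape)],
                        [numpy.broadcast_to(0, _shape), numpy.broadcast_to(1, _shape),
                         numpy.broadcast_to(1 + sigma, _shape)]])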

@rlaugier

I tried to implement a class like I suggested 20 days ago, but the result is disappointing. It doesn't really scale to something with more than one variable, and may become complicated.
However, I would like to suggest a different avenue for a workaround. The whole problem originates from the fact that sympy automatically simplifies the expressions, leading to the disappearance of the symbols that need to be broadcasted. Instead of that, maybe we should try to find clever ways to force sympy to preserve the mention of those symbols in the expression throughout the process. I have tweaked my example to demonstrate a way to do that "from the outside" by adding a free symbol:

import numpy as np
import sympy as sp
sigmas = np.linspace(-2,2, 100)
sigma = sp.symbols("sigma")
mymatrix = sp.Matrix([[1, 0, sigma],
                     [0, 1, 1+sigma]])
#What we want to do:
f = sp.lambdify(sigma, mymatrix, modules="numpy")
b = f(sigmas)
#What we do instead:
z = sp.symbols("z")
mymatrixz = mymatrix + z*sp.prod(mymatrix.free_symbols)*sp.ones(mymatrix.shape[0], mymatrix.shape[1])
fz = sp.lambdify((sigma, z), mymatrixz, modules="numpy")
bz = fz(sigmas, 0)
bz

This approach seems to work and appears to be easier to implement in a flexible context.
I wonder if there could be a less barbaric way to force sympy to preserve some (or all) free symbols in specific situations?

@oscarbenjamin
Contributor

The whole problem originates from the fact that sympy automatically simplifies the expressions, leading to the disappearance of the symbols that need to be broadcasted.

I'm not sure I understand what you mean. The title of the issue is about constant functions like lambdify((x,), 2).

@asmeurer
Member

I think the point is that in practice, people are using lambdify(x, expr) where expr is generated in some way; it is usually nonconstant, but it can sometimes cancel out and become constant.

It's true that if you do something like lambdify(x, Add(expr, x, -x, evaluate=False)), or something similar when expr is a matrix, then it will always produce the same value, but the presence of the cancelled x in the lambdified function will force the function to broadcast to the input shape, the same as if it weren't constant. This is probably a viable workaround for many user codes, assuming the extra x - x computation doesn't bother you. I wouldn't really consider it a good fix for the problem to implement in lambdify itself, however.
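Concretely (an illustrative sketch reusing the constant second derivative from the first post):

import numpy as np
import sympy
from sympy import Add

x = sympy.Symbol("x")
expr = sympy.diff(x**2, x, 2)     # the constant 2
f = sympy.lambdify(x, Add(expr, x, -x, evaluate=False), "numpy")
f(np.linspace(-2, 2, 10)).shape   # (10,): the cancelled x forces broadcasting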

@asmeurer
Member

The primary issue here, as I understand it, is in lambdifying matrices where the variables in the matrix are themselves arrays. When array() gets an input that is a list of lists of arrays, if the arrays are all the same shape, it treats them the same as if it were a normal list of lists. But if the arrays have different dimensions, it creates a ragged dtype=object array, and also spits out a deprecation warning. For example:

>>> import numpy as np
>>> np.array([np.array([1, 2]), np.array([3, 4])])
array([[1, 2],
       [3, 4]])
>>> np.array([np.array([1, 2]), 3])
<mypython-5>:1: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  np.array([np.array([1, 2]), 3])
array([array([1, 2]), 3], dtype=object)

When SymPy lambdifies something like Matrix([[x, 1], [1, 0]]), it just replaces "Matrix" with "array":

>>> import inspect
>>> print(inspect.getsource(lambdify(x, Matrix([[x, 1], [1, 0]]))))
def _lambdifygenerated(x):
    return (array([[x, 1], [1, 0]]))

What we really want is to broadcast each list element so that it treats it in the first way.

I think the issue here is the array that is used for the lambdification of Matrix. The NumPy array constructor doesn't work in the way that we want it to. We really want all the array inputs to be broadcast together, so that the function works just as well for array and constant inputs, regardless of whether some entries are broadcast.

So we should either:

  • modify the Matrix printer to print something more complex with broadcast_to so that it does this, or
  • replace array in the NumPy lambdify namespace with something that does this.

Personally I think the former option is better for a fix in SymPy, although for a user workaround until this is fixed, the latter is the best approach. SymPy is also capable of detecting whether entries are constant with respect to the input variables, so it can limit the broadcasting calls to only those entries that need it. For example, for Matrix([[x, 1], [1, 0]]), the only input variable is x, and it knows the 1 and 0 entries are constants, so it can print something like

array([[x, full_like(x, 1)], [full_like(x, 1), full_like(x, 0)]])

In general, it would need to broadcast the inputs to get the shape for the constant expressions like

shape = broadcast(x1, x2, ...).shape  # x1, x2, ... are the inputs to the lambdified function
return array(...)  # where ... uses full(shape, ...) for constant entries, or broadcast_to for entries that use only a subset of the input variables

Things get more complicated when you have multiple variables, because some expressions might have only a subset of them, but those expressions should still be broadcast to the shape for all the input variables.

One potential issue I see with this is that it assumes that the input always must be broadcast to a common shape. But I'm not sure if there might be cases when you wouldn't want this. For example, if the input is a completely constant matrix, it might be surprising if f(k-d array) adds k dimensions to the output instead of just returning the n x m matrix. Then again, in that case, things actually do broadcast, so maybe it isn't a huge deal (*). We may need to make this option configurable somehow.

(*) Except for the n x m matrices created in this way, you end up with shapes like (n, m, ..., k), where for broadcasting, as well as operations like dot, to work, you'd generally want to have the matrix dimensions last, like (k, ..., n, m). I'm a little curious how the people using lambdify like this handle this.

@rlaugier

Hi @asmeurer, and thank you for all the effort you spend on this issue.
Indeed this deprecation warning is pushing us to find a solution for this issue (which has been open for a while now...). Yet even though it is most visible in Matrix form, I think it is also a problem when the expression is not a matrix. Consider the following example, where the expression is scalar but the inputs are of arbitrary shapes:

x,y,z = sp.symbols("x, y, z")
xs = np.arange(10)
ys = np.arange(20)
zs = 10.
myfunction_xz = sp.lambdify((x,y,z), x*z, modules="numpy")
myfunction_xyz = sp.lambdify((x,y,z), x*y*z, modules="numpy")
a =  myfunction_xz(xs[:,None,],ys[None,:],zs)
b = myfunction_xyz(xs[:,None,],ys[None,:],zs)

This currently works great as long as all symbols appear in the expression. But in this situation a numpy user would expect a and b to have the same shape. Knowing the signature of the ufunc and the shape of the inputs you give it should be sufficient to know the shape of the output. This should be true regardless of the details of the expression and whether it is - or is not - simplified.
This is the whole appeal of sympy for me: I want to be able to manipulate my expression in all sorts of crazy ways, yet when I finally lambdify it I want it to behave in a predictable and consistent manner.
Ideally I think the output of the printer should behave just as if all the args provided to sp.lambdify() did exist in the expression, because this is what defines the signature of the ufunc. For this purpose (following my previous suggestion), maybe it (the printer) could add a + 0*y or + y-y to its output iff y is not one of the free symbols of the expression (or cell). From the little I know about the printer, I think this could fix the whole issue in the least invasive way, because everything seems to already work just fine when all the arguments are present as free symbols in the expression. (For information, + 0*y seems a bit faster in numpy for some reason.) Incidentally, this also applies when lambdifying to numexpr, and the same fix would solve that too.

I hope this helps.
P.S.

(*) Except for the n x m matrices created in this way, you end up with shapes like (n, m, ..., k), where for broadcasting, as well as operations like dot, to work, you'd generally want to have the matrix dimensions last, like (k, ..., n, m). I'm a little curious how the people using lambdify like this handle this.

You make a good point here. I think this currently ends up as (n, m, ..., k), while I think the expected result in numpy would be (k, ..., n, m). I have been checking the shapes of my outputs and using np.swapaxes in such situations. It is annoying, but I'm guessing it is a different issue. Regardless, the numpy user does expect the k dimension somewhere, and not just (n, m), even if it is constant over that new dimension.

@asmeurer
Member

This currently works great as long as all symbols appear in the expression. But in this situation a numpy user would expect a and b to have the same shape. Knowing the signature of the ufunc and the shape of the inputs you give it should be sufficient to know the shape of the output. This should be true regardless of the details of the expression and whether it is - or is not - simplified.

The reason I don't see this as an issue is that in this case, things should broadcast correctly. The resulting shapes of a and b in your example are (10, 1) and (10, 20), resp. Those shapes are broadcastable together, so any NumPy operation that supports broadcasting should work just fine with a and b as they are. If you really need them to be the same shape, you can use broadcast_arrays on them.

I suppose you are expecting here for lambdify to act like a ufunc and always broadcast its inputs. If you use ufuncify, it does work like this:

>>> from sympy.utilities.autowrap import ufuncify
>>> ufunc = ufuncify((x, y, z), x*z)
>>> ufunc(xs[:,None,],ys[None,:],zs).shape
(10, 20)

Ideally I think the output of the printer should behave just as if all the args provided to sp.lambdify() did exist in the expression, because this is what defines the signature of the ufunc. For this purpose (following my previous suggestion), maybe it (the printer) could add a + 0*y or + y-y to its output iff y is not one of the free symbols of the expression (or cell). From the little I know about the printer, I think this could fix the whole issue in the least invasive way, because everything seems to already work just fine when all the arguments are present as free symbols in the expression. (For information, + 0*y seems a bit faster in numpy for some reason.) Incidentally, this also applies when lambdifying to numexpr, and the same fix would solve that too.

I would expect printing the appropriate broadcast operations to be faster. Note that performance in lambdify does matter. Something that can slow down every lambdify call by a significant amount should not be considered lightly. See for instance #14671, where we had at one point called array on all lambdify inputs to fix some other issues, but removed it because of the performance issues it caused. That's why this might need to be considered an optional feature, or at best, something that is only automatically enabled when the printer detects it is really needed, as in the matrix case.

@smichr smichr removed the Valid label Nov 19, 2021
wshanks added a commit to wshanks/qiskit-terra that referenced this issue Mar 17, 2022
This works around an issue with sympy where its lambda returns a scalar
instead of an array when the expression evaluates to a scalar:

sympy/sympy#5642
mergify bot pushed a commit to Qiskit/qiskit that referenced this issue Jun 15, 2022
* Symbolic parametric pulse

* Implement Constant pulse using Piecewise

This works around an issue with sympy where its lambda returns a scalar
instead of an array when the expression evaluates to a scalar:

sympy/sympy#5642