scipy.optimize.curve_fit leads to unexpected behavior when input is a standard python list #3037

Closed
thoppe opened this Issue Nov 1, 2013 · 10 comments

Comments

Projects
None yet
4 participants

thoppe commented Nov 1, 2013

Inspired by the Stack Overflow question:

http://stackoverflow.com/q/19713689/249341

A minimal working example of the potential pitfall:

import numpy as np
import scipy.optimize

x = [1,2,3,4]
y = [2,4,6,8]

def f(x,a,b): return a*x + b

print "Expected: ", scipy.optimize.curve_fit(f, np.array(x), y)[0]
print "Observed: ", scipy.optimize.curve_fit(f, x, y)[0]

Gives:

Expected:  [  2.00000000e+00  -4.44089210e-16]
Observed:  [ 1.   2.5]

It's clear what is going on here, when x is a standard python list a*x makes a copies of that list. For someone new to python, and is working primarily with the scipy/numpy stack this is an extremely strange and unexpected behavior (I've watched more than one person struggle with this).

Suggestions:

1] Is there any reason not to cast python lists into numpy arrays? If not, then automatically cast on input
2] Check the size of 'x', via len. If it changes size during optimization, perhaps give a warning?
3] At the very least, update the docs to know that there won't be an upcast?

Member

josef-pkt commented Nov 1, 2013

from what I remember

1] curve_fit doesn't directly care about what x is, it could be anything (an instance of a class that holds some data)
the
2] y doesn't change size, if the return changed size, then we get an exception about shape mismatch.
there is something fishy, most likely the starting value == 1
3] doc improvements are always good

I'm trying to see what's going on with the starting value.

Member

josef-pkt commented Nov 1, 2013

Only a starting value == 1.0 can cause this result. Anything else ends up with a shape mismatch.

>>> popt,pcov=opt.curve_fit(f,x,y, p0=[0.99, 10])
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    popt,pcov=opt.curve_fit(f,x,y, p0=[0.99, 10])
  File "C:\Programs\Python27\lib\site-packages\scipy\optimize\minpack.py", line 515, in curve_fit
    res = leastsq(func, p0, args=args, full_output=1, **kw)
  File "C:\Programs\Python27\lib\site-packages\scipy\optimize\minpack.py", line 354, in leastsq
    shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
  File "C:\Programs\Python27\lib\site-packages\scipy\optimize\minpack.py", line 17, in _check_func
    res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
  File "C:\Programs\Python27\lib\site-packages\scipy\optimize\minpack.py", line 427, in _general_function
    return function(xdata, *params) - ydata
ValueError: operands could not be broadcast together with shapes (0) (8) 
Owner

rgommers commented Nov 10, 2013

This should be fixed in curve_fit imho, can be done by checking if input is a sequence and converting to ndarray in that case. While technically one could make things work as is with lists by putting an asarray call in func, this is obscure and slow. curve_fit is documented to take sequences, so that should work in an intuitive way.

Owner

rgommers commented Nov 10, 2013

xdata and ydata are documented as array_like, so just adding asanyarray calls should be OK I think. If not, it needs to be clearly defined what other objects can be used as input. Random classes won't work, because there's already a statement len(ydata) in there.

Member

josef-pkt commented Nov 10, 2013

I don't think we should "fix" the automatic conversion, x can be anything that the user wants, a sequence of anything that the user wants to feed to her function.

With newer numpy we would just get useless object arrays with asarray or asanyarray.

Given the docstring and that I'm not using curve_fit, I don't care much if this reduction in functionality is introduced to make curve_fit a bit more "foolproof".

Member

josef-pkt commented Nov 10, 2013

ydata has to be the target data array_like, because curve_fit is calculating the error itself ydata - f(xdata, *params).

But xdata is just transported to the function, and not used by curve_fit itself.

Owner

rgommers commented Nov 10, 2013

At least we can do something like this then:

ydata = np.asanyarray(ydata)
if isinstance(xdata, (list, tuple)):
    xdata = np.asarray(xdata)

As well as fix the docs to state that xdata and ydata can be of different types.

Owner

pv commented Nov 10, 2013

I'd vote for being foolproof over allowing for exotic use cases with non-numeric x and y. If you want to pass extra parameters, you can use args.

Member

josef-pkt commented Nov 10, 2013

If you want to pass extra parameters, you can use args.

I don't see how passing extra args is possible. args is set by curve_fit args = (xdata, ydata, f)

curve_fit is a convenience function, so foolproof high priority

Owner

rgommers commented Dec 20, 2013

Proposed fix in gh-3166.

@rgommers rgommers closed this in 856e396 Dec 26, 2013

dlax added a commit that referenced this issue Dec 26, 2013

Merge pull request #3166 from rgommers/curvefit-list-input
BUG: make curve_fit() work with array_like input.  Closes gh-3037.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment