Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

scipy.optimize.curve_fit leads to unexpected behavior when input is a standard python list #3037

Closed
thoppe opened this issue Nov 1, 2013 · 10 comments
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.optimize
Milestone

Comments

@thoppe
Copy link

thoppe commented Nov 1, 2013

Inspired by the Stack Overflow question:

http://stackoverflow.com/q/19713689/249341

A minimal working example of the potential pitfall:

import numpy as np
import scipy.optimize

x = [1,2,3,4]
y = [2,4,6,8]

def f(x,a,b): return a*x + b

print "Expected: ", scipy.optimize.curve_fit(f, np.array(x), y)[0]
print "Observed: ", scipy.optimize.curve_fit(f, x, y)[0]

Gives:

Expected:  [  2.00000000e+00  -4.44089210e-16]
Observed:  [ 1.   2.5]

It's clear what is going on here, when x is a standard python list a*x makes a copies of that list. For someone new to python, and is working primarily with the scipy/numpy stack this is an extremely strange and unexpected behavior (I've watched more than one person struggle with this).

Suggestions:

1] Is there any reason not to cast python lists into numpy arrays? If not, then automatically cast on input
2] Check the size of 'x', via len. If it changes size during optimization, perhaps give a warning?
3] At the very least, update the docs to know that there won't be an upcast?

@josef-pkt
Copy link
Member

from what I remember

1] curve_fit doesn't directly care about what x is, it could be anything (an instance of a class that holds some data)
the
2] y doesn't change size, if the return changed size, then we get an exception about shape mismatch.
there is something fishy, most likely the starting value == 1
3] doc improvements are always good

I'm trying to see what's going on with the starting value.

@josef-pkt
Copy link
Member

Only a starting value == 1.0 can cause this result. Anything else ends up with a shape mismatch.

>>> popt,pcov=opt.curve_fit(f,x,y, p0=[0.99, 10])
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    popt,pcov=opt.curve_fit(f,x,y, p0=[0.99, 10])
  File "C:\Programs\Python27\lib\site-packages\scipy\optimize\minpack.py", line 515, in curve_fit
    res = leastsq(func, p0, args=args, full_output=1, **kw)
  File "C:\Programs\Python27\lib\site-packages\scipy\optimize\minpack.py", line 354, in leastsq
    shape, dtype = _check_func('leastsq', 'func', func, x0, args, n)
  File "C:\Programs\Python27\lib\site-packages\scipy\optimize\minpack.py", line 17, in _check_func
    res = atleast_1d(thefunc(*((x0[:numinputs],) + args)))
  File "C:\Programs\Python27\lib\site-packages\scipy\optimize\minpack.py", line 427, in _general_function
    return function(xdata, *params) - ydata
ValueError: operands could not be broadcast together with shapes (0) (8) 

@rgommers
Copy link
Member

This should be fixed in curve_fit imho, can be done by checking if input is a sequence and converting to ndarray in that case. While technically one could make things work as is with lists by putting an asarray call in func, this is obscure and slow. curve_fit is documented to take sequences, so that should work in an intuitive way.

@rgommers
Copy link
Member

xdata and ydata are documented as array_like, so just adding asanyarray calls should be OK I think. If not, it needs to be clearly defined what other objects can be used as input. Random classes won't work, because there's already a statement len(ydata) in there.

@josef-pkt
Copy link
Member

I don't think we should "fix" the automatic conversion, x can be anything that the user wants, a sequence of anything that the user wants to feed to her function.

With newer numpy we would just get useless object arrays with asarray or asanyarray.

Given the docstring and that I'm not using curve_fit, I don't care much if this reduction in functionality is introduced to make curve_fit a bit more "foolproof".

@josef-pkt
Copy link
Member

ydata has to be the target data array_like, because curve_fit is calculating the error itself ydata - f(xdata, *params).

But xdata is just transported to the function, and not used by curve_fit itself.

@rgommers
Copy link
Member

At least we can do something like this then:

ydata = np.asanyarray(ydata)
if isinstance(xdata, (list, tuple)):
    xdata = np.asarray(xdata)

As well as fix the docs to state that xdata and ydata can be of different types.

@pv
Copy link
Member

pv commented Nov 10, 2013

I'd vote for being foolproof over allowing for exotic use cases with non-numeric x and y. If you want to pass extra parameters, you can use args.

@josef-pkt
Copy link
Member

If you want to pass extra parameters, you can use args.

I don't see how passing extra args is possible. args is set by curve_fit args = (xdata, ydata, f)

curve_fit is a convenience function, so foolproof high priority

@rgommers
Copy link
Member

Proposed fix in gh-3166.

dlax added a commit that referenced this issue Dec 26, 2013
BUG: make curve_fit() work with array_like input.  Closes gh-3037.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
defect A clear bug or issue that prevents SciPy from being installed or used as expected scipy.optimize
Projects
None yet
Development

No branches or pull requests

4 participants