Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fewer checks on xdata for curve_fit. #10196

Merged
merged 1 commit into from Jun 18, 2019
Merged

Conversation

anntzer
Copy link
Contributor

@anntzer anntzer commented May 20, 2019

The xdata argument to curve_fit is essentially a helper in the case
where one wants to write the fit function as func(x, *params). Until
scipy 1.3.0, it was possible to just pass a dummy value (e.g. None) as
xdata, e.g.

curve_fit(lambda _, a: a * np.arange(10), None, 2 * np.arange(10))

A comment in the implementation clearly states that the intent was for
xdata to be anything, including a non-array:

    if isinstance(xdata, (list, tuple, np.ndarray)):
        # `xdata` is passed straight to the user-defined `f`, so allow
        # non-array_like `xdata`.

(Effectively, this is an interface closer to the one of least_squares,
with the advantage that the covariance matrix is also returned.)

1.3.0 added two checks that make this impossible: it now requires that
xdata is not empty, and casts it to float64 for improved precision.

The non-emptiness check is spurious: it was introduced at the same time
as a check that ydata not be empty (which is necessary to avoid a
confusing error message, and is reasonable). This PR removes it.

The cast to float64 can be applied only if xdata is indeed an
array-like, which this PR does.

xref #9893, #10076.

Copy link
Contributor

@tylerjereddy tylerjereddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll let the resident optimize experts chime in with more meaningful comments, but I just did a quick pass through the line coverage data in Azure in case that is helpful.

Not sure it is crucial to add more tests for what sounds like a simple-enough adjustment though. Probably a good sign that no old tests have been modified, and just a new test added.

else:
ydata = np.asarray(ydata)
ydata = np.asarray(ydata, float)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this code path is never hit by a test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this was already the case before, wasn't it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, my point is that confidence & safety in code changes drops for lines that aren't covered by tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure how you would go about testing this, apart from trying to fit a dataset with a different dtype and seeing if it failed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I modified another test to exercise that code path.

else:
xdata = np.asarray(xdata)
xdata = np.asarray(xdata, float)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

scipy/optimize/tests/test_minpack.py Show resolved Hide resolved

if isinstance(xdata, (list, tuple, np.ndarray)):
# `xdata` is passed straight to the user-defined `f`, so allow
# non-array_like `xdata`.
if check_finite:
xdata = np.asarray_chkfinite(xdata)
xdata = np.asarray_chkfinite(xdata, float)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment doesn't belong here, but still...

The docstring for xdata needs to be updated.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it needs to be explicit.

xdata : array_like or None

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can really be any object, so I edited the docstring accordingly.

@anntzer anntzer force-pushed the curve_fit-xdata branch 2 times, most recently from 7188e68 to 273029a Compare May 28, 2019 10:37
The *xdata* argument to curve_fit is essentially a helper in the case
where one wants to write the fit function as `func(x, *params)`.  Until
scipy 0.13, it was possible to just pass a dummy value (e.g. None) as
*xdata*, e.g.
```
curve_fit(lambda _, a: a * np.arange(10), None, 2 * np.arange(10))
```
A comment in the implementation clearly states that the intent was for
*xdata* to be anything, including a non-array:
```
    if isinstance(xdata, (list, tuple, np.ndarray)):
        # `xdata` is passed straight to the user-defined `f`, so allow
        # non-array_like `xdata`.
```
(Effectively, this is an interface closer to the one of `least_squares`,
with the advantage that the covariance matrix is also returned.)

0.13.0 added two checks that make this impossible: it now requires that
*xdata* is not empty, and casts it to float64 for improved precision.

The non-emptiness check is spurious: it was introduced at the same time
as a check that *ydata* not be empty (which is necessary to avoid a
confusing error message, and is reasonable).  This PR removes it.

The cast to float64 can be applied only if xdata is indeed an
array-like, which this PR does.
@andyfaff
Copy link
Contributor

I'm happy with these changes. Could you add a line to the release notes outlining what's changed here?

@andyfaff andyfaff merged commit 3a37396 into scipy:master Jun 18, 2019
@anntzer anntzer deleted the curve_fit-xdata branch June 19, 2019 09:06
@anntzer
Copy link
Contributor Author

anntzer commented Jun 19, 2019

Is there any chance this could go to 1.3.1 (if there are plans for such a release) instead of 1.4, on the basis that this fixes a regression? If yes, there's no wiki page for 1.3.1 release notes.

@andyfaff andyfaff modified the milestones: 1.3.1, 1.4.0 Jun 19, 2019
@andyfaff
Copy link
Contributor

I've marked it as having a 1.3.1 milestone, hopefully that's the mechanism for specifying that it should be added to such a release, if it occurs.
@tylerjereddy do you know if there is a plan for a 1.3.1 release?

@anntzer
Copy link
Contributor Author

anntzer commented Jun 19, 2019

I'll just post the release notes entry here, you can move it to whereever appropriate :)

scipy.optimize fixes

The xdata parameter to scipy.optimize.curve_fit is no longer checked to be
non-empty and no longer cast to float if not an array-like. This fixes a
regression in scipy 1.13.0, and restores the ability to use any object as xdata.

@tylerjereddy tylerjereddy added the backport-candidate This fix should be ported by a maintainer to previous SciPy versions. label Jun 24, 2019
@tylerjereddy
Copy link
Contributor

@andyfaff It is possible to do 1.3.1 I think, although I'd suggest that we haven't quite reached the "critical mass" of backports to justify it yet. I'll put the backport label on as well just in case.

@brandonrwin
Copy link

This regression came back, and rather than fixing it, the docs were changed. See: #12632

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants