-
Notifications
You must be signed in to change notification settings - Fork 273
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
More changes to coercing model inputs to Float64 arrays #899
Conversation
…loat64, ravelling, applying nan policy
… independent vars to float64
Codecov Report
@@ Coverage Diff @@
## master #899 +/- ##
==========================================
- Coverage 93.60% 93.32% -0.29%
==========================================
Files 10 10
Lines 3643 3653 +10
==========================================
- Hits 3410 3409 -1
- Misses 233 244 +11
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@newville I quickly read through this PR and left two comments.
I think the docstrings could here and there be changed a little to follow conventions we use throughout the rest of the code for style and wording. But other than that this looks like an improvement to me.
If you're eager to get this merged I'll not hold it up for technicalities besides the two review comments. I haven't had much time recently to do thing myself and cannot commit to a timeline at the moment. But as I said small changes in documentation/docstrings can always be made later.
lmfit/minimizer.py
Outdated
@@ -2493,6 +2508,9 @@ def _nan_policy(arr, nan_policy='raise', handle_inf=True): | |||
return arr | |||
|
|||
|
|||
_nan_policy = coerce_float64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of this redefine? The function started already with a _
which means that external code should not use it or rely on it. So ideally this should not be needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, I moved "coerce to Float64, 1D (at least usually)" to this, so it seems a little less like "enforce the nan policy" and a little more like "make sure the array-like data will be acceptable to the solver", and then the name seemed less good. It also looked more like a function that didn't really need to be private.
And, yes _nan_policy
is not part of the public API, but I've seen (or done myself!) a few occasions of downstream code relying on functions we never made public. We could deprecate that _nan_policy
, but I don't think it is really doing any harm. Would a code comment like " for historical purposes, we will keep this alias for a while" be acceptable?
@reneeotten I'll look at the docstrings. I think this topic needs a documentation section: what data is acceptable, what "array-like" means to us, when coercion happens, and how to control it. The issue is most pronounced for Model, but also appears in Minimizer. Will try to add. It's not so much that I am in a hurry as the number of small-ish changes recently. And when PRs stack up they start to overlap and it's hard for me to keep 3 separate PR branches sane and "on topic". I think that we will be ready for 1.2.2 soon (say once all open PRs are merged), maybe in a couple of weeks. I don't think we need to rush, but I also don't think we want to have PRs coming at a rate of about 1 per week and sitting around for a month. |
@reneeotten OK, I'm going to merge this then work on merging #888 |
Description
This makes further changes to how non-ndarray inputs (primary data and independent variables) are handled by
Model.fit()
. See, for example #873.This adds a
coerce_farray
option toModel.fit()
which defaults toTrue
. WhenTrue
, data and independent variables that are array-like (ie, hasattr__array__
, so pandas Series, hdf5 Grouops, probably anything from dask, xarray, etc) are coerced to Float64 (or Complex128 if complex) before the fit is begun.It is now possible to completely turn that coercion off
coerce_farray=False
. In that case, no coercion of input data is done at all -- a list is still a list -- and the model function is expected to handle that correctly. Making this explicit and non-default means that the user basically has to know they are doing this, so that errors likeare going to be theirs to track down and fix. But, if someone wants to have a pile of input data that is int8 images and they know what they are doing with them, we don't need to coerce them to Float64. And the current state of converting some things but maybe not everything is just too confusing.
This also adds/modifies code in
Minimizer
to really make sure that the array sent to the solver is a 1-D Float64 array. That means that if the user's model function does send a pandas Series or other array-like object, it should be mostly handled okay.Type of Changes
Tested on
Python: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:17:34) [Clang 14.0.6 ]
lmfit: 1.2.1.post8+g1ab95ae.d20230626, scipy: 1.10.1, numpy: 1.24.3, asteval: 0.9.30, uncertainties: 3.1.7
Verification
Have you
This probably needs to have clarifying docs - even a doc section - and a couple of examples.
Still, comments and suggestions on this approach would be greatly appreciated.