
More changes to coercing model inputs to Float64 arrays #899

Merged
newville merged 13 commits into master on Jun 30, 2023

Conversation

newville
Member

@newville newville commented Jun 26, 2023

Description

This makes further changes to how non-ndarray inputs (primary data and independent variables) are handled by Model.fit(). See, for example, #873.

This adds a coerce_farray option to Model.fit(), which defaults to True. When True, data and independent variables that are array-like (i.e., have an __array__ attribute, so pandas Series, h5py Groups, and probably anything from dask, xarray, etc.) are coerced to Float64 (or Complex128 if complex) before the fit begins.
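The array-like test and coercion described above can be sketched roughly like this (an illustrative stand-in, not the actual lmfit code; the helper name here is made up):

```python
import numpy as np

def coerce_if_arraylike(val):
    """Illustrative sketch of the idea: anything that advertises
    __array__ (pandas Series, h5py Datasets, dask/xarray objects)
    is converted to a float64 or complex128 ndarray; everything
    else is passed through untouched."""
    if hasattr(val, '__array__'):
        arr = np.asarray(val)
        if np.iscomplexobj(arr):
            return arr.astype(np.complex128)
        return arr.astype(np.float64)
    return val
```

For example, an int8 ndarray comes back as float64, while a plain list (which has no __array__) is left alone.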

It is now possible to turn that coercion off completely with coerce_farray=False. In that case, no coercion of input data is done at all -- a list is still a list -- and the model function is expected to handle that correctly. Making this explicit and non-default means that the user basically has to know what they are doing, so that errors like

TypeError: can't multiply sequence by non-int of type 'float'

will be theirs to track down and fix. But if someone has a pile of input data that is int8 images and they know what they are doing with it, we don't need to coerce it to Float64. And the current state of converting some things but maybe not everything is just too confusing.
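For reference, that error comes straight from plain Python sequence arithmetic; this snippet reproduces it:

```python
# Multiplying a plain list by a float fails exactly this way, which is
# what a model function sees if it receives an uncoerced list:
try:
    result = [1, 2, 3] * 2.5
except TypeError as exc:
    print(exc)  # can't multiply sequence by non-int of type 'float'
```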

This also adds/modifies code in Minimizer to really make sure that the array sent to the solver is a 1-D Float64 array. That means that if the user's model function returns a pandas Series or other array-like object, it should mostly be handled okay.
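The Minimizer-side guarantee can be sketched as follows (illustrative only; the PR's actual helper is named coerce_float64, and this simplified body omits details such as NaN handling):

```python
import numpy as np

def ensure_1d_float64(arr):
    """Sketch: whatever the model function returned (list, Series,
    2-D ndarray, ...) is flattened into a 1-D float64 array before
    it is handed to the solver."""
    return np.asarray(arr, dtype=np.float64).ravel()
```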

Type of Changes
  • Bug fix
  • New feature
  • Refactoring / maintenance
  • Documentation / examples
Tested on

Python: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:17:34) [Clang 14.0.6 ]
lmfit: 1.2.1.post8+g1ab95ae.d20230626, scipy: 1.10.1, numpy: 1.24.3, asteval: 0.9.30, uncertainties: 3.1.7

Verification

Have you

  • included docstrings that follow PEP 257?
  • referenced existing Issue and/or provided relevant link to mailing list?
  • verified that existing tests pass locally?
  • verified that the documentation builds locally?
  • squashed/minimized your commits and written descriptive commit messages?
  • added or updated existing tests to cover the changes?
  • updated the documentation and/or added an entry to the release notes (doc/whatsnew.rst)?
  • added an example?

This probably needs to have clarifying docs - even a doc section - and a couple of examples.

Still, comments and suggestions on this approach would be greatly appreciated.

@codecov

codecov bot commented Jun 26, 2023

Codecov Report

Merging #899 (6c41986) into master (7e63299) will decrease coverage by 0.29%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #899      +/-   ##
==========================================
- Coverage   93.60%   93.32%   -0.29%     
==========================================
  Files          10       10              
  Lines        3643     3653      +10     
==========================================
- Hits         3410     3409       -1     
- Misses        233      244      +11     
Impacted Files         Coverage Δ
lmfit/lineshapes.py    100.00% <100.00%> (ø)
lmfit/minimizer.py     90.26% <100.00%> (-1.11%) ⬇️
lmfit/model.py         91.11% <100.00%> (+0.02%) ⬆️
lmfit/models.py        91.38% <100.00%> (ø)

Contributor

@reneeotten reneeotten left a comment


@newville I quickly read through this PR and left two comments.

I think the docstrings could be changed here and there to follow the conventions we use throughout the rest of the code for style and wording. But other than that, this looks like an improvement to me.

If you're eager to get this merged I won't hold it up for technicalities beyond the two review comments. I haven't had much time recently to do things myself and cannot commit to a timeline at the moment. But as I said, small changes in documentation/docstrings can always be made later.

@@ -2493,6 +2508,9 @@ def _nan_policy(arr, nan_policy='raise', handle_inf=True):
return arr


_nan_policy = coerce_float64
Contributor


What is the purpose of this redefinition? The function's name already started with a _, which means that external code should not use or rely on it. So ideally this should not be needed?

Member Author


Well, I moved "coerce to Float64, 1-D (at least usually)" into this function, so it now seems a little less like "enforce the NaN policy" and a little more like "make sure the array-like data will be acceptable to the solver", and the old name seemed less fitting. It also looked more like a function that didn't really need to be private.

And, yes _nan_policy is not part of the public API, but I've seen (or done myself!) a few occasions of downstream code relying on functions we never made public. We could deprecate that _nan_policy, but I don't think it is really doing any harm. Would a code comment like " for historical purposes, we will keep this alias for a while" be acceptable?

lmfit/model.py (outdated review comment, resolved)
@newville
Member Author

@reneeotten I'll look at the docstrings. I think this topic needs a documentation section: what data is acceptable, what "array-like" means to us, when coercion happens, and how to control it. The issue is most pronounced for Model, but also appears in Minimizer. Will try to add.

It's not so much that I am in a hurry as the number of small-ish changes recently. And when PRs stack up they start to overlap and it's hard for me to keep 3 separate PR branches sane and "on topic".

I think that we will be ready for 1.2.2 soon (say once all open PRs are merged), maybe in a couple of weeks. I don't think we need to rush, but I also don't think we want to have PRs coming at a rate of about 1 per week and sitting around for a month.

@newville
Member Author

@reneeotten OK, I'm going to merge this and then work on merging #888

@newville newville merged commit 115113c into master Jun 30, 2023
15 checks passed
@newville newville deleted the model_coerce_farray branch July 6, 2023 01:57