
More changes to coercing model inputs to Float64 arrays #899

Merged
newville merged 13 commits into master on Jun 30, 2023

Conversation

newville
Member

@newville newville commented Jun 26, 2023

Description

This makes further changes to how non-ndarray inputs (primary data and independent variables) are handled by Model.fit(). See, for example, #873.

This adds a coerce_farray option to Model.fit(), which defaults to True. When True, data and independent variables that are array-like (i.e., have an __array__ attribute, so pandas Series, h5py Groups, and probably anything from dask, xarray, etc.) are coerced to Float64 (or Complex128 if complex) before the fit begins.
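The array-like test and coercion described above can be sketched roughly like this (an illustrative stand-in, not the actual lmfit code; the helper name here is made up):

```python
import numpy as np

def coerce_if_arraylike(val):
    """Illustrative sketch of the idea: anything that advertises
    __array__ (pandas Series, h5py Datasets, dask/xarray objects)
    is converted to a float64 or complex128 ndarray; everything
    else is passed through untouched."""
    if hasattr(val, '__array__'):
        arr = np.asarray(val)
        if np.iscomplexobj(arr):
            return arr.astype(np.complex128)
        return arr.astype(np.float64)
    return val
```

For example, an int8 ndarray comes back as float64, while a plain list (which has no __array__) is left alone.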

It is now possible to turn that coercion off completely with coerce_farray=False. In that case, no coercion of input data is done at all -- a list is still a list -- and the model function is expected to handle that correctly. Making this explicit and non-default means that the user basically has to know what they are doing, so that errors like

TypeError: can't multiply sequence by non-int of type 'float'

will be theirs to track down and fix. But if someone has a pile of input data that is int8 images and they know what they are doing with it, we don't need to coerce it to Float64. And the current state of converting some things but maybe not everything is just too confusing.
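For reference, that error comes straight from plain Python sequence arithmetic; this snippet reproduces it:

```python
# Multiplying a plain list by a float fails exactly this way, which is
# what a model function sees if it receives an uncoerced list:
try:
    result = [1, 2, 3] * 2.5
except TypeError as exc:
    print(exc)  # can't multiply sequence by non-int of type 'float'
```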

This also adds/modifies code in Minimizer to really make sure that the array sent to the solver is a 1-D Float64 array. That means that if the user's model function returns a pandas Series or other array-like object, it should mostly be handled okay.
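The Minimizer-side guarantee can be sketched as follows (illustrative only; the PR's actual helper is named coerce_float64, and this simplified body omits details such as NaN handling):

```python
import numpy as np

def ensure_1d_float64(arr):
    """Sketch: whatever the model function returned (list, Series,
    2-D ndarray, ...) is flattened into a 1-D float64 array before
    it is handed to the solver."""
    return np.asarray(arr, dtype=np.float64).ravel()
```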

Type of Changes
  • Bug fix
  • New feature
  • Refactoring / maintenance
  • Documentation / examples
Tested on

Python: 3.10.10 | packaged by conda-forge | (main, Mar 24 2023, 20:17:34) [Clang 14.0.6 ]
lmfit: 1.2.1.post8+g1ab95ae.d20230626, scipy: 1.10.1, numpy: 1.24.3, asteval: 0.9.30, uncertainties: 3.1.7

Verification

Have you

  • included docstrings that follow PEP 257?
  • referenced existing Issue and/or provided relevant link to mailing list?
  • verified that existing tests pass locally?
  • verified that the documentation builds locally?
  • squashed/minimized your commits and written descriptive commit messages?
  • added or updated existing tests to cover the changes?
  • updated the documentation and/or added an entry to the release notes (doc/whatsnew.rst)?
  • added an example?

This probably needs to have clarifying docs - even a doc section - and a couple of examples.

Still, comments and suggestions on this approach would be greatly appreciated.

@codecov

codecov bot commented Jun 26, 2023

Codecov Report

Merging #899 (6c41986) into master (7e63299) will decrease coverage by 0.29%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #899      +/-   ##
==========================================
- Coverage   93.60%   93.32%   -0.29%     
==========================================
  Files          10       10              
  Lines        3643     3653      +10     
==========================================
- Hits         3410     3409       -1     
- Misses        233      244      +11     
Impacted Files         Coverage Δ
lmfit/lineshapes.py    100.00% <100.00%> (ø)
lmfit/minimizer.py     90.26% <100.00%> (-1.11%) ⬇️
lmfit/model.py         91.11% <100.00%> (+0.02%) ⬆️
lmfit/models.py        91.38% <100.00%> (ø)

Contributor

@reneeotten reneeotten left a comment


@newville I quickly read through this PR and left two comments.

I think the docstrings could be changed here and there to follow the conventions we use throughout the rest of the code for style and wording. But other than that, this looks like an improvement to me.

If you're eager to get this merged I won't hold it up for technicalities beyond the two review comments. I haven't had much time recently to do things myself and cannot commit to a timeline at the moment. But as I said, small changes in documentation/docstrings can always be made later.

@@ -2493,6 +2508,9 @@ def _nan_policy(arr, nan_policy='raise', handle_inf=True):
return arr


_nan_policy = coerce_float64
Contributor


What is the purpose of this redefinition? The function's name already started with a _, which means that external code should not use or rely on it. So ideally this should not be needed?

Member Author


Well, I moved "coerce to Float64, 1-D (at least usually)" into this function, so it now seems a little less like "enforce the NaN policy" and a little more like "make sure the array-like data will be acceptable to the solver", and the old name seemed less fitting. It also looked more like a function that didn't really need to be private.

And, yes _nan_policy is not part of the public API, but I've seen (or done myself!) a few occasions of downstream code relying on functions we never made public. We could deprecate that _nan_policy, but I don't think it is really doing any harm. Would a code comment like " for historical purposes, we will keep this alias for a while" be acceptable?

lmfit/model.py (outdated review comment, resolved)
@newville
Member Author

@reneeotten I'll look at the docstrings. I think this topic needs a documentation section: what data is acceptable, what "array-like" means to us, when coercion happens, and how to control it. The issue is most pronounced for Model, but also appears in Minimizer. Will try to add.

It's not so much that I am in a hurry as the number of small-ish changes recently. And when PRs stack up they start to overlap and it's hard for me to keep 3 separate PR branches sane and "on topic".

I think that we will be ready for 1.2.2 soon (say once all open PRs are merged), maybe in a couple of weeks. I don't think we need to rush, but I also don't think we want to have PRs coming at a rate of about 1 per week and sitting around for a month.

@newville
Member Author

@reneeotten OK, I'm going to merge this and then work on merging #888

@newville newville merged commit 115113c into master Jun 30, 2023
15 checks passed
@newville newville deleted the model_coerce_farray branch July 6, 2023 01:57