add checks on input data type for Model #723
Conversation
force-pushed from 764d9e1 to 969a5ad
Codecov Report
@@            Coverage Diff             @@
##           master     #723      +/-   ##
==========================================
+ Coverage   93.64%   93.71%   +0.07%
==========================================
  Files          10       10
  Lines        3428     3436       +8
==========================================
+ Hits         3210     3220      +10
+ Misses        218      216       -2
Continue to review full report at Codecov.
@peendebak Again, thanks for trying to work on this, but it seems like I'm not getting my main point across - I've tried to make it a couple of times now (well, in #722), but I'll try again. In this PR you say:
It might seem pedantic or like I'm splitting hairs here, but this is not exactly true. There is no such expectation on input data. Maybe this needs a FAQ entry.

The parameter values in lmfit are double-precision floats. Like, not "expected to be" but a firm "are".

For models, there is the data ("y", say), which must be numeric data. This is expected to be "1-D float64", but really more like expected to be coercible to "1-D float64".

The user typically also has independent data that is used (somehow) in the calculation of the model. Lmfit places no restrictions on what the independent data is - really, none. It does not need to be numeric data - it could be a dict or (odd, but allowed) text or a database connection. It could be numeric data of type uint32 or complex128.

The model function takes the independent data and the parameter values and calculates and returns some array that is "the model". The difference of that array with the "y" data (possibly scaled by a weighting factor) is used in the actual fit -- and that is what needs to be "1-D float64".

Importantly, the data ("y" data) that you used is
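To illustrate the point above (a schematic sketch, not lmfit's actual API: the model function and all names here are invented for illustration), the "y" data only needs to be coercible to 1-D float64, while the independent data can be any object the model function understands:

```python
import numpy as np

# "y" data only needs to be *coercible* to 1-D float64:
y = np.asarray(np.array([1, 2, 3], dtype=np.uint32), dtype=np.float64)

# Independent data can be anything the model function understands --
# here a dict mixing a numeric array with a plain string:
indep = {"t": np.linspace(0.0, 1.0, 5), "label": "run-1"}

def model(indep, amp, decay):
    # Only the *returned* array takes part in the fit residual.
    return amp * np.exp(-indep["t"] / decay)

out = model(indep, 2.0, 0.5)
print(out.dtype)  # float64
```

The residual formed from `out` and `y` is what ultimately has to be 1-D float64; nothing upstream of it is constrained.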
If the range or scale of the independent data were larger, or there were more data points, the fit may have worked. Anyway, the point is that we really want to not dictate how the user does their calculation, but we do want to make sure that the fit will try to work in most cases.

I'm +1 on setting a more reasonable value for

I'm less enthusiastic about adding a new UserWarning subclass that we use once or twice. That seems somehow like a race to see how many corner cases users can come up with.
I agree with what Matt said in the two PRs.
If possible I would prefer sticking to the "default" values set by SciPy, unless the user explicitly changes them (i.e., the defaults seem to work well for them, so why would we change that?). And if we do coerce data and/or function results to
Without looking at the code in detail I would think indeed that only doing this should work... Perhaps the reason it didn't seem to do so for @peendebak is that this should be done as well in the
I agree and don't really like adding a new subclass for this.
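The coercion being discussed could be as small as a one-line helper (a sketch; `coerce_to_float64` is an illustrative name, not existing lmfit code):

```python
import numpy as np

def coerce_to_float64(data):
    """Illustrative helper: coerce fit data or residuals to 1-D float64."""
    return np.asarray(data, dtype=np.float64).ravel()

print(coerce_to_float64(np.ones(4, dtype=np.float32)).dtype)  # float64
```

`np.asarray` is a no-op when the input is already a float64 array, so applying this unconditionally costs nothing in the common case.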
@newville @reneeotten Let me try to summarize what I understand so far:
If the independent data is in
outputs in
With all this combined I see no options to improve the situation in
We do have non-default values for the tolerances (stopping criteria). The
I think that doing that coercion is probably a wise thing to do. If the

It isn't sufficient for this example because changes in the parameters at the 1.e-8 level combined with a Float32
It will help with other cases of reduced precision.
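The point about 1.e-8-level parameter changes being lost in Float32 is easy to verify directly (a numpy sketch):

```python
import numpy as np

# Relative spacing of representable values:
print(np.finfo(np.float32).eps)   # ~1.19e-07
print(np.finfo(np.float64).eps)   # ~2.22e-16

# A 1.e-8 relative step -- roughly the scale of the parameter changes
# mentioned above -- vanishes entirely in float32 arithmetic:
assert np.float32(1.0) + np.float32(1e-8) == np.float32(1.0)
# ...but is perfectly visible in float64:
assert np.float64(1.0) + 1e-8 != np.float64(1.0)
```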
Yes, what you do in your model function is up to you. Promoting to float64 is a fine idea - definitely the best idea. And if you had been using one of the built-in models -- most of which do assume a 1D array for
Better? Yes. Likely to happen quickly? Hm, maybe not. And I would be comfortable increasing
Really, I see the problem as not using high-precision inputs. The fitting algorithms use and assume Float64 internally. Python is liberal in its data types and numpy supports many variations. This mismatch can cause problems.
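NumPy's promotion rules make this mismatch easy to miss: mixed-precision arithmetic silently promotes to float64 even though the precision was already lost (a sketch; the arrays here are illustrative):

```python
import numpy as np

y32 = np.linspace(0.0, 1.0, 7, dtype=np.float32)   # single-precision "data"
model = np.linspace(0.0, 1.0, 7)                    # float64 "model" output

resid = y32 - model
# The subtraction promotes to float64, so nothing fails loudly...
print(resid.dtype)   # float64
# ...but the residual still carries float32-level rounding noise from y32.
print(np.abs(resid).max())
```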
Some things that might be considered:
I would not object to any of these.
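One direction compatible with the above would be to warn, rather than fail, on reduced-precision input, and coerce. A rough sketch: the class name `DataPrecisionWarning` is taken from this PR's description, while `check_input_dtype` and everything else is illustrative, not actual lmfit code:

```python
import warnings
import numpy as np

class DataPrecisionWarning(UserWarning):
    """Warning category users can filter out (name from this PR)."""

def check_input_dtype(data):
    """Illustrative check: warn when data is float16/float32, then coerce."""
    arr = np.asarray(data)
    if arr.dtype in (np.float16, np.float32):
        warnings.warn(
            f"input data has dtype {arr.dtype}; the fit assumes float64",
            DataPrecisionWarning,
            stacklevel=2,
        )
    return np.asarray(arr, dtype=np.float64)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    out = check_input_dtype(np.ones(3, dtype=np.float32))
print(out.dtype, len(caught))  # float64 1
```

A user who prefers silence could then call `warnings.simplefilter("ignore", DataPrecisionWarning)` once, which is the usual benefit of giving the warning its own category.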
Some more details on how the different options mentioned above work out for the minimal example. There are three cases I considered:

i) Data float64, independent data float64

With default values the first case and third case work out fine; the second one fails for the default settings.

Option 1. This converts case iii) to case ii), so makes iii) fail. I would therefore be hesitant to do option 1) without some of the other options.

@newville Thanks for the assistance. Since none of the options match the current PR I will close it.
@peendebak OK. I read your closure of this PR as meaning that you do not intend to implement any of the things I suggested to consider.

FWIW, I think that increasing the default value of

Anyway, thanks - I think the conversation was helpful.
Description

lmfit expects input data to be in double-precision format. However, users can be unaware of this, or pre-processing methods could accidentally cast to single-precision. This PR adds warnings to all model input data that is of type float32.

Notes:
- The warning is a DataPrecisionWarning so users can filter out the warning if they choose to. No error is generated, so that the change is backwards compatible with current behaviour.
- The check covers only float32, which seems to be the most common type of mistake. Integer types are converted automatically to float type. Adding np.float16 to the check would be an option.

Type of Changes
Tested on
Verification
Have you