
Conversation

@cdeil
Contributor

cdeil commented Feb 24, 2013

An option scale_pcov is added to scipy.optimize.curve_fit to accommodate the two common cases:

  • sigma gives relative weights; the output covariance matrix pcov should be scaled. This was the previous behaviour, so to keep backwards compatibility I chose the default scale_pcov=True.
  • sigma gives one-standard-deviation errors on ydata; pcov should not be scaled. This use case was requested e.g. here and can now be obtained by explicitly setting scale_pcov=False.

@josef-pkt @taldcroft Please comment if the description in the docstring is OK or if it can be improved.
I was also wondering whether a new option yerr or ydata_err, mutually exclusive with sigma, might be better than the new scale_pcov option.

I can add a unit test for scale_pcov=False tomorrow.
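For illustration, here is a usage sketch of the proposed keyword (a sketch only: scale_pcov is the name proposed in this PR, so it only applies to this branch, and the toy data are made up):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * x + b

x = np.linspace(0, 10, 20)
y = 2.0 * x + 1.0 + np.random.normal(scale=0.5, size=x.size)
yerr = 0.5 * np.ones_like(x)   # one-standard-deviation errors on ydata

# scale_pcov=True  (default): sigma are relative weights, pcov is rescaled
# scale_pcov=False: sigma are absolute errors, pcov is returned unscaled
popt, pcov = curve_fit(model, x, y, sigma=yerr, scale_pcov=False)
perr = np.sqrt(np.diag(pcov))
```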

Contributor Author

@josef-pkt What's this case pcov=inf? Did I put the if scale_pcov in the right place?

Member

I think you will have to put the if scale_pcov at line 536, inside the condition for valid pcov.

I think this case covers when the outer product of the Jacobian (Hessian) is not invertible, for example reduced rank / perfect multicollinearity.
I think we have pcov is None in that case.
I never looked at the case when (len(ydata) > len(p0)) doesn't hold. I don't know what leastsq returns in that case. (It doesn't make much statistical sense.)

returning inf is reasonable, but Travis could also have chosen nan or None instead.
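For concreteness, here is a hypothetical helper (the name and signature are made up, this is not the actual minpack.py code) mirroring the branch being discussed: the scale_pcov guard would sit inside the valid-pcov branch, next to the multiplication by the residual-variance estimate.

```python
import numpy as np

def _finalize_pcov(pcov, weighted_residuals, n_data, n_params, scale_pcov=True):
    """Sketch of the tail of curve_fit (hypothetical helper, not actual code)."""
    if n_data > n_params and pcov is not None:
        # residual variance estimate: chi**2 / dof of the weighted residuals
        s_sq = np.sum(weighted_residuals**2) / (n_data - n_params)
        if scale_pcov:
            pcov = pcov * s_sq
    else:
        # rank-deficient Jacobian or too few data points
        pcov = np.inf
    return pcov
```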

@taldcroft
Contributor

@cdeil, @josef-pkt - what do you think about using the parameter name relative_weights instead of scale_pcov? The intent here is to have the parameter name reflect what is obvious to a user. Something like:

    relative_weights : bool
        If True (default), interpret the input ``sigma`` parameter as
        a relative weighting factor in the least squares problem, where
        the mean of ``sigma`` is scaled to the standard deviation of ``ydata``.
        If False then the ``sigma`` parameter provides the standard deviation
        of each ``ydata`` point.

I realize I'm turning things on end a bit, and possibly even making a mistake since I haven't studied the code thoroughly. But the point is that as an end user I don't know (or care) about the internal operation of rescaling the output covariance. From the user perspective what is conceptually happening is changing the interpretation of the sigma.

With this suggested change I would also drop all the explicit calculational details in the documentation. In any case it wouldn't be obvious to me what the pcov rescaling is doing and whether the output is what I would expect (from the astronomer perspective).

@taldcroft
Contributor

As I think about it more, I almost prefer the initial suggestion of using a mutually exclusive kwarg like ydata_err. It's not a strong preference, but this option would tend to make the code cleaner, at least for those who would otherwise need relative_weights=True in every call to curve_fit. This way you just have one or the other in all your code.

@josef-pkt
Member

one problem with using the term weights prominently is that it will be confusing whether it enters as a factor or an inverse factor in the weighting.

in the function, weights = 1/sigma,
while in statsmodels we use weights as a multiplicative factor, not a divisor.
I think this terminology is common, but I'm not sure how exclusively it is used.

@cdeil
Contributor Author

cdeil commented Feb 25, 2013

I think my slightly preferred solution would also be to use ydata_err instead of scale_pcov or relative_weights.
And for the sigma case, I'd prefer describing (and maybe even implementing) it as Tom suggested: the sigmas are scaled like ydata_err = sigma / ydata.std() and then in any case a chi**2 fit is done with chi = (f(xdata, *popt) - ydata) / ydata_err.
@josef-pkt and others: what do you think? would that also make sense e.g. to economists / statisticians?

@josef-pkt
Member

why do you scale by the standard deviation of the ydata? ydata_err = sigma / ydata.std()
This looks to me like another, third definition of sigma.
(If pcov is scaled by the error std estimate as now, then it doesn't make a difference to pcov.)

Can either of you write a test case, ydata, xdata, expected results for params and sqrt(diag(pcov))?

My impression is that there are two mutually exclusive user groups, and each won't understand the interpretation used by the other. If this gets too complicated, then maybe adding a second function would be better.

@cdeil
Contributor Author

cdeil commented Feb 25, 2013

@josef-pkt We certainly don't want to introduce a third definition of sigma. We want to understand the scale factor you apply to pcov (do you have a reference?). There should be an equivalent scale factor for sigma that would result in the same pcov, and our guess is that it is 1 / ydata.std() or similar. I'll write a test case as an IPython notebook now.

@josef-pkt
Member

we don't know the correct scale before estimation; that's why we multiply pcov by an estimate of the error variance.

What do you expect in the case that sigma is not given?
Without scaling, the assumption is that the error variance is one. With scaling of pcov, the assumption is that
sigma^2 = sum((ydata - ypredicted)**2) / (nobs - n_params)
is a good estimate of the error variance.
The objective function in this case is sum((ydata - f(xdata, params))**2), not a scaled version, and it depends on the arbitrary scale of ydata.
I'm trying to find a reference.

@cdeil
Contributor Author

cdeil commented Feb 25, 2013

Here's my gist.

I'm pretty sure there is no factor we can apply to sigma before the fit that would correspond to the factor chi**2 / dof that is currently applied to pcov after the fit, but I don't have an argument proving this.

So I guess we should forget about this idea and focus on explaining well in the docstring why scaling pcov by chi**2 / dof corresponds to estimating the residual variance from the data and is thus a reasonable thing to do if the sigmas represent only relative weights of the (xdata, ydata) measurements.

@josef-pkt I think a whole separate function might be too extreme, given that the difference is only one line of code. What do you think about calling the new parameter ydata_err or ydata_error?

@taldcroft
Contributor

@josef-pkt - when sigma is not given I usually expect that it defaults to 1.0. But we are experiencing a language issue (coming from different disciplines) because I tend to think of sigma as being a multiplier in front of a standard deviation, i.e. sigma=3 would be a 3-sigma error. So what I really mean is that if ydata_err were not supplied then it should default to 1.0, which is consistent with most fitting applications I've used.

About a reference, I'm not sure if it's online, but Numerical Recipes in the section describing mrqmin talks in detail about Lev-Mar minimization and the covariance matrix, and provides a proof of the relation between the covariance matrix and the parameter uncertainties.

@taldcroft
Contributor

@cdeil - could you add to your gist an example where you fit a constant to some data to estimate the mean? The advantage is that the "correct" answers for mean and uncertainty on mean are trivial to compute. For the "ydata_err" version, the uncertainty estimate should be independent of the variance of the input data since this corresponds to the case where we are forcing our knowledge of data uncertainties and not relying on the actual data variance.

This would be a good test case as well, again because the correct answers are obvious by inspection.
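A sketch of that test case (the numbers here are toy values chosen for illustration, not taken from the gist), where the correct answers are obvious by inspection:

```python
import numpy as np
from scipy.optimize import curve_fit

np.random.seed(0)
n = 100
sigma_true = 2.0                           # known per-point error
y = 5.0 + sigma_true * np.random.randn(n)
x = np.arange(n, dtype=float)

def constant(x, c):
    return c + 0.0 * x

popt, pcov = curve_fit(constant, x, y, sigma=sigma_true * np.ones(n))

# Expected answers:
#   best-fit constant              -> y.mean()
#   "absolute sigma" uncertainty   -> sigma_true / np.sqrt(n)      (= 0.2)
#   current (rescaled) behaviour   -> y.std(ddof=1) / np.sqrt(n)   (data-driven)
print(popt[0], np.sqrt(pcov[0, 0]))
```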

@taldcroft
Contributor

I'm not immediately opposed to a new function just because it is only different by one line. But the question to ask is whether having a new function clarifies overall usage for people who want to fit, or confuses the issue. I suspect it will confuse the issue because you now see two fit functions and have to decide which one. I think the new "ydata_err" parameter would be my choice for the simplest way to make the distinction here.

@taldcroft
Contributor

BTW, I was assuming that a new function would mostly just consist of calling the existing curve_fit.

@josef-pkt
Member

On Mon, Feb 25, 2013 at 9:26 AM, Tom Aldcroft notifications@github.com wrote:

    @josef-pkt https://github.com/josef-pkt - when sigma is not given I usually
    expect that it defaults to 1.0. But we are experiencing a language issue
    (coming from different disciplines) because I tend to think of sigma as being
    a multiplier in front of a standard deviation, i.e. sigma=3 would be a 3-sigma
    error. So what I really mean is that if ydata_err were not supplied then it
    should default to 1.0, which is consistent with most fitting applications I've
    used.

sigma as a multiplier: multiplying with what? My impression is that's what we are
doing: sigma * estimated error standard deviation.
But in all references I've seen, sigma_i is usually the standard deviation of the
error of the i'th observation, which we have to estimate.

    About a reference, I'm not sure if it's online, but Numerical Recipes in the
    section describing mrqmin talks in detail about Lev-Mar minimization and the
    covariance matrix, and provides a proof of the relation between the covariance
    matrix and the parameter uncertainties.

NR is not on my usual reading list, I can look for it.

@cdeil
Contributor Author

cdeil commented Feb 25, 2013

I have updated the gist with a cell at the end that shows that multiplying pcov by the reduced chi square is equivalent to multiplying the input sigma by the square root of that number, which in turn corresponds to forcing a reduced chi square of one, i.e. to estimating the absolute scale of the sigmas such that an average goodness of fit is obtained.

:-)

I don't know how to explain it better, sorry; hopefully Josef will find a reference that does a better job.

Given that curve_fit is used by many scientists / engineers who don't know much about fitting, I think explaining this well in the docs is important.

@taldcroft I will make a more extensive notebook that also contains the basic example you have in your notebook, but for now I just wanted to understand for myself what is going on ...
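For reference, a small numeric check of the equivalence described above (a sketch with arbitrary toy data; under the current behaviour the second fit's reduced chi square is 1 and its returned pcov matches the first fit's):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * x + b

np.random.seed(1)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 1.0 + np.random.normal(scale=2.0, size=x.size)
sigma = np.ones_like(x)                    # deliberately "wrong" absolute errors

popt, pcov = curve_fit(model, x, y, sigma=sigma)
resid = (model(x, *popt) - y) / sigma
chi2_red = np.sum(resid**2) / (len(x) - len(popt))

# Rescale sigma by sqrt(chi2_red): the new reduced chi square is 1 by
# construction, and the returned pcov is unchanged, i.e. the chi2/dof
# rescaling of pcov amounts to estimating the absolute scale of sigma.
popt2, pcov2 = curve_fit(model, x, y, sigma=sigma * np.sqrt(chi2_red))
resid2 = (model(x, *popt2) - y) / (sigma * np.sqrt(chi2_red))
chi2_red2 = np.sum(resid2**2) / (len(x) - len(popt2))
print(np.isclose(chi2_red2, 1.0), np.allclose(pcov, pcov2))
```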

@josef-pkt
Member

good gist, I think it explains the difference pretty well.

Note, I had never heard of "chi" before this discussion, because as far as I have seen, it (chi2 / dof) is always 1 by definition.

Overall I like the boolean solution better, but I understand that it would be easier to use if there were separate sigma/weight arguments.
However, if neither is specified, then we still expect two different behaviours: in the usual statistics convention the standard deviation of the estimated error is used, whereas you would expect it to be assumed to be 1.

@Tillsten

I would argue that curve_fit is inadequate for most scientists anyway: it is missing parameter bounds and the ability to fix parameters. Probably something like lmfit, Kapteyn or Sherpa should be included in scipy.

@josef-pkt
Member

Numerical Recipes:
I found a Numerical Recipes in C PDF (2nd ed.), and as far as I can see from quickly browsing section 15.5 on nonlinear models, it doesn't discuss at all where the sigma_i are supposed to come from. (It looks a bit more like optimize.leastsq than optimize.curve_fit to me.)

@josef-pkt
Member

looking at SAS, I didn't see a weights option, but they allow a user-given residual variance:

 SIGSQ=value

    specifies a value to use as the estimate of the residual variance in lieu of the estimated mean-squared error. This value is used in computing the standard errors of the estimates. Fixing the value of the residual variance can be useful, for example, in maximum likelihood estimation. 

in your case it would be 1.

Related:
in statsmodels we sometimes have to fix the "scale", which is the factor that multiplies the raw covariance matrix. In some models we internally have scale=1.

So one option would also be to add a scale or err_scale option, with None as default (estimate like now) and users can set it to 1 to fix the scale.

@josef-pkt
Member

another package
http://www.originlab.com/www/helponline/Origin/en/UserGuide/The_Fit_Results.html#Parameter_Standard_Errors
halfway in this section they have

    You can choose whether to exclude s2 when calculating the covariance matrix. This will affect the
    Standard Error values. When excluding s2, clear the Use reduce Chi-Sqr check box on the
    Advanced page.

@pv
Member

pv commented Feb 28, 2013

I'd prefer to have a separate keyword argument, say, y_err, for specifying the known absolute errors in the y-data. In physical sciences, you often know or have some thumb rule for these.

The documentation should also explain that the covariance matrix is either based on standard errors estimated from the data, or those supplied by the user. In the estimation case, it needs to also discuss how the relative weights affect the estimate. It should also explain whether the relative weights are 1/sigma_i, sigma_i, 1/sigma_i^2.

I wouldn't like boolean flags and adding discussion about "scaling" the covariance matrix --- the covariance matrix is defined as <dx_i dx_j>, I don't think there's room for "scaling" it, just different ways to estimate it.

So for instance weights (relative weights), y_err (absolute errors). These should probably be made mutually exclusive --- raise an error if both are given. If the user wants the SAS-like functionality where they know the residual variance and relative weights, they could pass in y_err=known_abs_error/weights.
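A hypothetical sketch of the mutual-exclusion check for this proposal (parameter names are taken from the comment above; this is not an actual scipy API):

```python
def _check_sigma_args(weights=None, y_err=None):
    """Hypothetical validation for mutually exclusive relative/absolute errors."""
    if weights is not None and y_err is not None:
        raise ValueError("give either relative `weights` or absolute `y_err`, not both")
    # SAS-like case: a known residual variance plus relative weights can be
    # folded into absolute errors by the caller: y_err = known_abs_error / weights
    return weights, y_err
```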

@Tillsten: bounds and constraints (eg. using L-BFGS-B) and fixed parameter masks could probably be added to curve_fit without too many problems.

@Tillsten

@pv I am not sure; for the typical data-fitting problems I had much better performance with Levenberg-Marquardt than with any of the other solvers. I also think there should be a scipy.fitting package. The optimize package tends to irritate people who just want to fit a function. I also think it is enough to include one of the existing packages, e.g.
Kapteyn (http://www.astro.rug.nl/software/kapteyn/license.html) or lmfit. Both are BSD-licensed and already do all those things. (Maybe I am biased towards lmfit as a contributor.)

@josef-pkt
Member

http://www.astro.rug.nl/software/kapteyn/kmpfit.html

abswei – True if weights are absolute. For absolute weights the unscaled covariance matrix elements 
are used in the calculations. For unit weighting (i.e. unweighted) and relative weighting, the 
covariance matrix elements are scaled with the value of the reduced chi squared.

@taldcroft
Contributor

I agree strongly with each of the first four points that @pv made concerning the immediate issue of what to do with the sigma and covariance in curve_fit. I think this concisely captures where @cdeil should go with this PR.

@josef-pkt
Member

I don't see a way to implement two different sigma arguments that almost mean the same thing, are mutually exclusive, and should have a "statistics" default, and then to explain this easily in the docstring.
So far, all the other packages I have seen use a bool.

In analogy to the linear model and the model without weights/sigma, we always (?) have a scale factor. This function is minimizing the sum of squares ("Nonlinear Least Squares"), not "chi" or "chisquare" as in astronomy packages.
(But obviously, I'm the only one with a different background here.)

@pv
Member

pv commented Mar 1, 2013

@Tillsten: if there were additional tools on this page below the "Fitting" subtitle, would people not find them? The point is that having too many namespaces also complicates things. Anyway, I won't object too strongly. We could just as well rename scipy.odr to scipy.fitting and add whatever additional stuff we want there.

@Tillsten

Tillsten commented Mar 1, 2013

@pv Maybe a submodule in optimize would be enough? As someone who tries to spread Python, the typical questions I see about curve_fit are:

  • 2-d data don't work,
  • how to fix a parameter,
  • how to set bounds,
  • what's the error estimate?
  • resulting statistics

Because installing custom packages on Windows as a non-admin is quite an adventure (thank god for WinPython), I would be very happy to see this basic functionality in scipy. Sorry btw for going a little bit off-topic.

@rgommers
Member

rgommers commented Mar 2, 2013

I agree with @Tillsten's last comment that we need this functionality (but preferably not in a new submodule). We discussed before including lmfit at some point. I haven't used any of Sherpa, Kapteyn and Minuit though, and lmfit not very extensively, so no opinion on their relative merits.

@astrofrog
Contributor

As I indicated on the scipy-user mailing list, I would really like to see this pull request implemented since in most cases what I care about is the true uncertainty on the parameters, taking into account the errors on the data.

I also think that the docstring of curve_fit should be expanded to show an example using scale_pcov, demonstrating the difference between the two settings.

@sciunto
Contributor

sciunto commented Jun 16, 2013

I totally share @Tillsten's point of view from his experimental-physicist experience. That's why I use lmfit for this purpose (and it is still longer than my favorite gnuplot approach).

@keflavich

👍 to treating sigma as the errors on the Y data and returning the appropriate covariance matrix such that the diagonals are (in the absence of correlation) the variance on the fitted parameters. I had assumed that's what curve_fit did until @astrofrog pointed out my misinterpretation!

@josef-pkt
Member

How about creating a second function astro_curve_fit or chisquare_fit, then neither group needs to read the docstring?

@astrofrog
Contributor

But is it really specific to astronomy not to rescale the covariance matrix to chi^2/dof = 1? The curve_fit function has an option sigma, and for most scientists sigma means the standard deviation of a Gaussian, so I don't see how interpreting it as relative errors makes sense.

@josef-pkt
Member

same arguments as before:

every (general) statistics package that I know does by default the same as the current curve_fit.
None of the questions about how to use curve_fit on Stack Overflow mention being surprised by the standard errors.

Since I figured out what this story is about, I have seen it mentioned in several packages and in tutorials oriented towards some physical (?) sciences. But I have no idea how widespread the usage is outside of astronomy and related fields.

@astrofrog
Contributor

@josef-pkt - I think that just because no one is surprised doesn't make it right though ;) I used it for a couple of years before realizing this myself... Anyway, if that is what other packages are doing, then that does add to the argument for keeping the current default, but do other packages also call the errors sigma in this case? I just find that confusing...

@keflavich

At least in R, the standard statistics package, they call it weights when used in this context: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/lm.html

@josef-pkt
Member

Note, I'm not too happy about sigma now either. But when curve_fit was added, I had no idea that someone might not want to scale by the estimated error variance.

weights = 1./sigma**2 in R and in weighted least squares in statsmodels. Since it's the inverse of the weights that is used in curve_fit, I thought sigma was not ambiguous.

There were some suggestions higher up in this PR, especially Pauli's, for how to fix this without changing the default (without repeating the entire discussion).

adding a chi2_fit might really reduce cognitive dissonance, even if almost all the code is the same.

@astrofrog
Contributor

@josef-pkt - thanks for the clarifications. I personally feel that the approach in this PR is fine, i.e. have an optional scale_pcov option. My philosophy is: if the user misuses something because they didn't read the docstring, that's their fault; if the docstring is missing information, it's our fault. So just adding scale_pcov with a clear docstring, and leaving the default behavior as it is now, is fine.

@josef-pkt
Member

The only two things that I really care about are that the default stays with scaling and that the term weights is not used for the inverse weights or sigma.

Maybe a vote on the different solutions, and someone needs to update the pull request.

Then Ralf or Pauli can check and merge.

@taldcroft
Contributor

At this point the extra keyword option seems reasonable. I would vote for a flag name more like absolute_sigma, which gives a more immediate connection to the sigma input. I think if someone were reading source code without seeing the function docstring they would have a better chance of guessing what's going on.

The name scale_pcov doesn't say too much (at least to me), and also from the perspective of the physical scientist there would never be a reason to scale the covariance matrix in this way (because sigma should always be absolute for us) so the parameter name is just confusing.

@josef-pkt
Member

deadline for next release is approaching

@pv
Member

pv commented Nov 28, 2013

Perhaps it's time to get this moving. How about this: https://github.com/pv/scipy-work/compare/curve-fit-yerr
It's mainly just the same as in this PR, except renamed kwarg and changed documentation.
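For later readers: the renamed keyword was merged as absolute_sigma (available from scipy 0.14), used roughly like this (the toy data here are made up for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * np.exp(-b * x)

np.random.seed(2)
x = np.linspace(0, 4, 50)
y = model(x, 2.5, 1.3) + 0.2 * np.random.randn(x.size)
yerr = 0.2 * np.ones_like(x)        # known one-standard-deviation errors

# absolute_sigma=True: sigma is taken as absolute errors on ydata and pcov
# is not rescaled by the reduced chi square.
popt, pcov = curve_fit(model, x, y, sigma=yerr, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))
```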

@josef-pkt
Member

@pv looks good to me

what happens if absolute_sigma is True and pcov is None (wasn't returned by leastsq)?
line 558

+        else:
+            pcov = inf

Should the behavior match the absolute_sigma=False case?
(even though I'm not sure why we return inf in this case)

@astrofrog
Contributor

@pv - looks good to me too!

@pv
Member

pv commented Nov 29, 2013

Continued in gh-3098

@josef-pkt: I changed the returned pcov to be a 2D matrix full of inf rather than an inf scalar in the indeterminate-variance case. Maybe it's better like this?

pv closed this Nov 29, 2013

@pv
Member

pv commented Dec 1, 2013

Merged in 511ea8f, thanks everyone for contributing to the discussion.
