
Conversation

@cdeil
Contributor

cdeil commented Feb 24, 2013

An option scale_pcov is added to scipy.optimize.curve_fit to accommodate the two common cases:

  • sigma gives relative weights; the output covariance matrix pcov should be scaled. This was the previous behaviour, so to keep backwards compatibility I chose the default scale_pcov=True.
  • sigma gives one-standard-deviation errors on ydata; pcov should not be scaled. This use case was requested e.g. here and can now be obtained by explicitly setting scale_pcov=False.

@josef-pkt @taldcroft Please comment if the description in the docstring is OK or if it can be improved.
I was also wondering whether a new option yerr or ydata_err, mutually exclusive with sigma, might be better than the new scale_pcov option.

I can add a unit test for scale_pcov=False tomorrow.
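For illustration, here is a usage sketch of the proposed keyword (a sketch only: scale_pcov is the name proposed in this PR, so it only applies to this branch, and the toy data are made up):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * x + b

x = np.linspace(0, 10, 20)
y = 2.0 * x + 1.0 + np.random.normal(scale=0.5, size=x.size)
yerr = 0.5 * np.ones_like(x)   # one-standard-deviation errors on ydata

# scale_pcov=True  (default): sigma are relative weights, pcov is rescaled
# scale_pcov=False: sigma are absolute errors, pcov is returned unscaled
popt, pcov = curve_fit(model, x, y, sigma=yerr, scale_pcov=False)
perr = np.sqrt(np.diag(pcov))
```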

Contributor Author

@josef-pkt What's this case pcov=inf? Did I put the if scale_pcov in the right place?

Member

I think you will have to put the if scale_pcov at line 536, inside the condition for valid pcov.

I think this case covers when the outer product of the Jacobian (Hessian) is not invertible, for example reduced rank / perfect multicollinearity.
I think we have pcov is None in that case.
I never looked at the case when (len(ydata) > len(p0)) doesn't hold. I don't know what leastsq returns in that case. (It doesn't make much statistical sense.)

returning inf is reasonable, but Travis could also have chosen nan or None instead.
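For concreteness, here is a hypothetical helper (the name and signature are made up, this is not the actual minpack.py code) mirroring the branch being discussed: the scale_pcov guard would sit inside the valid-pcov branch, next to the multiplication by the residual-variance estimate.

```python
import numpy as np

def _finalize_pcov(pcov, weighted_residuals, n_data, n_params, scale_pcov=True):
    """Sketch of the tail of curve_fit (hypothetical helper, not actual code)."""
    if n_data > n_params and pcov is not None:
        # residual variance estimate: chi**2 / dof of the weighted residuals
        s_sq = np.sum(weighted_residuals**2) / (n_data - n_params)
        if scale_pcov:
            pcov = pcov * s_sq
    else:
        # rank-deficient Jacobian or too few data points
        pcov = np.inf
    return pcov
```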

@taldcroft
Contributor

@cdeil, @josef-pkt - what do you think about using the parameter name relative_weights instead of scale_pcov? The intent here is to have the parameter name reflect what is obvious to a user. Something like:

    relative_weights : bool
        If True (default), interpret the input ``sigma`` parameter as
        a relative weighting factor in the least squares problem, where
        the mean of ``sigma`` is scaled to the standard deviation of ``ydata``.
        If False then the ``sigma`` parameter provides the standard deviation
        of each ``ydata`` point.

I realize I'm turning things on end a bit, and possibly even making a mistake since I haven't studied the code thoroughly. But the point is that as an end user I don't know (or care) about the internal operation of rescaling the output covariance. From the user perspective what is conceptually happening is changing the interpretation of the sigma.

With this suggested change I would also drop all the explicit calculational details in the documentation. In any case it wouldn't be obvious to me what the pcov rescaling is doing and whether the output is what I would expect (from the astronomer perspective).

@taldcroft
Contributor

As I think about it more, I almost prefer the initial suggestion of using a mutually exclusive kwarg like ydata_err. It's not a strong preference, but this option would tend to make the code cleaner, at least for those who would otherwise need relative_weights=True in every call to curve_fit. This way you just have one or the other in all your code.

@josef-pkt
Member

one problem with using the term weights prominently is that it will be confusing whether it enters as a factor or an inverse factor in the weighting.

in the function, weights = 1/sigma,
while in statsmodels we use weights as a multiplicative factor, not a divisor.
I think this terminology is common, but I'm not sure how exclusively it is used.

@cdeil
Contributor Author

cdeil commented Feb 25, 2013

I think my slightly preferred solution would also be to use ydata_err instead of scale_pcov or relative_weights.
And for the sigma case, I'd prefer describing (and maybe even implementing) it as Tom suggested: the sigmas are scaled like ydata_err = sigma / ydata.std() and then in any case a chi**2 fit is done with chi = (f(xdata, *popt) - ydata) / ydata_err.
@josef-pkt and others: what do you think? would that also make sense e.g. to economists / statisticians?

@josef-pkt
Member

why do you scale by the standard deviation of the ydata? ydata_err = sigma / ydata.std()
This looks to me like another, third definition of sigma.
(If pcov is scaled by the error std estimate as now, then it doesn't make a difference to pcov.)

Can either of you write a test case, ydata, xdata, expected results for params and sqrt(diag(pcov))?

My impression is that there are two mutually exclusive user groups, and each won't understand the interpretation used by the other. If this gets too complicated, then maybe adding a second function would be better.

@cdeil
Contributor Author

cdeil commented Feb 25, 2013

@josef-pkt We certainly don't want to introduce a third definition of sigma. We want to understand the scale factor you apply to pcov (do you have a reference?). There should be an equivalent scale factor for sigma that would result in the same pcov, and our guess is that it is 1 / ydata.std() or similar. I'll write a test case as an IPython notebook now.

@josef-pkt
Member

we don't know the correct scale before estimation; that's why we multiply pcov by an estimate of the error variance.

What do you expect in the case that sigma is not given?
Without scaling, the assumption is that the error variance is one. With scaling of pcov, the assumption is that
sigma^2 = sum((ydata - ypredicted)**2) / (nobs - n_params)
is a good estimate of the error variance.
The objective function in this case is sum((ydata - f(xdata, params))**2), not a scaled version, and it depends on the arbitrary scale of ydata.
I'm trying to find a reference.

@cdeil
Contributor Author

cdeil commented Feb 25, 2013

Here's my gist.

I'm pretty sure there is no factor we can apply to sigma before the fit that would correspond to the factor chi**2 / dof that is currently applied to pcov after the fit, but I don't have an argument proving this.

So I guess we should forget about this idea and focus on explaining well in the docstring why scaling pcov by chi**2 / dof corresponds to estimating the residual variance from the data and is thus a reasonable thing to do if the sigmas represent only relative weights of the (xdata, ydata) measurements.

@josef-pkt I think a whole separate function might be too extreme, given that the difference is only one line of code. What do you think about calling the new parameter ydata_err or ydata_error?

@taldcroft
Contributor

@josef-pkt - when sigma is not given I usually expect that it defaults to 1.0. But we are experiencing a language issue (coming from different disciplines) because I tend to think of sigma as being a multiplier in front of a standard deviation, i.e. sigma=3 would be a 3-sigma error. So what I really mean is that if ydata_err were not supplied then it should default to 1.0, which is consistent with most fitting applications I've used.

About a reference, I'm not sure if it's online, but Numerical Recipes in the section describing mrqmin talks in detail about Lev-Mar minimization and the covariance matrix, and provides a proof of the relation between the covariance matrix and the parameter uncertainties.

@taldcroft
Contributor

@cdeil - could you add to your gist an example where you fit a constant to some data to estimate the mean? The advantage is that the "correct" answers for mean and uncertainty on mean are trivial to compute. For the "ydata_err" version, the uncertainty estimate should be independent of the variance of the input data since this corresponds to the case where we are forcing our knowledge of data uncertainties and not relying on the actual data variance.

This would be a good test case as well, again because the correct answers are obvious by inspection.
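A sketch of that test case (the numbers here are toy values chosen for illustration, not taken from the gist), where the correct answers are obvious by inspection:

```python
import numpy as np
from scipy.optimize import curve_fit

np.random.seed(0)
n = 100
sigma_true = 2.0                           # known per-point error
y = 5.0 + sigma_true * np.random.randn(n)
x = np.arange(n, dtype=float)

def constant(x, c):
    return c + 0.0 * x

popt, pcov = curve_fit(constant, x, y, sigma=sigma_true * np.ones(n))

# Expected answers:
#   best-fit constant              -> y.mean()
#   "absolute sigma" uncertainty   -> sigma_true / np.sqrt(n)      (= 0.2)
#   current (rescaled) behaviour   -> y.std(ddof=1) / np.sqrt(n)   (data-driven)
print(popt[0], np.sqrt(pcov[0, 0]))
```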

@taldcroft
Contributor

I'm not immediately opposed to a new function just because it is only different by one line. But the question to ask is whether having a new function clarifies overall usage for people who want to fit, or confuses the issue. I suspect it will confuse the issue because you now see two fit functions and have to decide which one. I think the new "ydata_err" parameter would be my choice for the simplest way to make the distinction here.

@taldcroft
Contributor

BTW, I was assuming that a new function would mostly just consist of calling the existing curve_fit.

@josef-pkt
Member

On Mon, Feb 25, 2013 at 9:26 AM, Tom Aldcroft notifications@github.com wrote:

    @josef-pkt https://github.com/josef-pkt - when sigma is not given I usually
    expect that it defaults to 1.0. But we are experiencing a language issue
    (coming from different disciplines) because I tend to think of sigma as being
    a multiplier in front of a standard deviation, i.e. sigma=3 would be a 3-sigma
    error. So what I really mean is that if ydata_err were not supplied then it
    should default to 1.0, which is consistent with most fitting applications I've
    used.

sigma as a multiplier: multiplying with what? My impression is that's what we are
doing: sigma * estimated error standard deviation.
But in all references I've seen, sigma_i is usually the standard deviation of the
error of the i'th observation, which we have to estimate.

    About a reference, I'm not sure if it's online, but Numerical Recipes in the
    section describing mrqmin talks in detail about Lev-Mar minimization and the
    covariance matrix, and provides a proof of the relation between the covariance
    matrix and the parameter uncertainties.

NR is not on my usual reading list, I can look for it.

@cdeil
Contributor Author

cdeil commented Feb 25, 2013

I have updated the gist with a cell at the end that shows that multiplying pcov by the reduced chi square is equivalent to multiplying the input sigma by the square root of that number, which in turn corresponds to forcing a reduced chi square of one, i.e. to estimating the absolute scale of the sigmas such that an average goodness of fit is obtained.

:-)

I don't know how to explain it better, sorry; hopefully Josef will find a reference that does a better job.

Given that curve_fit is used by many scientists / engineers who don't know much about fitting, I think explaining this well in the docs is important.

@taldcroft I will make a more extensive notebook that also contains the basic example you have in your notebook, but for now I just wanted to understand for myself what is going on ...
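For reference, a small numeric check of the equivalence described above (a sketch with arbitrary toy data; under the current behaviour the second fit's reduced chi square is 1 and its returned pcov matches the first fit's):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * x + b

np.random.seed(1)
x = np.linspace(0, 10, 50)
y = 3.0 * x + 1.0 + np.random.normal(scale=2.0, size=x.size)
sigma = np.ones_like(x)                    # deliberately "wrong" absolute errors

popt, pcov = curve_fit(model, x, y, sigma=sigma)
resid = (model(x, *popt) - y) / sigma
chi2_red = np.sum(resid**2) / (len(x) - len(popt))

# Rescale sigma by sqrt(chi2_red): the new reduced chi square is 1 by
# construction, and the returned pcov is unchanged, i.e. the chi2/dof
# rescaling of pcov amounts to estimating the absolute scale of sigma.
popt2, pcov2 = curve_fit(model, x, y, sigma=sigma * np.sqrt(chi2_red))
resid2 = (model(x, *popt2) - y) / (sigma * np.sqrt(chi2_red))
chi2_red2 = np.sum(resid2**2) / (len(x) - len(popt2))
print(np.isclose(chi2_red2, 1.0), np.allclose(pcov, pcov2))
```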

@josef-pkt
Member

good gist, I think it explains the difference pretty well.

Note, I had never heard of "chi" before this discussion, because as far as I have seen, it (chi2 / dof) is always 1 by definition.

Overall I like the boolean solution better, but I understand that it would be easier to use if there were separate sigma/weight arguments.
However, if neither is specified, then we still expect two different behaviours: in the usual statistics convention the standard deviation of the estimated error is used, whereas you would expect it to be assumed to be 1.

@Tillsten

I would argue that curve_fit is inadequate for most scientists anyway: it is missing parameter bounds and the ability to fix parameters. Probably something like lmfit, Kapteyn or Sherpa should be included in scipy.

@josef-pkt
Member

Numerical Recipes:
I found a Numerical Recipes in C PDF (2nd ed.), and as far as I can see from quickly browsing section 15.5 on nonlinear models, it doesn't discuss at all where the sigma_i are supposed to come from. (It looks a bit more like optimize.leastsq than optimize.curve_fit to me.)

@josef-pkt
Member

looking at SAS, I didn't see a weights option, but they allow a user-given residual variance:

 SIGSQ=value

    specifies a value to use as the estimate of the residual variance in lieu of the estimated mean-squared error. This value is used in computing the standard errors of the estimates. Fixing the value of the residual variance can be useful, for example, in maximum likelihood estimation. 

in your case it would be 1.

Related:
in statsmodels we sometimes have to fix the "scale", which is the factor that multiplies the raw covariance matrix. In some models we internally have scale=1.

So one option would also be to add a scale or err_scale option, with None as default (estimate like now) and users can set it to 1 to fix the scale.

@josef-pkt
Member

another package
http://www.originlab.com/www/helponline/Origin/en/UserGuide/The_Fit_Results.html#Parameter_Standard_Errors
halfway in this section they have

    You can choose whether to exclude s2 when calculating the covariance matrix. This will affect the
    Standard Error values. When excluding s2, clear the Use reduce Chi-Sqr check box on the
    Advanced page.

@pv
Member

pv commented Feb 28, 2013

I'd prefer to have a separate keyword argument, say, y_err, for specifying the known absolute errors in the y-data. In physical sciences, you often know or have some thumb rule for these.

The documentation should also explain that the covariance matrix is either based on standard errors estimated from the data, or those supplied by the user. In the estimation case, it needs to also discuss how the relative weights affect the estimate. It should also explain whether the relative weights are 1/sigma_i, sigma_i, 1/sigma_i^2.

I wouldn't like boolean flags and adding discussion about "scaling" the covariance matrix --- the covariance matrix is defined as <dx_i dx_j>, I don't think there's room for "scaling" it, just different ways to estimate it.

So for instance weights (relative weights), y_err (absolute errors). These should probably be made mutually exclusive --- raise an error if both are given. If the user wants the SAS-like functionality where they know the residual variance and relative weights, they could pass in y_err=known_abs_error/weights.
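A hypothetical sketch of the mutual-exclusion check for this proposal (parameter names are taken from the comment above; this is not an actual scipy API):

```python
def _check_sigma_args(weights=None, y_err=None):
    """Hypothetical validation for mutually exclusive relative/absolute errors."""
    if weights is not None and y_err is not None:
        raise ValueError("give either relative `weights` or absolute `y_err`, not both")
    # SAS-like case: a known residual variance plus relative weights can be
    # folded into absolute errors by the caller: y_err = known_abs_error / weights
    return weights, y_err
```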

@Tillsten: bounds and constraints (eg. using L-BFGS-B) and fixed parameter masks could probably be added to curve_fit without too many problems.

@Tillsten

@pv I am not sure; for the typical data-fitting problems I had much better performance with Levenberg-Marquardt than with any of the other solvers. I also think there should be a scipy.fitting package. The optimize package tends to irritate people who just want to fit a function. I also think it is enough to include one of the existing packages, e.g.
Kapteyn (http://www.astro.rug.nl/software/kapteyn/license.html) or lmfit. Both are BSD-licensed and already do all those things. (Maybe I am biased towards lmfit as a contributor.)

@josef-pkt
Member

http://www.astro.rug.nl/software/kapteyn/kmpfit.html

abswei – True if weights are absolute. For absolute weights the unscaled covariance matrix elements 
are used in the calculations. For unit weighting (i.e. unweighted) and relative weighting, the 
covariance matrix elements are scaled with the value of the reduced chi squared.

@taldcroft
Contributor

I agree strongly with each of the first four points that @pv made concerning the immediate issue of what to do with the sigma and covariance in curve_fit. I think this concisely captures where @cdeil should go with this PR.

@josef-pkt
Member

I don't see a way to implement two different sigma arguments that almost mean the same thing, are mutually exclusive, and should have a "statistics" default, and then to explain this easily in the docstring.
So far, all the other packages I have seen use a bool.

In analogy to the linear model and the model without weights/sigma, we always (?) have a scale factor. This function is minimizing the sum of squares ("Nonlinear Least Squares"), not "chi" or "chisquare" as in astronomy packages.
(But obviously, I'm the only one with a different background here.)

@pv
Member

pv commented Mar 1, 2013

@Tillsten: if there were additional tools on this page below the "Fitting" subtitle, would people not find them? The point is that having too many namespaces also complicates things. Anyway, I won't object too strongly. We could just as well rename scipy.odr to scipy.fitting and add whatever additional stuff we want there.

@Tillsten

Tillsten commented Mar 1, 2013

@pv Maybe a submodule in optimize would be enough? As someone who tries to spread Python, the typical questions I see about curve_fit are:

  • 2-d data don't work,
  • how to fix a parameter,
  • how to set bounds,
  • what's the error estimate?
  • resulting statistics

Because installing custom packages on Windows as a non-admin is quite an adventure (thank god for WinPython), I would be very happy to see this basic functionality in scipy. Sorry btw for going a little bit off-topic.

@rgommers
Member

rgommers commented Mar 2, 2013

I agree with @Tillsten's last comment that we need this functionality (but preferably not in a new submodule). We discussed before including lmfit at some point. I haven't used any of Sherpa, Kapteyn and Minuit though, and lmfit not very extensively, so no opinion on their relative merits.

@astrofrog
Contributor

As I indicated on the scipy-user mailing list, I would really like to see this pull request implemented since in most cases what I care about is the true uncertainty on the parameters, taking into account the errors on the data.

I also think that the docstring of curve_fit should be expanded to show an example using scale_pcov, demonstrating the difference between the two settings.

@sciunto
Contributor

sciunto commented Jun 16, 2013

I totally share @Tillsten's point of view from his experimental-physicist experience. That's why I use lmfit for this purpose (and it is still longer than my favorite gnuplot approach).

@keflavich

👍 to treating sigma as the errors on the Y data and returning the appropriate covariance matrix such that the diagonals are (in the absence of correlation) the variance on the fitted parameters. I had assumed that's what curve_fit did until @astrofrog pointed out my misinterpretation!

@josef-pkt
Member

How about creating a second function astro_curve_fit or chisquare_fit, then neither group needs to read the docstring?

@astrofrog
Contributor

But is it really specific to astronomy not to rescale the covariance matrix to chi^2/dof = 1? The curve_fit function has an option sigma, and for most scientists sigma means the standard deviation of a Gaussian, so I don't see how interpreting it as relative errors makes sense.

@josef-pkt
Member

same arguments as before:

every (general) statistics package that I know does by default the same as the current curve_fit.
None of the questions about how to use curve_fit on Stack Overflow mention being surprised by the standard errors.

Since I figured out what this story is about, I have seen it mentioned in several packages and in tutorials oriented towards some physical (?) sciences. But I have no idea how widespread the usage is outside of astronomy and related fields.

@astrofrog
Contributor

@josef-pkt - I think that just because no one is surprised doesn't make it right though ;) I used it for a couple of years before realizing this myself... Anyway, if that is what other packages are doing, then that does add to the argument for keeping the current default, but do other packages also call the errors sigma in this case? I just find that confusing...

@keflavich

At least in R, the standard statistics package, they call it weights when used in this context: http://stat.ethz.ch/R-manual/R-patched/library/stats/html/lm.html

@josef-pkt
Member

Note, I'm not too happy about sigma now either. But when curve_fit was added, I had no idea that someone might not want to scale by the estimated error variance.

weights = 1./sigma**2 in R and in weighted least squares in statsmodels. Since it's the inverse of the weights that is used in curve_fit, I thought sigma was not ambiguous.

There were some suggestions higher up in this PR, especially Pauli's, for how to fix this without changing the default (without repeating the entire discussion).

adding a chi2_fit might really reduce cognitive dissonance, even if almost all the code is the same.

@astrofrog
Contributor

@josef-pkt - thanks for the clarifications. I personally feel that the approach in this PR is fine, i.e. have an optional scale_pcov option. My philosophy is: if the user misuses something because they didn't read the docstring, that's their fault; if the docstring is missing information, it's our fault. So just adding scale_pcov with a clear docstring, and leaving the default behavior as it is now, is fine.

@josef-pkt
Member

The only two things that I really care about are that the default stays with scaling and that the term weights is not used for the inverse weights or sigma.

Maybe a vote on the different solutions, and someone needs to update the pull request.

Then Ralf or Pauli can check and merge.

@taldcroft
Contributor

At this point the extra keyword option seems reasonable. I would vote for a flag name more like absolute_sigma, which gives a more immediate connection to the sigma input. I think if someone were reading source code without seeing the function docstring they would have a better chance of guessing what's going on.

The name scale_pcov doesn't say too much (at least to me), and also from the perspective of the physical scientist there would never be a reason to scale the covariance matrix in this way (because sigma should always be absolute for us) so the parameter name is just confusing.

@josef-pkt
Member

deadline for next release is approaching

@pv
Member

pv commented Nov 28, 2013

Perhaps it's time to get this moving. How about this: https://github.com/pv/scipy-work/compare/curve-fit-yerr
It's mainly just the same as in this PR, except renamed kwarg and changed documentation.
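For later readers: the renamed keyword was merged as absolute_sigma (available from scipy 0.14), used roughly like this (the toy data here are made up for illustration):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, a, b):
    return a * np.exp(-b * x)

np.random.seed(2)
x = np.linspace(0, 4, 50)
y = model(x, 2.5, 1.3) + 0.2 * np.random.randn(x.size)
yerr = 0.2 * np.ones_like(x)        # known one-standard-deviation errors

# absolute_sigma=True: sigma is taken as absolute errors on ydata and pcov
# is not rescaled by the reduced chi square.
popt, pcov = curve_fit(model, x, y, sigma=yerr, absolute_sigma=True)
perr = np.sqrt(np.diag(pcov))
```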

@josef-pkt
Member

@pv looks good to me

what happens if absolute_sigma is True and pcov is None (wasn't returned by leastsq)?
line 558

+        else:
+            pcov = inf

Should the behavior match the absolute_sigma=False case?
(even though I'm not sure why we return inf in this case)

@astrofrog
Contributor

@pv - looks good to me too!

@pv
Member

pv commented Nov 29, 2013

Continued in gh-3098

@josef-pkt: I changed the returned pcov to be a 2D matrix full of inf rather than an inf scalar in the indeterminate-variance case. Maybe it's better like this?

pv closed this Nov 29, 2013

@pv
Member

pv commented Dec 1, 2013

Merged in 511ea8f, thanks everyone for contributing to the discussion.
