
Possible wrong polyfit documentation #16842

Closed
davidedalbosco opened this issue Jul 13, 2020 · 4 comments · Fixed by #16878

Comments

@davidedalbosco
Contributor

Hi! I think the documentation of numpy polyfit could be misleading/wrong where it describes the scaling factor for the covariance matrix.

With the option cov=True, the covariance matrix is scaled so that the reduced chi-squared of the fit equals one.
I would therefore expect the scaling factor to be chi2/(M-N) = chi2/dof, where M is the number of data points and N is the number of fit parameters.

However, the documentation reports that the scaling factor is chi2/sqrt(N-dof), where I guess N is the number of data points.

I ran a test, and the scaling factor I found is consistent with chi2/dof rather than the expression reported in the documentation.

Reproducing code example:

# Generate synthetically the data
# True parameters
import numpy as np

true_slope = 3
true_intercept = 7

x_data = np.linspace(-5, 5, 30)

# The y-data will have a noise term, to simulate imperfect observations
sigma = 1
y_data = true_slope * x_data + true_intercept
y_obs = y_data + np.random.normal(loc=0.0, scale=sigma, size=x_data.size)

# Here I generate artificially some unequal uncertainties 
# (even if there is no reason for them to be so)
y_uncertainties = sigma * np.random.normal(loc=1.0, scale=0.5*sigma, size=x_data.size)

# Make the fit
popt, pcov = np.polyfit(x_data, y_obs, 1, w=1/y_uncertainties, cov='unscaled')
popt, pcov_scaled = np.polyfit(x_data, y_obs, 1, w=1/y_uncertainties, cov=True)

# Reduced chi-squared of the fit: chi2 / (M - N), with N = 2 fitted parameters
my_scale_factor = np.sum((y_obs - popt[0] * x_data - popt[1])**2 / y_uncertainties**2) \
                  / (len(y_obs) - 2)

# Ratio between the scaled and unscaled covariance entries
scale_factor = pcov_scaled[0, 0] / pcov[0, 0]

As you can see, the actual scaling factor equals "my_scale_factor", i.e. chi2/dof.
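To make the comparison reproducible end to end, here is a self-contained sketch of the same check (seeded RNG for determinism; variable names such as ratio are mine, not part of the original report), confirming that the ratio between the scaled and unscaled covariances equals chi2/dof:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 30)
sigma = np.full(x.size, 1.0)                 # per-point uncertainties
y = 3 * x + 7 + rng.normal(0.0, sigma)       # noisy linear data

# The same fit twice: once unscaled, once with the default scaling
p, cov_unscaled = np.polyfit(x, y, 1, w=1 / sigma, cov='unscaled')
p, cov_scaled = np.polyfit(x, y, 1, w=1 / sigma, cov=True)

# chi2 computed with the same weights handed to polyfit, and dof = M - N
chi2 = np.sum(((y - np.polyval(p, x)) / sigma) ** 2)
dof = x.size - 2

ratio = cov_scaled[0, 0] / cov_unscaled[0, 0]
print(np.isclose(ratio, chi2 / dof))         # True: the scaling is chi2/dof
```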

Numpy/Python version information:

1.18.5 3.7.7 (default, May 6 2020, 11:45:54) [MSC v.1916 64 bit (AMD64)]

@melissawm
Member

Hello @davidedalbosco, thanks for bringing this up. It looks like that comment was added in this PR, and from what I understand your observation is correct. Note that in the code the scaling factor is fac = resids / (len(x) - order), where order = deg + 1. Unless I'm missing something, I'm not sure what was meant by that comment. Maybe @mhvk could help here?
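This reading of the code can be checked numerically. A minimal sketch (arbitrary made-up data, using a cubic fit so that order = deg + 1 = 4 is clearly distinct from the linear case in the original report) comparing the scaled covariance to the unscaled one times fac:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 25)
y = rng.normal(size=x.size)

deg = 3
order = deg + 1                                  # number of fitted coefficients

p, cov_scaled = np.polyfit(x, y, deg, cov=True)
p, cov_unscaled = np.polyfit(x, y, deg, cov='unscaled')

resids = np.sum((y - np.polyval(p, x)) ** 2)     # unweighted residual sum
fac = resids / (len(x) - order)                  # the factor quoted above

print(np.allclose(cov_scaled, cov_unscaled * fac))   # True
```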

@han-kwang

I'm coming here from the Stack Overflow question. I can confirm that the present implementation is reasonable and that only the docstring needs to be updated. I don't know enough about Bayesian statistics.

@mhvk
Contributor

mhvk commented Jul 13, 2020

Indeed, the documentation was not updated quite right. Sorry! It would be wonderful if someone could make a quick PR...

@davidedalbosco
Contributor Author

Indeed, the documentation was not updated quite right. Sorry! It would be wonderful if someone could make a quick PR...

I am new to git, but I have opened a pull request; you can find it at #16878.

Hopefully I didn't do anything wrong.
