
Possible wrong polyfit documentation #16842

Closed
davidedalbosco opened this issue Jul 13, 2020 · 4 comments · Fixed by #16878

Comments

@davidedalbosco
Contributor

Hi! I think the documentation of numpy polyfit could be misleading/wrong where it describes the scaling factor for the covariance matrix.

With the option cov=True, the covariance matrix is scaled so that the reduced chi-squared of the fit equals one.
I would therefore expect the scaling factor to be chi2/(M-N) = chi2/dof, where M is the number of data points and N is the number of fit parameters.

However, the documentation reports that the scaling factor is chi2/sqrt(N-dof), where I guess N is the number of data points.

I ran a test, and the scaling factor I found is consistent with chi2/dof rather than the expression reported in the documentation.

Reproducing code example:

# Generate synthetically the data
# True parameters
import numpy as np

true_slope = 3
true_intercept = 7

x_data = np.linspace(-5, 5, 30)

# The y-data will have a noise term, to simulate imperfect observations
sigma = 1
y_data = true_slope * x_data + true_intercept
y_obs = y_data + np.random.normal(loc=0.0, scale=sigma, size=x_data.size)

# Here I generate artificially some unequal uncertainties 
# (even if there is no reason for them to be so)
y_uncertainties = sigma * np.random.normal(loc=1.0, scale=0.5*sigma, size=x_data.size)

# Make the fit
popt, pcov = np.polyfit(x_data, y_obs, 1, w=1/y_uncertainties, cov='unscaled')
popt, pcov_scaled = np.polyfit(x_data, y_obs, 1, w=1/y_uncertainties, cov=True)

# Reduced chi-squared of the fit: chi2 / (M - N), with N = 2 fitted parameters
my_scale_factor = np.sum((y_obs - popt[0] * x_data - popt[1])**2 / y_uncertainties**2) \
                  / (len(y_obs) - 2)

# Ratio between the scaled and unscaled covariance entries
scale_factor = pcov_scaled[0, 0] / pcov[0, 0]

As you can see, the actual scaling factor equals "my_scale_factor", i.e. chi2/dof.
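To make the comparison reproducible end to end, here is a self-contained sketch of the same check (seeded RNG for determinism; variable names such as ratio are mine, not part of the original report), confirming that the ratio between the scaled and unscaled covariances equals chi2/dof:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-5, 5, 30)
sigma = np.full(x.size, 1.0)                 # per-point uncertainties
y = 3 * x + 7 + rng.normal(0.0, sigma)       # noisy linear data

# The same fit twice: once unscaled, once with the default scaling
p, cov_unscaled = np.polyfit(x, y, 1, w=1 / sigma, cov='unscaled')
p, cov_scaled = np.polyfit(x, y, 1, w=1 / sigma, cov=True)

# chi2 computed with the same weights handed to polyfit, and dof = M - N
chi2 = np.sum(((y - np.polyval(p, x)) / sigma) ** 2)
dof = x.size - 2

ratio = cov_scaled[0, 0] / cov_unscaled[0, 0]
print(np.isclose(ratio, chi2 / dof))         # True: the scaling is chi2/dof
```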

Numpy/Python version information:

1.18.5 3.7.7 (default, May 6 2020, 11:45:54) [MSC v.1916 64 bit (AMD64)]

@melissawm
Member

Hello @davidedalbosco, thanks for bringing this up. It looks like that comment was added in this PR, and from what I understand your observation is correct. Note that in the code the scaling factor is fac = resids / (len(x) - order), where order = deg + 1. Unless I'm missing something, I'm not sure what was meant by that comment. Maybe @mhvk could help here?
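This reading of the code can be checked numerically. A minimal sketch (arbitrary made-up data, using a cubic fit so that order = deg + 1 = 4 is clearly distinct from the linear case in the original report) comparing the scaled covariance to the unscaled one times fac:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 25)
y = rng.normal(size=x.size)

deg = 3
order = deg + 1                                  # number of fitted coefficients

p, cov_scaled = np.polyfit(x, y, deg, cov=True)
p, cov_unscaled = np.polyfit(x, y, deg, cov='unscaled')

resids = np.sum((y - np.polyval(p, x)) ** 2)     # unweighted residual sum
fac = resids / (len(x) - order)                  # the factor quoted above

print(np.allclose(cov_scaled, cov_unscaled * fac))   # True
```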

@han-kwang

I'm coming here from the Stack Overflow question. I can confirm that the present implementation is reasonable and that only the docstring needs to be updated. I don't know enough about Bayesian statistics.

@mhvk
Contributor

mhvk commented Jul 13, 2020

Indeed, the documentation was not updated quite right. Sorry! It would be wonderful if someone could make a quick PR...

@davidedalbosco
Contributor Author

Indeed, the documentation was not updated quite right. Sorry! It would be wonderful if someone could make a quick PR...

I am new to git, but I have opened a pull request; you can find it at #16878.

Hopefully I didn't do anything wrong.
