ENH: override halfnorm fit #18760
Conversation
I haven't run this locally yet, but the math is good. What happens when floc is greater than the minimum of the data?
Co-authored-by: Matt Haberland <mhaberla@calpoly.edu>
Good point. The minimal sample value would be outside the support of the distribution, so the likelihood of observing that data point would be 0. Should we throw an error?
I think so. See scipy/scipy/stats/_continuous_distns.py, lines 7412 to 7415 in 2e16ec5.
OK, I will also add that for the halflogistic distribution.
Confirmed the math and compared the implementation against it. Confirmed the speed and accuracy improvements locally. Confirmed that the appropriate error messages are raised when floc is greater than the minimum of the data and when both scale and location are fixed. Tests look appropriate.
Reference issue
Part of #17832 and #11782
What does this implement/fix?
Adds an analytical fitting method for the half-normal distribution. This one is low-hanging fruit compared to most other distributions: the default fit method works, but it is slow compared to the simple analytical result.
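For a quick sense of the interface being overridden (a minimal sketch; `halfnorm` has no shape parameters, so `fit` returns `(loc, scale)`, and the fitted location can never exceed the smallest observation because the support is `x >= loc`):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(12345)
data = stats.halfnorm.rvs(loc=1.0, scale=2.0, size=1000, random_state=rng)

# halfnorm has no shape parameters, so fit returns (loc, scale)
loc, scale = stats.halfnorm.fit(data)

# the fitted location must lie at or below the smallest observation
print(loc <= data.min())
```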
Additional information
Math
The log-likelihood for the parameters $\mu$, $\sigma$ given data $(x_1, \dots, x_n)$ is then

$$\ell(\mu, \sigma) = \frac{n}{2}\log\frac{2}{\pi} - n\log\sigma - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i - \mu)^2.$$

As $x > 0$, this is a monotonically increasing function of $\mu$. The maximum likelihood is reached at $\mu = x_{\min}$, since for any higher $\mu$ the minimal value $x_{\min}$ would be outside the support of the distribution (similar to the halflogistic distribution in #18695).

Setting the gradient with respect to $\sigma$ to zero yields the regular variance, which depends on the location:

$$\hat\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \hat\mu)^2.$$
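The two closed-form estimates can be checked numerically with scipy's `nnlf` (negative log-likelihood): the analytical optimum should not be beaten by nearby valid parameter values. A minimal sketch, with all sample values chosen here for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = stats.halfnorm.rvs(loc=0.5, scale=1.5, size=500, random_state=rng)

# analytical MLE: location at the sample minimum, scale from the
# "regular" variance computed about that location
mu_hat = x.min()
sigma_hat = np.sqrt(np.mean((x - mu_hat)**2))

# perturbing toward a smaller loc keeps all data in the support,
# so both NLLF values are finite and the analytical one is smaller
nllf_opt = stats.halfnorm.nnlf((mu_hat, sigma_hat), x)
nllf_pert = stats.halfnorm.nnlf((mu_hat - 0.05, sigma_hat * 1.05), x)
print(nllf_opt <= nllf_pert)
```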
Fit results for free parameters
import numpy as np
from scipy import stats
from time import perf_counter
import matplotlib.pyplot as plt
import os
rng = np.random.default_rng(2433006235492611347)
qrng = stats.qmc.Halton(6, seed=rng)
N = 1000
times_pr = np.full(N, np.nan)
times_super = np.full(N, np.nan)
nllfs_super = np.full(N, np.nan)
nllfs_pr = np.full(N, np.nan)
distribution = stats.halfnorm
for i in range(N):
    print(i)
    sample = qrng.random()
    sample = stats.qmc.scale(sample, [-3, -3, -3, 1, -3, -1], [2, 3, 3, 5, 2, 1])[0]
    _, loc, scale, size, floc = 10**sample[:5]
    size = int(size)
    # NOTE: the timing/NLLF part of the loop body was collapsed on this page;
    # the lines below are a reconstruction of the usual benchmark pattern,
    # not necessarily the exact original code.
    data = distribution.rvs(loc=loc, scale=scale, size=size, random_state=rng)
    tic = perf_counter()
    params_pr = distribution.fit(data)  # analytical fit from this PR
    times_pr[i] = perf_counter() - tic
    tic = perf_counter()
    # generic superclass fit, bypassing the override
    params_super = stats.rv_continuous.fit(distribution, data)
    times_super[i] = perf_counter() - tic
    nllfs_pr[i] = distribution.nnlf(params_pr, data)
    nllfs_super[i] = distribution.nnlf(params_super, data)
nllf_diff = (nllfs_pr - nllfs_super)/np.abs(nllfs_super)
nllf_diff = nllf_diff[np.isfinite(nllf_diff)]
nllf_diff_good = -nllf_diff[nllf_diff < 0]
nllf_diff_bad = nllf_diff[nllf_diff > 0]
bins = np.arange(-6, 1.5, 0.25)
plt.hist(np.log10(times_pr), bins=bins, alpha=0.5)
plt.hist(np.log10(times_super), bins=bins, alpha=0.5)
plt.legend(('PR', 'main'))
plt.xlabel('log10 of fit execution time (s)')
plt.ylabel('Frequency')
plt.title('Histogram of log10 of fit execution time (s), free')
plt.savefig("Time_free.png")
plt.clf()
plt.hist(np.log10(nllf_diff_good), label="good", alpha=0.5)
plt.hist(np.log10(nllf_diff_bad), label="bad", alpha=0.5)
plt.xlabel('log10 of NLLF differences')
plt.ylabel('Frequency')
plt.title('Histogram of log10 of NLLF differences, free')
plt.legend()
plt.savefig("NNLF_Differences_free.png")
Fit results for fixed location
Fit results for fixed scale