-
-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add kernel density estimation to the statistics module #115532
Comments
This looks like a pretty clean useful addition. Nice draft! Some things to consider (better explored in draft PR form than inline in an issue draft): |
Thanks for the compliment on the draft. I've been iterating on it for a few weeks. FWIW, having a string kernel name is consistent with the API style for quantiles() and correlation() elsewhere in the module. I also like having kde() function fully self-contained, making it easier to use. The listed methods are consistently the only ones mentioned in KDE literature and SciPy only implements the normal kernel since it dominates the others in practice. So I think supporting custom kernels would be an over-generalization beyond what people actually use. Side note: While the code volume makes it look like the kernel is the most important part, the referenced paper says that changing kernels only makes minor differences in the output. The significant parameter by far is h which has dramatic effects on the final result. |
Feature
Proposal:
I propose promoting the KDE statistics recipe to be an actual part of the statistics module.
See: https://docs.python.org/3.13/library/statistics.html#kernel-density-estimation
My thought Is that it is an essential part of statistical reasoning about sample data
See: https://www.itm-conferences.org/articles/itmconf/pdf/2018/08/itmconf_sam2018_00037.pdf
Discussion of this feature:
This was discussed and reviewed with the module author who gave it his full support:
Working Draft:
Linked PRs
The text was updated successfully, but these errors were encountered: