Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KernelDensity docstring #3270

Closed
mblondel opened this issue Jun 12, 2014 · 12 comments
Closed

KernelDensity docstring #3270

mblondel opened this issue Jun 12, 2014 · 12 comments

Comments

@mblondel
Copy link
Member

I had troubles understanding how parameters related to KD-tree and Ball tree affect DensityEstimation.

In my understanding, KDE uses the average evaluation of kernels centered on every training point so I didn't quite understand how KD-tree and Ball-tree are useful (I would understand if it used, say, the average of the k nearest neighbors only).

The docstring says that we can specify tolerance options rtol and atol. With respect to what stopping criterion are these constants used?

Regarding the breadth_first option, the docstring says "use a breadth-first approach to the problem". I didn't understand what "problem" it refers to.

What is the practical impact of of the tree related parameters? Do they only affect speed or can they also affect quality of estimation?

Sorry for the naive questions but it would help my understanding if we could add a couple of words to clarify the docstring.

BTW, thanks @jakevdp for this great module!

@mblondel
Copy link
Member Author

Just realized that Jake had written a nice blog post answering most of my questions.
http://jakevdp.github.io/blog/2013/12/01/kernel-density-estimation/

Would be nice to improve the documentation based on this blog post.

@jakevdp
Copy link
Member

jakevdp commented Jun 13, 2014

Would be nice to improve the documentation based on this blog post.

Yes, I agree! Part of the reason this info is not in the doc string is that I hadn't explored it in detail yet: the blog post was the result of me trying to figure that out 😀

@Hasil-Sharma
Copy link
Contributor

Hi, Is this issue open for resolving ?

@mblondel
Copy link
Member Author

@Hasil-Sharma All issues are open for resolving :)

@Winterflower
Copy link

Working on improving the docs based on the blog post mentioned above,

@jakevdp
Copy link
Member

jakevdp commented Aug 31, 2014

Awesome - thanks @Winterflower

@jakevdp
Copy link
Member

jakevdp commented Aug 31, 2014

One thing I'd thought of here: the default parameter value asks for exact results, which is basically the slowest possible algorithm. Most users will not likely dig into the doc strings to figure this out... perhaps we should change it to use some reasonable error threshold as the default?

@amueller amueller added Easy Well-defined and straightforward way to resolve Need Contributor labels Oct 27, 2016
@amueller amueller added the Sprint label Mar 3, 2017
@loldja
Copy link
Contributor

loldja commented Mar 4, 2017

🤔 I'm working on this

@amueller
Copy link
Member

interestingly enough the BinaryTree.kernel_density function has the default rtol of 1E-8 but documents 0...

@amueller
Copy link
Member

I think we actually might need @jakevdp on this

@amueller amueller removed Easy Well-defined and straightforward way to resolve Sprint help wanted labels Sep 29, 2018
@reshamas
Copy link
Member

reshamas commented Jul 8, 2021

Note: This file (https://github.com/scikit-learn/scikit-learn/blob/506b12b2761ad88039114dec1c6c4fcec4a7a021/sklearn/neighbors/_binary_tree.pxi) has

        rtol : float, default=1e-8
            Specify the desired relative tolerance of the result.
            If the true result is `K_true`, then the returned result `K_ret`
            satisfies ``abs(K_true - K_ret) < atol + rtol * K_ret``
            The default is `1e-8` (i.e. machine precision).

and log_rtol is used after rtol definition.

@adrinjalali
Copy link
Member

Docstrings have evolved and improved. Closing this, happy to have a new issue if something's still unclear.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants