
pdf() and logpdf() methods for scipy.stats.gaussian_kde #3198

Closed
juliohm opened this issue Jan 9, 2014 · 7 comments
Labels
enhancement · good first issue · scipy.stats
Milestone
0.15.0
Comments


juliohm commented Jan 9, 2014

Please consider adding these two wrappers to the gaussian_kde class, to be consistent with the rest of the distributions in scipy.stats:

def pdf(self, x):
    return self.evaluate(x)

def logpdf(self, x):
    # assumes numpy is imported at module level
    return numpy.log(self.evaluate(x))

A Gaussian KDE can be thought of as a non-parametric probability distribution.
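A minimal sketch of the proposed wrappers, attached via a subclass so it works even on SciPy versions that predate any patch. `PDFKde` is a hypothetical helper name, not SciPy API:

```python
import numpy as np
from scipy.stats import gaussian_kde

class PDFKde(gaussian_kde):
    """gaussian_kde with the pdf()/logpdf() aliases proposed above."""

    def pdf(self, x):
        return self.evaluate(x)

    def logpdf(self, x):
        return np.log(self.evaluate(x))

rng = np.random.default_rng(0)
kde = PDFKde(rng.normal(size=200))   # 1-D sample
x = np.array([0.0, 1.0])

# pdf() and logpdf() are thin wrappers around evaluate()
assert np.allclose(kde.pdf(x), kde.evaluate(x))
assert np.allclose(kde.logpdf(x), np.log(kde.evaluate(x)))
```

Since the wrappers only delegate to evaluate(), they inherit its behavior (bandwidth selection, shape conventions) unchanged.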


rgommers commented Jan 9, 2014

Makes sense.

@nicodelpiano
Contributor

#3398

I opened a pull request for this issue.
Please tell me what you think.

Cheers!


juliohm commented Feb 26, 2014

@nicodelpiano, your patch is fine, but I think there is a more fundamental issue in SciPy's Gaussian KDE implementation. It should do all the work in log scale to avoid the "vanishing problem" (probabilities go to zero very rapidly in high dimensions) and expose the result directly as logpdf(). The wrapping would then go the other way around:

def pdf(self, x):
    # here evaluate() would return the log-density directly
    return numpy.exp(self.evaluate(x))

def logpdf(self, x):
    return self.evaluate(x)

Among the implementations I've looked at (SciPy, Statsmodels, scikit-learn), only scikit-learn does it correctly.

Of course this change to evaluate() wouldn't be backward compatible.
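The "vanishing problem" mentioned above is easy to demonstrate: in high dimensions the product of per-dimension densities underflows double precision, while the equivalent sum of log-densities stays finite. A quick illustration, assuming independent standard normal factors evaluated at the origin:

```python
import numpy as np

d = 1000                              # number of dimensions
per_dim = 1.0 / np.sqrt(2 * np.pi)    # N(0, 1) density at x = 0, ~0.3989

naive = per_dim ** d                  # linear-scale product: underflows to 0.0
log_total = d * np.log(per_dim)       # same quantity in log scale, about -918.94

assert naive == 0.0                   # lost to underflow
assert np.isfinite(log_total)         # perfectly representable
```

The smallest positive double is around 1e-308 (1e-324 subnormal), while 0.3989**1000 is roughly 1e-399, so the linear-scale value is unrecoverable; the log-scale value is just an ordinary float.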

@rgommers
Member

@juliohm that's kind of understandable given the focus of the different implementations. For statistics the high-dimensional use cases aren't common, for machine learning they are.

Might make sense to improve this anyway, but the changes have to be backwards-compatible. Maybe you can open a new issue for this with a proposal of what to change.


juliohm commented Feb 26, 2014

@rgommers, sure. I'm too busy with my master's at the moment, but I'll try to find time in the future to look at it carefully.

Anyway, is @nicodelpiano's patch okay for now? At least we'd be able to call pdf() and logpdf() when needed.

@argriffing
Contributor

For a Gaussian KDE the pdf is a sum (over kernel contributions) rather than a product, so it is less straightforward to work in log space. It should still be possible, though, using functions like logsumexp.
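A sketch of what that could look like: because the KDE pdf is a mean of Gaussian kernels, its log is computed stably with logsumexp over the per-kernel log-densities. `log_kde_1d` is a hypothetical helper for 1-D data with a fixed bandwidth, not SciPy's implementation:

```python
import numpy as np
from scipy.special import logsumexp

def log_kde_1d(points, data, bandwidth):
    """Log-density of a 1-D Gaussian KDE, evaluated entirely in log space."""
    points = np.asarray(points, dtype=float)
    data = np.asarray(data, dtype=float)
    # log of each kernel: log N(point; datum, bandwidth^2)
    z = (points[:, None] - data[None, :]) / bandwidth
    log_kernels = -0.5 * z**2 - 0.5 * np.log(2 * np.pi) - np.log(bandwidth)
    # log of the mean of the kernels, via logsumexp instead of sum-of-exps
    return logsumexp(log_kernels, axis=1) - np.log(len(data))

rng = np.random.default_rng(1)
data = rng.normal(size=50)
x = np.linspace(-2, 2, 5)
h = 0.5

# agrees with the naive linear-scale sum wherever that sum doesn't underflow
naive = np.mean(
    np.exp(-0.5 * ((x[:, None] - data[None, :]) / h) ** 2)
    / (h * np.sqrt(2 * np.pi)),
    axis=1,
)
assert np.allclose(np.exp(log_kde_1d(x, data, h)), naive)
```

The payoff is in the tails and in high dimensions: logsumexp subtracts the maximum log-kernel before exponentiating, so the result stays finite where the naive sum would collapse to log(0).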

@rgommers
Member

PR merged in c0864c9.

@rgommers rgommers added this to the 0.15.0 milestone May 10, 2014