
pdf() and logpdf() methods for scipy.stats.gaussian_kde #3198

Closed
juliohm opened this issue Jan 9, 2014 · 7 comments
Labels
enhancement · good first issue · scipy.stats
Milestone
0.15.0
Comments


juliohm commented Jan 9, 2014

Please consider adding these two wrappers to the gaussian_kde class, to be consistent with the rest of the distributions in scipy.stats:

def pdf(self, x):
    return self.evaluate(x)

def logpdf(self, x):
    # assumes numpy is imported at module level
    return numpy.log(self.evaluate(x))

A Gaussian KDE can be thought of as a non-parametric probability distribution.
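A minimal sketch of the proposed wrappers, attached via a subclass so it works even on SciPy versions that predate any patch. `PDFKde` is a hypothetical helper name, not SciPy API:

```python
import numpy as np
from scipy.stats import gaussian_kde

class PDFKde(gaussian_kde):
    """gaussian_kde with the pdf()/logpdf() aliases proposed above."""

    def pdf(self, x):
        return self.evaluate(x)

    def logpdf(self, x):
        return np.log(self.evaluate(x))

rng = np.random.default_rng(0)
kde = PDFKde(rng.normal(size=200))   # 1-D sample
x = np.array([0.0, 1.0])

# pdf() and logpdf() are thin wrappers around evaluate()
assert np.allclose(kde.pdf(x), kde.evaluate(x))
assert np.allclose(kde.logpdf(x), np.log(kde.evaluate(x)))
```

Since the wrappers only delegate to evaluate(), they inherit its behavior (bandwidth selection, shape conventions) unchanged.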


rgommers commented Jan 9, 2014

Makes sense.

@nicodelpiano
Contributor

#3398

I opened a pull request for this issue.
Please tell me what you think.

Cheers!


juliohm commented Feb 26, 2014

@nicodelpiano, your patch is fine, but I think there is a more fundamental issue in SciPy's Gaussian KDE implementation. It should do all the work in log scale to avoid the "vanishing problem" (probabilities go to zero very rapidly in high dimensions) and expose the result directly as logpdf(). The wrapping would then go the other way around:

def pdf(self, x):
    # here evaluate() would return the log-density directly
    return numpy.exp(self.evaluate(x))

def logpdf(self, x):
    return self.evaluate(x)

Among the implementations I've looked at (SciPy, Statsmodels, scikit-learn), only scikit-learn does it correctly.

Of course this change to evaluate() wouldn't be backward compatible.
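The "vanishing problem" mentioned above is easy to demonstrate: in high dimensions the product of per-dimension densities underflows double precision, while the equivalent sum of log-densities stays finite. A quick illustration, assuming independent standard normal factors evaluated at the origin:

```python
import numpy as np

d = 1000                              # number of dimensions
per_dim = 1.0 / np.sqrt(2 * np.pi)    # N(0, 1) density at x = 0, ~0.3989

naive = per_dim ** d                  # linear-scale product: underflows to 0.0
log_total = d * np.log(per_dim)       # same quantity in log scale, about -918.94

assert naive == 0.0                   # lost to underflow
assert np.isfinite(log_total)         # perfectly representable
```

The smallest positive double is around 1e-308 (1e-324 subnormal), while 0.3989**1000 is roughly 1e-399, so the linear-scale value is unrecoverable; the log-scale value is just an ordinary float.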

@rgommers
Member

@juliohm that's kind of understandable given the focus of the different implementations. For statistics the high-dimensional use cases aren't common, for machine learning they are.

Might make sense to improve this anyway, but the changes have to be backwards-compatible. Maybe you can open a new issue for this with a proposal of what to change.


juliohm commented Feb 26, 2014

@rgommers, sure. I'm too busy with my master's at the moment, but I'll try to find time in the future to look at it carefully.

Anyway, is @nicodelpiano's patch okay for now? At least we'd be able to call pdf() and logpdf() when needed.

@argriffing
Contributor

For a Gaussian KDE the pdf is a sum (over kernel contributions) rather than a product, so it is less straightforward to work in log space. It should still be possible, though, using functions like logsumexp.
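A sketch of what that could look like: because the KDE pdf is a mean of Gaussian kernels, its log is computed stably with logsumexp over the per-kernel log-densities. `log_kde_1d` is a hypothetical helper for 1-D data with a fixed bandwidth, not SciPy's implementation:

```python
import numpy as np
from scipy.special import logsumexp

def log_kde_1d(points, data, bandwidth):
    """Log-density of a 1-D Gaussian KDE, evaluated entirely in log space."""
    points = np.asarray(points, dtype=float)
    data = np.asarray(data, dtype=float)
    # log of each kernel: log N(point; datum, bandwidth^2)
    z = (points[:, None] - data[None, :]) / bandwidth
    log_kernels = -0.5 * z**2 - 0.5 * np.log(2 * np.pi) - np.log(bandwidth)
    # log of the mean of the kernels, via logsumexp instead of sum-of-exps
    return logsumexp(log_kernels, axis=1) - np.log(len(data))

rng = np.random.default_rng(1)
data = rng.normal(size=50)
x = np.linspace(-2, 2, 5)
h = 0.5

# agrees with the naive linear-scale sum wherever that sum doesn't underflow
naive = np.mean(
    np.exp(-0.5 * ((x[:, None] - data[None, :]) / h) ** 2)
    / (h * np.sqrt(2 * np.pi)),
    axis=1,
)
assert np.allclose(np.exp(log_kde_1d(x, data, h)), naive)
```

The payoff is in the tails and in high dimensions: logsumexp subtracts the maximum log-kernel before exponentiating, so the result stays finite where the naive sum would collapse to log(0).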

@rgommers
Member

PR merged in c0864c9.

@rgommers rgommers added this to the 0.15.0 milestone May 10, 2014