pdf() and logpdf() methods for scipy.stats.gaussian_kde #3198

Closed
juliohm opened this Issue Jan 9, 2014 · 7 comments


juliohm commented Jan 9, 2014

Please consider adding these two wrappers to the gaussian_kde class for consistency with the rest of the distributions in scipy.stats:

def pdf(self, x):
    return self.evaluate(x)
def logpdf(self, x):
    return numpy.log(self.evaluate(x))

A Gaussian KDE can be thought of as a non-parametric probability distribution.

Owner

rgommers commented Jan 9, 2014

Makes sense.

Contributor

nicodelpiano commented Feb 26, 2014

#3398

I did a pull request for this issue.
Please, tell me what you think.

Cheers!

juliohm commented Feb 26, 2014

@nicodelpiano, your patch is fine, but I think there is a more fundamental issue in the SciPy Gaussian KDE implementation. It should do all the work in log scale to avoid the "vanishing problem" (i.e. probabilities go to zero very rapidly in high dimensions) and expose the result directly as logpdf(). The wrapping would then be the other way around:

def pdf(self, x):
    return numpy.exp(self.evaluate(x))

def logpdf(self, x):
    return self.evaluate(x)

Among the implementations I've found (SciPy, Statsmodels, Scikit-learn), only Scikit-learn does it correctly.

Of course this change in evaluate() wouldn't be backward compatible.
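
To illustrate the vanishing problem, here is a minimal sketch that is not taken from the SciPy source. It evaluates a single standard-normal kernel in a high-dimensional space; the dimension `d = 500` and the evaluation point are arbitrary values chosen for illustration. The log density is an ordinary finite number, while the density itself underflows to zero in double precision, so taking the log afterwards cannot recover it.

```python
import numpy as np

# Assumed setup for illustration: one standard-normal kernel in d dimensions,
# evaluated at a point 3 units from the mean along every axis.
d = 500
x = np.full(d, 3.0)

# Log density computed directly in log space: finite and well behaved.
log_density = -0.5 * d * np.log(2.0 * np.pi) - 0.5 * np.dot(x, x)

# Exponentiating underflows to exactly 0.0 in float64.
density = np.exp(log_density)

# Going the other way (log of the underflowed density) loses the value.
with np.errstate(divide="ignore"):
    log_of_density = np.log(density)

print(log_density, density, log_of_density)
```

This is why computing the log density first and exponentiating only on request is the safer direction for the wrappers.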

Owner

rgommers commented Feb 26, 2014

@juliohm that's kind of understandable given the focus of the different implementations. For statistics the high-dimensional use cases aren't common, for machine learning they are.

Might make sense to improve this anyway, but the changes have to be backwards-compatible. Maybe you can open a new issue for this with a proposal of what to change.

juliohm commented Feb 26, 2014

@rgommers, sure. I'm too busy with my master's at the moment, but I'll try to find time in the future to look at it carefully.

Anyways, is the @nicodelpiano patch okay for now? At least we can type pdf() and logpdf() when it's needed.

Contributor

argriffing commented Feb 26, 2014

For gaussian kde the pdf appears to be a sum instead of a product; in this situation it is less straightforward to work in log space. But it could be possible using functions like logsumexp.
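
A rough sketch of what the logsumexp route could look like is below. It is not a claim about how SciPy implements or should implement this. The helper name `kde_logpdf` is hypothetical, and the sketch assumes an unweighted `gaussian_kde` whose `dataset` attribute has shape `(d, n)` and whose `covariance` attribute holds the kernel covariance. Each kernel's log density is computed separately, and the sum over kernels is taken with `scipy.special.logsumexp`.

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import gaussian_kde


def kde_logpdf(kde, points):
    """Hypothetical helper: log of an unweighted Gaussian KDE, computed in log space.

    `points` has shape (d, m); a 1-D array is treated as m points in 1-D.
    """
    points = np.atleast_2d(points)
    d, n = kde.dataset.shape
    m = points.shape[1]

    # Cholesky factor of the kernel covariance, used to whiten the differences.
    chol = np.linalg.cholesky(kde.covariance)

    # Differences between every data point and every evaluation point: (d, n, m).
    diff = kde.dataset[:, :, None] - points[:, None, :]
    whitened = np.linalg.solve(chol, diff.reshape(d, n * m))

    # Half the squared Mahalanobis distance for each (data point, query) pair.
    energy = 0.5 * np.sum(whitened ** 2, axis=0).reshape(n, m)

    # Log normalising constant of one Gaussian kernel.
    log_norm = 0.5 * d * np.log(2.0 * np.pi) + np.sum(np.log(np.diag(chol)))

    # log( (1/n) * sum_i exp(-energy_i) / norm ), without leaving log space.
    return logsumexp(-energy, axis=0) - log_norm - np.log(n)


rng = np.random.default_rng(0)
kde = gaussian_kde(rng.normal(size=(2, 50)))
query = rng.normal(size=(2, 5))
print(kde_logpdf(kde, query))
```

For low-dimensional data like this, the result should agree with `numpy.log(kde.evaluate(query))` up to floating-point error; the difference only matters once the individual kernel values start to underflow.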

Owner

rgommers commented May 10, 2014

PR merged in c0864c9.

rgommers closed this May 10, 2014

rgommers added this to the 0.15.0 milestone May 10, 2014
