KDEUnivariate with weights #823

jseabold opened this Issue May 9, 2013 · 3 comments



jseabold commented May 9, 2013

What should evaluate do in the case of weighted KDE?

import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.kde import KDEUnivariate

# Two well-separated modes, 100 draws each
bimodal = np.concatenate([10 + np.random.randn(100), 30 + np.random.randn(100)])
kde = KDEUnivariate(bimodal)
x = np.linspace(0, 40, 256)
# Weight 1 on the first mode, weight 0 on the second
w1 = np.concatenate([np.ones(100), np.zeros(100)])
kde.fit(weights=w1, fft=False)
plt.plot(x, kde.evaluate(x))

Compare this to

plt.plot(kde.support, kde.density)

Should evaluate take a non-optional weights argument, etc.?


I think it is a bug that evaluate doesn't take the weights
into account.

The way I understand the code after browsing a bit:

fit() sets the bandwidth (without taking weights into account) and
calculates the density for given points.

evaluate() just calls the kernel, which has the bandwidth from fit(), and
evaluates without taking weights into account.

I haven't looked into weighted KDE, but I think we can interpret
the weights like the weights in RLM: they downweight certain observations.
Then we want to evaluate at a point assuming the weight on that point
is one, but based on a density that uses the weighted data points.

In the 0-1 case above, observations with weight zero would be
effectively dropped when evaluating the density.
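
A minimal sketch of that interpretation, using plain NumPy rather than the statsmodels kernel classes. The helper name weighted_gaussian_kde, the Gaussian kernel, and the fixed bandwidth of 0.5 are assumptions made for illustration only; this is not what evaluate() currently does. Each evaluation point gets the weighted average of the kernel values over the observations, so a zero weight removes an observation entirely.

```python
import numpy as np

def weighted_gaussian_kde(points, data, weights, bw):
    """Hypothetical helper: sum_i w_i * K((x - X_i) / bw) / (bw * sum_i w_i)."""
    points = np.asarray(points, dtype=float)
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Scaled distance between every evaluation point and every observation
    z = (points[:, None] - data[None, :]) / bw
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Weighted average of kernel values; zero-weight observations drop out
    return kernel @ weights / (bw * weights.sum())

rng = np.random.default_rng(0)
bimodal = np.concatenate([10 + rng.standard_normal(100),
                          30 + rng.standard_normal(100)])
w1 = np.concatenate([np.ones(100), np.zeros(100)])
x = np.linspace(0, 40, 256)

# Bandwidth 0.5 is an arbitrary choice for the illustration
dens_weighted = weighted_gaussian_kde(x, bimodal, w1, bw=0.5)
# Same bandwidth, but with the zero-weight half of the sample removed
dens_dropped = weighted_gaussian_kde(x, bimodal[:100], np.ones(100), bw=0.5)
```

With 0-1 weights the two results agree, and the second mode around 30 disappears from the weighted density.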

To support it with fft, we would just need to adjust the weights/counts at the grid points, I think.
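
A rough sketch of what adjusting the counts at the grid points could look like, assuming simple (nearest-bin) binning and a Gaussian kernel. The function binned_weighted_kde and its parameters are hypothetical and are not the statsmodels FFT code path: the plain per-bin counts are replaced by per-bin sums of weights, which are then convolved with the kernel through a zero-padded FFT.

```python
import numpy as np

def binned_weighted_kde(data, weights, bw, grid_min, grid_max, n_grid=512):
    """Hypothetical helper: binned Gaussian KDE with weighted bin counts."""
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    edges = np.linspace(grid_min, grid_max, n_grid + 1)
    delta = edges[1] - edges[0]
    centers = edges[:-1] + delta / 2
    # Sum of weights per bin replaces the plain count per bin
    counts, _ = np.histogram(data, bins=edges, weights=weights)
    mass = counts / weights.sum()
    # Gaussian kernel sampled at every possible offset between two grid points
    offsets = delta * np.arange(-(n_grid - 1), n_grid)
    kernel = np.exp(-0.5 * (offsets / bw) ** 2) / (bw * np.sqrt(2 * np.pi))
    # Linear convolution via zero-padded FFT
    n_fft = mass.size + kernel.size - 1
    conv = np.fft.irfft(np.fft.rfft(mass, n_fft) * np.fft.rfft(kernel, n_fft), n_fft)
    # Keep the entries that line up with the grid centers
    return centers, conv[n_grid - 1:2 * n_grid - 1]

rng = np.random.default_rng(0)
bimodal = np.concatenate([10 + rng.standard_normal(100),
                          30 + rng.standard_normal(100)])
w1 = np.concatenate([np.ones(100), np.zeros(100)])

# Grid limits and bandwidth are arbitrary choices for the illustration
grid, density = binned_weighted_kde(bimodal, w1, bw=0.5, grid_min=0.0, grid_max=40.0)
```

Observations outside the grid limits are ignored by np.histogram, so the grid should cover the data.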

reported at http://comments.gmane.org/gmane.comp.python.pystatsmodels/10970


KDEUnivariate has the weights argument, but it's not listed under Parameters.

This should be considered a bug.


It looks like the reference results in the unit tests are from Stata, which allows fweights, aweights, and iweights in kdensity.

@josef-pkt josef-pkt closed this in c0a62a0 Dec 18, 2013