What should evaluate do in the case of weighted KDE?
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.nonparametric.kde import KDEUnivariate
bimodal = np.concatenate([10+np.random.randn(100), 30+np.random.randn(100)])
kde = KDEUnivariate(bimodal)
x = np.linspace(0, 40, 256)
w1 = np.concatenate([np.ones(100), np.zeros(100)])
Compare this to
Non-optional weights argument to evaluate, etc.?
I think it is a bug that evaluate doesn't take weights.
The way I understand the code after browsing a bit:
- fit() sets the bandwidth (without taking weights into account) and calculates the density for given points.
- evaluate() just calls the kernel, which has the bandwidth from fit(), and evaluates without taking weights into account.
I didn't look at weighted KDE; however, I think we can interpret
the weights like weights in RLM: they downweight certain observations.
Then we want to evaluate at a point assuming the weight on that point
is one, but based on a density that uses the weighted data points.
In the 0-1 case above, observations with weight zero would be
effectively dropped when evaluating the density.
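A minimal sketch of that interpretation, written in plain NumPy rather than against the statsmodels internals. The function name `weighted_gaussian_kde`, the Gaussian kernel, and the fixed bandwidth `bw=1.0` are assumptions made for illustration; this is not what `evaluate` currently does.

```python
import numpy as np

def weighted_gaussian_kde(points, data, weights, bw):
    """Evaluate a weighted Gaussian KDE at `points` (hypothetical helper)."""
    points = np.atleast_1d(np.asarray(points, dtype=float))
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Normalize so the weights sum to one and the result is a density.
    weights = weights / weights.sum()
    # Standardized distance between every evaluation point and every observation.
    z = (points[:, None] - data[None, :]) / bw
    kernel = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    # Weighted average of the kernel contributions, scaled by the bandwidth.
    return kernel @ weights / bw

rng = np.random.default_rng(0)
bimodal = np.concatenate([10 + rng.standard_normal(100),
                          30 + rng.standard_normal(100)])
w1 = np.concatenate([np.ones(100), np.zeros(100)])
x = np.linspace(0, 40, 256)

# Zero-weight observations contribute nothing to the density ...
dens_weighted = weighted_gaussian_kde(x, bimodal, w1, bw=1.0)
# ... so the result matches an unweighted KDE on the first mode only.
dens_subset = weighted_gaussian_kde(x, bimodal[:100], np.ones(100), bw=1.0)
```

With 0-1 weights the two densities agree up to floating-point error, which is the "effectively dropped" behavior described above.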
To support it with fft, we would just need to adjust the weights/counts at the grid points, I think.
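A rough sketch of that idea, assuming a simple binned estimator: the observations are binned onto a regular grid with their weights as bin mass, and the binned mass is convolved with a Gaussian kernel via FFT. The helper `binned_weighted_kde`, its `gridsize` and `cut` parameters, and the zero-padded linear convolution are all illustrative assumptions, not the statsmodels fft code path.

```python
import numpy as np

def binned_weighted_kde(data, weights, bw, gridsize=256, cut=3.0):
    """Binned Gaussian KDE using weighted counts and an FFT convolution (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # Regular grid extending `cut` bandwidths beyond the data range.
    edges = np.linspace(data.min() - cut * bw, data.max() + cut * bw, gridsize + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    dx = edges[1] - edges[0]
    # The only weight-specific step: bin mass is the sum of weights, not the count.
    mass, _ = np.histogram(data, bins=edges, weights=weights)
    mass = mass / weights.sum()
    # Gaussian kernel sampled at every possible offset between two grid centers.
    offsets = (np.arange(2 * gridsize - 1) - (gridsize - 1)) * dx
    kernel = np.exp(-0.5 * (offsets / bw) ** 2) / (bw * np.sqrt(2 * np.pi))
    # Zero-padded FFT gives the full linear convolution of length 3*gridsize - 2.
    n = 3 * gridsize - 2
    full = np.fft.irfft(np.fft.rfft(mass, n) * np.fft.rfft(kernel, n), n)
    # Entry i + gridsize - 1 of the full convolution is the density at center i.
    density = full[gridsize - 1:2 * gridsize - 1]
    return centers, density

rng = np.random.default_rng(0)
bimodal = np.concatenate([10 + rng.standard_normal(100),
                          30 + rng.standard_normal(100)])
w1 = np.concatenate([np.ones(100), np.zeros(100)])
grid, dens = binned_weighted_kde(bimodal, w1, bw=1.0)
```

Replacing `weights=weights` with unit weights recovers the usual unweighted binning, so the change is confined to the counting step.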
reported at http://comments.gmane.org/gmane.comp.python.pystatsmodels/10970
KDEUnivariate has the weights argument, but it's not listed under Parameters.
This should be considered a bug.
It looks like the reference results in the unit tests are from Stata, which allows fweights, aweights, and iweights in kdensity.