weighted KDE #4394
Comments
I think that wouldn't be too hard to add but @jakevdp knows better. |
That's good news. |
It's actually not trivial, because of the fast tree-based KDE that sklearn uses. Currently, nodes are ranked by distance and the local estimate is updated until it can be shown that the desired tolerance has been reached. With non-uniform weights, the ranking procedure would have to be based on a combination of minimum distance and maximum weight in each node, which would require a slightly different KD-tree/Ball tree traversal algorithm, along with an updated node data structure to store those weights. It would be relatively easy to add a slower brute-force version of KDE which supports weighted points, however. |
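The brute-force version mentioned above is straightforward: instead of a tree traversal with tolerance-based pruning, each query point just sums a kernel contribution from every training point, scaled by that point's weight. A minimal sketch (the function name `weighted_kde_brute` and its signature are hypothetical, not part of sklearn):

```python
import numpy as np

def weighted_kde_brute(X, weights, query, bandwidth=1.0):
    """Brute-force weighted Gaussian KDE sketch (hypothetical helper).

    X: (n, d) training points; weights: (n,) non-negative sample weights;
    query: (m, d) points at which to evaluate the density.
    Returns an (m,) array of density estimates.
    """
    X = np.asarray(X, dtype=float)
    query = np.asarray(query, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the estimate integrates to 1

    n, d = X.shape
    # squared Euclidean distances between every query and training point
    d2 = ((query[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    # Gaussian kernel, normalized in d dimensions
    norm = (2.0 * np.pi * bandwidth ** 2) ** (d / 2.0)
    K = np.exp(-0.5 * d2 / bandwidth ** 2) / norm
    # weighted average of per-point kernels
    return K @ w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
w = rng.uniform(size=200)
dens = weighted_kde_brute(X, w, X[:5], bandwidth=0.5)
```

This is O(n·m), so as noted below it is only practical for small-ish datasets; the density is also invariant to rescaling the weights, since they are normalized internally.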
Hum, for some reason I thought the trees did support weights. I guess I was confused by the weighting in KNN which is much easier to implement. |
Quick question – I've heard a number of requests for this feature. Though it would be difficult to implement for the tree-based KDE, it would be relatively straightforward to add a brute-force version that supports sample weights. Do you think that would be a worthwhile contribution? |
I think it would. In practice it would only be usable for small-ish datasets, of course, but I don't see that as a reason not to implement it. |
Just a comment - for low dimensional data sets statsmodels already has a weighted KDE. |
It would also be extremely convenient for me if there was a version of the algorithm that accepted weights. I think it's a very important feature and surprisingly almost none of the python libraries have it. Statsmodels does have it, but only for univariate KDE; for multivariate KDE the feature is also missing. |
2 years have passed since this issue was opened and it hasn't been solved yet |
Do you want to contribute it? Go ahead! |
Hi, I'm interested in this too. What about this? https://gist.github.com/afrendeiro/9ab8a1ea379030d10f17 I can try to integrate this into sklearn if you think it's fine. |
Hi, I've been working on this lately. |
(scikit-learn v0.20.0) Using the 'score_samples()' function after fitting a kernel density with 'sample_weight' in a Jupyter notebook forces the kernel to restart constantly (it does not produce any output). I used a numpy array with shape (2305, 2) for the training dataset and a numpy array with shape (2305,) for the weights. I am able to get results (see image below) when using the same function without a sample_weight array. I assume this is a known bug!? |
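For context, the weighted API the report above is exercising looks like this (`sample_weight` was added to `KernelDensity.fit` in scikit-learn 0.20; the random data here is only a stand-in for the reporter's arrays):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(42)
X = rng.normal(size=(2305, 2))   # same shapes as in the report
w = rng.uniform(size=2305)       # per-sample weights

kde = KernelDensity(kernel="gaussian", bandwidth=0.5)
kde.fit(X, sample_weight=w)      # sample_weight available since 0.20
log_dens = kde.score_samples(X[:10])
```

If this snippet hangs or crashes in a given environment, a self-contained reproduction like it is exactly what a new issue should include.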
I'm not sure why you would think this was a known bug. Please open a new issue with example code. |
Not sure this is the correct place, but I would very much appreciate the ability to
pass a weight for each sample in kde density estimation.
There exists an adapted version of scipy.stats.gaussian_kde:
http://stackoverflow.com/questions/27623919/weighted-gaussian-kernel-density-estimation-in-python