Fix intermittently failing Epanechnikov kernel KDE test #1979
Frequently, the test
I spent some time digging into this, and it was kind of a funny bug to track down. I first determined that some of the prunes were invalid: the estimated maximum and minimum kernel values for a query point and a reference node were both 0, but the actual kernel values were quite large! This was confusing until I noticed that the minimum distance between the query point and the reference node was being computed as a negative value!
At that moment it all fell into place: this bug only occurs with a few types of kernels, and notably not with the Gaussian kernel, because if you give the Gaussian kernel a negative distance, it computes a very large value. So only kernels that are symmetric about 0 have this problem. The bug also appeared only in the cover tree computation, which narrowed it down further.
Inside the cover tree, the minimum distance between a point and a node was being computed as
We can thank me circa 2013 for this bug: 8a58c9f
I also fixed a couple of places in the KDE rules where the same situation occurs, and everything seems to work well now. I ran the test with the cover tree and Epanechnikov kernel in both single-tree and dual-tree mode more than 1000 times and saw no failures, so I think the issue is resolved.
@robertohueso let me know what you think or if I overlooked anything. :)
robertohueso left a comment
Thanks for the fix :) Everything looks good to me and ready to merge.
It was a very specific issue; I'm sure it wasn't easy to find. It looks a lot like one of the bugs we had with PCA evaluation, so I'll be more careful with similar ones in the future.