Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix intermittently failing Epanechnikov kernel KDE test #1979

Merged
merged 3 commits into from Aug 21, 2019

Conversation

@rcurtin
Copy link
Member

commented Aug 15, 2019

Frequently, the test KDETest/EpanechnikovCoverSingleKDETest fails:

http://ci.mlpack.org/job/docker%20mlpack%20nightly%20build/armadillo_version=armadillo-6.500.5,boost_version=boost_1_49_0,buildmode=debug,compiler_version=gcc-7.2.0,label=docker/lastCompletedBuild/testReport/

[Exception] - difference{37.9501%} between bfEstimations[i]{0.6164662373649441} and treeEstimations[i]{0.99350104568814213} exceeds 8%
 == [File] - /home/jenkins/workspace/docker mlpack nightly build/armadillo_version/armadillo-6.500.5/boost_version/boost_1_49_0/buildmode/debug/compiler_version/gcc-7.2.0/label/docker/repo/src/mlpack/tests/kde_test.cpp
 == [Line] -233

I spent some time digging into this and this was kind of a funny bug to dig into. I first determined that some of the prunes were invalid---it was estimated that the maximum kernel and minimum kernel for a query point and a reference node were both 0... but the actual kernel values were quite large! So, this was confusing, until I noticed that the minimum distance between the query point and reference node was being computed as a negative value!

At this moment it all fell into place: this bug only occurs with a few types of kernels, most specifically not the Gaussian kernel, because if you give the Gaussian kernel a negative distance, it computes a very large value. So it is only kernels that are symmetric about 0 that have this problem. Further, the bug was only in the cover tree computation, so that narrows down the bug further.

Inside the cover tree, the minimum distance between a point and a node was being computed as

d(q, r) - furthestDescendantDistance

where r is the point at the center of the reference node. However, we need to make sure this is not negative! So the fix is quite easy.

We can thank me circa 2013 for this bug: 8a58c9f
:)

I also fixed this situation where it occurs a couple of times in the KDE rules, and everything seems to work well now. I ran the test with the cover tree and Epanechnikov kernel in both single-tree and dual-tree mode 1000+ times and saw no failures, so I think the issue is resolved.

@robertohueso let me know what you think or if I overlooked anything. :)

Copy link
Member

left a comment

Thanks for the fix :) Everything looks good to me and ready to merge.

It was a very specific issue, I'm sure it wasn't easy to find. Looks a lot like one of the bugs we had with PCA evaluation so I'll be more careful with similar ones in the future 🤔.

@@ -115,7 +115,8 @@ Score(const size_t queryIndex, TreeType& referenceNode)
// Don't duplicate calculations.
alreadyDidRefPoint0 = true;
const double furthestDescDist = referenceNode.FurthestDescendantDistance();
minDistance = traversalInfo.LastBaseCase() - furthestDescDist;
minDistance = std::max(traversalInfo.LastBaseCase() - furthestDescDist,
0.0);

This comment has been minimized.

Copy link
@robertohueso

robertohueso Aug 16, 2019

Member

Super minor style question here, should this be lined up with the previous argument traversalInfo.LastBaseCase() - furthestDescDist or does this just apply to method declarations?

This comment has been minimized.

Copy link
@rcurtin

rcurtin Aug 20, 2019

Author Member

Our style guide doesn't prescribe which to do---either is fine. I can line it up with the first argument of the max() if you think it's clearer, just let me know. 👍

@zoq
zoq approved these changes Aug 20, 2019
Copy link
Member

left a comment

Looks good to me, no comment from my side.

@mlpack-bot
mlpack-bot bot approved these changes Aug 21, 2019
Copy link

left a comment

Second approval provided automatically after 24 hours. 👍

@rcurtin rcurtin merged commit 3ff5f9f into mlpack:master Aug 21, 2019
6 checks passed
6 checks passed
LaTeX Documentation Checks Build finished.
Details
Memory Checks Build finished.
Details
Static Code Analysis Checks Build finished.
Details
Style Checks Build finished.
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@rcurtin rcurtin deleted the rcurtin:kde-epan-cover-fix branch Aug 21, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.