CosineTreeTest/CosineNodeCosineSplit test sometimes fails, only on i386 #358

Closed
rcurtin opened this Issue Dec 29, 2014 · 3 comments

Projects

None yet

1 participant

@rcurtin
Member
rcurtin commented Dec 29, 2014

Reported by rcurtin on 7 Jul 44943747 12:11 UTC
I have noticed in the nightly matrix build (http://big.cc.gt.atl.ga.us/job/mlpack%20-%20nightly%20matrix%20build/) that the test CosineTreeTest/CosineNodeCosineSplit sometimes fails, but only on i386:

[- check cosineMax - cosines(i) < cosines(i) - cosineMin failed [0.2321881278243989 >= 0.050073702602425718](Error])
 == [- /home/jenkins/workspace/mlpack - nightly matrix build/arch/linux-i386/armadillo/armadillo-4.400.1/buildmode/release/repo/src/mlpack/tests/cosine_tree_test.cpp
 == [Line](File]) - 112

[- check cosineMax - cosines(i) < cosines(i) - cosineMin failed [0.23254797225922919 >= 0.049713858167595437](Error])
 == [- /home/jenkins/workspace/mlpack - nightly matrix build/arch/linux-i386/armadillo/armadillo-4.400.1/buildmode/release/repo/src/mlpack/tests/cosine_tree_test.cpp
 == [Line](File]) - 112

[- check cosineMax - cosines(i) < cosines(i) - cosineMin failed [0.24496212097773529 >= 0.037299709449089336](Error])
 == [- /home/jenkins/workspace/mlpack - nightly matrix build/arch/linux-i386/armadillo/armadillo-4.400.1/buildmode/release/repo/src/mlpack/tests/cosine_tree_test.cpp
 == [Line](File]) - 112

[- check cosineMax - cosines(i) < cosines(i) - cosineMin failed [0.24307624422077645 >= 0.039185586206048173](Error])
 == [- /home/jenkins/workspace/mlpack - nightly matrix build/arch/linux-i386/armadillo/armadillo-4.400.1/buildmode/release/repo/src/mlpack/tests/cosine_tree_test.cpp
 == [Line](File]) - 112

[- check cosineMax - cosines(i) < cosines(i) - cosineMin failed [0.23793054672682101 >= 0.04433128370000361](Error])
 == [- /home/jenkins/workspace/mlpack - nightly matrix build/arch/linux-i386/armadillo/armadillo-4.400.1/buildmode/release/repo/src/mlpack/tests/cosine_tree_test.cpp
 == [Line](File]) - 112

To reproduce it, do this:

  • Get an i386 system (a VM should work)
  • Modify cosine_tree_test.cpp and add the following to the beginning of CosineTreeTest:
const size_t seed = std::time(NULL);
math::RandomSeed(seed);
std::cout << "Seed: " << seed << std::endl;
  • Now compile and run the test in a bash loop, to find a random seed that causes the issue:
$ while(true); do bin/mlpack_test -t CosineTreeTest/CosineNodeCosineSplit; sleep 1; done

Then you can hardcode the failing random seed for the purposes of debugging.

The cosine tree is built by choosing a basis vector (that is, a point from the dataset), and calculating the cosine distance between that basis vector and all other points. Points with large cosines go to the left child, and points with small cosines go to the right child; the current implemented split is (I think) the median cosine -- this results in a tree where the left and right child have the same number of points. This process is repeated iteratively. (The details are in the paper QUIC-SVD: Fast SVD Using Cosine Trees by Michael Holmes, Charles Isbell, and Alex Gray; I've only described it here just to give an idea.)

The tests appear to be failing because in some cases, points in the right child appear to have larger cosine to the basis vector than points in the left child. I've checked for memory leaks and other issues with valgrind that may only show up on i386 and fixed what I found (missing destructor) in r17485, but this did not fix the issue.

So, the bug may be either in the cosine tree code or the test itself; I don't know enough to say. But anyone wishing to solve this bug should spend a little time understanding the basics of the cosine tree, debugging what (or if) there is a problem for cosine trees that fail the test, and at that point maybe the solution will be clear. For the 1.0.11 release I've commented the test out and noted that the bug is present (r17488).

Siddharth, I've CC'ed you just in case my description is incorrect or anything like that. If you have thoughts, feel free, but if not, don't feel obligated. I would call this relatively low priority since i386 is far less important these days. :)

Migrated-From: http://trac.research.cc.gatech.edu/fastlab/ticket/376

@rcurtin rcurtin self-assigned this Dec 29, 2014
@rcurtin rcurtin added this to the mlpack 1.1.0 milestone Dec 29, 2014
@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by siddharth.950 on 10 Mar 44943832 22:41 UTC
I noticed a problem with the test case, it uses LT(strict inequality) whereas the division of the node is not strict. However, looking at the values that is causing the test to fail, this is clearly not the issue. Also, the values don't seem to suggest a particular edge case that is being missed in the code. i386 is 32-bit, can it be an overflow(desperate speculation)? The description seems fine, unfortunately I cannot contribute more than this right now.

This also reminds me of a problem I had noticed with mlpack tests. We randomly initialize the data matrices used to test the implementations, but in armadillo each time the 'random' values are the same. This practically makes the test use the same matrices again and again and we are less likely to encounter any flaws in the code/test. I think we should set the seed as you have done in the description and then initialize the matrices.

@rcurtin
Member
rcurtin commented Dec 30, 2014

Commented by rcurtin on 6 May 44946312 07:54 UTC
I think the issue you've pointed out with random seeds was fixed in r17395: up until then, math::RandomSeed() wasn't setting the random seed properly for Armadillo matrices. The reason that the seeds aren't initialized to std::time(NULL) or whatever is for the sake of reproducibility. So if a particular test fails on some system, one can simply connect to the machine, get the compiled mlpack_test, and then use that to reproduce the issue. At least, that's the idea. :)

@rcurtin rcurtin removed their assignment Dec 30, 2014
@rcurtin rcurtin added D: difficult and removed C: mlpack labels Jan 8, 2015
@rcurtin rcurtin modified the milestone: mlpack 2.0.1, mlpack 2.0.0 Dec 23, 2015
@rcurtin rcurtin modified the milestone: mlpack 2.0.2, mlpack 2.0.1 Feb 4, 2016
@lozhnikov lozhnikov added a commit to lozhnikov/mlpack that referenced this issue Mar 17, 2016
@lozhnikov lozhnikov Fix bug #358 (CosineTreeTest/CosineNodeCosineSplit test sometimes fai…
…ls, only on i386)
7605291
@lozhnikov lozhnikov added a commit to lozhnikov/mlpack that referenced this issue Mar 17, 2016
@lozhnikov lozhnikov Fix bug #358 (CosineTreeTest/CosineNodeCosineSplit test sometimes fai…
…ls, only on i386)
86f7f2d
@lozhnikov lozhnikov added a commit to lozhnikov/mlpack that referenced this issue Mar 17, 2016
@lozhnikov lozhnikov Fix bug #358 (CosineTreeTest/CosineNodeCosineSplit test sometimes fai…
…ls, only on i386)
e5076db
@rcurtin
Member
rcurtin commented Apr 5, 2016

Fixed in #581.

@rcurtin rcurtin closed this Apr 5, 2016
@rcurtin rcurtin added the R: fixed label Apr 5, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment