meaning of tree failure. #23
That's extremely odd. That error-check code is there to test the internal consistency of the implementation. Can you provide your data so I can take a look?
What it means, essentially, is that during the tree-search part of the neighbor search algorithm, it found zero neighbors for a point. That should not be possible. I would really appreciate knowing the details of your dataset and the parameters you were using. I'm guessing you found some sort of edge case, and I should add a check for it.
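To make the invariant described above concrete, here is a minimal Python sketch. It is not the package's code: the function name `heaps_to_rows`, the `(distance, id)` heap layout, and the `-1` padding are all assumptions chosen for illustration. It copies each point's candidate heap into a fixed-width row and raises if a point ended the search with no candidates at all.

```python
import heapq

def heaps_to_rows(heaps, k):
    """Copy each point's candidates into a row of k neighbor ids.

    `heaps` holds one list of (distance, neighbor_id) pairs per point.
    Rows with fewer than k candidates are padded with -1 (an assumption
    made for this sketch, not taken from the package).
    """
    rows = []
    for point, heap in enumerate(heaps):
        if not heap:
            # Hypothetical analogue of the "tree failure" consistency check.
            raise RuntimeError(f"tree failure: no neighbors found for point {point}")
        nearest = heapq.nsmallest(k, heap)  # sorted by distance, closest first
        ids = [neighbor_id for _, neighbor_id in nearest]
        rows.append(ids + [-1] * (k - len(ids)))
    return rows
```

A point with fewer than `k` candidates is tolerated here; only a point with zero candidates is treated as an internal error.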
Thank you for the explanation. After running largeVis successfully on my full dataset (1600 dimensions, 900K records), I wanted to understand how the data behaves at lower dimensionality, so I used feature reduction to reduce the 1600 dimensions. Another point: my working machine is not on the latest version. Version 580b2d2 runs out of memory about 70% of the way through randomProjectionTreeSearch, so it seems to use more memory than 654da27, which handles all the data (1600*9000K) smoothly. Although I built 580b2d2 successfully with gcc 4.9.3 as you advised, I went back to the old version because of the memory issue. I will try updating to the latest version to see whether the exception occurs again.
Can you elaborate on the memory issue, and is it possible to see this data? The relevant code in neighbor search hasn't changed in quite some time, so memory usage in that phase should be constant. And reducing dims to ~30 shouldn't affect the tree search at all. (What might affect it are NAs and NaNs, though.) Thank you for reporting this! I'd really appreciate your help nailing it down.
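Since missing values are named above as something that might affect the tree search, it can help to scan the reduced data for them before running the neighbor search. A minimal Python sketch, assuming the data is available as a list of numeric rows (the function name `rows_with_missing` is hypothetical and not part of the package):

```python
import math

def rows_with_missing(rows):
    """Return the indices of rows that contain None or NaN."""
    bad = []
    for i, row in enumerate(rows):
        if any(v is None or (isinstance(v, float) and math.isnan(v)) for v in row):
            bad.append(i)
    return bad

# Example: the second row contains a NaN and the third a missing value.
data = [[0.1, 0.2], [float("nan"), 1.0], [3.0, None]]
flagged = rows_with_missing(data)
```

Any flagged rows can then be dropped or imputed before the data is passed on.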
https://github.com/sparktsao/casetreefail
The function does have random behavior as part of the algorithm, but that error should never occur. Thank you for posting the data - I will take a look tonight.
Wait a sec... Your log seems to show that the current version performs properly. You're only getting the error on the old release 0.1.5, is that right?
But why not use the current version?
And yes, the latest version only outputs a warning message, without 'tree failure'.
Can you show me the data where the current version died? It should not be less memory efficient at all.
Actually, one thing that did change after 0.1.6 was the default parameters. So what may be happening is that it's using default settings, probably for tree_threshold, that use more RAM. The reason for the change is to emulate the settings of the paper authors' reference code. Try tamping down the tree threshold. They set it way too big on high-D data.
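As a rough illustration of why a larger tree threshold can cost more memory, here is a toy Python sketch. It is not the package's algorithm: it assumes a simplified random projection tree that splits at the median of a random projection until a node holds at most `threshold` points, and it counts the candidate pairs the leaves would generate. All function names are hypothetical.

```python
import random

def leaf_sizes(points, threshold, rng):
    """Sizes of the leaves of one toy random projection tree."""
    if len(points) <= threshold:
        return [len(points)]
    dim = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    # Order points by their projection onto the random direction, split in half.
    ordered = sorted(points, key=lambda p: sum(a * b for a, b in zip(p, direction)))
    mid = len(ordered) // 2
    return (leaf_sizes(ordered[:mid], threshold, rng)
            + leaf_sizes(ordered[mid:], threshold, rng))

def candidate_count(points, threshold, n_trees, seed=0):
    """Total (point, candidate) pairs across all trees, before deduplication."""
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trees):
        # Every point in a leaf of size s gets the other s - 1 points as candidates.
        total += sum(s * (s - 1) for s in leaf_sizes(points, threshold, rng))
    return total

rng = random.Random(1)
points = [[rng.random() for _ in range(5)] for _ in range(64)]
small = candidate_count(points, threshold=4, n_trees=3)
large = candidate_count(points, threshold=16, n_trees=3)
```

In this toy model the candidate count grows roughly in proportion to the threshold times the number of trees, which is the direction of the effect described above. How the real implementation stores candidates may differ.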
@sparktsao I just tried it, and with the default settings, it ran and completed on my machine in less than 3 seconds. It did not take long enough for me to even measure how much RAM was being used. I tried it up to K = 100. (I do need to adjust that progress bar a bit...) The reason why you're getting fewer neighbors found than you're looking for, by the way, is that approximately 1/3 of your dataset are duplicates.
Is there anything else you can do to help me reproduce the issue you're having?
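For anyone wanting to check their own data for the duplicate rows mentioned above, a minimal Python sketch follows. It assumes the dataset is a list of rows and counts only exact duplicates; the function name `duplicate_fraction` is hypothetical and not part of the package.

```python
from collections import Counter

def duplicate_fraction(rows):
    """Fraction of rows that are exact repeats of an earlier row."""
    if not rows:
        return 0.0
    counts = Counter(tuple(row) for row in rows)
    repeats = sum(c - 1 for c in counts.values())
    return repeats / len(rows)

# Example: one of the three rows repeats an earlier one.
frac = duplicate_fraction([[1, 2], [1, 2], [3, 4]])
```

Rows that differ only by floating-point noise are not caught by this check; it only detects exact matches.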
The data I prepared is the minimal dataset with which I can reproduce the tree failure in build 654da27; it is not meant to demonstrate the memory issue. The change in default settings might explain the memory issue I ran into. I will now try tuning the tree threshold parameter to find a good configuration for my large dataset, and I will report back if I hit the memory issue again. Thanks so much for your help again.
OK, I'm going to close this issue. Regarding the tree threshold, I suggest you look at the benchmarks vignette. It includes a detailed discussion of how changing the threshold, the number of trees, and the number of exploration iterations affects performance, memory usage, and accuracy. It is intended to be helpful to folks dealing with issues like yours. If it doesn't get you to where you need to go, let me know and I'll try to improve it.
Thank you for this great work.
I have a question: some datasets lead to a "tree failure" exception in the function "copyHeapToMatrix".
What does it mean? How can I avoid it when preparing the dataset?