DET has only one leaf prior to pruning every time #515

Closed
jeroneandrews opened this Issue Feb 4, 2016 · 4 comments

jeroneandrews commented Feb 4, 2016

For some reason the DET keeps giving me only one leaf node, both before and after pruning, for several datasets I have tried. Here is one of the datasets (attached):

5vs2Train1.txt

It contains 896 observations of the MNIST digit 5.

Any help towards solving this issue will be greatly appreciated.

Thanks in advance.

Output from terminal:
Jerones-MacBook:5vs2 Jerone$ mlpack_det -t 5vs2Train1.txt -T 5vs2Test.txt -e aTrainEst.txt -E aTestEst.txt -M aOutput.txt -v -f 0
[INFO ] Loading '5vs2Train1.txt' as CSV data. Size is 784 x 896.
[INFO ] Performing leave-one-out cross validation.
[INFO ] 1 leaf nodes in the tree using full dataset; minimum alpha: 1.79769e+308.
[INFO ] 1 trees in the sequence; maximum alpha: 0.
[INFO ] Optimal alpha: -1.
[INFO ] 1 leaf nodes in the optimally pruned tree; optimal alpha: -1.79769e+308.
[INFO ] Saving raw ASCII formatted data to 'aTrainEst.txt'.
[INFO ] Loading '5vs2Test.txt' as CSV data. Size is 784 x 1536.
[INFO ] Saving raw ASCII formatted data to 'aTestEst.txt'.
[INFO ]
[INFO ] Execution parameters:
[INFO ] folds: 0
[INFO ] help: false
[INFO ] info: ""
[INFO ] input_model_file: ""
[INFO ] max_leaf_size: 10
[INFO ] min_leaf_size: 5
[INFO ] output_model_file: aOutput.txt
[INFO ] test_file: 5vs2Test.txt
[INFO ] test_set_estimates_file: aTestEst.txt
[INFO ] training_file: 5vs2Train1.txt
[INFO ] training_set_estimates_file: aTrainEst.txt
[INFO ] verbose: true
[INFO ] version: false
[INFO ] vi_file: ""
[INFO ]
[INFO ] Program timers:
[INFO ] cross_validation: 12.433051s
[INFO ] det_estimation_time: 0.001443s
[INFO ] det_test_set_estimation: 0.002492s
[INFO ] det_training: 12.465670s
[INFO ] loading_data: 1.454027s
[INFO ] saving_data: 0.002742s
[INFO ] total_time: 13.933760s

@jeroneandrews jeroneandrews reopened this Feb 4, 2016

@rcurtin rcurtin added the T: defect label Feb 4, 2016

Member

rcurtin commented Feb 10, 2016

Diagnosis: the log negative error of a DET is defined as

R(t) = log(|t|^2 / (N^2 V_t)).

At the first level of this tree, the volume of the node is the entire volume spanned by the data, i.e. V is the product of the widths of all dimensions. But some dimensions have width 0 in this dataset, so V = 0 and R(t) = inf.
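To make the diagnosis concrete, here is a minimal Python sketch of that formula on a made-up root node. The function name and the three example widths are hypothetical, chosen only for illustration; this is not the mlpack implementation.

```python
import math

def log_negative_error(n_node, n_total, widths):
    """log(|t|^2 / (N^2 V_t)), with V_t the product of per-dimension widths."""
    volume = math.prod(widths)
    if volume == 0.0:
        # A zero volume makes the ratio, and hence its log, infinite.
        return math.inf
    return 2 * math.log(n_node) - 2 * math.log(n_total) - math.log(volume)

# Hypothetical root node holding all 896 points; the middle dimension is constant.
widths = [255.0, 0.0, 128.0]
root_error = log_negative_error(896, 896, widths)
print(root_error)
```

A single zero-width dimension is enough to zero out the whole product, no matter how wide the other dimensions are.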

I don't yet know how I want to handle this problem for the mlpack code; I need to review the paper and maybe send Pari an email or something depending on what I can come up with.

A quick solution is to add tiny bits of noise to your data points, or to drop any dimensions that have zero range (i.e. where every row has the same value in that dimension; here that value is 0).
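The second workaround could look roughly like the sketch below, written in plain Python with no dependencies. The helper names and the tiny in-memory datasets are assumptions for illustration; in practice the rows would come from the CSV files. The columns are chosen from the training set and the same columns are dropped from the test set, so both keep the same dimensionality.

```python
def constant_columns(rows):
    """Return indices of columns whose minimum equals their maximum."""
    return [j for j, col in enumerate(zip(*rows)) if min(col) == max(col)]

def drop_columns(rows, drop):
    """Return a copy of rows without the columns listed in drop."""
    drop = set(drop)
    return [[v for j, v in enumerate(row) if j not in drop] for row in rows]

# Hypothetical data: one point per row, one dimension per column.
train = [[0.0, 3.0, 7.0], [0.0, 5.0, 7.0], [0.0, 4.0, 7.0]]
test = [[0.0, 9.0, 7.0], [1.0, 2.0, 7.0]]

drop = constant_columns(train)           # zero-range columns in the training set
train_reduced = drop_columns(train, drop)
test_reduced = drop_columns(test, drop)  # same columns removed from the test set
```

Note that this checks for zero range rather than for all-zero values, so a column that is constant at any value is dropped as well.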

I'll keep digging and let you know what I think of.

jeroneandrews commented Feb 11, 2016

Thanks for your help. I'll try adding a bit of noise as a temporary solution.

Member

rcurtin commented Feb 18, 2016

I talked with Pari and we decided that the best idea was just to ignore the zero-variance dimensions in the log negative error calculation. This change has been made in 4e069ab and should fix your issue, so there should be no more need to add noise. Let me know if it doesn't and we can reopen the ticket. Thanks for reporting the issue! :)
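For readers following along, the idea of ignoring zero-variance dimensions can be sketched in Python as follows. This is only an illustration of the approach described above, with a hypothetical function name and made-up widths; see the referenced commit for the actual change.

```python
import math

def log_negative_error_skip_zero(n_node, n_total, widths):
    """log(|t|^2 / (N^2 V_t)), where V_t uses only dimensions of nonzero width."""
    # Work in log space and leave out zero-width dimensions entirely.
    log_volume = sum(math.log(w) for w in widths if w > 0.0)
    return 2 * math.log(n_node) - 2 * math.log(n_total) - log_volume

# Hypothetical root node holding all 896 points; the middle dimension is constant.
widths = [255.0, 0.0, 128.0]
err = log_negative_error_skip_zero(896, 896, widths)
```

With the constant dimension left out, the volume is computed from the remaining two widths and the result is finite.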

@rcurtin rcurtin closed this Feb 18, 2016

@rcurtin rcurtin added the R: fixed label Feb 18, 2016

@rcurtin rcurtin added this to the mlpack 2.0.2 milestone Feb 18, 2016

jaelim commented Apr 13, 2016

@jeroneandrews Hi, could you please provide your other test file?
