Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Binding test for hoeffding_tree #1211

Merged
merged 10 commits into from
Feb 16, 2018

Conversation

haritha1313
Copy link
Contributor

@haritha1313 haritha1313 commented Jan 23, 2018

{
arma::mat inputData;
DatasetInfo info;
if (!data::Load("tae.csv", inputData, info))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think we could reuse an existing dataset here e.g. thyroid or reconstruct the iris dataset, the repo size is already quite high, so I think whenever we can we should avoid adding datasets. Let me know what you think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of course, a dataset with last column as labels was needed, but modifying iris would do the trick too. 👍

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tests! This is a good start but I think there are still some other things we should test. You can add them here in this PR if you like, or we can merge this and then we can note that more testing is needed in #1152---just let me know what you'd like to do.

We should test these options:

  • min_samples: maybe we can build two models, one with very small min_samples and one with very large min_samples, and ensure that the model with small min_samples is larger (i.e. has more nodes in it).
  • max_samples: same idea as min_samples, I think that will work.
  • confidence: a model with lower confidence should split more often, so we can have the same strategy as min_samples and max_samples.
  • numeric_split_strategy: when using the 'domingos' strategy, we can use a numeric dataset and check that as we change 'bins', the number of children of the root node change. For the 'binary' strategy, we can change 'bins' and ensure that we always have 2 children.
  • passes: we can check that a tree with more passes has more children (since it will split more).
  • observations_before_binning: hmm, I think that maybe we can set min_samples to 10, use the 'domingos' split, set observations_before_binning to 100, set the confidence really low, and then feed it only 50 samples and ensure that it didn't split (since it shouldn't split until observations_before_binning samples are seen).

Maybe there are better ways to do those tests... those are just the ideas off the top of my head. Anyway, let me know how you want to proceed. If you want to merge this now and the other tests can be done by someone else (or later by you) that's just fine of course.

@haritha1313
Copy link
Contributor Author

@rcurtin Thanks for the review. I can work on adding the tests, no problem. Let this PR remain open, I'll commit them in a few days time since I'm also trying to implement something in src/mlpack/methods/ann (and have exams coming too).

@rcurtin
Copy link
Member

rcurtin commented Jan 25, 2018

Sure, no huge hurry. Good luck with your exams! :)

@rcurtin
Copy link
Member

rcurtin commented Feb 5, 2018

Just FYI, this probably needs to be updated to handle the changes in #1214. Here is an example of the type of changes that might need to be made:
https://github.com/mlpack/mlpack/pull/1214/files#diff-8fcb02fea0b4d1d0ee48e346e252a10b

@haritha1313
Copy link
Contributor Author

@rcurtin Regarding the test for numeric_split_strategy, I don't think there exists any method to access the number of children of any node in the HoeffdingTreeModel class. Shall I write a utility function similar to HoeffdingTreeModel::NumNodes(), which can count NumChildren() or is there some other way around?

@rcurtin
Copy link
Member

rcurtin commented Feb 7, 2018

Hmm, I think that NumNodes() should work fine for you here---it should print the full number of nodes in the tree, which is sufficient for the test. If the tree only has one level (which you can control by passing in the right amount of data, or setting max_samples carefully), then I think it will be fine. Let me know you'd like me to clarify what I mean; I hope what I wrote makes sense but I am not sure. :)

@haritha1313
Copy link
Contributor Author

I think the tests are working fine although I can't understand the travis build problem. Please let me know if any changes are to be made.

@zoq
Copy link
Member

zoq commented Feb 8, 2018

We are working on a fix for the travis build: travis-ci/travis-ci#9129 and #1221.


// Check that both models have different number of nodes.
BOOST_CHECK_PREDICATE(std::not_equal_to<int>(),
((CLI::GetParam<HoeffdingTreeModel*>("output_model"))->NumNodes())(nodes));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just use BOOST_REQUIRE_NE(...) here? Seems to me like that would be simpler.

@rcurtin
Copy link
Member

rcurtin commented Feb 8, 2018

All of the tests look good to me, so if you want to handle the one comment I left I think we can get this merged. Thanks for the nice tests! :)

@haritha1313
Copy link
Contributor Author

I have made the edit (some silly confusion caused it). It is great that the travis issue is being solved, failing travis build was becoming a persistent thing. :)

Copy link
Member

@rcurtin rcurtin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks for the contribution! I'll go ahead and merge this in 4 days to leave time in case anyone else has any other comments.

Copy link
Member

@zoq zoq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, no comments from my end.

@rcurtin rcurtin merged commit 617a2ac into mlpack:master Feb 16, 2018
@rcurtin
Copy link
Member

rcurtin commented Feb 16, 2018

Thanks for the nice contribution! :)

@haritha1313 haritha1313 deleted the hoeffding_binding_test branch February 16, 2018 15:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants