Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BUG max_depth=1 should be decision stump in HistGradientBoosti… #16182

Merged
merged 7 commits into from Jan 28, 2020

Conversation

SanthoshBala18
Copy link
Contributor

Reference Issues/PRs

Fixes #16124

What does this implement/fix? Explain your changes.

Modified the validation in TreeGrower to allow max_depth value as 1.

@SanthoshBala18 SanthoshBala18 changed the title Fixes issue #16124 Allow max_depth "1" in HistGradientBoosting Jan 23, 2020
@glemaitre glemaitre changed the title Allow max_depth "1" in HistGradientBoosting BUG max_depth=1 should be decision stump in HistGradientBoosting Jan 23, 2020
@glemaitre
Copy link
Contributor

You should look at the tests which are failing. I assume that there a lot of places where we check the number of nodes or the depth where the count will be off now.

@thomasjpfan
Copy link
Member

There is a few places where the depth is stored with the original definition in mind. For example, the node_struct stores depth based on the original definition.

This needs a test to assert that the predictors are stumps with max_depth==1.

@SanthoshBala18
Copy link
Contributor Author

@thomasjpfan, the test "test_max_depth" in test_grower.py asserts the tree depth to be 1 when max_depth is set to 1. If we need something more, can you please guide me on what needs to be tested.?

@thomasjpfan
Copy link
Member

The nodes stored in the predictor uses the previous definition of depth.

For this PR, one needs to go through the comments and docstrings in the _hist module to make sure it is using the updated definition of depth.

@NicolasHug
Copy link
Member

The nodes stored in the predictor uses the previous definition of depth.

I don't think that's true, the predictor nodes use the same convention as the grower node since we just do node['depth'] = grower_node.depth.

However yes, docstrings should be updated

@thomasjpfan
Copy link
Member

I don't think that's true, the predictor nodes use the same convention as the grower node since

On master, the root node is set to depth 0 and the children are set according, which means the node's depth was set using the tree-depth definition: "The depth of a node is the length of the path to its root". It looks like it was impossible to create stumps before this PR.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @SanthoshBala18 , looks good.

Please also udpate the docstrings of TreeGrower and HistGradientBoostingClassifier.

Let's also add a non-regression test to make sure that we have a stump when we pass max_depth=1, with e.g.

def assert_is_stump(grower):
	for leaf in (grower.root.left_child, grower.root.right_child):
		assert leaf.left_child is None
		assert leaf.right_child is None

you can use that in test_max_depth, or create a similar new test.

@@ -689,8 +689,7 @@ class HistGradientBoostingRegressor(RegressorMixin, BaseHistGradientBoosting):
than 1. If None, there is no maximum limit.
max_depth : int or None, optional (default=None)
The maximum depth of each tree. The depth of a tree is the number of
nodes to go from the root to the deepest leaf. Must be strictly greater
than 1. Depth isn't constrained by default.
nodes to go from the root to the deepest leaf. Depth isn't constrained by default.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
nodes to go from the root to the deepest leaf. Depth isn't constrained by default.
edges to go from the root to the deepest leaf. Depth isn't constrained by default.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also update the classifier's docstring

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NicolasHug Thanks for the review. I have updated the docstrings and also added a test accordingly.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @SanthoshBala18 , LGTM

Please add an entry to the change log at doc/whats_new/v0.23.rst. Like the other entries there, please reference this pull request with :issue: and credit yourself with :user:.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -90,6 +90,9 @@ Changelog
:user:`Reshama Shaikh <reshamas>`, and
:user:`Chiara Marmo <cmarmo>`.

- |Fix| Fixed HistGradientBoosting to allow max_depth=1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- |Fix| Fixed HistGradientBoosting to allow max_depth=1
- |Fix| Changed the convention for `max_depth` parameter of :class:`ensemlble.HistGradientBoostingClassifier` and :class:`ensemlble.HistGradientBoostingRegressor`. The depth now corresponds to the number of edges to go from the root to the deepest leaf. Stumps (trees with one split) are now allowed.

(skip lines where appropriate ;) )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry my bad. Would update the changes properly from next time :)

@@ -90,6 +90,13 @@ Changelog
:user:`Reshama Shaikh <reshamas>`, and
:user:`Chiara Marmo <cmarmo>`.

- |Fix| Changed the convention for `max_depth` parameter of
:class:`ensemlble.HistGradientBoostingClassifier` and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
:class:`ensemlble.HistGradientBoostingClassifier` and
:class:`ensemble.HistGradientBoostingClassifier` and

same below ;)

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomasjpfan thomasjpfan changed the title BUG max_depth=1 should be decision stump in HistGradientBoosting BUG max_depth=1 should be decision stump in HistGradientBoosti… Jan 28, 2020
@thomasjpfan thomasjpfan merged commit cc2fbed into scikit-learn:master Jan 28, 2020
@thomasjpfan
Copy link
Member

Thank you @SanthoshBala18 for the PR!

@SanthoshBala18 SanthoshBala18 deleted the bug-#16124 branch January 29, 2020 05:42
thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Feb 22, 2020
panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Off-by-one in HistGradientBoosting max_depth
5 participants