-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
PUBDEV-6754: provide actual ntree value #4078
PUBDEV-6754: provide actual ntree value #4078
Conversation
h2o-py/h2o/estimators/gbm.py
Outdated
@@ -953,3 +953,10 @@ def check_constant_response(self, check_constant_response): | |||
self._parms["check_constant_response"] = check_constant_response | |||
|
|||
|
|||
""" | |||
Returns actual number of trees in GBM model (whether CV is enabled or not). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is this related to CV?
I think we can mention early stopping instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is taken from the ticket description. Will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
print("number of trees built with early-stopping is:") | ||
print(prostate_gbm.ntree_actual()) | ||
|
||
assert prostate_gbm.ntree_actual() == prostate_gbm._model_json['output']['model_summary']['number_of_trees'][0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would also suggest checking that the model actually stopped early. That is actual number of trees < 1000.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
@@ -3,6 +3,19 @@ def update_param(name, param): | |||
param['values'].remove('ordinal') | |||
return param | |||
return None # param untouched | |||
|
|||
def class_extensions(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want the same for other tree algos as well? drf, xgboost, isofor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not specified on the ticket (required only for gbm) but already clarifying in #h2o-3-r-client after @michalkurka s suggestions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added for all tree algos
|
||
Type: ``float`` | ||
""" | ||
def ntree_actual(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont like the name. Everywhere we use ntrees, my suggestions:
- ntrees_actual
- ntrees_built
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taken from the ticket description. Will change :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
prostate = h2o.import_file(path=pyunit_utils.locate("smalldata/logreg/prostate.csv")) | ||
prostate[1] = prostate[1].asfactor() | ||
prostate.summary() | ||
ntrees_original = 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ideally I would chose a much lower number here, so that the test runs as fast as possible (same for R test)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test runs faster because of early stopping. trees_original is 1000 to give enough space for build to be early stopped and to be illustrative / that there is a difference between trees_original and trees_actual (so it builds ~20 trees for each model).
3026cce
to
a5435b6
Compare
h2o-py/h2o/model/model_base.py
Outdated
@@ -390,6 +390,20 @@ def scoring_history(self): | |||
print("No score history for this model") | |||
|
|||
|
|||
def ntrees_actual(self): | |||
""" | |||
Returns actual number of trees in GBM model. If early stopping enabled, GBM can reset the ntrees value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mentions GBM when in facts applies to other algos
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
@@ -390,6 +390,20 @@ def scoring_history(self): | |||
print("No score history for this model") | |||
|
|||
|
|||
def ntrees_actual(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure I like this approach, this adds this method to models where it does not make sense. @michalkurka what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point @honzasterba, this is however a common approach - this class already does have algo-specific method.
I would keep this for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to make a jira - I think we can fix this in the future (we can eg. dynamically remove methods from the instances).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👋
aff8d27
to
183e075
Compare
No description provided.