Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PUBDEV-6754: provide actual ntree value #4078

Conversation

koniecsveta
Copy link
Contributor

No description provided.

@michalkurka michalkurka changed the title PUBDEV-6754: provide actual ntee value PUBDEV-6754: provide actual ntree value Nov 18, 2019
@@ -953,3 +953,10 @@ def check_constant_response(self, check_constant_response):
self._parms["check_constant_response"] = check_constant_response


"""
Returns actual number of trees in GBM model (whether CV is enabled or not).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this related to CV?

I think we can mention early stopping instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is taken from the ticket description. Will fix.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

print("number of trees built with early-stopping is:")
print(prostate_gbm.ntree_actual())

assert prostate_gbm.ntree_actual() == prostate_gbm._model_json['output']['model_summary']['number_of_trees'][0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also suggest checking that the model actually stopped early. That is actual number of trees < 1000.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -3,6 +3,19 @@ def update_param(name, param):
param['values'].remove('ordinal')
return param
return None # param untouched

def class_extensions():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want the same for other tree algos as well? drf, xgboost, isofor?

Copy link
Contributor Author

@koniecsveta koniecsveta Nov 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not specified on the ticket (required only for gbm) but already clarifying in #h2o-3-r-client after @michalkurka s suggestions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added for all tree algos


Type: ``float``
"""
def ntree_actual(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont like the name. Everywhere we use ntrees, my suggestions:

  • ntrees_actual
  • ntrees_built

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taken from the ticket description. Will change :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@honzasterba honzasterba left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

prostate = h2o.import_file(path=pyunit_utils.locate("smalldata/logreg/prostate.csv"))
prostate[1] = prostate[1].asfactor()
prostate.summary()
ntrees_original = 1000
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally I would chose a much lower number here, so that the test runs as fast as possible (same for R test)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test runs faster because of early stopping. trees_original is 1000 to give enough space for build to be early stopped and to be illustrative / that there is a difference between trees_original and trees_actual (so it builds ~20 trees for each model).

@koniecsveta koniecsveta force-pushed the zuzanao_PUBDEV-6754-Add_method_to_provide_actual_ntree_value branch from 3026cce to a5435b6 Compare November 20, 2019 17:16
@@ -390,6 +390,20 @@ def scoring_history(self):
print("No score history for this model")


def ntrees_actual(self):
"""
Returns actual number of trees in GBM model. If early stopping enabled, GBM can reset the ntrees value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mentions GBM when in facts applies to other algos

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -390,6 +390,20 @@ def scoring_history(self):
print("No score history for this model")


def ntrees_actual(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I like this approach, this adds this method to models where it does not make sense. @michalkurka what do you think?

Copy link
Contributor

@michalkurka michalkurka Nov 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point @honzasterba, this is however a common approach - this class already does have algo-specific method.

I would keep this for now.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you want to make a jira - I think we can fix this in the future (we can eg. dynamically remove methods from the instances).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👋

@koniecsveta koniecsveta force-pushed the zuzanao_PUBDEV-6754-Add_method_to_provide_actual_ntree_value branch from aff8d27 to 183e075 Compare November 22, 2019 09:00
@koniecsveta koniecsveta merged commit 4a3f679 into rel-yau Nov 22, 2019
@koniecsveta koniecsveta deleted the zuzanao_PUBDEV-6754-Add_method_to_provide_actual_ntree_value branch November 22, 2019 13:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants