DRF can create incorrect leaf nodes due to rounding errors #7407

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 2 comments

Comments

@exalate-issue-sync

Observations can be assigned to incorrect leaf nodes.

A tree can have a total weight that is larger than the actual total weight of the observations in the dataset.
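To make the failure mode concrete, here is a minimal sketch of how a rounding error can send the same observation to different sides of a split. It is not taken from the H2O code base: the helper names `to_float32` and `goes_right` are hypothetical, and the single-precision versus double-precision mismatch is only one assumed way such a discrepancy could arise.

```python
import struct


def to_float32(x):
    """Round a Python float (double precision) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def goes_right(value, threshold):
    """Hypothetical split rule: values >= threshold go to the right child."""
    return value >= threshold


threshold = 0.1  # split point computed in double precision
value = 0.1      # an observation sitting exactly on the split point

# Decision made with the double-precision threshold.
side_with_double = goes_right(value, threshold)

# Decision made after the threshold was rounded to single precision.
# float32(0.1) is slightly larger than the double 0.1, so the outcome flips.
side_with_float32 = goes_right(value, to_float32(threshold))

print(side_with_double, side_with_float32)
```

If one stage of tree building or scoring used the first decision and another used the second, the observation would be counted in one node and routed to a different one, which is consistent with the two symptoms listed above.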

Repro example on 3.33.1.5511:

{code:python}
import h2o
from h2o.estimators import H2ORandomForestEstimator
h2o.init()

h2o_df = h2o.import_file("/Users/jgranados/datasets/BNPParibas.csv")

predictors = h2o_df.columns
response = "v58"
predictors.remove(response)

cars_drf = H2ORandomForestEstimator(
    ntrees=1,
    max_depth=10,
    min_rows=1,
    nbins=4096,
    nbins_top_level=4096,
    min_split_improvement=1e-06,
    seed=12
)
cars_drf.train(
    x=predictors,
    y=response,
    training_frame=h2o_df,
)

from h2o.tree import H2OTree

node_assignments_num = cars_drf.predict_leaf_node_assignment(h2o_df, type="Node_ID")

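tree = H2OTree(model=cars_drf, tree_number=0)

# Collect the IDs of the leaf nodes (a node with no left child is a leaf).
children = []
for i in range(len(tree)):
    if tree.left_children[i] == -1:
        children.append(tree.node_ids[i])

# Compare the number of distinct assigned nodes with the number of leaves in the tree.
print(len(node_assignments_num.unique()))
print(len(children))
print(len(node_assignments_num.unique()) == len(children))
{code}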
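Comparing only the two counts does not show which nodes disagree. The sketch below is one possible way to narrow that down. The helper `compare_leaf_sets` is hypothetical and not part of the h2o API. It works on plain Python lists and assumes that the node IDs returned by `predict_leaf_node_assignment(..., type="Node_ID")` use the same numbering as `tree.node_ids`.

```python
def compare_leaf_sets(assigned_ids, leaf_ids):
    """Report leaves that received no observation and assigned IDs that are not leaves."""
    assigned = set(assigned_ids)
    leaves = set(leaf_ids)
    return {
        "unvisited_leaves": sorted(leaves - assigned),
        "non_leaf_assignments": sorted(assigned - leaves),
    }


# Toy data standing in for the real assignments and leaf IDs.
report = compare_leaf_sets(assigned_ids=[3, 4, 4, 7], leaf_ids=[3, 4, 5])
print(report)
```

With the repro above, `leaf_ids` would be the `children` list, and `assigned_ids` would be the assigned node IDs pulled into Python first (for instance by converting the assignment frame with `as_data_frame()`).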

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Details

Jira Issue: PUBDEV-8246
Assignee: Michal Kurka
Reporter: Joseph Granados
State: Resolved
Fix Version: 3.32.1.5
Attachments: N/A
Development PRs: Available

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

Linked PRs from JIRA

#5583
#5590

@h2o-ops closed this as completed May 14, 2023