
predict_leaf_node_assignment path/terminal node doesn't match #8048

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 6 comments

Comments

@exalate-issue-sync

For the same frame, `model.predict_leaf_node_assignment(df, type='Path')` can produce a different number of unique values than `model.predict_leaf_node_assignment(df, type='Node_ID')`, even though each leaf should have exactly one path and exactly one node ID.

```python
import h2o
from h2o.estimators.random_forest import H2ORandomForestEstimator
from h2o.tree import H2OTree

h2o.init(min_mem_size='5G')

# A single, very deep tree makes the mismatch easy to reproduce.
model = H2ORandomForestEstimator(
    ntrees=1,
    max_depth=1000
)

df = h2o.import_file('/Users/jgranados/datasets/kaggle/ieee-fraud-detection/train_transaction.csv')

x = df.columns
y = 'isFraud'
x.remove(y)

model.train(x=x, y=y, training_frame=df)

# Handle to the single trained tree (not needed for the check below).
tree = H2OTree(model=model, tree_number=0)

# Each row is assigned to one leaf, reported either as a root-to-leaf path or as a node ID.
path_assignment = model.predict_leaf_node_assignment(df, type='Path')
node_assignment = model.predict_leaf_node_assignment(df, type='Node_ID')

# Both views should yield the same number of distinct leaves; this prints False when the bug occurs.
print(len(path_assignment.unique()) == len(node_assignment.unique()))
```
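The comparison at the end of the reproduction relies on a simple invariant: every row lands in exactly one leaf, and every leaf has exactly one root-to-leaf path and one node ID, so the two assignment types should produce the same number of distinct values. The sketch below demonstrates that invariant on a hand-built, hypothetical three-leaf tree in plain Python; `Node`, `assign`, and the `'L'`/`'R'` path notation are illustrative assumptions, not H2O's internal representation.

```python
# Hypothetical toy tree, used only to illustrate the invariant the report checks:
# every leaf has exactly one root-to-leaf path and exactly one node ID, so the
# number of distinct paths must equal the number of distinct leaf node IDs.
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    node_id: int
    feature: Optional[int] = None  # None marks a leaf
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def assign(root: Node, row: list) -> tuple:
    """Return (path, leaf_node_id) for one row; the path records 'L'/'R' per split."""
    node, path = root, []
    while node.feature is not None:
        if row[node.feature] < node.threshold:
            path.append("L")
            node = node.left
        else:
            path.append("R")
            node = node.right
    return "".join(path), node.node_id


# Depth-2 toy tree with three leaves (node IDs 2, 3, 4).
tree = Node(0, feature=0, threshold=0.5,
            left=Node(1, feature=1, threshold=0.5, left=Node(3), right=Node(4)),
            right=Node(2))

rows = [[0.1, 0.2], [0.1, 0.9], [0.9, 0.0], [0.7, 0.7]]
assignments = [assign(tree, r) for r in rows]
paths = {p for p, _ in assignments}
node_ids = {n for _, n in assignments}

# The check from the report: both views must yield the same number of leaves.
print(len(paths) == len(node_ids))
```

If the two counts ever differ, some leaf must have been reported under two different paths or two different IDs, which is exactly the symptom described in this issue.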

@exalate-issue-sync
Author

Satish Maruvada commented: Could we include this as part of a patch release? The customer has been pinging us repeatedly about this. [~accountid:5d3a21774ee45b0c8fec4afb] FYI!

@exalate-issue-sync
Author

Niki Athanasiadou commented: Yes, there is a lot of interest in leaf node path assignments there, for h2o-3 as well as DAI. Thanks [~accountid:5d6fedeb294c590c3ae52968].

@exalate-issue-sync
Author

Michal Kurka commented: Fixed for trees with `max_depth` up to 63. For `max_depth` of 64 or higher, the code no longer produces incorrect results; instead, it returns `NA` for tree paths and `-1` for node IDs (for paths and nodes that are “too” deep).

[~accountid:5d6fedeb294c590c3ae52968] [~accountid:5d3a21774ee45b0c8fec4afb] It is possible to make the code work for any tree depth; however, this change is more involved and, IMHO, out of scope for a fix release. If such functionality is desired, please file a new JIRA.
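The thread does not say how the 63-level ceiling arises. One plausible reading, offered here purely as an assumption, is that a path is packed into a fixed-width 64-bit value. The sketch below is a hypothetical illustration, not H2O's implementation: it stores one bit per split after a leading sentinel bit, so a depth-`d` path needs `d + 1` bits and at most 63 splits fit in 64 bits. Paths that are too deep map to `None` and `-1`, mirroring the `NA`/`-1` behavior described above. `encode_path`, `decode_path`, and `assignment_or_missing` are illustrative names.

```python
# Hypothetical sketch: NOT H2O's actual implementation. It only shows how a
# fixed-width 64-bit path encoding imposes a depth ceiling of 63.
MAX_BITS = 64
MAX_DEPTH = MAX_BITS - 1  # one bit is reserved for the leading sentinel


def encode_path(path: str):
    """Pack an 'L'/'R' path into an int below 2**64, or return None if too deep."""
    # Python ints never overflow, so the 64-bit limit is enforced explicitly.
    if len(path) > MAX_DEPTH:
        return None
    code = 1  # sentinel bit marks where the path starts
    for step in path:
        code = (code << 1) | (1 if step == "R" else 0)
    return code


def decode_path(code: int) -> str:
    """Invert encode_path by reading the bits that follow the sentinel."""
    bits = bin(code)[3:]  # drop the '0b' prefix and the sentinel '1'
    return "".join("R" if b == "1" else "L" for b in bits)


def assignment_or_missing(path: str, node_id: int):
    """Mirror the post-fix behavior: (None, -1) for paths that are too deep."""
    code = encode_path(path)
    return (None, -1) if code is None else (code, node_id)


print(encode_path("R" * 63) == 2**64 - 1)  # deepest path that still fits
print(assignment_or_missing("L" * 64, 7))  # too deep: reported as missing
```

The sentinel bit is what lets `"L"` and `"LL"` encode to different integers (2 and 4); it is also why only 63 of the 64 bits remain for split decisions.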

@exalate-issue-sync
Author

Michal Kurka commented: [~accountid:557058:6e44bc1a-dd50-499b-a331-2e049f28773b] What would be a good place to document these limitations?

@exalate-issue-sync
Author

Angela Bartz commented: [~accountid:557058:04659f86-fbfe-4d01-90c9-146c34df6ee6] I think it can go in a couple of places. We can add a note in the "Predict Leaf Node Assignment" section:
https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/product/performance-and-prediction.rst#predicting-leaf-node-assignment

I think we can also add information about how the value of max_depth affects leaf node prediction in the max_depth parameter appendix entry:
https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/product/data-science/algo-params/max_depth.rst

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Migration Info

Jira Issue: PUBDEV-7590
Assignee: Michal Kurka
Reporter: Joseph Granados
State: Resolved
Fix Version: 3.30.0.5
Attachments: N/A
Development PRs: Available

Linked PRs from JIRA

#4673
#4679
#4694
