predict_leaf_node_assignment path/terminal node doesn't match #8048
Comments
Satish Maruvada commented: Could we implement this as part of a patch release? The customer has pinged us multiple times about this. [~accountid:5d3a21774ee45b0c8fec4afb] fyi!
Niki Athanasiadou commented: Yes, there is a lot of interest in leaf node path assignments there, for h2o-3 as well as DAI. Thanks [~accountid:5d6fedeb294c590c3ae52968].
Michal Kurka commented: Fixed for trees with {{max_depth}} up to 63. For {{max_depth}} of 64 or higher, the code will no longer produce incorrect results; it will instead return {{NA}} for tree paths and {{-1}} for node ids (for paths and nodes that are “too” deep). [~accountid:5d6fedeb294c590c3ae52968] [~accountid:5d3a21774ee45b0c8fec4afb] it is possible to make the code work for any tree depth; however, this change is more involved and IMHO out of the scope of a fix release. If such functionality is desired, please file a new jira.
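To see how many rows are affected by the depth limit described above, one option is to count the {{NA}} paths and {{-1}} node ids in the prediction results. The following is only a minimal sketch: it assumes the two result columns have already been pulled into plain Python lists (for example through pandas), and {{count_too_deep}} is a hypothetical helper name, not part of the h2o API.

```python
from typing import Optional, Sequence


def _is_na(value) -> bool:
    """Treat None and float NaN (how pandas usually represents NA) as missing."""
    return value is None or (isinstance(value, float) and value != value)


def count_too_deep(paths: Sequence[Optional[str]], node_ids: Sequence[int]) -> dict:
    """Count rows whose leaf was reported as NA (path) or -1 (node id).

    Hypothetical helper: `paths` and `node_ids` are plain Python sequences
    extracted from the two prediction frames, one entry per row.
    """
    if len(paths) != len(node_ids):
        raise ValueError("paths and node_ids must have the same length")
    return {
        "na_paths": sum(1 for p in paths if _is_na(p)),
        "missing_node_ids": sum(1 for n in node_ids if n == -1),
    }
```

If both counts are zero, no row in that tree ended up in a leaf deeper than the supported limit.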
Michal Kurka commented: [~accountid:557058:6e44bc1a-dd50-499b-a331-2e049f28773b] what would be a good place to document these limitations?
Angela Bartz commented: [~accountid:557058:04659f86-fbfe-4d01-90c9-146c34df6ee6] I think it can go in a couple of places. We can add a note in the "Predict Leaf Node Assignment" section: I think we can also add information about how the value of max_depth affects leaf node prediction in the max_depth parameter appendix entry:
It’s possible for {{model.predict_leaf_node_assignment(df, type='path')}} to produce a different number of unique elements than {{model.predict_leaf_node_assignment(df, type='Node_ID')}}.
{code:python}
import h2o
from h2o.estimators.random_forest import H2ORandomForestEstimator
from h2o.tree import H2OTree

h2o.init(min_mem_size='5G')

# A single, very deep tree is enough to reproduce the mismatch.
model = H2ORandomForestEstimator(ntrees=1, max_depth=1000)

df = h2o.import_file('/Users/jgranados/datasets/kaggle/ieee-fraud-detection/train_transaction.csv')

# Use every column except the response as a predictor.
y = 'isFraud'
x = df.columns
x.remove(y)

model.train(x=x, y=y, training_frame=df)
tree = H2OTree(model=model, tree_number=0)

path_assignment = model.predict_leaf_node_assignment(df, type='path')
node_assignment = model.predict_leaf_node_assignment(df, type='Node_ID')

# Each path should correspond to one terminal node, so this should be True;
# the bug is that the two counts can differ.
len(path_assignment.unique()) == len(node_assignment.unique())
{code}
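Comparing only the number of unique values shows that something is off, but not which rows disagree. A possible way to narrow it down is to check that paths and node ids map one-to-one. This is a sketch under assumptions: {{find_mismatches}} is a hypothetical helper, and it expects the two result columns as plain Python lists of equal length with NA rows already filtered out.

```python
from collections import defaultdict


def find_mismatches(paths, node_ids):
    """Return paths that map to several node ids, and node ids that map to several paths.

    Hypothetical helper: both arguments are plain Python sequences with one
    entry per row, taken from the 'path' and 'Node_ID' prediction results.
    """
    path_to_nodes = defaultdict(set)
    node_to_paths = defaultdict(set)
    for path, node_id in zip(paths, node_ids):
        path_to_nodes[path].add(node_id)
        node_to_paths[node_id].add(path)

    # In a consistent result every path has exactly one node id and vice versa.
    ambiguous_paths = {p: ns for p, ns in path_to_nodes.items() if len(ns) > 1}
    ambiguous_nodes = {n: ps for n, ps in node_to_paths.items() if len(ps) > 1}
    return ambiguous_paths, ambiguous_nodes
```

Two empty dictionaries mean the assignments agree; any entry points at a path (or node id) whose rows were assigned inconsistently.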