H2OIsolationForestEstimator: clarify formula used for score calculation #7412

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 4 comments

@exalate-issue-sync

I am trying to use Isolation Forest for a fraud detection project at one of the world's largest audit companies, so it is crucial for us to know which formula is used to compute the score returned by the predict method of H2OIsolationForestEstimator. Could you please point me to the underlying formula used to calculate the score, as well as to the corresponding code in the source?

@exalate-issue-sync
Author

Adam Valenta commented: Hello Roman,

Thank you for your question. We will add this information to the documentation.
The `predict` column in the output of the `H2OIsolationForestEstimator` `predict` method is the normalized `mean_length`.

https://github.com/h2oai/h2o-3/blob/753e852e2d193e59e39a3263f90c9fac0fed74c1/h2o-algos/src/main/java/hex/tree/isofor/IsolationForestModel.java#L146

https://github.com/h2oai/h2o-3/blob/753e852e2d193e59e39a3263f90c9fac0fed74c1/h2o-algos/src/main/java/hex/tree/isofor/IsolationForestModel.java#L162
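One plausible reading of the two linked lines, written out as a formula: an inverted min-max scaling of the path length. This is a sketch under assumptions, not quoted from the source; `L(x)` is a hypothetical name for the path length of point `x`, measured the same way as the training-time minimum and maximum. It is consistent with the behaviour described in this comment (higher means more anomalous, and values above 1 occur when `L(x) < minPathLength`), but check the linked `IsolationForestModel.java` lines before relying on it.

```latex
\mathrm{predict}(x) = \frac{\mathrm{maxPathLength} - L(x)}{\mathrm{maxPathLength} - \mathrm{minPathLength}}
```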

`minPathLength` and `maxPathLength` are assigned during training. It can happen that an anomalous point has a `predict` value greater than 1.
A higher `predict` value means a more anomalous point. We use neither the anomaly score formula from the Isolation Forest paper (Equation (2)) nor the paper's estimate of the average path length of an unsuccessful search (Equation (1)).
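To make the difference concrete, here is a minimal Python sketch contrasting the two scores. It assumes the H2O-style score is the inverted min-max normalization `(maxPathLength - L) / (maxPathLength - minPathLength)`, which matches the properties stated above but is an assumption, not a copy of H2O's code. For contrast it also implements the paper's `c(n)` (Equation (1)) and `s(x, n)` (Equation (2)) from Liu et al. (2008). All function names are hypothetical and are not part of the H2O API.

```python
import math

EULER_GAMMA = 0.5772156649015329


def h2o_style_score(path_length, min_path_length, max_path_length):
    """Hypothetical sketch: inverted min-max normalization of a path length.

    Shorter paths (points isolated quickly) get higher scores, and a path
    shorter than the training minimum yields a score above 1.
    """
    if max_path_length <= min_path_length:
        # Degenerate range; this sketch returns 1.0 to avoid dividing by zero.
        return 1.0
    return (max_path_length - path_length) / (max_path_length - min_path_length)


def average_path_length(n):
    """c(n), Equation (1) of the paper: average path length of an
    unsuccessful search in a binary search tree built from n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0  # common convention for very small n
    harmonic = math.log(n - 1) + EULER_GAMMA  # approximation of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def paper_score(mean_path_length, n):
    """s(x, n) = 2 ** (-E[h(x)] / c(n)), Equation (2) of the paper.
    Per this thread, H2O does NOT report this value."""
    c = average_path_length(n)
    if c <= 0:
        raise ValueError("subsample size n must be at least 2")
    return 2.0 ** (-mean_path_length / c)


if __name__ == "__main__":
    # A path shorter than the shortest one seen in training scores above 1.
    print(h2o_style_score(5, 10, 20))  # 1.5
    # The paper's score is bounded by 1 and approaches it as the path shrinks.
    print(paper_score(0.0, 256))  # 1.0
```

The design difference is visible in the bounds: the paper's `s(x, n)` always lies in (0, 1], whereas a min-max normalization anchored to training-time extremes can exceed 1 for points more isolated than anything seen during training.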

Just a suggestion: we are planning to release Extended Isolation Forest in the next major version, but you can already experiment with it in this build: https://h2o-release.s3.amazonaws.com/h2o/master/5512/index.html

@exalate-issue-sync
Author

Adam Valenta commented: The documentation for Isolation Forest and Extended Isolation Forest has been updated.

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

JIRA Issue Details

Jira Issue: PUBDEV-8241
Assignee: Adam Valenta
Reporter: Roman Lykhnenko
State: Resolved
Fix Version: 3.32.1.6
Attachments: N/A
Development PRs: Available

@h2o-ops
Collaborator

h2o-ops commented May 14, 2023

Linked PRs from JIRA

#5629
