Investigate a feasibility of a (GBM) prediction consistency check for constrained models #7325

exalate-issue-sync · 2023-05-11T16:13:54Z

In PUBDEV-8313 we discovered a bug that was impacting GBM performance when monotone constraints were enabled.

In this bug, the node prediction was calculated incorrectly for "NA vs REST" splits when the split was constrained. The bug was introduced during a code refactoring in this code change e1ab951#diff-f35a4389a0b947507f6e6986dcf7ef4c2f7b997e6b23e9933bf7c158e4cefafbR985 and remained unnoticed because the existing test suite did not detect it. Bugs of this type are generally hard to detect, as they may require a specific dataset to trigger (this claim is supported by the fact that [~accountid:5bd237b8dd3cc64b77e71676] was not able to reproduce issue PUBDEV-8313 using a synthetic dataset based on the characteristics of the original customer's dataset).

To prevent similar issues in the future, we would like to introduce a computationally inexpensive self-check that detects node splits with incorrect predictions. With this check enabled, we should be able to see that the algorithm is misbehaving locally even when its overall global behavior (predictions are improving overall) looks fine.

It is not our intention to implement a perfect check (that would likely hurt performance). We should leverage data that we already collect and, if possible, estimate a feasible range for the prediction or otherwise show that the prediction appears to be off.
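A minimal sketch of the kind of cheap check described above, assuming the usual way monotone constraints are enforced in gradient-boosted trees: every node inherits a `[min, max]` range for its prediction from its constrained ancestors, and for a split on a constrained column the child predictions must be ordered in the constraint's direction. The class and method names (`ConstraintConsistencyCheck`, `checkSplit`) and the tolerance `EPS` are hypothetical; this is not the check merged in #5837. Exempting "NA vs REST" splits from the ordering test (while still applying the bounds test) is also an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical per-split consistency check for a monotone-constrained tree.
 * Illustrative sketch only; not H2O's DTree implementation.
 */
final class ConstraintConsistencyCheck {

    /** Tolerance for floating-point rounding in node-prediction computations (assumed value). */
    static final double EPS = 1e-6;

    /** True if pred lies inside [min, max], allowing EPS of rounding slack. */
    static boolean withinBounds(double pred, double min, double max) {
        return pred >= min - EPS && pred <= max + EPS;
    }

    /**
     * Checks the two children of one split against the bounds inherited from
     * constrained ancestors and, for ordinary splits, against the constraint direction.
     *
     * @param constraint +1 (increasing), -1 (decreasing) or 0 (unconstrained split column)
     * @param naVsRest   true for an "NA vs REST" split; assumed to imply no ordering
     * @return human-readable violations; empty if the split looks consistent
     */
    static List<String> checkSplit(int constraint, boolean naVsRest,
                                   double leftPred, double rightPred,
                                   double min, double max) {
        List<String> violations = new ArrayList<>();
        // Every node, whatever its split column, must stay inside the inherited range.
        if (!withinBounds(leftPred, min, max)) {
            violations.add("left prediction " + leftPred + " outside [" + min + ", " + max + "]");
        }
        if (!withinBounds(rightPred, min, max)) {
            violations.add("right prediction " + rightPred + " outside [" + min + ", " + max + "]");
        }
        // For an ordinary split on a constrained column, the left child (smaller values)
        // must not exceed the right child when increasing, and vice versa when decreasing.
        if (!naVsRest && constraint != 0) {
            double directedDiff = constraint * (rightPred - leftPred);
            if (directedDiff < -EPS) {
                violations.add("children " + leftPred + " / " + rightPred
                        + " violate monotone direction " + constraint);
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // Consistent increasing split: both children in range and ordered.
        System.out.println(checkSplit(+1, false, 0.2, 0.5, 0.0, 1.0)); // prints []
        // "NA vs REST" split whose NA child escapes the inherited range: one bounds violation.
        System.out.println(checkSplit(+1, true, 1.7, 0.5, 0.0, 1.0));
        // Increasing split with reversed children: one ordering violation.
        System.out.println(checkSplit(+1, false, 0.6, 0.4, 0.0, 1.0));
    }
}
```

The check only compares values a tree builder would already hold at split time (the two child predictions and the inherited bounds), so it adds a handful of comparisons per split rather than a pass over the data. It cannot prove a prediction is correct, but a prediction landing outside its feasible range, as a miscomputed "NA vs REST" node value might, would be flagged locally.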


h2o-ops-ro · 2023-05-14T19:01:13Z

JIRA Issue Details

Jira Issue: PUBDEV-8332
Assignee: Veronika Maurerová
Reporter: Michal Kurka
State: In Progress
Fix Version: Backlog
Attachments: N/A
Development PRs: Available

h2o-ops-ro · 2023-05-14T19:01:15Z

Linked PRs from JIRA

#5837

PUBDEV-8332 implement check in DTree

h2o-ops-ro added the fixVersion/Backlog label May 14, 2023

maurever self-assigned this Nov 1, 2023

maurever mentioned this issue Dec 13, 2023

GH-7325 Prediction consistency check for constrained models #5837

Merged

maurever added this to the 3.46.0.1 milestone Feb 1, 2024

maurever added a commit that referenced this issue Feb 9, 2024

GH-7325 Prediction consistency check for constrained models (#5837)

72ae632

PUBDEV-8332 implement check in DTree

maurever closed this as completed Feb 9, 2024
