Investigate the feasibility of a (GBM) prediction consistency check for constrained models #7325

Closed
exalate-issue-sync bot opened this issue May 11, 2023 · 2 comments

Comments

@exalate-issue-sync

In PUBDEV-8313 we discovered a bug that was impacting GBM performance when monotone constraints were enabled.

In this bug, the node prediction was calculated incorrectly for "NA vs REST" splits when the split was constrained. The bug was introduced during a refactoring in commit e1ab951#diff-f35a4389a0b947507f6e6986dcf7ef4c2f7b997e6b23e9933bf7c158e4cefafbR985 and went unnoticed because the existing test suite did not detect it. Bugs of this type are generally hard to detect because they may require a specific dataset to trigger (this claim is supported by the fact that [~accountid:5bd237b8dd3cc64b77e71676] was not able to reproduce issue PUBDEV-8313 using a synthetic dataset based on the characteristics of the original customer dataset).

To prevent similar issues in the future, we would like to introduce a computationally inexpensive self-check that detects problematic node splits with incorrect predictions. With this check enabled, we should be able to see when the algorithm is not behaving as expected locally, even though the overall global behavior (predictions improving overall) seems fine.

It is not our intention to implement a perfect check (that would likely impact performance). We should leverage data that we already collect and, if possible, estimate a feasible range for the prediction or show in some other way that the prediction seems to be off.
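As a rough illustration of what such a range-based check could look like, here is a minimal Python sketch. It is not H2O code: the function name `check_constrained_split`, its parameters, and the assumption that each constrained node carries inherited lower and upper prediction bounds are all hypothetical. The sketch does not model "NA vs REST" handling specifically. It only flags child predictions that fall outside the feasible range or that violate the ordering implied by the constraint direction.

```python
import math
from typing import List


def check_constrained_split(
    left_pred: float,
    right_pred: float,
    constraint: int,
    lower: float = -math.inf,
    upper: float = math.inf,
    tol: float = 1e-9,
) -> List[str]:
    """Return a list of detected inconsistencies (empty if none).

    Hypothetical helper, not part of H2O. Assumptions:
      - constraint is +1 (increasing), -1 (decreasing) or 0 (none)
      - [lower, upper] is the feasible prediction range inherited
        from constrained splits higher up in the tree
    """
    problems = []

    # Each child prediction must stay inside the inherited range.
    for name, pred in (("left", left_pred), ("right", right_pred)):
        if pred < lower - tol or pred > upper + tol:
            problems.append(
                f"{name} prediction {pred} outside [{lower}, {upper}]"
            )

    # The children must be ordered consistently with the constraint.
    if constraint > 0 and left_pred > right_pred + tol:
        problems.append("increasing constraint violated: left > right")
    if constraint < 0 and left_pred < right_pred - tol:
        problems.append("decreasing constraint violated: left < right")

    return problems


# Usage: a consistent split reports nothing, a broken one is flagged.
print(check_constrained_split(0.1, 0.4, constraint=1, lower=0.0, upper=0.5))
print(check_constrained_split(0.6, 0.4, constraint=1, lower=0.0, upper=0.5))
```

Both comparisons are constant-time per split and use only values the tree builder would already hold at that point, which fits the goal of a cheap, imperfect check rather than a full verification.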

@h2o-ops-ro
Collaborator

JIRA Issue Details

Jira Issue: PUBDEV-8332
Assignee: Veronika Maurerová
Reporter: Michal Kurka
State: In Progress
Fix Version: Backlog
Attachments: N/A
Development PRs: Available

@h2o-ops-ro
Collaborator

Linked PRs from JIRA

#5837

@maurever maurever self-assigned this Nov 1, 2023
@maurever maurever added this to the 3.46.0.1 milestone Feb 1, 2024
maurever added a commit that referenced this issue Feb 9, 2024
@maurever maurever closed this as completed Feb 9, 2024
2 participants