In PUBDEV-8313 we discovered a bug that was impacting GBM performance when monotone constraints were enabled.
In this bug, the node prediction was calculated incorrectly for "NA vs REST" splits when the split was constrained. The bug was introduced during a refactoring in this code change e1ab951#diff-f35a4389a0b947507f6e6986dcf7ef4c2f7b997e6b23e9933bf7c158e4cefafbR985 and went unnoticed because the existing test suite did not detect it. Bugs of this type are generally hard to detect because they may require a specific dataset to trigger (this claim is supported by the fact that [~accountid:5bd237b8dd3cc64b77e71676] was not able to reproduce issue PUBDEV-8313 using a synthetic dataset based on the characteristics of the original customer dataset).
To prevent similar issues in the future, we would like to introduce a computationally inexpensive self-check that detects problematic node splits with incorrect predictions. With this check enabled, we should be able to see that the algorithm is not behaving as expected locally even though the overall global behavior (predictions are improving overall) seems fine.
It is not our intention to implement a perfect check (that would likely impact performance). We should leverage data that we already collect and, if possible, estimate a feasible range for the prediction, or show in some other way that the prediction seems to be off.
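One possible shape for such a check, as a rough sketch only: the Python below is not H2O code, and every name in it (`ChildStats`, `feasible_range`, `suspicious_children`) is hypothetical. It assumes a squared-error-style setting in which a child's unconstrained prediction is the weighted mean of its responses, that the per-child weight and response sums are already available from the split statistics, and that a monotone constraint is enforced by clamping predictions to `[lower, upper]` bounds. Under those assumptions, every child prediction should fall between the clamped smallest and clamped largest child mean.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ChildStats:
    """Hypothetical per-child statistics of one split (e.g. the NA child and the REST child)."""
    weight: float      # sum of row weights in the child
    resp_sum: float    # weighted sum of responses in the child
    prediction: float  # prediction the tree builder assigned to the child


def clamp(value: float, lower: float, upper: float) -> float:
    """Restrict value to the closed interval [lower, upper]."""
    return max(lower, min(upper, value))


def feasible_range(children: List[ChildStats], lower: float, upper: float) -> Tuple[float, float]:
    """Estimate the interval that every child prediction of this split should fall into.

    Assumption: a child's prediction is its weighted mean clamped to the
    constraint bounds, so it cannot leave the clamped [min mean, max mean] interval.
    """
    means = [c.resp_sum / c.weight for c in children if c.weight > 0]
    if not means:
        raise ValueError("split has no rows")
    return clamp(min(means), lower, upper), clamp(max(means), lower, upper)


def suspicious_children(children: List[ChildStats], lower: float, upper: float,
                        tol: float = 1e-9) -> List[int]:
    """Return indices of non-empty children whose prediction is outside the feasible range."""
    lo, hi = feasible_range(children, lower, upper)
    return [
        i for i, c in enumerate(children)
        if c.weight > 0 and not (lo - tol <= c.prediction <= hi + tol)
    ]


# Example: the NA child kept its unclamped mean (0.5) although the upper bound is 0.4.
na_child = ChildStats(weight=10.0, resp_sum=5.0, prediction=0.5)
rest_child = ChildStats(weight=90.0, resp_sum=18.0, prediction=0.2)
flagged = suspicious_children([na_child, rest_child], float("-inf"), 0.4)
```

The check reuses only sums that a histogram-based tree builder would typically have at hand, so its cost is constant per split. It is deliberately imperfect: a wrong prediction that still lands inside the range is not flagged.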
Jira Issue: PUBDEV-8332
Assignee: Veronika Maurerová
Reporter: Michal Kurka
State: In Progress
Fix Version: Backlog
Attachments: N/A
Development PRs: Available