[ML] Avoid log spam when we only have missing values for a feature computing candidate splits for regression and classification #1500

tveasey · 2020-09-22T19:54:36Z

Since we downsample the rows when computing candidate splits, it's possible that a feature with non-zero probability of being selected ends up with no non-missing values in the sample. This is harmless: we can simply initialise the candidate splits to an empty set in this case. However, it was generating log spam when we tried to compute quantiles. In particular, we'd get repeated errors and warnings during training of the form:

[CBoostedTreeImpl.cc@737] Failed to compute quantile 86.7027: ignoring split
[CQuantileSketch.cc@295] No values added to quantile sketch

This change simply checks that there are values before trying to compute quantiles.
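
For illustration, the guard described above might look roughly like the following. This is a minimal, self-contained sketch and not the actual ml-cpp code: `quantile`, `candidateSplits`, the use of NaN to mark a missing value, and the evenly spaced percentiles are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a quantile estimator. It returns std::nullopt
// when it holds no values, which is the situation that used to be logged.
std::optional<double> quantile(const std::vector<double>& sortedValues, double percentage) {
    if (sortedValues.empty()) {
        return std::nullopt;
    }
    // Nearest-rank lookup into the sorted sample.
    auto rank = static_cast<std::size_t>(
        percentage / 100.0 * static_cast<double>(sortedValues.size() - 1) + 0.5);
    return sortedValues[rank];
}

// Compute up to numberSplits candidate splits for one feature from a
// downsample of its values. NaN marks a missing value (an assumption).
std::vector<double> candidateSplits(const std::vector<double>& sampledValues,
                                    std::size_t numberSplits) {
    std::vector<double> present;
    for (double value : sampledValues) {
        if (!std::isnan(value)) {
            present.push_back(value);
        }
    }

    // The guard: with no non-missing values there is nothing to split on,
    // so return an empty set without asking for any quantiles.
    if (present.empty()) {
        return {};
    }

    std::sort(present.begin(), present.end());

    std::vector<double> splits;
    for (std::size_t i = 1; i <= numberSplits; ++i) {
        double percentage = 100.0 * static_cast<double>(i) /
                            static_cast<double>(numberSplits + 1);
        if (auto split = quantile(present, percentage)) {
            splits.push_back(*split);
        }
    }
    // Repeated values can yield repeated quantiles; keep each split once.
    splits.erase(std::unique(splits.begin(), splits.end()), splits.end());
    return splits;
}
```

With this shape, a feature whose sampled values are all missing gets an empty candidate split set and the quantile code is never reached, so nothing is logged for it.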

dimitris-athanasiou

LGTM

Check if there were any non-missing values for a feature before computing candidate splits

f8ba599

tveasey added the >bug, review, v8.0.0, :ml/DataFrameAnalysis and v7.10.0 labels on Sep 22, 2020

Docs

07a73c8

dimitris-athanasiou approved these changes Sep 23, 2020

tveasey merged commit 2a35061 into elastic:master Sep 25, 2020

tveasey deleted the log-spam branch September 25, 2020 19:29

tveasey mentioned this pull request Sep 25, 2020

[7.10][ML] Avoid log spam when we only have missing values for a feature computing candidate splits for regression and classification #1512

Merged

tveasey added a commit that referenced this pull request Oct 1, 2020

[ML] Avoid log spam when we only have missing values for a feature computing candidate splits for regression and classification (#1500) (#1512)

Backport #1500.

b695f0f
