Skip to content

Conversation

tveasey
Copy link
Contributor

@tveasey tveasey commented Sep 22, 2020

Since we downsample the rows when computing candidate splits it's possible that a feature with non-zero probability of being selected ends up with no non-missing feature values when we compute candidate splits. This is harmless and we can happily initialise the candidate splits to an empty set in this case. However, it was generating log spam when trying to compute quantiles. In particular, we'd get repeated errors and warnings during training of the form:

[CBoostedTreeImpl.cc@737] Failed to compute quantile 86.7027: ignoring split
[CQuantileSketch.cc@295] No values added to quantile sketch

This change simply checks that there are values before trying to compute quantiles.

Copy link
Contributor

@dimitris-athanasiou dimitris-athanasiou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tveasey tveasey merged commit 2a35061 into elastic:master Sep 25, 2020
@tveasey tveasey deleted the log-spam branch September 25, 2020 19:29
tveasey added a commit to tveasey/ml-cpp-1 that referenced this pull request Sep 25, 2020
…mputing candidate splits for regression and classification (elastic#1500)

Since we downsample the rows when computing candidate splits it's possible that a feature with non-zero probability of
being selected ends up with no non-missing feature values when we compute candidate splits. This is harmless and we
can happily initialise the candidate splits to an empty set in this case. However, it was generating log spam when trying to
compute quantiles. In particular, we'd get repeated errors and warnings during training of the form:

[CBoostedTreeImpl.cc@737] Failed to compute quantile 86.7027: ignoring split
[CQuantileSketch.cc@295] No values added to quantile sketch

This change simply checks that there are values before trying to compute quantiles.
tveasey added a commit that referenced this pull request Oct 1, 2020
…mputing candidate splits for regression and classification (#1500) (#1512)

Backport #1500.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants