[ML] Improve quantile estimation performance for train #2390
Conversation
LGTM. Great results!
```cpp
std::size_t fastSketchSize(double reductionFactor, std::size_t size) {
    size = static_cast<std::size_t>(
        static_cast<double>(size) * CQuantileSketch::REDUCTION_FACTOR / reductionFactor + 0.5);
    return size + (3 - (size + 1) % 3) % 3;
}
```
Where does `3` come from? Can we use a constant here?
This ensures that the loop which computes the weights never has any left-over costs to compute, since we take steps of size 3.
This is the second part of #2380.
Quantile estimation during training is expensive for data sets with many numeric features. This change makes various speed ups in this area, including using `std::inplace_merge`.
Together these drop the amortised add cost from around 100 ns to 50 ns per item on my i9. The remaining costs are 50% `std::nth_element` and 30% `std::sort` + `std::inplace_merge`. Short of dropping the requirement that we extract the smallest k merge costs, it is unlikely we'd be able to make big inroads on these. Doing this, it was useful to extend `CFloatStorage` to be usable in constexprs. I also mark all methods noexcept which they are and always will be; in theory this allows some extra optimisations, although the compiler should be able to deduce they are no-throw.

Separately, I cap the maximum number of values we'll use to estimate quantiles. The accuracy of the quantiles we care about converges quickly with sample size. For example, the count of values less (greater) than the sample 1st (99th) percentile would be ~ Binomial(n, 0.01) for sample size n, so the relative error in the count would be O(10.0 / n^(1/2)). I therefore cap the maximum sample size to 50000, for which this error is around 4%.
Finally, following on from #2364, we don't really need to expose interpolation outside the class (all uses were linear and that is unlikely to change). I took the opportunity to remove the option from the constructor.