ENH sort before binning in HGBT #28102
Merged
Reference Issues/PRs
Suggested in #28064 (comment).
What does this implement/fix? Explain your changes.
HGBT's `_BinMapper` searches for bin thresholds. This calls `np.unique` and then `percentile`, which both internally use `np.sort`.
This PR first applies `np.sort` such that the subsequent internal calls to `np.sort` are applied on an already sorted array.
Any other comments?
The binning step of HGBT takes very little time compared to the actual fitting of trees (5 s out of 60 s for Higgs with 100 trees).
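
For illustration, the sort-first idea described under "What does this implement/fix?" could look roughly like the sketch below. This is not the code in the PR: `find_thresholds_sketch` and its `max_bins` parameter are hypothetical names, `np.percentile` with its default interpolation stands in for the `percentile` helper mentioned above, and the midpoint rule for columns with few distinct values is an assumption.

```python
import numpy as np


def find_thresholds_sketch(col, max_bins=255):
    """Hypothetical helper: candidate bin thresholds for one feature column."""
    col = np.asarray(col, dtype=np.float64)
    col = col[~np.isnan(col)]  # assumption: missing values are binned separately

    # Sort once up front, so the calls below receive already-sorted input.
    col = np.sort(col)

    distinct = np.unique(col)
    if len(distinct) <= max_bins:
        # Few distinct values: use midpoints between consecutive values.
        return (distinct[:-1] + distinct[1:]) * 0.5

    # Many distinct values: use evenly spaced interior percentiles.
    percentiles = np.linspace(0, 100, num=max_bins + 1)[1:-1]
    return np.percentile(col, percentiles)
```

The thresholds do not depend on the input order, so sorting first changes only how much work the later calls have to do, not the result.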