Contributor

@tveasey tveasey commented Jun 11, 2020

The feature derivatives for boosted tree splits are laid out in increasing feature index order, but when we sample a feature bag we don't ensure it is sorted. This means we get random access rather than a linear scan over the derivatives when computing the best split. This change sorts the feature bag after sampling and rejigs the function signatures to avoid allocating a vector for every node.
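To illustrate the idea, here is a minimal sketch of sampling a feature bag into a caller-owned vector and sorting it afterwards. It is not the ml-cpp code: the function name `sampleSortedFeatureBag`, its parameters, and the use of uniform sampling via a partial Fisher-Yates shuffle are all assumptions made for the example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper (not the ml-cpp API): fill `bag` with `bagSize` distinct
// feature indices drawn uniformly from [0, numberFeatures), sorted ascending.
// The caller owns `bag`, so its capacity can be reused from node to node.
void sampleSortedFeatureBag(std::size_t numberFeatures,
                            std::size_t bagSize,
                            std::mt19937& rng,
                            std::vector<std::size_t>& bag) {
    bag.resize(numberFeatures);
    std::iota(bag.begin(), bag.end(), std::size_t{0});
    bagSize = std::min(bagSize, numberFeatures);

    // Partial Fisher-Yates shuffle: after i steps the first i slots hold
    // a uniform sample without replacement.
    for (std::size_t i = 0; i < bagSize; ++i) {
        std::uniform_int_distribution<std::size_t> pick(i, numberFeatures - 1);
        std::swap(bag[i], bag[pick(rng)]);
    }
    bag.resize(bagSize);

    // Sorting means a later loop over the bag visits per-feature data
    // in increasing index order, i.e. as a forward scan.
    std::sort(bag.begin(), bag.end());
}
```

Passing the output vector by reference is one way to keep a single buffer alive across nodes; once it has grown to `numberFeatures` elements, later calls should not need to reallocate.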

Contributor

@droberts195 droberts195 left a comment

LGTM

Contributor Author

tveasey commented Jun 11, 2020

retest

Contributor Author

tveasey commented Jun 12, 2020

Thanks for the review @droberts195! While I was in the area I realised:

  1. I could avoid all allocations in categorical sampling without replacement.
  2. There was a bug in the case where more samples than values were asked for: a missing return. (This wasn't actually being exercised in the code, but it is clearly worth fixing.)
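As a rough illustration of both points, the sketch below draws a weighted sample without replacement into caller-owned buffers and returns early when at least as many samples as values are requested. It is not the ml-cpp implementation: the name `sampleWithoutReplacementSketch`, the signature, and the swap-to-front scheme are assumptions, and it assumes all weights are strictly positive.

```cpp
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical sketch (not the ml-cpp API). `weights` is consumed: its
// entries are permuted in place. `result` receives the chosen indices and
// reuses whatever capacity it already has. Assumes every weight is > 0.
void sampleWithoutReplacementSketch(std::vector<double>& weights,
                                    std::size_t n,
                                    std::mt19937& rng,
                                    std::vector<std::size_t>& result) {
    result.resize(weights.size());
    std::iota(result.begin(), result.end(), std::size_t{0});

    if (n >= weights.size()) {
        // More samples than values requested: every index is selected.
        // Without this return the loop below would run past the end.
        return;
    }

    for (std::size_t i = 0; i < n; ++i) {
        // Total weight of the values which have not been chosen yet.
        double total = std::accumulate(weights.begin() + i, weights.end(), 0.0);
        std::uniform_real_distribution<double> uniform(0.0, total);
        double target = uniform(rng);

        // Walk the remaining values until the cumulative weight passes target.
        std::size_t j = i;
        double cumulative = weights[j];
        while (cumulative <= target && j + 1 < weights.size()) {
            ++j;
            cumulative += weights[j];
        }

        // Move the chosen value into the "already sampled" prefix.
        std::swap(weights[i], weights[j]);
        std::swap(result[i], result[j]);
    }
    result.resize(n);
}
```

The early-return branch is the kind of path a unit test can cover directly, for example by requesting five samples from two values and checking that exactly two indices come back.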

I also added a unit test. I decided it isn't worth breaking this out into a separate change (it is also essentially related to trying to improve cache performance), but it is worth having a look at e548cc6 as well.

Contributor

@droberts195 droberts195 left a comment

Still LGTM

Neither of the comments is essential to address if the CI goes green and you need to merge quickly.

@tveasey tveasey merged commit 029e232 into elastic:master Jun 12, 2020
@tveasey tveasey deleted the sort-feature-bag branch June 12, 2020 13:46
tveasey added a commit to tveasey/ml-cpp-1 that referenced this pull request Jun 12, 2020
tveasey added a commit that referenced this pull request Jun 12, 2020