Skip to content

Conversation

tveasey
Copy link
Contributor

@tveasey tveasey commented Jul 3, 2020

We weren't accounting for the padding we include since we memory aligned columns, #1142, in CBoostedTreeFactory when computing the number of extra columns. This meant we were using presenting extra columns to feature selection (which would nearly always ignore them) and occasionally changing the initialisation of the learn rate.

The problem we hit which caused the SIGSEGV was when there was only one feature which itself carried no information, i.e. it was a constant. We could then erroneously select an extra column as a feature.

EInitializationStage m_InitializationStage = E_NotInitialized;
std::size_t m_NumberThreads;
std::size_t m_DependentVariable = std::numeric_limits<std::size_t>::max();
TOptionalSize m_PaddedExtraColumns;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this doesn't need persisting because it is recalculated every time.

Copy link
Contributor

@valeriy42 valeriy42 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@tveasey tveasey merged commit 9481d1b into elastic:master Jul 6, 2020
@tveasey tveasey deleted the bug-number-extra-column branch July 6, 2020 08:42
tveasey added a commit to tveasey/ml-cpp-1 that referenced this pull request Jul 6, 2020
tveasey added a commit to tveasey/ml-cpp-1 that referenced this pull request Jul 6, 2020
@droberts195 droberts195 added v7.8.1 and removed v7.8.2 labels Jul 9, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants