[ML] Correct acceptance criterion for incremental training #1951

Merged

@tveasey tveasey commented Jul 15, 2021

When we decide whether to accept the results of incremental training, we compare the loss calculated for the best candidate with the loss calculated for the original model. Since the data summary comprises a subset of the old training data, the original model's loss is in effect training error on old data plus validation error on new training data, whereas the candidate's loss is something closer to validation error on all data. The comparison is therefore biased in favour of the original model. If we don't have much new data, or the improvement we can make on it is small, this typically causes us to reject models which actually perform better in test.

This pull request records the gap between the training and validation loss on the old training data and adds it to the threshold to accept. The gap is recorded in the model metadata and needs to be added to the configuration parameters when Java calls incremental training.

It also became clear that there is no value in storing the per-fold training percentage in the instrumentation: it is fixed for the duration of training, it is determined either by user override or from the training data size, and it is only needed for the initial train. I have therefore removed it.
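As a rough illustration of the change in the acceptance test, here is a minimal sketch. It assumes lower loss is better and that the recorded gap is validation loss minus training loss on the old data. All names (`LossSummary`, `acceptNaive`, `acceptWithGap`) are hypothetical and are not taken from `CBoostedTreeImpl`; the actual implementation may combine these quantities differently.

```cpp
// Hypothetical sketch only: these names do not come from ml-cpp.
struct LossSummary {
    // Loss of the original model. On the old rows in the data summary this
    // behaves like a training error, so it is optimistic.
    double originalModelLoss;
    // Loss of the best incrementally trained candidate, which behaves more
    // like a validation error on all data.
    double candidateLoss;
    // Assumed definition: validation loss minus training loss on the old
    // training data, recorded at the initial train and passed back in.
    double trainValidationGap;
};

// Criterion before the fix: the candidate must beat the optimistic loss.
bool acceptNaive(const LossSummary& s) {
    return s.candidateLoss < s.originalModelLoss;
}

// Criterion after the fix (sketch): the recorded gap is added to the
// threshold, which compensates for the optimism in the original loss.
bool acceptWithGap(const LossSummary& s) {
    return s.candidateLoss < s.originalModelLoss + s.trainValidationGap;
}
```

With an original loss of 1.0, a candidate loss of 1.05 and a recorded gap of 0.1, the naive test rejects the candidate while the gap-adjusted test accepts it, which is the situation described above when there is little new data.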

@valeriy42 valeriy42 left a comment

LGTM. Good job. Just a few comments regarding readability.


Review threads on include/maths/CBoostedTreeImpl.h (three, all resolved)

@tveasey tveasey merged commit 63c4044 into elastic:feature/incremental-learning Jul 27, 2021
@tveasey tveasey deleted the acceptance-criterion branch July 27, 2021 10:21