[ML] Correct acceptance criterion for incremental training #1951
Merged
tveasey
merged 12 commits into
elastic:feature/incremental-learning
from
tveasey:acceptance-criterion
Jul 27, 2021
…in incremental train.
valeriy42
approved these changes
Jul 26, 2021
LGTM. Good job. Just a few comments regarding readability.
When we decide whether to accept the results of incremental training, we compare the loss calculated for the best candidate with the loss calculated for the original model. Since the data summary comprises a subset of the original training data, the loss for the original model is in effect training error on old data plus validation error on new training data, whereas the loss for the candidate is something closer to validation error on all data. If we don't have much new data, or the improvement we can make on it is small, this bias typically causes us to reject models which actually perform better on test data.
This pull request records the gap between the train and validation loss on the old training data and adds it to the acceptance threshold. The gap is stored in the model metadata and needs to be added to the configuration parameters when Java calls incremental training. It also became clear that there is no value in storing the per-fold training percentage in the instrumentation: it is fixed for the duration of training, it is determined either by user override or from the training data size, and it is only needed for the initial train. I have therefore removed it.
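As a rough illustration of the corrected criterion, the sketch below compares a candidate's loss against the original model's loss plus the recorded gap. It is written in Python for brevity and is not the code in this pull request: the function names, the argument names, and the definition of the gap as validation loss minus train loss are all assumptions made for the example.

```python
def train_validation_gap(old_train_loss: float, old_validation_loss: float) -> float:
    """Gap recorded at initial training (assumed: validation loss minus train loss)."""
    return old_validation_loss - old_train_loss


def accept_candidate(candidate_loss: float, original_loss: float, gap: float) -> bool:
    """Accept the incrementally trained candidate if its loss beats the
    original model's loss once the threshold is raised by the recorded gap.

    All names here are hypothetical; only the idea of adding the gap to the
    acceptance threshold comes from the description above.
    """
    threshold = original_loss + gap
    return candidate_loss < threshold


# Hypothetical losses: the candidate looks slightly worse than the original
# model, whose loss on the old data is an optimistic training error.
gap = train_validation_gap(old_train_loss=0.9, old_validation_loss=1.0)
accepted_with_gap = accept_candidate(candidate_loss=1.05, original_loss=1.0, gap=gap)
accepted_without_gap = accept_candidate(candidate_loss=1.05, original_loss=1.0, gap=0.0)
```

With a gap of zero the candidate in this example is rejected, which is the behaviour the description says was too pessimistic; adding the gap lets it through.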