[ML] Correct acceptance criterion for incremental training #1951

tveasey · 2021-07-15T17:41:49Z

When we decide whether to accept the results of incremental training, we compare the loss calculated for the best candidate with the loss calculated for the original model. Since the data summary comprises a subset of the training data, we are in effect comparing training error on old data plus validation error on new training data with something closer to validation error on all data. If we don't have much new data, or the improvement we can make on it is small, this typically causes us to reject models which actually perform better on test data.
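One way to read the comparison above, written out as a rough sketch. The symbols are assumptions introduced for illustration and do not appear in the code: $f_0$ is the original model, $f_1$ the best candidate, $D_{\text{old}}$ the data summary, $D_{\text{new}}$ the new training data, and $w_{\text{old}}, w_{\text{new}}$ are the fractions of rows coming from each.

```latex
\begin{align*}
L_{\text{original}}  &\approx w_{\text{old}}\, L_{\text{train}}(f_0; D_{\text{old}})
                       + w_{\text{new}}\, L_{\text{val}}(f_0; D_{\text{new}}) \\
L_{\text{candidate}} &\approx L_{\text{val}}(f_1; D_{\text{old}} \cup D_{\text{new}})
\end{align*}
```

Since training loss is typically lower than validation loss on the same data, $L_{\text{original}}$ is optimistic under this reading. When $w_{\text{new}}$ is small, a candidate with genuinely lower test error can still have $L_{\text{candidate}} > L_{\text{original}}$ and be rejected.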

This pull request records the gap between the train and validation loss on the old training data and adds it to the acceptance threshold. The gap is recorded in the model metadata and needs to be added to the configuration parameters when Java calls incremental training. It also became clear that there is no value in storing the per-fold training percentage in the instrumentation: it is fixed for the duration of training, is determined either by user override or from the training data size, and is only needed for the initial train. I therefore removed it.
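The corrected acceptance rule can be sketched as follows. This is a minimal Python illustration, not the C++ in `CBoostedTreeImpl`: the function names `train_validation_gap` and `accept_candidate` are hypothetical, the numbers are made up, and the real implementation may scale or combine the gap differently.

```python
def train_validation_gap(train_loss_old: float, validation_loss_old: float) -> float:
    """Gap between the original model's validation and train loss on the old data.

    Hypothetical helper: assumes both losses were recorded during the initial train.
    """
    return validation_loss_old - train_loss_old


def accept_candidate(candidate_loss: float, original_loss: float, gap: float = 0.0) -> bool:
    """Accept the candidate if its loss is below the original loss plus the gap.

    With gap=0.0 this reduces to the uncorrected comparison.
    """
    return candidate_loss < original_loss + gap


# Illustrative numbers only; they are not taken from any real run.
gap = train_validation_gap(train_loss_old=0.20, validation_loss_old=0.30)
uncorrected = accept_candidate(candidate_loss=0.28, original_loss=0.25)
corrected = accept_candidate(candidate_loss=0.28, original_loss=0.25, gap=gap)
print(uncorrected, corrected)  # prints "False True"
```

In this example the candidate is rejected by the uncorrected comparison but accepted once the recorded gap is added to the threshold, which is the situation the description says arises when there is little new data.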


valeriy42

LGTM. Good job. Just a few comments regarding readability.

include/maths/CBoostedTreeImpl.h

tveasey added 2 commits July 15, 2021 15:10

Correct acceptance criterion

6f3ab8b

Comment

0bccf90

tveasey added the review, >non-issue, and :ml/DataFrameAnalysis labels Jul 15, 2021

tveasey added 9 commits July 15, 2021 21:04

Write out properties we need in model metadata

b3bf04c

Formatting

88188b2

Missing persistence. Better naming. Scale regularisers appropriately in incremental train.

9836829

Fix test

fefc50c

No need to override default

e05a6a3

Fix unit tests properly

95fadb9

Bug fix

bf27095

Relax test threshold

a4bcd6e

Merge branch 'feature/incremental-learning' into acceptance-criterion

8e6108d

tveasey requested a review from valeriy42 July 22, 2021 11:11

valeriy42 mentioned this pull request Jul 23, 2021

[ML] Visualize Incremental learning #1953

Merged

valeriy42 approved these changes Jul 26, 2021


include/maths/CBoostedTreeImpl.h (outdated, resolved)

include/maths/CBoostedTreeImpl.h (outdated, resolved)

include/maths/CBoostedTreeImpl.h (resolved)

Review comments

3b63510

tveasey merged commit 63c4044 into elastic:feature/incremental-learning Jul 27, 2021

tveasey deleted the acceptance-criterion branch July 27, 2021 10:21
