[ML] Write out training fraction and number of train rows in model meta data #1947
This finishes up the TODOs from #1941. In particular, I remove all references to the train fraction per fold from instrumentation, since it doesn't change from round to round and so isn't a useful addition to the model stats. I also start writing out the number of rows used in training to the model metadata, because this is needed to scale hyperparameters properly when rerunning training on a different amount of data. This will be important, for example, when running incremental training.
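As an illustration of why the row count needs to be persisted (a hypothetical sketch, not the actual scaling rule used in training): some hyperparameters have an effective strength that depends on the data set size, so retuning-free reuse on a different amount of data requires knowing how many rows the original values were tuned on.

```python
def scale_hyperparameter(value: float, rows_tuned_on: int, rows_now: int) -> float:
    """Illustrative only: rescale a row-count-dependent hyperparameter
    (e.g. a regularizer whose effective per-row strength should stay
    constant) when retraining on a different number of rows."""
    return value * rows_tuned_on / rows_now

# If a penalty was tuned on 10000 rows and we now retrain on 40000 rows,
# keep its effective per-row strength by rescaling it down proportionally.
scaled = scale_hyperparameter(0.8, rows_tuned_on=10000, rows_now=40000)
print(scaled)  # 0.2
```

Without the row count in the metadata, this kind of rescaling is impossible when training resumes on new data.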
To do this I've introduced a new section, `train_parameters`, into the model metadata object. Ultimately, this should include any other parameters the user overrides which are needed to reproduce training. These can be read directly from the model config and so can be managed in Java. Note that this PR can't be merged until the Java is updated to read the new metadata result format.
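For illustration, the resulting metadata might look something like the following sketch (the field names inside `train_parameters` are hypothetical, not taken from the actual implementation):

```json
{
  "model_metadata": {
    "train_parameters": {
      "num_train_rows": 10000
    }
  }
}
```

The idea is that any future user-overridden parameters required to reproduce training would be added alongside the row count in this section.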