[ML] Write out training fraction and number of train rows in model meta data #1947
This finishes up the TODOs from #1941. In particular, I remove all references to the train fraction per fold from instrumentation, since it doesn't change from round to round and so isn't a useful addition to the model stats. I also start writing out the number of rows used in training to the model metadata, because this is needed to scale hyperparameters properly when rerunning training on a different amount of data. This will be important, for example, when running incremental training.
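As an illustration of why the row count needs to be persisted (a hypothetical sketch, not the actual scaling rule used in training): some hyperparameters have an effective strength that depends on the data set size, so retuning-free reuse on a different amount of data requires knowing how many rows the original values were tuned on.

```python
def scale_hyperparameter(value: float, rows_tuned_on: int, rows_now: int) -> float:
    """Illustrative only: rescale a row-count-dependent hyperparameter
    (e.g. a regularizer whose effective per-row strength should stay
    constant) when retraining on a different number of rows."""
    return value * rows_tuned_on / rows_now

# If a penalty was tuned on 10000 rows and we now retrain on 40000 rows,
# keep its effective per-row strength by rescaling it down proportionally.
scaled = scale_hyperparameter(0.8, rows_tuned_on=10000, rows_now=40000)
print(scaled)  # 0.2
```

Without the row count in the metadata, this kind of rescaling is impossible when training resumes on new data.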
To do this I've introduced a new section, `train_parameters`, into the model metadata object. Ultimately, this should include any other parameters the user overrides which are needed to reproduce training. These can be read directly from the model config and so can be managed in Java. Note that this PR can't be merged until the Java is updated to read the new metadata result format.
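For illustration, the resulting metadata might look something like the following sketch (the field names inside `train_parameters` are hypothetical, not taken from the actual implementation):

```json
{
  "model_metadata": {
    "train_parameters": {
      "num_train_rows": 10000
    }
  }
}
```

The idea is that any future user-overridden parameters required to reproduce training would be added alongside the row count in this section.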