[ML] Write out training fraction and number of train rows in model meta data #1947

Closed
tveasey wants to merge 3 commits

tveasey
Contributor

@tveasey tveasey commented Jul 13, 2021

This finishes up the TODOs from #1941. In particular, I remove all references to the train fraction per fold from instrumentation: it doesn't change from round to round, so it isn't a useful addition to the model stats. I also start writing out the number of rows used in training in the model metadata, because this is needed to scale hyperparameters properly when rerunning training on a different amount of data. This will be important, for example, when running incremental training.

To do this I've introduced a new section, train_parameters, in the model metadata object. Ultimately this should also include any other parameters the user overrides that are needed to reproduce training. Those can be read directly from the model config and so can be managed in Java.
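
As a rough illustration of why the row count belongs in the metadata, here is a minimal Python sketch. Only the section name train_parameters comes from this PR; the field names `num_train_rows` and `train_fraction_per_fold`, the `model_metadata` wrapper, and the linear scaling rule are all assumptions made for the example and are not the actual result format or the scaling the library applies.

```python
import json


def build_model_metadata(num_train_rows: int, train_fraction_per_fold: float) -> dict:
    # Hypothetical layout: only the section name "train_parameters" is taken
    # from the PR description. The field names below are assumptions.
    return {
        "model_metadata": {
            "train_parameters": {
                "num_train_rows": num_train_rows,
                "train_fraction_per_fold": train_fraction_per_fold,
            }
        }
    }


def rescale_hyperparameter(value: float, old_num_rows: int, new_num_rows: int) -> float:
    # Illustrative assumption: the hyperparameter scales linearly with the
    # number of training rows. The real scaling rule may differ.
    if old_num_rows <= 0:
        raise ValueError("old_num_rows must be positive")
    return value * new_num_rows / old_num_rows


# Round-trip through JSON, as a consumer of the metadata would see it.
metadata = json.loads(json.dumps(build_model_metadata(1000, 0.75)))
old_rows = metadata["model_metadata"]["train_parameters"]["num_train_rows"]

# Rerun training on 1500 rows, adjusting a hyperparameter tuned on 1000 rows.
rescaled = rescale_hyperparameter(2.0, old_rows, 1500)
```

The point is only that the original row count has to be stored alongside the model: without it, a later run on a different amount of data has no reference for adjusting the hyperparameters.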

Note that this PR can't be merged until the Java code is updated to read the new metadata result format.

@tveasey
Contributor Author

tveasey commented Jul 27, 2021

This is primarily needed for incremental training, so I'll just make the change on that feature branch (we need other stats there as well).

@tveasey tveasey closed this Jul 27, 2021