Add a LearningCurveSplitter to test sample efficiency #23817

CompRhys · 2022-07-01T22:51:03Z

Describe the workflow you want to enable

In applications of machine learning, the error of a model typically follows a power law in the number of training examples. In the ML for materials science and molecules community there is a lot of interest in using learning-curve plots to assess model performance; however, there is as yet no standard tool for making such splits. A standard splitter would improve the reproducibility and uptake of such studies.

Describe your proposed solution

A learning-curve splitter that returns a series of training sets of various sizes alongside a fixed test set. Kwargs would control whether the training set sizes are spaced logarithmically, how many splits to generate, whether to draw multiple splits for each training set size, etc.
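To make the request concrete, here is a minimal sketch of what such a splitter could look like. It is not an existing scikit-learn API: the function name `learning_curve_splits` and every parameter (`test_size`, `n_sizes`, `min_train_size`, `log_spacing`, `n_repeats`, `random_state`) are hypothetical and chosen only for illustration. It uses NumPy alone and assumes the samples can be shuffled freely (no groups or stratification).

```python
import numpy as np


def learning_curve_splits(n_samples, test_size=0.2, n_sizes=5, min_train_size=10,
                          log_spacing=True, n_repeats=1, random_state=None):
    """Yield (train_idx, test_idx) pairs with growing train sets and one fixed test set.

    Hypothetical helper for illustration only; not part of scikit-learn.
    """
    rng = np.random.default_rng(random_state)
    permutation = rng.permutation(n_samples)

    # Hold out a single test set that is shared by every split.
    n_test = int(round(test_size * n_samples))
    test_idx = permutation[:n_test]
    pool = permutation[n_test:]
    if not 0 < min_train_size <= len(pool):
        raise ValueError("min_train_size must be in (0, n_samples - n_test]")

    # Training set sizes, spaced logarithmically or linearly up to the full pool.
    spacing = np.geomspace if log_spacing else np.linspace
    sizes = np.unique(np.round(spacing(min_train_size, len(pool), n_sizes)).astype(int))

    for size in sizes:
        # Optionally draw several independent training sets per size.
        for _ in range(n_repeats):
            train_idx = rng.choice(pool, size=size, replace=False)
            yield train_idx, test_idx


splits = list(learning_curve_splits(100, n_sizes=4, n_repeats=2, random_state=0))
train_sizes = [len(train) for train, _ in splits]
```

Each yielded pair has the same `(train, test)` shape that scikit-learn's cross-validation iterables use, so a sketch like this could in principle be passed wherever a `cv` iterable is accepted. Repeated draws at the same size are sampled independently here; drawing nested subsets instead would be an equally reasonable design.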

Describe alternatives you've considered, if relevant

No response

Additional context

This is an example of the type of plot used to compare models. https://www.nature.com/articles/s41467-020-18556-9

CompRhys added the Needs Triage and New Feature labels on Jul 1, 2022

Micky774 added the Needs Decision - Include Feature label and removed the Needs Triage label on Jul 8, 2022
