Train, validation, and test #36

mauricioaniche · 2020-02-12T18:26:44Z

For now, we only split the data into training and testing sets. Adding a validation set will give us "one more opportunity" to test the model on unseen data!

Suggestion:

We split the data into training and testing sets.
We split the training set again (K times) and find the best hyper-parameters using these splits.
Once we have the best model (and the K accuracies collected from the Grid/Random search), we test it again on the never-seen test set. This gives us one more accuracy number.
At the end, we have K accuracy numbers (10 if K=10) from the model tuning, and one for the unseen data.
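A minimal sketch of this suggestion, assuming scikit-learn. The synthetic data, the `LogisticRegression` estimator, and the parameter grid are placeholders chosen for illustration, not the project's actual setup:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

K = 10  # number of cross-validation folds used during tuning

# Synthetic stand-in for the real dataset (assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Step 1: hold out a test set that the search never sees.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Step 2: tune hyper-parameters with K-fold cross-validation on the training set only.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.1, 1.0, 10.0]},  # hypothetical grid
    cv=K,
    scoring="accuracy",
)
search.fit(X_train, y_train)

# Step 3: the K validation accuracies of the best hyper-parameter setting.
best = search.best_index_
fold_accuracies = [
    search.cv_results_[f"split{i}_test_score"][best] for i in range(K)
]

# Step 4: one extra accuracy number on the held-out test set.
# best_estimator_ was refit on the whole training set (refit=True is the default).
test_accuracy = search.best_estimator_.score(X_test, y_test)

print(f"{K} tuning accuracies: {fold_accuracies}")
print(f"held-out test accuracy: {test_accuracy:.3f}")
```

`RandomizedSearchCV` exposes the same `cv_results_`, `best_index_`, and `best_estimator_` attributes, so the same extraction should work for a random search.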

mauricioaniche · 2020-03-02T10:35:38Z

Also: unseen projects (do we need this? We already compare across different datasets)

After discussion: ignore this comment!

mauricioaniche · 2020-04-02T12:53:38Z

@Dahny after our talk today:

We should:

Split the dataset 80%-20%. (We discussed doing it inside _run_single_model, but that's not a good idea; we want all the different models to be trained and tested on the same training and test sets, so this needs to be done one layer up... Who calls this method?)
Call the search with k=10. (From here on, all changes are in the _run_single_model method.)
Extract the best estimator as well as its scores on the 10 folds.
Use the best estimator to predict results on the unseen test set.
Collect precision, recall, accuracy, and the confusion matrix (TP, TN, FP, FN).
Re-train the model on the full 100% dataset, so that we have a "production-ready" model.
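The steps above could look roughly like the sketch below, assuming scikit-learn and a binary classification target. `run_single_model` is a hypothetical stand-in for `_run_single_model` (its real signature is not shown in this issue), and the data, estimators, and grids are placeholders:

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier


def run_single_model(estimator, param_grid, X_train, X_test, y_train, y_test, k=10):
    """Hypothetical stand-in: tune on the training set, evaluate on the test set."""
    search = GridSearchCV(estimator, param_grid, cv=k, scoring="accuracy")
    search.fit(X_train, y_train)

    # Scores of the best hyper-parameter setting on each of the k folds.
    best = search.best_index_
    fold_scores = [search.cv_results_[f"split{i}_test_score"][best] for i in range(k)]

    # Evaluate the best estimator on the unseen test set.
    y_pred = search.best_estimator_.predict(X_test)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    metrics = {
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "accuracy": accuracy_score(y_test, y_pred),
        "tp": int(tp), "tn": int(tn), "fp": int(fp), "fn": int(fn),
    }
    return search.best_params_, fold_scores, metrics


# Synthetic stand-in data (assumption for illustration).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# The 80%-20% split happens once, one layer above run_single_model,
# so every model is trained and tested on exactly the same rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

candidates = {  # hypothetical models and grids
    "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0]}),
    "decision_tree": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 5]}),
}

results, production_models = {}, {}
for name, (estimator, grid) in candidates.items():
    best_params, fold_scores, metrics = run_single_model(
        estimator, grid, X_train, X_test, y_train, y_test
    )
    results[name] = {"fold_scores": fold_scores, **metrics}
    # Re-train with the best hyper-parameters on 100% of the data.
    production_models[name] = clone(estimator).set_params(**best_params).fit(X, y)
```

Note that the metrics reported for each model come from the estimator trained on the 80% split; the re-trained "production-ready" model has seen the test rows, so it has no unbiased score of its own.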

mauricioaniche added the enhancement and machine-learning labels Feb 12, 2020

Dahny self-assigned this Mar 5, 2020

Dahny mentioned this issue May 25, 2020

Train, validation, and test #182

Merged

Dahny linked a pull request May 25, 2020 that will close this issue

jan-gerling closed this as completed in #182 Jul 17, 2020

mauricioaniche mentioned this issue Aug 4, 2020

Collect the IDs of the predicted methods in the test set refactoring-ai/Machine-Learning#3

Open
