This repository has been archived by the owner on Aug 12, 2020. It is now read-only.

Train, validation, and test #36

Closed
mauricioaniche opened this issue Feb 12, 2020 · 2 comments · Fixed by #182
Labels
enhancement New feature or request machine-learning

Comments

@mauricioaniche
Contributor

For now, we only split the data into training and test sets. Adding a validation set will give us "one more opportunity" to test the model on unseen data!

Suggestion:

  • We split the data into training and test sets.
  • We split the training set again, into K folds, and find the best hyper-parameters using these folds.
  • Once we have the best model (and the K accuracies we collected from the grid/random search), we test it on the never-seen test set. This gives us one more accuracy number.
  • At the end, we have K = 10 accuracy numbers from model tuning, and one from the unseen data.
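A minimal sketch of the suggested scheme, assuming scikit-learn's `GridSearchCV` performs the grid search (the issue does not name the library). The synthetic data from `make_classification`, the `DecisionTreeClassifier`, its `max_depth` grid, and the 80/20 split are illustrative assumptions, not the project's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

K = 10  # number of folds used during hyper-parameter tuning

# Synthetic stand-in for the real dataset (assumption: binary classification).
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Step 1: split into a training set and a never-seen test set (assumed 80/20).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Step 2: K-fold grid search on the training set only.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=42),
    param_grid={"max_depth": [2, 4, 8, None]},  # hypothetical grid
    scoring="accuracy",
    cv=K,
)
search.fit(X_train, y_train)

# Step 3: the K per-fold accuracies of the best hyper-parameter combination.
best = search.best_index_
fold_accuracies = [
    search.cv_results_[f"split{i}_test_score"][best] for i in range(K)
]

# Step 4: one more accuracy number, measured on the unseen test set.
test_accuracy = search.best_estimator_.score(X_test, y_test)

print("fold accuracies:", [round(a, 3) for a in fold_accuracies])
print("test accuracy:", round(test_accuracy, 3))
```

`GridSearchCV` refits the best configuration on the whole training split by default (`refit=True`), so `best_estimator_` is the model that gets scored on the test set; the test set never influences the choice of hyper-parameters.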
@mauricioaniche
Contributor Author

mauricioaniche commented Mar 2, 2020

Also: a held-out set of unseen projects (do we need it? We already compare across different datasets.)

After discussion: ignore this comment!

Dahny self-assigned this Mar 5, 2020
@mauricioaniche
Contributor Author

@Dahny after our talk today:

We should:

  • Split the dataset 80%/20% into training and test sets. (We discussed doing this inside _run_single_model, but that is not a good idea: we want all the different models to be trained and tested on the same training and test sets, so the split must happen one layer up... Which method calls _run_single_model?)
  • Call the search with k = 10. (From here on, all changes are in the _run_single_model method.)
  • Extract the best estimator as well as its scores on the 10 folds.
  • Use the best estimator to predict results on the unseen test set.
  • Collect precision, recall, accuracy, and the confusion matrix (TP, TN, FP, FN).
  • Re-train the model on the full (100%) dataset, so that we have a "production-ready" model.
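The agreed steps could look roughly like this sketch, again assuming scikit-learn. `run_all_models` and `run_single_model` are hypothetical names standing in for the caller and for `_run_single_model` (whose real signature is not shown in the issue); the two candidate models, their grids, and the synthetic data are illustrative assumptions.

```python
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

K = 10  # folds used by the search


def run_single_model(estimator, param_grid, X_train, X_test, y_train, y_test, X, y):
    """Hypothetical stand-in for _run_single_model: tune, evaluate, refit."""
    # Search with K-fold cross-validation on the training split only.
    search = GridSearchCV(estimator, param_grid, scoring="accuracy", cv=K)
    search.fit(X_train, y_train)

    # Best estimator and its scores on each of the K folds.
    best = search.best_index_
    fold_scores = [search.cv_results_[f"split{i}_test_score"][best] for i in range(K)]

    # Predict on the unseen test set and collect the metrics.
    y_pred = search.best_estimator_.predict(X_test)
    # For binary labels [0, 1], ravel() yields tn, fp, fn, tp in that order.
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    results = {
        "fold_scores": fold_scores,
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "accuracy": accuracy_score(y_test, y_pred),
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
    }

    # Re-train the chosen configuration on 100% of the data ("production-ready").
    final_model = clone(search.best_estimator_).fit(X, y)
    return results, final_model


def run_all_models(X, y):
    """Hypothetical caller: split ONCE so every model sees the same train/test sets."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42
    )
    models = {
        "decision_tree": (DecisionTreeClassifier(random_state=42), {"max_depth": [2, 4, 8]}),
        "logistic_regression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    }
    return {
        name: run_single_model(est, grid, X_train, X_test, y_train, y_test, X, y)
        for name, (est, grid) in models.items()
    }


X, y = make_classification(n_samples=500, n_features=10, random_state=0)
all_results = run_all_models(X, y)
for name, (metrics, _) in all_results.items():
    print(name, round(metrics["accuracy"], 3), metrics["tp"], metrics["fp"])
```

Keeping the split in the caller guarantees every model is tuned and evaluated on identical training and test sets. The final `clone(...).fit(X, y)` starts from the chosen hyper-parameters but an unfitted model, so the production model uses all the data; its performance is the one estimated on the test set, since no unseen data is left to measure it again.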


2 participants