Any way to benefit from retraining? #26

Most users who try to train a classifier have to carry out several attempts at training until they get acceptable results. This means that in each consecutive attempt they have to resend and retrain the classifier using almost the same training data, with only a few added samples. This seems wasteful. Is there any way to use Yggdrasil to benefit from this knowledge?

Comments
Hi Jose,

There are different reasons for, and different solutions to, training multiple models on the same (or a similar) dataset. Can you give details of your objective? For example, if you are re-training multiple models to test hyper-parameters, or to create an ensemble of sub-models, there is no generic solution to avoid re-training the full models. However, if you want to update an existing model with a small amount of new training examples, there is existing work: take a look at the "online learning" domain.

Currently, YDF does not offer any direct solution for that. However, and while this is likely not as good as an online learning method, you can either resume training of a model on a new dataset, or ensemble a set of models, each trained on a different snapshot of the data.
Hi Mathieu,

Thanks for the reply. It's the second option that I'm interested in: updating an existing model several times with small amounts of new training examples. I'd like to look into the possibility you mention of resuming training of a model with a new dataset. I assume the new dataset would contain only the small number of extra training examples. Can you point me to the code that does this?
You need to experiment to figure out what works.

Option 1: resume the training of the model on the new dataset. Set the temp argument (e.g. temp_directory="/tmp/training_cache") and enable resuming training (try_resume_training=True). Before training again, increase the number of trees, e.g. model.learner_params["num_trees"] = 200 + 200  # Add an extra 200 trees
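Roughly, something along these lines (a minimal sketch assuming the TensorFlow Decision Forests Keras wrapper; the GradientBoostedTreesModel learner and the dataset names initial_train_ds / new_train_ds are placeholders, not taken from the comment above):

import tensorflow_decision_forests as tfdf

# Enable checkpointing so that a later fit() call can continue from the
# already-trained trees instead of restarting from scratch.
model = tfdf.keras.GradientBoostedTreesModel(
    num_trees=200,
    try_resume_training=True,
    temp_directory="/tmp/training_cache",
)

# First training round on the initial dataset (placeholder name).
model.fit(initial_train_ds)

# Later, when a few new examples are available: raise the tree budget and
# call fit() again on the new data (placeholder name).
model.learner_params["num_trees"] = 200 + 200  # Add an extra 200 trees
model.fit(new_train_ds)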
Option 2: train the two models independently on the different datasets, and then combine them (e.g. average the predictions) using the Keras functional API. Look at the composition colab for an example of model composition.
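For illustration, a rough sketch of that kind of ensemble (the learner, the feature names "f1"/"f2", and the dataset names are placeholder assumptions; the composition colab covers the full pattern):

import tensorflow as tf
import tensorflow_decision_forests as tfdf

# Train the two models independently, each on its own snapshot of the data.
model_1 = tfdf.keras.GradientBoostedTreesModel()
model_1.fit(train_ds_snapshot_1)

model_2 = tfdf.keras.GradientBoostedTreesModel()
model_2.fit(train_ds_snapshot_2)

# Combine them with the Keras functional API: feed the same symbolic inputs
# to both models and average their predictions.
inputs = {
    "f1": tf.keras.Input(shape=(1,), name="f1", dtype=tf.float32),
    "f2": tf.keras.Input(shape=(1,), name="f2", dtype=tf.float32),
}
averaged = tf.keras.layers.Average()([model_1(inputs), model_2(inputs)])
ensemble = tf.keras.Model(inputs=inputs, outputs=averaged)

# The ensemble behaves like any other Keras model.
predictions = ensemble.predict(test_ds)  # placeholder evaluation dataset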
Hi Mathieu,

Thanks for the suggestions. I'll think these through...