Can classifier update() be faster than training from scratch? #123
I am building a dataset and am training a NaiveBayesClassifier as the dataset grows. Instead of retraining the classifier from scratch every time I add a few new entries, I was hoping to use the update() method to add only the new entries and train on them, so that training time stays short when new data is added. What I discovered is that loading a pickled, already-trained classifier and updating it with just the new entries is not faster than retraining it from scratch. Re-reading the docs, they do say that update() "Update the classifier with new training data and re-trains the classifier", which implies retraining on the entire dataset...
Question: is there such a thing as incremental retraining, or is it realistically processing the entire dataset from scratch every time I want to update the classifier with new data?
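For context on whether incremental retraining is possible in principle: a Naive Bayes model only needs per-label document counts and per-label word counts, so new examples can be folded in by adding to those counts without revisiting old data. The sketch below is a hypothetical, self-contained multinomial Naive Bayes (the class name `IncrementalNaiveBayes`, whitespace tokenization, and Laplace smoothing with `alpha` are all assumptions for illustration). It is not TextBlob's or NLTK's API and not a drop-in replacement for NaiveBayesClassifier or its feature extractor.

```python
import math
from collections import Counter, defaultdict


class IncrementalNaiveBayes:
    """Hypothetical multinomial Naive Bayes whose entire state is counts,
    so update() costs time proportional to the new batch only."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                        # Laplace smoothing constant (assumed)
        self.class_counts = Counter()             # label -> number of documents
        self.word_counts = defaultdict(Counter)   # label -> word -> occurrences
        self.total_words = Counter()              # label -> total word tokens
        self.vocab = set()                        # every word seen so far

    def update(self, examples):
        # Only the new (text, label) pairs are touched; old data is never re-read.
        for text, label in examples:
            tokens = text.lower().split()
            self.class_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.total_words[label] += len(tokens)
            self.vocab.update(tokens)

    def classify(self, text):
        # Pick the label with the highest log posterior (up to a constant).
        tokens = [t for t in text.lower().split() if t in self.vocab]
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / n_docs)      # log prior
            denom = self.total_words[label] + self.alpha * v
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = IncrementalNaiveBayes()
clf.update([("I love this sandwich", "pos"), ("this is an amazing place", "pos"),
            ("I do not like this restaurant", "neg"), ("this is my worst meal", "neg")])
# Later, when a few entries arrive, fold in only those:
clf.update([("what an awesome view", "pos"), ("I hate this awful place", "neg")])
print(clf.classify("awesome sandwich"))  # prints "pos"
```

Because the counts after two update() calls are identical to the counts from one pass over all six examples, the incremental model and a from-scratch model give the same predictions here. Whether a given library exposes this kind of update (rather than retraining on the accumulated data) depends on its implementation; scikit-learn's `MultinomialNB.partial_fit`, for example, is designed for batch-by-batch training.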