How can we create pool of classifiers? #167
Comments
@sara-eb Hello, sorry for the delayed response. The library accepts any list of classifiers as the pool of classifiers, so it does accept a combination of ensemble methods with single classifier models. There are two ways of doing that. First: `X, y = make_classification()`, `pool1 = [rf, adaboost, svm, tree]`. In this case, pool1 is a pool of classifiers composed of 4 estimators (although random forest and adaboost are each composed of multiple base estimators, the DS method treats each of them as a single model). Second: pool2 treats each member of the random forest/adaboost as a single, independent model instead of using their combination. So the DS model sees it as a pool composed of 22 models (10 coming from rf, 10 from adaboost, 1 svm and 1 decision tree). You may also want to check our heterogeneous example, in which we use classifiers of different types in the pool: https://deslib.readthedocs.io/en/latest/auto_examples/example_heterogeneous.html#sphx-glr-auto-examples-example-heterogeneous-py
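The two pools described above could be built roughly as follows. This is a minimal sketch, assuming scikit-learn estimators with 10 members per ensemble; the `pool2` expression relies on scikit-learn's fitted `estimators_` attribute and is an assumption, not code quoted from this thread.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Toy data, only for illustration.
X, y = make_classification(n_samples=200, random_state=0)

# Every member of the pool must be fitted before it is handed to a DS method.
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
adaboost = AdaBoostClassifier(n_estimators=10, random_state=0).fit(X, y)
svm = SVC().fit(X, y)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# pool1: each ensemble counts as one model, so the pool has 4 members.
pool1 = [rf, adaboost, svm, tree]

# pool2 (assumed construction): every base estimator of the two ensembles
# becomes its own pool member, next to the SVM and the single tree.
# AdaBoost may stop early and keep fewer estimators than requested,
# so the size is 22 only when both ensembles keep all 10 members.
pool2 = list(rf.estimators_) + list(adaboost.estimators_) + [svm, tree]

print(len(pool1), len(pool2))
```

Either list can then be given to a DS method as its pool of classifiers. The choice between them decides whether the DS method selects among whole ensembles or among their individual members.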
@Menelau Thank you very much sir, thanks again
@Menelau I created a pool of classifiers for my data, including a random forest with 200 estimators and an AdaBoost classifier with 600 decision trees, and I am using the faiss technique as
Since my validation (i.e., DSEL) dataset has quite a large number of samples, I was trying to fit the DS model on the validation data and save the model for later prediction on the test dataset. However, I am facing an issue when saving it: What could be the reason?
@sara-eb Hello, I have a feeling that it happens because of the information stored in the faiss knn, but I'm not sure. I will investigate that and get back to you asap. You can try using dill instead of pickle for saving the model: https://pypi.org/project/dill/ I believe that should work for you.
Thanks for the recommendation.
However, I am still getting an error. I have trained the RandomForest classifier in parallel; can this be the reason?
@sara-eb, a parallel random forest shouldn't be a problem at all. I dug deeper into this issue and found a problem with the serialization of the Faiss KNN. In this case, the index computed by the faiss knn needs to be converted to a string before it is written to a file (see facebookresearch/faiss#914). So I prepared a workaround with functions for saving and loading DS models that should solve this problem (save_ds, load_ds). They just check whether faiss is being used for the knn calculation in the DS model and, if so, do the conversions before saving/loading. I added the code in this gist: https://gist.github.com/Menelau/0cde51c3622be6313fd96b4dffb17996 Now I will see how to add a saving/loading functionality for the DS methods to DESlib (one that can handle the Faiss knn automatically) as soon as possible.
@Menelau Thank you very much sir, it works perfectly
@Menelau I am now facing a new issue with scoring on the test set. What could be the reason?
Hello, how did you load the DS model? Did you use the load_ds function I provided in the gist: https://gist.github.com/Menelau/0cde51c3622be6313fd96b4dffb17996 ? I believe the error is in the way you are loading the DS model. In order to save the Faiss model, its index is converted to a numpy array so that it can be pickled. The self.index_ variable is the one containing the index, so it is serialized in the save_ds function (by converting it to a numpy array). Then, in order to load it back, the conversion from the numpy array back to a Faiss index needs to be done (which the load_ds function in the gist performs).
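The save/load round trip described above can be illustrated with a standard-library-only sketch. It is not the code from the gist: `UnpicklableIndex`, `index_to_bytes`, `index_from_bytes`, `save_model` and `load_model` are hypothetical stand-ins, and the model is a plain namespace rather than a real DS model. With faiss itself, the conversion pair would presumably be `faiss.serialize_index` and `faiss.deserialize_index`.

```python
import io
import pickle
from types import SimpleNamespace


class UnpicklableIndex:
    """Hypothetical stand-in for a native index handle that pickle rejects."""

    def __init__(self, vectors):
        self.vectors = list(vectors)

    def __reduce__(self):
        raise TypeError("native index handle cannot be pickled")


def index_to_bytes(index):
    # Stand-in for the library's own index serializer.
    return pickle.dumps(index.vectors)


def index_from_bytes(blob):
    # Stand-in for the library's own index deserializer.
    return UnpicklableIndex(pickle.loads(blob))


def save_model(model, fileobj):
    live_index = model.index_
    model.index_ = index_to_bytes(live_index)  # swap in a picklable form
    try:
        pickle.dump(model, fileobj)
    finally:
        model.index_ = live_index  # keep the in-memory model usable


def load_model(fileobj):
    model = pickle.load(fileobj)
    model.index_ = index_from_bytes(model.index_)  # rebuild the live index
    return model


model = SimpleNamespace(k=7, index_=UnpicklableIndex([[0.0, 1.0], [2.0, 3.0]]))
buffer = io.BytesIO()
save_model(model, buffer)
buffer.seek(0)
restored = load_model(buffer)
```

In this sketch the live index is put back on the model right after dumping. If a save routine instead leaves the serialized form on the in-memory object, that object can no longer answer neighbor queries until it is reloaded or the index is converted back.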
@Menelau Thanks a lot sir. Sorry, I did not realize that I needed to reload the model, since the model was already in the variable list in memory.
As you have mentioned in your examples, a BaggingClassifier or RandomForest classifier is considered a pool of classifiers in itself.
I am wondering whether it is possible to create a pool of classifiers that includes traditional ensemble methods like RF and AdaBoost in combination with single classifiers like SVM and kNN?
Thanks