
Auto Classification

Rafael Garcia Leiva edited this page Dec 31, 2019 · 6 revisions

The fastautoml.AutoClassifier class automatically selects the best model for a classification problem. In particular, it computes the optimal subset of features, selects the best family of models, and chooses the best hyperparameters for the selected model. Refer to the Reference API (TBD) for the list of supported families of models.

Classification of Digits

In this example we apply our auto classification class to the problem of recognizing handwritten digits, using the digits dataset included with scikit-learn.

>>> from fastautoml.fastautoml import AutoClassifier
>>> from sklearn.datasets import load_digits
>>> from sklearn.model_selection import train_test_split

>>> (X, y) = load_digits(return_X_y=True)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

>>> model = AutoClassifier()
>>> model.fit(X_train, y_train)
AutoClassifier()
>>> model.score(X_test, y_test)
0.9622222222222222
>>> type(model.model)
sklearn.svm.classes.LinearSVC

A key difference between the fastautoml library and other AutoML libraries is that we return a single model as the best possible model, instead of an ensemble of models. The data scientist can therefore reuse the result of the AutoClassifier and continue the analysis with it.
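Because the selected model is exposed as `model.model`, and in the example above it is a plain scikit-learn `LinearSVC`, it can presumably be handed to any tool that accepts a fitted scikit-learn classifier. The sketch below illustrates that kind of follow-up analysis. It makes two assumptions: the `LinearSVC` fitted here is only a stand-in for `model.model`, so that the snippet runs without fastautoml installed, and `per_class_recall` is a hypothetical helper written for this illustration, not part of either library.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC


def per_class_recall(estimator, X, y):
    """Return (confusion matrix, recall per class) for a fitted classifier.

    Hypothetical helper for this example; works with any estimator
    that implements the scikit-learn `predict` method.
    """
    cm = confusion_matrix(y, estimator.predict(X))
    # Row i counts the true samples of class i; the diagonal holds the hits.
    recall = cm.diagonal() / cm.sum(axis=1)
    return cm, recall


X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Stand-in (assumption): with fastautoml installed, this would instead be
#   best = model.model
# taken from the fitted AutoClassifier of the example above.
best = LinearSVC(max_iter=5000).fit(X_train, y_train)

cm, recall = per_class_recall(best, X_test, y_test)
for digit, r in zip(best.classes_, recall):
    print(f"digit {digit}: recall {r:.3f}")
```

The per-class view shows which digits the single selected model confuses most, which is harder to attribute when the result is an ensemble of several models.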

For more information about how to select the optimal classifier see the following blog entries:

  • A Comparison of Time and Accuracy in AutoML libraries (TBD)
  • AutoML for Financial Data (TBD)

Supported Models

The following families of models are currently supported for the auto-classification part:
