# Classification Using Auto-SKLearn

[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1Z5s0WXnjtxSi2oLxKG1ZTTcpVXqIjLyv#scrollTo=-ZrgwiL9kR_L)

In [None]:
!pip install auto-sklearn

If you are running on Google Colab, click Runtime -> Restart runtime after this cell finishes, then run the remaining cells.

In [4]:
# imports
import pickle
from pprint import pprint

import sklearn.datasets
import sklearn.metrics
import sklearn.model_selection

import autosklearn.classification

In [5]:
# Load the breast cancer dataset and split it into train and test sets
X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)
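Because no `test_size` is passed, `train_test_split` falls back to its default and holds out 25% of the samples for testing. The sketch below reproduces the resulting split sizes with plain arithmetic. It assumes the 569 samples of the breast cancer dataset, and the helper name `split_sizes` is hypothetical, not part of scikit-learn.

```python
import math


def split_sizes(n_samples, test_size=0.25):
    """Return (n_train, n_test) for a fractional test_size.

    The test set size is rounded up; the training set gets the remainder.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


n_train, n_test = split_sizes(569)
print(n_train, n_test)  # 426 143
```

Passing `random_state=1`, as the cell above does, makes the shuffle reproducible, so the same rows land in each set on every run.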

In [6]:
# Fit the classifier
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    per_run_time_limit=30,
    tmp_folder='/tmp/autosklearn_classification_example_tmp',
)
automl.fit(X_train, y_train, dataset_name='breast_cancer')

AutoSklearnClassifier(per_run_time_limit=30, time_left_for_this_task=120,
                      tmp_folder='/tmp/autosklearn_classification_example_tmp')

In [7]:
# Leaderboard of the models auto-sklearn selected, ranked by cost
print(automl.leaderboard())

          rank  ensemble_weight                 type      cost  duration
model_id                                                                
7            1             0.10          extra_trees  0.014184  1.502508
2            2             0.02        random_forest  0.028369  2.024807
36           3             0.06  k_nearest_neighbors  0.028369  0.853534
26           4             0.04          extra_trees  0.028369  2.240347
19           5             0.02          extra_trees  0.028369  2.791073
22           6             0.02    gradient_boosting  0.028369  1.149980
3            7             0.14                  mlp  0.028369  1.667622
12           8             0.04    gradient_boosting  0.035461  1.240657
17           9             0.02    gradient_boosting  0.035461  1.510491
8           10             0.02        random_forest  0.035461  1.958862
37          11             0.06    gradient_boosting  0.035461  1.585859
5           12             0.04        random_fores
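The `cost` column is the loss each model achieved on auto-sklearn's internal validation data. Assuming the default accuracy metric, where the loss is 1 minus accuracy, the values can be read as validation accuracies. The sketch below converts a few rows copied from the leaderboard above; the helper `cost_to_accuracy` is hypothetical and not part of auto-sklearn.

```python
def cost_to_accuracy(cost):
    """Convert a loss of the form (1 - accuracy) back into an accuracy."""
    return 1.0 - cost


# (model_id, type, cost) rows copied from the leaderboard output
leaderboard_rows = [
    (7, "extra_trees", 0.014184),
    (2, "random_forest", 0.028369),
    (12, "gradient_boosting", 0.035461),
]

for model_id, model_type, cost in leaderboard_rows:
    accuracy = cost_to_accuracy(cost)
    print(f"model {model_id:>2} ({model_type}): validation accuracy ~ {accuracy:.4f}")
```

Lower cost means a better rank, which is why model 7 sits at the top even though model 3 carries the largest ensemble weight.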

In [None]:
# Show the details of each model in the final ensemble
pprint(automl.show_models(), indent=4)

In [9]:
# Predict the test labels
predictions = automl.predict(X_test)
print("Accuracy score:", sklearn.metrics.accuracy_score(y_test, predictions))

Accuracy score: 0.9440559440559441
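`accuracy_score` is the fraction of predictions that match the true labels. A minimal pure-Python sketch of the same computation follows. It uses made-up labels rather than the notebook's data, and the function name `accuracy` is hypothetical.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of positions where prediction and label agree."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy of empty inputs")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Illustrative labels: 6 of the 8 predictions are correct
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1]
print(accuracy(y_true, y_pred))  # 0.75
```

The score printed above, 0.9440559440559441, equals 135/143, which is consistent with 135 correct predictions on a test set of 143 samples.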


In [10]:
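# Export the highest-ranked model (model_id 7 in this run; the id can differ between runs).
# Note: 'sklearn_classifier' is the bare scikit-learn estimator, without auto-sklearn's preprocessing steps.
clf = automl.show_models()[7]['sklearn_classifier']
with open('model.pickle', 'wb') as f:
    pickle.dump(clf, f)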
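To reuse the exported model later, the file can be read back with `pickle.load`. The sketch below shows the round trip with a plain dictionary standing in for the classifier, so it runs without auto-sklearn installed. The temporary path and the stand-in object are illustrative assumptions.

```python
import os
import pickle
import tempfile

# Hypothetical placeholder for the fitted classifier
stand_in_model = {"type": "extra_trees", "model_id": 7}

# Write into a temporary directory so the example leaves no files behind elsewhere
path = os.path.join(tempfile.mkdtemp(), "model.pickle")

# Save: the context manager closes the file even if dump() raises
with open(path, "wb") as f:
    pickle.dump(stand_in_model, f)

# Load: the restored object is an equal copy of what was saved
with open(path, "rb") as f:
    restored = pickle.load(f)

print(restored == stand_in_model)  # True
```

Only unpickle files from sources you trust, since loading a pickle can execute arbitrary code. A pickled estimator should also be loaded with the same library versions that were used to save it.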