# AutoKeras Tutorial

This tutorial explains how to use AutoKeras on an example dataset: MNIST. Of course, all of the functionality applies directly to real-life data. Before you start, note that the AutoKeras version used here, 0.4, requires a Linux system with Python 3.6.

In [0]:
! python --version
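
The version requirement can also be checked from inside Python. The following is a minimal sketch; the helper name `is_supported_python` is hypothetical, and the 3.6 requirement is the one stated above for AutoKeras 0.4.

```python
import sys

REQUIRED = (3, 6)  # Python version stated above for AutoKeras 0.4


def is_supported_python(version_info=None):
    """Return True if the major.minor version equals REQUIRED (hypothetical helper)."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) == REQUIRED


if not is_supported_python():
    print("Warning: this tutorial was written for Python %d.%d, found %d.%d"
          % (REQUIRED + tuple(sys.version_info[:2])))
```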

First, we install and import the required packages; the dataset is loaded in the next section.

In [0]:
! pip3 install autokeras --no-cache-dir  # the autokeras package is quite big; --no-cache-dir avoids memory issues seen during earlier installs

In [0]:
import autokeras   # quick check that the installation succeeded
import cv2
import sklearn     # used later for the confusion matrix
from autokeras.image.image_supervised import ImageClassifier, load_image_dataset

# The MNIST example

In [0]:
from keras.datasets import mnist
from autokeras.image.image_supervised import ImageClassifier

(x_train, y_train), (x_test, y_test) = mnist.load_data()    # download the data
x_train = x_train.reshape(x_train.shape + (1,))      # add a trailing channel axis
x_test = x_test.reshape(x_test.shape + (1,))         # a 4D array (samples, height, width, channels) is expected
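
The reshape above appends a trailing channel axis of size 1, because the images are grayscale. The shape arithmetic can be sketched without downloading the data; `with_channel_axis` is a hypothetical helper, and the shapes are those of the standard MNIST split.

```python
def with_channel_axis(shape):
    """Return `shape` with a trailing channel axis of size 1 appended."""
    return tuple(shape) + (1,)


# Standard MNIST arrays: 60000 training and 10000 test images of 28 x 28 pixels.
train_shape = with_channel_axis((60000, 28, 28))
test_shape = with_channel_axis((10000, 28, 28))
print(train_shape)  # (60000, 28, 28, 1)
print(test_shape)   # (10000, 28, 28, 1)
```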

Let's initialize our classifier! We will train for one hour, which should normally be enough for a single GPU to reach a good result. This works very well for MNIST, but for real-world data you might need to search longer. As you will see, the logs printed in verbose mode let you follow the process quite nicely. First, the API receives the call and preprocesses the dataset. From then on, your CPU searches for new models using network morphism guided by Bayesian optimization, while your GPU trains those models in order to find the best-performing one.

NOTE: if you are not running on a GPU, this search time will be too short to find a model (in Google Colab you can change this in the settings).

In [0]:
clf = ImageClassifier(path="mnist_automodels/", verbose=True)    # declare your classifier, with a path where it saves the models it finds
clf.fit(x_train, y_train, time_limit=1 * 60 * 60)    # we will be training for one hour
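
To build intuition for what happens during `fit`, here is a toy sketch of a time-budgeted search loop. It is not AutoKeras's algorithm: random perturbation of a single number stands in for network morphism, a made-up scoring function stands in for GPU training, and all names (`toy_search`, `score`) are hypothetical. It only illustrates the pattern of proposing candidates from the best model so far until the time limit is reached.

```python
import random
import time


def score(candidate):
    """Stand-in for training and validating a model: higher is better, best at 3.0."""
    return -(candidate - 3.0) ** 2


def toy_search(time_limit, max_trials=1000, seed=0):
    """Keep proposing variations of the best candidate until the budget is spent."""
    rng = random.Random(seed)
    deadline = time.monotonic() + time_limit
    best, best_score = 0.0, score(0.0)
    history = [(best, best_score)]
    trials = 0
    while time.monotonic() < deadline and trials < max_trials:
        candidate = best + rng.uniform(-1.0, 1.0)  # "morph" the current best
        candidate_score = score(candidate)
        if candidate_score > best_score:           # keep only improvements
            best, best_score = candidate, candidate_score
        history.append((best, best_score))
        trials += 1
    return best, best_score, history


best, best_score, history = toy_search(time_limit=0.5)
print("best candidate: %.3f, score: %.4f, trials: %d"
      % (best, best_score, len(history) - 1))
```

The `max_trials` cap is only there so the toy finishes quickly; in the real search, the time limit is what ends the loop.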

Once we have reached the time limit or the search has converged, we can train the best model found further for the best performance. Since the training set now also includes the validation data, we could choose to retrain the weights from scratch on this slightly larger dataset. That takes a long time, so in this example we simply continue training from the weights found so far.

In [0]:
clf.final_fit(x_train, y_train, x_test, y_test, retrain=False)
result = clf.evaluate(x_test, y_test)
print('The resulting accuracy is ' + str(result))

Hurray, we are done training! Hopefully we are happy with the results, and we can export the model for other uses. Note that this is an AutoKeras model and cannot be saved as, say, a Keras model. This is because of the automated preprocessing steps, which are not included in conventional ML software.

In [0]:
clf.export_autokeras_model('automodel.h5')
print('exporting done')
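
The exported file can later be loaded back for inference. The sketch below assumes that the AutoKeras 0.4 helper `pickle_from_file` (the same one used in the visualization section of this tutorial) also loads exported models, and that the loaded object offers `evaluate`; the wrapper name `load_exported_model` is hypothetical. Check this against your installed version.

```python
def load_exported_model(path='automodel.h5'):
    """Load a model written by export_autokeras_model (assumed AutoKeras 0.4 behaviour)."""
    # Imported lazily so this cell can be defined even where autokeras is absent.
    from autokeras.utils import pickle_from_file
    return pickle_from_file(path)


# Usage, once the export above has run:
# model = load_exported_model('automodel.h5')
# print(model.evaluate(x_test, y_test))
```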

As a last step, let's consult the confusion matrix to get some insight into our performance.

In [0]:
from sklearn.metrics import confusion_matrix
y_prediction = clf.predict(x_test)
print('The confusion matrix is: ')
print(str(confusion_matrix(y_test, y_prediction)))
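
To read the matrix: with scikit-learn's convention, the entry in row i and column j counts the samples whose true label is i and whose predicted label is j, so correct predictions lie on the diagonal. The following pure-Python sketch reproduces that layout on a tiny made-up example; the function name and the labels are illustrative only.

```python
def confusion_counts(y_true, y_pred, n_classes):
    """Build a matrix where entry [i][j] counts true label i predicted as j."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


# Made-up labels for three classes, for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
matrix = confusion_counts(y_true, y_pred, n_classes=3)
for row in matrix:
    print(row)

# Accuracy is the diagonal sum divided by the number of samples.
accuracy = sum(matrix[i][i] for i in range(3)) / len(y_true)
print('accuracy:', accuracy)
```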

Further notes:
We can inspect the output of our search in the folder we specified: mnist_automodels.
In there we find a file for the model from every iteration. The best model found is specified in the file best_model.txt. In the log file we can view the operations added in every iteration, including the parent model.
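
A quick way to look at that folder from the notebook is sketched below. The helper name `summarize_search_dir` is hypothetical, and no assumption is made about the format of best_model.txt: its contents are simply printed as text.

```python
import os


def summarize_search_dir(path):
    """Return the sorted file names in `path` and the text of best_model.txt, if present."""
    names = sorted(os.listdir(path))
    best_file = os.path.join(path, 'best_model.txt')
    best_text = None
    if os.path.isfile(best_file):
        with open(best_file) as handle:
            best_text = handle.read().strip()
    return names, best_text


# Only runs once the search above has created the folder.
if os.path.isdir('mnist_automodels'):
    names, best_text = summarize_search_dir('mnist_automodels')
    print('\n'.join(names))
    print('best_model.txt says:', best_text)
```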

# Visualization

To visualize the resulting model, we use some specific software recommended by the AutoKeras authors. First, install the requirements:

In [0]:
! sudo apt-get install libcairo2-dev
! sudo apt-get install libpango1.0-dev

In [0]:
! sudo apt install curl
! curl -O https://graphviz.gitlab.io/pub/graphviz/stable/SOURCES/graphviz.tar.gz

Install Graphviz:

In [0]:
! tar -xzf graphviz.tar.gz
! dir
! graphviz-2.40.1/configure
! make
! sudo make install
! pip3 install graphviz

Import the packages

In [0]:
import os
from graphviz import Digraph

from autokeras.utils import pickle_from_file

And we define the needed functions

In [0]:
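def to_pdf(graph, path):
    """Render an AutoKeras graph with Graphviz; the output is written next to `path`."""
    dot = Digraph(comment='AutoKeras model graph')

    # one node per entry in node_list, labelled with its shape
    for index, node in enumerate(graph.node_list):
        dot.node(str(index), str(node.shape))

    # one edge per layer, labelled with the layer's string representation
    for u in range(graph.n_nodes):
        for v, layer_id in graph.adj_list[u]:
            dot.edge(str(u), str(v), str(graph.layer_list[layer_id]))

    dot.render(path)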
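
To see which graph attributes `to_pdf` relies on without installing Graphviz, the sketch below builds plain DOT text from a tiny stand-in graph. `Node`, `ToyGraph`, `to_dot_source`, and the layer names are hypothetical; they mimic only the attributes read above (`node_list`, `adj_list`, `layer_list`, `n_nodes`) and are not the AutoKeras classes.

```python
from collections import namedtuple

# Hypothetical stand-ins that mimic only the attributes to_pdf reads.
Node = namedtuple('Node', ['shape'])
ToyGraph = namedtuple('ToyGraph', ['node_list', 'adj_list', 'layer_list', 'n_nodes'])


def to_dot_source(graph):
    """Build DOT text using the same label scheme as to_pdf (shape on nodes, layer on edges)."""
    lines = ['digraph {']
    for index, node in enumerate(graph.node_list):
        lines.append('    %d [label="%s"]' % (index, node.shape))
    for u in range(graph.n_nodes):
        for v, layer_id in graph.adj_list[u]:
            lines.append('    %d -> %d [label="%s"]' % (u, v, graph.layer_list[layer_id]))
    lines.append('}')
    return '\n'.join(lines)


# adj_list[u] holds (target node, layer id) pairs, as in the loop above.
toy = ToyGraph(
    node_list=[Node((28, 28, 1)), Node((28, 28, 64)), Node((10,))],
    adj_list={0: [(1, 0)], 1: [(2, 1)], 2: []},
    layer_list=['Conv(1, 64)', 'Dense(10)'],
    n_nodes=3,
)
print(to_dot_source(toy))
```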

In [0]:
def visualize(path):
    cnn_module = pickle_from_file(os.path.join(path, 'module'))
    cnn_module.searcher.path = path
    for item in cnn_module.searcher.history:
        model_id = item['model_id']
        graph = cnn_module.searcher.load_model_by_id(model_id)
        to_pdf(graph, os.path.join(path, str(model_id)))

Running this saves a visualization of every model in the search history as a PDF file in the given folder.

In [0]:
visualize('mnist_automodels')