Keras models in modAL workflows
===============================

Thanks to the scikit-learn API of Keras, you can seamlessly integrate Keras models into your modAL workflow. In this tutorial, we briefly introduce this scikit-learn API, here provided by the SciKeras package, and then see how to do active learning with it. More details on the Keras scikit-learn API [can be found here](https://keras.io/scikit-learn-api/).

The executable script for this example can be [found here](https://github.com/cosmic-cortex/modAL/blob/master/examples/keras_integration.py)!

Keras' scikit-learn API
-----------------------

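By default, a Keras model's interface differs from the one scikit-learn estimators use. A scikit-learn wrapper, such as SciKeras' ``KerasClassifier`` used below, adapts the model to the estimator interface (``fit``, ``predict``, ``predict_proba``, ``score``) that modAL works with.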
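To illustrate what such a wrapper does, here is a minimal, hypothetical sketch of the adapter idea in plain NumPy. It is not how SciKeras is implemented: ``ToySoftmaxModel`` and ``SklearnStyleWrapper`` are made-up names, the "model" is a softmax regression standing in for a Keras network, and only the methods modAL relies on are provided.

```python
import numpy as np


class ToySoftmaxModel:
    """Hypothetical stand-in for a Keras model: it has a Keras-style
    fit(x, y, epochs=...) and a predict(x) that returns class probabilities.
    It is a softmax regression trained with plain gradient descent."""

    def __init__(self, n_features, n_classes, lr=0.5):
        self.W = np.zeros((n_features, n_classes))
        self.b = np.zeros(n_classes)
        self.lr = lr

    def predict(self, x):
        logits = x @ self.W + self.b
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        exp = np.exp(logits)
        return exp / exp.sum(axis=1, keepdims=True)

    def fit(self, x, y, epochs=1):
        # y is one-hot encoded, as in the MNIST example
        for _ in range(epochs):
            grad = (self.predict(x) - y) / len(x)  # d(mean cross-entropy)/d(logits)
            self.W -= self.lr * (x.T @ grad)
            self.b -= self.lr * grad.sum(axis=0)


class SklearnStyleWrapper:
    """Hypothetical minimal adapter exposing the scikit-learn-style methods
    modAL calls. Like SciKeras with default settings, it builds a fresh
    model on every fit() call."""

    def __init__(self, build_fn, epochs=1):
        self.build_fn = build_fn
        self.epochs = epochs

    def fit(self, X, y):
        self.model_ = self.build_fn()
        self.model_.fit(X, y, epochs=self.epochs)
        return self

    def predict_proba(self, X):
        return self.model_.predict(X)

    def predict(self, X):
        return self.predict_proba(X).argmax(axis=1)

    def score(self, X, y):
        # accuracy against one-hot labels
        return float(np.mean(self.predict(X) == y.argmax(axis=1)))


# usage on two well-separated synthetic clusters
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2, 1, size=(50, 2)), rng.normal(2, 1, size=(50, 2))])
y = np.eye(2)[np.repeat([0, 1], 50)]
clf = SklearnStyleWrapper(lambda: ToySoftmaxModel(n_features=2, n_classes=2), epochs=100).fit(X, y)
print(clf.score(X, y))
```

The point of the design is that code written against ``fit``/``predict_proba``/``score`` never needs to know how the underlying model is built or trained.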

In [1]:
!pip install modAL-python

In [2]:
!pip install scikeras

In [9]:
import numpy as np
import keras
from keras.datasets import mnist
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPooling2D
from scikeras.wrappers import KerasClassifier

# load the MNIST training and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train.reshape(60000, 28, 28, 1).astype('float32') / 255
X_test = X_test.reshape(10000, 28, 28, 1).astype('float32') / 255
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

# assemble initial data
n_initial = 100
initial_idx = np.random.choice(len(X_train), size=n_initial, replace=False)
X_initial = X_train[initial_idx]
y_initial = y_train[initial_idx]

# generate the pool: remove the initial data from the training set
# and keep the first 5000 remaining instances
X_pool = np.delete(X_train, initial_idx, axis=0)[:5000]
y_pool = np.delete(y_train, initial_idx, axis=0)[:5000]
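The preparation steps above can be checked on a small synthetic stand-in for MNIST (random pixels and labels, so no download is needed). The sketch mirrors the scaling, one-hot encoding and split, and confirms that ``np.delete`` leaves the pool disjoint from the initial set. The pool size of 500 is an arbitrary choice for the sketch; the original truncates to 5000, presumably to keep each uncertainty computation over the pool cheap.

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical synthetic stand-in for MNIST: 1,000 28x28 uint8 "images", labels 0-9
X_raw = rng.integers(0, 256, size=(1000, 28, 28)).astype(np.uint8)
y_raw = rng.integers(0, 10, size=1000)

# add a channel axis and scale pixels to [0, 1], as in the cell above
X = X_raw.reshape(-1, 28, 28, 1).astype('float32') / 255
# one-hot labels; the same matrix keras.utils.to_categorical(y_raw, 10) would produce
Y = np.eye(10, dtype='float32')[y_raw]

# sample the initial labelled set without replacement
n_initial = 100
initial_idx = rng.choice(len(X), size=n_initial, replace=False)
X_initial, Y_initial = X[initial_idx], Y[initial_idx]

# the pool is everything else, truncated (here to 500 instead of 5000)
X_pool = np.delete(X, initial_idx, axis=0)[:500]
Y_pool = np.delete(Y, initial_idx, axis=0)[:500]

# applying the same np.delete to the row indices shows which rows ended up in the pool
pool_idx = np.delete(np.arange(len(X)), initial_idx)[:500]
print(np.intersect1d(pool_idx, initial_idx).size)  # 0: no overlap with the initial set
```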

In [10]:
# model-building function for the scikit-learn wrapper
def get_model():
    """
    Builds and returns an uncompiled Keras model.
    It is passed to KerasClassifier, which compiles it with the
    loss, optimizer and metrics given to the wrapper.
    """

    model = Sequential()
    model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
    model.add(Conv2D(64, (3, 3), activation='relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    model.add(Flatten())
    model.add(Dense(128, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(10, activation='softmax'))

    return model
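To get a sense of how large this network is, its output shapes and trainable parameter counts can be worked out by hand. The sketch assumes Keras' defaults for ``Conv2D`` (stride 1, ``'valid'`` padding) and ``MaxPooling2D`` (stride equal to the pool size); the helper functions ``conv2d`` and ``dense`` are hypothetical. Calling ``summary()`` on the model returned by ``get_model()`` should report the same total.

```python
def conv2d(h, w, c_in, c_out, k=3):
    """Output shape and parameter count of a k x k Conv2D with stride 1, 'valid' padding."""
    params = k * k * c_in * c_out + c_out  # kernel weights + one bias per filter
    return (h - k + 1, w - k + 1, c_out), params


def dense(n_in, n_out):
    """Output size and parameter count of a fully connected layer."""
    return n_out, n_in * n_out + n_out  # weight matrix + biases


shape1, p1 = conv2d(28, 28, 1, 32)      # (26, 26, 32), 320 params
shape2, p2 = conv2d(*shape1, 64)        # (24, 24, 64), 18,496 params
# MaxPooling2D(2, 2) halves height and width; pooling, Dropout and Flatten have no params
pooled = (shape2[0] // 2, shape2[1] // 2, shape2[2])  # (12, 12, 64)
flat = pooled[0] * pooled[1] * pooled[2]              # 9,216 features
n1, p3 = dense(flat, 128)               # 1,179,776 params
n2, p4 = dense(n1, 10)                  # 1,290 params
total = p1 + p2 + p3 + p4
print(total)  # 1199882
```

Over 98% of these parameters sit in the first ``Dense`` layer, which connects the 9,216 flattened features to 128 units.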

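For our purposes, the ``classifier`` we initialize next behaves like any scikit-learn estimator: SciKeras calls ``get_model`` to build the network, compiles it with the given ``loss``, ``optimizer`` and ``metrics``, and exposes ``fit``, ``predict``, ``predict_proba`` and ``score``.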

In [11]:
# create the classifier; SciKeras compiles the model returned by get_model
classifier = KerasClassifier(get_model,
                             loss="categorical_crossentropy",
                             optimizer="adam",
                             metrics=["accuracy"])

Active learning with Keras
--------------------------

In this example, we use the MNIST dataset of handwritten digits, which ships with Keras (``keras.datasets.mnist``) and was loaded above.

With the data and the classifier ready, active learning is as easy as always. ``ActiveLearner`` uses uncertainty sampling by default, and because training large neural networks is *very* expensive, we query the 100 most uncertain instances each time we measure the uncertainty of the pool instead of just one.
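Uncertainty sampling scores each pool instance by its classification uncertainty, 1 minus the highest predicted class probability, and returns the ``n_instances`` highest-scoring ones. The function below is a simplified NumPy re-implementation for illustration only: ``least_confident_query`` is a hypothetical name, and modAL's own ``uncertainty_sampling`` additionally supports random tie-breaking.

```python
import numpy as np


def least_confident_query(proba, n_instances):
    """Return indices of the n_instances rows whose top-class probability is lowest.
    proba: (n_samples, n_classes) array, e.g. the output of predict_proba."""
    uncertainty = 1.0 - proba.max(axis=1)      # least-confidence score per instance
    return np.argsort(-uncertainty)[:n_instances]  # most uncertain first


proba = np.array([
    [0.98, 0.01, 0.01],   # confident
    [0.40, 0.35, 0.25],   # very uncertain
    [0.70, 0.20, 0.10],
    [0.34, 0.33, 0.33],   # most uncertain
])
print(least_confident_query(proba, n_instances=2))  # [3 1]
```

In the tutorial, ``proba`` would be the classifier's predictions on the whole pool, and ``n_instances=100`` selects a batch at once, so the expensive retraining happens once per 100 labels rather than once per label.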

In [12]:
from modAL.models import ActiveLearner

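# initialize ActiveLearner (uncertainty sampling by default); extra keyword
# arguments such as verbose are passed on to the estimator's fit method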
learner = ActiveLearner(
    estimator=classifier,
    X_training=X_initial, y_training=y_initial,
    verbose=1
)



In [13]:
learner.score(X_test, y_test)



0.2996
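``learner.score`` forwards to the wrapped estimator's ``score`` method, which for SciKeras' ``KerasClassifier`` is accuracy by default. The 0.2996 above therefore means that roughly 30% of the 10,000 test digits are classified correctly after training on the 100 initial labels. With one-hot targets, accuracy amounts to comparing argmaxes, as in this small hypothetical helper:

```python
import numpy as np


def onehot_accuracy(y_true_onehot, proba):
    """Fraction of rows where the predicted class (argmax of the probabilities)
    matches the true class encoded in the one-hot label."""
    return float(np.mean(proba.argmax(axis=1) == y_true_onehot.argmax(axis=1)))


y_true = np.eye(3)[[0, 2, 1, 1]]
proba = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.3, 0.6],
                  [0.5, 0.4, 0.1],
                  [0.2, 0.5, 0.3]])
print(onehot_accuracy(y_true, proba))  # 0.75
```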

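By default, ``.teach()`` appends the newly queried instances to the learner's training data and refits the estimator on all labels seen so far. To train only on the newly queried labels, pass ``only_new=True`` to the ``.teach()`` method of the learner. Keep in mind that SciKeras' ``KerasClassifier`` builds a fresh model on every ``fit`` call unless it is created with ``warm_start=True``, so with the default settings ``only_new=True`` would train a new network on the latest batch alone. The loop below uses the default behavior.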
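The two modes differ only in what the estimator is refitted on; the learner's stored training data grows either way. The classes below are a simplified, hypothetical sketch of that bookkeeping (``MiniLearner`` and ``RecordingEstimator`` are made-up names, not modAL's code), using an estimator that merely records how many samples each ``fit`` call received.

```python
import numpy as np


class RecordingEstimator:
    """Hypothetical estimator that only records the size of each training set."""

    def __init__(self):
        self.fit_sizes = []

    def fit(self, X, y):
        self.fit_sizes.append(len(X))
        return self


class MiniLearner:
    """Simplified, hypothetical sketch of the bookkeeping behind teach()."""

    def __init__(self, estimator, X_training, y_training):
        self.estimator = estimator
        self.X_training, self.y_training = X_training, y_training
        self.estimator.fit(self.X_training, self.y_training)

    def teach(self, X, y, only_new=False):
        # new labels are always appended to the stored training data ...
        self.X_training = np.concatenate([self.X_training, X])
        self.y_training = np.concatenate([self.y_training, y])
        # ... but only_new decides what the estimator is refitted on
        if only_new:
            self.estimator.fit(X, y)
        else:
            self.estimator.fit(self.X_training, self.y_training)


rng = np.random.default_rng(0)
default_learner = MiniLearner(RecordingEstimator(), rng.normal(size=(100, 4)), rng.integers(0, 10, 100))
new_only_learner = MiniLearner(RecordingEstimator(), rng.normal(size=(100, 4)), rng.integers(0, 10, 100))
for _ in range(3):
    default_learner.teach(rng.normal(size=(100, 4)), rng.integers(0, 10, 100))
    new_only_learner.teach(rng.normal(size=(100, 4)), rng.integers(0, 10, 100), only_new=True)

print(default_learner.estimator.fit_sizes)   # [100, 200, 300, 400]
print(new_only_learner.estimator.fit_sizes)  # [100, 100, 100, 100]
```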

In [14]:
# the active learning loop
n_queries = 10
for idx in range(n_queries):
    print('Query no. %d' % (idx + 1))
    query_idx, query_instance = learner.query(X_pool, n_instances=100, verbose=0)
    learner.teach(
        X=X_pool[query_idx], y=y_pool[query_idx],
        verbose=1
    )
    # remove the queried instances from the pool
    X_pool = np.delete(X_pool, query_idx, axis=0)
    y_pool = np.delete(y_pool, query_idx, axis=0)

    print(f"Iteration: {idx} - Test Score: {learner.score(X_test, y_test)}")

Query no. 1
Iteration: 0 - Test Score: 0.2241
Query no. 2
Iteration: 1 - Test Score: 0.3396
Query no. 3
Iteration: 2 - Test Score: 0.5685
Query no. 4
Iteration: 3 - Test Score: 0.7088
Query no. 5
Iteration: 4 - Test Score: 0.6919
Query no. 6
Iteration: 5 - Test Score: 0.7554
Query no. 7
Iteration: 6 - Test Score: 0.7723
Query no. 8
Iteration: 7 - Test Score: 0.8077
Query no. 9
Iteration: 8 - Test Score: 0.8098
Query no. 10
Iteration: 9 - Test Score: 0.8536
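After the loop the learner has been taught 100 + 10 × 100 = 1,100 labels, and the pool has shrunk from 5,000 to 4,000 instances. Because ``query_idx`` refers to positions in the *current* pool, deleting those rows after every query is what keeps an instance from being selected twice. The bookkeeping can be simulated as below, with random picks as a hypothetical stand-in for the uncertainty-based queries.

```python
import numpy as np

rng = np.random.default_rng(0)
pool_idx = np.arange(5000)   # original positions of the 5,000 pool instances
n_labelled = 100             # size of the initial training set
n_queries, n_instances = 10, 100
queried = []

for _ in range(n_queries):
    # hypothetical stand-in for learner.query: random distinct positions in the current pool
    query_idx = rng.choice(len(pool_idx), size=n_instances, replace=False)
    queried.append(pool_idx[query_idx])  # map current positions back to original ones
    n_labelled += len(query_idx)
    # same removal step as in the loop above
    pool_idx = np.delete(pool_idx, query_idx, axis=0)

queried = np.concatenate(queried)
print(n_labelled, len(pool_idx), len(np.unique(queried)))  # 1100 4000 1000
```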
