# Using Keras models in scikit-learn pipelines

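One of the most powerful features of `scikit-learn` is the ability to build _pipelines_ that structure the data flow: a sequence of transformation steps followed by a final estimator, fitted and applied as a single object. A fitted pipeline can be serialised and moved from one system to another like a black box.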
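
As a rough illustration of the "black box" idea, the sketch below fits a small pipeline, serialises it with the standard `pickle` module and restores it. The toy data and the `LogisticRegression` step are assumptions made for this example only; whether a pipeline that contains a Keras wrapper pickles cleanly depends on the wrapper and the library versions in use.

```python
import pickle

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Toy data, invented for this illustration only.
rng = np.random.default_rng(0)
X = rng.random((200, 3))
y = (X[:, 0] > 0.5).astype(int)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression()),
])
pipe.fit(X, y)

# Serialise the fitted pipeline to bytes and restore it, as one would
# when moving it to another process or machine.
payload = pickle.dumps(pipe)
restored = pickle.loads(payload)

# The restored object carries both the fitted scaler and the fitted model.
same_predictions = np.array_equal(pipe.predict(X), restored.predict(X))
print(same_predictions)
```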

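Keras models cannot be used directly as steps in scikit-learn pipelines, but the library itself provides two classes that act as _wrappers_ and expose the methods needed for this integration: `tf.keras.wrappers.scikit_learn.KerasClassifier(...)` for classification problems and `tf.keras.wrappers.scikit_learn.KerasRegressor(...)` for regression problems. Note that these wrappers were deprecated in later TensorFlow releases and eventually removed, with the SciKeras package suggested as the replacement, so the code below assumes a TensorFlow version that still ships them.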
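
To give an idea of what "the necessary methods" means, here is a minimal sketch of a custom estimator that a pipeline accepts as its final step. `CentroidClassifier` is a hypothetical class written for this illustration (it is not part of Keras or scikit-learn); the point is only that the final step needs `fit` and `predict`, with `score` inherited from `ClassifierMixin`.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class CentroidClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical stand-in for a wrapped model (nearest class centroid)."""

    def fit(self, X, y):
        X, y = np.asarray(X), np.asarray(y)
        self.classes_ = np.unique(y)
        # One centroid per class, in the order of self.classes_.
        self.centroids_ = np.stack(
            [X[y == c].mean(axis=0) for c in self.classes_]
        )
        return self  # scikit-learn expects fit() to return the estimator

    def predict(self, X):
        X = np.asarray(X)
        # Distances of shape (n_samples, n_classes).
        distances = np.linalg.norm(
            X[:, None, :] - self.centroids_[None, :, :], axis=2
        )
        return self.classes_[distances.argmin(axis=1)]


# Two well-separated blobs, invented for the illustration.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.1, (50, 2)), rng.normal(5.0, 0.1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", CentroidClassifier()),
])
pipe.fit(X, y)
accuracy = pipe.score(X, y)  # score() comes from ClassifierMixin
print(accuracy)
```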

Let's import the necessary libraries that we will use in the example.

In [1]:
import numpy as np
import pandas as pd
import tensorflow as tf
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

And the constants:

In [2]:
RANDOM_SEED = 42

We will now give a small demonstration of how one of these _wrappers_ works. We will create a pipeline made of a `scikit-learn` data normaliser and a classifier based on a `keras` multilayer perceptron.

To do this, we will first create some data that will serve as our dataset.

In [3]:
X_train = np.random.random((1000, 3))
y_train = np.eye(3)[np.random.choice(3, 1000)]
X_test = np.random.random((100, 3))
y_test = np.eye(3)[np.random.choice(3, 100)]
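
The expression `np.eye(3)[...]` used above turns integer class ids into one-hot rows. The sketch below isolates that step and also shows one way the `RANDOM_SEED` constant could be used to make the data reproducible; the use of `np.random.default_rng` is an assumption of this example, since the cell above draws from NumPy's global, unseeded generator.

```python
import numpy as np

RANDOM_SEED = 42  # same value as the constant defined earlier
rng = np.random.default_rng(RANDOM_SEED)

n_samples, n_classes = 1000, 3
labels = rng.integers(0, n_classes, size=n_samples)  # integer class ids

# Indexing the identity matrix by class id selects the matching one-hot row.
one_hot = np.eye(n_classes)[labels]
print(one_hot.shape)

# argmax along the class axis inverts the encoding.
recovered = one_hot.argmax(axis=1)
```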

Next, we create our model. The _wrapper_ takes as its first parameter (`build_fn`) a function that returns the compiled model, so we wrap the model construction in a function.

In [4]:
def build_model():
    model = tf.keras.models.Sequential([
        tf.keras.layers.Dense(32, activation='relu', input_dim=3),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(3, activation='softmax'),
    ])
    
    model.compile(
        loss='categorical_crossentropy',
        optimizer='sgd',
        metrics=["accuracy"],
    )

    return model

classifier = tf.keras.wrappers.scikit_learn.KerasClassifier(
    build_fn=build_model,
    epochs=10,
    batch_size=32,
    verbose=0,
)
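
To make the architecture above more concrete, here is a NumPy-only sketch of the forward pass such a network computes at prediction time: a dense layer with ReLU followed by a dense layer with softmax. The weights are random placeholders (Keras would learn them during training), and the dropout layer is left out because dropout is only active while training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialised weights, purely illustrative.
W1, b1 = rng.normal(size=(3, 32)), np.zeros(32)  # Dense(32), 3 input features
W2, b2 = rng.normal(size=(32, 3)), np.zeros(3)   # Dense(3), one unit per class


def forward(X):
    hidden = np.maximum(X @ W1 + b1, 0.0)                 # ReLU activation
    logits = hidden @ W2 + b2
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)           # softmax


probabilities = forward(rng.random((5, 3)))
print(probabilities.shape)
```

Each output row is a probability distribution over the three classes, which is what `categorical_crossentropy` compares against the one-hot labels.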

The next step is to create our normaliser:

In [5]:
scaler = StandardScaler()
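
For reference, `StandardScaler` rescales each column to zero mean and unit variance. The sketch below checks this against a manual NumPy computation; the data and the name `demo_scaler` are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.random((1000, 3)) * [1.0, 10.0, 100.0]  # columns on different scales

demo_scaler = StandardScaler()
X_scaled = demo_scaler.fit_transform(X)

# Manual equivalent: subtract the column mean and divide by the
# column standard deviation (population version, ddof=0).
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)
matches = np.allclose(X_scaled, X_manual)
print(matches)
```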

Finally, let's create the pipeline:

In [6]:
pipeline = Pipeline([
    ('scaler', scaler),
    ('classifier', classifier),
])
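
The step names chosen here (`'scaler'` and `'classifier'`) matter, because they become the prefixes used to address each step's parameters as `<step>__<parameter>`. A minimal sketch of that convention follows, with `LogisticRegression` standing in for the Keras wrapper so that it runs without TensorFlow; the data and the name `demo_pipeline` are assumptions of the example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

demo_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression()),  # stand-in for the Keras wrapper
])

# Constructor parameters are addressed as <step name>__<parameter name>.
demo_pipeline.set_params(classifier__C=0.5, scaler__with_mean=False)
c_value = demo_pipeline.get_params()["classifier__C"]

# Keyword arguments to fit() use the same prefix and are forwarded
# to the fit() method of the named step.
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = (X[:, 0] > 0.5).astype(int)
weights = np.ones(len(y))
demo_pipeline.fit(X, y, classifier__sample_weight=weights)
```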

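We are now going to fit (train) the pipeline and see how it performs on the test set. Keyword arguments prefixed with `classifier__` are forwarded to the classifier's `fit` method, so the `batch_size=1000` and `epochs=1000` given here take precedence over the `batch_size=32` and `epochs=10` passed to the wrapper's constructor.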

In [7]:
pipeline.fit(
    X_train,
    y_train,
    classifier__batch_size=1000,
    classifier__epochs=1000,
)
print(f'Accuracy on test: {pipeline.score(X_test, y_test)}')

Accuracy on test: 0.25999999046325684
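
An accuracy in this range is to be expected: both the features and the labels were drawn at random, so there is nothing for the model to learn and any classifier should land near the chance level of 1/3 for three balanced classes. The short simulation below illustrates that baseline with guesses that ignore the features entirely; the sample size and seed are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_classes = 100_000, 3

y_true = rng.integers(0, n_classes, size=n_samples)
y_guess = rng.integers(0, n_classes, size=n_samples)  # ignores any features

chance_accuracy = (y_true == y_guess).mean()
print(chance_accuracy)
```

With only 100 test samples, as in the notebook, a deviation of several percentage points from 1/3 is unsurprising.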
