# Using Keras and Steps

In this notebook we show how a Keras model for image recognition can be incorporated into a Steps pipeline.

In [None]:
%load_ext autoreload
%autoreload 2

import numpy as np
import pandas as pd
from pathlib import Path

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.metrics import log_loss
import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23

from keras.models import Sequential, Model, load_model
from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras import optimizers, regularizers
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ModelCheckpoint
from keras.optimizers import Adam

import matplotlib.pyplot as plt
%matplotlib inline

In [None]:
from steppy.base import Step, BaseTransformer
from steppy.adapter import Adapter, E
from toolkit.keras_recipes.models import ClassifierGenerator

EXPERIMENT_DIR = './ex5'

In [None]:
import shutil

# By default, pipelines try to load previously trained models, so we delete the cache to be sure we're starting from scratch
shutil.rmtree(EXPERIMENT_DIR, ignore_errors=True)

We start by loading our favourite dataset for digit recognition.

In [None]:
digits = load_digits()
X_digits, y_digits = digits.data, digits.target

X_train, X_valid, y_train, y_valid = train_test_split(X_digits, y_digits, test_size=0.2, stratify=y_digits, random_state=643793)

print('{} samples for training'.format(len(y_train)))
print('{} samples for validation'.format(len(y_valid)))

data = {
    'input': {
        'images': X_train,
        'labels': y_train,
    },
    'input_valid': {
        'images': X_valid,
        'labels': y_valid
    }
}

For convenience let's define a few constants.

In [None]:
TARGET_SHAPE = (8, 8, 1)  # Shape of images. In this dataset we have 8x8 pictures.
                          # The third dimension is the number of channels. We use grayscale images, so there is only 1 channel.
N_CLASSES = 10  # Number of categories in this classification problem

## Data loader

Before we train a neural net we have to prepare the data properly.

Scikit-learn stores each digit image as a one-dimensional vector of 64 values. That is fine for models such as XGBoost or RandomForest, which ignore the two-dimensional structure of images anyway. CNNs, however, rely on that structure, so the first transformer we define restores it.

In [None]:
class ReshapeData(BaseTransformer):        
    def transform(self, X, y, **kwargs):
        X = X.reshape((X.shape[0], ) + TARGET_SHAPE)
        return {
            'X': X,
            'y': y
        }
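To see what the reshape does on its own, here is a minimal NumPy sketch. The array `flat` is a hypothetical stand-in for `X_train`; only the reshape expression mirrors the transformer above.

```python
import numpy as np

TARGET_SHAPE = (8, 8, 1)  # same constant as defined earlier

# Hypothetical stand-in for X_train: 5 flattened 8x8 images
flat = np.arange(5 * 64, dtype=np.float64).reshape(5, 64)

# Same expression as in ReshapeData.transform
images = flat.reshape((flat.shape[0],) + TARGET_SHAPE)

print(images.shape)

# The reshape is row-major, so pixel k of the flat vector
# ends up at row k // 8 and column k % 8 of the image.
assert images[2, 3, 4, 0] == flat[2, 3 * 8 + 4]
```

No data is reordered: the reshape only changes how the same 64 values per image are indexed.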

Next, we use Keras's `ImageDataGenerator` to prepare the image stream. It takes care of mundane tasks such as standardization, shuffling, augmentation, and splitting the stream into batches. Let's create a generator with several online augmentations.

In [None]:
class PrepareDatagen(BaseTransformer):
    def fit(self, X, **kwargs): 
        self.datagen = ImageDataGenerator(
            featurewise_center=True,
            featurewise_std_normalization=True,
            rotation_range=10,
            width_shift_range=0.1,
            height_shift_range=0.1)
        self.datagen.fit(X)  # computes the statistics needed for featurewise standardization
        return self
        
    def transform(self, X, **kwargs):        
        return {
            'datagen': self.datagen,
        }
    
    def persist(self, filepath):
        joblib.dump(self.datagen, filepath)
        
    def load(self, filepath):
        self.datagen = joblib.load(filepath)
        return self
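As a rough illustration of what featurewise centering and normalization amount to, the sketch below standardizes a batch with plain NumPy. It assumes one mean and one standard deviation per channel, computed over the whole training set; the random data and the `eps` value are assumptions for illustration, and this is not Keras's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical batch shaped like the reshaped digits: (n, 8, 8, 1), values in [0, 16)
X = rng.uniform(0.0, 16.0, size=(100, 8, 8, 1))

# Statistics over samples, rows and columns: one value per channel
mean = X.mean(axis=(0, 1, 2), keepdims=True)
std = X.std(axis=(0, 1, 2), keepdims=True)

eps = 1e-6  # guards against division by zero; the exact value is an assumption
X_standardized = (X - mean) / (std + eps)

# After standardization the data has roughly zero mean and unit variance
print(X_standardized.mean(), X_standardized.std())
```

This is why the generator has to be fitted on the training images before it can be used: the statistics are learned from data rather than fixed in advance.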

Now we can put together the first steps of the pipeline.

In [None]:
reshape_step = Step(
    name="reshape",
    transformer=ReshapeData(),
    input_data=['input'],
    adapter=Adapter({
        'X': E('input', 'images'),
        'y': E('input', 'labels')
    }),
    experiment_directory=EXPERIMENT_DIR
)

reshape_valid_step = Step(
    name="reshape_valid",
    transformer=ReshapeData(),
    input_data=['input_valid'],
    adapter=Adapter({
        'X': E('input_valid', 'images'),
        'y': E('input_valid', 'labels')
    }),
    experiment_directory=EXPERIMENT_DIR
)

datagen_step = Step(
    name="loader",
    transformer=PrepareDatagen(),
    input_steps=[reshape_step],
    experiment_directory=EXPERIMENT_DIR
)

First, we created a step that reshapes the vector representations of training images into two-dimensional arrays. Later we will need validation images in the same form, so we also created an analogous step for them. The third step creates an instance of `ImageDataGenerator`. It takes the reshaped training images as input so that it can calculate the means and variances used for standardization.
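The adapters used in the steps above map named outputs of inputs or upstream steps onto the arguments of a transformer. The toy sketch below illustrates that idea only; `Ref` and `resolve` are hypothetical names invented for this example, and the real `Adapter` and `E` in the library may work differently.

```python
from collections import namedtuple

# Hypothetical stand-in for E: which source to read from, and which key of its output
Ref = namedtuple('Ref', ['source', 'key'])


def resolve(mapping, outputs):
    """Build keyword arguments for a transformer from upstream outputs."""
    return {arg: outputs[ref.source][ref.key] for arg, ref in mapping.items()}


# Outputs keyed by source name, mimicking the `data` dictionary defined earlier
outputs = {
    'input': {'images': [[0, 1], [2, 3]], 'labels': [7, 9]},
}

mapping = {
    'X': Ref('input', 'images'),
    'y': Ref('input', 'labels'),
}

kwargs = resolve(mapping, outputs)
print(sorted(kwargs))
```

The resulting dictionary is what a transformer's `transform(X, y, **kwargs)` method would receive as keyword arguments.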

To check that what we did actually works, let's define an auxiliary step that displays the generated image data stream.

In [None]:
class DataDisplay(BaseTransformer):
    def transform(self, datagen, X, y, **kwargs):
        # Draw a single augmented batch from the generator
        img_batch, lbl_batch = next(datagen.flow(X, y, batch_size=32))
        n_row = 4
        fig, axs = plt.subplots(n_row, 8, figsize=(8, 2 * n_row))
        for i, ax in enumerate(axs.ravel()):
            ax.imshow(img_batch[i].reshape(8, 8), cmap='gray')
            ax.axis('off')
            ax.set_title('lbl = {}'.format(lbl_batch[i]))
            
display_step = Step(
    name="display",
    transformer=DataDisplay(),
    input_steps=[reshape_step, datagen_step],
    adapter=Adapter({
        'datagen': E(datagen_step.name, 'datagen'),
        'X': E(reshape_step.name, 'X'),
        'y': E(reshape_step.name, 'y')
    }),
    experiment_directory=EXPERIMENT_DIR
)

In [None]:
display_step

In [None]:
display_step.fit_transform(data)

## Steps for CNN training

We now come to the crux of this notebook: a step whose transformer wraps a Keras model. The Steps library contains classes that facilitate this task. We will use `ClassifierGenerator`, which extends `KerasModelTransformer`. Their design follows the _template method pattern_: the main part of the code lives in abstract base classes, and the user derives from them and implements a few auxiliary methods, in this case `_build_optimizer`, `_build_loss`, `_build_model` and `_create_callbacks`. That's what we do below.

In [None]:
class KerasCnn(ClassifierGenerator):
    def _build_optimizer(self, **kwargs):
        return Adam(lr=kwargs['learning_rate'])

    def _build_loss(self, **kwargs):
        return 'sparse_categorical_crossentropy'
    
    def _build_model(self, **kwargs):
        dropout_ratio = kwargs['dropout_ratio']
        regularization = kwargs['regularization']
        
        input_img = Input(shape=TARGET_SHAPE)

        layer = Conv2D(8, kernel_size=(3, 3), padding='same', activation='relu')(input_img)
        layer = Conv2D(8, kernel_size=(3, 3), padding='same', activation='relu')(layer)
        layer = MaxPooling2D((2, 2), padding='same')(layer)

        layer = Conv2D(16, kernel_size=(3, 3), padding='same', activation='relu')(layer)
        layer = Conv2D(16, kernel_size=(3, 3), padding='same', activation='relu')(layer)
        layer = MaxPooling2D((2, 2), padding='same')(layer)

        layer = Conv2D(32, kernel_size=(3, 3), padding='same', activation='relu')(layer)
        layer = Conv2D(32, kernel_size=(3, 3), padding='same', activation='relu')(layer)
        layer = MaxPooling2D((2, 2), padding='same')(layer)

        layer = Flatten()(layer)
        layer = Dense(64, activation='relu', kernel_regularizer=regularizers.l2(regularization))(layer)
        if dropout_ratio > 0:
            layer = Dropout(dropout_ratio)(layer)
        predictions = Dense(N_CLASSES, activation='softmax')(layer)

        model = Model(input_img, predictions)
        return model

    def _create_callbacks(self, **kwargs):
        checkpoint_filepath = kwargs['model_checkpoint']['filepath']
        Path(checkpoint_filepath).parents[0].mkdir(parents=True, exist_ok=True)
        model_checkpoint = ModelCheckpoint(**kwargs['model_checkpoint'])
        return [model_checkpoint]
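To make the template method pattern concrete, here is a toy sketch that does not depend on Keras. `ToyModelTransformer`, `ToyCnn` and `compile_parts` are hypothetical names; the sketch only mimics the division of labour between a base class and its subclass, and is not how `KerasModelTransformer` is actually implemented.

```python
from abc import ABC, abstractmethod


class ToyModelTransformer(ABC):
    """Hypothetical base class: owns the workflow, delegates the details."""

    def __init__(self, model_params, optimizer_params):
        self.model_params = model_params
        self.optimizer_params = optimizer_params

    def compile_parts(self):
        # The template method: a fixed sequence of calls to hooks
        # that subclasses are required to implement.
        return {
            'model': self._build_model(**self.model_params),
            'optimizer': self._build_optimizer(**self.optimizer_params),
            'loss': self._build_loss(),
        }

    @abstractmethod
    def _build_model(self, **kwargs):
        ...

    @abstractmethod
    def _build_optimizer(self, **kwargs):
        ...

    @abstractmethod
    def _build_loss(self, **kwargs):
        ...


class ToyCnn(ToyModelTransformer):
    # Strings stand in for real Keras objects to keep the example self-contained
    def _build_model(self, **kwargs):
        return 'cnn(dropout={})'.format(kwargs['dropout_ratio'])

    def _build_optimizer(self, **kwargs):
        return 'adam(lr={})'.format(kwargs['learning_rate'])

    def _build_loss(self, **kwargs):
        return 'sparse_categorical_crossentropy'


parts = ToyCnn({'dropout_ratio': 0.5}, {'learning_rate': 1e-3}).compile_parts()
print(parts)
```

The subclass never decides when its hooks run; it only supplies the pieces, which is the same contract `KerasCnn` fulfils above.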

`KerasModelTransformer`'s initializer takes three arguments:
1. `architecture_config` - contains model and optimizer parameters.
2. `training_config` - contains parameters for the model's `fit_generator` and the generator's `flow` methods.
3. `callbacks_config` - contains parameters for the callbacks instantiated in the `_create_callbacks` method.

The exact structure of these arguments is best explained by example.

In [None]:
architecture_config = {
    'model_params': {
        'dropout_ratio': 0.5,
        'regularization': 0.01
    },
    'optimizer_params': {
        'learning_rate': 1e-3
    }
}

training_config = {
    'fit_args': {
        'epochs': 100,
        'verbose': True
    },
    'flow_args': {
        'batch_size': 64,
    }
}

callbacks_config = {
    'model_checkpoint': {
        'filepath': str(Path(EXPERIMENT_DIR) / 'checkpoints' / 'best_model.hdf5'),
        'save_best_only': True
    }
}

We now have everything needed to add the crucial step.

In [None]:
cnn_step = Step(
    name="CNN",
    transformer=KerasCnn(architecture_config, training_config, callbacks_config),
    input_steps=[datagen_step, reshape_step, reshape_valid_step],
    experiment_directory=EXPERIMENT_DIR,
    adapter=Adapter({
        'datagen': E(datagen_step.name, 'datagen'),
        'X': E(reshape_step.name, 'X'),
        'y': E(reshape_step.name, 'y'),
        'X_valid': E(reshape_valid_step.name, 'X'),
        'y_valid': E(reshape_valid_step.name, 'y')
    }),
)

In [None]:
cnn_step

Since we didn't specify `datagen_valid`, the same generator is used for both training and validation data. In particular, this means that validation images are augmented as well.

In [None]:
result = cnn_step.fit_transform(data)

A short function below summarizes the results.

In [None]:
def eval_pred(title, y_true, y_pred):
    print(title)
    print("  Log-loss: ", log_loss(y_true=y_true, y_pred=y_pred))
    choices = np.argmax(y_pred, axis=1)
    print("  Accuracy: {:.2%}".format(np.sum(choices == y_true) / len(y_true)))
    
eval_pred("Results on training", y_true=y_train, y_pred=result['output'])
eval_pred("Results on validation", y_true=y_valid, y_pred=result['output_valid'])
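For readers who want to see what the two metrics measure, here is a small worked example with hand-picked probabilities (not results from this notebook). It computes the log-loss directly from its definition, the mean negative log of the probability assigned to the true class, and the accuracy from the arg-max class. For well-formed probabilities like these, the manual value should agree with `log_loss`.

```python
import numpy as np

y_true = np.array([0, 2, 1])
y_pred = np.array([
    [0.7, 0.2, 0.1],  # correct: class 0 has the highest probability
    [0.1, 0.3, 0.6],  # correct: class 2 has the highest probability
    [0.5, 0.4, 0.1],  # wrong: class 0 wins, but the true class is 1
])

# Probability that the model assigned to the true class of each sample
p_true = y_pred[np.arange(len(y_true)), y_true]

manual_log_loss = -np.mean(np.log(p_true))
accuracy = np.mean(np.argmax(y_pred, axis=1) == y_true)

print("Log-loss: {:.4f}".format(manual_log_loss))
print("Accuracy: {:.2%}".format(accuracy))
```

Note that log-loss penalizes the third sample for its low confidence in the true class, while accuracy only records that the prediction was wrong.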

Because validation images are augmented at prediction time (test-time augmentation), it makes sense to run the prediction phase several times and average the results.
As we can see below, this improves the overall score.

In [None]:
results_valid = []
for i in range(10):
    print("Iteration {}/10".format(i+1))
    results_valid.append(cnn_step.transform(data)['output_valid'])
y_avg_pred = np.mean(np.array(results_valid), axis=0)
eval_pred("Results on averaged predictions", y_true=y_valid, y_pred=y_avg_pred)
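The toy example below shows why averaging can help. The probabilities are hand-picked for illustration and are not outputs of the model above: each run makes a different mistake, and the average cancels them out. Averaging is not guaranteed to improve the score in general.

```python
import numpy as np

y_true = np.array([1, 0])

# Three hypothetical prediction runs over the same two samples,
# each seeing differently augmented versions of the images
runs = np.array([
    [[0.45, 0.55], [0.60, 0.40]],  # run 1: both samples correct
    [[0.55, 0.45], [0.70, 0.30]],  # run 2: first sample wrong
    [[0.30, 0.70], [0.45, 0.55]],  # run 3: second sample wrong
])


def accuracy(y_pred):
    return float(np.mean(np.argmax(y_pred, axis=1) == y_true))


per_run_accuracy = [accuracy(run) for run in runs]

# Same operation as in the cell above: average over the run axis
y_avg = runs.mean(axis=0)
avg_accuracy = accuracy(y_avg)

print(per_run_accuracy, avg_accuracy)
```

Individual runs disagree because each sees a different random augmentation; averaging the probabilities smooths out that randomness.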