# Loading the Kuzushiji-MNIST dataset

**Motivation:** Kuzushiji-MNIST is a more difficult alternative to MNIST. Most recent deep learning models reach more than 99.5% accuracy on MNIST, so it is worth evaluating a model on more challenging datasets.

Kuzushiji-MNIST contains 70,000 images of characters taken from classical Japanese literature. Like MNIST, the original dataset has 10 character classes. K-MNIST also comes in 2 harder variants:
- Kuzushiji-49 contains 270,912 images split into 49 *imbalanced* classes.
- Kuzushiji-Kanji contains 140,426 images split into 3,832 highly *imbalanced* character classes: some characters appear only once in the whole dataset, which makes a *train* / *valid* split impossible for those classes.
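To show why singleton classes block a split, here is a small NumPy sketch. It counts samples per class and lists the classes with fewer than two samples. Such a class cannot appear in both a *train* and a *valid* split. The helper `rare_classes` and the toy labels are hypothetical and not part of any K-MNIST tooling.

```python
import numpy as np


def rare_classes(labels, min_count=2):
    """Return the class ids that occur fewer than min_count times in labels.

    Such classes cannot be represented in both a train and a valid split.
    """
    counts = np.bincount(labels)              # number of samples per class id
    present = np.flatnonzero(counts)          # ignore ids that never occur
    return present[counts[present] < min_count]


labels = np.array([0, 0, 1, 2, 2, 2, 3])  # toy labels: classes 1 and 3 appear once
print(rare_classes(labels))                # [1 3]
```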

### Download K-MNIST

In [1]:
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

In [5]:
# Training images & labels (saved into data/, the folder the loading cell reads from)
!wget -nc -P data http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-imgs.npz
!wget -nc -P data http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-train-labels.npz

# Test images & labels
!wget -nc -P data http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-imgs.npz
!wget -nc -P data http://codh.rois.ac.jp/kmnist/dataset/kmnist/kmnist-test-labels.npz
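The `!wget` magic only works inside a notebook. The following is a minimal standard-library sketch of the same download. It assumes the same URLs and a `data/` target folder. `download_kmnist` and its `fetch` parameter are hypothetical names introduced here; `fetch` only exists so the function can be tested without network access.

```python
import os
import urllib.request

# Same URLs as in the wget cell
BASE_URL = "http://codh.rois.ac.jp/kmnist/dataset/kmnist/"
FILES = [
    "kmnist-train-imgs.npz",
    "kmnist-train-labels.npz",
    "kmnist-test-imgs.npz",
    "kmnist-test-labels.npz",
]


def download_kmnist(dest_dir="data", fetch=urllib.request.urlretrieve):
    """Download the four K-MNIST files into dest_dir, skipping files that already exist (like wget -nc)."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for name in FILES:
        path = os.path.join(dest_dir, name)
        if not os.path.exists(path):
            fetch(BASE_URL + name, path)  # urlretrieve(url, filename)
        paths.append(path)
    return paths
```

Calling `download_kmnist("data")` before the loading cell fetches the four `.npz` files. Files that are already present are skipped.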

In [2]:
X_train = np.load('data/kmnist-train-imgs.npz')['arr_0']
Y_train = np.load('data/kmnist-train-labels.npz')['arr_0']

X_test = np.load('data/kmnist-test-imgs.npz')['arr_0']
Y_test = np.load('data/kmnist-test-labels.npz')['arr_0']

In [3]:
# Preprocess data: scale pixels to [0, 1], add a channel axis, one-hot encode labels
X_train = (X_train / 255.0)[..., None].astype('float32')
Y_train = tf.keras.utils.to_categorical(Y_train, num_classes=10)

X_test = (X_test / 255.0)[..., None].astype('float32')
Y_test = tf.keras.utils.to_categorical(Y_test, num_classes=10)
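You can check the same preprocessing in pure NumPy on synthetic data, without downloading anything. The sketch divides by 255 so that a pixel value of 255 maps exactly to 1.0. It builds the one-hot labels with `np.eye`, which gives the same result as `to_categorical` for integer labels. `preprocess` and the fake arrays are hypothetical.

```python
import numpy as np


def preprocess(images, labels, n_classes=10):
    """uint8 (N, 28, 28) images + int labels -> float32 (N, 28, 28, 1) and one-hot (N, n_classes)."""
    x = (images.astype('float32') / 255.0)[..., None]  # [0, 255] -> [0, 1], add channel axis
    y = np.eye(n_classes, dtype='float32')[labels]      # row i of the identity = one-hot of class i
    return x, y


rng = np.random.default_rng(0)
fake_imgs = rng.integers(0, 256, size=(5, 28, 28)).astype(np.uint8)
fake_labels = np.array([0, 3, 9, 3, 1])
x, y = preprocess(fake_imgs, fake_labels)
print(x.shape, y.shape)  # (5, 28, 28, 1) (5, 10)
```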

### CapsNet

In [29]:
from capsulelayers import CapsuleLayer, PrimaryCap, Length, Mask
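`CapsuleLayer`, `PrimaryCap`, `Length` and `Mask` come from a local `capsulelayers` module that is not shown here. The NumPy sketch below makes the squash non-linearity and routing by agreement concrete for a single sample. It assumes the prediction vectors `u_hat` are already computed, that is, each primary capsule's output multiplied by a learned weight matrix. It is an illustration of the technique, not the module's actual implementation.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-7):
    """Shrink short vectors towards 0 and long vectors to a length just below 1."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)


def length(v, axis=-1):
    """Euclidean norm of each capsule vector (what a Length layer outputs)."""
    return np.sqrt(np.sum(v ** 2, axis=axis))


def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def dynamic_routing(u_hat, routings=3):
    """Routing by agreement for a single sample.

    u_hat: (n_in, n_out, dim_out) predictions of each input capsule i
           for each output capsule j.
    Returns v: (n_out, dim_out) output capsules.
    """
    b = np.zeros(u_hat.shape[:2])  # routing logits b_ij, start uniform
    for r in range(routings):
        c = softmax(b, axis=1)                     # coupling coefficients, sum to 1 over outputs j
        s = np.einsum('ij,ijd->jd', c, u_hat)      # weighted sum of predictions over inputs i
        v = squash(s)                              # output capsules
        if r < routings - 1:
            b = b + np.einsum('ijd,jd->ij', u_hat, v)  # raise logits where predictions agree with v
    return v


# Usage with random predictions shaped like this notebook's layers:
# 1152 primary capsules -> 10 output capsules of dimension 16
rng = np.random.default_rng(0)
u_hat = rng.normal(scale=0.05, size=(1152, 10, 16))
v = dynamic_routing(u_hat, routings=3)
print(v.shape)    # (10, 16)
print(length(v))  # one value in [0, 1) per class
```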

In [34]:
# All hyperparameters in a single cell
epochs = 50
batch_size = 100
learning_rate = 0.001
decay = 0.9 # the learning rate is multiplied by this factor at each epoch
lam_recon = 0.392 # weight of the decoder (reconstruction) loss in the total loss (0.0005 * 784)
routings = 3 # number of routing-by-agreement iterations
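The value 0.392 can be read as follows, assuming two things not shown in this section. First, the reconstruction output is trained with Keras's mean squared error, which averages over the 28 × 28 = 784 pixels. Second, that loss is weighted by `lam_recon` in `compile()`. Under these assumptions, `lam_recon` rescales the mean back to the summed squared error scaled by 0.0005:

```latex
\mathcal{L} = \mathcal{L}_{\text{margin}} + \lambda_{\text{recon}} \cdot \frac{1}{784} \sum_{i=1}^{784} (x_i - \hat{x}_i)^2,
\qquad
\lambda_{\text{recon}} = 0.0005 \times 784 = 0.392
\;\Longrightarrow\;
\lambda_{\text{recon}} \cdot \frac{1}{784} \sum_{i=1}^{784} (x_i - \hat{x}_i)^2 = 0.0005 \sum_{i=1}^{784} (x_i - \hat{x}_i)^2
```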

In [30]:
def CapsNet(input_shape, n_class, routings):
    
    # input layer
    x = tf.keras.layers.Input(input_shape)
    
    # layer 1: regular convolutional layer
    conv1 = tf.keras.layers.Conv2D(filters = 256, kernel_size = (9,9), activation = 'relu', name = 'conv1')(x)

    # layer 2: PrimaryCaps, a convolutional layer with squash activation
    # dim_capsule: dimension of each capsule's output vector
    # n_channels: number of capsule types
    primarycaps = PrimaryCap(conv1, dim_capsule = 8, n_channels = 32, kernel_size = 9, strides = 2, padding = 'valid')

    # layer 3: CapsuleLayer (uses routing by agreement)
    # each capsule in this layer represents one of the Kuzushiji character classes
    kcaps = CapsuleLayer(num_capsule = n_class, dim_capsule = 16, routings = routings, name = 'kcaps')(primarycaps)

    # layer 4: computes the length (Euclidean norm) of each capsule, used as the class score
    out_caps = Length(name='capsnet')(kcaps)
    
    # Decoder network
    # Two reconstructions are built:
    # - one from the capsule of the true label (used for training)
    # - one from the capsule with the largest length, i.e. the prediction (used for evaluation)
    y = tf.keras.layers.Input((n_class,))
    masked_by_y = Mask()([kcaps, y])
    masked = Mask()(kcaps)
    
    # Dense layers of the decoder architecture as described in the paper
    decoder = tf.keras.models.Sequential(name = 'decoder')
    decoder.add(tf.keras.layers.Dense(512, activation = 'relu', input_dim = 16 * n_class))
    decoder.add(tf.keras.layers.Dense(1024, activation = 'relu'))
    decoder.add(tf.keras.layers.Dense(input_shape[0]*input_shape[1], activation = 'sigmoid'))
    decoder.add(tf.keras.layers.Reshape(input_shape, name = 'out_recon'))
    
    # Models used for training and evaluation:
    # train_model also trains the decoder, which reconstructs from the true-label capsule;
    # eval_model, given an input x, outputs its prediction and its reconstruction from the predicted capsule
    train_model = tf.keras.models.Model([x, y], [out_caps, decoder(masked_by_y)])
    eval_model = tf.keras.models.Model(x, [out_caps, decoder(masked)])
                
    return train_model, eval_model
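The first part of this sketch is a quick shape check for 28×28×1 inputs: 'valid' convolutions take 28 to 20 (conv1) and then to 6 with stride 2 (PrimaryCap). That gives 6 × 6 × 32 = 1152 primary capsules of dimension 8. The second part is a NumPy sketch of what the two `Mask` calls are assumed to do: keep one capsule per sample, zero out the others, and flatten to `16 * n_class` values (the decoder's `input_dim`). `conv_out` and `mask_capsules` are hypothetical helpers; the real `Mask` layer lives in `capsulelayers`.

```python
import numpy as np


def conv_out(size, kernel, stride=1):
    """Spatial size after a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1


h1 = conv_out(28, 9)            # conv1: 28x28 -> 20x20 (256 channels)
h2 = conv_out(h1, 9, stride=2)  # PrimaryCap convolution: 20x20 -> 6x6
n_primary = h2 * h2 * 32        # 6*6 positions x 32 capsule types
print(h1, h2, n_primary)        # 20 6 1152


def mask_capsules(caps, y=None):
    """caps: (batch, n_class, dim_capsule).

    Keeps a single capsule per sample (the true one if y is given,
    otherwise the longest one) and flattens to (batch, n_class * dim_capsule).
    """
    if y is None:
        lengths = np.linalg.norm(caps, axis=-1)             # (batch, n_class)
        y = np.eye(caps.shape[1])[lengths.argmax(axis=1)]   # one-hot of the prediction
    return (caps * y[..., None]).reshape(len(caps), -1)


caps = np.random.default_rng(0).normal(size=(4, 10, 16))
print(mask_capsules(caps).shape)  # (4, 160) = decoder input_dim 16 * n_class
```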

In [31]:
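# margin loss used for training; y_pred holds the capsule lengths
# present class: penalized if shorter than 0.9; absent classes: penalized (weight 0.5) if longer than 0.1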
def margin_loss(y_true, y_pred):
    loss = y_true * tf.square(tf.maximum(0., 0.9 - y_pred)) + \
           0.5 * (1 - y_true) * tf.square(tf.maximum(0., y_pred - 0.1))
    
    return tf.reduce_mean(tf.reduce_sum(loss, 1))

In [35]:
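import os

# make sure the output directory for the log and the checkpoints exists
os.makedirs('callbacks', exist_ok=True)

log = tf.keras.callbacks.CSVLogger('callbacks/log.csv')

# weights of the model with the best validation accuracy are saved during training;
# 'val_capsnet_acc' must match the metric name Keras logs for the 'capsnet' output,
# which depends on the metric string passed to compile() ('acc' or 'accuracy')
checkpoint = tf.keras.callbacks.ModelCheckpoint('callbacks/weights-{epoch:02d}.h5',
                                                monitor = 'val_capsnet_acc', save_best_only=True,
                                                save_weights_only=True, verbose=1)

# exponential learning-rate decay: lr = learning_rate * decay ** epoch
lr_decay = tf.keras.callbacks.LearningRateScheduler(lambda epoch : learning_rate * (decay ** epoch))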
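To see how fast this schedule shrinks the learning rate, evaluate the same formula for a few epochs (0-indexed, as the scheduler passes them). With `learning_rate = 0.001` and `decay = 0.9`, the rate is about 3.49e-04 at epoch 10. By the last of the 50 epochs it is about 0.0057 times the initial value. `schedule` is a hypothetical standalone copy of the lambda.

```python
learning_rate = 0.001
decay = 0.9


def schedule(epoch):
    """Same formula as the LearningRateScheduler lambda: exponential decay per epoch."""
    return learning_rate * decay ** epoch


for epoch in (0, 1, 10, 49):
    print(epoch, f"{schedule(epoch):.2e}")
```

A decay factor closer to 1 (e.g. 0.95) would keep the rate higher for longer over the 50 epochs.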