In [1]:
import numpy as np
import tensorflow as tf

<b>First pass</b>: ordinary, small convolutional networks given image data.

In [None]:
train_img_small = np.load('datasets/train_img_64.npy')
train_lbl_small = np.load('datasets/train_lbl_64.npy')
val_img_small = np.load('datasets/val_img_64.npy')
val_lbl_small = np.load('datasets/val_lbl_64.npy')
train_img_small = np.float32(train_img_small) / 255
val_img_small = np.float32(val_img_small) / 255
print(train_img_small.shape, train_lbl_small.shape)
print(val_img_small.shape, val_lbl_small.shape)

In [None]:
# synthetic data pre-generated by albumentations
train_img_synth = np.concatenate([np.load('datasets/train_img_64_synth.npy'), np.load('datasets/train_img_64.npy')], axis=0)
train_lbl_synth = np.concatenate([np.load('datasets/train_lbl_64_synth.npy'), np.load('datasets/train_lbl_64.npy')], axis=0)
train_img_synth = np.float32(train_img_synth) / 255
print(train_img_synth.shape, train_lbl_synth.shape)

In [19]:
dropout_prob = 0.2
spatial_dropout_prob = 0.05
reg_coef = 0.01   # experiments suggest this coef might be far too large
noise_sigma = 0.04
regulator = tf.keras.regularizers.L2(reg_coef)
this_model3 = tf.keras.Sequential([ # tf.keras.layers.Rescaling(1. / 255),
                                tf.keras.layers.GaussianNoise(noise_sigma),
                                    tf.keras.layers.Convolution2D(64, 5, activation='relu', 
                                                               padding='same', use_bias = True,
                                            input_shape = (64,64,3), kernel_regularizer=regulator),
                                 tf.keras.layers.Dropout(dropout_prob),
                                 tf.keras.layers.SpatialDropout2D(spatial_dropout_prob),
                                 tf.keras.layers.Convolution2D(64, 3, activation='relu', 
                                                               padding='same', use_bias = True,
                                                               kernel_regularizer=regulator),
                                 tf.keras.layers.MaxPool2D(strides=(2,2)), # default pool size (2,2); cuts down to 32x32xch
                                 tf.keras.layers.Convolution2D(128, 3, activation='relu', 
                                                               padding='same', use_bias = True,
                                                              kernel_regularizer=regulator),
                                 tf.keras.layers.SpatialDropout2D(spatial_dropout_prob),
                                 tf.keras.layers.Dropout(dropout_prob),
                                 tf.keras.layers.Convolution2D(128, 3, activation='relu', 
                                                               padding='same', use_bias = True,
                                                              kernel_regularizer=regulator),
                                 tf.keras.layers.MaxPool2D(strides=(2,2)), # cuts down to 16x16xch
                                 tf.keras.layers.Convolution2D(128, 3, activation='relu', 
                                                               padding='same', use_bias = True,
                                                              kernel_regularizer=regulator),
                                 tf.keras.layers.SpatialDropout2D(spatial_dropout_prob),
                                 tf.keras.layers.Dropout(dropout_prob),
                                 tf.keras.layers.Convolution2D(128, 3, activation='relu', 
                                                               padding='same', use_bias = True,
                                                              kernel_regularizer=regulator),
                                 tf.keras.layers.MaxPool2D(strides=(2,2)), # cuts down to 8x8xch
                                 tf.keras.layers.Flatten(), # 8192 outputs coming here
                                 tf.keras.layers.Dense(512, activation='relu'),
                                 tf.keras.layers.Dropout(dropout_prob),
                                 tf.keras.layers.Dense(50, activation='softmax')])
                                 
this_model3.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
                loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                metrics=['accuracy'])

Results: On greyscale, 0.58 val (0.81 tr) after 11 epochs. Compare 0.718 val (0.899 tr) after 19 epochs, back on colour, with dropout 0.2, reg 0.01. Regressed with dropout 0.3, reg 0.02, to 0.65 val (0.81 tr) after 17 epochs. With 64/128/256 filters, 0.72 val (0.92 tr) after 18 epochs. With 3 layers in blocks 2 and 3 (back at 64/128/128), 0.675 (0.86) after 14 epochs. With Scharr filters in input channels, 0.66 (0.87) after 14 epochs.

Moving from tanh to relu got us to 0.767 (0.95) after 19 epochs. Added GaussianNoise(0.1) and replaced Dropout with SpatialDropout(0.1); ended at 0.658 (0.853) after 18 epochs. At this point I realised some of the image set hadn't been standardised (as RGB), so I tried that again with SpatialDropout turned down to 0.05. Tried some synthetic data; things got worse. Backing up some: take out all but L^2 reg, get 0.614 (0.930) after 8 epochs.

Since we still get high scores on the training set, it appears the network is expressive enough (at blocks of 2, with 64/128/128 filters) to fit most of it; what we need is to get the gap between training and validation accuracy down.

So, 0.65 (0.93) after 14 epochs, with regular dropout. Next try, reintroduce gaussian noise at sigma=0.04; got 0.536 (0.88) at epoch 16. Add Dense(512) before the end; got 0.58 (0.95) at epoch 20. Return SpatialDropout, got 0.582 (0.95). Adding some synthetics, 0.608 (0.956).

In [23]:
this_model3.fit(train_img_synth, train_lbl_synth, validation_data=(val_img_small,val_lbl_small), epochs=20)

Epoch 1/20
Epoch 2/20
Epoch 3/20
Epoch 4/20
Epoch 5/20
Epoch 6/20
Epoch 7/20
Epoch 8/20
Epoch 9/20
Epoch 10/20

KeyboardInterrupt: 

<b>Second pass</b>: The next model tests a skip ("residual") connection in the network. The output of the first layer of each block is carried past the block's second layer and concatenated, along the channel axis, with that layer's output. Since my blocks only have 2 layers in them, the skipped output has to be max-pooled so that it matches the spatial size going into the next block. Results were not encouraging, but I didn't try for long.
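
To make the shape bookkeeping of that skip connection concrete, here is a minimal NumPy sketch (not the Keras model itself). The random arrays are stand-ins for the outputs of the first and second convolution of a block, and `max_pool_2x2` is a hypothetical helper written for this illustration. It assumes a 64x64 input and 64 filters, as in the first block.

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a (batch, H, W, C) array; H and W must be even."""
    b, h, w, c = x.shape
    return x.reshape(b, h // 2, 2, w // 2, 2, c).max(axis=(2, 4))

rng = np.random.default_rng(0)
out_a = rng.normal(size=(2, 64, 64, 64))  # stand-in for the first conv's output
out_b = rng.normal(size=(2, 64, 64, 64))  # stand-in for the second conv's output

skip = max_pool_2x2(out_a)   # skipped branch, shrunk to 32x32
main = max_pool_2x2(out_b)   # normal path, also 32x32
merged = np.concatenate([main, skip], axis=3)  # channels double: 64 + 64

print(skip.shape, main.shape, merged.shape)
```

Because the two branches are concatenated rather than added, the next block's first convolution sees twice as many input channels as it would without the skip.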

In [19]:
class testSkipModel(tf.keras.Model):
    def __init__(self, labels, filters, rec_field, dropout_prob = 0.2, reg_coef = 0.001):
        super(testSkipModel, self).__init__()
        filters_1, filters_2, filters_3 = filters
        regulator = tf.keras.regularizers.L2(reg_coef)

        self.conv_1a = tf.keras.layers.Convolution2D(filters_1, 5, padding='same', use_bias=True, 
                                                     activation='tanh', kernel_regularizer=regulator)
        self.conv_1b = tf.keras.layers.Convolution2D(filters_1, 3, padding='same', use_bias=True, 
                                                     activation='tanh', kernel_regularizer=regulator)
    
        self.conv_2a = tf.keras.layers.Convolution2D(filters_2, 3, padding='same', use_bias=True, 
                                                     activation='tanh', kernel_regularizer=regulator)
        self.conv_2b = tf.keras.layers.Convolution2D(filters_2, 3, padding='same', use_bias=True, 
                                                     activation='tanh', kernel_regularizer=regulator)
        
        self.conv_3a = tf.keras.layers.Convolution2D(filters_3, 3, padding='same', use_bias=True, 
                                                     activation='tanh', kernel_regularizer=regulator)
        self.conv_3b = tf.keras.layers.Convolution2D(filters_3, 3, padding='same', use_bias=True, 
                                                     activation='tanh', kernel_regularizer=regulator)
        
        self.collate = tf.keras.layers.Dense(labels, kernel_regularizer=regulator, activation='softmax')
        
    def call(self, input_tensor):
        #out = tf.keras.layers.Rescaling(1. / 255)(input_tensor)
        out = tf.keras.layers.Dropout(0.2)(self.conv_1a(input_tensor))
        out_temp = tf.keras.layers.MaxPool2D(strides=(2,2))(out)
        out = tf.keras.layers.MaxPool2D(strides=(2,2))(self.conv_1b(out))
        
        out = tf.concat([out, out_temp], axis=3)  # skip connection from 1a
        out = tf.keras.layers.Dropout(0.2)(self.conv_2a(out))
        out_temp = tf.keras.layers.MaxPool2D(strides=(2,2))(out)
        out = tf.keras.layers.MaxPool2D(strides=(2,2))(self.conv_2b(out))
        
        out = tf.concat([out, out_temp], axis=3)  # skip connection from 2a
        out = tf.keras.layers.Dropout(0.2)(self.conv_3a(out))
        out_temp = tf.keras.layers.MaxPool2D(strides=(2,2))(out)
        out = tf.keras.layers.MaxPool2D(strides=(2,2))(self.conv_3b(out))
        
        out = tf.concat([out, out_temp], axis=3)  # skip connection from 3a
        out = tf.keras.layers.Flatten()(out)
        out = self.collate(out)
        return out  

In [20]:
testSkipper = testSkipModel(50, (64,128,128),3)
testSkipper.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.0001),
                loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                metrics=['accuracy'])

In [None]:
testSkipper.fit(train_img_small, train_lbl_small, validation_data=(val_img_small, val_lbl_small), epochs=20)

<b>Third pass</b>: At this point I started experiments with the "hand geometry" output of the MediaPipe detector, which places its 21 landmarks in space. Curiously, the detector has a bit of trouble with my working dataset, only detecting a hand in about 80% of it. I do know that the detector is sensitive to colour: swapping blue/red channels will lead to non-detection. Likewise greyscale is a problem. These are not the conditions it was trained for, apparently.

Where the hand landmark data is available, it's enough alone for better results than the short CNNs I tried before. (The landmark data was previously normalised in position, orientation, and chirality.) The best score I got was 0.867 val_acc (0.94 train), with three dense layers of 256/256/256 units.

In [65]:
train_geom = np.load("datasets/train_geom.npy")
train_lbl = np.load("datasets/train_geom_lbl.npy")
val_geom = np.load("datasets/val_geom.npy")
val_lbl = np.load("datasets/val_geom_lbl.npy")
train_geom = train_geom.reshape((train_geom.shape[0],63))
val_geom = val_geom.reshape((val_geom.shape[0], 63))

In [62]:
dropout_prob = 0.2
reg_coef = 0.0001
regulator = tf.keras.regularizers.L2(reg_coef)
rng = np.random.default_rng()
layers = 4
seeds = [rng.integers(0,1024) for j in range(layers)]
inits = [tf.keras.initializers.Orthogonal(seeds[j]) for j in range(layers)]

geomModel = tf.keras.models.Sequential([#tf.keras.layers.Flatten(),
                                       tf.keras.layers.Dense(256, activation='tanh', 
                                                             #kernel_initializer = inits[0],
                                                             kernel_regularizer=regulator),
                                        #tf.keras.layers.Dropout(dropout_prob),
                                       tf.keras.layers.Dense(256, activation='tanh', 
                                                             #kernel_initializer = inits[1],
                                                             kernel_regularizer=regulator),
                                        #tf.keras.layers.Dropout(dropout_prob),
                                       tf.keras.layers.Dense(256, activation='tanh', 
                                                             #kernel_initializer = inits[2],
                                                             kernel_regularizer=regulator),
                                        #tf.keras.layers.Dropout(0.5),
                                       tf.keras.layers.Dense(50, activation='softmax', 
                                                             #kernel_initializer = inits[3],
                                                             kernel_regularizer=regulator)])

I looked into TensorFlow's options for weight initialisation. Almost all of them are random initialisers, with various distributions (uniform or normal) and variances (people have looked at different normalisations in the quest to make training networks more tractable). The exception is the orthogonal initialiser, which generates a random normal matrix like the others and then takes its QR decomposition (the matrix form of Gram-Schmidt orthogonalisation) to give an orthogonal matrix of weights.
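
As an illustration of the idea, here is a NumPy sketch of a QR-based orthogonal initialiser. It is my reading of the approach rather than TensorFlow's actual code. The function name `orthogonal_init`, its arguments, and the sign correction step are assumptions of this sketch.

```python
import numpy as np

def orthogonal_init(rows, cols, gain=1.0, seed=None):
    """Return a (rows, cols) matrix whose rows or columns (whichever are fewer) are orthonormal."""
    rng = np.random.default_rng(seed)
    # QR of a tall random normal matrix gives orthonormal columns in q.
    a = rng.normal(size=(max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)
    # Flip column signs so the result does not depend on the QR sign convention.
    q = q * np.sign(np.diag(r))
    if rows < cols:
        q = q.T
    return gain * q

w = orthogonal_init(63, 256, seed=0)  # same shape as the first Dense kernel on 63 inputs
print(w.shape)
print(np.allclose(w @ w.T, np.eye(63)))
```

For a non-square kernel only the shorter side can be orthonormal, which is why the check multiplies `w` by its transpose on the 63-dimensional side.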

In terms of val_acc achieved, orthogonal initialisation did not yield improvement. It did yield a puzzle: although its accuracy scores are very close to those of the ordinary random initialisers given like amounts of training time, the reported cross-entropy loss was much higher, by a factor of tens of thousands. (In principle there is no upper limit to the cross-entropy; the model simply needs to give high enough confidence to a particular wrong answer.) Curious, I tried letting it run for a long time, hundreds of epochs (with the network small enough that this was a matter of minutes rather than days). The cross-entropy does eventually come down, but the accuracy does nothing special. This sort of behaviour makes me think there must be interesting things to say about (for lack of a better expression) the dynamics of NN learning, but I don't know what they might be.
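
The parenthetical point, that accuracy and cross-entropy can move independently, can be checked on a toy example. The sketch below uses made-up logits for four samples and three classes (not data from this notebook). Scaling the logits makes every prediction more confident without changing any argmax, so accuracy stays fixed while the loss on the one wrong answer grows.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sparse_cross_entropy(probs, labels):
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(np.clip(picked, 1e-12, None)).mean())

def accuracy(probs, labels):
    return float((probs.argmax(axis=1) == labels).mean())

labels = np.array([0, 1, 2, 0])
logits = np.array([[2.0, 1.0, 0.0],
                   [0.0, 2.0, 1.0],
                   [0.0, 1.0, 2.0],
                   [0.0, 2.0, 1.0]])  # the last sample is classified wrongly

results = {}
for scale in (1.0, 10.0):
    p = softmax(scale * logits)
    results[scale] = (accuracy(p, labels), sparse_cross_entropy(p, labels))
    print(scale, results[scale])
```

This only shows that the two metrics are decoupled in principle; it does not explain why orthogonal initialisation in particular led to the high losses seen here.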

In [63]:
learning_rate=0.0001
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
                    learning_rate,
                    decay_steps=20000,
                    decay_rate=0.9,
                    staircase=True)

geomModel.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
                loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                metrics=['accuracy'])

In [72]:
geomModel.fit(train_geom, train_lbl, validation_data=(val_geom, val_lbl), epochs=10)

Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<keras.callbacks.History at 0x1ea6e230340>

In [None]:
epochs = 200
for j in range(0, epochs):
    geomModel.fit(train_geom, train_lbl, epochs=1)
    geomModel.evaluate(val_geom, val_lbl, verbose=2)
# 200 epochs later... "you haven't converged or blown up yet? another round! Adam, what a dogged searcher."

<b>Fourth pass</b>: models combining image and geometric data. I'm looking at an attention-type mechanism where a short network uses the geometry to make weights for the convolutional network. Since the geometric data isn't there for every frame it also tries to train a 'back-up' layer just from the image data. It works better than previous tries. There's still a lot I don't know.
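
A NumPy sketch of how the geometry-derived weights meet the convolutional features may help here. It is an illustration under stated assumptions, not the model: `features` and `geom_scores` are random stand-ins, and 256 channels at 8x8 are assumed to match a 64/128/256 configuration after three poolings.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
batch, channels = 2, 256
features = rng.random(size=(batch, 8, 8, channels))  # stand-in for the conv stack output
geom_scores = rng.normal(size=(batch, channels))     # stand-in for the geometry network output

weights = softmax(geom_scores)         # one weight per channel, each row sums to 1
weights_4d = weights[:, None, None, :] # shape (batch, 1, 1, channels) for broadcasting
weighted = features * weights_4d       # every spatial position is scaled per channel
flat = weighted.reshape(batch, -1)     # what the final classifier would receive

print(weights_4d.shape, weighted.shape, flat.shape)
```

One consequence of using a softmax here: with 256 channels the weights average 1/256, so the weighted features are much smaller in magnitude than the raw ones. Whether that matters for training is not something I tested.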

In [None]:
# basic datasets: train and val at 64x64
train_img_small = np.load('datasets/train_img_64.npy')
train_img_small = np.float32(train_img_small) / 255
train_geom = np.concatenate([np.load("datasets/train_geom_img.npy"), np.load("datasets/train_geom_wrl.npy")], axis=1)
train_geom = train_geom.reshape((-1, 21*6))
train_lbl = np.load('datasets/train_lbl.npy')
print(train_img_small.shape, train_geom.shape, train_lbl.shape)

In [9]:
val_img_small = np.load('datasets/val_img_64.npy')
val_img_small = np.float32(val_img_small) / 255
val_geom = np.concatenate([np.load("datasets/val_geom_img.npy"), np.load("datasets/val_geom_wrl.npy")], axis=1)
val_geom = val_geom.reshape((-1,21*6))
val_lbl = np.load('datasets/val_lbl.npy')
print(val_img_small.shape, val_geom.shape, val_lbl.shape)

(5793, 64, 64, 3) (5793, 126) (5793,)


In [12]:
# synthetic data pre-generated by albumentations
train_img_synth = np.concatenate([np.load('datasets/train_img_64_synth.npy'), 
                                  np.load('datasets/train_img_64.npy')], axis=0)
train_lbl_synth = np.concatenate([np.load('datasets/train_lbl_64_synth.npy'), 
                                  np.load('datasets/train_lbl.npy')], axis=0)
train_geom_synth = np.concatenate([np.concatenate([np.load('datasets/train_geom_img_64_synth.npy'), 
                                                   np.load('datasets/train_geom_img.npy')], axis=1),
                                   np.concatenate([np.load('datasets/train_geom_wrl_64_synth.npy'),
                                                   np.load('datasets/train_geom_wrl.npy')], axis=1)],
                                   axis=0)
train_img_synth = np.float32(train_img_synth) / 255
train_geom_synth = train_geom_synth.reshape((-1, 21*6))
print(train_img_synth.shape, train_geom_synth.shape, train_lbl_synth.shape)

(53118, 64, 64, 3) (53118, 126) (53118,)


TensorBoard is a profiling add-on. It can tell you lots of things about the statistics of your model's weights,
how much time it takes doing what operations, and a lot more. I've barely taken a look.

https://www.tensorflow.org/tensorboard

In [58]:
%load_ext tensorboard

In [59]:
import datetime
log_dir = "logs/fit/" + datetime.datetime.now().strftime("%Y%m%d-%H%M%S")
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir, histogram_freq=1, profile_batch = '2000,2010')
# the data it logs can take up a lot of space, so they recommend using it only for 10 or 20 steps to gather its statistics,
# and not steps at the beginning, where there can be overhead etc.

In [None]:
%tensorboard --logdir logs/fit

In [None]:
# Regularisation stuff...
# tf.keras.layers.GaussianNoise(noise_sigma) (last used sigma = 0.04)
# tf.keras.regularizers.L2(reg_coef) (last used coef 0.001 or 0.0001?)
# tf.keras.layers.Dropout(dropout_prob) (last used prob = 0.2)
# tf.keras.layers.SpatialDropout2D(spatial_dropout_prob) (last used prob = 0.05)

# Augmentation stuff -- when using gpu it's advised to stick this on the dataset; as long as the preprocessing
# consists only of tensorflow Graph-able operations it'll be executed in parallel when data is about to be called from it

# train_img_tf = tf.data.Dataset.from_tensor_slices(train_img_small)
# train_img_tf.map(lambda x: pre_process(x)), where pre_process could be a keras.Sequential object

# tf.keras.layers.RandomBrightness(factor, value_range=(0, 1)) (factor = pair of floats in [-1,1])
# tf.keras.layers.RandomContrast(factor in [0,1])
# tf.keras.layers.RandomFlip(mode='horizontal')
# tf.keras.layers.RandomRotation(fill_mode='constant', factor in [0,1]), rotation up to angle factor*2pi
# tf.keras.layers.RandomZoom(height_factor=0.2, fill_mode='constant')  default arg width_factor=None preserves aspect ratio

In [281]:
def random_flip(seq):
    if tf.random.categorical(tf.math.log([[0.5, 0.5]]), 1).numpy()[0][0]:
        return tf.raw_ops.Reverse(tensor=seq, dims=[False,False,False,True,False])
    return seq

def random_augment(seq):
    seq = random_flip(seq)
    seq = tf.image.random_brightness(seq, 0.15)
    seq = tf.image.random_saturation(seq, 0.85, 1.15)
    seq = tf.image.random_contrast(seq, 0.85, 1.15)
    #seq = tf.image.random_hue(seq, 0.01)
    return tf.raw_ops.ClipByValue(t=seq, clip_value_min=0, clip_value_max=1)

In [13]:
synthesiser_train = tf.keras.Sequential([tf.keras.layers.RandomBrightness(0.2, value_range=(0,1)),
                                    tf.keras.layers.RandomFlip(mode = 'horizontal'),
                                    tf.keras.layers.RandomRotation(0.05, fill_mode='constant'),
                                    tf.keras.layers.RandomZoom(height_factor=0.2, fill_mode='constant')])
batch_size = train_img_synth.shape[0]
train_img_tf = tf.data.Dataset.from_tensor_slices(train_img_synth)
train_geom_tf = tf.data.Dataset.from_tensor_slices(train_geom_synth).batch(batch_size).get_single_element()


# this makes a dataset object with an attached function, rather than just applying a function once to its tensors
train_synth = train_img_tf.map(lambda x: synthesiser_train(x),
                                 num_parallel_calls=batch_size).batch(batch_size)
train_proc = train_synth.get_single_element()

In [19]:
class testAttentionModel(tf.keras.Model):
    def __init__(self, conv_filters, reg_coef=0, labels=50, use_geom_backup=True):
        super(testAttentionModel, self).__init__()
        filters_1, filters_2, filters_3 = conv_filters
        conv_out_size = filters_3
        self.reg = tf.keras.regularizers.L2(reg_coef)
        self.spatial_dropout_prob = 0.02
        self.dropout_prob = 0.1
        self.use_geom_backup = use_geom_backup
        
        
        # 64x64xch
        self.conv_1a = tf.keras.layers.Convolution2D(filters_1, 5, padding='same', use_bias=True, 
                                                     activation='relu',
                                                     kernel_regularizer=self.reg)
        self.conv_1b = tf.keras.layers.Convolution2D(filters_1, 3, padding='same', use_bias=True, 
                                                     activation='relu',
                                                     kernel_regularizer=self.reg)
        # 32x32xch
        self.conv_2a = tf.keras.layers.Convolution2D(filters_2, 3, padding='same', use_bias=True, 
                                                     activation='relu',
                                                     kernel_regularizer=self.reg)
        self.conv_2b = tf.keras.layers.Convolution2D(filters_2, 3, padding='same', use_bias=True,
                                                     activation='relu',
                                                     kernel_regularizer=self.reg)
        # 16x16xch
        self.conv_3a = tf.keras.layers.Convolution2D(filters_3, 3, padding='same', use_bias=True, 
                                                     activation='relu',
                                                    kernel_regularizer=self.reg)
        self.conv_3b = tf.keras.layers.Convolution2D(filters_3, 3, padding='same', use_bias=True, 
                                                     activation='relu',
                                                     kernel_regularizer=self.reg)
        # 8x8xch
        #self.conv_4a = tf.keras.layers.Convolution2D(filters_4, 3, padding='same', use_bias=True, activation='relu')
                                                    # activation='tanh', kernel_regularizer=regulator)
        #self.conv_4b = tf.keras.layers.Convolution2D(filters_4, 3, padding='same', use_bias=True, activation='relu')
                                                     #activation='tanh', kernel_regularizer=regulator)
        # out: 4x4xch
        
        self.geom1 = tf.keras.layers.Dense(64, use_bias=True, activation='relu', kernel_regularizer=self.reg)
        self.geom_backup = tf.keras.layers.Dense(64, use_bias=True, activation='relu', kernel_regularizer=self.reg)
        self.geom2 = tf.keras.layers.Dense(64, use_bias=True, activation='relu', kernel_regularizer=self.reg)
        self.attention = tf.keras.layers.Dense(conv_out_size, use_bias=True, 
                                               activation='softmax', kernel_regularizer=self.reg)

        self.classifier = tf.keras.layers.Dense(labels, activation='softmax')
    
    def call(self, input_list, training=True):
        c_out = tf.keras.layers.GaussianNoise(0.03)(input_list[0], #start 64x64x3
                                                    training=training)
        c_out = tf.keras.layers.SpatialDropout2D(self.spatial_dropout_prob)(self.conv_1a(c_out),
                                                                            training=training)
        c_out = tf.keras.layers.MaxPool2D(strides=(2,2))(self.conv_1b(c_out)) # to 32x32xch
        c_out = tf.keras.layers.SpatialDropout2D(self.spatial_dropout_prob)(self.conv_2a(c_out),
                                                                            training=training)
        c_out = tf.keras.layers.MaxPool2D(strides=(2,2))(self.conv_2b(c_out)) # to 16x16xch
        c_out = tf.keras.layers.SpatialDropout2D(self.spatial_dropout_prob) (self.conv_3a(c_out),
                                                                            training=training)
        c_out = tf.keras.layers.MaxPool2D(strides=(2,2))(self.conv_3b(c_out)) # to 8x8xch
        #c_out = self.conv_4a(c_out)
        #c_out = tf.keras.layers.MaxPool2D(strides=(2,2))(self.conv_4b(c_out))
       
        if tf.math.reduce_max(input_list[1]) == 0 and self.use_geom_backup:
            g_out = self.geom_backup(tf.keras.layers.Flatten()(tf.keras.layers.AveragePooling2D(pool_size=(4, 4),
                                                                                                strides=(4,4),
                                                                                                padding='valid')(input_list[0])))
        else:
            g_out = self.geom1(input_list[1]) # if use_geom_backup is off this will output max(0, bias)
        g_out = tf.keras.layers.Dropout(self.dropout_prob)(g_out, training=training)
        g_out = tf.keras.layers.Dropout(self.dropout_prob)(self.geom2(g_out),training=training)
        g_out = tf.keras.layers.Dropout(self.dropout_prob)(self.attention(g_out),training=training)
        g_out = tf.expand_dims(tf.expand_dims(g_out, axis=-2), axis=-2)
       
        return self.classifier(tf.keras.layers.Flatten()(tf.math.multiply(c_out, g_out)))
    
    #def build_graph(self):
    #    in1 = tf.keras.layers.Input(shape=(64,64,3))
    #    in2 = tf.keras.layers.Input(shape=(63))
    #    return tf.keras.Model(inputs=[in1,in2], 
    #                          outputs=self.call([in1,in2]))
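
One thing to note about `call` above: the `reduce_max` test looks at the whole geometry tensor, so within a batch the back-up branch is only taken when every sample lacks landmarks. A per-sample choice could look like the NumPy sketch below. The function `select_branch` is hypothetical, and it assumes missing detections are stored as all-zero rows. In TensorFlow, `tf.where` with a broadcast mask would play the role of `np.where`.

```python
import numpy as np

def select_branch(geom, geom_branch, backup_branch):
    """Per sample, use backup_branch where the landmark row is all zeros, else geom_branch."""
    missing = np.all(geom == 0, axis=1, keepdims=True)  # shape (batch, 1)
    return np.where(missing, backup_branch, geom_branch)

geom = np.array([[0.0, 0.0, 0.0],    # no hand detected for this sample
                 [0.1, -0.2, 0.3]])  # landmarks present
geom_branch = np.full((2, 4), 1.0)   # stand-in for the geom1 output
backup_branch = np.full((2, 4), 9.0) # stand-in for the geom_backup output

mixed = select_branch(geom, geom_branch, backup_branch)
print(mixed)
```

Both branches are computed for every sample in this form, which costs a little extra but keeps the batch in one piece.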

In [20]:
testAttender = testAttentionModel((64,128,256), reg_coef=0.0001)

In [263]:
# using the 'functional API' of Keras 
testAttenderX.build_graph().summary()

Model: "model_9"
__________________________________________________________________________________________________
 Layer (type)                   Output Shape         Param #     Connected to                     
 input_34 (InputLayer)          [(None, 64, 64, 3)]  0           []                               
                                                                                                  
 gaussian_noise_7 (GaussianNois  (None, 64, 64, 3)   0           ['input_34[0][0]']               
 e)                                                                                               
                                                                                                  
 conv2d_80 (Conv2D)             (None, 64, 64, 64)   4864        ['gaussian_noise_7[0][0]']       
                                                                                                  
 spatial_dropout2d_11 (SpatialD  (None, 64, 64, 64)  0           ['conv2d_80[1][0]']        

In [17]:
# to get previously saved weights, make a model, start and stop fit, then call this to load
testAttender.load_weights("testAttender.h5")

In [21]:
learning_rate=0.0001
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
                    learning_rate,
                    decay_steps=5000,
                    decay_rate=0.9,
                    staircase=True)

testAttender.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
                loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                metrics=['accuracy'])
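
To get a feel for what this schedule does over a run, here is a small sketch of the staircase formula, lr = initial_lr * decay_rate ** floor(step / decay_steps). The function `staircase_decay` is a hypothetical re-implementation for illustration. The batch size of 32 is an assumption (the Keras `fit` default, since none is passed), and the sample count is taken from the shape printed earlier.

```python
import math

def staircase_decay(initial_lr, step, decay_steps, decay_rate):
    """Learning rate after `step` optimiser steps under a staircase exponential decay."""
    return initial_lr * decay_rate ** (step // decay_steps)

samples = 53118   # rows in train_img_synth, from the printed shape
batch_size = 32   # assumption: Keras default when fit() is given no batch_size
steps_per_epoch = math.ceil(samples / batch_size)

for epoch in (1, 10, 40, 80):
    step = epoch * steps_per_epoch
    lr = staircase_decay(1e-4, step, decay_steps=5000, decay_rate=0.9)
    print(epoch, step, lr)
```

Under these assumptions the rate drops by 10% roughly every three epochs.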

Results: 0.790 (0.964) with 3 blocks of 64/128/256. Reg at 0.001 didn't help: 0.77 (0.966). (At this point cut half of the dataset grass.) Added spatial/regular dropout at 0.04/0.2, reg=0.001. Slower; val stalled around 0.745 (tr continued up to 0.93). Adding in synth data, tr_acc (on the same model) went down to 0.745 too, but though it recovered, val did not.

With everything on (noise, dropouts, reg, pre-built synth, train-time synth), up to .82 (.96).

In [None]:
testAttender.fit([train_proc, train_geom_tf], train_lbl_synth,
                 validation_data=([val_img_small, val_geom], val_lbl),
                 epochs=80)

Epoch 1/80

In [51]:
# I have not yet been able to save and load a model in any other form than this; the tf native options require
# some syntax knowledge I do not possess
testAttender.save_weights("testAttender.h5")

In [40]:
# why two copies of the same object? because tensorflow handles batch size in a way I don't understand, and
# using the same one in two places raises errors
synthesiser_val = tf.keras.Sequential([tf.keras.layers.RandomBrightness(0.2, value_range=(0,1)),
                                    tf.keras.layers.RandomFlip(mode = 'horizontal'),
                                    tf.keras.layers.RandomRotation(0.05, fill_mode='constant'),
                                    tf.keras.layers.RandomZoom(height_factor=0.2, fill_mode='constant')])
batch_val = val_img_small.shape[0]
val_img_tf = tf.data.Dataset.from_tensor_slices(val_img_small)
val_geom_tf = tf.data.Dataset.from_tensor_slices(val_geom_full).batch(batch_val).get_single_element()
val_lbl_tf = tf.data.Dataset.from_tensor_slices(val_lbl_small).batch(batch_val).get_single_element()

val_synth_ds = val_img_tf.map(lambda x: synthesiser_val(x),
                             num_parallel_calls=batch_val).batch(batch_val)
val_proc = val_synth_ds.get_single_element()

In [244]:
# this model tries out test-time data augmentation; that is, given an image it generates some random synthetic frames 
# from it, gives those to the underlying trained model, and returns their averaged probabilities.

# however, I was never able to get it even to run. the basic difficulty is: (1) tf tensors are usually immutable, so can't act
# as accumulators; there is a tf.Variable object, but (2) tf does not allow the creation of tf.Variables within a function
# that gets called more than once, while (3) a variable created at object initialisation doesn't know what shape it's supposed
# to be, and whatever I try to tell it later it tells me I'm wrong.

# After all that, I think the correct way to do this is (1) the variable must be passed to the model at execution time,
# while (2) making a custom training loop procedure that makes explicit various stuff that tf/keras does automatically for 
# other types of models. I haven't tried yet.

class testPollModel(tf.keras.Model):
    def __init__(self, polled_model, size):
        super(testPollModel, self).__init__()
        self.size = size
        self.polled_model = polled_model
        
        #self.vote = tf.Variable(tf.zeros_initializer(), shape=tf.TensorShape([None,50]))
        self.vote = None
        
    def call(self, input_list):
        #self.vote = self.vote*0
        #self.vote.set_shape(tf.TensorShape((None,50)))
        #self.vote.assign_add(-vote)
        #if self.vote is None:
        #    self.vote = tf.Variable(tf.zeros(shape=(1,50), dtype='float32'), trainable=False)
            #self.vote = self.initialiser(shape=[-1,50])
        #self.vote.set_shape(tf.TensorShape((None,50)))
        #self.vote.assign(tf.transpose(self.polled_model(input_list=input_list, training=False))[0])
        #tf.squeeze() ? 
        #self.vote.assign(self.polled_model(input_list=input_list,
        #                                   training=False))
        ta = tf.TensorArray(dtype=tf.float32, size=0, dynamic_size=True)
      
        for j in range(self.size):
            #self.vote.assign_add
            ta = ta.write(j, self.polled_model([synthesiser_train(input_list[0]), input_list[1]], training = False))
            #self.vote.assign_add(self.polled_model([synthesiser_train(input_list[0]), input_list[1]], 
            #                                      training = False))
        #ret = self.vote / self.size
        #self.vote.assign_add(-self.vote)
        print(ta.size)
        return tf.keras.layers.Average()(ta)
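
For comparison, the averaging step itself does not need an accumulator variable: the predictions for each augmented view can be collected in a list, stacked, and averaged. Below is a NumPy sketch of that idea, not a fix for the class above. `random_flip_brightness` and `dummy_predict` are hypothetical stand-ins for the augmentation pipeline and the trained model. In TensorFlow the last line would presumably become `tf.reduce_mean(tf.stack(views, axis=0), axis=0)`, which I have not tried in this notebook.

```python
import numpy as np

def random_flip_brightness(images, rng, max_delta=0.2):
    """Randomly mirror each image left-right and shift its brightness; values stay in [0, 1]."""
    out = images.copy()
    flip = rng.random(len(out)) < 0.5
    out[flip] = out[flip][:, :, ::-1, :]  # reverse the width axis of the selected images
    delta = rng.uniform(-max_delta, max_delta, size=(len(out), 1, 1, 1))
    return np.clip(out + delta, 0.0, 1.0)

def predict_with_tta(predict_fn, images, geom, n_views, seed=0):
    """Average predict_fn's class probabilities over n_views random augmentations."""
    rng = np.random.default_rng(seed)
    views = [predict_fn(random_flip_brightness(images, rng), geom) for _ in range(n_views)]
    return np.mean(np.stack(views, axis=0), axis=0)

def dummy_predict(images, geom):
    """Stand-in model: 50-way softmax driven only by mean pixel value."""
    logits = np.outer(images.mean(axis=(1, 2, 3)), np.arange(50)) / 50
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

images = np.random.default_rng(1).random(size=(3, 64, 64, 3)).astype(np.float32)
geom = np.zeros((3, 126), dtype=np.float32)
probs = predict_with_tta(dummy_predict, images, geom, n_views=4)
print(probs.shape, probs.sum(axis=1))
```

Since each view's output is a probability vector, their mean is one too, so the result can be scored with the same metrics as a single prediction.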

In [245]:
# never yet worked
for j in range(2, 5):
    testPoller = testPollModel(testAttender,j)
    testPoller.compile()
    testPoller.evaluate([val_img_small, val_geom_full], val_lbl_small, verbose=2)

<bound method TensorArray.size of <tensorflow.python.ops.tensor_array_ops.TensorArray object at 0x000001F77C3F5AF0>>


TypeError: in user code:

    File "C:\Users\balin\anaconda3\lib\site-packages\keras\engine\training.py", line 1727, in test_function  *
        return step_function(self, iterator)
    File "C:\Users\balin\anaconda3\lib\site-packages\keras\engine\training.py", line 1713, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "C:\Users\balin\anaconda3\lib\site-packages\keras\engine\training.py", line 1701, in run_step  **
        outputs = model.test_step(data)
    File "C:\Users\balin\anaconda3\lib\site-packages\keras\engine\training.py", line 1665, in test_step
        y_pred = self(x, training=False)
    File "C:\Users\balin\anaconda3\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler
        raise e.with_traceback(filtered_tb) from None
    File "C:\Users\balin\AppData\Local\Temp\__autograph_generated_filemal1hvsh.py", line 28, in tf__call
        retval_ = ag__.converted_call(ag__.converted_call(ag__.ld(tf).keras.layers.Average, (), None, fscope), (ag__.ld(ta),), None, fscope)

    TypeError: Exception encountered when calling layer "test_poll_model_42" "                 f"(type testPollModel).
    
    in user code:
    
        File "C:\Users\balin\AppData\Local\Temp\ipykernel_7156\1035348579.py", line 34, in call  *
            return tf.keras.layers.Average()(ta)
        File "C:\Users\balin\anaconda3\lib\site-packages\keras\utils\traceback_utils.py", line 70, in error_handler  **
            raise e.with_traceback(filtered_tb) from None
        File "C:\Users\balin\anaconda3\lib\site-packages\keras\layers\merging\base_merge.py", line 84, in build
            if not isinstance(input_shape[0], tuple):
    
        TypeError: 'NoneType' object is not subscriptable
    
    
    Call arguments received by layer "test_poll_model_42" "                 f"(type testPollModel):
      • input_list=('tf.Tensor(shape=(None, 64, 64, 3), dtype=float32)', 'tf.Tensor(shape=(None, 63), dtype=float32)')
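
The TypeError above comes from handing a `TensorArray` to `tf.keras.layers.Average`, which expects a Python list of tensors and fails while inspecting the input shapes. A minimal sketch of one way around it, assuming the goal is simply to average the predictions over several augmented copies of the image: collect the outputs in a list and take `tf.reduce_mean` over the stacked result. `toy_model` and `add_noise` are hypothetical stand-ins for `testAttender` and `synthesiser_train`, so this is untested against the real model.

```python
import numpy as np
import tensorflow as tf


class PollModel(tf.keras.Model):
    """Averages a model's predictions over several randomly augmented copies of the image."""

    def __init__(self, polled_model, augment, n_votes):
        super().__init__()
        self.polled_model = polled_model
        self.augment = augment      # any function image_batch -> image_batch
        self.n_votes = n_votes

    def call(self, input_list, training=False):
        img, geom = input_list
        # n_votes is a plain Python int, so an ordinary list is enough here.
        votes = [self.polled_model([self.augment(img), geom], training=False)
                 for _ in range(self.n_votes)]
        # Stack to (n_votes, batch, classes), then average over the votes.
        return tf.reduce_mean(tf.stack(votes, axis=0), axis=0)


# Hypothetical stand-in for testAttender: two inputs, 50-way softmax output.
img_in = tf.keras.Input(shape=(8, 8, 3))
geom_in = tf.keras.Input(shape=(63,))
merged = tf.keras.layers.Concatenate()([tf.keras.layers.Flatten()(img_in), geom_in])
toy_model = tf.keras.Model([img_in, geom_in],
                           tf.keras.layers.Dense(50, activation='softmax')(merged))


def add_noise(img):
    # Hypothetical stand-in for synthesiser_train: small Gaussian pixel noise.
    return img + tf.random.normal(tf.shape(img), stddev=0.04)


poller = PollModel(toy_model, add_noise, n_votes=3)
imgs = np.random.rand(5, 8, 8, 3).astype('float32')
geoms = np.random.rand(5, 63).astype('float32')
avg_pred = poller([imgs, geoms])
print(avg_pred.shape)
```

To get an accuracy figure out of `evaluate`, the wrapper would presumably also need a loss or metric passed to `compile`, which the cell above leaves empty.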


In [227]:
# Keras functional API style attempt at the same thing. this also did not work.
input_img = tf.keras.Input(shape=(64,64,3))
input_geom = tf.keras.Input(shape=(63,))
y1 = testAttender([input_img, input_geom], training=False)
y2 = testAttender([input_img, input_geom], training=False)
y3 = testAttender([input_img, input_geom], training=False)
outputs = tf.keras.layers.Average()([y1, y2, y3])
ensemble_model = tf.keras.Model(inputs=[input_img, input_geom], outputs=outputs)

In [106]:
# nor could I get it to work on even a single example, because tf objects to calling testAttender on a single image rather
# than a batch.
vote = tf.Variable(np.zeros((1, 50), dtype='float32'), trainable=False)
vote.set_shape(tf.TensorShape([None,50]))
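
On the single-image problem: a Keras model treats the first axis of every input as the batch axis, so a lone (64, 64, 3) image has to be wrapped into a batch of one before the call. A small sketch using placeholder arrays with the per-example shapes shown in the traceback above (the zeros are not real data):

```python
import numpy as np

# Placeholder arrays with the same per-example shapes as val_img_small / val_geom_full.
val_img = np.zeros((10, 64, 64, 3), dtype=np.float32)
val_geom = np.zeros((10, 63), dtype=np.float32)

single_img = val_img[0]                      # shape (64, 64, 3): the batch axis is gone
img_batch = val_img[0:1]                     # slicing keeps the batch axis: (1, 64, 64, 3)
geom_batch = val_geom[0][np.newaxis, ...]    # or add the axis back explicitly: (1, 63)

print(single_img.shape, img_batch.shape, geom_batch.shape)
# A call such as testAttender([img_batch, geom_batch], training=False) should then
# return a single row of class probabilities.
```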

<b>Fifth pass</b>: stick in one of the pre-trained models carefully made by various Google teams and see how it does. 

In [1]:
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

In [6]:
# first problem: these models have much larger input sizes, like 224x224. just generating our data takes a lot more available
# memory. (I think there are ways to get python to do explicit file-swapping to cache part of the variable, but I don't
# know how they work.) trying to put it in memory as float32, which is the standard the pre-trained image models use, is
# four times as big a problem. the answer here is to keep it in memory as uint8 and do the rescaling dynamically; this probably
# entails a slight performance hit, since I'm running everything on CPU, but it's negligible compared to the work it does
# computing all those convolutional filters
train_img_224 = np.load('datasets/train_img_224.npy')
train_geom = np.concatenate([np.load("datasets/train_geom_img.npy"), np.load("datasets/train_geom_wrl.npy")], axis=1)
train_geom = train_geom.reshape((-1, 21*6))
train_lbl = np.load('datasets/train_lbl.npy')
print(train_img_224.shape, train_geom.shape, train_lbl.shape)
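
On the file-swapping idea in the comment above: NumPy can memory-map an `.npy` file via `np.load(..., mmap_mode='r')`, so only the slices actually indexed get read from disk. A minimal sketch with a small throwaway file (the file name and array size are made up for illustration) that also shows the 4x size difference between uint8 and float32:

```python
import os
import tempfile
import numpy as np

# Small stand-in for train_img_224.npy (hypothetical data, 16 images only).
imgs_uint8 = np.random.randint(0, 256, size=(16, 224, 224, 3), dtype=np.uint8)
path = os.path.join(tempfile.mkdtemp(), 'demo_img_224.npy')
np.save(path, imgs_uint8)

# float32 needs four bytes per value instead of one.
size_ratio = imgs_uint8.astype(np.float32).nbytes / imgs_uint8.nbytes
print(size_ratio)

# mmap_mode='r' maps the file instead of reading it all into memory.
imgs_mapped = np.load(path, mmap_mode='r')

# Only this batch is copied into RAM and rescaled to [0, 1].
batch = np.asarray(imgs_mapped[:4], dtype=np.float32) / 255.0
print(batch.shape, batch.dtype, batch.max() <= 1.0)
```

Whether `fit` accepts a memory-mapped array directly is not something this sketch tests; feeding it batch by batch through a generator would be the cautious route.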

In [7]:
val_img_224 = np.load('datasets/val_img_224.npy')
val_geom = np.concatenate([np.load("datasets/val_geom_img.npy"), np.load("datasets/val_geom_wrl.npy")], axis=1)
val_geom = val_geom.reshape((-1,21*6))
val_lbl = np.load('datasets/val_lbl.npy')
print(val_img_224.shape, val_geom.shape, val_lbl.shape)

The first try was one of Google's 'lightweight' models, meant to work on devices with little computing power (like phones, I think). 

The model's page at tfhub, https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/5, explains how its input and output work: "The output is a batch of feature vectors. For each input image, the [output] feature vector has size num_features = 1280.... The input images are expected to have color values in the range [0,1], following the common image input conventions. For this model, the size of the input images is fixed to height x width = 224 x 224 pixels."
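
Since the data stays in memory as uint8, the [0,1] range the model page asks for has to be produced on the fly. One way, sketched here on random stand-in pixels, is `tf.image.convert_image_dtype`, which rescales integer images to [0,1] when converting to a float type:

```python
import numpy as np
import tensorflow as tf

# Random stand-in for a small batch of uint8 images (not real data).
batch_uint8 = np.random.randint(0, 256, size=(2, 224, 224, 3), dtype=np.uint8)

# Converting an integer image to float32 also divides by the integer type's max (255 for uint8).
batch_float = tf.image.convert_image_dtype(batch_uint8, dtype=tf.float32)

print(batch_float.dtype, float(tf.reduce_min(batch_float)), float(tf.reduce_max(batch_float)))
```

Doing the conversion inside the model means only one batch at a time is ever held as float32.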

In [18]:
# for the first try, an older version where the attention layer didn't take in the imported net's output.
# we eventually got a small improvement, up to 0.84 val accuracy (compared to 0.82 from the best of my 'toy' models)
testImport.fit([train_img_224, train_geom], train_lbl,
                 validation_data=([val_img_224, val_geom], val_lbl),
                 epochs=80)

Epoch 1/80
...
Epoch 80/80


<keras.callbacks.History at 0x258e3d0c760>

Next try is https://tfhub.dev/google/imagenet/inception_resnet_v2/feature_vector/5. This one has input size 299x299, "but other input sizes are possible (within limits)," and output size 1536. There's no chance I can store the images at that size, but it turns out the 224x224 I now have is within limits. 

Since this one takes about 35 minutes an epoch, I had time to go back to Google's <b>Colaboratory</b>, https://colab.research.google.com/, which turns out to be very easy to use. Colab takes only a couple of lines to mount your Google Drive: 

    from google.colab import drive
    drive.mount('/content/drive')

from which it can load files efficiently (compared to uploading them for an individual runtime session, which takes a long time). The root directory is 

    /content/drive/MyDrive/

followed by whatever folder/file structure you put on Drive. After that, exactly the same code as here will run on a server far away. My first try there is https://tfhub.dev/google/imagenet/inception_v3/feature_vector/5, but it takes about 55 minutes an epoch, so I'm waiting on results.

In [21]:
testImport.fit([train_img_224, train_geom], train_lbl,
                 validation_data=([val_img_224, val_geom], val_lbl),
                 epochs=50)

Epoch 1/50
...
Epoch 19/50

KeyboardInterrupt: 

In [33]:
# one thing I haven't incorporated yet is putting the CNN output as input to the attention mechanism, which is what people
# usually do.

class testImporter(tf.keras.Model):
    def __init__(self, import_url="https://tfhub.dev/google/imagenet/mobilenet_v2_100_224/feature_vector/5",
                       import_output = 1280,
                       dropout_prob=0.5,
                       reg_coef = 0.0001,
                       use_geom_backup=True):
        super().__init__()
        self.dropout_prob = dropout_prob
        # whether to fall back on a pooled copy of the image when the geometry input is all zeros
        self.use_geom_backup = use_geom_backup
        self.reg = tf.keras.regularizers.L2(reg_coef)
        
        self.pretrained = hub.KerasLayer(import_url, trainable=False)
        self.geom1 = tf.keras.layers.Dense(64, activation='relu', kernel_regularizer=self.reg)
        self.geom_backup = tf.keras.layers.Dense(64, activation='relu', kernel_regularizer=self.reg)
        self.geom2 = tf.keras.layers.Dense(64, activation='relu', kernel_regularizer=self.reg)
        self.geom3 = tf.keras.layers.Dense(import_output,activation='softmax', kernel_regularizer=self.reg)
        self.classifier = tf.keras.layers.Dense(50, activation='softmax')
        
        self.flattener = tf.keras.layers.Flatten()
        self.avgpooler = tf.keras.layers.AveragePooling2D(pool_size=(4, 4), strides=(4,4), padding='valid')
    
    def call(self, input_list):
        # images are stored as uint8; this rescales them to float32 in [0,1]
        img = tf.image.convert_image_dtype(input_list[0], dtype=tf.float32)
        c_out = self.pretrained(img)
        if tf.math.reduce_max(input_list[1]) == 0 and self.use_geom_backup:
            # backup branch: reuses the rescaled image (assumes the undefined self.scaler was meant to rescale to [0,1])
            g_out = self.geom_backup(self.flattener(self.avgpooler(img)))
        else:
            g_out = self.geom1(input_list[1])
        g_out = tf.keras.layers.Dropout(self.dropout_prob)(g_out)
        g_out = tf.keras.layers.Dropout(self.dropout_prob)(self.geom2(g_out))
        g_out = tf.concat([g_out, c_out], axis=-1)
        g_out = tf.keras.layers.Dropout(self.dropout_prob)(self.geom3(g_out))
       
        return self.classifier(tf.math.multiply(c_out, g_out))
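
The last line of `call` is a feature-gating form of attention: `geom3` turns the concatenated geometry and image features into one weight per image feature, and those weights multiply the pre-trained network's feature vector elementwise before classification. A NumPy sketch of just that gating step, with random numbers in place of the real features and of the learned layer (everything here is a stand-in, and the dropout is left out):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, n_features = 4, 1280

c_out = rng.random((batch, n_features))          # stand-in for the pre-trained feature vectors
logits = rng.normal(size=(batch, n_features))    # stand-in for geom3's pre-activation output

# Softmax over the feature axis, as in Dense(import_output, activation='softmax').
exp = np.exp(logits - logits.max(axis=1, keepdims=True))
weights = exp / exp.sum(axis=1, keepdims=True)

gated = c_out * weights                          # same operation as tf.math.multiply(c_out, g_out)

print(weights.sum(axis=1))    # each row of weights sums to 1
print(weights.mean())         # so the average weight is 1 / n_features
```

One consequence worth keeping in mind: because the softmax weights sum to 1 across all 1280 features, the gated features come out far smaller in magnitude than `c_out`, and the classifier layer has to make up the difference with larger weights.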

In [34]:
testImport = testImporter()
testImport.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
                loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                metrics=['accuracy'])

In [35]:
testImport.fit([train_img_224, train_geom], train_lbl,
                 validation_data=([val_img_224, val_geom], val_lbl),
                 epochs=50)

Epoch 1/50
...
Epoch 50/50


<keras.callbacks.History at 0x258d40c5760>

TF Hub also has models made to work on video input. I don't know anything about them; I'm just writing this down to investigate later. 

https://tfhub.dev/shoaib6174/swin_base_patch244_window877_kinetics600_22k/1

shape_of_input = [1,3,32,224,224]   # [batch_size, channels, frames, height, width]

"output shape will be [1,768*******]" (I don't know what that means).

another example: https://tfhub.dev/deepmind/i3d-kinetics-600/1, https://github.com/deepmind/kinetics-i3d

...

An alternate mp.Hands version for cropped images? https://tfhub.dev/mediapipe/tfjs-model/handpose_3d/landmark/full/1

"PNAS" https://tfhub.dev/google/imagenet/pnasnet_large/feature_vector/5

The first block (2 layers) of vgg-19 https://tfhub.dev/emilutz/vgg19-block1-conv2-unpooling-encoder/1

The first 3 blocks (6 layers) of vgg19. This is exactly the architecture I was using before, so the comparison here may give an idea of how much performance gain there is to get from the pre-trained models. https://tfhub.dev/emilutz/vgg19-block3-conv2-unpooling-encoder/1

and the first 3.5 blocks (8 layers) of vgg19. https://tfhub.dev/emilutz/vgg19-block4-conv2-unpooling-encoder/1

and the first 5 blocks (12 layers) of vgg19 https://tfhub.dev/emilutz/vgg19-block5-conv2-unpooling-encoder/1

"EfficientNet" apparently gets top-quality results with far less computation required. https://tfhub.dev/google/imagenet/efficientnet_v2_imagenet21k_b0/feature_vector/2

MobileNet: 40% more features than the one we looked at previously https://tfhub.dev/google/imagenet/mobilenet_v2_140_224/feature_vector/5

MobileNetV3: https://tfhub.dev/google/imagenet/mobilenet_v3_large_100_224/feature_vector/5