# PalmTouch using a Convolutional Neural Network (TensorFlow)
CNNs are the current state-of-the-art method for image classification. As blobs are represented by low-resolution images, we implemented a CNN using TensorFlow. We performed a grid search over the number of layers, the number of filters and their sizes (in steps of 1), the number of neurons in the fully connected layer (in steps of 50), and the activation functions and optimizers provided by TensorFlow. Our final network architecture is shown in Table 1 in the paper. We trained the CNN using AdaGrad as the optimizer with a batch size of 100 and used the Xavier initialization scheme to initialize the network weights. We initialized the biases with random values from a normal distribution. An exponential decay (rate = 0.2 in 1000 steps) was used to decrease the initial learning rate of 0.009. We used L2 regularization to counteract overfitting by adding an L2 penalty on the weights, scaled by 0.01, to the cost function. Moreover, we used early stopping as proposed by Caruana et al. to further avoid overfitting. We also experimented with batch normalization, but it did not improve the overall accuracy. Our CNN achieved an accuracy of 99.58% (precision = 99.38%, recall = 99.28%), the highest of all presented approaches.
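The training code in this notebook saves a checkpoint after every epoch but does not itself implement the early-stopping rule. Here is a minimal sketch that assumes a patience-based criterion on a per-epoch validation score. The function `early_stopping_epoch` and its `patience` parameter are hypothetical and not part of the original code. Selecting the epoch to keep could look like this:

```python
def early_stopping_epoch(val_scores, patience=3):
    """Hypothetical helper: return the 0-based index of the epoch to keep.

    val_scores: one validation score per epoch (higher is better).
    Training stops once `patience` consecutive epochs fail to beat the best score.
    """
    best_epoch, best_score = 0, float("-inf")
    epochs_without_improvement = 0
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_epoch, best_score = epoch, score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop training; restore the checkpoint saved at best_epoch
    return best_epoch


# Illustrative validation accuracies, not results from the paper.
print(early_stopping_epoch([0.90, 0.93, 0.95, 0.94, 0.94, 0.96], patience=2))  # → 2
```

With `patience=2`, training stops after two epochs without improvement. The later 0.96 is never reached, which is the usual trade-off of a small patience value.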

The following code is based on an example for convolutional neural networks (https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/02_Convolutional_Neural_Network.ipynb).

In [1]:
tensorflowFolder = "."
STORAGE_NAME = 'CNN-PalmTouch'

GPU_USE = '/gpu:0'
DATA_PATH = './Dataset/'

In [2]:
import tensorflow as tf
import numpy as np
import time
from datetime import timedelta
import math

from tensorflow.contrib.learn.python.learn.datasets import base
from tensorflow.python.framework import dtypes
import pandas as pd

from tensorflow.python.ops import variable_scope as vs
from tensorflow.python.ops import init_ops

from sklearn import preprocessing
from sklearn.metrics import recall_score, precision_score, accuracy_score

In [3]:
class DataSet(object):
    def __init__(self, images, labels, one_hot=False, dtype=dtypes.float32):
        """Construct a DataSet.
        `dtype` can be either `uint8` to leave the input as `[0, 255]`, or
        `float32` to rescale into `[0, 1]`. The `one_hot` argument is unused.
        """
        dtype = dtypes.as_dtype(dtype).base_dtype
        if dtype not in (dtypes.uint8, dtypes.float32):
            raise TypeError('Invalid image dtype %r, expected uint8 or float32' % dtype)
        assert images.shape[0] == labels.shape[0], ('images.shape: %s labels.shape: %s' % (images.shape, labels.shape))
        self._num_examples = images.shape[0]

        # Images keep their [num examples, rows, columns] shape here; the
        # training loop flattens each batch to [batch size, rows*columns].
        if dtype == dtypes.float32:
            # Convert from [0, 255] -> [0.0, 1.0].
            images = images.astype(np.float32)
            images = np.multiply(images, 1.0 / 255.0)
        self._images = images
        self._labels = labels
        self._epochs_completed = 0
        self._index_in_epoch = 0

    @property
    def images(self):
        return self._images

    @property
    def labels(self):
        return self._labels

    @property
    def num_examples(self):
        return self._num_examples

    @property
    def epochs_completed(self):
        return self._epochs_completed
    
    def reset(self):
        self._epochs_completed = 0
        self._index_in_epoch = 0

    def next_batch(self, batch_size):
        """Return the next `batch_size` examples from this data set."""
        start = self._index_in_epoch
        self._index_in_epoch += batch_size
        if self._index_in_epoch > self._num_examples:
            # Finished epoch
            self._epochs_completed += 1
            # Shuffle the data
            perm = np.arange(self._num_examples)
            np.random.shuffle(perm)
            self._images = self._images[perm]
            self._labels = self._labels[perm]
            # Start next epoch
            start = 0
            self._index_in_epoch = batch_size
            assert batch_size <= self._num_examples
        end = self._index_in_epoch
        return self._images[start:end], self._labels[start:end]
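This stripped-down, pure-Python sketch reproduces the index logic of `next_batch` with shuffling left out, to make its epoch bookkeeping concrete. `batch_index_ranges` is a hypothetical helper for illustration. When the examples left in an epoch cannot fill a batch, they are skipped, and the batch is taken from the start of the next epoch:

```python
def batch_index_ranges(num_examples, batch_size, num_batches):
    """Hypothetical helper mimicking DataSet.next_batch index bookkeeping (no shuffling).

    Returns one (epochs_completed, start, end) tuple per requested batch.
    """
    assert batch_size <= num_examples
    index_in_epoch, epochs_completed = 0, 0
    ranges = []
    for _ in range(num_batches):
        start = index_in_epoch
        index_in_epoch += batch_size
        if index_in_epoch > num_examples:
            # Leftover examples [start, num_examples) are dropped; a new epoch begins.
            epochs_completed += 1
            start = 0
            index_in_epoch = batch_size
        ranges.append((epochs_completed, start, index_in_epoch))
    return ranges


print(batch_index_ranges(num_examples=10, batch_size=4, num_batches=4))
# → [(0, 0, 4), (0, 4, 8), (1, 0, 4), (1, 4, 8)]
```

In the training and test loops, the call that increments `epochs_completed` has already returned the first batch of the next (reshuffled) epoch. That batch is still used before the loop breaks.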

In [4]:
def get_one_hot(labels):
    oh = np.zeros((labels.size, labels.max()+1))
    oh[np.arange(labels.size), labels] = 1
    return oh
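Below is a pure-Python equivalent of `get_one_hot` for checking the encoding by hand. The name `get_one_hot_py` is hypothetical. Each label becomes a row with 1.0 in the column of its class:

```python
def get_one_hot_py(labels):
    """Hypothetical pure-Python mirror of get_one_hot(): columns for classes 0..max(labels)."""
    num_classes = max(labels) + 1
    return [[1.0 if c == label else 0.0 for c in range(num_classes)] for label in labels]


print(get_one_hot_py([0, 1, 1, 0]))  # → [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0]]
```

The number of columns comes from `labels.max() + 1`. A label array without any palm blobs would therefore produce a single column and no longer match `num_classes = 2`.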

In [5]:
def read_data_sets(datapath,
                   one_hot=False,
                   dtype=dtypes.float32,
                   split=(80, 20)):
    # `one_hot` and `split` are unused: labels are always one-hot encoded and the
    # split is fixed by participant (P1-P18 for training, P19-P22 for testing).
    OH_TASKS = ["TAP", "DRAG", "SCROLL"]
    TH_TASKS = ["PINCH", "ROTATE"]
    PALM_TASK = ["PALM"]

    def load(participants, set_name):
        # Read the gzip-compressed pickles from disk and concatenate them
        # (DataFrame.append was removed in pandas 2.0; pd.concat works in all versions).
        frames = []
        for i in participants:
            print("Adding P%i to the %s set." % (i, set_name))
            frames.append(pd.read_pickle(datapath + "p" + str(i) + "_blobimages.pkl", compression="gzip"))
        return pd.concat(frames)

    def to_xy(df):
        # Finger blobs: one blob in the OH tasks or two blobs in the TH tasks.
        finger = df[((df.Blobcount == 1) & (df.Task.isin(OH_TASKS))) |
                    ((df.Blobcount == 2) & (df.Task.isin(TH_TASKS)))].BlobImages
        # Palm blobs: exactly one blob in the palm task.
        palm = df[(df.Blobcount == 1) & (df.Task.isin(PALM_TASK))].BlobImages
        x = []
        for blobs in finger:
            x.extend(blobs)
        num_finger_blobs = len(x)
        for blobs in palm:
            x.extend(blobs)
        num_palm_blobs = len(x) - num_finger_blobs
        # Labels: 0 = finger, 1 = palm (np.int was removed in NumPy 1.24, so use int).
        y = get_one_hot(np.append(np.zeros(num_finger_blobs, dtype=int),
                                  np.ones(num_palm_blobs, dtype=int)))
        return np.array(x), y, num_finger_blobs, num_palm_blobs

    def shuffle(x, y):
        # Apply one shared permutation so every image stays paired with its label.
        perm = np.random.permutation(len(x))
        return x[perm], y[perm]

    df_train = load(range(1, 19), "training")
    df_test = load(range(19, 23), "test")

    train_x, train_y, num_finger_blobs, num_palm_blobs = to_xy(df_train)
    print(num_finger_blobs, num_palm_blobs)
    test_x, test_y, _, _ = to_xy(df_test)

    print("Shuffling training set")
    train_x, train_y = shuffle(train_x, train_y)
    print("Shuffling test set")
    test_x, test_y = shuffle(test_x, test_y)

    print("Loading into DataSet objects ...")
    train = DataSet(train_x, train_y, dtype=dtype)
    # The test participants are also returned as the validation set.
    validation = DataSet(test_x, test_y, dtype=dtype)
    test = DataSet(test_x, test_y, dtype=dtype)
    print("Done!")
    return base.Datasets(train=train, validation=validation, test=test)
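`read_data_sets` shuffles images and labels, so both must receive the same permutation to keep each blob paired with its label. Here is a minimal stdlib sketch of aligned shuffling with a single shared permutation. The helper `shuffle_in_unison` and the example data are hypothetical:

```python
import random


def shuffle_in_unison(xs, ys, seed=None):
    """Hypothetical helper: shuffle two equally long sequences with one shared permutation."""
    assert len(xs) == len(ys)
    perm = list(range(len(xs)))
    random.Random(seed).shuffle(perm)
    return [xs[i] for i in perm], [ys[i] for i in perm]


images = ["blob_a", "blob_b", "blob_c", "blob_d"]
labels = [0, 0, 1, 1]
shuffled_images, shuffled_labels = shuffle_in_unison(images, labels, seed=42)
# Every image keeps its original label after shuffling.
print(all(labels[images.index(img)] == lab for img, lab in zip(shuffled_images, shuffled_labels)))  # → True
```

Drawing the permutation once avoids relying on two separate shuffle calls consuming the random number generator identically.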

In [6]:
%%time
data = read_data_sets(DATA_PATH)

Adding P1 to the training set.
Adding P2 to the training set.
Adding P3 to the training set.
Adding P4 to the training set.
Adding P5 to the training set.
Adding P6 to the training set.
Adding P7 to the training set.
Adding P8 to the training set.
Adding P9 to the training set.
Adding P10 to the training set.
Adding P11 to the training set.
Adding P12 to the training set.
Adding P13 to the training set.
Adding P14 to the training set.
Adding P15 to the training set.
Adding P16 to the training set.
Adding P17 to the training set.
Adding P18 to the training set.
Adding P19 to the test set.
Adding P20 to the test set.
Adding P21 to the test set.
Adding P22 to the test set.
336251 151500
Shuffling training set
Shuffling test set
Loading into DataSet objects ...
Done!
CPU times: user 32.3 s, sys: 4.61 s, total: 36.9 s
Wall time: 36.9 s


In [7]:
# specify image size
img_size_hight = data.test.images[0].shape[0]
img_size_width = data.test.images[0].shape[1]
img_size_flat = img_size_hight * img_size_width
img_shape = (img_size_hight, img_size_width)

# We have grayscale images
num_channels = 1

# Number of classes (0 = finger; 1 = palm)
num_classes = 2

data.test.cls = np.argmax(data.test.labels, axis=1)

In [22]:
tf.reset_default_graph()

## Hyperparameters and Helper Functions

These are the best-performing hyperparameters as reported in our paper.

In [23]:
# Convolution Layer 1 (7x7x16)
filter_size1 = 7  
num_filters1 = 16 

# Convolution Layer 2 (7x7x36)
filter_size2 = 7
num_filters2 = 36

# Neurons in the fully connected layers
fc_size1 = 350
fc_size2 = 350

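# Optimizer settings
initial_learning_step = 0.009  # initial learning rate
num_iterations = 1000          # number of training epochs; also used as decay_steps
decay_rate = 0.2               # the learning rate is multiplied by 0.2 every decay_steps steps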

# Batch size
train_batch_size = 50

# GPU Configurations (i.e., restricting the memory use to allow other models to be trained simultaneously)
config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
config.gpu_options.allow_growth=False
config.gpu_options.per_process_gpu_memory_fraction=0.05
config.gpu_options.allocator_type = 'BFC'

#### Helper functions to create new layers of weights and biases

In [24]:
weightsCounter = -1
def new_weights(shape):
    # Each call creates a uniquely named weight variable ("weights0", "weights1", ...)
    # initialized with the Xavier (Glorot) scheme.
    global weightsCounter
    weightsCounter = weightsCounter + 1
    return vs.get_variable("weights" + str(weightsCounter), shape, initializer=tf.contrib.layers.xavier_initializer())

biasesCounter = -1
def new_biases(length):
    # Each call creates a uniquely named bias vector ("bias0", "bias1", ...)
    # initialized to the constant 0.01.
    global biasesCounter
    biasesCounter = biasesCounter + 1
    return vs.get_variable("bias" + str(biasesCounter), [length], initializer=init_ops.constant_initializer(0.01))
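`new_weights` relies on `tf.contrib.layers.xavier_initializer`. By default (`uniform=True`), it samples from a uniform distribution U(-a, a) with a = sqrt(6 / (fan_in + fan_out)). The stdlib sketch below computes that bound for layer shapes used in this notebook. `xavier_uniform_limit` is a hypothetical helper. For convolution filters, both fans include the filter's receptive field:

```python
import math


def xavier_uniform_limit(shape):
    """Hypothetical helper: bound a of the Glorot/Xavier uniform distribution U(-a, a).

    Conv filters [h, w, in, out] multiply both fans by h*w; dense weights [in, out] do not.
    """
    receptive_field = 1
    for dim in shape[:-2]:
        receptive_field *= dim
    fan_in = shape[-2] * receptive_field
    fan_out = shape[-1] * receptive_field
    return math.sqrt(6.0 / (fan_in + fan_out))


print(xavier_uniform_limit([7, 7, 1, 16]))  # first convolution layer (7x7 filters, 1 -> 16 channels)
print(xavier_uniform_limit([350, 2]))       # output layer (350 -> 2 neurons)
```

Larger layers get a tighter bound, which keeps the variance of activations roughly constant from layer to layer.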

#### Helper functions to create new convolution layers

In [25]:
def new_conv_layer(input,              # The previous layer.
                   num_input_channels, # Num. channels in prev. layer.
                   filter_size,        # Width and height of each filter.
                   num_filters,        # Number of filters.
                   layer_name,
                   use_pooling=True):  # Use 2x2 max-pooling.

    global GPU_USE
    with tf.device(GPU_USE):
        # Shape of the filter-weights for the convolution.
        # This format is determined by the TensorFlow API.
        shape = [filter_size, filter_size, num_input_channels, num_filters]

        # Create new weights aka. filters with the given shape.
        weights = new_weights(shape=shape)

        # Create new biases, one for each filter.
        biases = new_biases(length=num_filters)

        # Create the TensorFlow operation for convolution.
        # Note the strides are set to 1 in all dimensions.
        # The first and last stride must always be 1,
        # because the first is for the image-number and
        # the last is for the input-channel.
        # But e.g. strides=[1, 2, 2, 1] would mean that the filter
        # is moved 2 pixels across the x- and y-axis of the image.
        # The padding is set to 'SAME' which means the input image
        # is padded with zeroes so the size of the output is the same.
        
        # strides = [batch, height, width, channels]
        layer = tf.nn.conv2d(input=input, filter=weights, strides=[1, 1, 1, 1], padding='SAME', name="conv2d_" + layer_name)

        # Add the biases to the results of the convolution.
        # A bias-value is added to each filter-channel.
        layer += biases

        # Use pooling to down-sample the image resolution?
        if use_pooling:
            # This is 2x2 max-pooling, which means that we
            # consider 2x2 windows and select the largest value
            # in each window. Then we move 2 pixels to the next window.
            layer = tf.nn.max_pool(value=layer, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME', name="max_pool_" + layer_name)

        # Rectified Linear Unit (ReLU).
        # It calculates max(x, 0) for each input pixel x.
        # This adds some non-linearity to the formula and allows us
        # to learn more complicated functions.
        layer = tf.nn.relu(layer, name="relu_" + layer_name)

        # Note that ReLU is normally executed before the pooling,
        # but since relu(max_pool(x)) == max_pool(relu(x)) we can
        # save 75% of the relu-operations by max-pooling first.

        # We return both the resulting layer and the filter-weights
        # because we will plot the weights later.
        return layer, weights

#### Helper functions to flatten layers

In [26]:
def flatten_layer(layer):
    # Get the shape of the input layer.
    layer_shape = layer.get_shape()

    # The shape of the input layer is assumed to be:
    # layer_shape == [num_images, img_height, img_width, num_channels]

    # The number of features is: img_height * img_width * num_channels
    # We can use a function from TensorFlow to calculate this.
    num_features = layer_shape[1:4].num_elements()
    
    # Reshape the layer to [num_images, num_features].
    # Note that we just set the size of the second dimension
    # to num_features and the size of the first dimension to -1
    # which means the size in that dimension is calculated
    # so the total size of the tensor is unchanged from the reshaping.
    layer_flat = tf.reshape(layer, [-1, num_features])

    # The shape of the flattened layer is now:
    # [num_images, img_height * img_width * num_channels]

    # Return both the flattened layer and the number of features.
    return layer_flat, num_features
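The number of features returned by `flatten_layer` follows from the layer settings. A stride-1 convolution with `'SAME'` padding keeps the spatial size. Each 2x2 max-pooling with stride 2 and `'SAME'` padding yields ceil(n / 2) per dimension. The sketch below shows that arithmetic. The helper name and the 15x15 input size are hypothetical; the notebook reads the real size from `data.test.images[0].shape`.

```python
import math


def flattened_features(height, width, num_filters, num_pooling_layers=2):
    """Hypothetical helper: features after stride-1 'SAME' convs, each followed by 2x2/stride-2 'SAME' pooling."""
    for _ in range(num_pooling_layers):
        # 'SAME' convolution keeps the size; 'SAME' pooling with stride 2 rounds up.
        height = math.ceil(height / 2)
        width = math.ceil(width / 2)
    return height * width * num_filters


# Hypothetical 15x15 blob image with the 36 filters of the second convolution layer.
print(flattened_features(15, 15, num_filters=36))  # → 576
```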

#### Helper functions to create a new fully connected layer

In [27]:
def new_fc_layer(input,          # The previous layer.
                 num_inputs,     # Num. inputs from prev. layer.
                 num_outputs,    # Num. outputs.
                 use_relu=True): # Apply the non-linearity (softplus, despite the name)?
    global GPU_USE
    with tf.device(GPU_USE):
        # Create new weights and biases.
        weights = new_weights(shape=[num_inputs, num_outputs])
        biases = new_biases(length=num_outputs)

        # Calculate the layer as the matrix multiplication of
        # the input and weights, and then add the bias-values.
        layer = tf.matmul(input, weights) + biases

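        # Apply the non-linearity. Despite the parameter name, this uses
        # softplus, log(1 + e^x), a smooth variant of ReLU.
        if use_relu:
            # Batch normalization was tried here but did not improve accuracy:
            #layer = tf.contrib.layers.batch_norm(layer, center=True, scale=True, is_training=_is_training)
            layer = tf.nn.softplus(layer)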

        return layer
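Although its parameter is named `use_relu`, `new_fc_layer` applies `tf.nn.softplus`, i.e. log(1 + e^x), a smooth approximation of ReLU. Below is a pure-Python comparison of the two activations. The function names are illustrative, and the softplus uses a numerically stable rewrite of the formula:

```python
import math


def relu(x):
    """ReLU: max(x, 0)."""
    return max(x, 0.0)


def softplus(x):
    """Softplus log(1 + e^x), computed as max(x, 0) + log1p(e^-|x|) to avoid overflow."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


for x in (-5.0, 0.0, 5.0):
    print(x, relu(x), round(softplus(x), 4))
```

Unlike ReLU, softplus has a non-zero gradient for negative inputs and is differentiable at 0. For large |x| the two functions nearly coincide.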

In [28]:
tf.reset_default_graph()
data.train.reset()
data.test.reset()

#### Placeholder variables

In [29]:
with tf.device(GPU_USE):
    x = tf.placeholder(tf.float32, shape=[None, img_size_flat], name="input_tensor")
    x_image = tf.reshape(x, [-1, img_size_hight, img_size_width, num_channels])
    y_true = tf.placeholder(tf.float32, shape=[None, num_classes], name='y_true')
    y_true_cls = tf.argmax(y_true, axis=1)

#### Using the helper functions to create all layers of the convolutional neural network

In [30]:
layer_conv1, weights_conv1 = new_conv_layer(input=x_image,
                   num_input_channels=num_channels,
                   filter_size=filter_size1,
                   num_filters=num_filters1,
                   layer_name="layer_conv1",
                   use_pooling=True)

layer_conv2, weights_conv2 = new_conv_layer(input=layer_conv1,
                   num_input_channels=num_filters1,
                   filter_size=filter_size2,
                   num_filters=num_filters2,
                   layer_name="layer_conv2",
                   use_pooling=True)

layer_flat, num_features = flatten_layer(layer_conv2)

layer_fc1 = new_fc_layer(input=layer_flat,
                         num_inputs=num_features,
                         num_outputs=fc_size1,
                         use_relu=True)

layer_fc2 = new_fc_layer(input=layer_fc1,
                         num_inputs=fc_size2,
                         num_outputs=num_classes,
                         use_relu=False)

In [31]:
with tf.device(GPU_USE):
    y_pred = tf.nn.softmax(layer_fc2, name="output_tensor")
    y_pred_cls = tf.argmax(y_pred, axis=1)

### Cost-function to be optimized

In [32]:
with tf.device(GPU_USE):
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=layer_fc2, labels=y_true)
    cost = tf.reduce_mean(cross_entropy)
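The paper reports L2 regularization with a factor of 0.01, but the cost above is the plain mean cross-entropy. The stdlib sketch below shows what a regularized cost could look like. It assumes the penalty follows the `tf.nn.l2_loss` convention (sum of squares divided by 2). The function names and example values are illustrative only. In the TensorFlow graph, the analogous change would add `0.01 * tf.nn.l2_loss(w)` for each weight tensor `w` to `cost`.

```python
import math


def softmax_cross_entropy(logits, true_class):
    """Cross-entropy of softmax(logits) against the true class, via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[true_class]


def regularized_cost(batch_logits, batch_labels, weights, l2_lambda=0.01):
    """Mean cross-entropy plus l2_lambda * sum(w^2) / 2 (assumed tf.nn.l2_loss form)."""
    data_loss = sum(softmax_cross_entropy(z, y) for z, y in zip(batch_logits, batch_labels)) / len(batch_labels)
    l2_penalty = l2_lambda * sum(w * w for w in weights) / 2.0
    return data_loss + l2_penalty


# Illustrative values: two examples (finger = 0, palm = 1) and a flat list of weights.
print(regularized_cost([[2.0, 0.0], [0.0, 0.0]], [0, 1], weights=[0.5, -0.5, 1.0]))
```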

### Optimization Method

In [33]:
with tf.device(GPU_USE):
    decay_steps = num_iterations
    _global_step = tf.Variable(0, dtype=tf.int32, name="global_step", trainable=False) 
    learning_rate = tf.train.exponential_decay(initial_learning_step, _global_step, decay_steps, decay_rate)    
    optimizer =  tf.train.AdagradOptimizer(learning_rate=learning_rate).minimize(cost)    
    
    correct_prediction = tf.equal(y_pred_cls, y_true_cls)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
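With the default `staircase=False`, `tf.train.exponential_decay` computes `initial_lr * decay_rate ** (global_step / decay_steps)`. Below is a stdlib sketch of that formula with this notebook's settings. The function is illustrative, not TensorFlow's implementation:

```python
def exponential_decay(initial_lr, global_step, decay_steps, decay_rate, staircase=False):
    """Illustrative decayed learning rate following the tf.train.exponential_decay formula."""
    exponent = global_step / decay_steps
    if staircase:
        # Staircase mode decays in discrete jumps every decay_steps steps.
        exponent = global_step // decay_steps
    return initial_lr * decay_rate ** exponent


# Settings from this notebook: initial rate 0.009, decay rate 0.2, 1000 decay steps.
for step in (0, 5, 500, 1000):
    print(step, exponential_decay(0.009, step, 1000, 0.2))
```

The training loop only sets `_global_step` to the epoch index after each epoch, and the `minimize()` call above does not pass `global_step`. The rate therefore decays per epoch rather than per batch. At step 5 it is still about 0.0089, and it reaches 0.0018 only at step 1000.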

## Training the CNN

In [34]:
init = tf.global_variables_initializer()

timestamp = int(time.time())
print("Starting session, TS = %i" % timestamp)
SAVE_PATH = tensorflowFolder + '/' + STORAGE_NAME + '/' + str(timestamp) + '/'
print("SAVE_PATH: %s" % SAVE_PATH)
train_writer = tf.summary.FileWriter(SAVE_PATH, graph=tf.get_default_graph())

# Create the op that sets the global step (and thereby the decayed learning rate)
# once, instead of adding a new assign op to the graph in every epoch.
epoch_input = tf.placeholder(tf.int32, shape=[], name="epoch_input")
op_assign_global_step = _global_step.assign(epoch_input)

saver = tf.train.Saver()
with tf.device(GPU_USE):
    with tf.Session(config=config) as sess:
        sess.run(init)

        for i in range(num_iterations):
            # Train for one pass over the training set. next_batch() increments
            # epochs_completed (and reshuffles) when it wraps around.
            epochs = data.train.epochs_completed
            while True:
                x_batch, y_true_batch = data.train.next_batch(train_batch_size)
                # Flatten each 2-D blob image to match the `x` placeholder.
                x_batch = x_batch.reshape([train_batch_size, img_size_flat])
                sess.run(optimizer, feed_dict={x: x_batch, y_true: y_true_batch})
                if epochs != data.train.epochs_completed:
                    break

            # Evaluate on one pass over the test set, batch by batch.
            epochs = data.test.epochs_completed
            accs, precisions, recalls = [], [], []
            while True:
                x_batch, y_true_batch = data.test.next_batch(train_batch_size)
                x_batch = x_batch.reshape([train_batch_size, img_size_flat])
                feed_dict_test = {x: x_batch, y_true: y_true_batch}
                acc, preds = sess.run([accuracy, y_pred], feed_dict=feed_dict_test)
                accs.append(acc)

                # Per-batch precision and recall for the palm class (label 1).
                labels = np.argmax(y_true_batch, 1)
                predictions = np.argmax(preds, 1)
                precisions.append(precision_score(labels, predictions))
                recalls.append(recall_score(labels, predictions))

                if epochs != data.test.epochs_completed:
                    break

            # Report the means over all test batches.
            msg = "Epoch: {0:>2}, Accuracy: {1:>4.2%}, Precision: {2:>4.2%}, Recall: {3:>4.2%}"
            print(msg.format(i + 1, np.mean(accs), np.mean(precisions), np.mean(recalls)))

            # Set the global step to the epoch index so the learning rate decays per epoch.
            sess.run(op_assign_global_step, feed_dict={epoch_input: i})

            # Save a checkpoint after every epoch.
            saver.save(sess, SAVE_PATH + "model", global_step=i, write_meta_graph=True)

Starting session, TS = 1523550576
SAVE_PATH: ./CNN-PalmTouch/1523550576/
Epoch:  1, Accuracy: 99.09%, Precision: 99.00%, Recall: 98.05%
Epoch:  2, Accuracy: 99.21%, Precision: 98.63%, Recall: 98.84%
Epoch:  3, Accuracy: 99.38%, Precision: 99.18%, Recall: 98.80%
Epoch:  4, Accuracy: 99.34%, Precision: 98.91%, Recall: 98.96%
Epoch:  5, Accuracy: 99.45%, Precision: 99.41%, Recall: 98.83%
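The precision and recall printed above are means of per-batch scores. For a batch without any palm predictions, `precision_score` returns 0 by default (and emits a warning). The batch mean can therefore differ from the precision computed over the whole test set at once. The stdlib sketch below contrasts the two aggregations on two illustrative batches. `precision_recall` is a hypothetical helper, and the batches are made up:

```python
def precision_recall(y_true, y_pred, positive=1):
    """Hypothetical helper: precision and recall for the positive (palm) class; 0.0 when undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Two illustrative test batches as (labels, predictions).
batches = [([1, 1, 1, 0], [1, 1, 1, 1]),   # 3 true positives, 1 false positive
           ([0, 0, 0, 0], [0, 0, 0, 0])]   # no palm blobs and no palm predictions

batch_mean = sum(precision_recall(t, p)[0] for t, p in batches) / len(batches)
all_true = [y for t, _ in batches for y in t]
all_pred = [y for _, p in batches for y in p]
pooled, _ = precision_recall(all_true, all_pred)
print(batch_mean, pooled)  # → 0.375 0.75
```

Collecting all labels and predictions first and scoring them once gives metrics that do not depend on how the test set is split into batches.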
