# Training a model
In this notebook, we attempt to train a model that generates an embedding from keystroke dynamics, taking inspiration from FaceNet's triplet-loss approach to embedding learning.


## Imports & Setup

In [41]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import random

import tensorflow as tf
from tensorflow.keras.layers import Activation, Lambda
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import TensorBoard, ReduceLROnPlateau
from dataprep import generate_pair_features
from keyprint.model import *

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

%load_ext autoreload
%autoreload 2
%matplotlib inline

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


In [42]:
DATA_DIR="/tf/data/preped/greyc_web/"
TF_LOG_DIR="/tf/logs"
!mkdir -p {TF_LOG_DIR}
ENCODING_DIM = 16

## Loading & Preparing the data
In this section, we load the prepared GREYC dataset and prepare it for machine learning.

### Load the dataset

In [43]:
# Load the metadata frame
meta_df = pd.read_feather(f"{DATA_DIR}/meta.feather")
userids = meta_df.userid

# load keystroke features
with open(f"{DATA_DIR}/keystroke.npz", "rb") as f:
    keystroke_features = np.load(f, allow_pickle=True)["keystroke"]
N_FEATURES = keystroke_features[0].shape[-1]

### Generate Pairs
Since our model takes two sets of keystroke features and compares them, we generate pairs of keystroke features and label each pair 1 if both come from the same user and 0 if they do not.

In [44]:
%%time
feature_pairs, labels = generate_pair_features(keystroke_features, meta_df)

CPU times: user 3.13 s, sys: 4.43 s, total: 7.56 s
Wall time: 14.5 s
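
`generate_pair_features` lives in the project's `dataprep` module and its implementation is not shown in this notebook. Purely as an illustration of the labelling idea, the sketch below builds pairs from a toy array. The helper name `make_pairs` and the exhaustive pairing strategy are assumptions, not the project's actual code, which may sample or balance pairs differently.

```python
from itertools import combinations

import numpy as np


def make_pairs(features, userids):
    """Hypothetical helper: pair every sample with every other sample.

    Returns an array of shape (n_pairs, 2, ...) and a label array where
    1 means both samples come from the same user and 0 means they do not.
    """
    pairs, labels = [], []
    for i, j in combinations(range(len(features)), 2):
        pairs.append(np.stack([features[i], features[j]]))
        labels.append(int(userids[i] == userids[j]))
    return np.stack(pairs), np.array(labels)


# Toy data: 4 samples, each a sequence of 3 keystrokes with 5 features.
toy_features = np.arange(4 * 3 * 5, dtype=float).reshape(4, 3, 5)
toy_userids = np.array([0, 0, 1, 1])

toy_pairs, toy_labels = make_pairs(toy_features, toy_userids)
print(toy_pairs.shape)  # (6, 2, 3, 5)
print(toy_labels)       # [1 0 0 0 0 1]
```

Exhaustive pairing produces far more negative pairs than positive ones as the number of users grows, so a real pipeline would likely subsample the negatives.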


### Split Train and Validation
Randomly split the pairs into training (80%) and validation (20%) subsets.

In [63]:
train_features, valid_features, train_labels, valid_labels = \
    train_test_split(feature_pairs, labels, shuffle=True, test_size=0.2)

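### Preprocessing the dataset
Since we are training a neural network, we standardise the features. The scaler is fit on the training subset only and then applied to both subsets, so no validation statistics leak into training.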

In [64]:
%%time
scaler = StandardScaler()

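# remember the original shapes so they can be restored after scaling
train_shape = train_features.shape
valid_shape = valid_features.shape

# StandardScaler expects 2D input, so flatten each pair into a single row;
# fit on the training split only
scaler.fit(train_features.reshape((train_shape[0], -1)))

train_features = scaler.transform(train_features.reshape((train_shape[0], -1))).reshape(train_shape)
valid_features = scaler.transform(valid_features.reshape((valid_shape[0], -1))).reshape(valid_shape)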

CPU times: user 284 ms, sys: 87.8 ms, total: 372 ms
Wall time: 369 ms
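
The flatten, scale, reshape pattern used above can be checked on toy data. In the sketch below, plain NumPy stands in for `StandardScaler` (both use the population standard deviation); the array sizes and variable names are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the pair arrays: (n_pairs, 2, timesteps, features).
toy_train = rng.normal(loc=5.0, scale=2.0, size=(100, 2, 4, 3))
toy_valid = rng.normal(loc=5.0, scale=2.0, size=(20, 2, 4, 3))

# Flatten each pair into one row so statistics are computed per column.
flat_train = toy_train.reshape(len(toy_train), -1)
flat_valid = toy_valid.reshape(len(toy_valid), -1)

# Fit on the training rows only, then reuse the same statistics everywhere.
mean = flat_train.mean(axis=0)
std = flat_train.std(axis=0)

scaled_train = ((flat_train - mean) / std).reshape(toy_train.shape)
scaled_valid = ((flat_valid - mean) / std).reshape(toy_valid.shape)

print(scaled_train.shape, scaled_valid.shape)
```

One consequence of flattening this way is that every (pair slot, timestep, feature) position gets its own mean and standard deviation, rather than one statistic per feature shared across timesteps.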


### Unpacking the dataset

In [65]:
# split each pair into its reference sample and the sample evaluated against it
train_refs, train_evals = train_features[:, 0], train_features[:, 1]
valid_refs, valid_evals = valid_features[:, 0], valid_features[:, 1]

## Building the Model
The model being built is a Siamese network made up of three parts:
- an encoder that transforms the keystroke features into an embedding
- an evaluator model that computes the distance between two embeddings
- an objective (defined by the loss function) that ensures the embeddings produced can be used to identify the user

### Encoder
The encoder is a simple 1D CNN:

In [66]:
encoder = build_encoder(
              n_input_dim=N_FEATURES,
              n_encoding_dim=ENCODING_DIM,
              n_conv_block=3,
              n_conv_layers=[1, 1, 1],
              n_conv_filters=[8, 16, 32],
              conv_filter_size=[7, 3, 3],
              n_dense_layers=2,
              n_dense_units=32,
              activation=(lambda: Activation("selu")),
              batch_norm=False,
              l2_lambda=0,
              dropout_prob=0)
encoder.summary()

Model: "model_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
input_10 (InputLayer)        [(None, 64, 5)]           0         
_________________________________________________________________
conv1d_9 (Conv1D)            (None, 64, 8)             288       
_________________________________________________________________
activation_15 (Activation)   (None, 64, 8)             0         
_________________________________________________________________
max_pooling1d_9 (MaxPooling1 (None, 32, 8)             0         
_________________________________________________________________
conv1d_10 (Conv1D)           (None, 32, 16)            400       
_________________________________________________________________
activation_16 (Activation)   (None, 32, 16)            0         
_________________________________________________________________
max_pooling1d_10 (MaxPooling (None, 16, 16)            0   
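
The parameter counts in the summary above can be reproduced by hand, which is a quick way to confirm that the kernel sizes and filter counts passed to `build_encoder` were applied as intended. The helper `conv1d_params` below is introduced only for this check and assumes a standard 1D convolution with one bias per filter.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Weights plus one bias per filter for a standard 1D convolution."""
    return kernel_size * in_channels * filters + filters


# First block: kernel 7 over the 5 input features, 8 filters.
print(conv1d_params(7, 5, 8))   # 288
# Second block: kernel 3 over the 8 channels from block one, 16 filters.
print(conv1d_params(3, 8, 16))  # 400
```

Both values match the `Param #` column for `conv1d_9` and `conv1d_10`. The sequence length only changes at the pooling layers (64 to 32 to 16), which suggests the convolutions preserve length through padding.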

### Evaluator model
The evaluator uses the encoder to compute the distance between predicted embeddings

In [67]:
model = build(N_FEATURES, encoder)
model.summary()

Model: "model_7"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
input_11 (InputLayer)           [(None, None, 5)]    0                                            
__________________________________________________________________________________________________
input_12 (InputLayer)           [(None, None, 5)]    0                                            
__________________________________________________________________________________________________
model_6 (Model)                 (None, 16)           4896        input_11[0][0]                   
                                                                 input_12[0][0]                   
__________________________________________________________________________________________________
lambda_3 (Lambda)               ()                   0           model_6[1][0]              
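
In the summary above, the encoder (`model_6`) is connected to both inputs, so the same weights embed both samples of a pair, and the `Lambda` layer reduces the two embeddings to a distance. The code of `build` is not shown here; the NumPy sketch below assumes the distance is Euclidean, which is a common choice for Siamese networks but is not confirmed by the notebook.

```python
import numpy as np


def euclidean_distance(emb_a, emb_b, eps=1e-7):
    """Row-wise Euclidean distance between two batches of embeddings."""
    squared = np.sum((emb_a - emb_b) ** 2, axis=-1)
    # eps keeps the square root well behaved when the embeddings coincide.
    return np.sqrt(np.maximum(squared, eps))


ref = np.array([[0.0, 0.0], [1.0, 1.0]])
other = np.array([[3.0, 4.0], [1.0, 1.0]])
print(euclidean_distance(ref, other))
```

The first pair is 5 units apart, while the second pair is identical and yields a distance close to zero.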

### Objective & Loss Function
The objective of this model is to push the embeddings of different users apart (up to a margin), while pulling the embeddings of the same user as close together as possible.

As such, the model uses **contrastive loss** as its loss function:
![Loss Function](https://cdn-images-1.medium.com/max/1600/1*Uo5IovRsjW86b-vCBZGRvg.jpeg)
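
For readers who cannot load the image, here is a minimal NumPy sketch of one common form of contrastive loss, using this notebook's label convention (1 for the same user, 0 for different users). The name `contrastive_loss_np`, the default margin of 1.0 and the absence of a 1/2 scaling factor are assumptions; the `contrastive_loss` imported from `keyprint.model` may differ in those details.

```python
import numpy as np


def contrastive_loss_np(labels, distances, margin=1.0):
    """Contrastive loss with label 1 = same user, 0 = different users."""
    labels = np.asarray(labels, dtype=float)
    distances = np.asarray(distances, dtype=float)
    # Same-user pairs are penalised by their squared distance.
    same_term = labels * distances ** 2
    # Different-user pairs are penalised only while closer than the margin.
    diff_term = (1.0 - labels) * np.maximum(margin - distances, 0.0) ** 2
    return float(np.mean(same_term + diff_term))


toy_labels = [1, 0, 0]
toy_distances = [0.5, 0.2, 1.5]
print(contrastive_loss_np(toy_labels, toy_distances))
```

In the toy batch, the same-user pair contributes 0.25, the close different-user pair contributes 0.64, and the different-user pair beyond the margin contributes nothing.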

## Training the model
With setup complete, we can finally train the model and tune its hyperparameters.

In [68]:
# compile the model
optimizer = Adam(learning_rate=1e-3)
model.compile(loss=contrastive_loss,
              optimizer=optimizer,
              # assumption: `accuracy` is a metric function exported by keyprint.model
              metrics=[accuracy])

# train the model
model.fit([train_refs, train_evals], train_labels,
          batch_size=128,
          epochs=10,
          validation_data=([valid_refs, valid_evals], valid_labels),
          callbacks=[TensorBoard(TF_LOG_DIR),
                     ReduceLROnPlateau(factor=0.1, 
                                       patience=10,
                                       verbose=1,
                                       min_lr=1e-9)])

Train on 24170 samples, validate on 6043 samples
Epoch 1/10
Epoch 2/10
Epoch 3/10
Epoch 4/10
Epoch 5/10
Epoch 6/10
Epoch 7/10
Epoch 8/10
Epoch 9/10
Epoch 10/10


<tensorflow.python.keras.callbacks.History at 0x7fc942541080>
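
One thing worth checking when tuning is how `patience` interacts with `epochs`. The simulation below is a simplified sketch of a reduce-on-plateau rule (it ignores the callback's `min_delta` and `cooldown` options, and `simulate_plateau` is a name introduced here). Under that simplification, it suggests that with `patience=10` and only 10 epochs the learning rate is never reduced, even if the validation loss stalls right after the first epoch.

```python
def simulate_plateau(val_losses, lr=1e-3, factor=0.1, patience=10, min_lr=1e-9):
    """Simplified plateau rule: cut lr after `patience` epochs without improvement."""
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Worst case for a 10-epoch run: the loss improves once, then stalls.
stalled = [0.5] + [0.6] * 9
print(simulate_plateau(stalled, patience=10)[-1])  # 0.001
print(simulate_plateau(stalled, patience=3)[-1])
```

With `patience=3` the same stalled run cuts the learning rate three times, so either a smaller `patience` or a longer run is needed for the callback to have any effect.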