```python
#!/usr/bin/env python
# coding: utf-8 

#   This software component is licensed by ST under BSD 3-Clause license,
#   the "License"; You may not use this file except in compliance with the
#   License. You may obtain a copy of the License at:
#                        https://opensource.org/licenses/BSD-3-Clause
  

'''
Training script for a human activity recognition (HAR) system based on two different convolutional neural network (CNN) architectures.
'''
```

# Step-by-Step HAR Training for STM32Cube.AI
This notebook provides a step-by-step demonstration of a simple <u>H</u>uman <u>A</u>ctivity <u>R</u>ecognition (HAR) system based on convolutional neural networks (CNNs). Data preparation is handled by the `DataHelper` class, which lets the user preprocess, split and segment the dataset into the form used for training and validating the HAR CNN. The `ANNModelHandler` class builds, trains and validates a CNN for a given set of input and output tensors, and can create either of the two provided CNN architectures, **IGN** and **GMP**.

All the implementations are done in Python using [Keras](https://keras.io/) with [TensorFlow](https://www.tensorflow.org/) as the backend.

For demonstration purposes, this notebook uses two HAR datasets recorded with accelerometer sensors:

* WISDM, a public dataset provided by <u>WI</u>reless <u>S</u>ensing <u>D</u>ata <u>M</u>ining group. The details of the dataset are available [here](http://www.cis.fordham.edu/wisdm/dataset.php).

* AST, our own proprietary dataset.

**Note**: No dataset is included in the function pack. The WISDM dataset can be downloaded from [here](http://www.cis.fordham.edu/wisdm/dataset.php), while AST is a private dataset and is not provided.

The following figure shows the detailed HAR workflow.


<p align="center">
<img width="760" height="400" src="workflow.png">
</p>

Let us start the implementation now.

## Step 1: Import the necessary dependencies
The following section imports all the required dependencies. It also seeds the random number generators of NumPy and TensorFlow to make the results reproducible.

```python
import numpy as np
np.random.seed(611)

import argparse, os, logging, warnings
from os.path import isfile, join
from datetime import datetime

# private libraries
from PrepareDataset import DataHelper
from HARNN import ANNModelHandler

# for using callbacks to save the model during training and comparing the results at every epoch
from keras.callbacks import ModelCheckpoint

# disabling annoying warnings originating from Tensorflow
logging.getLogger('tensorflow').disabled = True

import tensorflow as tf
tf.compat.v1.set_random_seed(611)

# disabling annoying warnings originating from python
warnings.simplefilter("ignore")
```

## Step 2: Set user variables
The following section sets the user variables that are later used for:

* preparing the dataset,
* building the neural network,
* training the neural network,
* validating the neural network.

```python
# data variables
dataset = 'WISDM'
merge = True
segmentLength = 24   # length of each segment (window), in samples
stepSize = 24        # step between the starts of consecutive segments; equal to segmentLength means no overlap
dataDir = 'datasets/ai_logged_data'
preprocessing = True

# neural network variables
modelName = 'IGN'    # 'IGN' or 'GMP'

# training variables
trainTestSplit = 0.6
trainValidationSplit = 0.7
nEpochs = 20
learningRate = 0.0005
decay = 1e-6
batchSize = 64
verbosity = 1
nrSamplesPostValid = 2
```
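The variables above include `segmentLength` and `stepSize`. Assuming `DataHelper` cuts each recording with a sliding window whose stride is `stepSize` (its internals are not shown here), the effect of the two values can be sketched with NumPy. `segment_signal` is a hypothetical helper, not part of `PrepareDataset`; with a step equal to the segment length (24 and 24, as set above) consecutive windows do not overlap, while a smaller step makes them overlap and yields more segments from the same recording.

```python
import numpy as np

def segment_signal(signal, seg_length, step):
    """Hypothetical helper: cut a (T, 3) accelerometer recording into windows.

    Returns an array of shape (n_segments, seg_length, 3, 1), i.e. the same
    layout as the network inputs described in Step 5.
    """
    starts = range(0, len(signal) - seg_length + 1, step)
    segments = np.stack([signal[s:s + seg_length] for s in starts])
    return segments[..., np.newaxis]  # add a trailing channel axis

signal = np.random.randn(100, 3)  # 100 synthetic x, y, z samples

noOverlap = segment_signal(signal, seg_length=24, step=24)
halfOverlap = segment_signal(signal, seg_length=24, step=12)
print(noOverlap.shape)    # (4, 24, 3, 1)
print(halfOverlap.shape)  # (7, 24, 3, 1)
```

Labels are not handled here; a real pipeline also needs one activity label per segment.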

## Step 3: Create a result directory
Each run can use different settings (for example a different segment length or a different overlap), so the results of every run are saved separately to make them comparable. The following section creates a result directory for the current run, named in the format `Mmm_dd_yyyy_hh_mm_ss_dataset_model_seqLen_stepSize_epochs_results`. For example, `Oct_24_2019_14_31_20_WISDM_IGN_24_16_20_results` means that the run started on October 24, 2019 at 14:31:20 and used the WISDM dataset, the IGN model, a segment length of 24, a segment step of 16 and 20 epochs.

```python
# create the parent directory for results if it does not already exist
os.makedirs( './results/', exist_ok = True )
startTime = datetime.now()
# directory name: Mmm_dd_yyyy_hh_mm_ss_dataset_model_seqLen_stepSize_epochs_results
resultDirName = 'results/{}_{}_{}_{}_{}_{}_results/'.format( startTime.strftime( "%b_%d_%Y_%H_%M_%S" ), dataset, modelName,
                                                             segmentLength, stepSize, nEpochs )
os.mkdir( resultDirName )
infoString = 'runTime : {}\nDatabase : {}\nNetwork : {}\nSeqLength : {}\nStepSize : {}\nEpochs : {}\n'.format( startTime.strftime("%Y-%b-%d at %H:%M:%S"), dataset, modelName, segmentLength, stepSize, nEpochs )
with open( resultDirName + 'info.txt', 'w' ) as text_file:
    text_file.write( infoString )
```
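The naming format described above can be checked in isolation with a small sketch. `result_dir_name` is a hypothetical helper written for illustration; it assumes the default C locale, in which `%b` produces English month abbreviations such as `Oct`. Because the timestamp always occupies the first 20 characters, it can also be parsed back from a directory name.

```python
from datetime import datetime

def result_dir_name(start, dataset, model, seq_len, step_size, epochs):
    """Hypothetical helper: build a result-directory name in the format described above."""
    stamp = start.strftime("%b_%d_%Y_%H_%M_%S")  # e.g. Oct_24_2019_14_31_20
    return "{}_{}_{}_{}_{}_{}_results".format(stamp, dataset, model, seq_len, step_size, epochs)

name = result_dir_name(datetime(2019, 10, 24, 14, 31, 20), "WISDM", "IGN", 24, 16, 20)
print(name)  # Oct_24_2019_14_31_20_WISDM_IGN_24_16_20_results

# the first 20 characters hold the start time and can be parsed back
parsed = datetime.strptime(name[:20], "%b_%d_%Y_%H_%M_%S")
print(parsed)  # 2019-10-24 14:31:20
```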

## Step 4: Create a `DataHelper` object
The following section creates a `DataHelper` object that preprocesses, segments and splits the dataset and one-hot encodes the output labels, so that the data is ready for training and testing with the settings chosen in **Step 2**.

```python
myDataHelper = DataHelper( dataset = dataset, loggedDataDir = dataDir, merge = merge,
                            modelName = modelName, seqLength = segmentLength, seqStep = stepSize,
                            preprocessing = preprocessing, trainTestSplit = trainTestSplit,
                            trainValidSplit = trainValidationSplit, resultDir = resultDirName )
```
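One-hot encoding turns each class label into a vector with a single `1` at the index of its class, which is the form of the `Y` outputs consumed by the network. The following sketch shows the idea with NumPy; `one_hot` and the class names are hypothetical and for illustration only, not the encoding code or the class list used by `DataHelper`.

```python
import numpy as np

def one_hot(labels, classes):
    """Hypothetical helper: map each label to a row with a single 1 at its class index."""
    index = {name: i for i, name in enumerate(classes)}
    encoded = np.zeros((len(labels), len(classes)), dtype=np.float32)
    encoded[np.arange(len(labels)), [index[lab] for lab in labels]] = 1.0
    return encoded

classes = ['Jogging', 'Stationary', 'Walking']  # illustrative class names only
print(one_hot(['Walking', 'Jogging'], classes))
# [[0. 0. 1.]
#  [1. 0. 0.]]
```

Taking `np.argmax` of a row recovers the class index, which is how predictions are mapped back to labels.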

## Step 5: Prepare the dataset
The following section prepares the dataset and creates six arrays: `TrainX`, `TrainY`, `ValidationX`, `ValidationY`, `TestX` and `TestY`. The variables ending in `X` are the inputs, with shape `[ _, segmentLength, 3, 1 ]` (one column per accelerometer axis), and the variables ending in `Y` are the corresponding one-hot outputs, with shape `[ _, NrClasses ]`. `NrClasses` is `4` or `6` for `WISDM` and `5` for `AST`.

```python
TrainX, TrainY, ValidationX, ValidationY, TestX, TestY = myDataHelper.prepare_data()
```
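How `trainTestSplit` and `trainValidationSplit` are combined is not shown in this notebook. The sketch below assumes one plausible reading: `trainTestSplit` is the fraction of segments kept for training plus validation, and `trainValidationSplit` is the fraction of that part used for training. `split_dataset` is a hypothetical helper operating on dummy data, not the implementation inside `DataHelper.prepare_data()`.

```python
import numpy as np

def split_dataset(X, Y, train_test_split, train_valid_split, seed=611):
    """Hypothetical helper: shuffle, then split into train / validation / test parts.

    Assumes train_test_split is the fraction of segments kept for training plus
    validation and train_valid_split is the fraction of that part used for training.
    """
    order = np.random.RandomState(seed).permutation(len(X))
    X, Y = X[order], Y[order]
    n_train_valid = int(round(len(X) * train_test_split))
    n_train = int(round(n_train_valid * train_valid_split))
    return (X[:n_train], Y[:n_train],
            X[n_train:n_train_valid], Y[n_train:n_train_valid],
            X[n_train_valid:], Y[n_train_valid:])

X = np.zeros((1000, 24, 3, 1), dtype=np.float32)       # dummy segments
Y = np.eye(4, dtype=np.float32)[np.arange(1000) % 4]   # dummy one-hot labels, 4 classes
trainX, trainY, validX, validY, testX, testY = split_dataset(X, Y, 0.6, 0.7)
print(len(trainX), len(validX), len(testX))  # 420 180 400
```

Under this reading, the settings from Step 2 (0.6 and 0.7) give a 42 % / 18 % / 40 % split of the segments.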

## Step 6: Create an `ANNModelHandler` object
The following section creates an `ANNModelHandler` object to build, train and validate the <u>A</u>rtificial <u>N</u>eural <u>N</u>etwork (ANN) using the variables set in **Step 2**.

```python
myHarHandler = ANNModelHandler( modelName = modelName, classes = myDataHelper.classes, resultDir = resultDirName,
                              inputShape = TrainX.shape, outputShape = TrainY.shape, learningRate = learningRate,
                              decayRate = decay, nEpochs = nEpochs, batchSize = batchSize,
                              modelFileName = 'har_' + modelName, verbosity = verbosity )
```
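The handler receives both `learningRate` and `decayRate`. Assuming `decayRate` ends up as the `decay` argument of a legacy Keras 2.x optimizer (this notebook does not show how `ANNModelHandler` uses it), the learning rate is reduced after every batch update with the time-based rule lr / (1 + decay * iterations). The sketch below uses a hypothetical training-set size to show how small the effect of `decay = 1e-6` is over 20 epochs.

```python
import math

def decayed_lr(initial_lr, decay, iterations):
    """Time-based decay applied per batch update by legacy Keras optimizers."""
    return initial_lr / (1.0 + decay * iterations)

n_train = 10000    # hypothetical number of training segments
batch_size = 64
n_epochs = 20

steps_per_epoch = math.ceil(n_train / batch_size)  # 157 updates per epoch
final_lr = decayed_lr(0.0005, 1e-6, steps_per_epoch * n_epochs)
print(steps_per_epoch, final_lr)  # final_lr is only about 0.3 % below 0.0005
```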

## Step 7: Build the ANN model
The following section builds the ANN and prints its summary, showing the architecture and the number of parameters of each layer.

```python
harModel = myHarHandler.build_model()
harModel.summary()
```
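The parameter counts printed by `summary()` follow directly from the layer shapes. For a convolutional layer with bias, each filter has one weight per kernel element and input channel plus one bias; a dense layer has one weight per input-output pair plus one bias per output, while pooling and flatten layers have no parameters. The layer sizes below are hypothetical and do not describe the IGN or GMP architecture.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """(kernel_h * kernel_w * in_channels) weights plus one bias, for every filter."""
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(n_in, n_out):
    """One weight per input-output pair plus one bias per output."""
    return (n_in + 1) * n_out

# hypothetical layers, NOT the IGN or GMP architecture
print(conv2d_params(16, 3, 1, 24))  # 1176
print(dense_params(24, 4))          # 100
```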

## Step 8: Create a checkpoint for ANN training
The following section creates a checkpoint for the ANN training process that saves the network as an `h5` file. With `save_best_only = True` and `mode = 'max'`, the file is overwritten only when the validation accuracy (`val_acc`) improves, so it always holds the best model seen so far.

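```python
# the monitored key must match the metric name logged during training: 'val_acc' here,
# while newer Keras versions log 'val_accuracy' when the model is compiled with metrics=['accuracy']
harModelCheckPoint = ModelCheckpoint( filepath = join(resultDirName, 'har_' + modelName + '.h5'),
                                     monitor = 'val_acc', verbose = 0, save_best_only = True, mode = 'max' )
```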
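The save-best-only behaviour can be illustrated without Keras. The sketch below mirrors the comparison a checkpoint with `save_best_only = True` performs: it starts from minus infinity for `mode = 'max'` and saves only when the monitored value strictly improves. `checkpoint_epochs` is a hypothetical helper and the accuracies are made-up values.

```python
import math

def checkpoint_epochs(val_scores, mode='max'):
    """Return the 0-based epochs at which a save-best-only checkpoint writes the model."""
    if mode == 'max':
        best, improved = -math.inf, (lambda cur, ref: cur > ref)
    else:
        best, improved = math.inf, (lambda cur, ref: cur < ref)
    saved = []
    for epoch, score in enumerate(val_scores):
        if improved(score, best):
            best = score
            saved.append(epoch)
    return saved

valAcc = [0.71, 0.78, 0.76, 0.81, 0.80]  # made-up validation accuracies
print(checkpoint_epochs(valAcc))  # [0, 1, 3]
```

The file on disk therefore ends up holding the epoch-3 model, even though training continues for another epoch.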

## Step 9: Train the neural network
The following section trains the network on the training set, using the validation set and the checkpoint created above.

```python
harModel = myHarHandler.train_model( harModel, TrainX, TrainY, ValidationX, ValidationY, harModelCheckPoint )
```

## Step 10: Validate the trained neural network
The following section evaluates the trained network on the test set and creates a confusion matrix to give a detailed picture of the classification errors.

```python
myHarHandler.make_confusion_matrix( harModel, TestX, TestY )
```
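A confusion matrix counts, for every true class, how often each class was predicted. The sketch below builds one from one-hot labels and predicted class probabilities (in practice these would come from `harModel.predict(TestX)`), using the common convention of rows for true classes and columns for predicted classes. `confusion_matrix_from_onehot` and the data are hypothetical; `make_confusion_matrix` may lay out or normalise its matrix differently.

```python
import numpy as np

def confusion_matrix_from_onehot(y_true, y_prob):
    """Hypothetical helper: rows = true class, columns = predicted class."""
    n_classes = y_true.shape[1]
    true_idx = np.argmax(y_true, axis=1)
    pred_idx = np.argmax(y_prob, axis=1)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (true_idx, pred_idx), 1)  # one count per (true, predicted) pair
    return cm

y_true = np.eye(3)[[0, 0, 1, 2, 2]]          # made-up one-hot labels
y_prob = np.array([[0.8, 0.1, 0.1],
                   [0.3, 0.6, 0.1],
                   [0.2, 0.7, 0.1],
                   [0.1, 0.2, 0.7],
                   [0.5, 0.1, 0.4]])         # made-up network outputs
cm = confusion_matrix_from_onehot(y_true, y_prob)
print(cm)
print('accuracy:', np.trace(cm) / cm.sum())  # accuracy: 0.6
```

The diagonal holds the correct predictions, so the overall accuracy is the trace divided by the total count.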

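## Step 11: Create an npz file for post-conversion validation
The following section saves a subset of the test data (controlled by `nrSamplesPostValid`) together with its labels to an `.npz` file, which can be used to validate the model after it has been converted with STM32Cube.AI.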

```python
myDataHelper.dump_data_for_post_validation( TestX, TestY, nrSamplesPostValid )
```
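The exact layout written by `dump_data_for_post_validation` is not shown in this notebook. As a sketch, assuming the samples are stored with NumPy's `np.savez`, an `.npz` archive of input/output pairs can be written and read back as follows; `dump_samples` and the key names `x_test` / `y_test` are hypothetical and not necessarily the names used by the helper or expected by STM32Cube.AI.

```python
import os
import tempfile

import numpy as np

def dump_samples(path, x, y, n_samples):
    """Hypothetical helper: save the first n_samples input/output pairs to an .npz archive."""
    np.savez(path, x_test=x[:n_samples], y_test=y[:n_samples])

x = np.random.randn(10, 24, 3, 1).astype(np.float32)  # dummy test inputs
y = np.eye(4, dtype=np.float32)[np.arange(10) % 4]     # dummy one-hot labels
path = os.path.join(tempfile.mkdtemp(), 'har_validation.npz')
dump_samples(path, x, y, n_samples=2)

with np.load(path) as data:
    print(sorted(data.files))    # ['x_test', 'y_test']
    print(data['x_test'].shape)  # (2, 24, 3, 1)
```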