<a href="https://colab.research.google.com/github/gcosma/DeepLearningTutorials/blob/master/SimpleSequentialModelmMHEALTH.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>


**Tutorial: HOW TO Create a Deep Sequential Model in Keras for mHealth data classification in 5 Steps** 

by Dr Georgina Cosma


Learning outcomes:
* Load Data.
* Apply z-score data normalisation.
* Define a Sequential model for detecting human activity in multi-sensor data.
* Compile the Sequential Model.
* Train and test a model using k-fold cross-validation.


**Not using Colab?** If you are not using Colab, you will need to set up a Python environment for
machine learning and deep learning, for example with Anaconda. You must have Python
installed and configured (Python 3 is recommended, as current Keras releases no longer support Python 2).
You must also install SciPy, NumPy, scikit-learn and Keras.

**Using Colab:** You may run into difficulties when mounting Google Drive,
but the code and explanation here will help you overcome them.

I have included the datasets in the GitHub repository. You can also download the data from: https://archive.ics.uci.edu/ml/datasets/MHEALTH+Dataset

**About the mHealth multi-sensor dataset:** The mHealth (Mobile Health) dataset is a benchmark dataset for human behaviour analysis based on multi-modal body sensing. The mHealth dataset comprises body motion and vital signs recordings for ten volunteers of diverse profiles while performing 12 physical activities: Standing still (1 min), Sitting and relaxing (1 min), Lying down (1 min), Walking (1 min), Climbing stairs (1 min), Waist bends forward (20x), Frontal elevation of arms (20x), Knees bending (crouching) (20x), Cycling (1 min), Jogging (1 min), Running (1 min), and Jump front & back (20x). Sensors on each subject's chest, right wrist and left ankle were used to measure the motion experienced by diverse body parts, namely, acceleration, rate of turn and magnetic field orientation. All sensing modalities are recorded at a sampling rate of 50 Hz, which is considered sufficient for capturing human activity. This dataset has been found to generalise to common activities of daily living, due to the diversity of body parts involved in each activity (e.g., frontal elevation of arms vs. knees bending), the intensity of the actions (e.g., cycling vs. sitting and relaxing) and their execution speed or dynamicity (e.g., running vs. standing still). For more information see: https://archive.ics.uci.edu/ml/datasets/MHEALTH+Dataset




**Step 1: Import libraries** 

In [0]:
from keras.models import Sequential
from keras.utils import to_categorical
from keras.layers import Dense, Dropout, Activation
import numpy as np
from sklearn.model_selection import StratifiedKFold
from scipy import stats
#Below several optimisers are imported. This tutorial is using the Adam optimiser
#but you can easily replace Adam with another optimiser from the list. 
from keras.optimizers import SGD, Adam, Adadelta, RMSprop, Adagrad, Adamax, Nadam

**Step 2: Mount to Google Drive in order to access your data file**

In [21]:
from google.colab import drive
drive.mount('/content/drive')
#!ls "/content/drive/My Drive/Colab Notebooks"

#if you need to remount
#drive.mount("/content/drive", force_remount=True)

#If you want to unmount and reset then: 
#Step 1: From the menu select Runtime--->Reset all Runtimes... 
#Step 2: Runtime--->Run all or you can run each Cell at a time. There will be a message 
# "Go to a URL in a browser" and you must click on that and copy and paste the authorisation code 
# from the page into the authorisation code text box.


Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


**Step 3: Choose your dataset.**

In [22]:
#All datasets are available in Github. I added the datasets into a data folder where all data are kept separate from the code. 
# To load the dataset you must uncomment the dataset you want to load. One dataset contains the data of a single person. 
#dataset1 = np.loadtxt("/content/drive/My Drive/Colab Notebooks/data/MHEALTHDATASET/mHealth_subject1txt.csv", delimiter=",")
#dataset1 = np.loadtxt("/content/drive/My Drive/Colab Notebooks/data/MHEALTHDATASET/mHealth_subject2txt.csv", delimiter=",")
#dataset1 = np.loadtxt("/content/drive/My Drive/Colab Notebooks/data/MHEALTHDATASET/mHealth_subject3txt.csv", delimiter=",")
dataset1 = np.loadtxt("/content/drive/My Drive/Colab Notebooks/data/MHEALTHDATASET/mHealth_subject4txt.csv", delimiter=",")
#dataset1 = np.loadtxt("/content/drive/My Drive/Colab Notebooks/data/MHEALTHDATASET/mHealth_subject5txt.csv", delimiter=",")
#dataset1 = np.loadtxt("/content/drive/My Drive/Colab Notebooks/data/MHEALTHDATASET/mHealth_subject6txt.csv", delimiter=",")
#dataset1 = np.loadtxt("/content/drive/My Drive/Colab Notebooks/data/MHEALTHDATASET/mHealth_subject7txt.csv", delimiter=",")
#dataset1 = np.loadtxt("/content/drive/My Drive/Colab Notebooks/data/MHEALTHDATASET/mHealth_subject8txt.csv", delimiter=",")
#dataset1 = np.loadtxt("/content/drive/My Drive/Colab Notebooks/data/MHEALTHDATASET/mHealth_subject9txt.csv", delimiter=",")
#dataset1 = np.loadtxt("/content/drive/My Drive/Colab Notebooks/data/MHEALTHDATASET/mHealth_subject10txt.csv", delimiter=",")
print(dataset1)

[[-2.123     2.1088   -8.9576   ... -3.3198   -3.5417    0.      ]
 [-1.8362    1.9103   -9.0281   ... -6.3904   -3.5403    0.      ]
 [-1.9329    1.6969   -8.9946   ... -6.0134   -3.2045    0.      ]
 ...
 [-9.0266   -0.18408   4.3308   ... -0.21181  -1.7778    0.      ]
 [-8.9453   -0.2106    4.0337   ... -0.56407  -1.4277    0.      ]
 [-8.9806    0.040296  4.1615   ...  1.0924    0.7185    0.      ]]
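Rather than commenting and uncommenting lines, the file path can be built from a subject number. The following is a small sketch only: `DATA_DIR` and `subject_path` are names introduced here for illustration, and the file naming pattern is copied from the paths in the cell above. The file is loaded only if it is found, so the snippet also runs when Drive is not mounted.

```python
import os
import numpy as np

# Folder used in the cell above; adjust it if your data lives elsewhere.
DATA_DIR = "/content/drive/My Drive/Colab Notebooks/data/MHEALTHDATASET"

def subject_path(subject, data_dir=DATA_DIR):
    """Return the CSV path for one subject (1-10), following the naming used above."""
    if not 1 <= subject <= 10:
        raise ValueError("subject must be between 1 and 10")
    return "%s/mHealth_subject%dtxt.csv" % (data_dir, subject)

subject = 4  # change this number to load another person's recordings
path = subject_path(subject)
print(path)

# Load only if the file is actually present (for example after mounting Drive).
if os.path.exists(path):
    dataset1 = np.loadtxt(path, delimiter=",")
    print(dataset1.shape)
```

Changing `subject` is then the only edit needed to switch between the ten files.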


**Step 4:**
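The dataset comprises 23 inputs and a vector of categorical target values (0-12). Target values 1-12 correspond to the 12 physical activities described in the first part of this tutorial; the value 0 marks samples recorded outside those activities (the null class), which is why the code below works with 13 classes.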

In [28]:
#Input data and labels need to be separated before training the model.
X = dataset1[:,0:23]        
labels = dataset1[:,23]

#Here zscore is applied to normalise the values of each column. 
# Z-scores are linearly transformed data values having a mean of zero and a standard deviation of 1. 
X=stats.zscore(X)

#Convert the single vector of 13 classes to a matrix of 1s and 0s. 
#This is called one-hot-encoding. With one-hot encoding the integer encoded variable is removed and a new binary 
# variable is added for each unique integer value. For a quick explanation see: https://machinelearningmastery.com/why-one-hot-encode-data-in-machine-learning/
Yenc = to_categorical(labels) 
print(Yenc)

#Check the size of matrix Yenc. The number of rows should be equal to the number of samples,
#and the number of columns should be equal to the number of classes.
#In this example classes run from 0 to 12. Therefore we have 13 classes.
Yenc.shape

[[1. 0. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]
 ...
 [1. 0. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]
 [1. 0. 0. ... 0. 0. 0.]]


(116736, 13)
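To see what the z-score step does, the same transformation can be written by hand in NumPy. This is a minimal sketch on a small made-up matrix (`X_demo` is an illustrative name, not part of the tutorial data), assuming the defaults of `stats.zscore`: column-wise normalisation using the population standard deviation.

```python
import numpy as np

# Small made-up matrix: 4 samples (rows) and 2 features (columns).
X_demo = np.array([[1.0, 10.0],
                   [2.0, 20.0],
                   [3.0, 30.0],
                   [4.0, 40.0]])

# Subtract each column's mean and divide by each column's standard deviation.
col_mean = X_demo.mean(axis=0)
col_std = X_demo.std(axis=0)  # population standard deviation (ddof=0)
X_demo_z = (X_demo - col_mean) / col_std

print(X_demo_z.mean(axis=0))  # each column mean is (numerically) zero
print(X_demo_z.std(axis=0))   # each column standard deviation is one
```

Both columns end up identical after normalisation even though their original scales differ by a factor of ten, which is the point of the step. Note that a column with zero variance would lead to a division by zero and produce NaN values.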

**Step 5: Create the Sequential model**
A Sequential model is a linear stack of layers.
Code for the k-fold cross validation was adapted from https://machinelearningmastery.com/evaluate-performance-deep-learning-models-keras/
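To make "a linear stack of layers" concrete, here is a rough NumPy sketch of a forward pass through layers of the same sizes as the model in this step (23 inputs, three hidden layers of 256 units, 13 outputs). It is not how Keras is implemented: the weights are random, dropout is left out, and the names `forward`, `weights` and `biases` are introduced here purely for illustration.

```python
import numpy as np

def relu(z):
    # Rectified linear unit: negative values become zero.
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row maximum for numerical stability, then normalise.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
layer_sizes = [23, 256, 256, 256, 13]
weights = [rng.normal(scale=0.05, size=(m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

def forward(x):
    """Pass x through each layer in order; each output feeds the next layer."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)
    # The last layer uses softmax so that each row is a probability distribution.
    return softmax(a @ weights[-1] + biases[-1])

batch = rng.normal(size=(4, 23))  # four made-up samples with 23 features
probs = forward(batch)
print(probs.shape)
print(probs.sum(axis=1))
```

Each row of `probs` sums to one, so the largest entry in a row can be read as the predicted class.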

fit(X,Y) trains the model with the given inputs X (and corresponding training labels Y). Adjust the number of epochs to avoid overtraining (overfitting) your network.

evaluate(X,Y) evaluates the already trained model using the test data. It returns the loss followed by the metrics chosen when compiling the model (here, accuracy); the code below keeps the accuracy of each fold.
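Before running the full experiment, it can help to see what `StratifiedKFold` produces on a tiny example. The sketch below uses a made-up, imbalanced toy dataset (`X_toy`, `y_toy` and `fold_summaries` are illustrative names) and prints the size and class counts of each test fold.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data: 20 samples, 2 features, two classes with a 3:1 imbalance.
X_toy = np.arange(40, dtype=float).reshape(20, 2)
y_toy = np.array([0] * 15 + [1] * 5)

kfold_toy = StratifiedKFold(n_splits=5)
fold_summaries = []
for train_idx, test_idx in kfold_toy.split(X_toy, y_toy):
    # Count how many samples of each class ended up in this test fold.
    test_counts = np.bincount(y_toy[test_idx], minlength=2).tolist()
    fold_summaries.append((len(train_idx), len(test_idx), test_counts))
    print("train=%d test=%d test class counts=%s"
          % (len(train_idx), len(test_idx), test_counts))
```

Every test fold keeps the 3:1 class ratio of the full toy dataset, and each sample appears in exactly one test fold. The same indices are used in this tutorial to slice `X` and `Yenc` into training and test portions.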



In [24]:
# Define the k-fold cross-validation test. In this application k=10.
# 10-fold cross-validation partitions the data into 10 different training and testing splits.
# Applied to large datasets, k-fold can make training slow because the model is trained k times.
# However, one of the aims of this tutorial is to show how to use k-fold,
# which gives a more thorough evaluation of the model than a single train/test split.
# For a nice introduction to k-fold cross validation see: https://machinelearningmastery.com/k-fold-cross-validation/
kfold = StratifiedKFold(n_splits=10, random_state=None, shuffle=False)
cvscores = []
#get number of columns in training data
n_cols = X.shape[1]
#Split the dataset into folds of train and test data.
for train, test in kfold.split(X, labels):
  model = Sequential()
  # Dense(256) is a fully-connected layer with 256 hidden units.
  # in the first layer, you must specify the expected input data shape:
  # here, n_cols=23-dimensional vectors. n_cols = X.shape[1] gave 
  # the number of columns in the data. 
  # For more information on the options for constructing a Sequential model see
  # https://keras.io/getting-started/sequential-model-guide/ 
  model.add(Dense(256, activation='relu', input_shape=(n_cols,)))
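  # Note: the Dense layer above already applies ReLU, so this separate Activation layer
  # (and the second one further down) does not change the output.
  model.add(Activation('relu'))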
  #model.add(Dropout(0.5))
  model.add(Dense(256, activation='relu'))
  model.add(Activation('relu'))
  model.add(Dropout(0.5))
  model.add(Dense(256,activation='relu'))
  #13 is the number of classes. Softmax is used for multi-class classification.
  model.add(Dense(13, activation='softmax'))

  #####################################################################################
  # Set up the optimisers - you can try them out one at a time:
  # simply replace optimizer='adam' with one of these objects (e.g. optimizer=sgd) when you compile the model.
  sgd = SGD(lr=0.1, decay=1e-6, momentum=0.9, nesterov=True)
  adam=Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
  adadelt=Adadelta(lr=1.0, rho=0.95, epsilon=None, decay=0.0)
  rmsprop=RMSprop(lr=0.001, rho=0.9, epsilon=None, decay=0.0)
  adagrad=Adagrad(lr=0.01, epsilon=None, decay=0.0)
  adamax=Adamax(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0)
  nadam=Nadam(lr=0.002, beta_1=0.9, beta_2=0.999, epsilon=None, schedule_decay=0.004)
  #####################################################################################

  ############### Compile the model. To change the optimiser, replace the string 'adam' with one of the objects defined above (e.g. optimizer=sgd).
  model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])

  ############# Fit the model (training)
  model.fit(X[train], Yenc[train], epochs=20, batch_size=50, verbose=0)

  ###################### Test the model using the test data
  score = model.evaluate(X[test], Yenc[test], batch_size=50, verbose=0)
  print("\n%s: %.2f%%" % (model.metrics_names[1], score[1]*100))
  #Displays the accuracy of each fold when applied to the test data.
  cvscores.append(score[1] * 100)
  # The final print below is outside the loop (not indented) so that the mean and
  # standard deviation are reported once, after all folds have finished.
print("%.2f%% (+/- %.2f%%)" % (np.mean(cvscores), np.std(cvscores)))


acc: 81.93%

acc: 96.56%

acc: 96.07%

acc: 96.12%

acc: 75.10%

acc: 77.35%

acc: 66.15%

acc: 79.71%

acc: 81.99%

acc: 83.99%
83.50% (+/- 9.57%)
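The summary line can be reproduced from the per-fold accuracies printed above. This short sketch (with `fold_acc` as an illustrative name) also shows that `np.std` uses the population standard deviation by default; passing `ddof=1` gives the sample standard deviation, which is slightly larger.

```python
import numpy as np

# Per-fold accuracies as printed above (rounded to two decimals).
fold_acc = np.array([81.93, 96.56, 96.07, 96.12, 75.10,
                     77.35, 66.15, 79.71, 81.99, 83.99])

mean_acc = fold_acc.mean()
std_pop = fold_acc.std()           # default: population standard deviation (ddof=0)
std_sample = fold_acc.std(ddof=1)  # sample standard deviation

print("%.2f%% (+/- %.2f%%)" % (mean_acc, std_pop))  # 83.50% (+/- 9.57%)
print("sample std: %.2f%%" % std_sample)
```

The spread between folds (from about 66% to about 97%) is large, so the mean on its own gives an incomplete picture of the model's performance.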


In [29]:
# Print the structure of the Sequential model
print(model.summary())

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_117 (Dense)            (None, 256)               6144      
_________________________________________________________________
activation_59 (Activation)   (None, 256)               0         
_________________________________________________________________
dense_118 (Dense)            (None, 256)               65792     
_________________________________________________________________
activation_60 (Activation)   (None, 256)               0         
_________________________________________________________________
dropout_30 (Dropout)         (None, 256)               0         
_________________________________________________________________
dense_119 (Dense)            (None, 256)               65792     
_________________________________________________________________
dense_120 (Dense)            (None, 13)                3341      
Total para
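The `Param #` column can be checked by hand: a Dense layer has one weight per input-unit pair plus one bias per unit. The sketch below uses the layer sizes from this tutorial; `dense_params` is a helper name introduced here for illustration.

```python
def dense_params(n_inputs, n_units):
    """Number of trainable parameters: weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

# (inputs, units) for each Dense layer of the model above.
layer_io = [(23, 256), (256, 256), (256, 256), (256, 13)]
counts = [dense_params(n_in, n_out) for n_in, n_out in layer_io]

print(counts)       # [6144, 65792, 65792, 3341]
print(sum(counts))  # sum of the per-layer counts listed in the summary
```

Activation and Dropout layers have no trainable parameters, which is why they show 0 in the summary.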

**Step 6: Plot the model (requires pydot and the GraphViz library, http://www.graphviz.org/)**



In [0]:
!pip install -q pydot
from keras.utils import plot_model
#Saves a diagram of the model to an image file in the given directory
plot_model(model, to_file='/content/drive/My Drive/Colab Notebooks/model_plot3.png', show_shapes=True, show_layer_names=True)

**Step 7: Bonus code.**

In [27]:
#Decode the one-hot encoded matrix back to a list of integer class labels
res= [np.where(r==1)[0][0] for r in Yenc]
print(res)
max(res)

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...]

12
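For large label matrices, `np.argmax` along axis 1 is a vectorised alternative to the list comprehension above. The sketch below shows the round trip on a small made-up label vector; `labels_demo` and `one_hot` are illustrative names, and the manual encoding is assumed to match what `to_categorical` produces for integer labels.

```python
import numpy as np

labels_demo = np.array([0, 3, 12, 1])  # made-up integer class labels
n_classes = 13

# One-hot encode: one row per sample, with a single 1 in the column of its class.
one_hot = np.zeros((labels_demo.size, n_classes))
one_hot[np.arange(labels_demo.size), labels_demo] = 1.0

# Decode: the position of the largest value in each row is the class label.
decoded = np.argmax(one_hot, axis=1)
print(decoded.tolist())  # [0, 3, 12, 1]
print(decoded.max())     # 12
```

The same call works on the softmax output of `model.predict`, where it returns the most probable class for each sample.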