## Exercise

*Human Activity Recognition using Smartphones* dataset

Dataset description:

*The experiments have been carried out with a group of 30 volunteers. Each person performed six activities
(WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) wearing a smartphone.
Using its embedded accelerometer and gyroscope, we captured 3-axial linear acceleration and 3-axial angular velocity.
The experiments have been video-recorded to label the data manually.*

**Variables:**
Each record in the dataset provides:
* A 561-feature vector with time and frequency domain variables.
* Its activity label.
* An identifier of the subject who carried out the experiment.

More details at: https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones

### Loading and preparing the data

In [2]:
import pandas as pd
import os

In [4]:
folder = ''  # folder where the file HAR_clean.csv is located

Load the dataset that was created in the last session: "HAR_clean.csv"

In [7]:
all_data = pd.read_csv(os.path.join(folder, 'HAR_clean.csv'), index_col=0)

Divide into input and output data

In [10]:
input_data = all_data.iloc[:,:-2]
input_data.shape

(10299, 561)

In [12]:
output_data = all_data.iloc[:,-1]
output_data.shape

(10299,)

Split the data into training and test sets, keeping 30% for testing

In [15]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(input_data, output_data, test_size=0.3)

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)


(7209, 561) (7209,)
(3090, 561) (3090,)
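The hyperparameter search defined later in this exercise takes separate validation data (`Xval`, `Yval`), so part of the training set has to be held out for it. The following is a minimal sketch of one way to do that with NumPy only. The helper name `split_train_val`, the 20% fraction, and the synthetic arrays are assumptions made for illustration; calling `train_test_split` a second time on the training data would work equally well.

```python
import numpy as np

def split_train_val(X, y, val_fraction=0.2, seed=0):
    """Shuffle row indices and hold out a fraction of the rows for validation."""
    X = np.asarray(X)  # also accepts a pandas DataFrame
    y = np.asarray(y)  # also accepts a pandas Series
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(round(val_fraction * len(X)))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], X[val_idx], y[train_idx], y[val_idx]

# Synthetic stand-in for X_train / y_train (shapes are illustrative only)
X_demo = np.arange(40, dtype=float).reshape(10, 4)
y_demo = np.arange(10)

X_tr, X_val, y_tr, y_val = split_train_val(X_demo, y_demo, val_fraction=0.2)
print(X_tr.shape, X_val.shape)
```

Keeping the test set out of the search means the final test accuracy is not biased by the choice of hyperparameters.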


### Best shallow ML model

In [18]:
from sklearn import svm
from sklearn.model_selection import GridSearchCV

parameters = {'kernel':['linear', 'rbf'], 'C':[1, 10, 100,1000], 'gamma':[0.01, 0.001]}

svm_model_d = svm.SVC()
opt_model_d = GridSearchCV(svm_model_d, parameters)

opt_model_d.fit(X_train, y_train)
print(opt_model_d.best_estimator_)

SVC(C=1000, gamma=0.01)


In [19]:
opt_model_d.score(X_test, y_test)

0.9938511326860842

### Deep Learning models

**Ex. 1** - Train a Deep Neural Network model for this dataset and compare its performance with the shallow models from previous sessions.

In [22]:
from sklearn import preprocessing
le = preprocessing.LabelEncoder()

le.fit(y_train)

y_train_encoded = le.transform(y_train)
y_test_encoded = le.transform(y_test)
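`LabelEncoder` produces integer class indices, whereas the functions given for Ex. 3 compile the model with `categorical_crossentropy` and read the number of classes from `Ytrain.shape[1]`, which implies one-hot encoded targets. The sketch below shows that conversion with NumPy. The helper name `one_hot` and the demo labels are assumptions for illustration; `tensorflow.keras.utils.to_categorical` performs the same conversion.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Turn integer class indices (0..num_classes-1) into one-hot rows."""
    labels = np.asarray(labels, dtype=int)
    encoded = np.zeros((labels.shape[0], num_classes), dtype="float32")
    # Set a single 1 per row, in the column given by the class index
    encoded[np.arange(labels.shape[0]), labels] = 1.0
    return encoded

# Illustrative integer labels, shaped like the output of LabelEncoder.transform
demo_labels = np.array([0, 2, 1, 2])
demo_one_hot = one_hot(demo_labels, num_classes=3)
print(demo_one_hot.shape)
```

For this dataset the call would presumably be `one_hot(y_train_encoded, 6)`, since there are six activities. An alternative is to keep the integer labels and use the `sparse_categorical_crossentropy` loss instead.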

**Ex. 2** - Play with the different parameters (topologies, training algorithms, etc.) and check their performance.
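To compare a network against the SVM, whose `score` method reports mean accuracy, the class probabilities from the softmax output have to be turned into predicted labels first. The following sketch does that with `argmax` and also builds a confusion matrix, which shows which activities get mixed up. The function name `accuracy_and_confusion` and the demo arrays are assumptions for illustration. In practice the probabilities would come from the model's `predict` method.

```python
import numpy as np

def accuracy_and_confusion(probabilities, true_labels, num_classes):
    """Return accuracy and a confusion matrix (rows: true class, columns: predicted)."""
    predicted = np.argmax(probabilities, axis=1)
    true_labels = np.asarray(true_labels, dtype=int)
    confusion = np.zeros((num_classes, num_classes), dtype=int)
    # Count each (true, predicted) pair; add.at handles repeated index pairs
    np.add.at(confusion, (true_labels, predicted), 1)
    accuracy = float(np.mean(predicted == true_labels))
    return accuracy, confusion

# Made-up probabilities for four samples and three classes
demo_probs = np.array([
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
])
demo_true = np.array([0, 1, 2, 2])

acc, cm = accuracy_and_confusion(demo_probs, demo_true, num_classes=3)
print(acc)
print(cm)
```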

**Ex. 3** - Implement a hyperparameter optimization pipeline using the following functions.

Function definitions

In [26]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Activation, Dropout
from tensorflow.keras import optimizers

## function to setup model - assuming multiclass classification problem
def setup_model(topo, dropout_rate, input_size, output_size):
    model = Sequential()    
    model.add(Dense(topo[0], activation="relu", input_dim = input_size))
    if dropout_rate > 0: model.add(Dropout(dropout_rate))
    for i in range(1,len(topo)):        
        model.add(Dense(topo[i], activation="relu"))
        if dropout_rate > 0: model.add(Dropout(dropout_rate))    
    model.add(Dense(output_size))
    model.add(Activation('softmax'))
    
    return model

## training the DNN - takes algorithm (string) and learning rate; data (X, y), epochs and batch size
def train_dnn(model, alg, lr, Xtrain, Ytrain, epochs = 5, batch_size = 64):
    if alg == "adam":
        optimizer = optimizers.Adam(learning_rate = lr)
    elif alg == "rmsprop":
        optimizer = optimizers.RMSprop(learning_rate = lr)
    elif alg == "sgd_momentum":
        optimizer = optimizers.SGD(learning_rate = lr, momentum = 0.9)
    else: optimizer = optimizers.SGD(learning_rate = lr)
        
    model.compile(optimizer = optimizer, loss = "categorical_crossentropy", metrics = ["accuracy"])
    model.fit(Xtrain, Ytrain, epochs = epochs, batch_size = batch_size, verbose = 0)
    
    return model

## optimizing parameters: topology, algorithm, learning rate, dropout
## randomized search optimization with maximum iterations
## takes as input: dictionary with params to optimize and possible values; training data (X, y), validation data (X, y), iterations, epochs for training
def dnn_optimization(opt_params, Xtrain, Ytrain, Xval, Yval, iterations = 10, epochs = 5, verbose = True):
    from random import choice
  
    if verbose: 
        print("Topology\tDropout\tAlgorithm\tLRate\tValLoss\tValAcc\n")
    best_acc = None
    best_config = None  # stays None if no iteration runs
    
    input_size = Xtrain.shape[1]
    output_size = Ytrain.shape[1]
    
    if "topology" in opt_params:
        topologies = opt_params["topology"]
    else: topologies = [[100]]
    if "algorithm" in opt_params:
        algs = opt_params["algorithm"]
    else: algs = ["adam"]
    if "lr" in opt_params:
        lrs = opt_params["lr"]
    else: lrs = [0.001]
    if "dropout" in opt_params:
        dropouts = opt_params["dropout"]
    else: dropouts= [0.0]
    
    for it in range(iterations):
        topo = choice(topologies)
        dropout_rate = choice(dropouts)
        dnn = setup_model(topo, dropout_rate, input_size, output_size)
        alg = choice(algs)
        lr = choice(lrs)
        dnn = train_dnn(dnn, alg, lr, Xtrain, Ytrain, epochs, 128)
        val_loss, val_acc = dnn.evaluate(Xval, Yval, verbose = 0)
        
        if verbose: 
            print(topo, "\t", dropout_rate, "\t", alg, "\t", lr, "\t", val_loss, "\t", val_acc)
        
        if best_acc is None or val_acc > best_acc:
            best_acc = val_acc
            best_config = (topo, dropout_rate, alg, lr)
        
    return best_config, best_acc
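As a rough illustration of what `dnn_optimization` expects, the sketch below defines a search space and draws a few random configurations the same way the loop above does, with one independent `choice` per hyperparameter. The dictionary keys are the ones the function reads. The candidate values and the helper `sample_config` are assumptions chosen for illustration, not recommended settings.

```python
import random

# Hypothetical search space: the keys match those read by dnn_optimization,
# the candidate values are illustrative only.
opt_params = {
    "topology": [[100], [100, 50], [250, 100]],
    "algorithm": ["adam", "rmsprop", "sgd_momentum"],
    "lr": [0.01, 0.001],
    "dropout": [0.0, 0.2, 0.5],
}

def sample_config(space, rng):
    """Draw one value per hyperparameter, independently and uniformly."""
    return {name: rng.choice(values) for name, values in space.items()}

rng = random.Random(0)  # seeded only so the demo is repeatable
configs = [sample_config(opt_params, rng) for _ in range(5)]
for config in configs:
    print(config)
```

With real data, a call could look like `best_config, best_acc = dnn_optimization(opt_params, X_tr, Y_tr, X_val, Y_val, iterations=10)`, where `X_tr`, `Y_tr`, `X_val` and `Y_val` are hypothetical names for the training and validation arrays and the targets are one-hot encoded. Because values are sampled with replacement, the same configuration can be drawn more than once.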

**Ex. 4** - Comment on the results obtained.