# Assignment 2: Practical Machine Learning Project Report



31005/32513 Machine Learning Spring 2019, Assignment 2



---



> Group members: Pit Wegner, Xianfeng Zhuge, Kailei Wu




Video Pitch: https://www.youtube.com/watch?v=ceS2Spw9g-Y

Github: https://github.com/pitwegner/UTS_ML2019_Project

## Introduction

Activity recognition has become an increasingly active area of machine learning research, especially for human physical activity. Progress in this direction has been encouraging: models can recognize simple human exercises such as walking, running or standing, count steps and detect a general overarching activity. However, such models are hard to apply in a specific domain such as healthcare, mainly for two reasons. The activities themselves are more complex, potentially involving other subjects, and they are harder to analyze because little public data is available for research. In this report, a machine learning (ML) model is created for the "Open Lab Nursing Activity Recognition Challenge", which asks participants to recognize six different activities within the nursing area (Lago et al., 2019).

In recent years, many activity recognition algorithms in healthcare applications have focused on patients or doctors rather than nurses. However, recognizing nursing activities can be helpful in the nursing domain, for example for automatic record creation and standardized supervision of procedures. Due to the problems mentioned above, there is still a gap in ML models that recognize nursing-related activities. Applications of such a model could be widespread and of high impact in the nursing area, provided the model achieves high prediction accuracy.

In this report, a machine learning technique to tackle the "Open Lab Nursing Activity Recognition Challenge" is proposed and evaluated. First, the provided data is explored, including data preprocessing and data structure design. Next, the CNN model used to address the problem is presented, followed by an analysis of its test performance and an evaluation of the whole experiment. Finally, conclusions are drawn and possible improvements and future work are outlined.


## Exploration

### Data Description

For training purposes, the “Open Lab Nursing Activity Recognition Challenge” provides three sets of data from different sensors (acceleration, motion capture, meditag), labelled with 6 different nursing activities performed by 6 persons. In this experiment, only the motion capture data is used, which comprises 29 positional sensors. The dataset is split into multiple labelled segments of 60 seconds each. For each sample in a segment, the sensor positions (x, y, z coordinates) are recorded, resulting in 87 data points and a timestamp. The activity label is obtained by joining the segment id with the label dataset, which also includes the nurse id (Lago et al., 2019).

### Data Preprocessing


The data source for the project can be configured to be Google Drive, GitHub or a downloaded local repository. The official data source is only accessible via a regularly expiring link, which prevents a direct download. After the download, each input file is read into a dataframe. To remove NaN values from the dataset, front-filling, back-filling and finally zero-filling are applied, and the order matters. Front-filling replaces a missing value with the previous one, which makes sense for time series data. The subsequent back-filling fills the rows before the first captured value, padding the start. If the dataframe still contains NaN values afterwards, an entire sensor must be missing for that segment, and the column is filled with zeros. Since the time intervals are constant for all samples, it is also safe to remove the timestamp from the dataset and maintain ordering by the index. To prevent exploding and vanishing gradients, a min-max normalization step is applied before defining the final dataset. When the dataset is accessed for training, the segment id is dropped.

In [0]:
#@title
import pandas as pd
import glob
import numpy as np
import math

COLAB = True
DATA_SOURCE = 'github' # can be google, github, or local
LOCATION_PREFIX = '' # path of data (or download location for github)


colab_root = '/content/' if COLAB else ''

if DATA_SOURCE == 'google':
    from google.colab import drive
    drive.mount(colab_root + 'drive')
    LOCATION_PREFIX = colab_root + 'drive/My Drive/' + LOCATION_PREFIX
elif DATA_SOURCE == 'github':
    import urllib.request
    
    dl_location = colab_root + LOCATION_PREFIX
    print("Downloading Data...")
    filename, headers = urllib.request.urlretrieve('https://github.com/pitwegner/UTS_ML2019_Project/archive/master.zip', filename=dl_location + 'master.zip')
    import zipfile
    print("Extracting...")
    with zipfile.ZipFile(filename, 'r') as zip_ref:
        zip_ref.extractall(dl_location)
    LOCATION_PREFIX = dl_location + 'UTS_ML2019_Project-master/data/'
    
print("Reading Labels...")
# Read activity labels for segments and nurse id
activities = pd.read_csv(LOCATION_PREFIX + "activities_train.csv")
activity_arr = activities.activity_id.unique()
activity_arr.sort()

# Read Motion Capture Data
print("Reading Mocap Data...")
i = 0
bar_length = 50
files = glob.glob(LOCATION_PREFIX + "mocap/segment*.csv")
frames = []
for mf in files:
    # Basic NaN value removal: forward-fill, then back-fill, then zero-fill
    frames.append(pd.read_csv(mf).ffill().bfill().fillna(0))
    i += 1
    progress = math.ceil(bar_length * i / len(files))
    print("\r", "[" + "=" * progress + " " * (bar_length - progress) + "] " + "{0:.2f}".format(100 * i / len(files)) + '%', end="")

# DataFrame.append was removed in pandas 2.0; a single concat also avoids repeated copying
mocap = pd.concat(frames)

print("\nCreating Index...")
# Drop time column, since constant frequency
mocap = mocap.reset_index().drop(columns=['index','time_elapsed'])

# Min-max normalization
print("Normalizing...")
mocap_normalized = (mocap-mocap.min())/(mocap.max()-mocap.min())
mocap_normalized.segment_id = mocap.segment_id
mocap = mocap_normalized
print("Done.")
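The fill order described above can be illustrated without pandas. The following is a minimal plain-Python sketch of the same three steps on single columns; the helper names (`ffill`, `bfill`, `clean_column`, `min_max`) and the toy values are illustrative assumptions, not part of the project code. As a design choice of this sketch only, a constant column is mapped to zeros, since a zero range would otherwise lead to a division by zero.

```python
def ffill(values):
    """Replace each missing value (None) with the most recent observed value."""
    out, last = [], None
    for v in values:
        if v is not None:
            last = v
        out.append(last)
    return out


def bfill(values):
    """Replace each missing value with the next observed value."""
    return ffill(values[::-1])[::-1]


def clean_column(values):
    """Front-fill, then back-fill, then zero-fill (the order matters)."""
    filled = bfill(ffill(values))
    return [0.0 if v is None else v for v in filled]


def min_max(values):
    """Scale a column to [0, 1]; constant columns become zeros (sketch-only choice)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy columns (hypothetical): a sensor that starts late and a sensor that is missing entirely
late_start = clean_column([None, None, 2.0, None, 4.0])
missing_sensor = clean_column([None, None, None])
print(late_start)           # gap filled from the previous value, start padded from the first value
print(missing_sensor)       # nothing to propagate, so zeros are used
print(min_max(late_start))
```

Front-filling first keeps each gap tied to the most recent measurement; back-filling only touches rows that precede the first measurement.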

In [0]:
#@title
import torch
from torch.autograd import Variable
import torch.nn.functional as F
from torch.utils import data
import torch.optim as optim

torch.manual_seed(0)
np.random.seed(0)

class Dataset(data.Dataset):
  
    def __init__(self, train, labels):
        self.labels = labels
        self.data = train

    def __len__(self):
        return len(self.data)

    def __getitem__(self, index):
        X = self.data[index].drop(columns=['segment_id']).values
        sid = self.data[index].segment_id.unique()[0]
        labels = self.labels[self.labels.segment_id == sid]
        aid = labels.activity_id.values[0]
        y = np.array([activity_arr.tolist().index(aid), sid])

        return X, y

dataset = Dataset(mocap, activities)

A data inspection shows the distribution of persons and activity segments in the dataset.

In [0]:
#@title
print("Person", ":", "[(activity_id, segment count), ...]")
for a in activities.subject.unique():
    act = activities[activities.subject == a]
    print(a, ":", [(i, len(act[act.activity_id == i].segment_id)) for i in np.sort(act.activity_id.unique())])

### Data Split and Sampling

After preprocessing, two approaches to splitting the data are provided: splitting randomly or splitting by person. For the person split, four persons are chosen for training, one for validation and one for testing, which results in an approximate 70-15-15 split. For the random split, the segments are shuffled and split by the same percentages. Afterwards, the window start indices are determined for each segment using a window length of 200 (twice the sampling frequency) and a stride of 50. Saving only the start indices avoids storing redundant, overlapping window data and thus saves memory. For each resulting dataset, a RandomWindowSampler handles access during training, selecting windows of the fixed length from a random permutation of the start indices. This way, each window is selected exactly once per epoch.


In [0]:
from torch.utils.data.sampler import Sampler    

window_length = 200 # = 2*f

# Sampler that iterates a random permutation of start indices and selects window
class RandomWindowSampler(Sampler):
  
    def __init__(self, indices):
        self.indices = indices

    def __iter__(self):
        return (slice(self.indices[i], self.indices[i] + window_length) for i in torch.randperm(len(self.indices)))

    def __len__(self):
        return len(self.indices)

In [0]:
# Split data either randomly by segments or by person (as official test set introduces a new person)
PERSON_SPLIT = True
TEST_PERSON = 2
VAL_PERSON = 4

if PERSON_SPLIT:
    indices = {}
    for sid in dataset.data.segment_id.unique():
        i = list(dataset.data[dataset.data.segment_id == sid].index[0:-window_length:50])
        p = activities[activities.segment_id == sid].subject.item()
        if p not in indices:
            indices[p] = []
        indices[p] += i

    test_indices = indices.pop(TEST_PERSON)
    val_indices = indices.pop(VAL_PERSON)
    train_indices = [item for sublist in indices.values() for item in sublist]
else:
    segments = dataset.data.segment_id.unique()
    np.random.shuffle(segments)
    
    split = int(np.floor(0.15 * len(segments))) # 70% training, 15% validation, 15% testing
    train, val, test = segments[split+split:], segments[split:split+split], segments[:split]
    
    train_indices, val_indices, test_indices = ([],[],[])
    for sid in train:
        train_indices += list(dataset.data[dataset.data.segment_id == sid].index[0:-window_length:50])
    for sid in val:
        val_indices += list(dataset.data[dataset.data.segment_id == sid].index[0:-window_length:50])
    for sid in test:
        test_indices += list(dataset.data[dataset.data.segment_id == sid].index[0:-window_length:50])

train_sampler = RandomWindowSampler(train_indices)
val_sampler = RandomWindowSampler(val_indices)
test_sampler = RandomWindowSampler(test_indices)

# Create data loaders to parallelize batch training to multiple cores
def get_train_loader(batch_size):
    return torch.utils.data.DataLoader(dataset, batch_size=batch_size, sampler=train_sampler, num_workers=3)
val_loader = torch.utils.data.DataLoader(dataset, batch_size=128, sampler=val_sampler, num_workers=3)
test_loader = torch.utils.data.DataLoader(dataset, batch_size=4, sampler=test_sampler, num_workers=3)
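To make the windowing concrete, the sketch below reproduces the `index[0:-window_length:50]` slicing for a single segment and mimics one sampler epoch in plain Python. It assumes a segment of 6000 rows (60 seconds at an assumed 100 Hz, consistent with the window length being twice the frequency); `window_starts` and `epoch_order` are hypothetical helper names, not part of the project code.

```python
import random


def window_starts(segment_rows, window_length=200, stride=50):
    """Start offsets of the windows in one segment, mirroring index[0:-window_length:stride]."""
    return list(range(segment_rows)[0:-window_length:stride])


def epoch_order(starts, seed=0):
    """Random permutation of the start offsets: every window appears exactly once per epoch."""
    rng = random.Random(seed)
    order = list(starts)
    rng.shuffle(order)
    return order


starts = window_starts(6000)      # assumed segment length
order = epoch_order(starts)

print(len(starts))                # number of windows in the segment
print(starts[:3], starts[-1])     # first offsets and the last offset
# Each window covers rows [start, start + 200); the last one ends inside the segment
print(starts[-1] + 200 <= 6000)
```

Because only start offsets are stored, consecutive windows overlap by 150 rows without the overlapping rows being duplicated in memory.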

### Model Design

The model architecture is a Convolutional Neural Network (CNN) consisting of a convolutional layer, a pooling layer and two fully connected layers. The ReLU activation function is applied after both the convolutional layer and the first fully connected layer. The loss function used for training and validation is cross-entropy loss, a standard choice for multi-class classification with probability outputs between 0 and 1. The torch implementation applies the softmax function (in its logarithmic form) to the network output itself before calculating the loss, which is why the model returns raw scores. As an optimizer, adaptive moment estimation (Adam) is used. The architectural idea is based on the work of Bevilacqua et al. (2018). The code is largely inspired by a PyTorch CNN tutorial (Algorithmia, 2019).
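As a small illustration of the loss, the following plain-Python sketch computes softmax followed by the negative log-likelihood for a single sample, which is the combination that cross-entropy loss on raw scores amounts to. The function names and the example scores are illustrative assumptions. For six classes, a uniform guess yields a loss of ln 6 ≈ 1.79, which gives a reference point for the validation losses reported later.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability assigned to the target class index."""
    return -math.log(softmax(logits)[target])


uniform_loss = cross_entropy([0.0] * 6, target=3)                       # no class preferred
confident_loss = cross_entropy([0.0, 0.0, 0.0, 5.0, 0.0, 0.0], target=3)  # target strongly preferred

print(round(uniform_loss, 4))
print(confident_loss < uniform_loss)
```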


![Architecture](https://raw.githubusercontent.com/pitwegner/UTS_ML2019_Project/master/img/architecture.png)

In [0]:
#@title
class SimpleCNN(torch.nn.Module):
    
    def __init__(self):
        super(SimpleCNN, self).__init__()
        
        kernel_size = 3
        stride = 1
        padding = 1
        output_channels = 24
        
        pooling_size = 2
        pooling_stride = 2
        pooling_padding = 0
        hidden_parameters = 64
        
        # Calculate output size after convolution
        conv_output_x = int((window_length - kernel_size + 2 * padding) / stride) + 1
        conv_output_y = int((dataset[0:1][0].shape[1] - kernel_size + 2 * padding) / stride) + 1
        pool_output_x = int((conv_output_x - pooling_size + 2 * pooling_padding) / pooling_stride) + 1
        pool_output_y = int((conv_output_y - pooling_size + 2 * pooling_padding) / pooling_stride) + 1
        self.dense_input = output_channels * pool_output_x * pool_output_y
        
        self.conv1 = torch.nn.Conv2d(1, output_channels, kernel_size=kernel_size, stride=stride, padding=padding)
        self.pool = torch.nn.MaxPool2d(kernel_size=pooling_size, stride=pooling_stride, padding=pooling_padding)
        self.fc1 = torch.nn.Linear(self.dense_input, hidden_parameters)
        self.fc2 = torch.nn.Linear(hidden_parameters, len(activity_arr))
        
    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = self.pool(x)
        x = x.view(-1, self.dense_input)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return(x)
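The size arithmetic in the constructor can be checked by hand. The sketch below applies the same output-size formula to a 200 x 87 input, assuming 87 feature columns remain after dropping the timestamp and segment id (29 sensors with three coordinates each); `conv_out` is a hypothetical helper name.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution or pooling layer along one dimension."""
    return (size - kernel + 2 * padding) // stride + 1


window_length, n_features = 200, 87   # 87 features is an assumption (29 sensors x 3 coordinates)
output_channels, hidden = 24, 64

conv_x = conv_out(window_length, kernel=3, stride=1, padding=1)
conv_y = conv_out(n_features, kernel=3, stride=1, padding=1)
pool_x = conv_out(conv_x, kernel=2, stride=2)
pool_y = conv_out(conv_y, kernel=2, stride=2)

dense_input = output_channels * pool_x * pool_y
fc1_weights = dense_input * hidden    # weights of the first fully connected layer, without biases

print(conv_x, conv_y)   # padding 1 with kernel 3 preserves the input size
print(pool_x, pool_y)   # pooling halves each dimension, rounding down
print(dense_input, fc1_weights)
```

Under these assumptions, almost all trainable parameters sit in the first fully connected layer, which is worth keeping in mind when considering deeper architectures.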

### Test Design

For an activity recognition model, quality is measured by classification performance. The challenge requires predicting the activity of a given segment. To test the model's accuracy, the activity of each window is predicted and compared to the label data. The final test aggregates the predictions for each segment, selecting the mode as the final decision (majority vote). These results are compared to the activity labels to compute the overall accuracy.


In [0]:
from scipy import stats
np.set_printoptions(suppress=True)

def testNet(net):
    confusion_matrix = np.zeros((6,6))
    votes = {}
    print("Starting Test Run")

    for i, data in enumerate(test_loader, 0):
        inputs, labels = data
        inputs = inputs.reshape((inputs.shape[0], 1, inputs.shape[1], inputs.shape[2]))
        inputs, labels = Variable(inputs), Variable(labels)

        val_outputs = net(inputs)
        predictions = val_outputs.argmax(1)

        # Collect votes
        for s in range(len(labels)):
            segment = labels[s,1].item()
            if segment not in votes:
                votes[segment] = []
            votes[segment].append(predictions[s].item())

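        # Compare test vs. label. np.add.at also counts repeated (prediction, label)
        # pairs within a batch, which "+=" with index arrays would count only once.
        np.add.at(confusion_matrix, (predictions.detach().numpy(), labels[:,0].numpy()), 1)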
        print("\r", "{0:.2f}%".format(100 * i / len(test_loader)), end="")

    # Select prediction as vote majority
    correct_votes = 0
    for sid in votes:
        label = dataset.labels[dataset.labels.segment_id == sid].activity_id.values[0]
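        # np.bincount(...).argmax() returns the most frequent class index (smallest on ties),
        # independent of the return shape of stats.mode, which differs between SciPy versions
        votes[sid] = [activity_arr[np.bincount(votes[sid]).argmax()], label]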
        correct_votes += int(votes[sid][0] == votes[sid][1])

    print("\rConfusion matrix for individual windows:")
    print(confusion_matrix)
    print("Accuracy: {0:.2f}%".format(100 * np.trace(confusion_matrix)/np.sum(confusion_matrix)))
    print("Vote prediction for entire segments:")
    print(votes)
    print("Accuracy: {0:.2f}%".format(100 * correct_votes/len(votes)))
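The aggregation step can be sketched in isolation. The following plain-Python example takes per-window predictions grouped by segment and returns one label per segment by majority vote; the function names and the toy votes are illustrative assumptions, and ties are resolved in favour of the smallest label as a choice of this sketch.

```python
from collections import Counter


def majority_vote(window_predictions):
    """Most frequent predicted label; ties go to the smallest label."""
    counts = Counter(window_predictions)
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)


def segment_accuracy(votes_by_segment, label_by_segment):
    """Share of segments whose majority vote equals the true label."""
    correct = sum(
        majority_vote(votes) == label_by_segment[sid]
        for sid, votes in votes_by_segment.items()
    )
    return correct / len(votes_by_segment)


# Toy example (hypothetical segment ids and window predictions)
votes = {1: [12, 2, 12, 4], 2: [2, 2, 3], 3: [9, 6, 6]}
truth = {1: 12, 2: 2, 3: 9}

print({sid: majority_vote(v) for sid, v in votes.items()})
print(segment_accuracy(votes, truth))
```

A segment is classified correctly as long as the correct label is the most frequent one among its windows, which is why segment accuracy can exceed window accuracy.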

## Methodology 

### Model Training

For training the network, a batch size of 32 was selected, resulting in about 500 loops per epoch. Each loop includes a forward pass, backward pass and optimisation step, updating the weights. Training is performed with a learning rate of 0.000001. Comparisons showed unstable learning or no learning at all for larger learning rates. Decreasing the learning rate resulted in a smooth learning curve, making it easier to decide when to stop training.

### Model Validation

At the end of each epoch, a validation pass on the validation set is performed to measure an independent loss during training. The validation result cannot be used for training, but is only an observation to recognize overfitting the training data. Thus, it serves as a measure for when to stop training. As a heuristic, training is stopped when the validation loss has not decreased for 10 epochs. Each time the validation loss improves, the model at that state is saved for future testing.

In [0]:
import time

def trainNet(net, batch_size, n_epochs, learning_rate):
    print("Training started")
    train_loader = get_train_loader(batch_size)
    n_batches = len(train_loader)
    
    # Select loss function and optimizer
    loss = torch.nn.CrossEntropyLoss()
    optimizer = optim.Adam(net.parameters(), lr=learning_rate)
    
    min_val_loss = math.inf
    worse_counter = 0
    
    training_start_time = time.time()
    
    for epoch in range(n_epochs):
        
        running_loss = 0.0
        total_train_loss = 0.0
        print_every = 25
        start_time = time.time()
        
        worse_counter += 1
        
        for i, data in enumerate(train_loader, 0):
            inputs, labels = data
            # reshape data since we only have 1 channel
            inputs = inputs.reshape((inputs.shape[0], 1, inputs.shape[1], inputs.shape[2]))
            inputs, labels = Variable(inputs), Variable(labels)

            # Forward pass, backward pass, optimize
            optimizer.zero_grad()
            outputs = net(inputs)
            loss_size = loss(outputs, labels[:,0])
            loss_size.backward()
            optimizer.step()
            
            # Aggregate losses for plotting and printing
            running_loss += loss_size.data.item()
            total_train_loss += loss_size.data.item()
            
            # Print average running loss every 25th batch of an epoch
            if (i + 1) % print_every == 0:
                print("Epoch {}, {:d}% \t train_loss: {:.2f} took: {:.2f}s".format(epoch + 1, int(100 * (i + 1) / n_batches), running_loss / print_every, time.time() - start_time))
                running_loss = 0.0
                start_time = time.time()
        
        # Run validation pass at end of epoch
        total_val_loss = 0.0
        for i, data in enumerate(val_loader, 0):
            inputs, labels = data
            inputs = inputs.reshape((inputs.shape[0], 1, inputs.shape[1], inputs.shape[2]))
            inputs, labels = Variable(inputs), Variable(labels)

            val_outputs = net(inputs)
            val_loss_size = loss(val_outputs, labels[:,0])
            total_val_loss += val_loss_size.data.item()
            
        loss_avg = total_val_loss / len(val_loader)
        print("Validation loss = {:.2f}{}".format(loss_avg, ' (worse, {})'.format(worse_counter) if loss_avg >= min_val_loss else ' (better)'))
        if loss_avg < min_val_loss:
            min_val_loss = loss_avg
            worse_counter = 0
            
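            # Save a copy of the best model for testing. load_state_dict() returns a
            # key-matching report rather than the model, and the assignment has to
            # target the module-level best_model that is later passed to testNet.
            global best_model
            best_model = SimpleCNN().double()
            best_model.load_state_dict(net.state_dict())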
        
        # Append average loss for plotting
        val_losses.append(loss_avg)
        train_losses.append(total_train_loss / n_batches)
        
        # Stop training if we haven't improved for 10 epochs
        if worse_counter >= 10:
            break
        
    print("Training finished, took {}s".format(time.time() - training_start_time))
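The stopping heuristic can be isolated from the training loop. The sketch below replays a list of validation losses and reports when training would stop and which epoch produced the best model; the function name `early_stopping` and the loss values are hypothetical and serve only to illustrate the rule of stopping after 10 epochs without improvement.

```python
def early_stopping(val_losses, patience=10):
    """Return (stop_epoch, best_epoch), both zero-based, for a sequence of validation losses."""
    best_loss = float("inf")
    best_epoch = None
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            # Improvement: remember this epoch (the model would be saved here)
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
        if epochs_without_improvement >= patience:
            return epoch, best_epoch
    # Patience never exhausted: training runs through all epochs
    return len(val_losses) - 1, best_epoch


# Hypothetical loss curve: improves for three epochs, then stagnates
losses = [1.5, 1.3, 1.25] + [1.3] * 12
print(early_stopping(losses))
print(early_stopping([1.0, 0.9, 0.95, 0.8, 0.9, 0.9, 0.9, 0.7], patience=3))
```

The model evaluated on the test set is the one from `best_epoch`, not the one from the final epoch.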

In [0]:
CNN = SimpleCNN()
train_losses = []
val_losses = []
best_model = CNN

# Train run
trainNet(CNN.double(), batch_size=32, n_epochs=150, learning_rate=0.000001)

In [0]:
# Loss visualization
import plotly.graph_objects as go
fig = go.Figure()
fig.add_trace(go.Scatter(x=np.arange(len(train_losses)), y=train_losses, mode='lines', name='train_loss'))
fig.add_trace(go.Scatter(x=np.arange(len(val_losses)), y=val_losses, mode='lines', name='val_loss'))
fig.show()

# Test run
testNet(best_model)

### Alternative Approaches

The traditional way to tackle time series classification is using Recurrent Neural Networks (RNN). A comparison between the two approaches in terms of accuracy and efficiency could yield insight into the relatively new approach of using CNN to detect patterns in time series. Also within the capabilities of CNN, there are more transformations to try for this kind of data. For example, X, Y and Z could be interpreted as channels, similar to pictures. This could make more use of the pattern-recognition abilities of convolutional layers. The challenge summary (Lago et al., 2019) also shows different approaches, such as KNN and random forests, which would be interesting to compare against this model.
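The channel idea mentioned above can be sketched as a simple rearrangement. The following plain-Python example splits a window of shape T x 87 into three channels of shape T x 29, one per coordinate axis. It assumes the columns are ordered x, y, z per sensor, which is a hypothetical layout that would need to be checked against the actual column names; `to_xyz_channels` is an illustrative helper name. Such a layout would correspond to `in_channels=3` in `torch.nn.Conv2d`.

```python
def to_xyz_channels(window, n_sensors=29):
    """Rearrange rows of length 3 * n_sensors into 3 channels of rows with length n_sensors.

    Assumes the column order x0, y0, z0, x1, y1, z1, ... (hypothetical layout).
    """
    return [
        [[row[3 * sensor + axis] for sensor in range(n_sensors)] for row in window]
        for axis in range(3)
    ]


# Toy window: 2 time steps, 2 sensors (6 columns per row)
toy_window = [
    [0, 1, 2, 3, 4, 5],
    [10, 11, 12, 13, 14, 15],
]
channels = to_xyz_channels(toy_window, n_sensors=2)
print(channels[0])   # x coordinates of both sensors per time step
print(channels[1])   # y coordinates
print(channels[2])   # z coordinates
```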

## Evaluation

In this project, the data can be split in two different ways. In a random split method, the data is shuffled and randomly sampled into training, validation and test set. This, however, is not realistic. The model needs to learn the abstract activity, independent of the performing subject. This also corresponds to the challenge, which evaluated all participants based on a test set of 2 new subjects. Hence in the person split method, the data is split according to the performing person. This way, there are four persons in the training set, one person in the validation set and one person in the test set. Both methods will be evaluated in this chapter.

### Model Performance

#### Random Split

The random split method quickly reaches relatively high accuracy. The figure below shows the loss on both the validation and the training set over the 75 epochs of training. The training loss decreases steadily, as does the validation loss. However, the validation loss fluctuates, indicating that further learning should be possible with more epochs.

![Random_Segment_Split](https://raw.githubusercontent.com/pitwegner/UTS_ML2019_Project/master/img/plot_random_segment_split.png)

The table below shows the confusion matrix of the window predictions for the random split, where each column corresponds to the true label of a window and each row to its predicted label. About half of the predictions are correct, with the most uncertainty for activities 2, 6 and 9. Aggregating the predictions for each segment using the majority vote improves the accuracy to 71.43%.

Confusion matrix of window prediction with random split (52.71% accuracy):

|     *    | $l_2$ | $l_3$ | $l_4$ | $l_6$ | $l_9$ | $l_{12}$ |
| -------- | ----- | ----- | ----- | ----- | ----- | -------- |
| $p_2$    |   366 |   103 |    68 |   185 |    61 |       39 |
| $p_3$    |    88 |   411 |    36 |     4 |     5 |        3 |
| $p_4$    |    83 |    80 |   181 |    49 |   210 |       62 |
| $p_6$    |    25 |    25 |     2 |   146 |    74 |       92 |
| $p_9$    |    39 |     4 |     6 |    32 |   269 |      103 |
| $p_{12}$ |   171 |    10 |    21 |    27 |   142 |      688 |
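The reported window accuracy can be recomputed from the matrix. The sketch below copies the values of the table above (rows are predictions, columns are true labels) and derives the overall accuracy as well as the share of correctly predicted windows per true activity; the function names are illustrative assumptions.

```python
activity_ids = [2, 3, 4, 6, 9, 12]

# Rows: predicted activity, columns: true activity (values from the table above)
random_split = [
    [366, 103,  68, 185,  61,  39],
    [ 88, 411,  36,   4,   5,   3],
    [ 83,  80, 181,  49, 210,  62],
    [ 25,  25,   2, 146,  74,  92],
    [ 39,   4,   6,  32, 269, 103],
    [171,  10,  21,  27, 142, 688],
]


def accuracy(matrix):
    """Diagonal sum divided by the total number of windows."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def recall_per_label(matrix, ids):
    """Share of windows of each true activity (column) that was predicted correctly."""
    result = {}
    for j, activity in enumerate(ids):
        column_total = sum(row[j] for row in matrix)
        result[activity] = matrix[j][j] / column_total
    return result


recalls = recall_per_label(random_split, activity_ids)
print("{:.2f}%".format(100 * accuracy(random_split)))
for activity, value in recalls.items():
    print(activity, "{:.1f}%".format(100 * value))
```

The three lowest per-activity values belong to activities 2, 6 and 9, in line with the observation above.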

Segment prediction using majority vote (71.43% accuracy):

| segment | 4 | 31 | 56 | 57 | 63 | 88 | 101 | 173 | 185 | 194 | 215 | 227 | 241 | 248 | 290 | 295 | 305 | 306 | 336 | 362 | 373 |
| - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| label | 2 | 4 | 12 | 2 | 2 | 2 | 6 | 9 | 9 | 12 | 3 | 9 | 2 | 3 | 4 | 12 | 3 | 12 | 9 | 3 | 9 |
| predict | 2 | 2 | 12 | 2 | 4 | 2 | 2 | 9 | 9 | 12 | 2 | 12 | 2 | 3 | 4 | 12 | 3 | 12 | 6 | 3 | 9 |
| **segment** | **385** | **480** | **508** | **515** | **531** | **559** | **573** | **607** | **620** | **631** | **639** | **640** | **667** | **689** | **703** | **711** | **748** | **753** | **765** | **777** | **795** |
| label | 3 | 2 | 6 | 9 | 6 | 12 | 3 | 6 | 3 | 12 | 12 | 12 | 12 | 2 | 9 | 12 | 2 | 4 | 9 | 12 | 12 |
| predict | 3 | 12 | 4 | 12 | 2 | 12 | 3 | 6 | 4 | 12 | 12 | 12 | 12 | 2 | 4 | 12 | 2 | 4 | 9 | 12 | 12 |

#### Person Split

In contrast to the random split method, the person split method learns much more slowly, but more stably. Training stopped after ~80 epochs at a validation loss of about 1.25, compared to a validation loss of about 1.15 for the still-running random split method. This is expected, since the abstraction is harder to learn and more susceptible to overfitting. The result is an accuracy of 31.04% on single windows. More than half of the windows of activities 2, 3, 4, 6 and 9 are predicted incorrectly, while activity 12 is the only one for which the majority of windows is predicted correctly. The reason might be the distinct motion of activity 12, which is 'Indwelling drip retention and connection'. The prediction of activity 6 in particular fails for almost every window.

![Person_Split](https://raw.githubusercontent.com/pitwegner/UTS_ML2019_Project/master/img/plot_person_split.png)

Confusion matrix of window prediction with person split (31.04% accuracy):

|     *    | $l_2$ | $l_3$ | $l_4$ | $l_6$ | $l_9$ | $l_{12}$ |
| -------- | ----- | ----- | ----- | ----- | ----- | -------- |
| $p_2$    |   186 |   212 |   205 |   231 |    78 |      148 |
| $p_3$    |    60 |   305 |    34 |     0 |     3 |        0 |
| $p_4$    |    46 |    58 |   126 |    17 |    81 |      118 |
| $p_6$    |     7 |    18 |     0 |    45 |    33 |       49 |
| $p_9$    |    51 |   103 |   104 |    13 |   207 |      118 |
| $p_{12}$ |   371 |   437 |    78 |   122 |   138 |      451 |

Segment prediction using majority vote (44.19% accuracy):

| segment | 0 | 56 | 67 | 85 | 99 | 138 | 143 | 144 | 145 | 189 | 196 | 200 | 210 | 213 | 219 | 227 | 229 | 243 | 283 | 305 | 328 | |
| - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| label | 6 | 12 | 12 | 2 | 3 | 3 | 3 | 4 | 12 | 6 | 2 | 9 | 9 | 3 | 2 | 9 | 2 | 9 | 4 | 3 | 3 | |
| predict | 2 | 12 | 12 | 2 | 3 | 12 | 2 | 2 | 12 | 2 | 12 | 12 | 12 | 12 | 12 | 9 | 2 | 9 | 2 | 2 | 12 | |
| **segment** | **340** | **380** | **387** | **406** | **432** | **441** | **459** | **515** | **604** | **612** | **613** | **625** | **639** | **662** | **669** | **674** | **680** | **682** | **685** | **686** | **723** | **773** |
| label | 12 | 9 | 12 | 4 | 12 | 12 | 3 | 9 | 12 | 3 | 2 | 4 | 12 | 3 | 4 | 3 | 6 | 2 | 2 | 12 | 6 | 3 |
| predict | 12 | 2 | 12 | 2 | 12 | 12 | 12 | 9 | 12 | 2 | 2 | 9 | 12 | 3 | 4 | 3 | 2 | 12 | 12 | 4 | 2 | 12 |

Due to the high prediction rate of activity 12 in window prediction, activity 12 is also predicted more often than any other activity in segment prediction. This leads to the interesting insight that 90% of the activity 12 segments are predicted correctly, while the accuracy for the other activities is very low. Moreover, due to the low window prediction accuracy for activity 6, its segment prediction accuracy drops to zero.
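The per-activity figures can be recomputed from the segment table of the person split. The sketch below copies the label and prediction rows of that table in order and counts correct segments per activity; the function name is an illustrative assumption.

```python
from collections import defaultdict

# Label and prediction rows of the person split segment table, in table order
labels = [6, 12, 12, 2, 3, 3, 3, 4, 12, 6, 2, 9, 9, 3, 2, 9, 2, 9, 4, 3, 3,
          12, 9, 12, 4, 12, 12, 3, 9, 12, 3, 2, 4, 12, 3, 4, 3, 6, 2, 2, 12, 6, 3]
predictions = [2, 12, 12, 2, 3, 12, 2, 2, 12, 2, 12, 12, 12, 12, 12, 9, 2, 9, 2, 2, 12,
               12, 2, 12, 2, 12, 12, 12, 9, 12, 2, 2, 9, 12, 3, 4, 3, 2, 12, 12, 4, 2, 12]


def per_activity_accuracy(labels, predictions):
    """Share of correctly predicted segments for each true activity."""
    hits, totals = defaultdict(int), defaultdict(int)
    for label, predicted in zip(labels, predictions):
        totals[label] += 1
        hits[label] += int(label == predicted)
    return {activity: hits[activity] / totals[activity] for activity in sorted(totals)}


overall = sum(l == p for l, p in zip(labels, predictions)) / len(labels)
by_activity = per_activity_accuracy(labels, predictions)

print("{:.2f}%".format(100 * overall))
for activity, value in by_activity.items():
    print(activity, "{:.0f}%".format(100 * value))
```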

## Conclusion

Using a CNN for the classification of spatial time series for activity recognition has proven to work to some extent in the complex healthcare domain. For the described use case of automatic documentation, however, this prototype is not sufficient. Tuning the learning rate is just one of many ways in which the model can be further improved, alongside architectural changes in depth and type. Majority voting was shown to be effective for deriving a prediction for a larger context from predictions on smaller steps. This experimental project also led to the insight that overall prediction results may be overshadowed by a single exceptionally good candidate.

Comparing the random split method to the person split, the accuracy of the random split is much better, because individuals execute the same activity slightly differently and the random split has seen every person during training. However, the 44.19% accuracy on an unseen person ranks at about the baseline of the challenge. As the labels for the test dataset are not available, a true comparison is unfortunately not possible.

### Future Work

Three kinds of data are provided by the challenge: motion capture, accelerometer and meditag data. However, our model only uses the motion capture data. Therefore, one logical improvement is to augment the input with the remaining sensor data, which could protect against overfitting and increase accuracy by introducing more potentially important features. Also, since we only used the raw data after normalization, other feature extraction methods could be applied before training.

Secondly, changes could be made to the network architecture. Instead of using only one convolutional layer, a deeper network could be applied to detect higher-level features, potentially combined with more input data. Furthermore, more established deep learning techniques could be used, such as learning rate decay, Xavier and He initialization, and batch normalization. Also, since the data is sequential, which is the domain of RNNs, a combination of a CNN feature extractor and an RNN sequence classifier could yield interesting results.

## Ethical Considerations

As with any machine learning model, common ethical issues arise when it is applied in a real-world domain. These typically concern potential mistakes, training bias and data protection (Osoba & Welser, 2017). Especially in the medical domain, mistakes can have a huge impact on human life. In the case of our model, however, a wrong documentation entry about a nurse activity has a limited capability to directly cause damage. Patient health can be compromised by an omitted or wrong procedure, for example pressure ulcers developing as a result of improper turning and repositioning of bedridden patients. The documentation record can then be used to trace back where the mistake happened. Thus, our model provides support in preventing and retracing clinical incidents without directly impacting patient safety. Additionally, giving the nurse a chance to manually adjust the report after its generation further mitigates the impact of erroneous results. Other benefits of applying the model in nursing include a decreased workload and a more complete activity report for accounting purposes.

Activity recognition in a workplace also adds ethical complications. If not protected, the tool could easily be (mis-)used by managers to monitor workforce activity, intruding on employees' privacy. The insights could result in different treatment of individual employees, even up to terminating their employment. Privacy impairment from the data pool itself is a further issue, since detailed movements are possibly visible from motion capture data, even if it is much harder to process than, say, video. Thus, secure storage of and access to data, as well as proper authorization for access to the model results, are crucial for an implementation.

Regarding bias toward the training data, the risk of discrimination is limited to less accurate results for new employees. As this is known and can be tested beforehand, the impact is minimal. Overall, with proper data protection and sufficient accuracy, the benefits of the technique outweigh the possible risks. From a utilitarian point of view, the use of our model can hence be considered ethically acceptable. A deontological approach also does not object. Since results can be modified, the autonomy and freedom of will are not inhibited by applying the model, as long as privacy is ensured. Furthermore, the universalization of the application contradicts neither its intention nor itself.

## Reference List

Algorithmia 2019, ‘Convolutional Neural Nets in PyTorch’, *Algorithmia Blog*, weblog, 10 April, viewed 25 September 2019, https://blog.algorithmia.com/convolutional-neural-nets-in-pytorch. 

Bevilacqua, A., MacDonald, K., Rangarej, A., Widjaya, V., Caulfield, B. & Kechadi, T. 2018, 'Human Activity Recognition with Convolutional Neural Networks', in *Joint European Conference on Machine Learning and Knowledge Discovery in Databases*, Springer, Cham, pp. 541-552.

Lago, P., Alia, S.S., Takeda, S., Mairittha, T., Mairittha, N., Faiz, F., Nishimura, Y., Adachi, K., Okita, T., Charpillet, F. & Inoue, S. 2019, 'Nurse care activity recognition challenge: summary and results', in *Proceedings of the 2019 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2019 ACM International Symposium on Wearable Computers*, ACM, pp. 746-751.

Lago, P., Alia, S.S., Shamma, A., Mairittha, T., Mairittha, N. & Inoue, Z. 2019, 'Open Lab Nursing Activity Recognition Challenge', *IEEE Data Port*.

Osoba, O.A. & Welser IV, W. 2017, 'An intelligence in our image: The risks of bias and errors in artificial intelligence', Rand Corporation, Santa Monica, Calif.