# Literature review
## Contents
- Physical activity recognition from accelerometer data using multi-scale ensemble method (Zheng, Wong, Guan, and Trost 2013)
- A comprehensive study of activity recognition using accelerometers (Twomey et al. 2018)
- Summary



# Physical activity recognition from accelerometer data using multi-scale ensemble method (Zheng, Wong, Guan, and Trost 2013)
- simple approach: deployed non-overlapping windows and used them as feature vectors
- improved approach: multi-scale (no single window size) using ensemble method
- model goal: classify the activity of a 10 second time window

In this study, they trained a model to assign a single activity label to an entire time series. This differs from my task, which is to classify the activity in 1-second intervals, given accelerometer readings every 0.1 seconds.
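
To make the difference concrete, here is a minimal sketch of how my data could be cut into one window per label. It assumes readings arrive at 10 Hz as `(x, y, z)` tuples with exactly one label per second; the function name and the toy data are my own placeholders, not anything from the paper.

```python
def split_into_label_windows(samples, labels, samples_per_label=10):
    """Pair each label with the block of accelerometer samples it covers.

    samples: list of (x, y, z) readings taken every 0.1 s (assumed 10 Hz).
    labels:  list with one activity label per second.
    """
    expected = len(labels) * samples_per_label
    if len(samples) != expected:
        raise ValueError(f"expected {expected} samples, got {len(samples)}")
    windows = []
    for i, label in enumerate(labels):
        start = i * samples_per_label
        windows.append((samples[start:start + samples_per_label], label))
    return windows


# Toy data: 3 seconds of readings, one label per second.
toy_samples = [(0.0, 0.0, 1.0)] * 30
toy_labels = ["standing", "walking", "walking"]
toy_windows = split_into_label_windows(toy_samples, toy_labels)
```

Each element of `toy_windows` is a `(samples, label)` pair, so the paper's setting (one label per whole series) corresponds to treating every 1-second block as its own tiny time series.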

To adapt the lessons of this paper to my model, I need to deal with the fact that I cannot build full 10-second windows around the first few and last few labels in my data. However, the idea of using several window sizes to pick up patterns that occur at different time scales is powerful.
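
One possible way around the edge problem is to clip the window at the start and end of the recording rather than dropping those labels. The sketch below assumes a 10 Hz recording with one label per second and a window centred on the label; `context_window` and its parameters are hypothetical names of mine, and the clipping strategy is my own guess, not something the paper proposes.

```python
def context_window(samples, label_index, window_seconds, rate_hz=10):
    """Return the samples in a window of up to `window_seconds` centred on one label.

    The window is clipped at the start and end of the recording, so labels
    near the edges get a shorter window instead of being dropped.
    """
    centre_start = label_index * rate_hz      # first sample of this label
    centre_end = centre_start + rate_hz       # one past its last sample
    extra = (window_seconds - 1) * rate_hz    # samples to add around the label
    before = extra // 2
    after = extra - before
    start = max(0, centre_start - before)
    end = min(len(samples), centre_end + after)
    return samples[start:end]


# Toy recording: 20 seconds at 10 Hz, each sample is just its index.
recording = list(range(200))
middle = context_window(recording, label_index=10, window_seconds=5)
edge = context_window(recording, label_index=0, window_seconds=5)
```

Here `middle` holds the full 50 samples while `edge` holds only 30. Centring the window uses samples recorded after the label, which is fine for offline classification but would not work for real-time detection.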

I can also extend the 'bag of features' with frequency-domain features (investigated in more detail under Twomey et al. 2018).
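
For reference, the time-domain side of a bag of features (per-axis summary statistics plus the correlation between each pair of axes) could be sketched as follows. The four statistics here are my own illustrative choice and do not reproduce the paper's set of 18; all function names are hypothetical.

```python
import math


def axis_stats(values):
    """A few summary statistics for one axis of one window."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    return {"mean": mean, "std": math.sqrt(variance),
            "min": min(values), "max": max(values)}


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return cov / (norm_a * norm_b)


def bag_of_features(window):
    """Flatten per-axis statistics and pairwise axis correlations into one dict.

    window: list of (x, y, z) samples.
    """
    axes = dict(zip("xyz", zip(*window)))
    features = {}
    for name, values in axes.items():
        for stat, value in axis_stats(values).items():
            features[f"{name}_{stat}"] = value
    for a, b in (("x", "y"), ("x", "z"), ("y", "z")):
        features[f"corr_{a}{b}"] = pearson(axes[a], axes[b])
    return features


toy_window = [(0.0, 1.0, 2.0), (1.0, 2.0, 1.0), (2.0, 3.0, 0.0)]
toy_features = bag_of_features(toy_window)
```

This gives 15 features per window (4 statistics on each of 3 axes, plus 3 correlations). Frequency-domain features would simply add more keys to the same dict.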

### Datasets
- 30Hz sampling rates
- each time series has a single label for the activity

### Past studies
- reduced the dimensionality of the time series to a set of key statistics and used that to train their supervised learning models
    - k-NN algorithms on these reduced-dimensionality feature vectors worked well, but not on repetitive patterns
- Hidden Markov Models (HMM) appeared to work well for segmenting a time series into different activities, and not just classifying a single time series as one activity

### Multi-scale ensemble method
- features of activities occur at different time scales, so models in the ensemble are tuned to different time scales
- each window has a group of features (18) calculated on each axis, as well as the correlation between each pair of axes
- regularisation is used to prevent overfitting
- this is motivated by the 'bag of features' approach: accelerometer data is turned into a feature vector by computing a large number of potentially useful summary statistics, and the model relies on regularisation to avoid overfitting on this many features
- window sizes are the integer values from 1 to 10 seconds inclusive
    - the best-performing ensemble used an SVM for each window size
        - each model voted on the activity of the time series based on its predictions for the individual time frames
    - the final activity is then decided by a majority vote across the ensemble members

- the largest time window used was 10 seconds, which was found to be an effective tradeoff between collecting enough data to make a prediction and keeping real-time detection fast
    - **this doesn't fit my dataset well, which has 10 samples per label, effectively making it a set of 1-second time series, each with 10 samples and a single label**
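
The two-level voting described above could be sketched like this. It reflects my reading of the scheme rather than the paper's exact procedure, and to keep it dependency-free the SVMs are replaced by hypothetical threshold classifiers on a one-dimensional signal; `ensemble_predict` and `make_threshold_classifier` are names I made up.

```python
from collections import Counter


def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def ensemble_predict(series, models):
    """Two-level vote over a multi-scale ensemble.

    series: list of samples for one recording.
    models: dict mapping window length (in samples) to a classifier,
            where a classifier is any function window -> label.
    """
    member_votes = []
    for window_len, classify in sorted(models.items()):
        # Non-overlapping windows of this member's size; a short tail is dropped.
        frames = [series[i:i + window_len]
                  for i in range(0, len(series) - window_len + 1, window_len)]
        if not frames:
            continue  # series is shorter than this member's window
        frame_labels = [classify(frame) for frame in frames]
        member_votes.append(majority(frame_labels))  # level 1: within a member
    if not member_votes:
        raise ValueError("series is shorter than every window size")
    return majority(member_votes)                    # level 2: across members


# Hypothetical stand-ins for the per-scale SVMs (assumption, not the paper's models).
def make_threshold_classifier(threshold):
    def classify(frame):
        return "walking" if sum(frame) / len(frame) > threshold else "standing"
    return classify


toy_models = {10: make_threshold_classifier(0.5),
              20: make_threshold_classifier(0.4),
              30: make_threshold_classifier(2.0)}
toy_series = [1.0] * 60
toy_prediction = ensemble_predict(toy_series, toy_models)
```

In this toy run two of the three members vote for walking, so the ensemble predicts walking. For my data the same structure would apply, but with window lengths limited to what a 1-second label allows.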

### Experiments
- each model trained on training set and tuned on validation set
- models were compared on the average macro-F1 (the mean of the per-class F1 scores) across different training-validation-testing splits
- they tested the following models:
    - 1NN on the raw accelerometer data with different distance metrics
    - ANN with a single hidden layer, tuning the number of hidden units and the weight decay (which is how regularisation is incorporated into an ANN)
        - using bag of features on a window of 10 seconds
    - a single SVM
        - using bag of features on a window of 10 seconds
        - making this equivalent to the member of the ensemble trained on the 10 second window
    - the ensemble SVM 
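
As a reminder of how the macro-F1 metric used in these experiments works, here is a small worked sketch. The labels are made-up toy data, and `macro_f1` is my own helper rather than a library function.

```python
def macro_f1(true_labels, predicted_labels):
    """Unweighted mean of the per-class F1 scores."""
    classes = sorted(set(true_labels) | set(predicted_labels))
    pairs = list(zip(true_labels, predicted_labels))
    scores = []
    for cls in classes:
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = sum(t == cls and p != cls for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN). The denominator is never zero because
        # every class occurs in at least one of the two label lists.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


y_true = ["stand", "stand", "walk", "walk", "stairs", "stairs"]
y_pred = ["stand", "walk", "walk", "walk", "stairs", "stand"]
score = macro_f1(y_true, y_pred)
```

The per-class scores here are 0.5 for `stand`, 0.8 for `walk`, and 2/3 for `stairs`, so the macro-F1 is their plain average (about 0.656). Because every class counts equally, a rare activity that is classified badly pulls the score down as much as a common one.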

### Results
- the ensemble SVM performed the best
- the members of the ensemble were also assessed individually, revealing that different time scales classify different activities better
- the time scales that worked best for the different activities could inform the window sizes I use for the activities I need to classify: standing, walking, and going up and down stairs
    - however, walking was classified best with 2, 8, or 9 second windows depending on the dataset, which doesn't give much insight into which is optimal for my problem
    - standing worked best with a smaller window of 1 second or slightly more
    - only one of the datasets had going up and down stairs, with the top performing windows being 3 and 2 seconds respectively
        - this works to my advantage because it suggests smaller time scales are required, which is all I have access to for my particular dataset


# A comprehensive study of activity recognition using accelerometers (Twomey et al. 2018)
- the accuracy of classification is limited by the dataset itself
- accuracy increases with the context that can be provided to the model, which can come in the form of:
    - increased sampling frequency
    - increased window size
    - modelling temporal dependence (structured models)
- the paper recommends sequential classifiers
    - my previous attempts and those from Zheng et al. 2013 fail to capture the information carried by the order of the data in the time series, which is important as it provides context to the model

## Structured vs. unstructured models
- when we just feed the raw accelerometer data or computed summary statistics to a model, we are assuming that the feature vectors are iid (independent and identically distributed), which ignores the temporal dependence (sequence) of the data
    - how well this assumption holds is important
- in some contexts both structured (e.g. HMMs) and unstructured (e.g. SVMs) models perform comparably, while the latter are often computationally cheaper
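
To see what a structured model adds, here is a toy comparison between classifying each observation independently and decoding the whole sequence with an HMM using the Viterbi algorithm. Every number below (start, transition, and emission probabilities) is invented for illustration, the observations are a hypothetical "low"/"high" motion level, and all probabilities are assumed to be non-zero so the logs are defined.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for an HMM, computed in log space."""
    # best[t][s] = log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + math.log(trans_p[r][s]))
            scores[s] = (best[-1][prev] + math.log(trans_p[prev][s])
                         + math.log(emit_p[s][obs]))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    state = max(states, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back[1:]):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Made-up model: activities tend to persist from one second to the next.
states = ["standing", "walking"]
start_p = {"standing": 0.5, "walking": 0.5}
trans_p = {"standing": {"standing": 0.9, "walking": 0.1},
           "walking": {"standing": 0.1, "walking": 0.9}}
emit_p = {"standing": {"low": 0.8, "high": 0.2},
          "walking": {"low": 0.3, "high": 0.7}}

observations = ["low", "low", "low", "high", "low", "low"]
independent = [max(states, key=lambda s: emit_p[s][o]) for o in observations]
decoded = viterbi(observations, states, start_p, trans_p, emit_p)
```

The independent (iid) classifier flips to walking for the single "high" reading, while the sequence model keeps standing throughout because one blip is not enough to outweigh the cost of switching activity twice.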

## Feature windows
- the optimal window duration depends on the placement of the accelerometer
- features used can be split into time and frequency domain:
    - frequency domain examples: entropy, energy, and coherence (correlation in the frequency domain)
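
Two of the frequency-domain features above, energy and entropy, could be computed for a single axis roughly as follows. Definitions vary between papers, so the exact normalisation is an assumption of mine, and a naive DFT is used only to keep the sketch dependency-free (an FFT library would be the practical choice).

```python
import cmath
import math


def power_spectrum(signal):
    """Power at each non-negative frequency bin from a naive O(n^2) DFT.

    The mean is removed first so the zero-frequency (offset) term
    does not dominate.
    """
    n = len(signal)
    mean = sum(signal) / n
    centred = [v - mean for v in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centred))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_energy(signal):
    """Sum of the one-sided power spectrum divided by the window length."""
    return sum(power_spectrum(signal)) / len(signal)


def spectral_entropy(signal):
    """Shannon entropy (in bits) of the normalised power spectrum."""
    power = power_spectrum(signal)
    total = sum(power)
    if total == 0:
        return 0.0  # constant signal: no spectral content
    probs = [p / total for p in power if p > 0]
    return -sum(p * math.log2(p) for p in probs)


# One second at 10 Hz: a single 1 Hz sine versus a mix of 1 Hz and 3 Hz.
pure = [math.sin(2 * math.pi * t / 10) for t in range(10)]
mixed = [math.sin(2 * math.pi * t / 10) + math.sin(2 * math.pi * 3 * t / 10)
         for t in range(10)]
pure_entropy = spectral_entropy(pure)
mixed_entropy = spectral_entropy(mixed)
```

The single sine has all its power in one bin, so its entropy is close to 0, while the two-frequency mix spreads power evenly over two bins and has an entropy of about 1 bit. With only 10 samples per label, my data gives just 6 frequency bins per window, which limits how much these features can resolve.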

## Summary 
The level of academic ML in this paper is beyond my abilities at the moment. There isn't much concrete information I can apply to my own models, since I don't think it's appropriate for me to copy the more complicated models from the paper without properly understanding them (which I would need in order to work out how to translate them to my data and tune them).

Rather, I believe I can draw the following inferences for my own use:
- I should add frequency domain features to my 'bag of features' approach
- A non-sequential model with sufficient context can perform as well as a sequential model
- If time permits, I can do some research into sequential models and try one in my project and compare

# Summary
- try HMM for segmenting the time series into different activities (what my model needs to do)
- try bag of features approach on the time series
    - pair this with regularisation to guard against overfitting
- try an ensemble method that uses different time scales
    - this is because patterns that discern activities can occur across different time scales
- in Zheng et al. 2013, standing and going up/down stairs were most accurately classified using windows of 1-3 seconds
    - whereas walking appeared to work best with 2, 8, or 9 second windows depending on the dataset
- however, Zheng et al. 2013 focused on classifying whole time series with a single label, and they discuss the need for segmenting models, which is closer to what my problem needs

- sensor placement affects what features are most effective 

