# Part VI Time Series Classification

## Overview

This part focuses on the real-world time series classification problem of activity recognition from multivariate accelerometer data recorded from a smartphone, and on how to develop machine learning and deep learning classification models to address it. The tutorials in this part do not seek to demonstrate the best way to solve the problem; instead, the dataset provides a context in which each of the specific methods can be demonstrated. As such, the performance of each method on the dataset is not compared directly. After reading the chapters in this part, you will know:

- Which deep learning models, and which general configurations of them, recent research reports as state-of-the-art for human activity recognition (Chapter 21).
- How to load, summarize and plot a standard human activity recognition dataset comprised of accelerometer data recorded from a smartphone (Chapter 22).
- How to develop nonlinear and ensemble machine learning models from accelerometer data with domain-specific engineered features (Chapter 23).
- How to develop and evaluate a suite of Convolutional Neural Network models for human activity recognition from accelerometer data (Chapter 24).
- How to develop and evaluate a suite of Long Short-Term Memory Neural Network models for human activity recognition from accelerometer data (Chapter 25).

# Chapter 21 Review of Deep Learning Models for Human Activity Recognition

In this tutorial, you will discover the problem of human activity recognition and the deep learning methods that are achieving state-of-the-art performance on this problem. After reading this tutorial, you will know:
- Activity recognition is the problem of predicting the movement of a person, often indoors, based on sensor data, such as an accelerometer in a smartphone.
- Streams of sensor data are often split into subsequences called windows, and each window is associated with a broader activity; this is called a sliding window approach.
- Convolutional neural networks and long short-term memory networks, and perhaps both together, are best suited to learning features from raw sensor data and predicting the associated movement.

## 21.1 Overview
This tutorial is divided into five parts; they are: 
1. Human Activity Recognition
2. Benefits of Neural Network Modeling
3. Supervised Learning Data Representation 
4. Convolutional Neural Network Models
5. Recurrent Neural Network Models

## 21.2 Human Activity Recognition
Human activity recognition (HAR) is a challenging problem because there is no obvious or direct way to relate the recorded sensor data to specific human activities, and because each subject may perform an activity with significant variation, which in turn produces variation in the recorded sensor data. The intent is to record sensor data and corresponding activities for specific subjects, fit a model to this data, and generalize the model to classify the activity of new, unseen subjects from their sensor data.

## 21.3 Benefits of Neural Network Modeling

Traditionally, methods from the field of signal processing were used to analyze and distill the collected sensor data. Such methods were used for feature engineering: creating domain-specific, sensor-specific, or signal processing-specific features and views of the original data. Statistical and machine learning models were then trained on the processed version of the data. A limitation of this approach is the signal processing and domain expertise required to analyze the raw data and engineer the features needed to fit a model. This expertise would be required for each new dataset or sensor modality. In essence, the approach is expensive and does not scale.

However, in most daily HAR tasks, those methods may heavily rely on heuristic handcrafted feature extraction, which is usually limited by human domain knowledge. Furthermore, only shallow features can be learned by those approaches, leading to undermined performance for unsupervised and incremental tasks. Due to those limitations, the performances of conventional [pattern recognition] methods are restricted regarding classification accuracy and model generalization.

— Deep Learning for Sensor-based Activity Recognition: A Survey, 2018.

## 21.4 Supervised Learning Data Representation

If the data is recorded at 8 Hz, there will be eight rows of data for each second of elapsed time spent performing an activity. We may choose to have one window represent one second of data; that means eight rows of data for an 8 Hz sensor. If we have x, y, and z data, we have three variables. Therefore, a single window of data would be a two-dimensional array with eight time steps and three features. One window would represent one sample.

Neural network models for sequence data typically expect a collection of such windows as a three-dimensional array with the dimensions:

[samples, timesteps, features]

One minute of data would represent 480 sensor data points, or 60 windows of eight time steps. Ten minutes of data would represent 4,800 data points, or 600 windows of data.

Our example of 10 minutes of accelerometer data recorded at 8 Hz would be summarized as a three-dimensional array with the dimensions:
[600, 8, 3]
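
The arithmetic above can be checked with a short sketch. It assumes NumPy and uses random numbers as a stand-in for a real recording; the names `stream` and `windows` are illustrative and do not come from any dataset or library.

```python
import numpy as np

SAMPLE_RATE_HZ = 8          # rows recorded per second
WINDOW_SECONDS = 1          # each window covers one second
N_FEATURES = 3              # x, y and z axes
DURATION_SECONDS = 10 * 60  # ten minutes of recording

# Stand-in for a real recording: one row per observation, one column per axis.
rng = np.random.default_rng(seed=1)
stream = rng.normal(size=(DURATION_SECONDS * SAMPLE_RATE_HZ, N_FEATURES))

# Split the stream into non-overlapping windows of eight time steps each.
timesteps = SAMPLE_RATE_HZ * WINDOW_SECONDS
n_windows = stream.shape[0] // timesteps
windows = stream[: n_windows * timesteps].reshape(n_windows, timesteps, N_FEATURES)

print(stream.shape)   # (4800, 3)
print(windows.shape)  # (600, 8, 3)
```

Any rows left over after the last complete window are dropped by the slice; whether to drop or pad them is a design choice that this sketch does not try to settle.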

There is some risk that splitting the stream of sensor data into windows produces windows that miss the transition from one activity to another. As such, it was traditionally common to split data into overlapping windows; with a 50% overlap, for example, the first half of each window contains the observations from the last half of the previous window.
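
A 50% overlap can be sketched as follows. This is a minimal illustration that assumes NumPy; the helper `sliding_windows` is a hypothetical name introduced here, and it assumes the stream is at least one window long.

```python
import numpy as np

def sliding_windows(stream, timesteps, overlap=0.5):
    """Split a [rows, features] array into windows of `timesteps` rows.

    `overlap` is the fraction of each window shared with the previous one.
    """
    step = max(1, int(timesteps * (1 - overlap)))
    starts = range(0, stream.shape[0] - timesteps + 1, step)
    return np.stack([stream[s : s + timesteps] for s in starts])

# Three seconds of 8 Hz data with three axes; values are just row counters.
stream = np.arange(24 * 3).reshape(24, 3)
windows = sliding_windows(stream, timesteps=8, overlap=0.5)

print(windows.shape)  # (5, 8, 3)
# The first half of window 1 repeats the last half of window 0.
print(np.array_equal(windows[1][:4], windows[0][4:]))  # True
```

With the overlap, the same 24 rows yield five windows instead of three, so overlapping also increases the number of training samples.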

## 21.5 Convolutional Neural Network Models

![image.png](attachment:image.png)

When applied to time series classification like HAR, a CNN has two advantages over other models: **local dependency** and **scale invariance**. Local dependency means that nearby signals in HAR are likely to be correlated, while scale invariance means that the learned features remain useful when an activity is performed at a different pace or frequency.
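
Local dependency is what a one-dimensional convolution exploits: each output value is computed from a few adjacent time steps only. The sketch below assumes NumPy, a single accelerometer axis, and hand-picked filter weights; in a real CNN the weights are learned, and the signal values here are made up for illustration.

```python
import numpy as np

# One axis of a short, made-up accelerometer trace.
signal = np.array([0, 0, 1, 2, 1, 0, 0, 0, 3, 0, 0, 0], dtype=float)

# Hypothetical filter of width 3; a trained CNN would learn these weights.
kernel = np.array([0.25, 0.5, 0.25])

# 'valid' mode: every output combines exactly three neighbouring time steps.
feature_map = np.convolve(signal, kernel, mode="valid")

print(feature_map.shape)  # (10,)
print(feature_map[2])     # 1.5, computed from the neighbours [1, 2, 1]
```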

![image.png](attachment:image.png)

![image.png](attachment:image.png)

## 21.6 Recurrent Neural Network Models

![image.png](attachment:image.png)

Figure 21.4: Depiction of LSTM RNN for Activity Recognition. Taken from Deep Recurrent Neural Networks for Human Activity Recognition.


It may be more common to use an LSTM in conjunction with a CNN on HAR problems, in a CNN-LSTM model or ConvLSTM model. 

**This is where a CNN model is used to extract the features from a subsequence of raw sample data, and output features from the CNN for each subsequence are then interpreted by an LSTM in aggregate.** An example of this is in the 2016 paper by Francisco Javier Ordonez and Daniel Roggen titled Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition.

The figure below taken from the paper makes the architecture clearer. Note that layers 6 and 7 in the image are in fact LSTM layers.

![image.png](attachment:image.png)

Figure 21.5: Depiction of CNN-LSTM Model for Activity Recognition. Taken from Deep Convolutional and LSTM Recurrent Neural Networks for Multimodal Wearable Activity Recognition.
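
To make the idea of subsequences concrete, the sketch below shows one way the data for a CNN-LSTM could be arranged. It assumes NumPy and the earlier example of 600 windows of eight time steps; splitting each window into two subsequences of four steps is an arbitrary choice for illustration, not the configuration used in the paper.

```python
import numpy as np

# [samples, timesteps, features], filled with counters so rows are traceable.
windows = np.arange(600 * 8 * 3).reshape(600, 8, 3)

# Assumed split: each 8-step window becomes 2 subsequences of 4 steps.
n_subsequences, subsequence_length = 2, 4

# [samples, subsequences, timesteps per subsequence, features]
cnn_lstm_input = windows.reshape(
    windows.shape[0], n_subsequences, subsequence_length, windows.shape[2]
)

print(cnn_lstm_input.shape)  # (600, 2, 4, 3)
# The second subsequence of a window is its last four time steps.
print(np.array_equal(cnn_lstm_input[0, 1], windows[0, 4:8]))  # True
```

In this arrangement the CNN would read each block of four time steps separately, and the LSTM would read the resulting sequence of two feature vectors per window.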


## 21.7 Extensions

This section lists some ideas for extending the tutorial that you may wish to explore.
- Summarize Problem. Summarize the problem of human activity recognition in 1-3 lines.
- List Methods. Create a list of the methods that are known to perform well for human activity recognition.
- Example Applications. List 3 examples where models for predicting human activity from sensor data may be useful.

## 21.9 Summary
In this tutorial, you discovered the problem of human activity recognition and the use of deep learning methods that are achieving state-of-the-art performance on this problem. Specifically, you learned:
- Activity recognition is the problem of predicting the movement of a person, often indoors, based on sensor data, such as an accelerometer in a smartphone.
- Streams of sensor data are often split into subsequences called windows, and each window is associated with a broader activity; this is called a sliding window approach.
- Convolutional neural networks and long short-term memory networks, and perhaps both together, are best suited to learning features from raw sensor data and predicting the associated movement.