-
Notifications
You must be signed in to change notification settings - Fork 0
/
preparation.tex
309 lines (251 loc) · 23 KB
/
preparation.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
\chapter{Preparation}
This chapter details the work done before the main implementation of the project was started. It
details the devices chosen to implement this project and the reasons for choosing them. It then
discusses the existing libraries and APIs available for those devices and for the required data
processing. The design decisions are supported by some preliminary results on sensor noise. Finally, it describes software engineering techniques used.
\section{Requirements analysis}
The aim of the project is to classify activities based on accelerometer recordings from a consumer smartwatch and smartphone, and evaluate to what extent the smartwatch is better at helping to classify activities. The requirements to accomplish this can be split into two categories: data collection and data processing requirements.
\subsubsection{Data collection requirements}
\begin{enumerate}
\item Access triaxial readings from accelerometer on both the smartwatch and the smartphone;
\item store this accelerometer data temporarily on the internal memory of each device using suitable data structures;
\item transmit this data from the smartwatch to the smartphone using a suitable protocol, because of memory constraints and to enable transfer to a computer;
\item store the data permanently on the smartphone.
\end{enumerate}
\subsubsection{Data processing requirements}
\begin{enumerate}
\item Parse the data into a manipulatable format;
\item preprocess the data, including filtering and splitting into fixed-length bins;
\item extract features from each bin;
\item train classifiers on the extracted features;
\item test classifiers and record classification accuracy metrics for evaluation.
\end{enumerate}
The remainder of this chapter describes work done to ensure these requirements could be fulfilled.
\section{Hardware devices}
The success of this project depends partly on correct selection and understanding of the devices
used to collect data. Both the smartwatch and the smartphone are required to contain accelerometers accessible to
developers.
Android devices were chosen as Android Wear was the most mature platform for developing with
wearable devices at the time. It runs on the widest variety of devices and provides developer
access to its sensors.
\subsection{Smartphone}
The smartphone chosen for development was the Google Nexus 5. Smartphone technology has
advanced to the point that many Android smartphones are homogeneous with respect to this
project --- they all contain sufficient processing power, internal memory and an
accelerometer capable of recording data.
The Nexus 5 contains a triaxial accelerometer capable of recording measurements $\pm2\si{g}$
on each axis, where $\si{g} \approx 9.81\si{\metre\per\square\second}$. This gives a total possible magnitude of $\sqrt{3 \times (2\mathrm{g})^2} = 2\mathrm{g}\sqrt{3} \approx 34 \si{\metre\per\square\second}$. Many other smartphones are susceptible to this limit and it is not thought that this will be an issue for classification.
Figure~\ref{fig:nexus-5-accelerometer} shows the front face of a Nexus 5 with the axes of the
accelerometer labeled.
\begin{figure}[!th]
\centering
\includegraphics[width=0.3\textwidth]{Nexus_5_Accelerometer}
\caption[A Nexus 5 device, overlaid with the coordinate system used by the Android API.]{A Nexus 5 device, overlaid with the coordinate system used by the Android API. The positive $x$ direction is defined as towards the top of the phone, the positive $y$ direction is defined as towards the right of the phone and the positive $z$ direction comes out of the screen. These directions are all relative to the natural portrait orientation of the device; they do not change when the device is used in horizontal orientation. Original image~\cite{nexus-5} licensed under CC BY 2.5 / direction arrows added.}
\label{fig:nexus-5-accelerometer}
\end{figure}
%TODO: include tech specs of Nexus 5?
\subsection{Smartwatch}
\label{sec:smartwatch}
The smartwatch chosen for development was the Samsung Galaxy Gear Live, running Android Wear.
It pairs to any device running Android 4.4 or higher and communicates over Bluetooth.
Wearable devices that do not run Android typically run either Tizen, an open-source but not
widely adopted operating system, such as the Samsung Galaxy Gear 2, or a proprietary
operating system that does not allow access to the raw accelerometer data, for example the
Jawbone Up. Without access to accelerometer data, they cannot be used to compare classification accuracy with smartphones.
There is more differentiation in smartwatches than there is in smartphones, with them varying
not just in screen size but also in screen format (round or rectangular), battery life,
charging facilities and sensors. Table~\ref{tab:smartwatch-features} presents an overview of possible smartwatch devices.
Though the Sony Smartwatch 3 has the best technical stats, it wasn't yet fully released at the time we acquired the smartwatch. The group has had previous success with Samsung devices and the Gear Live met all the requires I had of the smartwatch for the project.
\begin{table}
\centering
{\tabulinesep=1.2mm
\begin{tabu} to \linewidth { p{2cm} X[l] X[l] X[l] X[l]}
Device & \textbf{Samsung Galaxy Gear Live} & \textbf{Samsung Galaxy Gear 2} &\textbf{ LG G Watch} & \textbf{Sony Smartwatch 3} \\
\hline
Operating System & Android Wear & Tizen & Android Wear & Android Wear \\
\hline
Processor & 1.2 GHz single-core Qualcomm Snapdragon 400 & 1.0 GHz dual-core Exynos 3250 & 1.2 GHz single-core Qualcomm Snapdragon 400 & 1.2 GHz quad-core ARM A7 \\
\hline
Memory & 512 MB RAM & 512 MB RAM & 512 MB RAM & 512 MB RAM \\
\hline
Storage & 4 GB & 4 GB & 4 GB & 4 GB \\
\hline
Sensors & Touchscreen, Accelerometer, Gyroscope, Compass, Heart Rate Monitor & Touchscreen, Accelerometer, Gyroscope, Heart Rate Sensor, 2 MP Camera & Touchscreen, Accelerometer, Gyroscope, Compass & Touchscreen, Accelerometer, Gyroscope, Compass \\
\hline
Radios & Bluetooth 4.0 Low Energy & Bluetooth 4.0 Low Energy & Bluetooth 4.0 Low Energy & Bluetooth 4.0 Low Energy, GPS, NFC, Wi-Fi \\
\hline
Battery & 300 mAh & 300 mAh & 400 mAh & 420 mAh \\
\hline
Notes & & Pairs only with Samsung devices & & \\
\hline
\end{tabu}}
\caption[An overview of possible smartwatch devices]{An overview of possible smartwatch devices. The Samsung Galaxy Gear Live was the device eventually chosen.}
\label{tab:smartwatch-features}
\end{table}
\section{Activities for classification}
\label{sec:activities_for_classification}
Activities can vary in the amount of movement required, their typical body position and the parts of the body which are moving. When picking activities to attempt to classify, I wanted to select a mix of activities that were similar in some characteristics and that varied in others.
The activities selected fit into three broad categories:
\begin{itemize}
\item physical activities requiring whole body motion;
\item activities that primarily require arm motion;
\item low energy upright activities.
\end{itemize}
The activities that I hope to classify are as follows:
\begin{itemize}
\item Physical activities requiring whole body motion:
\begin{itemize}
\item Climbing
\item Cycling
\item Gym cycling
\item Running
\item Stair climbing
\item Walking
\end{itemize}
\item Activities that primarily require arm motion:
\begin{itemize}
\item Computer use
\item Eating
\item Playing fussball
\end{itemize}
\item Low energy upright activities:
\begin{itemize}
\item Gallery perusal
\item Standing
\item Teeth brushing
\end{itemize}
\end{itemize}
\section{Working with accelerometer signals}
\label{sec:intro-sig-processing}
\begin{figure}
\centering
\includegraphics[width=1.0\textwidth]{x_y_z_flat_plot}
\caption[Noisy readings of the X, Y and Z axes]{The X, Y and Z axis readings from an hour long accelerometer recording of the chosen smartphone laying flat on a table. The readings contain noise.}
\label{fig:x_y_z_flat_plot}
\end{figure}
The output from any accelerometer is a time-series representing its acceleration. Effectively extracting information from this time-series is central to the success of this project. Knowledge of signal processing is therefore critical.
It is essential to capture as much of the movement as possible. Conversion from continuous
physical acceleration to a discrete time-series requires sampling. The Nyquist-Shannon sampling theorem states that a signal can be exactly reconstructed from its samples if the sample rate is greater than twice the highest frequency of the signal.
The highest frequency of a physical activity is not well defined. The activities I hope to classify will vary in their periodicity. Some, like walking, will be very periodic, while others, like climbing, will have no period at all. Considering periodic activities like cycling and walking, I anticipate that the frequencies that best describe movement will be present in the 0--5~\si{Hz} range, and so will require sampling at a frequency of at least 10~\si{Hz}.
\subsubsection{Frequency domain analysis}
Much of the analysis of the accelerometer readings will be done in the frequency domain. A time domain signal can be converted into the frequency domain using a Fourier transform.
The discrete Fourier transform of a sequence of N complex numbers $f_0, f_1, \ldots, f_{N-1}$ is the sequence $F$. The $k$-th element of $F$ is defined by:
$$F_k = \sum\limits_{n=0}^{N-1} f_n \cdot \mathrm{e}^{-2\pi \mathrm{i} kn/N}$$
The power spectrum, $PS_k$, of a signal describes how power is distributed over different frequencies. One method of estimating the power spectrum is to take the square of the absolute value of the Fourier transform component:
$$PS_k = \|F_k\|^2$$
\subsubsection{Noise and filtering}
The readings from the accelerometer are subject to noise, exhibited in
Figure~\ref{fig:x_y_z_flat_plot}, which plots readings from the X, Y, and Z axes during an hour long recording with the chosen smartphone laying flat on a table. Note that majority of the acceleration is present in the positive z axis, indicating the phone is face down.
Figure~\ref{fig:noise_histogram} plots the distribution of the magnitude of the acceleration, where the magnitude $\|\mathbf{x}\| = \sqrt{x^2+y^2+z^2}$. The magnitude, which should be a constant $\mathrm{g} \approx 9.81 \si{\meter\per\square\second}$, is subject to normally distributed noise.
\begin{figure}[p]
\centering
\includegraphics[width=0.7\textwidth]{noise_histogram}
\caption[Histogram of the magnitude from the data shown in Figure~\ref{fig:x_y_z_flat_plot}.]{Histogram of the magnitude $\|\mathbf{x}\| = \sqrt{x^2+y^2+z^2}$ from the data shown in Figure~\ref{fig:x_y_z_flat_plot}. The magnitude should measure $ \mathrm{g} \approx 9.81\si{\metre\per\square\second}$, which is marked as the black dashed line on the graph. The noise implies the accelerometer data is imprecise. The mean of the data is less than $\mathrm{g}$. This difference between the recorded mean and the true value is known as the bias of the accelerometer.}
\label{fig:noise_histogram}
\end{figure}
\begin{figure}[p]
\centering
\includegraphics[width=0.5\textwidth]{noise_prob_plot}
\caption[A normal probability plot of the magnitude from the data shown in Figure~\ref{fig:x_y_z_flat_plot}.]{A normal probability plot of the magnitude $\|\mathbf{x}\| = \sqrt{x^2+y^2+z^2}$ from the data shown in Figure~\ref{fig:x_y_z_flat_plot}. Data that is normally distributed will form a straight line when plotted in this way. The straight line of best fit exhibits a coefficient of determination, $R^2 = 0.9993$, indicating is highly likely that this noise is Gaussian distributed.}
\label{fig:noise_prob_plot}
\end{figure}
Figure~\ref{fig:noise_prob_plot} gives a normal probability plot of the same magnitude data. Points on a normal probability plot should form a straight line if they are normally distributed. The straight line of best fit exhibits a coefficient of determination, $R^2 = 0.9993$, which is very close to 1. It is very likely that the noise is normally distributed.
%TOOD: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.probplot.html
Noise can be reduced with the application of a low-pass filter. A low-pass filter attenuates signals with a higher frequency than some cutoff, such as the noise exhibited in the signal. There is more information on low-pass filters in Section~\ref{sec:importing-and-preprocessing}.
%TODO: explain how filters work?
\section{Libraries and APIs}
This project makes use of existing libraries and APIs for the data collection, data handling
and classification aspects of the project. I investigate each library and API early on to ensure I don't encounter any potential show-stopping issues further along.
\subsection{Android Sensor API}
\label{sec:sensor-api}
The Android platform Sensor API\cite{androidsensoreventapi} is implemented using a publisher-subscriber model. Listeners must be registered
to a particular sensor and must implement an \texttt{onSensorChanged()}
method. The \texttt{onSensorChanged()} method is called whenever the sensor reports a new
value. A \texttt{SensorEvent} object is provided, containing a timestamp at which the data was
reported together with the new data.
The rate at which \texttt{onSensorChanged()} is called is `user-suggested'; though it can be
specified by the user, it can also be altered by the Android system. In practice, this means
that the difference in timestamps is not constant but is approximately equal to the specified
delay. A histogram of timestamp differences for a particular 1 hour recording is given in
figure~\ref{fig:timestamp-differences}.
\begin{figure}[!b]
\centering
\includegraphics[width=1.0\textwidth]{timestamp_histogram}
\caption[Histogram of the differences in successive timestamps]{Histogram of the differences in successive timestamps from the data shown in Figure~\ref{fig:x_y_z_flat_plot}.
The sample rate was set to 50~\si{Hz}. A difference of $0.02002\si{s} \approx 49.95~\si{Hz}$ accounted for 75\% of all differences.
Thus the actual sample rate is approximately the user-suggested sample rate.}
\label{fig:timestamp-differences}
\end{figure}
Android provides both acceleration and linear acceleration sensors, related by
$$\textrm{acceleration} = \textrm{linear acceleration} + \textrm{gravity}$$
They each provide a timestamp represented as a 64-bit integer (i.e. a long) and three 32-bit float values representing the
acceleration of each axis in \si{\metre\per\square\second} at that timestamp.
Table~\ref{tab:data-row} gives a graphical representation of the data returned.
\begin{table}
\tabcolsep=0.11cm
\centering
\begin{tabu} to \linewidth {|X[2,c] | X[c] | X[c] | X[c] |}
\hline
Timestamp & X acceleration & Y acceleration & Z acceleration \\
\si{ns} & \si{\metre\per\square\second} & \si{\metre\per\square\second} & \si{\metre\per\square\second} \\
Long & Float & Float & Float \\
2 bytes & 1 byte & 1 byte & 1 byte \\
\hline
\end{tabu}
\caption[Data from the accelerometer sensor]{Data from the accelerometer sensor provided to the \texttt{onSensorChanged()}
method.}
\label{tab:data-row}
\end{table}
Curiously, the timestamp returned as part of the data is documented only as ``The time in nanosecond [sic] at which the event happened'' \cite{androidsensoreventapi}. Further exploration reveals that the timestamp is not defined against any particular zero-base, but rather the time since the device was powered on \cite{androidissuedocumentationbug, androidissuehardwarebug}. The implication of this for the project is that while the timestamp can be relied on for intervals between measurements, it cannot be used between different sets of recordings or across devices.
%TODO: which are the x y z values of the device's accelerometer?
\subsection{Android Wear Data API}
\label{sec:prep-data-api}
As discussed in Section~\ref{sec:smartwatch}, the only radio present in the Samsung Galaxy
Gear Live is Bluetooth. To transfer any recorded data from the watch, it must first be transferred to the paired smartphone.
The Android Wearable Data Layer API allows communication between Android handheld and wearable
devices. It provides three methods of communication between devices:
\begin{description}
\item[Data items] provide data storage with automatic syncing;
\item[Messages] are good for remote procedure calls but do not carry data;
\item[Asset objects] for sending binary blobs of data.
\end{description}
The project makes use of Assets to send the accelerometer data and Messages to indicate that a particular device has started recording. Use of this API is described in Section~\ref{sec:Transmitting_accelerometer_data}.
%TODO: explain this as a diagram with state machine: recording state, recording off state
The data layer synchronises data between the handheld and wearable. To do so, the Wearable Data Layer API requires the registration of a listener service, much like the Sensor API. The listener service listens for data layer events, such as the creation of asset objects or when messages are received.
\section{Choice of tools}
\subsection{Programming languages}
\label{sec:programming_languages}
Java is the native programming language used on Android. Although it is
possible to write code for Android in programming languages other than Java, for example
by using the Android Native Development Kit, doing so would add much complexity for marginal performance gain that is not required. Using a language other than Java would lead to rewrites of many of the Android APIs The Android SDK builds on the principles of Java but is complicated by having to manage interactions with the Android operating system.
XML is Android's standard markup language. All user-interface components are written in XML.
The project includes a user interface on both the phone and the watch to configure and control the recording of data.
Python 3.4 was chosen as the data processing language due to its ease of use and the strength of its data processing, signal processing and machine learning libraries:
\begin{description}
\item[NumPy:] a scientific computing library and the basis for the other three libraries below \cite{van2011numpy}.
\item[Pandas:] extensions to NumPy that enable easier processing of time-series data \cite{pandas}.
\item[SciPy:] signal processing tools and other statistical features \cite{scipy}.
\item[Scikit Learn:] machine learning classifiers and utilities to work with them \cite{scikitlearn}.
\end{description}
All of NumPy, SciPy, Pandas and Scikit Learn are open-source and licensed under the BSD license.
\subsection{Development Environment}
Two IDEs, Android Studio and PyCharm were used for the development of the Android app and the Python data pipeline respectively. Android Studio is available for free from Google, while PyCharm is provided free for educational use by JetBrains. Both include advanced debuggers.
Though the Android SDK contains a device emulator, it runs slowly and cannot simulate sensors. Developing the Android apps is therefore done by connecting them to a computer and running new versions of the code. This also enables access to the device's logs from the development environment. I made extensive use of logging to determine that the program was executing as expected.
\section{Software engineering techniques}
\subsection{Development methodologies}
I used a combination of development methodologies for the project. The data collection apps were developed using a waterfall methodology, while the data processing was developed using an Agile methodology.
Waterfall models are excellent when the end goals of the project are known and can be well specified. The goal of the data collection apps can be easily stated: to write apps for the smartphone and smartwatch that will allow user collection of accelerometer data.
The data processing and machine learning elements of the project required an Agile methodology. The goal here is less well defined --- to classify activities with the greatest accuracy --- and the implementation to achieve the goal is far more experimental.
\subsection{Version control and backups}
I used three separate Git repositories for the data collection code\footnote{https://github.com/geotho/WearableActivityClassificationApp}, the data processing code\footnote{https://github.com/geotho/Wearable-Activity-Classifier-Machine-Learning} and the dissertation\footnote{https://github.com/geotho/Wearable-Activity-Classification-Dissertation} respectively. The Git repositories were synced to GitHub at each commit. Version control allowed me to follow a \emph{implement--test--commit} pattern when writing code.
GitHub also served as one method of backup. Each GitHub repository is publicly accessible such that I can continue implementation even if my primary development computer crashed and I was also locked out of my GitHub account. In addition, I backed up periodically to Dropbox and to an external hard drive. The external hard drive backup retained old copies of files when they were updated. This gives four replications of my entire project, with two of these able to access previous versions of the code.
\section{Summary}
In this section I presented:
\begin{itemize}
\item information on the smartphone and smartwatch used;
\item activities I seek to classify;
\item information on the accelerometer signals;
\item preliminary results on sensor noise and timestamp inconsistency;
\item details of key APIs used including the Android Sensor API and the Android Wear Data API;
\item development tools and software engineering techniques.
\end{itemize}