Skip to content

jranaraki/wearable_device_classification

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

27 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Authors

Introduction

This document walks you through the all the required steps to extract, clean and process the generated data sourcing from Ethica, GENEActive (Original), Apple Watch (S2), Fitbit (Charge 2), and SenseDoc.

Preliminaries

There are six (four + one) sub-folders named as follows under a folder called data:

  • data
    • Ethica
    • GENEActive
    • HealthData (for both Apple Watch and Fitbit)
    • SenseDoc
    • Merged

The Merged folder is the one which keeps participant specific merged file, a file containing all participants data with NAs, a file containing all participants data with no NAs in class labels and a file containing all participants data with no NAs at all. The data folder also contains a file called intervals.csv which stores information about each participants as follows:

  • herox stores participants alias names
  • kit is the package number
  • phone is phone id
  • watch is Apple watch id
  • fitbit is Fitbit id
  • geneactiv is GENEActiv id
  • snesedoc is SenseDoc id
  • start is start date and time (in ####-##-## ##:##:## format)
  • end is end date and time (in ####-##-## ##:##:## format)
  • userid is a unique id for each participant
  • wrist is 1 if Apple Watch and GENEActiv are on the same wrist, otherwise 0
  • age, gender, weight and height are demographic data
  • street, city, postal are address information for each participant.

Note: This file should be kept updated throughout the experiment.

Each device's folder should contain a folder named with userid (e.g. 301), and all the extracted data from each device should be stored in each device's folder under userid, accordingly (see below example).

  • data
    • Ethica
      • 301
        • abc.csv
      • 302
        • def.csv
      • ...
    • GENEActive
      • 301
        • 123.csv
      • 302
        • 456.csv
      • ...
    • ...

Each device's folder contains an R program to clean and prepare the collected data for the main processing step. These programs are shown in the following:

  • EthicaDataPrep.R
  • GENEActivDataPrep.R
  • HealthDataPrep.R
  • SenseDocDataPrep.R

Steps

The whole data processing is divided into four steps as follows:

  1. Generating labels
  2. Preparing and labeling data
  3. Merging all data files
  4. Classifying the results

Each step is explained in detail in the following.

1. Generating labels

After storing all the extracted data files of each device for a participant, open GENEActivDataPrep.R program in R, update the timezone, path, intrPath, and run the program. The results will be stored as an CSV file under each participant id's folder ending with _labeled.csv, containing cleaned and labeled GENEActiv data.

Info: More information on setting the timezone can be found here.

2. Preparing and labeling data

The labels are ready and we can run the following programs after updating timezone and path variables for each one.

  • EthicaDataPrep.R
  • HealthDataPrep.R
  • SenseDocDataPrep.R

3. Merging all data files

Under Merged folder open merger.R, update timezone and run it to create a file called finalData.csv under each participant folder. The generated file contains all the processed data files for each device in second-level from start to end indicated in intervals.csv. The mergeAllMerged.R creates three files under Merged folder, called mergedData.csv, mergedDataNoNAClass.csv and mergedDataNoNA.csv which contains all participants data with NAs, all participants data with no NAs in class labels and all participants data with no NAs at all, respectively.

4. Classifying the results

Weka is a ready-to-use machine learning package which has been employed to apply a set of classification methods on the resulting dataset (i.e. mergedDataNoNA.csv). Based on the desired setup of data processing, a subset of features/attributes/columns of mergedDataNoNA.csv should be selected and stored in a new CSV file (e.g. subMergedDataNoNA.csv). For this experiment, we used Weka version 3.6.15.

Weka main window

We used Rotation Forest to apply principal component analysis (PCA) and decision tree (J48) classifier to the data using Explorer.

Explorer window

To do so, open Weka and click on Explorer, click on Open file... and browse for subMergedDataNoNA.csv.

Open subMergedDataNoNA.csv in Explorer window

Then click on Classify tab and choose Rotation Forest under classifier>meta category.

Rotation Forest

In the Test Options choose Percentage split and modify the percentage to 70%. This way the data is divided into 70% for training and 30% for testing and validation. Click on start button and the results will be shown in Classifier output window as follows.

Results

About

Code for cleaning, imputation, and classification of wearable device data

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • R 100.0%