# mHealth Dataset
1) Experimental Setup

The collected dataset comprises body motion and vital signs recordings from ten volunteers with diverse profiles, captured while they performed 12 physical activities (Table 1). 
Shimmer2 [BUR10] wearable sensors were used for the recordings. The sensors were respectively placed on the subject's chest, right wrist and left ankle and 
attached by using elastic straps (as shown in the figure in attachment). The use of multiple sensors permits us to measure the motion experienced by diverse body parts, 
namely, the acceleration, the rate of turn and the magnetic field orientation, thus better capturing the body dynamics. The sensor positioned
on the chest also provides 2-lead ECG measurements which are not used for the development of the recognition model but rather collected for future work purposes.
This information can be used, for example, for basic heart monitoring, checking for various arrhythmias or looking at the effects of exercise on the ECG. 
All sensing modalities are recorded at a sampling rate of 50 Hz, which is considered sufficient for capturing human activity. Each session was recorded using a video camera.
This dataset is found to generalize to common activities of daily living, given the diversity of body parts involved in each one (e.g., frontal elevation of arms vs.
knees bending), the intensity of the actions (e.g., cycling vs. sitting and relaxing) and their execution speed or dynamicity (e.g., running vs. standing still). The activities
were collected in an out-of-lab environment with no constraints on the way these must be executed, with the exception that the subject should try their best when executing them.

2) Activity set

The activity set is listed in the following:

- L1: Standing still (1 min) 
- L2: Sitting and relaxing (1 min) 
- L3: Lying down (1 min) 
- L4: Walking (1 min) 
- L5: Climbing stairs (1 min) 
- L6: Waist bends forward (20x) 
- L7: Frontal elevation of arms (20x)
- L8: Knees bending (crouching) (20x)
- L9: Cycling (1 min)
- L10: Jogging (1 min)
- L11: Running (1 min)
- L12: Jump front & back (20x)

NOTE: The brackets give the number of repetitions (Nx) or the duration of the exercise (min).

3) Dataset files

The data collected for each subject is stored in a different log file: 'mHealth_subject<SUBJECT_ID>.log'.
Each file contains the samples (by rows) recorded for all sensors (by columns).
The labels used to identify the activities follow the numbering presented in Section 2 (e.g., the label for walking is '4').
The meaning of each column is detailed next:

- Column 1: acceleration from the chest sensor (X axis)
- Column 2: acceleration from the chest sensor (Y axis)
- Column 3: acceleration from the chest sensor (Z axis)
- Column 4: electrocardiogram signal (lead 1) 
- Column 5: electrocardiogram signal (lead 2)
- Column 6: acceleration from the left-ankle sensor (X axis)
- Column 7: acceleration from the left-ankle sensor (Y axis)
- Column 8: acceleration from the left-ankle sensor (Z axis)
- Column 9: gyro from the left-ankle sensor (X axis)
- Column 10: gyro from the left-ankle sensor (Y axis)
- Column 11: gyro from the left-ankle sensor (Z axis)
- Column 12: magnetometer from the left-ankle sensor (X axis)
- Column 13: magnetometer from the left-ankle sensor (Y axis)
- Column 14: magnetometer from the left-ankle sensor (Z axis)
- Column 15: acceleration from the right-lower-arm sensor (X axis)
- Column 16: acceleration from the right-lower-arm sensor (Y axis)
- Column 17: acceleration from the right-lower-arm sensor (Z axis)
- Column 18: gyro from the right-lower-arm sensor (X axis)
- Column 19: gyro from the right-lower-arm sensor (Y axis)
- Column 20: gyro from the right-lower-arm sensor (Z axis)
- Column 21: magnetometer from the right-lower-arm sensor (X axis)
- Column 22: magnetometer from the right-lower-arm sensor (Y axis)
- Column 23: magnetometer from the right-lower-arm sensor (Z axis)
- Column 24: Label (0 for the null class)

*Units: acceleration (m/s^2), gyroscope (deg/s), magnetic field (local), ECG (mV)
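
As a quick reference for the layout above, the sketch below builds one name per column and maps the 1-based column numbers of this description to the 0-based indices that pandas assigns when a file is read without a header. The column names are illustrative assumptions; the dataset itself ships no header.

```python
# Hypothetical column names for the 24 columns described above (not part of the dataset).
axes = ["x", "y", "z"]
columns = (
    [f"chest_acc_{a}" for a in axes]
    + ["ecg_lead_1", "ecg_lead_2"]
    + [f"ankle_{s}_{a}" for s in ("acc", "gyro", "mag") for a in axes]
    + [f"arm_{s}_{a}" for s in ("acc", "gyro", "mag") for a in axes]
    + ["label"]
)

# Column N in the description is index N - 1 in a 0-based table.
index_of = {name: i for i, name in enumerate(columns)}
arm_acc = [index_of[f"arm_acc_{a}"] for a in axes]
print(len(columns), arm_acc, index_of["label"])
```

The right-lower-arm acceleration indices and the label index obtained this way are the ones used for the sensor-location lists in the code that follows.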



In [1]:
# 0-based column indices per sensor location: acceleration X, Y, Z, then the label column (23)
mhealth_chest_location = [0, 1, 2, 23]
mhealth_left_ankle_location = [5, 6, 7, 23]
mhealth_right_wrist_location = [14, 15, 16, 23]

In [26]:
import os
import sys
import numpy as np
import pandas as pd
import pickle as pkl
import matplotlib.pyplot as plt
from sklearn.utils import shuffle
import random

In [10]:
mhealth_folder = "../../../Datasets/MHEALTHDATASET/"

In [11]:
os.listdir(mhealth_folder)

['mHealth_subject1.log',
 'mHealth_subject10.log',
 'mHealth_subject2.log',
 'mHealth_subject3.log',
 'mHealth_subject4.log',
 'mHealth_subject5.log',
 'mHealth_subject6.log',
 'mHealth_subject7.log',
 'mHealth_subject8.log',
 'mHealth_subject9.log',
 'README.txt']

In [21]:
subject_ids = np.arange(1, 11, 1)
subject_ids, len(subject_ids)

(array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]), 10)

In [63]:
# Randomly select half of the subject ids for the source dataset; the remaining half form the target dataset
source_ids = subject_ids[random.sample(range(len(subject_ids)), len(subject_ids)//2)]
target_ids = [r for r in subject_ids if r not in source_ids]

In [64]:
source_ids, target_ids

(array([ 1,  6,  7,  9, 10]), [2, 3, 4, 5, 8])
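
The split above changes on every run because no seed is set. A minimal sketch of a reproducible alternative follows, assuming NumPy's `default_rng`; the seed value `0` is a hypothetical choice and will not reproduce the split shown above.

```python
import numpy as np

subject_ids = np.arange(1, 11)

# Hypothetical seed: any fixed value makes the split repeatable across runs.
rng = np.random.default_rng(0)
shuffled = rng.permutation(subject_ids)

# First half of the shuffled ids is the source set, the rest is the target set.
half = len(subject_ids) // 2
source_ids = np.sort(shuffled[:half])
target_ids = np.sort(shuffled[half:])
print(source_ids, target_ids)
```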

In [32]:
# Read one subject's data and keep only the requested columns
def mhealth_read_one_subject(subject_id, body_position):
    filename = mhealth_folder + "mHealth_subject" + str(subject_id) + ".log"
    
    # Read the whitespace-delimited file (sep=r"\s+" replaces the deprecated delim_whitespace=True)
    df = pd.read_csv(filename, sep=r"\s+", header=None)
    
    # Remove the data with no class - null class 
    df = df[df[23] != 0]
    
    # Each row holds 23 sensor columns plus the label in column 23 (0-based). body_position selects the columns to keep:
    # acceleration from the right-lower-arm sensor is in columns 14 (x), 15 (y), and 16 (z), from the chest sensor in
    # columns 0, 1, and 2, and from the left-ankle sensor in columns 5, 6, and 7.
    
    df = df[body_position]  # body_position must include the label column (23)
    return df
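
The reading step can be tried without the dataset on disk. The sketch below is an illustration under stated assumptions: it parses a small synthetic, tab-separated table from memory (the values are made up, not real recordings), drops the null class, and keeps the right-lower-arm acceleration columns plus the label.

```python
import io

import numpy as np
import pandas as pd

# Synthetic stand-in for a log file: 6 rows, 24 columns, label in the last column.
rows = np.zeros((6, 24))
rows[:, 14:17] = np.arange(18).reshape(6, 3)
rows[:, 23] = [0, 4, 4, 0, 1, 1]
text = "\n".join("\t".join(str(v) for v in row) for row in rows)

# Same parsing options as for the real files: whitespace-separated, no header row.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", header=None)

# Drop the null class (label 0) and keep three signal columns plus the label.
df = df[df[23] != 0]
df = df[[14, 15, 16, 23]]
print(df.shape)
```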

In [33]:
# Segment the 50 Hz recordings into fixed-length, overlapping windows (128 samples = 2.56 s at 50 Hz)
def mhealth_windowing(data, window_length = 128, overlap_percent = 0.5):
    n_channels = data.shape[1]
    
    # since we also have the class labels
    n_channels =  n_channels - 1
    
    # group the samples based on the class labels in column 23
    groups = data.groupby(23)
    
    # placeholders to save the data
    X_ = []
    Y_ = []
    
    # now loop over the groups (sorted by label) and extract the overlapping windows;
    # iterating the groupby directly also works when a label is missing for a subject
    for gr, df_ in groups:
        
        # Data and labels
        y = pd.unique(df_[23].values)
        x = df_.drop(23, axis=1).values
        
        # now windowing
        n_samples = len(x)
        
        # the number of window segments we will have without overlap
        n_blocks = n_samples // window_length
        n_upto = n_blocks * window_length
        
        # windowing with overlap_percent % overlap
        tp = []
        n_start = 0
        n_end = n_start + window_length
        overlap_length = int(overlap_percent * window_length)
        
        while n_end < n_samples:
            tp.append(x[n_start:n_end])
            n_start = n_end - overlap_length
            n_end = n_start + window_length
        
        # save the data
        X_.append(tp)
        Y_.append(np.array([y] * len(tp), dtype=int).squeeze())
        
    
    # Concatenate and return the data
    X = np.concatenate(X_, axis=0)
    Y = np.concatenate(Y_, axis=0)
    
    return X, Y
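
For comparison, the same kind of segmentation can be sketched with NumPy's `sliding_window_view`. This is an alternative illustration, not the function used in this notebook; the helper name `sliding_windows` and the synthetic signal are assumptions. Unlike a loop guarded by a strict `n_end < n_samples` test, this variant also keeps a final window that ends exactly on the last sample, so window counts can differ by one per activity.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_windows(x, window_length=128, overlap_percent=0.5):
    """Cut a (n_samples, n_channels) array into overlapping windows."""
    step = window_length - int(overlap_percent * window_length)
    if len(x) < window_length:
        return np.empty((0, window_length, x.shape[1]))
    # The window axis comes last: (n_positions, n_channels, window_length).
    views = sliding_window_view(x, window_length, axis=0)[::step]
    # Reorder to (n_windows, window_length, n_channels) and detach from the input.
    return views.transpose(0, 2, 1).copy()


# Synthetic 3-channel signal with 400 samples.
signal = np.arange(400 * 3, dtype=float).reshape(400, 3)
windows = sliding_windows(signal)
print(windows.shape)
```

With a step of 64 samples, consecutive windows start 64 samples apart, so the second window covers samples 64 to 191 of the input.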

In [34]:
# Window each subject's data and merge all subjects into one pair of arrays (X, Y)
def mhealth_process_data(window_length, overlap_percent, position, subjects_index):
    X_ = []
    Y_ = []
    
#     # from 1 to 11, subject ids
#     total_subject = 10 + 1  
    
#     # select the proper subject ids
#     subject_range = np.arange(1, total_subject, 1)
    
#     if subjects == 'even':
#         subject_range = np.arange(2, total_subject, 2)
#     elif subjects == 'odd':
#         subject_range = np.arange(1, total_subject, 2)
    
    # read the data file based on the subject id
    for s in subjects_index:
        print("Reading subject {} data".format(s))
        
        # read the current subject data
        df = mhealth_read_one_subject(s, position)
        print("Data shape {}".format(df.shape))

        # Apply the windowing to the data
        s_x, s_y = mhealth_windowing(df, window_length, overlap_percent)
        print("Total segmens {}, of shape {}".format(len(s_x), s_x.shape))
        
        # add to the list
        X_.append(s_x)
        Y_.append(s_y)
        
    # concatenate and save all
    X = np.concatenate(X_, axis=0)
    Y = np.concatenate(Y_, axis=0)
    
    return X, Y
    

In [36]:
def combine_components(df, window_length):
    n_samples = df.shape[0]
    X = np.zeros((n_samples, window_length * 3))

    for i in range(n_samples):
        # get the window data at position i
        dp = df[i]

        # placeholders to store the x, y, and z component of the window data
        x = []
        y = []
        z = []
        for q in dp:
            x.append(q[0])
            y.append(q[1])
            z.append(q[2])

        # save the x, y, and z as a single array
        X[i] = np.concatenate((x, y, z))
        
    return X
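
The per-sample loop in `combine_components` can also be written as a single transpose and reshape. The sketch below is a possible equivalent, assuming the input has shape `(n_windows, window_length, 3)`; the function name and the random test data are hypothetical.

```python
import numpy as np


def combine_components_vectorized(windows):
    """Flatten (n_windows, window_length, 3) into (n_windows, 3 * window_length)."""
    n_windows = windows.shape[0]
    # Put the axis dimension before time so each row reads x..., y..., z...
    return windows.transpose(0, 2, 1).reshape(n_windows, -1)


rng = np.random.default_rng(0)
windows = rng.normal(size=(4, 128, 3))
flat = combine_components_vectorized(windows)

# Reference row built the same way as the loop: all x, then all y, then all z.
expected_first = np.concatenate((windows[0, :, 0], windows[0, :, 1], windows[0, :, 2]))
print(flat.shape, np.allclose(flat[0], expected_first))
```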

In [40]:
n_window_length = 128
overlap_percentage = 0.5
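
To relate these settings to time, the arithmetic below converts the window length and overlap into seconds at the 50 Hz sampling rate stated in the dataset description. The variable names are illustrative.

```python
sampling_rate_hz = 50   # stated in the dataset description
window_length = 128     # samples per window
overlap = 0.5           # fraction shared by consecutive windows

# Number of samples between the starts of two consecutive windows.
step = window_length - int(overlap * window_length)

window_seconds = window_length / sampling_rate_hz
step_seconds = step / sampling_rate_hz
print(window_seconds, step_seconds)
```

Each window therefore spans 2.56 s of signal, and a new window starts every 1.28 s.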

In [37]:
# Now the name of the activities
mhealth_activities = ['Standing',
                     'Sitting',
                     'Lying Down',
                     'Walking',
                     'Climbing Stairs',
                     'Waist Bend Forward',
                     'Frontal Elevation Arms',
                     'Knees Bending',
                     'Cycling',
                     'Jogging',
                     'Running',
                     'Jump Front & Back']
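
Because the list follows the order L1 to L12, a label shifted to start at 0 indexes it directly. The sketch below illustrates that lookup and a per-class count on a small, made-up label array; the labels are not taken from the dataset.

```python
import numpy as np

mhealth_activities = ['Standing', 'Sitting', 'Lying Down', 'Walking',
                      'Climbing Stairs', 'Waist Bend Forward',
                      'Frontal Elevation Arms', 'Knees Bending',
                      'Cycling', 'Jogging', 'Running', 'Jump Front & Back']

# Hypothetical zero-based labels (original label minus 1).
labels = np.array([0, 3, 3, 10, 11, 3])

# Translate each label into its activity name.
names = [mhealth_activities[i] for i in labels]

# Count how many windows fall into each of the 12 classes.
counts = np.bincount(labels, minlength=len(mhealth_activities))
print(names)
print(dict(zip(mhealth_activities, counts.tolist())))
```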

In [65]:
source_x, source_y = mhealth_process_data(n_window_length, overlap_percentage, mhealth_right_wrist_location, source_ids)

Reading subject 1 data
Data shape (35174, 4)
Total segmens 526, of shape (526, 128, 3)
Reading subject 6 data
Data shape (32205, 4)
Total segmens 480, of shape (480, 128, 3)
Reading subject 7 data
Data shape (34253, 4)
Total segmens 512, of shape (512, 128, 3)
Reading subject 9 data
Data shape (34354, 4)
Total segmens 514, of shape (514, 128, 3)
Reading subject 10 data
Data shape (33690, 4)
Total segmens 504, of shape (504, 128, 3)


In [66]:
# starting the label from 0
source_y = np.array(source_y) - 1

In [67]:
target_x, target_y = mhealth_process_data(n_window_length, overlap_percentage, mhealth_right_wrist_location, target_ids)

Reading subject 2 data
Data shape (35532, 4)
Total segmens 532, of shape (532, 128, 3)
Reading subject 3 data
Data shape (35380, 4)
Total segmens 530, of shape (530, 128, 3)
Reading subject 4 data
Data shape (35328, 4)
Total segmens 529, of shape (529, 128, 3)
Reading subject 5 data
Data shape (33947, 4)
Total segmens 508, of shape (508, 128, 3)
Reading subject 8 data
Data shape (33332, 4)
Total segmens 498, of shape (498, 128, 3)


In [68]:
target_y = np.array(target_y) - 1

In [69]:
data_folder = "../Processed data/"

In [70]:
os.listdir(data_folder)

['adl_activity_dataset.pickle',
 'adl_activity_dataset_small.pickle',
 'adl_activity_dataset_small_minmax_scaled.pickle',
 'adl_activity_feature_dataset_small.pickle',
 'adl_dataset_small_minmax_scaled_feature.pickle',
 'adl_posture_data.pickle',
 'adl_posture_dataset.pickle',
 'adl_posture_dataset_small.pickle',
 'adl_posture_feature_dataset.pickle',
 'adl_posture_feature_dataset_1.pickle',
 'mHealth_ankle_dataset.pickle',
 'mHealth_ankle_feature_dataset.pickle',
 'mHealth_chest_dataset.pickle',
 'mHealth_chest_feature_dataset.pickle',
 'mHealth_wrist_dataset.pickle',
 'mHealth_wrist_feature_dataset.pickle',
 'mHealth_wrist_feature_dataset_1.pickle',
 'mHealth_wrist_source_dataset.pickle',
 'mHealth_wrist_target_dataset.pickle',
 'uci_body_acc_dataset.pickle',
 'uci_feature_dataset.pickle',
 'uci_feature_dataset_1.pickle']

In [71]:
with open(data_folder+"mHealth_wrist_source_dataset.pickle", "wb") as f:
    pkl.dump([source_x, source_y, source_ids], f)

In [72]:
with open(data_folder+"mHealth_wrist_target_dataset.pickle", "wb") as f:
    pkl.dump([target_x, target_y, target_ids], f)
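
Reading a saved file back mirrors the dump: the pickle holds a three-element list of windows, labels, and subject ids. The round trip below is a minimal sketch that uses a temporary directory and placeholder arrays instead of the real processed data; the file name is hypothetical.

```python
import os
import pickle as pkl
import tempfile

import numpy as np

# Placeholder data with the same structure as [source_x, source_y, source_ids].
x = np.zeros((2, 128, 3))
y = np.array([0, 3])
ids = [1, 6]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_dataset.pickle")  # hypothetical file name

    # Save the list, then load it back and unpack in the same order.
    with open(path, "wb") as f:
        pkl.dump([x, y, ids], f)
    with open(path, "rb") as f:
        loaded_x, loaded_y, loaded_ids = pkl.load(f)

print(loaded_x.shape, loaded_y, loaded_ids)
```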