# Pre-Processing of the 2016-ANSAMO Dataset

**Directory:**
    > `Subject_<nr>_ADL_<activity>.csv`
    > ...

**Types of Executed ADLs:**   
1) normal walking, 2) light jogging, 3) body bending, 4) hopping, 5) climbing stairs (up), 6) climbing stairs (down), 7) lying down and getting up from a bed, 8) sitting down (and up) on (from) a chair.
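
Since the activity is encoded in the file name, the label for each recording can be recovered without opening the file. The sketch below parses the one file name that appears later in this notebook; the regular expression, the helper name `parse_filename`, and the assumption that every file follows the same `UMAFall_Subject_<nr>_<type>_<activity>_<trial>_<date>_<time>.csv` layout are illustrative and should be checked against the actual directory listing.

```python
import re

# Hypothetical pattern, inferred from a single file name seen in this notebook.
FILENAME_RE = re.compile(
    r"^UMAFall_Subject_(?P<subject>\d+)_(?P<kind>[A-Za-z]+)_(?P<activity>.+)"
    r"_(?P<trial>\d+)_(?P<date>\d{4}-\d{2}-\d{2})_(?P<time>\d{2}-\d{2}-\d{2})\.csv$"
)


def parse_filename(name):
    """Return subject, movement type, activity and trial, or None if the name does not match."""
    match = FILENAME_RE.match(name)
    if match is None:
        return None
    return {
        "subject": int(match.group("subject")),
        "kind": match.group("kind"),
        "activity": match.group("activity"),
        "trial": int(match.group("trial")),
    }


info = parse_filename("UMAFall_Subject_01_ADL_Bending_1_2016-06-13_20-25-34.csv")
print(info)
```

The `activity` field could then serve as the `Label` column of the processed dataset.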

**Column Units:**  
    After the header, every line in a file corresponds to one measurement captured by a particular mobility sensor of a given node (mote or SensorTag).  
    Each line, whose format is also explained in the file header, contains 7 numerical values separated by semicolons:  
        - The time (in ms) since the experiment began.  
        - The sample number (for the same sensor and node).  
        - Three real numbers giving the measurements of the triaxial sensor (x-axis, y-axis and z-axis). The units are g, °/s or μT, depending on whether the measurement was taken by an accelerometer, a gyroscope or a magnetometer, respectively.  
        - An integer (0, 1 or 2) identifying the type of sensor that produced the measurement (Accelerometer = 0, Gyroscope = 1, Magnetometer = 2).  
        - An integer (from 0 to 4) identifying the sensing node (the mapping between this numerical code and the Bluetooth MAC address and position of the motes is given in the file header).
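
The line format described above can be sketched as a small parser. The names `Measurement` and `parse_measurement` are hypothetical, the example line uses made-up values, and stripping a trailing semicolon is only a precaution, since the exact line ending is not shown here.

```python
from dataclasses import dataclass

# Sensor codes and units as listed in the column description.
SENSOR_NAMES = {0: "accelerometer", 1: "gyroscope", 2: "magnetometer"}
SENSOR_UNITS = {0: "g", 1: "°/s", 2: "μT"}


@dataclass
class Measurement:
    ts_ms: int        # time since the experiment began, in ms
    sample_nr: int    # sample number for this sensor and node
    x: float
    y: float
    z: float
    sensor_type: int  # 0, 1 or 2
    sensor_id: int    # sensing node, 0 to 4

    @property
    def sensor_name(self):
        return SENSOR_NAMES[self.sensor_type]

    @property
    def unit(self):
        return SENSOR_UNITS[self.sensor_type]


def parse_measurement(line):
    """Split one semicolon-separated data line into a typed record."""
    fields = [f.strip() for f in line.strip().rstrip(";").split(";")]
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    ts, nr, x, y, z, stype, sid = fields
    return Measurement(int(ts), int(nr), float(x), float(y), float(z), int(stype), int(sid))


# Illustrative line with made-up values.
m = parse_measurement("145;1;-0.5;0.9;0.25;1;3")
print(m.sensor_name, m.unit, m.x)
```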

In [3]:
## Readme File ##
from IPython.display import IFrame
IFrame("./2016-ANSAMO-Readme.pdf", width=900, height=900)

## Desired Dataset Format

TS; AccX; AccY; AccZ; MagnX; MagnY; MagnZ; GyroX; GyroY; GyroZ; Label;
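
Getting from the raw one-sensor-per-line layout to this wide format requires pairing up the readings of the three sensors. The following is only a sketch under several assumptions that the source does not confirm: the three sensors of one node are aligned by their sample number, sample numbers are unique per sensor and node, the timestamp is taken from the accelerometer row, and the label is supplied by the caller. The function name `to_wide` and the input column names (borrowed from the processing cell further down) are illustrative.

```python
import pandas as pd

PREFIX = {0: "Acc", 1: "Gyro", 2: "Magn"}
WIDE_COLUMNS = ["TS", "AccX", "AccY", "AccZ", "MagnX", "MagnY", "MagnZ",
                "GyroX", "GyroY", "GyroZ", "Label"]


def to_wide(long_df, sensor_id, label):
    """Pivot the long table of one node into one row per sample number."""
    df = long_df.apply(pd.to_numeric)  # fields may still be strings after splitting
    node = df[df["SensorID"] == sensor_id]
    parts = []
    for code, prefix in PREFIX.items():
        sensor = node[node["SensorType"] == code].set_index("SampleNr")
        part = sensor[["X-Axis", "Y-Axis", "Z-Axis"]].rename(
            columns={"X-Axis": prefix + "X", "Y-Axis": prefix + "Y", "Z-Axis": prefix + "Z"}
        )
        if code == 0:
            # Assumption: use the accelerometer timestamp for the whole row.
            part = part.assign(TS=sensor["ts(ms)"])
        parts.append(part)
    # Keep only sample numbers present for all three sensors.
    wide = pd.concat(parts, axis=1, join="inner")
    wide["Label"] = label
    return wide[WIDE_COLUMNS].reset_index(drop=True)


# Tiny made-up example: two samples per sensor on node 0.
long_df = pd.DataFrame({
    "ts(ms)":     [145, 146, 145, 147, 150, 151],
    "SampleNr":   [1, 2, 1, 2, 1, 2],
    "X-Axis":     [0.1, 0.2, 10.0, 11.0, 30.0, 31.0],
    "Y-Axis":     [0.3, 0.4, 12.0, 13.0, 32.0, 33.0],
    "Z-Axis":     [0.5, 0.6, 14.0, 15.0, 34.0, 35.0],
    "SensorType": [0, 0, 1, 1, 2, 2],
    "SensorID":   [0, 0, 0, 0, 0, 0],
})
wide = to_wide(long_df, sensor_id=0, label="Bending")
print(wide)
```

If the sensors sample at different rates, aligning by nearest timestamp (for example with `pd.merge_asof`) may be a better fit than aligning by sample number.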

In [49]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

train_dataRaw = pd.read_csv('../../../datasets/ANSAMO-2016/UMA_ADL_FALL_Dataset/UMAFall_Subject_01_ADL_Bending_1_2016-06-13_20-25-34.csv')
#
train_dataRaw.head(31)

Unnamed: 0,% Universidad de Malaga - ETSI de Telecomunicacion (Spain)
0,% Date: 2016-06-13_20:25:34 ...
1,% ID: Subject_01_ADL_Bending_1 ...
2,% Name: Subject_01 ...
3,% Age: 22 ...
4,% Height(cm): 167 ...
5,% Weight(Kg): 63 ...
6,% Gender: F ...
7,% Type of Movement: ADL ...
8,% Type of Movement: FALSE ...
9,% Description of the movement: Bending ...


In [50]:
# skip the metadata lines at the top of the file
train_data = train_dataRaw.iloc[31:]
train_data.head(5)

Unnamed: 0,% Universidad de Malaga - ETSI de Telecomunicacion (Spain)
31,% TimeStamp; Sample No; X-Axis; Y-Axis; Z-Axis...
32,145;1;-0.4218206405639648;1.136497378349304;0....
33,145;2;-0.5496244430541992;0.902985155582428;0....
34,145;3;-0.6334832310676575;0.8695393800735474;0...
35,145;4;-0.6050428152084351;0.9466843605041504;0...
36,145;5;-0.5244794487953186;0.9684126377105713;0...
37,145;6;-0.485295444726944;0.8882149457931519;0....
38,145;7;-0.4090046286582947;0.7871447205543518;0...
39,146;8;-0.3587130010128021;0.6935194730758667;0...
40,146;9;-0.3360092043876648;0.681801974773407;0....
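
The hard-coded offset of 31 rows works for this file, but it may not hold for every recording. A possible alternative, sketched below, is to look for the row that starts with `% TimeStamp` (the column description seen in the output above) and treat everything after it as data. The function name `split_header_and_data` and the small example frame are hypothetical.

```python
import pandas as pd


def split_header_and_data(raw, marker="% TimeStamp"):
    """Return (metadata rows, data rows), split at the first row starting with marker."""
    first_col = raw.iloc[:, 0].astype(str)
    is_header = first_col.str.startswith(marker)
    if not is_header.any():
        raise ValueError("column description row not found")
    header_pos = int(is_header.to_numpy().argmax())  # position of the first match
    return raw.iloc[:header_pos], raw.iloc[header_pos + 1:]


# Made-up miniature of the single-column frame produced by pd.read_csv.
raw = pd.DataFrame({"col": [
    "% Date: 2016-06-13_20:25:34",
    "% ID: Subject_01_ADL_Bending_1",
    "% TimeStamp; Sample No; X-Axis; Y-Axis; Z-Axis",
    "145;1;-0.4;1.1;0.2;0;0",
    "145;2;-0.5;0.9;0.2;0;0",
]})
meta, data = split_header_and_data(raw)
print(len(meta), len(data))
```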


In [116]:
##
#header = [ 'ts(ms)', 'accX', 'accY', 'accZ', 'magX', 'magY', 'magZ', 'gyrX', 'gyrY', 'gyrZ', 'position', 'label' ]
def process_file(inputPath, outputpath):
    # Read the file passed by the caller; each line ends up in a single column
    train_dataRaw = pd.read_csv(inputPath)
    columns = ['ts(ms)', 'SampleNr', 'X-Axis', 'Y-Axis', 'Z-Axis', 'SensorType', 'SensorID']
    # Rows 0-30 are metadata and row 31 is the column description,
    # so the measurements start at row 32 (as in the file inspected above)
    lines = train_dataRaw.iloc[32:, 0]
    # Split each 'a;b;c;...' line into its 7 fields in one vectorised step
    values = lines.str.split(';', expand=True).iloc[:, :7]
    values = values.set_axis(columns, axis=1).reset_index(drop=True)
    values.to_csv(outputpath, sep=',')

In [None]:
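import os
base_path = '../../../datasets/ANSAMO-2016/UMA_ADL_FALL_Dataset/'
files = os.listdir(base_path)

for file in files:
    if file.endswith('.csv'):
        # Read from the dataset directory, write the processed file to the working directory
        process_file(base_path + file, file)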