# Pre-Processing of the 2016-ANSAMO Dataset

**Directory:**
    > `Subject_<nr>_ADL_<activity>.csv`
    > ...

**Types of Executed ADLs:**   
1) normal walking, 2) light jogging, 3) body bending, 4) hopping, 5) climbing stairs (up), 6) climbing stairs (down), 7) lying down and getting up from a bed, 8) sitting down (and up) on (from) a chair.

**Columns Units:**  
    After the header, every line in the files corresponds to a measurement captured by a particular mobility sensor of a determined node (mote or SensorTag).  
    The format of the lines, which is also explained in the file header, includes 7 numerical values separated by a semicolon:  
        -The time (in ms) since the experiment began.  
        -The number of the sample (for the same sensor and node).  
        -The three real numbers describing the measurements of the triaxial sensor (x-axis, y-axis and z-axis). The units are g, °/s or μT depending on whether the measurement was performed by an accelerometer, a gyroscope or a magnetometer, respectively.  
        -An integer (0, 1 or 2) describing the type of the sensor that originated the measurement (Accelerometer = 0 , Gyroscope = 1, Magnetometer = 2)  
        -An integer (from 0 to 4) informing about the sensing node (the correspondence between this numerical code and the Bluetooth MAC address and position of the motes is described in the file header).

In [7]:
## Readme File ##
from IPython.display import IFrame
IFrame("./2016-ANSAMO-Readme.pdf", width=800, height=800)

## Desired Dataset Format

current header = TS(ms);SampleNr;X-Axis;Y-Axis;Z-Axis;SensorType;SensorID  
desired header = TS(ms); AccX; AccY; AccZ; MagnX; MagnY; MagnZ; GyroX; GyroY; GyroZ; Position; Label;

Pre-Processing Tasks:
    - clean headers information
    - divide the files by subject: "subject_01.csv"; "subject_02.csv"; ... with the desired header above.

In [9]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

In [None]:
# upload raw csv file
train_dataRaw = pd.read_csv('../../../datasets/ANSAMO-2016/UMA_ADL_FALL_Dataset/UMAFall_Subject_01_ADL_Bending_1_2016-06-13_20-25-34.csv')
#
train_dataRaw.head(31)

In [50]:
# ignore fist lines
train_data = train_dataRaw.iloc[31:]
train_data.head(5)

Unnamed: 0,% Universidad de Malaga - ETSI de Telecomunicacion (Spain)
31,% TimeStamp; Sample No; X-Axis; Y-Axis; Z-Axis...
32,145;1;-0.4218206405639648;1.136497378349304;0....
33,145;2;-0.5496244430541992;0.902985155582428;0....
34,145;3;-0.6334832310676575;0.8695393800735474;0...
35,145;4;-0.6050428152084351;0.9466843605041504;0...
36,145;5;-0.5244794487953186;0.9684126377105713;0...
37,145;6;-0.485295444726944;0.8882149457931519;0....
38,145;7;-0.4090046286582947;0.7871447205543518;0...
39,146;8;-0.3587130010128021;0.6935194730758667;0...
40,146;9;-0.3360092043876648;0.681801974773407;0....


In [116]:
def process_file(inputPath, outputpath):
    train_dataRaw = pd.read_csv('../../../datasets/ANSAMO-2016/UMA_ADL_FALL_Dataset/UMAFall_Subject_01_ADL_Bending_1_2016-06-13_20-25-34.csv')
    header_tmp = {'ts(ms)': [], 'SampleNr': [], 'X-Axis': [], 'Y-Axis': [], 'Z-Axis': [], 'SensorType': [] , 'SensorID': []}
    values = pd.DataFrame(data=header_tmp)
    i = 0
    for index, row in train_data.iterrows():
        line = row[0]
        array = line.split(';')
        if i > 0: values = values.append(pd.DataFrame({'ts(ms)': [array[0]], 'SampleNr': [array[1]], 'X-Axis': [array[2]], 'Y-Axis': [array[3]], 'Z-Axis': [array[4]], 'SensorType': [array[5]] , 'SensorID': [array[6]]}))
        i+=1;
    values.to_csv(outputpath, sep=',')

In [125]:
# write to csv
import os
files = os.listdir('../../../datasets/ANSAMO-2016/UMA_ADL_FALL_Dataset')

for file in files:
    if '.csv' in file:
        process_file(base_path + file, file)

In [154]:
# read from processed csv
train_data_processed = pd.read_csv('UMAFall_Subject_01_ADL_Bending_1_2016-06-13_20-25-34.csv')
train_data_processed.head(5)

Unnamed: 0.1,Unnamed: 0,SampleNr,SensorID,SensorType,X-Axis,Y-Axis,Z-Axis,ts(ms)
0,0,1,0,0,-0.421821,1.136497,0.278784,145
1,0,2,0,0,-0.549624,0.902985,0.104717,145
2,0,3,0,0,-0.633483,0.869539,0.03392,145
3,0,4,0,0,-0.605043,0.946684,0.069563,145
4,0,5,0,0,-0.524479,0.968413,0.0659,145


In [160]:
train_data_processed_ordered = train_data_processed.sort_values(['SensorID']);
train_data_processed_ordered.head(10)

Unnamed: 0.1,Unnamed: 0,SampleNr,SensorID,SensorType,X-Axis,Y-Axis,Z-Axis,ts(ms)
0,0,1,0,0,-0.421821,1.136497,0.278784,145
1978,0,1979,0,0,-0.19905,0.928863,-0.103527,9983
1979,0,1980,0,0,-0.201615,0.928497,-0.109752,9992
1980,0,1981,0,0,-0.201126,0.929717,-0.108531,9993
1981,0,1982,0,0,-0.202834,0.927155,-0.105724,9998
1982,0,1983,0,0,-0.200393,0.929962,-0.107432,10003
1983,0,1984,0,0,-0.198929,0.93167,-0.110485,10008
1984,0,1985,0,0,-0.199173,0.929352,-0.106334,10013
1985,0,1986,0,0,-0.197342,0.927155,-0.107677,10018
1986,0,1987,0,0,-0.199295,0.930084,-0.110728,10040


In [158]:
## Change from SensorID to Position, and SensorType

#header = [ 'ts(ms)', 'accX', 'accY', 'accZ', 'magX', 'magY', 'magZ', 'gyrX', 'gyrY', 'gyrZ', 'position', 'label' ]
##  Sensor_ID	 Position	 Device Model                                   
#0	 RIGHTPOCKET	 lge-LG-H815-5.1                                
#1	 CHEST	 SensorTag                                            
#3	 WRIST	 SensorTag                                            
#4	 ANKLE	 SensorTag                                            
#2	 WAIST	 SensorTag
import sys
sensors_position = ['pocket', 'chest', 'waist', 'wrist', 'ankle']
sensors_type = ['acc', 'gyro', 'magn']
train_data_processed_ordered_changed = train_data_processed_ordered;

for index, line in train_data_processed_ordered_changed.iterrows():
    sensor_id = int(line.loc['SensorID'])
    sensor_type = int(line.loc['SensorType'])
    position = sensors_position[sensor_id];
    sensor_type = sensors_type[sensor_type]
    train_data_processed_ordered_changed.loc[index, 'SensorID'] = position;
    train_data_processed_ordered_changed.loc[index, 'SensorType'] = sensor_type;

ValueError: invalid literal for int() with base 10: 'pocket'

In [161]:
train_data_processed_ordered_tmp = train_data_processed_ordered_changed.sort_values(['SensorType', 'ts(ms)']);
train_data_processed_ordered_tmp.head(12)

Unnamed: 0.1,Unnamed: 0,SampleNr,SensorID,SensorType,X-Axis,Y-Axis,Z-Axis,ts(ms)
0,0,1,pocket,acc,-0.421821,1.136497,0.278784,145
1,0,2,pocket,acc,-0.549624,0.902985,0.104717,145
2,0,3,pocket,acc,-0.633483,0.869539,0.03392,145
3,0,4,pocket,acc,-0.605043,0.946684,0.069563,145
4,0,5,pocket,acc,-0.524479,0.968413,0.0659,145
5,0,6,pocket,acc,-0.485295,0.888215,0.0659,145
6,0,7,pocket,acc,-0.409005,0.787145,0.089581,145
24,0,25,pocket,acc,-0.67462,0.830234,-0.009658,146
25,0,26,pocket,acc,-0.732479,0.814121,0.023667,146
26,0,27,pocket,acc,-0.743466,0.8046,0.180277,146


In [170]:
mylist = list(set(train_data_processed_ordered_tmp['SensorType']))
mylist

['magn', 'gyro', 'acc']

In [174]:
#train_data_processed_ordered_tmp = train_data_processed_ordered.sort_values(['SensorType', 'ts(ms)']);
### xi - xi+1 , avgIntertvalTime: ? , minIntervalTime: ?, maxIntervalTime?

## divide signals per position
train_data_pocket = train_data_processed_ordered_tmp[train_data_processed_ordered_tmp['SensorID'].isin(['pocket'])];
train_data_wrist = train_data_processed_ordered_tmp[train_data_processed_ordered_tmp['SensorID'].isin(['wrist'])];
train_data_ankle = train_data_processed_ordered_tmp[train_data_processed_ordered_tmp['SensorID'].isin(['ankle'])];
train_data_waist = train_data_processed_ordered_tmp[train_data_processed_ordered_tmp['SensorID'].isin(['waist'])];
train_data_chest = train_data_processed_ordered_tmp[train_data_processed_ordered_tmp['SensorID'].isin(['chest'])];

In [175]:
print("nr of pocket signals: ", train_data_pocket.size)
print("nr of wrist signals: ", train_data_wrist.size)
print("nr of ankle signals: ", train_data_ankle.size)
print("nr of waist signals: ", train_data_waist.size)
print("nr of chest signals: ", train_data_chest.size)

nr of pocket signals:  23784
nr of wrist signals:  7176
nr of ankle signals:  7176
nr of waist signals:  7200
nr of chest signals:  7200


In [176]:
# pocket 
pocket_acc = train_data_pocket[train_data_pocket['SensorType'].isin(['acc'])];
pocket_gyro = train_data_pocket[train_data_pocket['SensorType'].isin(['gyro'])];
pocket_magn = train_data_pocket[train_data_pocket['SensorType'].isin(['magn'])];

# wrist
wrist_acc = train_data_wrist[train_data_wrist['SensorType'].isin(['acc'])];
wrist_gyro = train_data_wrist[train_data_wrist['SensorType'].isin(['gyro'])];
wrist_magn = train_data_wrist[train_data_wrist['SensorType'].isin(['magn'])];

# ankle
ankle_acc = train_data_ankle[train_data_ankle['SensorType'].isin(['acc'])];
ankle_gyro = train_data_ankle[train_data_ankle['SensorType'].isin(['gyro'])];
ankle_magn = train_data_ankle[train_data_ankle['SensorType'].isin(['magn'])];

# waist
waist_acc = train_data_waist[train_data_waist['SensorType'].isin(['acc'])];
waist_gyro = train_data_waist[train_data_waist['SensorType'].isin(['gyro'])];
waist_magn = train_data_waist[train_data_waist['SensorType'].isin(['magn'])];

# chest
chest_acc = train_data_chest[train_data_chest['SensorType'].isin(['acc'])];
chest_gyro = train_data_chest[train_data_chest['SensorType'].isin(['gyro'])];
chest_magn = train_data_chest[train_data_chest['SensorType'].isin(['magn'])];

In [179]:
print("nr of acc signals from pocket: ", pocket_acc.size)
print("nr of gyro signals from pocket: ", pocket_gyro.size)
print("nr of magn signals from pocket: ", pocket_magn.size)
print("------------------------------------------------")
print("nr of acc signals from wrist: ", wrist_acc.size)
print("nr of gyro signals from wrist: ", wrist_gyro.size)
print("nr of magn signals from wrist: ", wrist_magn.size)
print("------------------------------------------------")
print("nr of acc signals from ankle: ", ankle_acc.size)
print("nr of gyro signals from ankle: ", ankle_gyro.size)
print("nr of magn signals from ankle: ", ankle_magn.size)
print("------------------------------------------------")
print("nr of acc signals from waist: ", waist_acc.size)
print("nr of gyro signals from waist: ", waist_gyro.size)
print("nr of magn signals from waist: ", waist_magn.size)
print("------------------------------------------------")
print("nr of acc signals from chest: ", chest_acc.size)
print("nr of gyro signals from chest: ", chest_gyro.size)
print("nr of magn signals from chest: ", chest_magn.size)

nr of acc signals from pocket:  23784
nr of gyro signals from pocket:  0
nr of magn signals from pocket:  0
------------------------------------------------
nr of acc signals from wrist:  2392
nr of gyro signals from wrist:  2392
nr of magn signals from wrist:  2392
------------------------------------------------
nr of acc signals from ankle:  2392
nr of gyro signals from ankle:  2392
nr of magn signals from ankle:  2392
------------------------------------------------
nr of acc signals from waist:  2400
nr of gyro signals from waist:  2400
nr of magn signals from waist:  2400
------------------------------------------------
nr of acc signals from chest:  2400
nr of gyro signals from chest:  2400
nr of magn signals from chest:  2400


In [211]:
old_ts = 0;
deltas_ts = pd.DataFrame({'delta_ts':[]})
for index, line in chest_acc.iterrows():
    ts = line['ts(ms)']
    if old_ts > ts:
        print("fuck")
        break
    delta_ts = ts - old_ts;
    deltas_ts = deltas_ts.append(pd.DataFrame({'delta_ts':[delta_ts]}))
    old_ts = ts;

deltas_ts.describe()

Unnamed: 0,delta_ts
count,300.0
mean,49.926667
std,12.735328
min,7.0
25%,47.0
50%,49.0
75%,50.0
max,172.0


In [202]:
values = pd.DataFrame({'ts':[1,2]})
values.describe()

Unnamed: 0,ts
count,2.0
mean,1.5
std,0.707107
min,1.0
25%,1.25
50%,1.5
75%,1.75
max,2.0


In [107]:
header = [ 'ts(ms)', 'accX', 'accY', 'accZ', 'magX', 'magY', 'magZ', 'gyrX', 'gyrY', 'gyrZ', 'position', 'label' ]
pd.DataFrame(columns=header)

Unnamed: 0,ts(ms),accX,accY,accZ,magX,magY,magZ,gyrX,gyrY,gyrZ,position,label
