# Pre-Processing of the 2016-ANSAMO Dataset

**Directory:**
    > `Subject_<nr>_ADL_<activity>.csv`
    > ...

**Types of Executed ADLs:**   
1) normal walking, 2) light jogging, 3) body bending, 4) hopping, 5) climbing stairs (up), 6) climbing stairs (down), 7) lying down and getting up from a bed, 8) sitting down (and up) on (from) a chair.

**Columns Units:**  
    After the header, every line in the files corresponds to a measurement captured by a particular mobility sensor of a determined node (mote or SensorTag).  
    The format of the lines, which is also explained in the file header, includes 7 numerical values separated by a semicolon:  
        -The time (in ms) since the experiment began.  
        -The number of the sample (for the same sensor and node).  
        -The three real numbers describing the measurements of the triaxial sensor (x-axis, y-axis and z-axis). The units are g, °/s or μT depending on whether the measurement was performed by an accelerometer, a gyroscope or a magnetometer, respectively.  
        -An integer (0, 1 or 2) describing the type of the sensor that originated the measurement (Accelerometer = 0 , Gyroscope = 1, Magnetometer = 2)  
        -An integer (from 0 to 4) informing about the sensing node (the correspondence between this numerical code and the Bluetooth MAC address and position of the motes is described in the file header).

In [7]:
## Readme File ##
from IPython.display import IFrame
IFrame("./2016-ANSAMO-Readme.pdf", width=800, height=800)

## Desired Dataset Format

current header = TS(ms);SampleNr;X-Axis;Y-Axis;Z-Axis;SensorType;SensorID  
desired header = TS(ms); AccX; AccY; AccZ; MagnX; MagnY; MagnZ; GyroX; GyroY; GyroZ; SubjectID; Gender; Age; Position; Label; and other Feature Extraction Columns (mean, std, corr, etc...)

Pre-Processing Tasks:
    - clean headers information
    - divide the files by subject: "subject_01.csv"; "subject_02.csv"; ... with the desired header above.

In [5]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

In [3]:
##
## first download dataset
##

import sys
sys.path.append('../../../')
from ipynb.fs.full.Utils import download_from_google_drive
download_from_google_drive('fuMxj-dnMXEPHGhHwuQoNjBovO_v5Kdk', '../../../datasets/ANSAMO-2016.zip');

The File Already Exists. Please Change The Path Destination.


In [None]:
# upload raw csv file
train_dataRaw = pd.read_csv('../../../datasets/ANSAMO-2016/UMA_ADL_FALL_Dataset/UMAFall_Subject_01_ADL_Bending_1_2016-06-13_20-25-34.csv')
#
train_dataRaw.head(31)

In [50]:
# ignore fist lines
train_data = train_dataRaw.iloc[31:]
train_data.head(5)

Unnamed: 0,% Universidad de Malaga - ETSI de Telecomunicacion (Spain)
31,% TimeStamp; Sample No; X-Axis; Y-Axis; Z-Axis...
32,145;1;-0.4218206405639648;1.136497378349304;0....
33,145;2;-0.5496244430541992;0.902985155582428;0....
34,145;3;-0.6334832310676575;0.8695393800735474;0...
35,145;4;-0.6050428152084351;0.9466843605041504;0...
36,145;5;-0.5244794487953186;0.9684126377105713;0...
37,145;6;-0.485295444726944;0.8882149457931519;0....
38,145;7;-0.4090046286582947;0.7871447205543518;0...
39,146;8;-0.3587130010128021;0.6935194730758667;0...
40,146;9;-0.3360092043876648;0.681801974773407;0....


In [116]:
def process_file(inputPath, outputpath):
    train_dataRaw = pd.read_csv('../../../datasets/ANSAMO-2016/UMA_ADL_FALL_Dataset/UMAFall_Subject_01_ADL_Bending_1_2016-06-13_20-25-34.csv')
    header_tmp = {'ts(ms)': [], 'SampleNr': [], 'X-Axis': [], 'Y-Axis': [], 'Z-Axis': [], 'SensorType': [] , 'SensorID': []}
    values = pd.DataFrame(data=header_tmp)
    i = 0
    for index, row in train_data.iterrows():
        line = row[0]
        array = line.split(';')
        if i > 0: values = values.append(pd.DataFrame({'ts(ms)': [array[0]], 'SampleNr': [array[1]], 'X-Axis': [array[2]], 'Y-Axis': [array[3]], 'Z-Axis': [array[4]], 'SensorType': [array[5]] , 'SensorID': [array[6]]}))
        i+=1;
    values.to_csv(outputpath, sep=',')

In [125]:
# write to csv
import os
files = os.listdir('../../../datasets/ANSAMO-2016/UMA_ADL_FALL_Dataset')

for file in files:
    if '.csv' in file:
        process_file(base_path + file, file)

In [122]:
# read from processed csv
train_data_processed = pd.read_csv('UMAFall_Subject_01_ADL_Bending_1_2016-06-13_20-25-34.csv')
train_data_processed.head(5)

Unnamed: 0.1,Unnamed: 0,SampleNr,SensorID,SensorType,X-Axis,Y-Axis,Z-Axis,ts(ms)
0,0,1,0,0,-0.421821,1.136497,0.278784,145
1,0,2,0,0,-0.549624,0.902985,0.104717,145
2,0,3,0,0,-0.633483,0.869539,0.03392,145
3,0,4,0,0,-0.605043,0.946684,0.069563,145
4,0,5,0,0,-0.524479,0.968413,0.0659,145


In [7]:
train_data_processed_ordered = train_data_processed.sort_values(['SensorID']);
train_data_processed_ordered.head(10)

Unnamed: 0.1,Unnamed: 0,SampleNr,SensorID,SensorType,X-Axis,Y-Axis,Z-Axis,ts(ms)
0,0,1,0,0,-0.421821,1.136497,0.278784,145
1978,0,1979,0,0,-0.19905,0.928863,-0.103527,9983
1979,0,1980,0,0,-0.201615,0.928497,-0.109752,9992
1980,0,1981,0,0,-0.201126,0.929717,-0.108531,9993
1981,0,1982,0,0,-0.202834,0.927155,-0.105724,9998
1982,0,1983,0,0,-0.200393,0.929962,-0.107432,10003
1983,0,1984,0,0,-0.198929,0.93167,-0.110485,10008
1984,0,1985,0,0,-0.199173,0.929352,-0.106334,10013
1985,0,1986,0,0,-0.197342,0.927155,-0.107677,10018
1986,0,1987,0,0,-0.199295,0.930084,-0.110728,10040


In [8]:
## Change from SensorID to Position, and SensorType

#header = [ 'ts(ms)', 'accX', 'accY', 'accZ', 'magX', 'magY', 'magZ', 'gyrX', 'gyrY', 'gyrZ', 'position', 'label' ]
##  Sensor_ID	 Position	 Device Model                                   
#0	 RIGHTPOCKET	 lge-LG-H815-5.1                                
#1	 CHEST	 SensorTag                                            
#3	 WRIST	 SensorTag                                            
#4	 ANKLE	 SensorTag                                            
#2	 WAIST	 SensorTag
import sys
sensors_position = ['pocket', 'chest', 'waist', 'wrist', 'ankle']
sensors_type = ['acc', 'gyro', 'magn']
train_data_processed_ordered_changed = train_data_processed_ordered;

for index, line in train_data_processed_ordered_changed.iterrows():
    sensor_id = int(line.loc['SensorID'])
    sensor_type = int(line.loc['SensorType'])
    position = sensors_position[sensor_id];
    sensor_type = sensors_type[sensor_type]
    train_data_processed_ordered_changed.loc[index, 'SensorID'] = position;
    train_data_processed_ordered_changed.loc[index, 'SensorType'] = sensor_type;

In [9]:
train_data_processed_ordered_tmp = train_data_processed_ordered_changed.sort_values(['SensorType', 'ts(ms)']);
train_data_processed_ordered_tmp.head(12)

Unnamed: 0.1,Unnamed: 0,SampleNr,SensorID,SensorType,X-Axis,Y-Axis,Z-Axis,ts(ms)
0,0,1,pocket,acc,-0.421821,1.136497,0.278784,145
1,0,2,pocket,acc,-0.549624,0.902985,0.104717,145
2,0,3,pocket,acc,-0.633483,0.869539,0.03392,145
3,0,4,pocket,acc,-0.605043,0.946684,0.069563,145
4,0,5,pocket,acc,-0.524479,0.968413,0.0659,145
5,0,6,pocket,acc,-0.485295,0.888215,0.0659,145
6,0,7,pocket,acc,-0.409005,0.787145,0.089581,145
24,0,25,pocket,acc,-0.67462,0.830234,-0.009658,146
25,0,26,pocket,acc,-0.732479,0.814121,0.023667,146
26,0,27,pocket,acc,-0.743466,0.8046,0.180277,146


In [10]:
mylist = list(set(train_data_processed_ordered_tmp['SensorType']))
mylist

['gyro', 'acc', 'magn']

In [11]:
#train_data_processed_ordered_tmp = train_data_processed_ordered.sort_values(['SensorType', 'ts(ms)']);
### xi - xi+1 , avgIntertvalTime: ? , minIntervalTime: ?, maxIntervalTime?

## divide signals per position
train_data_pocket = train_data_processed_ordered_tmp[train_data_processed_ordered_tmp['SensorID'].isin(['pocket'])];
train_data_wrist = train_data_processed_ordered_tmp[train_data_processed_ordered_tmp['SensorID'].isin(['wrist'])];
train_data_ankle = train_data_processed_ordered_tmp[train_data_processed_ordered_tmp['SensorID'].isin(['ankle'])];
train_data_waist = train_data_processed_ordered_tmp[train_data_processed_ordered_tmp['SensorID'].isin(['waist'])];
train_data_chest = train_data_processed_ordered_tmp[train_data_processed_ordered_tmp['SensorID'].isin(['chest'])];

In [123]:
print("nr of pocket signals: ", len(train_data_pocket.size))
print("nr of wrist signals: ", train_data_wrist.size)
print("nr of ankle signals: ", train_data_ankle.size)
print("nr of waist signals: ", train_data_waist.size)
print("nr of chest signals: ", train_data_chest.size)

nr of pocket signals:  23784
nr of wrist signals:  7176
nr of ankle signals:  7176
nr of waist signals:  7200
nr of chest signals:  7200


In [13]:
# pocket 
pocket_acc = train_data_pocket[train_data_pocket['SensorType'].isin(['acc'])];
pocket_gyro = train_data_pocket[train_data_pocket['SensorType'].isin(['gyro'])];
pocket_magn = train_data_pocket[train_data_pocket['SensorType'].isin(['magn'])];

# wrist
wrist_acc = train_data_wrist[train_data_wrist['SensorType'].isin(['acc'])];
wrist_gyro = train_data_wrist[train_data_wrist['SensorType'].isin(['gyro'])];
wrist_magn = train_data_wrist[train_data_wrist['SensorType'].isin(['magn'])];

# ankle
ankle_acc = train_data_ankle[train_data_ankle['SensorType'].isin(['acc'])];
ankle_gyro = train_data_ankle[train_data_ankle['SensorType'].isin(['gyro'])];
ankle_magn = train_data_ankle[train_data_ankle['SensorType'].isin(['magn'])];

# waist
waist_acc = train_data_waist[train_data_waist['SensorType'].isin(['acc'])];
waist_gyro = train_data_waist[train_data_waist['SensorType'].isin(['gyro'])];
waist_magn = train_data_waist[train_data_waist['SensorType'].isin(['magn'])];

# chest
chest_acc = train_data_chest[train_data_chest['SensorType'].isin(['acc'])];
chest_gyro = train_data_chest[train_data_chest['SensorType'].isin(['gyro'])];
chest_magn = train_data_chest[train_data_chest['SensorType'].isin(['magn'])];

In [166]:
print("nr of acc signals from pocket: ", len(pocket_acc))
print("nr of gyro signals from pocket: ", len(pocket_gyro))
print("nr of magn signals from pocket: ", len(pocket_magn))
print("------------------------------------------------")
print("nr of acc signals from wrist: ", len(wrist_acc))
print("nr of gyro signals from wrist: ", len(wrist_gyro))
print("nr of magn signals from wrist: ", len(wrist_magn))
print("------------------------------------------------")
print("nr of acc signals from ankle: ", len(ankle_acc))
print("nr of gyro signals from ankle: ", len(ankle_gyro))
print("nr of magn signals from ankle: ", len(ankle_magn))
print("------------------------------------------------")
print("nr of acc signals from waist: ", len(waist_acc))
print("nr of gyro signals from waist: ", len(waist_gyro))
print("nr of magn signals from waist: ", len(waist_magn))
print("------------------------------------------------")
print("nr of acc signals from chest: ", len(chest_acc))
print("nr of gyro signals from chest: ", len(chest_gyro))
print("nr of magn signals from chest: ", len(chest_magn))

nr of acc signals from pocket:  2973
nr of gyro signals from pocket:  0
nr of magn signals from pocket:  0
------------------------------------------------
nr of acc signals from wrist:  299
nr of gyro signals from wrist:  299
nr of magn signals from wrist:  299
------------------------------------------------
nr of acc signals from ankle:  299
nr of gyro signals from ankle:  299
nr of magn signals from ankle:  299
------------------------------------------------
nr of acc signals from waist:  300
nr of gyro signals from waist:  300
nr of magn signals from waist:  300
------------------------------------------------
nr of acc signals from chest:  300
nr of gyro signals from chest:  300
nr of magn signals from chest:  300


In [27]:
###
### Analysis Only Of Ts Column
###

old_ts = 0;
deltas_ts = pd.DataFrame({'delta_ts':[]})
for index, line in chest_acc.iterrows():
    ts = line['ts(ms)']
    if old_ts >= ts:
        print("fuck")
        break
    delta_ts = ts - old_ts;
    deltas_ts = deltas_ts.append(pd.DataFrame({'delta_ts':[delta_ts]}))
    old_ts = ts;
print("chest acc")
deltas_ts.describe()

chest acc


Unnamed: 0,delta_ts
count,300.0
mean,49.926667
std,12.735328
min,7.0
25%,47.0
50%,49.0
75%,50.0
max,172.0


In [164]:
###
### join all signals of acc, magn, and gyro
###

print("nr of acc signals from chest: ", chest_acc.size)
print("nr of gyro signals from chest: ", chest_gyro.size)
print("nr of magn signals from chest: ", chest_magn.size)

header = [ 'ts(ms)', 'accX', 'accY', 'accZ',
          'magX', 'magY', 'magZ',
          'gyrX', 'gyrY', 'gyrZ',
          'userGender', 'userAge', 'userID',
          'position', 
          'label',
          'filename',
          'experiment' ];

### shit temporary
shit_tmp = pd.DataFrame(columns=['ts'])

signalType = "acc";
user_gender = "Male";
user_age = 18;
user_id = 1;
position = "chest"
filename = "filename"
experiment = "experiment"
i = 0
for index, line in chest_acc.iterrows():
    i += 1
    '''row = chest_all[chest_all['ts(ms)'] == line['ts(ms)']];
    if row.size != 0:
        print("auch")
        chest_all.at[row.index, 'accX'] = line['X-Axis'];
        chest_all.at[row.index, 'accY'] = line['Y-Axis'];
        chest_all.at[row.index, 'accZ'] = line['Z-Axis'];            
    else:
        ts_tmp = [line['ts(ms)']];
        if signalType is "acc": 
                acc_values_tmp = [ [line['X-Axis']] ,  [line['Y-Axis']],  [line['Z-Axis']]];
                magn_values_tmp = [ [-1] ,  [-1],  [-1]];
                gyro_values_tmp = [ [-1] ,  [-1],  [-1]]; 
        elif signalType is "magn":
                acc_values_tmp = [ [-1] ,  [-1],  [-1]]; 
                magn_values_tmp = [ [line['X-Axis']] ,  [line['Y-Axis']],  [line['Z-Axis']]];
                gyro_values_tmp = [ [-1] ,  [-1],  [-1]]; 
        else:
               acc_values_tmp = [ [-1] ,  [-1],  [-1]];    
               magn_values_tmp = [ [-1] ,  [-1],  [-1]];
               gyro_values_tmp = [ [line['X-Axis']] ,  [line['Y-Axis']],  [line['Z-Axis']]];         
        
        tmp = pd.DataFrame({'ts(ms)': ts_tmp, 
                            'accX':acc_values_tmp[0],
                            'accY':acc_values_tmp[1],
                            'accZ': acc_values_tmp[2], 
                            'magX':magn_values_tmp[0],
                            'magY':magn_values_tmp[1], 
                            'magZ': magn_values_tmp[2], 
                            'gyrX':gyro_values_tmp[0],
                            'gyrY': gyro_values_tmp[1],
                            'gyrZ': gyro_values_tmp[2], 
                            'userGender': [user_gender], 
                            'userAge': [user_age], 
                            'userID': [user_id],
                            'position': line['SensorID'], 
                            'label':['null'],
                            'filename':[filename],
                            'experiment': [experiment]})'''
    
    #tmp = pd.DataFrame({'ts(ms)': [1]})
#for i in range(0,10):
#    shit_tmp = shit_tmp.append(pd.DataFrame({'ts': [i]}))

nr of acc signals from chest:  2400
nr of gyro signals from chest:  2400
nr of magn signals from chest:  2400
last index: 300


In [154]:
chest_all = pd.DataFrame(columns=['ts'])

#for index, line in chest_all.iterrows():
for x in range(0, 300):
    chest_all = chest_all.append(pd.DataFrame({'ts': [0]}))

chest_all.size

300

In [111]:
chest_all.head(5)

Unnamed: 0,accX,accY,accZ,gyrX,gyrY,gyrZ,label,magX,magY,magZ,position,ts(ms),userAge,userGender,userID,userIDposition
0,0.935547,-0.087158,0.439453,-1,-1,-1,,-1,-1,-1,chest,172,18,Male,1.0,
0,0.954102,-0.081543,0.312256,-1,-1,-1,,-1,-1,-1,chest,190,18,Male,1.0,
0,0.954102,-0.081543,0.312256,-1,-1,-1,,-1,-1,-1,chest,204,18,Male,1.0,
0,0.954102,-0.081543,0.312256,-1,-1,-1,,-1,-1,-1,chest,212,18,Male,1.0,
0,0.998535,-0.185303,0.335205,-1,-1,-1,,-1,-1,-1,chest,239,18,Male,1.0,


In [93]:
values = pd.DataFrame({'ts':[1,2,3], 'value': [3,4,5]})
values.describe()
acc_valueX = [ [line['X-Axis']] ,  [line['Y-Axis']],  [line['Z-Axis']]];


acc_values_tmp = [ [-1] ,  [-1],  [-1]];    
magn_values_tmp = [ [-1] ,  [-1],  [-1]];
gyro_values_tmp = [ [-1] ,  [-1],  [-1]];
 
tmp = pd.DataFrame({'ts(ms)': 1, 'accX': acc_values_tmp[0], 
                    'accY': line['Y-Axis'], 'accZ': line['Z-Axis'], 'magX': [1]})

print(acc_valueX[0])
row = values[values['ts'] == 2];
print(row)
if row.size == 0:
    print("null")
    #values.
else:    
    print(row.index)
    #values = values.loc[row.index]['ts'] = 0;
    values.at[row.index, 'ts'] = 0;
    print(values)

[0.918212890625]
   ts  value
1   2      4
Int64Index([1], dtype='int64')
   ts  value
0   1      3
1   0      4
2   3      5


In [107]:
header = [ 'ts(ms)', 'accX', 'accY', 'accZ', 'magX', 'magY', 'magZ', 'gyrX', 'gyrY', 'gyrZ', 'position', 'label' ]
pd.DataFrame(columns=header)

Unnamed: 0,ts(ms),accX,accY,accZ,magX,magY,magZ,gyrX,gyrY,gyrZ,position,label
