# RapidsE4Formatter
This function formats raw .csv data from the Empatica E4 wearable sensor into a format compatible with the RAPIDS framework.

## Overview
------
#### Input
Raw .csv files downloaded from Empatica   
#### Output
Properly formatted dataframes compiled from all recordings with correct timestamps  

#### Folder configuration
Folders and files should be organized as shown below.
```bash
├── participant_1
│   └── EmpaticaE4
│       ├── ACC.csv
│       ├── BVP.csv
│       ├── EDA.csv
│       ├── HR.csv
│       ├── IBI.csv
│       └── TEMP.csv
├── participant_2
...
```
#### Write to .csv functionality
Dataframes can be written to .csv files by setting the `csvRW` parameter to `'w'`.  
The files are stored in a new folder named *'Formatted'* inside *'EmpaticaE4'* for each participant, with `_fmtd` appended to each file name.
```bash
├── participant_1
│   └── EmpaticaE4
│       ├── *.csv
│       └── Formatted
│           └── *_fmtd.csv
├── participant_2
...
```
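
The input and output locations above follow a fixed pattern, which can be sketched with two small path helpers. This is only an illustration: `input_path`, `output_path`, and the root folder `data` are hypothetical names that do not appear in the notebook.

```python
from pathlib import Path

def input_path(root, participant, wearable, dtype):
    """Path of a raw recording, e.g. <root>/participant_1/EmpaticaE4/EDA.csv."""
    return Path(root) / f"participant_{participant}" / wearable / f"{dtype}.csv"

def output_path(root, participant, wearable, dtype):
    """Path of the formatted copy inside the 'Formatted' subfolder."""
    raw = input_path(root, participant, wearable, dtype)
    return raw.parent / "Formatted" / f"{dtype}_fmtd.csv"

# 'data' is an assumed root folder used only for this example
print(input_path("data", 1, "EmpaticaE4", "EDA").as_posix())
print(output_path("data", 1, "EmpaticaE4", "EDA").as_posix())
```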


## User Inputs
------
#### Overview of parameters
```python
participant = 'unique participant/subject identifier, as a string' # '1', '2', '3', ..., 'n'
wearable = 'identifies type of wearable device' # "EmpaticaE4", "AppleWatch", etc.
label = 'user-defined label'
startDate, endDate = 'yyyy-mm-dd hh:mm:ss', 'yyyy-mm-dd hh:mm:ss' # filters data between these dates
filesource = 'root folder containing the participant_* folders, with trailing slash' # e.g. 'data/'; '' for the current directory
csvRW = 'read/write permission' # set to 'w' to write dataframes to csv files in separate folder
```

In [1]:
participant = 'n'
wearable = 'EmpaticaE4'
label = 'trial_1'
startDate, endDate = '2019-07-23 16:19:45', '2019-07-23 16:19:49'
filesource = ''
csvRW = 'w'

## Required libraries
------

In [2]:
import csv
import datetime
from collections import OrderedDict
import pandas as pd
import glob
import os

## Optional Write to CSV Function
------
writetocsv() - writes a dataframe to a .csv file

In [3]:
def writetocsv(dframe, idd, wearable, dtype):
    # filesource is read from the notebook's global scope
    outfolder = filesource + 'participant_' + idd + '/' + wearable +'/' + 'Formatted/'
    # create the output folder (and any missing parent folders) if it does not exist yet
    os.makedirs(outfolder, exist_ok=True)

    out_filename = (outfolder + dtype + '_fmtd.csv')
    dframe.to_csv(out_filename, mode='w', header=True)
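
To check a written file, it can be read back with the datetime index restored. The following is a minimal sketch under assumptions: the sample values, the temporary folder, and the participant number are made up for illustration, and the folder and file names only mirror the pattern used by `writetocsv()`.

```python
import os
import tempfile
import pandas as pd

# Hypothetical formatted dataframe with a 'Datetime' index
index = pd.date_range('2019-07-23 16:19:45', periods=3, freq='250ms', name='Datetime')
dframe = pd.DataFrame({'EDA': [0.1, 0.2, 0.3]}, index=index)

with tempfile.TemporaryDirectory() as root:
    outfolder = os.path.join(root, 'participant_1', 'EmpaticaE4', 'Formatted')
    os.makedirs(outfolder, exist_ok=True)  # also creates missing parent folders
    out_filename = os.path.join(outfolder, 'EDA_fmtd.csv')
    dframe.to_csv(out_filename, mode='w', header=True)
    # parse_dates=True turns the index column back into datetimes
    restored = pd.read_csv(out_filename, index_col='Datetime', parse_dates=True)

print(restored)
```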

## Import and Format
------
#### Functions 
* readFile() - reads a file into an ordered dictionary keyed by timestamp (and corrects for time zone)
* formatfile() - converts the dictionary into a dataframe indexed by datetime (ISO 8601), with sensor values as floats
* importandexport() - finds all files of a sensor type in the participant folder and runs formatfile() on each
* processAcceleration() - converts the three axis values to floats

In [4]:
def processAcceleration(x,y,z):
    x = float(x)
    y = float(y)
    z = float(z) 
    return {'x':x,'y':y,'z':z}

In [5]:
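def readFile(file, dtype, utc):
    if dtype not in ('EDA','TEMP','HR','BVP','ACC'):
        # IBI files use a different layout; see importIBI()
        raise ValueError('Unsupported dtype: ' + dtype)

    data = OrderedDict()  # avoid shadowing the built-in dict

    with open(file, 'rt') as csvfile:
        # single-column files yield one-element rows, so a comma delimiter works for every type
        reader = csv.reader(csvfile, delimiter=',')

        i=0
        for row in reader:
            if i==0:
                # first row: session start as Unix time, shifted by the UTC offset in hours
                timestamp=float(row[0])+3600*utc # TODO: time zone correction (plusminus utc*3600)
            elif i==1:
                # second row: sample rate in Hz
                hertz = float(row[0])
            else:
                # the first sample (i==2) keeps the start time; later samples advance by 1/hertz
                if i!=2:
                    timestamp = timestamp + 1.0/hertz
                if dtype in ('EDA','TEMP','HR','BVP'):
                    data[timestamp]=row[0]
                elif dtype=='ACC':
                    data[timestamp]= processAcceleration(row[0],row[1],row[2])
            i += 1
    return data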
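
The timestamp reconstruction can be illustrated on a few rows. This is a minimal sketch, not the notebook's implementation: `sample_times` and the inline data are hypothetical, and it assumes the first sample is taken at the session start time, with each later sample following 1/hertz seconds after the previous one.

```python
import csv
import io

# Hypothetical miniature single-column file: start time, sample rate, then samples
raw = "1563898785.0\n4.0\n0.10\n0.20\n0.30\n"

def sample_times(text):
    """Return (timestamp, value) pairs, with the first sample at the start time."""
    rows = [row[0] for row in csv.reader(io.StringIO(text)) if row]
    start, hertz = float(rows[0]), float(rows[1])
    # sample i is recorded i/hertz seconds after the session start
    return [(start + i / hertz, float(v)) for i, v in enumerate(rows[2:])]

pairs = sample_times(raw)
for t, v in pairs:
    print(t, v)
```

Computing each timestamp as `start + i / hertz` rather than adding `1/hertz` repeatedly avoids accumulating floating-point rounding over long recordings.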

In [7]:
def formatfile(file, idd, wearable, dtype, startDate, endDate, csvRW, utc):
    dfile = readFile(file=file, dtype=dtype, utc=utc)
    dfile =  {datetime.datetime.utcfromtimestamp(k).strftime('%Y-%m-%d %H:%M:%S.%f'): v for k, v in dfile.items()}
    if dtype in ('EDA','TEMP','HR','BVP'):
        dframe = pd.DataFrame.from_dict(dfile, orient='index', columns=[dtype])
        dframe[dtype] = dframe[dtype].astype(float)
    elif dtype=='ACC':
        dframe = pd.DataFrame.from_dict(dfile, orient='index', columns=['x','y','z'])
        dframe['x'] = dframe['x'].astype(float)
        dframe['y'] = dframe['y'].astype(float)
        dframe['z'] = dframe['z'].astype(float)
    
    dframe['Datetime'] = dframe.index
    # the format must match the strftime pattern used above (space separator, not 'T')
    dframe['Datetime'] = pd.to_datetime(dframe['Datetime'], format='%Y-%m-%d %H:%M:%S.%f')
    dframe  = dframe.set_index('Datetime')
    
    if startDate:
        dframe = dframe.loc[startDate:] 
    if endDate:
        dframe = dframe.loc[:endDate]
                
    if csvRW=='w':
        writetocsv(dframe, idd, wearable, dtype)
        
    return dframe
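
The date filter relies on label slicing of a datetime index. A minimal sketch of that behavior, using a made-up 4 Hz signal (the values and the chosen bounds are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical 4 Hz signal covering six seconds
index = pd.date_range('2019-07-23 16:19:45', periods=24, freq='250ms', name='Datetime')
dframe = pd.DataFrame({'EDA': range(24)}, index=index)

startDate, endDate = '2019-07-23 16:19:46', '2019-07-23 16:19:49'
# label slicing on a DatetimeIndex includes both end points
subset = dframe.loc[startDate:endDate]
print(subset.index.min(), subset.index.max(), len(subset))
```

In this sketch a bound given to the second covers every sample within that second, so the slice ends at 16:19:49.750 rather than 16:19:49.000. Adding fractional seconds to the string narrows the bound.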

In [8]:
def importandexport(filesource, idd, wearable, dtype, startDate, endDate, csvRW, utc):
    # use the idd argument rather than the global participant variable
    configfiles = glob.glob(filesource + 'participant_' + idd +'/' + wearable +'/' + dtype + '.csv')
    print(configfiles)
    
    for file in configfiles:
        formatfile(file, idd, wearable, dtype, startDate, endDate, csvRW, utc)
    print(('Completed Import and Export of: ' + dtype))

In [10]:
listdtype = ['EDA','TEMP','HR','BVP']
utc=0
for dtype in listdtype:
    importandexport(filesource, participant, wearable, dtype, startDate, endDate, csvRW, utc)

utc=-4
startDate, endDate = '2019-07-23 12:31:31', '2019-07-23 12:31:35'
importandexport(filesource, participant, wearable, 'ACC', startDate, endDate, csvRW, utc)

['participant_n/EmpaticaE4/EDA.csv']
Completed Import and Export of: EDA
['participant_n/EmpaticaE4/TEMP.csv']
Completed Import and Export of: TEMP
['participant_n/EmpaticaE4/HR.csv']
Completed Import and Export of: HR
['participant_n/EmpaticaE4/BVP.csv']
Completed Import and Export of: BVP
['participant_n/EmpaticaE4/ACC.csv']
Completed Import and Export of: ACC


## Import and Format ACC
------
#### Functions
* processAcceleration() - converts the three axis values to floats
* readAccFile() - reads a file into a dictionary and corrects for time zone
* formatAccfile() - converts the dictionary into a dataframe indexed by datetime (ISO 8601), with sensor values as floats, and optionally writes it to .csv
* importandexportAcc() - finds all files of sensor type 'ACC' in the participant folder and runs formatAccfile() on each

In [14]:
def readAccFile(file):
    data = OrderedDict()  # avoid shadowing the built-in dict
    
    with open(file, 'rt') as csvfile:
        reader = csv.reader(csvfile, delimiter=',')
        i=0
        for row in reader:
            if i == 0:
                # first row: session start as Unix time, shifted by a fixed 4 hours
                timestamp = float(row[0])-3600*4 # TODO: time zone correction
            elif i == 1:
                # second row: sample rate in Hz
                hertz=float(row[0])
            elif i == 2:
                # the first sample is taken at the start time
                data[timestamp]= processAcceleration(row[0],row[1],row[2])
            else:
                timestamp = timestamp + 1.0/hertz 
                data[timestamp] = processAcceleration(row[0],row[1],row[2])
            i += 1
    return data
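
A fixed four-hour shift does not follow daylight saving changes. As an alternative sketch, pandas can convert the UTC timestamps using a named time zone. The zone `'America/New_York'` and the sample values are assumptions chosen for illustration, not settings taken from the notebook.

```python
import pandas as pd

# Hypothetical Unix timestamps (seconds, UTC) as stored in an E4 file
unix_seconds = [1563898785.0, 1563898785.5]

utc_index = pd.to_datetime(unix_seconds, unit='s', utc=True)
# assumed zone; it corresponds to UTC-4 on this date
local_index = utc_index.tz_convert('America/New_York')
# drop the zone information to obtain naive local wall-clock times
naive_local = local_index.tz_localize(None)
print(naive_local[0])
```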

In [15]:
def formatAccfile(file, idd, wearable, dtype, startDate, endDate, csvRW):
    dfile = readAccFile(file = file)
    dfile =  {datetime.datetime.utcfromtimestamp(k).strftime('%Y-%m-%d %H:%M:%S.%f'): v for k, v in dfile.items()}
    dframe = pd.DataFrame.from_dict(dfile, orient='index', columns=['x', 'y', 'z'])
    
    dframe['x'] = dframe['x'].astype(float)
    dframe['y'] = dframe['y'].astype(float)
    dframe['z'] = dframe['z'].astype(float)
    
    dframe['Datetime'] = dframe.index
    # the format must match the strftime pattern used above (space separator, not 'T')
    dframe['Datetime'] = pd.to_datetime(dframe['Datetime'], format='%Y-%m-%d %H:%M:%S.%f')
    dframe  = dframe.set_index('Datetime')
    if startDate:
        dframe=dframe.loc[startDate:]
    if endDate:
        dframe=dframe.loc[:endDate]

    if csvRW=='w':
        writetocsv(dframe, idd, wearable, dtype)
        
    return dframe
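
For long recordings, the same result can be built without a row-by-row loop. The following is a hedged sketch of a vectorized variant, not the notebook's method: the inline data, the 2 Hz sample rate, and the variable names are hypothetical, and no time zone shift is applied.

```python
import io
import pandas as pd

# Hypothetical miniature ACC.csv: start time and sample rate per axis, then x,y,z rows
raw = ("1563898785.0,1563898785.0,1563898785.0\n"
       "2.0,2.0,2.0\n"
       "-1,62,15\n0,63,14\n1,64,13\n")

# the first two rows hold the start time and the sample rate
header = pd.read_csv(io.StringIO(raw), header=None, nrows=2)
start, hertz = float(header.iloc[0, 0]), float(header.iloc[1, 0])

# the remaining rows are samples; row i is taken i/hertz seconds after the start
acc = pd.read_csv(io.StringIO(raw), header=None, skiprows=2, names=['x', 'y', 'z']).astype(float)
acc.index = pd.Timestamp(start, unit='s') + pd.to_timedelta(acc.index / hertz, unit='s')
acc.index.name = 'Datetime'
print(acc)
```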

In [16]:
def importandexportAcc(filesource, idd, wearable, dtype, startDate, endDate, csvRW):
    configfiles = glob.glob(filesource + 'participant_' + idd +'/' + wearable +'/' + dtype + '.csv')
    print(configfiles)
    
    for file in configfiles:
        formatAccfile(file, idd, wearable, dtype, startDate, endDate, csvRW)
    print(('Completed Import and Export of: ' + dtype))

In [17]:
startDate, endDate = '2019-07-23 12:31:31', '2019-07-23 12:35:35'
# startDate and endDate are passed as False here, so no date filtering is applied
importandexportAcc(filesource, participant, wearable, 'ACC', startDate=False, endDate=False, csvRW=csvRW)

['participant_n/EmpaticaE4/ACC.csv']
Completed Import and Export of: ACC


## Import and Format IBI
------
#### Functions
* importIBI() - reads file into dataframe and corrects for time zone, formats time as timestamp using datetime (ISO8601), formats sensor values to float, writes to .csv
* importandexportIBI() - finds all files of sensor type 'IBI' in participant folder and runs importIBI() for each input file

In [None]:
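def importIBI(file, idd, wearable, dtype, startDate, endDate, csvRW):
    IBI = pd.read_csv(file, header=None)
    # row 0 holds the session start (Unix time), shifted here by a fixed 4 hours
    timestampstart = float(IBI[0][0])-3600*4 # TODO: time zone correction
    IBI = IBI.drop([0])
    # column 0 holds offsets in seconds from the session start
    IBI[0] = IBI[0].astype(float)+timestampstart
    IBI[0] = IBI[0].apply(lambda x: datetime.datetime.utcfromtimestamp(x).strftime('%Y-%m-%d %H:%M:%S.%f'))
    IBI[1] = IBI[1].astype(float) # inter-beat interval values as floats
    IBI  = IBI.set_index(0)
    
    if csvRW=='w':
        # writetocsv() takes four arguments; startDate and endDate are not passed to it
        writetocsv(IBI, idd, wearable, dtype)
    
    return IBI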
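
The IBI layout differs from the other sensors: each row carries its own offset from the session start instead of relying on a fixed sample rate. A minimal sketch of the conversion on made-up data follows. The inline values and column names are assumptions, and no time zone shift is applied.

```python
import io
import pandas as pd

# Hypothetical miniature IBI.csv: start time in the first row, then (offset, interval) rows
raw = "1563898785.0, IBI\n10.5,0.75\n11.25,0.75\n12.0,0.8125\n"

# the session start is the first field of the first row
start = pd.Timestamp(float(raw.split(',')[0]), unit='s')

ibi = pd.read_csv(io.StringIO(raw), header=None, skiprows=1, names=['offset', 'IBI'])
# each beat is time-stamped as session start plus its offset in seconds
ibi['Datetime'] = start + pd.to_timedelta(ibi['offset'], unit='s')
ibi = ibi.set_index('Datetime')[['IBI']]
print(ibi)
```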

In [None]:
def importandexportIBI(filesource, idd, wearable, dtype, startDate, endDate, csvRW):
    configfiles = glob.glob(filesource + 'participant_' + idd +'/' + wearable +'/' + dtype + '.csv')
    print(configfiles)
    
    for file in configfiles:
        importIBI(file, idd, wearable, dtype, startDate, endDate, csvRW)
    print(('Completed Import and Export of: ' + dtype))

In [None]:
startDate, endDate = '2019-07-23 12:20:42', '2019-07-23 12:22:15' # TODO select with times not by the second
importandexportIBI(filesource, participant, wearable, 'IBI', startDate, endDate, csvRW) 

***
#### Sources
- [DBDP - Preprocessing - E4 File Formatter](https://github.com/DigitalBiomarkerDiscoveryPipeline/Pre-process)
- [Empatica Timestamp Explanation](https://support.empatica.com/hc/en-us/articles/202800715-Session-start-time-format-and-synchronization-)
- [E4 File Sync Python Script on GitHub](https://github.com/Ev4ngelos/EmpaticaBiophysicalSync/blob/master/E4BioSync.py)