<table width="100%">
    <tr style="border-bottom:solid 2pt #009EE3">
        <td class="header_buttons">
            <a href="FILENAME" download><img src="../../images/icons/download.png" alt="biosignalsnotebooks | download button"></a>
        </td>
        <td class="header_buttons">
            <a href="SOURCE" target="_blank"><img src="../../images/icons/program.png" alt="biosignalsnotebooks | binder server" title="Be creative and test your solutions !"></a>
        </td>
        <td></td>
        <td class="header_icons">
            <a href="../MainFiles/biosignalsnotebooks.ipynb"><img src="../../images/icons/home.png" alt="biosignalsnotebooks | home button"></a>
        </td>
        <td class="header_icons">
            <a href="../MainFiles/contacts.ipynb"><img src="../../images/icons/contacts.png" alt="biosignalsnotebooks | contacts button"></a>
        </td>
        <td class="header_icons">
            <a href="https://github.com/biosignalsplux/biosignalsnotebooks" target="_blank"><img src="../../images/icons/github.png" alt="biosignalsnotebooks | github button"></a>
        </td>
        <td class="header_logo">
            <img src="../../images/ost_logo.png" alt="biosignalsnotebooks | project logo">
        </td>
    </tr>
</table>

<link rel="stylesheet" href="../../styles/theme_style.css">
<!--link rel="stylesheet" href="../../styles/header_style.css"-->
<link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">

<table width="100%">
    <tr>
        <td id="image_td" width="15%" class="header_image_color_1"><div id="image_img"
        class="header_image_15"></div></td>
        <td class="header_text"> Write data from multiple Android sensor files into one file</td>
    </tr>
</table>

<div id="flex-container">
    <div id="diff_level" class="flex-item">
        <strong>Difficulty Level:</strong>   <span class="fa fa-star checked"></span>
                                <span class="fa fa-star checked"></span>
                                <span class="fa fa-star checked"></span>
                                <span class="fa fa-star checked"></span>
                                <span class="fa fa-star"></span>
    </div>
    <div id="tag" class="flex-item-tag">
        <span id="tag_list">
            <table id="tag_list_table">
                <tr>
                    <td class="shield_left">Tags</td>
                    <td class="shield_right" id="tags">Android&#9729;OpenSignals mobile&#9729;File handling</td>
                </tr>
            </table>
        </span>
        <!-- [OR] Visit https://img.shields.io in order to create a tag badge-->
    </div>
</div>

The <strong><span class="color2">OpenSignals mobile application</span></strong> allows you to acquire data from the sensors that are built into the hardware of an Android smartphone. When acquiring data from multiple Android sensors, the data of each sensor is saved into an individual .txt file. In order to properly put the data of the sensors into context, the files need to be synchronized. In this <strong><span class="color4">Jupyter notebook</span></strong> we will have a look at how to synchronize the sensors and how to write them all into a single .txt file.

For this <strong><span class="color4">notebook</span></strong> we will synchronize the data from five sensors: the accelerometer, GPS, light, proximity, and significant motion sensors. These sensors were chosen because they represent a wide range of different acquisition types. However, the procedures shown here can also be applied to as many Android sensors as you record using the <strong><span class="color2">OpenSignals mobile application</span></strong>.<br> 
As part of this <strong><span class="color4">notebook</span></strong> we will guide you through all essential steps for synchronizing the Android sensors. In the last section we will present a new function integrated into our <strong><span class="color2">biosignalsnotebooks</span></strong> package that conveniently handles all these steps for you. 

In case this is your first time working with Android sensors, we highly recommend reading the <a href=https://biosignalsplux.com/learn/notebooks.html>notebook (correct link still missing) <img src="../../images/icons/link.png" width="10px" height="10px" style="display:inline"></a> with general information on Android sensors.

<hr>

<p class="steps">1 - Package imports</p>

First, let's import some useful libraries that will be used for visualization and data processing purposes.

In [5]:
# biosignalsnotebooks package
import biosignalsnotebooks as bsnb

# package in order to load .txt files
from numpy import loadtxt
import numpy as np

# package for converting the header into a dictionary
import json

# package for accessing the files
import os

# package for handling warnings
import warnings

<p class="steps">2 - Creating a new header</p>

Before we start synchronizing our data we will create a new header that will be written to the file in which we are going to store all our data. Since we want to be able to open this file within the <strong><span class="color2">OpenSignals</span></strong> application, we need to abide by the <strong><span class="color2">OpenSignals</span></strong> header format. In order to make things as easy as possible, we are simply going to copy the header from one of the sensors and edit the fields that need to be changed.

The fields that we need to edit are:
<ul>
    <li>"sensor"</li><br>
    <li>"column"</li><br>
    <li>"channels"</li><br>
    <li>"label"</li><br>
    <li>"resolution"</li><br>
    <li>"special"</li><br>
    <li>"sleeve color"</li>
</ul>

We will extend these fields with the information from the other sensors. With the exception of the "channels" field, the extension is straightforward. The "channels" field will be extended in such a way that the channel numbers keep increasing monotonically with each new sensor added.
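The channel renumbering can be looked at in isolation. The following is a minimal sketch; the two channel lists are made-up stand-ins for the "channels" fields of two single-sensor headers (an assumption for illustration, not values read from a file).

```python
# hypothetical "channels" fields of two single-sensor headers
acc_channels = [0, 1, 2]   # e.g. a 3-axis sensor
light_channels = [0]       # e.g. a single-channel sensor

# start the merged list with the channels of the first sensor
merged_channels = list(acc_channels)

# shift the channels of the next sensor so that the numbering
# continues right after the last channel that is already merged
offset = merged_channels[-1] + 1
merged_channels.extend(ch + offset for ch in light_channels)

print(merged_channels)
```

Each additional sensor is handled the same way: the offset is always derived from the last channel number that is already in the merged list.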

Since we will probably have to use this whole procedure more than once, we will encapsulate it into a function that takes a list of file paths as input. The function will return the new header as a string.

In [6]:
def create_sync_header(in_path):
    
    # variable for the header
    header = None

    # cycle through the file list
    for i, file in enumerate(in_path):

        # check if it is the first file entry
        if(i == 0):
        
            # open the file 
            with open(file, encoding='latin-1') as opened_file:
                # read the information from the header lines (omitting the begin and end tags of the header)
                header_string = opened_file.readlines()[1][2:] # omit "# " at the beginning of the sensor information
            
                # convert the header into a dict
                header = json.loads(header_string)
        
        else:
        
            # open the file
            with open(file, encoding='latin-1') as opened_file:
                header_string = opened_file.readlines()[1][2:]
            
                # convert header into a dict
                curr_header = json.loads(header_string)
            
                # add the fields from the header of the current file
                header['internal sensors']['sensor'].extend(curr_header['internal sensors']['sensor']) # sensor field
                header['internal sensors']['column'].extend(curr_header['internal sensors']['column'][1:]) # column field
            
                # get the last channel from the channel field
                num_channels = header['internal sensors']['channels'][-1]
            
                # get the channels from the current sensor
                new_channels = curr_header['internal sensors']['channels']
            
                # adjust the channel number 
                new_channels = [ch + (num_channels +1) for ch in new_channels]
            
                header['internal sensors']['channels'].extend(new_channels) # channels field
                header['internal sensors']['label'].extend(curr_header['internal sensors']['label']) # label field            
                header['internal sensors']['resolution'].extend(curr_header['internal sensors']['resolution']) # resolution field
                header['internal sensors']['special'].extend(curr_header['internal sensors']['special']) # special field
                header['internal sensors']['sleeve color'].extend(curr_header['internal sensors']['sleeve color']) # sleeve color field
            

    # create new header string
    header_string = "# OpenSignals Text File Format\n# " + json.dumps(header) + '\n# EndOfHeader\n'
    
    return header_string
    

Let's use the function we created and see what the edited header looks like. For simplicity, we added all our files into one folder, which makes creating a file list much more straightforward.

As we will see, all the information that we need is added to the header. The order in which the sensors are added to the header depends on how the files are ordered in the file list that we pass to the function.
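Because the order of the file list determines the order of the sensors in the header, it can help to make that order explicit. <code>os.listdir</code> returns the entries in arbitrary order, so the following minimal sketch sorts the names before building the paths. The temporary folder and the file names are made up for illustration.

```python
import os
import tempfile

# create a temporary folder with a few empty placeholder files (hypothetical file names)
with tempfile.TemporaryDirectory() as folder:
    for file_name in ['light.txt', 'acc.txt', 'gps.txt']:
        open(os.path.join(folder, file_name), 'w').close()

    # os.listdir gives no guarantee on the order, so sort the names explicitly
    sorted_names = sorted(os.listdir(folder))

    # build the full path for each file
    sorted_file_list = [os.path.join(folder, name) for name in sorted_names]

print(sorted_names)
```

With a sorted list, the sensor order in the header is reproducible across operating systems and file systems.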

In [7]:
# set file path
path = '../../images/other/android_file_sync/'

# get a list with all the files within that folder
file_list = os.listdir(path)

# make full path for each file
file_list = [path + file for file in file_list]

# create header
header = create_sync_header(in_path=file_list)

# print the edited header string
print(header)

# OpenSignals Text File Format
# {"internal sensors": {"sensor": ["xAcc", "yAcc", "zAcc", "Latitude", "Longitude", "Altitude", "Light", "distance", "SigMotion"], "device name": "internal sensors", "column": ["nSeq", "xAcc", "yAcc", "zAcc", "Latitude", "Longitude", "Altitude", "Light", "distance", "SigMotion"], "sync interval": 2, "time": "14:55:57", "comments": "", "keywords": "", "device connection": "UNKNOWNinternal sensors", "channels": [0, 1, 2, 3, 4, 5, 6, 7, 8], "date": "2020-07-17", "mode": 0, "digital IO": [], "firmware version": 0, "device": "android", "position": 0, "sampling rate": 100, "label": ["xAcc", "yAcc", "zAcc", "Latitude", "Longitude", "Altitude", "Light", "distance", "SigMotion"], "resolution": [1, 1, 1, 1, 1, 1, 1, 1, 1], "special": [{}, {}, {}, {}, {}, {}, {}, {}, {}], "sleeve color": ["UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN", "UNKNOWN"]}}
# EndOfHeader



<p class="steps">3 - Loading the data and gathering information on the signals</p>

Next, we will load the sensor data and gather some useful information about the signals we are going to synchronize. This allows us to make a more informed decision on how exactly to synchronize our signals.

Again, we will write a function that will take a list of paths as input. Additionally, a flag can be set that allows for showing a short report of the data. The function will return the sensor data in a list and a dictionary with the following information on the signals:
<ul>
    <li><strong>names:</strong> The names of the sensors.</li><br>
    <li><strong>number of samples:</strong> The number of samples each sensor recorded.</li><br>
    <li><strong>starting times:</strong> The timestamps when the sensors started recording.</li><br>
    <li><strong>stopping times:</strong> The timestamps when the sensors stopped recording.</li><br>
    <li><strong>avg. sampling rates:</strong> The average sampling rate of each sensor (*).</li><br>
    <li><strong>min. sampling rate:</strong> The lowest of the average sampling rates.</li><br>
    <li><strong>max. sampling rate:</strong> The highest of the average sampling rates.</li><br>
    <li><strong>mean sampling rate:</strong> The mean of the average sampling rates.</li><br>
    <li><strong>std. sampling rate:</strong> The standard deviation of the average sampling rates.</li><br>
    <li><strong>starting order:</strong> Order in which the sensors started recording, from first to last.</li><br>
    <li><strong>stopping order:</strong> Order in which the sensors stopped recording, from first to last.</li>
</ul>

(*) in case you are wondering why an average sampling rate is displayed, then have a look at this <a href=https://biosignalsplux.com/learn/notebooks.html>notebook (correct link still missing) <img src="../../images/icons/link.png" width="10px" height="10px" style="display:inline"></a> about re-sampling signals recorded with Android sensors.
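The average sampling rate is derived from the timestamps alone. As a minimal sketch with made-up timestamps (an assumption for illustration), the mean distance between consecutive samples is converted from nanoseconds into a rate in Hz:

```python
import numpy as np

# made-up timestamps in nanoseconds, sampled at irregular intervals
timestamps_ns = np.array([0, 9_000_000, 21_000_000, 30_000_000, 40_000_000], dtype=float)

# distance between consecutive samples (in nanoseconds)
sample_dist = np.diff(timestamps_ns)

# mean distance converted into an average sampling rate in Hz (1e9 ns = 1 s)
avg_sampling_rate = 1e9 / np.mean(sample_dist)

print(avg_sampling_rate)
```

Although no two intervals are identical here, the mean interval is 10 ms, which corresponds to an average rate of 100 Hz.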

In [8]:
def load_android_data(in_path, print_report=True):
    
    # list for holding the data of all sensors
    sensor_data = []
    
    # list for holding sensor names
    names = []
    
    # list for holding the number of samples each sensor recorded
    num_samples = []
    
    # list for holding average sampling rates
    avg_sampling_rates = []
    
    # list for holding start times
    start_times = []
    
    # list for holding stop times
    stop_times = []
    
    # cycle over the files
    for file in in_path:
        
        # suppress the loadtxt warning that is thrown when there is no sensor data present in the file
        with warnings.catch_warnings():
            warnings.simplefilter("ignore")              
            # load the data in the file
            data = loadtxt(file)
        
        # check if data array has values (the significant motion sensor might not return any data if no significant motion was detected)
        # the sensor is only added to the report if it has at least one sampled data point
        if(data.size):
            
                
            # get the sensor data
            sensor_data.append(data)
            
            # get the dimensionality of the data 
            if(data.ndim == 1): # 1D array, this means that the sensor only sampled a single point
                
                # get the time axis
                time_axis = data[:1]
                
                # set the sampling rate to zero because only one sample was acquired by the sensor
                avg_sampling_rates.append(0)
                
            else: # multi-dimensional

                # get the time axis
                time_axis = data[:,0]
        
                # calculate the distance between sampling points
                # data[:,0] is the time axis
                sample_dist = np.diff(time_axis)

                # calculate the mean distance
                mean_dist = np.mean(sample_dist)

                # calculate the sampling rate and add it to the list
                # 1e9 is used because the time axis is in nanoseconds
                avg_sampling_rates.append(1e9/mean_dist)
        
            # get the number of samples
            num_samples.append(time_axis.size)
            
            # get the start time of the signal
            start_times.append(time_axis[0])
        
            # get the stop time of the signal
            stop_times.append(time_axis[-1])
            
            # open the file and retrieve information from the header
            with open(file, encoding='latin-1') as opened_file:
                # read the information from the header lines (omitting the begin and end tags of the header)
                header_string = opened_file.readlines()[1][2:] # omit "# " at the beginning of the sensor information
            
                # convert the header into a dict
                header = json.loads(header_string)
            
                # add the name of the sensor to the list
                name = header['internal sensors']['sensor'][0]
            
                # remove the x from the name (in case it is a 3-axis sensor)
                if(name.startswith('x')): name = name[1:]
                    
                # check if it is the GPS sensor (the first data column of the GPS sensor is 'Latitude')
                # and change the name accordingly
                if(name.startswith('Lat')): name = 'GPS'
                    
                # check if it is the proximity sensor (its data column is called 'distance')
                if(name.startswith('dist')): name = 'Proximity'
                
                # add the name to the list
                names.append(name)
        
    
    # calculate max, min, mean and std
    max_sample  = np.max(avg_sampling_rates)
    min_sample  = np.min(avg_sampling_rates)
    mean        = np.mean(avg_sampling_rates)
    std         = np.std(avg_sampling_rates)
    
    # get the starting order of the sensors
    starting_order = [name for (_,name) in sorted(zip(start_times, names))]
    
    # get the stopping order of the sensors
    stopping_order = [name for (_,name) in sorted(zip(stop_times, names))]
    
    # create dictionary
    report={
        'names': names,
        'number of samples': num_samples,
        'starting times': start_times,
        'stopping times': stop_times,
        'avg. sampling rates': avg_sampling_rates,
        'min. sampling rate': min_sample,
        'max. sampling rate': max_sample,
        'mean sampling rate': mean,
        'std. sampling rate': std,
        'starting order': starting_order,
        'stopping order': stopping_order,
    }
    
    # print a report if the user indicates to do so
    if(print_report):
        for key, value in report.items():
            print('{}: {}'.format(key, value))
    
    return sensor_data, report

Now, we will test our function and see what it reports about the signals we recorded.

We will see that the accelerometer samples at the highest rate, while the GPS, light, and the proximity sensor sample at lower rates. Furthermore, we see that the significant motion sensor detected only one motion that it labeled as significant. Thus, its sampling rate is set to zero. The report also shows that the proximity sensor was the first to start recording, while the significant motion sensor was the last to start recording. The accelerometer was the last to stop recording and the significant motion sensor the first to stop recording.

In [9]:
sensor_data, report = load_android_data(file_list)

names: ['Acc', 'GPS', 'Light', 'Proximity', 'SigMotion']
number of samples: [17661, 132, 178, 18, 1]
starting times: [188488175020937.0, 188490065043301.0, 188488121593000.0, 188488102522000.0, 188502049742000.0]
stopping times: [188669138104000.0, 188668271596298.0, 188668132960000.0, 188636399533000.0, 188502049742000.0]
avg. sampling rates: [97.58896511423767, 0.7351020363555616, 0.9832712397545429, 0.11463481216084659, 0]
min. sampling rate: 0.0
max. sampling rate: 97.58896511423767
mean sampling rate: 19.88439464050172
std. sampling rate: 38.85403633973379
starting order: ['Proximity', 'Light', 'Acc', 'GPS', 'SigMotion']
stopping order: ['SigMotion', 'Proximity', 'Light', 'GPS', 'Acc']
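The timestamps in the report are given in nanoseconds, which makes them hard to read at a glance. As a small sketch, the values printed above can be converted into recording durations in seconds. The lists are copied from the report; the variable names are chosen for this example only.

```python
# values copied from the report printed above (timestamps in nanoseconds)
names = ['Acc', 'GPS', 'Light', 'Proximity', 'SigMotion']
start_times = [188488175020937.0, 188490065043301.0, 188488121593000.0, 188488102522000.0, 188502049742000.0]
stop_times = [188669138104000.0, 188668271596298.0, 188668132960000.0, 188636399533000.0, 188502049742000.0]

# duration of the entire recording: earliest start to latest stop
total_duration_s = (max(stop_times) - min(start_times)) * 1e-9

# duration for which each individual sensor was recording
sensor_durations_s = {name: (stop - start) * 1e-9
                      for name, start, stop in zip(names, start_times, stop_times)}

print('entire recording: {:.2f} s'.format(total_duration_s))
for name, duration in sensor_durations_s.items():
    print('{}: {:.2f} s'.format(name, duration))
```

The significant motion sensor has a duration of zero because its start and stop timestamps belong to the same single sample.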


<p class="steps">4 - Padding all signals to the same length</p>

Since the signals are going to be synchronized and written into the same file, all signals have to be of the same length. However, the length to which the signals should be padded depends on which parts of the signals we want to include in our synchronized file. To make this clearer, we will have a look at two toy signals. For these two signals, there are four possible ways to decide which parts of the signals to include in the synchronization. The graphs are shown below.

In [10]:
# imports for bokeh plotting
from bokeh.layouts import gridplot
from bokeh.plotting import figure, show

# define signals
t1 = np.arange(0, 21, 2)
t2 = np.arange(3, 22, 3)

x1 = [0.5, 2, 1, 3, 2.5, 2, 2.5, 1.5, 1.5, 3, 2]
x2 = [0.5, 2, 1.5, 3.5, 1, 2, 0.5]

# define color and alpha
c = 'white'
alpha = 0.7

# ------ figure 1 -------
p1 = figure()
# rectangle for highlighting
p1.rect(10.5, 2, 21, 3, color=c, fill_alpha = alpha)

# x1
p1.line(t1, x1, color="blue", line_width=1, legend_label='x_1') # draw lines
p1.circle(t1, x1, color="blue", size=10) # draw circles
# x2
p1.line(t2, x2, color="red", line_width=1, legend_label='x_2') # draw lines
p1.circle(t2, x2, color="red", size=10) # draw circles
p1.xaxis.axis_label = 'time (s)'
p1.title.text = 'Entire recording time'
p1.title.align = 'center'
p1.title.text_font_size = "15px"
p1.legend.location = "top_left"
bsnb.opensignals_style([p1]) 
#show(p1)

# ------ figure 2 ------
p2 = figure()
# rectangle for highlighting
p2.rect(10, 2, 20, 3, color=c, fill_alpha = alpha)

# x1
p2.line(t1, x1, color="blue", line_width=1, legend_label='x_1') # draw lines
p2.circle(t1, x1, color="blue", size=10) # draw circles
# x2
p2.line(t2, x2, color="red", line_width=1, legend_label='x_2') # draw lines
p2.circle(t2, x2, color="red", size=10) # draw circles
p2.xaxis.axis_label = 'time (s)'
p2.title.text = 'Recording time of x_1'
p2.title.align = 'center'
p2.title.text_font_size = "15px"
p2.legend.location = "top_left"
bsnb.opensignals_style([p2]) 
#show(p2)

# ------ figure 3 -------
p3 = figure()
# rectangle for highlighting
p3.rect(12, 2, 18, 3, color=c, fill_alpha = alpha)

# x1
p3.line(t1, x1, color="blue", line_width=1, legend_label='x_1') # draw lines
p3.circle(t1, x1, color="blue", size=10) # draw circles
# x2
p3.line(t2, x2, color="red", line_width=1, legend_label='x_2') # draw lines
p3.circle(t2, x2, color="red", size=10) # draw circles
p3.xaxis.axis_label = 'time (s)'
p3.title.text = 'Recording time of x_2'
p3.title.align = 'center'
p3.title.text_font_size = "15px"
p3.legend.location = "top_left"
bsnb.opensignals_style([p3]) 
#show(p3)

# ------ figure 4 -------
p4 = figure()
# rectangle for highlighting
p4.rect(11.5, 2, 17, 3, color=c, fill_alpha = alpha)

# x1
p4.line(t1, x1, color="blue", line_width=1, legend_label='x_1') # draw lines
p4.circle(t1, x1, color="blue", size=10) # draw circles
# x2
p4.line(t2, x2, color="red", line_width=1, legend_label='x_2') # draw lines
p4.circle(t2, x2, color="red", size=10) # draw circles
p4.xaxis.axis_label = 'time (s)'
p4.title.text = 'x_1 and x_2 recording at the same time'
p4.title.align = 'center'
p4.title.text_font_size = "15px"
p4.legend.location = "top_left"
bsnb.opensignals_style([p4]) 
#show(p4)

# ------ Grid plot -------
grid = gridplot([[p1, p2], [p3, p4]])
grid.sizing_mode = 'scale_width'
#bsnb.opensignals_style([grid]) 

show(grid)
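The four highlighted regions can also be expressed as start and stop times. The following sketch computes them from the time axes of the two toy signals; the dictionary <code>windows</code> is a name introduced for this example only.

```python
import numpy as np

# time axes of the two toy signals from the plot above
t1 = np.arange(0, 21, 2)
t2 = np.arange(3, 22, 3)

# (start, stop) of the four possible synchronization windows
windows = {
    'entire recording time': (min(t1[0], t2[0]), max(t1[-1], t2[-1])),
    'recording time of x_1': (t1[0], t1[-1]),
    'recording time of x_2': (t2[0], t2[-1]),
    'x_1 and x_2 recording at the same time': (max(t1[0], t2[0]), min(t1[-1], t2[-1])),
}

for option, (start, stop) in windows.items():
    print('{}: {} s to {} s'.format(option, start, stop))
```

A signal that does not cover the whole chosen window has to be padded, while a signal that extends beyond the window has to be cropped.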

In order to support all four options, we will write a function that allows for setting when to start and when to end the synchronization. This means that all signals will be cropped to the defined start and end points. To keep it as intuitive as possible, the start and end points are defined by sensor names, so we can easily choose from our set of recorded signals. Additionally, the type of padding can be set. There are two exceptions, because an arbitrary padding doesn't make sense for all sensor types: the GPS sensor will always use a padding of type 'same', thus mimicking that the phone is at a fixed location, and the significant motion sensor will always be padded with zeros.

The function will have the following inputs:
 <ul>
  <li><strong>sensor_data (list):</strong> A list containing the data of the sensors to be synchronized.</li><br>
    
  <li><strong>report (dict):</strong> The report returned by the 'load_android_data' function.</li><br>
    
  <li><strong>start (string, optional):</strong> The sensor that indicates when the synchronization should be started. If not specified, the sensor that started latest is chosen.</li><br>
    
  <li><strong>end (string, optional):</strong> The sensor that indicates when the synchronization should be stopped. If not specified, the sensor that stopped earliest is chosen.</li><br>
    
  <li><strong>padding_type (string, optional):</strong> The padding type used for padding the signal. Options are either 'same' or 'zeros'. If not specified, 'same' is used.</li>
</ul> 

The function will return the padded sensor data.
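Both padding types map directly onto modes of <code>numpy.pad</code>. A minimal sketch with a made-up signal channel and made-up pad sizes shows the difference:

```python
import numpy as np

# toy signal channel
sig_channel = np.array([2.0, 3.0, 5.0])

# number of samples to add at the beginning and at the end (made-up values)
n_start, n_end = 2, 1

# 'same': repeat the first / last value of the channel
padded_same = np.pad(sig_channel, (n_start, n_end), 'edge')

# 'zeros': fill the added samples with zeros
padded_zeros = np.pad(sig_channel, (n_start, n_end), 'constant', constant_values=(0, 0))

print(padded_same)
print(padded_zeros)
```

Repeating the edge values keeps a slowly changing quantity, such as a GPS position, at a plausible value, whereas zeros are the natural choice for an event-like signal.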

In [11]:
def pad_data(sensor_data, report, start=None, end=None, padding_type='same'):
    
    # list for holding the padded data
    padded_sensor_data = []
    
    # get the index of the sensor used for padding in the start (ssi = start sensor index)
    # if none is provided (start == None) then the latest starting sensor is used
    if(start is None):
        ssi = report['starting times'].index(max(report['starting times']))
    
    else:
        ssi = report['names'].index(start)
                                                                            
    # get the index of the sensor used for padding in the end (esi = end sensor index)
    # if none is provided (end == None) then the sensor that stopped earliest is used
    if (end is None):
        esi = report['stopping times'].index(min(report['stopping times']))
    
    else:
        esi = report['names'].index(end)

    # check if the starting and stopping times are equal (this can be the case when a significant motion sensor is used and 
    # only one significant motion was detected by the sensor)
    # in that case we use the next sensor that stopped recording the earliest
    if(report['starting times'][ssi] == report['stopping times'][esi]):
        print('Warning: Start and end at same time...using next sensor that stopped earliest instead')
        esi = report['stopping times'].index(np.sort(report['stopping times'])[1])
    
    # get the starting value
    start_time = report['starting times'][ssi]
    
    # get the stopping value
    end_time = report['stopping times'][esi]
 
    # get time axis of the starting sensor and check for dimensionality of the data 
    time_axis_start = sensor_data[ssi][:1] if (sensor_data[ssi].ndim == 1) else sensor_data[ssi][:,0]
    
    # get the time axis of the ending sensor and check for dimensionality of the data
    time_axis_end = sensor_data[esi][:1] if (sensor_data[esi].ndim == 1) else sensor_data[esi][:,0]
    
    # start padding: for loop over names (enumerated)
    for i, name in enumerate(report['names']):
        
        # get the data of the current signal
        data = sensor_data[i] 
        
        # check for the dimensionality of the signal data (handling for significant motion sensor)
        if(data.ndim == 1): # 1D array
            
            # get the time axis
            time_axis = data[:1]
            
            # get the signal data
            signals = data[1:]
            
            # expand the dimensionality of the data (in order to have the same dimensionality as all other data)
            signals = np.expand_dims(signals, axis=1)
            
        else: # multidimensional array
            
            # get the time_axis
            time_axis = data[:,0]
            
            # get the signal data
            signals = data[:, 1:]
        
        # --- 1.) padding at the beginning ---
        if(start_time > time_axis[0]): # start_time after current signal start (cropping of the signal needed)
            
            # get the time_axis size before cropping
            orig_size = time_axis.size
            
            # crop the time axis
            time_axis = time_axis[time_axis >= start_time]
            
            # crop the signal data
            signals = signals[(orig_size-time_axis.size):, :]
            
        # get the values that need to be padded to the current time axis
        start_pad = time_axis_start[time_axis_start < time_axis[0]]
        
        # --- 2.) padding at the end ---
        if(end_time < time_axis[-1]): # end_time before current signal end (cropping of the signal needed)
            
            # crop the time axis
            time_axis = time_axis[time_axis <= end_time]
            
            # check if cropping leads to elimination of signal
            if(time_axis.size == 0):
                
                raise IOError('The configuration you chose led to elimination of the {} sensor. Please choose another sensor for parameter \'end\'.'.format(name))
            
            # crop the signal data
            signals = signals[:time_axis.size, :]
            
        # get the values that need to be padded to the current time axis
        end_pad = time_axis_end[time_axis_end > time_axis[-1]]
        
        # pad the time axis
        time_axis = np.concatenate((start_pad, time_axis, end_pad))
        
        # for holding the new padded data
        padded_data = time_axis
        
        # cycle over the signal channels
        for channel in np.arange(signals.shape[1]):
            
            # get the signal channel
            sig_channel = signals[:, channel]
            
            # check for the sensor
            if(name == 'GPS'): # gps sensor (always use padding type 'same')
                
                # pad the channel
                sig_channel = np.pad(sig_channel, (start_pad.size, end_pad.size), 'edge')

            elif(name == 'SigMotion'): # significant motion sensor (always pad with zeros)
                
                # pad the channel
                sig_channel = np.pad(sig_channel, (start_pad.size, end_pad.size), 'constant', constant_values=(0, 0))
                   
            else: # all other sensors
                
                # check for setting of the user
                if(padding_type == 'same'):
                    
                    # pad the channel
                    sig_channel = np.pad(sig_channel, (start_pad.size, end_pad.size), 'edge')
                    
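                elif(padding_type == 'zeros'):
                    
                    # pad the channel
                    sig_channel = np.pad(sig_channel, (start_pad.size, end_pad.size), 'constant', constant_values=(0, 0))
                    
                else: # unknown padding type
                    
                    # fail early instead of stacking arrays of different lengths further below
                    raise ValueError("padding_type must be either 'same' or 'zeros', got '{}'.".format(padding_type))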
                    
                
            # concatenate the channel to the padded data
            padded_data = np.vstack((padded_data, sig_channel))
            
        # append the data to the padded_sensor_data list
        # the data is transposed in order to have the correct shape (samples x number of channels)
        padded_sensor_data.append(padded_data.T)
    
    return padded_sensor_data
        

For the purpose of this <strong><span class="color4">notebook</span></strong> we will be using the entire recording time and we will pad using the padding type 'same'.

In [28]:
padded_sensor_data = pad_data(sensor_data, report, start='Proximity', end='Acc', padding_type='same')
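After padding, every sensor array should cover the same time span, even though the number of samples per sensor still differs. A quick check can be sketched as follows. The helper <code>same_time_span</code> and the arrays are hypothetical and only mimic the layout returned by the padding function (samples x channels, with the time axis in the first column).

```python
import numpy as np

def same_time_span(sensor_arrays):
    """Hypothetical helper: True if all arrays share the same first and last timestamp."""
    first = {float(arr[0, 0]) for arr in sensor_arrays}
    last = {float(arr[-1, 0]) for arr in sensor_arrays}
    return len(first) == 1 and len(last) == 1

# made-up padded data: column 0 is the time axis, the remaining columns are signal channels
sensor_a = np.array([[0.0, 1.0], [1.0, 1.5], [2.0, 2.0], [3.0, 2.5]])
sensor_b = np.array([[0.0, 7.0], [3.0, 7.0]])

print(same_time_span([sensor_a, sensor_b]))
```

The two arrays have a different number of rows, which is why the signals still need to be re-sampled before they can be written into one file.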

<p class="steps">5 - Re-sampling all signals to the same sampling rate</p>

In this next step, we will re-sample all our signals to the same sampling rate. This has to be done in order to ensure that all signal columns are of the same length. We will be using the function we developed in the <a href=https://biosignalsplux.com/learn/notebooks.html>notebook (correct link still missing) <img src="../../images/icons/link.png" width="10px" height="10px" style="display:inline"></a> on re-sampling signals recorded with Android sensors.

For each sensor we are going to re-sample the data to a sampling rate of 100 Hz, which is the approximate sampling rate of the accelerometer according to the report we generated. In addition, we will shift the time axis to start at zero, display it in seconds, and use the interpolation type 'previous'.
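To illustrate what the interpolation type 'previous' does, here is a minimal sketch using <code>scipy.interpolate.interp1d</code> on a made-up, irregularly sampled signal. The low target rate of 4 Hz is chosen only to keep the output short. The last line shows how an average rate like the one reported for the accelerometer rounds up to 100 Hz.

```python
import numpy as np
from scipy.interpolate import interp1d

# made-up, irregularly sampled signal (time in seconds)
time = np.array([0.0, 0.4, 1.0])
data = np.array([1.0, 2.0, 3.0])

# new, evenly spaced time axis at 4 Hz (made-up rate to keep the example short)
sampling_rate = 4
time_inter = np.arange(time[0], time[-1], 1 / sampling_rate)

# 'previous' holds the last known sample value until the next sample arrives
inter_func = interp1d(time, data, kind='previous')
data_inter = inter_func(time_inter)

print(time_inter)
print(data_inter)

# rounding the average accelerometer rate from the report up to the next multiple of ten
rounded_rate = int(np.ceil(97.58896511423767 / 10.0)) * 10
print(rounded_rate)
```

Holding the previous value does not invent intermediate values, which suits sensors that only report a new sample when their reading changes.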

In [13]:

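import scipy as scp
import scipy.interpolate # load the interpolate submodule explicitly ("import scipy" alone does not guarantee this in older SciPy versions)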
import numpy as np

# define function for interpolating all signal columns
def re_sample_data(time, data, start=0, stop=-1, shift_time_axis=False, sampling_rate=None, kind_interp='linear'):
    
    # crop the data and time to specified start and stop values
    if(start != 0 or stop !=-1):
        time = time[start:stop]
        
        # check for dimensionality of the data
        if(data.ndim == 1): # 1D array
            
            data = data[start:stop]
            
        else: # multidimensional array
            
            data = data[start:stop, :]
        
    # get the original time origin
    time_origin = time[0]

    # shift time axis (shifting is done in order to simplify the calculations)
    time = time - time_origin
    time = time * 1e-9
    
    # calculate the approximate sampling rate and round it up to the next multiple of ten
    if(sampling_rate is None):
        # calculate the distance between sampling points
        sample_dist = np.diff(time)

        # calculate the mean distance
        mean_dist = np.mean(sample_dist)
        
        # calculate the sampling rate
        sampling_rate = 1/mean_dist
        
        # round it up to the next multiple of ten
        sampling_rate = int(np.ceil(sampling_rate / 10.0)) * 10
    
    # create new time axis
    time_inter = np.arange(time[0], time[-1], 1/sampling_rate)
    
    # check for the dimensionality of the data array.
    if(data.ndim ==1): # 1D array
        
        # create the interpolation function
        inter_func = scp.interpolate.interp1d(time, data, kind=kind_interp)
        
        # calculate the interpolated signal
        data_inter = inter_func(time_inter)
    
    else: # multidimensional array
        
        # create dummy array
        data_inter = np.zeros([time_inter.shape[0], data.shape[1]])
    
        # cycle over the columns of data
        for col in range(data.shape[1]):
        
            # create the interpolation function
            inter_func = scp.interpolate.interp1d(time, data[:,col], kind=kind_interp)
        
            # calculate the interpolated column and save it to the correct column of the data_inter array
            data_inter[:,col] = inter_func(time_inter)
    
    # check if the time axis is not supposed to be shifted
    if(not shift_time_axis):
        
        # shift back
        time_inter = time_inter * 1e9
        time_inter = time_inter + time_origin
    
    
    # return the interpolated time axis and data
    return time_inter, data_inter, sampling_rate
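
When no sampling rate is passed, the function above estimates it from the timestamps. The sketch below reproduces only that estimation step on made-up data. The timestamps (one sample every 10.4 ms, given in nanoseconds) are an assumption for illustration and are not taken from the recording.

```python
import numpy as np

# hypothetical Android timestamps in nanoseconds, one sample every 10.4 ms
time_ns = np.arange(1000, dtype=np.int64) * 10_400_000

# convert to seconds relative to the first sample
time_s = (time_ns - time_ns[0]) * 1e-9

# mean distance between consecutive samples and the resulting rate
mean_dist = np.mean(np.diff(time_s))
estimated_rate = 1 / mean_dist

# round up to the next multiple of ten, as done in re_sample_data
sampling_rate = int(np.ceil(estimated_rate / 10.0)) * 10

print(round(estimated_rate, 2), sampling_rate)
```

Rounding up means that a sensor sampling at roughly 96 Hz is re-sampled at 100 Hz, so the new time axis is never coarser than the original one.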

In [29]:
# list for holding the re-sampled data
re_sampled_data = []
    
# list for holding the time axes of each sensor
re_sampled_time = []

# cycle over the padded data of each sensor
for data in padded_sensor_data:
    
    # re-sample the data ('_' discards the returned sampling rate)
    re_time, re_data, _ = re_sample_data(data[:,0], data[:,1:], shift_time_axis=True, sampling_rate=100, kind_interp='previous')
    
    # add the time and data to the lists
    re_sampled_time.append(re_time)
    re_sampled_data.append(re_data)
    

Since we re-sampled all of our signals to the same sampling rate, all time axes should be equal and the data of each sensor should be of the same length. We can easily check this by doing the following.

In [30]:
print('checking for number of samples in each sensor')
# cycle through the data list
for i,data in enumerate(re_sampled_data):
    
    # get the sensor name
    name = report['names'][i]
    
    # print the number of samples (length of the first axis of the data)
    print('{}: {}'.format(name,data.shape[0]))

# get the unique time axes (each row of the stacked array is one time axis)
unique_axes = np.unique(re_sampled_time, axis=0)

print('\nnumber of unique time axes: {}'.format(unique_axes.shape[0]))


checking for number of samples in each sensor
Acc: 18104
GPS: 18104
Light: 18104
Proximity: 18104
SigMotion: 18104

number of unique time axes: 1
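
A pairwise comparison is another way to confirm that the time axes match. The sketch below uses made-up stand-ins for the entries of re_sampled_time (three identical 100 Hz axes, which is an assumption for illustration) and compares each of them against the first one.

```python
import numpy as np

# hypothetical stand-ins for the entries of re_sampled_time
time_axes = [np.arange(0, 2, 0.01) for _ in range(3)]

# reference axis against which all others are compared
reference = time_axes[0]

# True only if every axis has the same shape and the same values as the reference
all_equal = all(np.array_equal(reference, axis) for axis in time_axes[1:])
print('all time axes identical: {}'.format(all_equal))

# an axis with a different length is detected as a mismatch
shorter_axis = reference[:-1]
print('shorter axis identical: {}'.format(np.array_equal(reference, shorter_axis)))
```

Unlike stacking the axes into one array, np.array_equal also works when the axes differ in length, in which case it simply returns False.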


<p class="steps">5 - Writing the synchronized data to a new file</p>

The last step is to write our synchronized data to a new file. As in the previous steps, we will write a function that does everything for us.

The function will receive the following inputs:  
 <ul>
    
  <li><strong>time_axis (1D array):</strong> The time axis after padding and re-sampling the sensor data.</li><br>
    
  <li><strong>data (list):</strong> List containing the padded and re-sampled sensor signals. The length of each signal array along axis 0 has to match the length of time_axis.</li><br>

  <li><strong>header (string):</strong> A string containing the header that is supposed to be added to the file.</li><br>
  
   <li><strong>path (string):</strong> A string with the location where the file should be saved.</li><br>
    
  <li><strong>name (string, optional):</strong> The name of the file, with the suffix '.txt'. If not specified, the file is named 'android_synchronized.txt'.</li><br>
</ul> 

In [31]:
def save_synchronized_data(time_axis, data, header, path, name='android_synchronized.txt'):
    
    # create final save path
    save_path = os.path.join(path, name)
    
    # add the time axis for the final data array
    # make the time axis a column vector
    final_data_array = np.expand_dims(time_axis, 1)
    
    # write all the data into a single array
    for signals in data:
        
        final_data_array = np.append(final_data_array, signals, axis=1)
    
    # open a new file at the path location (it is closed automatically when the block ends)
    with open(save_path, 'w') as sync_file:
    
        # write the header to the file
        sync_file.write(header)
    
        # write the data to the file
        for row in final_data_array:
            sync_file.write('\t'.join(str(value) for value in row) + '\t\n')
    
    

In [32]:
save_synchronized_data(re_sampled_time[0], re_sampled_data, header, path)
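
To check the result, the file can be read back into an array. The sketch below is self-contained: it writes a tiny file in the same tab-separated layout to a temporary folder and loads it again. The header line and the data values are made up, and the sketch assumes that every header line starts with '#', which may not match the header built earlier in this notebook.

```python
import os
import tempfile
import numpy as np

# hypothetical header and data; the real header is built earlier in the notebook
header = '# hypothetical header line\n'
data = np.array([[0.00, 1.0, 2.0],
                 [0.01, 1.5, 2.5]])

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, 'android_synchronized.txt')

    # write the file in the same layout as save_synchronized_data (a tab after every value)
    with open(file_path, 'w') as sync_file:
        sync_file.write(header)
        for row in data:
            sync_file.write('\t'.join(str(value) for value in row) + '\t\n')

    # by default np.loadtxt skips lines starting with '#' and splits on any whitespace
    loaded = np.loadtxt(file_path)

print(loaded.shape)
print(np.array_equal(loaded, data))
```

The first column of the loaded array is the time axis and the remaining columns are the sensor signals in the order in which they were appended.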

<p class="steps">6 - Using the <strong><span class="color2">biosignalsnotebooks</span></strong> package</p>

If you want a function that conveniently handles everything for you, you can use the 'sync_android_files' function of the <strong><span class="color2">biosignalsnotebooks</span></strong> package. This function handles all the steps shown above in a single function call.

The cell below shows how to call the function in order to produce the same output as in the previous steps.

In [33]:
# note: 'sync_android_files' is not yet implemented in the package, so this call currently fails
bsnb.sync_android_files(file_list, ...)

AttributeError: module 'biosignalsnotebooks' has no attribute 'sync_android_files'

<hr>
<table width="100%">
    <tr>
        <td class="footer_logo">
            <img src="../../images/ost_logo.png" alt="biosignalsnotebooks | project logo [footer]">
        </td>
        <td width="40%" style="text-align:left">
            <a href="../MainFiles/aux_files/biosignalsnotebooks_presentation.pdf" target="_blank">&#9740; Project Presentation</a>
            <br>
            <a href="https://github.com/biosignalsplux/biosignalsnotebooks" target="_blank">&#9740; GitHub Repository</a>
            <br>
            <a href="https://pypi.org/project/biosignalsnotebooks/" target="_blank">&#9740; How to install biosignalsnotebooks Python package ?</a>
            <br>
            <a href="https://www.biosignalsplux.com/notebooks/Categories/MainFiles/signal_samples.ipynb">&#9740; Signal Library</a>
        </td>
        <td width="40%" style="text-align:left">
            <a href="https://www.biosignalsplux.com/notebooks/Categories/MainFiles/biosignalsnotebooks.ipynb">&#9740; Notebook Categories</a>
            <br>
            <a href="https://www.biosignalsplux.com/notebooks/Categories/MainFiles/by_diff.ipynb">&#9740; Notebooks by Difficulty</a>
            <br>
            <a href="https://www.biosignalsplux.com/notebooks/Categories/MainFiles/by_signal_type.ipynb">&#9740; Notebooks by Signal Type</a>
            <br>
            <a href="https://www.biosignalsplux.com/notebooks/Categories/MainFiles/by_tag.ipynb">&#9740; Notebooks by Tag</a>
        </td>
    </tr>
</table>

<span class="color6"><strong>Auxiliary Code Segment (should not be replicated by
the user)</strong></span>

In [1]:

from biosignalsnotebooks.__notebook_support__ import css_style_apply
css_style_apply()

.................... CSS Style Applied to Jupyter Notebook .........................


In [4]:
%%html
<script>
    // AUTORUN ALL CELLS ON NOTEBOOK-LOAD!
    require(
        ['base/js/namespace', 'jquery'],
        function(jupyter, $) {
            $(jupyter.events).on("kernel_ready.Kernel", function () {
                console.log("Auto-running all cells-below...");
                jupyter.actions.call('jupyter-notebook:run-all-cells-below');
                jupyter.actions.call('jupyter-notebook:save-notebook');
            });
        }
    );
</script>