# Challenge: Workshop and Challenge on Detection of Stress and Mental Health Using Wearable Sensors

<ul>
    <li><a href="#1">1. Data retrieval and cleaning</a></li>
</ul>
   
<ul>
   <li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#1.1">1.1.Import libraries</a></li>
</ul>

<ul>
   <li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#1.2">1.2. Retrieve dataset</a></li>
</ul>
<ul>
   <li>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href="#1.4">1.3. Selecting a sample</a></li>
</ul>


<ul>
   <li><a href="#4">4. Data statistics</a></li>
</ul>


<a id="1"></a>

## 1. Data retrieval + cleaning

<a id="1.1"></a>
### 1.1 Import libraries

In [1]:
import json
import os
import pandas as pd
import requests
import sys
import numpy as np

In [2]:
pip install --upgrade pip

Note: you may need to restart the kernel to use updated packages.


<a id="1.2"></a>
### 1.2. Retrieve SMILE

The SMILE dataset was collected from 45 healthy adult participants (39 females and 6 males) in Belgium. The average age of participants was 24.5 years old, with a standard deviation of 3.0 years. Each participant contributed to an average of 8.7 days of data. Two types of wearable sensors were used for data collection. One was a wrist-worn device (Chillband, IMEC, Belgium) designed for the measurement of skin conductance (SC), ST, and acceleration data (ACC). The second sensor was a chest patch (Health Patch, IMEC, Belgium) to measure ECG and ACC. It contains a sensor node designed to monitor ECG at 256 Hz and ACC at 32 Hz continuously throughout the study period. Participants could remove the sensors while showering or before doing intense exercises. Also, participants received notifications on their mobile phones to report their momentary stress levels daily. 

https://compwell.rice.edu/workshops/embc2022/dataset

In [3]:
dataset = np.load('data/dataset_smile_challenge.npy', allow_pickle=True).item()

    dict
        dictionary with dataset, with keys:
         * `train`
          * `deep_features`
           * `ECG_features_C`
           * `ECG_features_T`
           * `masking`
          * `hand_crafted_features`
           * `ECG_features`
           * `ECG_masking`
           * `GSR_features`
           * `GSR_masking`
          * `labels`
         * `test`
          * Same structure as `train`

Let's explore the contents of the dataset directory

In [4]:


# for training and testing data:

dataset_train = dataset['train']

dataset_test = dataset['test']

# for deep features.

deep_features = dataset_train['deep_features']

# conv1d backbone based features for ECG signal.

deep_features['ECG_features_C'] 

# transformer backbone basde features for ECG signal  

deep_features['ECG_features_T']   

# for hand-crafted features.

handcrafted_features_train = dataset_train['hand_crafted_features']
handcrafted_features_test = dataset_test['hand_crafted_features']

# handcrafted features for ECG signal

handcrafted_features_train['ECG_features'] 
handcrafted_features_test['ECG_features']

 # handcrafted features for GSR signal. 

handcrafted_features_train['GSR_features'] 
handcrafted_features_test['GSR_features']

# for labels.

labels_train = dataset_train['labels']  # labels.
labels_test = dataset_test['labels']


In [5]:
dataset['test'].keys()

dict_keys(['deep_features', 'hand_crafted_features', 'labels'])

In [82]:
len(dataset['train']['labels'])

2070

Now we have a DataFrame with the contents of the metadata file.

<a id="4"></a>
## 4. Data statistics 

In [9]:
print(
    f"train: {dataset['train']['hand_crafted_features']['ECG_features'].shape}")
print(
    f"test: {dataset['test']['hand_crafted_features']['ECG_features'].shape}")


train: (2070, 60, 8)
test: (986, 60, 8)


Load SMILE dataset as a dictionary from npy file.
Each feature matrix has 3 dimensions:
* sequence (of 60 minutes)
* window (5 minute with 4 min overlap)
* feature

## ECG Features
* feature and label vector construction
* creation of classifier

In [8]:
len(dataset['train']['hand_crafted_features']['ECG_features'])


2070

In [6]:
dataset['train']['hand_crafted_features']['ECG_features'][0].shape #tem uma sequencia de 60 minutos ?

(60, 8)

In [85]:
# representa as features extraidas de 1 janela da sequencia (correspondente a 5 minutos)
dataset['test']['hand_crafted_features']['ECG_features'][0][0]


array([0.07646664, 0.0957195 , 0.01932093, 0.01004426, 0.89676394,
       0.11722841, 0.06801284, 0.10347245])

In [34]:
# Create an array with features
# Train dataset
nfeatures = len(dataset['train']['hand_crafted_features']['ECG_features'][0][0])
n = len(dataset['train']['hand_crafted_features']['ECG_features']) * \
    len(dataset['train']['hand_crafted_features']['ECG_features'])
# variaveis e iniciação a zero
handcrafted_features_train=np.zeros((n,nfeatures)) #X
handcrafted_labels_train = np.zeros(n)  # y
#
count=0
for i in range(len(dataset['train']['hand_crafted_features']['ECG_features'])):
    for j in range(len(dataset['train']['hand_crafted_features']['ECG_features'][i])):
        if(np.sum(np.isnan(dataset['train']['hand_crafted_features']['ECG_features'][i][j])) == 0):
            # nao considerar os nan
            handcrafted_features_train[count,
                                       0:nfeatures] = dataset['train']['hand_crafted_features']['ECG_features'][i][j]
            handcrafted_labels_train[count] = dataset['train']['labels'][i]
            count=count+1      

In [None]:
# Test dataset
nfeatures = len(dataset['test']['hand_crafted_features']['ECG_features'][0][0])
n = len(dataset['test']['hand_crafted_features']['ECG_features']) * \
    len(dataset['test']['hand_crafted_features']['ECG_features'][0])
# variaveis e iniciação a zero
handcrafted_features_test=np.zeros((n,nfeatures)) #X
handcrafted_labels_test = np.zeros(n)  # y

count=0
for i in range(len(dataset['test']['hand_crafted_features']['ECG_features'])):
    #print(dataset['test']['hand_crafted_features']['ECG_features'][i])
    for j in range(len(dataset['test']['hand_crafted_features']['ECG_features'][i])):
        if(np.sum(np.isnan(dataset['test']['hand_crafted_features']['ECG_features'][i][j])) == 0):
            # nao considerar os nan
            handcrafted_features_test[count,
                                      0:nfeatures] = dataset['test']['hand_crafted_features']['ECG_features'][i][j]
            handcrafted_labels_test[count] = dataset['test']['labels'][i]
            count = count+1

In [41]:
# Remove all rows with zeros from the array
data1 = handcrafted_features_train
data1[~np.all(data1 == 0, axis=1)].shape

# Remove last values from the array
data_train = handcrafted_features_train
data_train[:-(n-count),:].shape  

data_test = handcrafted_features_test
data_test[:-(n-count),:].shape

(0, 8)

In [42]:
## Labels
data_train_label = handcrafted_labels_train
data_train_label[:-(n-count)].shape

data_test_label = handcrafted_labels_test
data_test_label[:-(n-count)].shape


(0,)

In [37]:
## find max
np.where(data_train_label == np.max(data_train_label))


(array([ 16860,  16861,  16862, ..., 117486, 117487, 117488]),)

In [38]:
## find min
np.where(data_train_label==np.min(data_train_label))

(array([      0,       1,       2, ..., 4284897, 4284898, 4284899]),)

In [39]:
## find uniques
np.unique(data_train_label)

array([0., 1., 2., 3., 4., 5., 6.])

In [45]:
# Convert to a dataframe and save it in csv
df_train = pd.DataFrame(data=np.column_stack((data_train, data_train_label)))
df_train.columns = ["F"+str(i) for i in range(1, len(df_train.columns) + 1)]
df_train.rename(columns={'F9': 'Label'}, inplace=True)
df_train

Unnamed: 0,F1,F2,F3,F4,F5,F6,F7,F8,Label
0,0.145656,0.152954,0.029353,0.013258,0.487958,0.272209,0.149786,0.056021,0.0
1,0.161642,0.037914,0.008152,0.015038,0.485591,0.273006,0.150057,0.061644,0.0
2,0.102252,0.007947,0.003004,0.026775,0.469134,0.222267,0.105493,0.101103,0.0
3,0.101629,0.007554,0.003805,0.035377,0.456785,0.069741,0.043349,0.124622,0.0
4,0.084450,0.012880,0.007234,0.039552,0.510779,0.250722,0.168897,0.120275,0.0
...,...,...,...,...,...,...,...,...,...
4284895,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
4284896,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
4284897,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
4284898,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0


In [43]:
# Convert to a dataframe and save it in csv
df_test = pd.DataFrame(data=np.column_stack((data_test, data_test_label)))
df_test.columns = ["F"+str(i) for i in range(1, len(df_test.columns) + 1)]
df_test.rename(columns={'F9': 'Label'}, inplace=True)
df_test


Unnamed: 0,F1,F2,F3,F4,F5,F6,F7,F8,Label
0,0.076467,0.095720,0.019321,0.010044,0.896764,0.117228,0.068013,0.103472,0.0
1,0.080711,0.110469,0.028849,0.014305,0.862462,0.151996,0.091932,0.105959,0.0
2,0.079685,0.075259,0.021964,0.016503,0.793803,0.142263,0.100381,0.115297,0.0
3,0.114616,0.047981,0.009772,0.010209,0.688007,0.128244,0.102531,0.061526,0.0
4,0.114824,0.018641,0.009672,0.032108,0.600439,0.106743,0.092132,0.073630,0.0
...,...,...,...,...,...,...,...,...,...
59155,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
59156,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
59157,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0
59158,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.000000,0.0


In [46]:
df_train.to_csv('data/data_train.csv')
df_test.to_csv('data/data_test.csv')

### Classifier

In [17]:
from sklearn import svm
clf = svm.SVC()
clf.fit(data_train, data_train_label)


In [47]:
dataset['train']['hand_crafted_features']['GSR_features'].shape


(2070, 60, 12)

In [48]:
dataset['test']['hand_crafted_features']['GSR_features'].shape

(986, 60, 12)