# Stress Detection in Working Environments Using Physiological Sensors

### Ilyass El Mansouri - Sajeevan Puvikaran - Belkacem Zehani - Salim Tabarani - Yanis Daci 

## Business Introduction

#### Background definition
The challenging nature of the global economy and the use of advanced technologies have changed the way we work. The resulting increase in workload has become a problem in many organizations, where employees have encountered psychological problems related to work stress. Stress has been particularly high in North America, where 55% of the population has reported that work-related stress had an impact on their physical and mental health. 
<br/>

<br/>
Work-related stress has been shown to engender disease. Several studies have shown that stress can lead to negative health effects such as high blood pressure, lack of sleep, susceptibility to infections, and cardiovascular disease. These conditions can in turn result in musculoskeletal diseases, immunological problems, and mental health problems such as anxiety and depressive disorders. Studies by the European Foundation for the Improvement of Living and Working Conditions, for example, report that a decline in physical and mental health due to work-related stress leads to a decrease in the performance and overall productivity of organizations and to increased costs from absenteeism.
<br/>

#### Business Need
<br/>
Considering the detrimental effects of prolonged exposure to stress for both employees and organizations, there is a clear need for a system that regularly monitors the well-being of workers and acts when stress levels are too high across the organization. To this end, we place ourselves in the position of a company that is concerned about the well-being of its employees and offers annual medical check-ups, during which it monitors the daily life of some of its employees. If daily stress levels are alarming, the company will take the necessary steps to address them.
<br/>

## Introducing the Data : WESAD

In everyday language, stress typically indicates strain caused by physical or psychological pressures at work, at school, or in personal life as well as by one’s environment. From this point of view, stress can be seen as a defensive process to protect oneself from potential injury and threats to emotional well-being. Thus, it is not surprising that stress is related to the capacity to adapt and respond to various circumstances.
<br/>

<table>
    <tr>
    <td>
        <img id ='im1' src='stressometer.png' width="300" >
    </td>
    </tr>
</table>

In October 2018, a group of researchers introduced the WEarable Stress and Affect Detection (WESAD) dataset, a high-quality multimodal dataset covering several affective states. The experiment tested three affective states: amusement, stress, and neutral. The data can also be used to determine whether or not a test subject was stressed. 
<br/>

The WESAD dataset was collected during a stress study performed on twelve males and three females. To collect the data, the researchers used both a chest-worn and a wrist-worn device: a RespiBAN Professional and an Empatica E4. 
<br/>
The RespiBAN itself is equipped with sensors to measure accelerometer and respiratory data. All signals were sampled at 700 Hz.
<br/>
All subjects wore the Empatica E4 on their non-dominant hand. The Empatica E4 recorded blood volume pulse at 64Hz, temperature at 4Hz, and electrodermal activity at 4Hz.
<br/>
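
Because the devices sample at different rates, the same recording time yields streams of different lengths. The snippet below is a minimal sketch, assuming only the sampling rates quoted above, of how sample counts and nearest-in-time indices between two streams could be computed. The helper names and dictionary keys are hypothetical and not part of the dataset.

```python
import numpy as np

# Sampling rates quoted in the text (Hz); the dictionary keys are illustrative.
SAMPLING_RATES = {"chest": 700, "wrist_bvp": 64, "wrist_temp": 4, "wrist_eda": 4}


def n_samples(duration_s, fs):
    """Number of samples a stream at fs Hz produces in duration_s seconds."""
    return int(round(duration_s * fs))


def nearest_source_index(n_target, target_fs, source_fs, n_source):
    """For each target sample, the index of the source sample closest in time."""
    t = np.arange(n_target) / target_fs          # timestamps of the target samples (s)
    idx = np.round(t * source_fs).astype(int)    # closest sample position in the source
    return np.clip(idx, 0, n_source - 1)         # stay inside the source stream


# Ten seconds of recording: 7000 chest samples but only 40 temperature samples.
chest_len = n_samples(10, SAMPLING_RATES["chest"])
temp_len = n_samples(10, SAMPLING_RATES["wrist_temp"])
idx = nearest_source_index(
    temp_len, SAMPLING_RATES["wrist_temp"], SAMPLING_RATES["chest"], chest_len
)
```

Indexing a 700 Hz array with `idx` picks one chest sample per temperature sample, which is one simple way to bring streams onto a common time grid.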

<table>
    <tr>
    <td>
        <img id ='im1' src='ima_appareil.png' width="500" >
    </td>
    </tr>
</table>

After the subjects had been equipped with the sensors, a 20-minute baseline was recorded. During the baseline, the subjects were asked to sit or stand at a table. The baseline was followed by the amusement condition, during which the subjects watched a set of eleven funny video clips. In total, the amusement condition had a length of 392 seconds. In addition, the subjects were asked to follow a guided meditation in order to de-excite them after the stress and amusement conditions.
<br/>
At the end of the protocol, the sensors were again synchronised via a double tap gesture. In total, the study had a duration of about two hours.

Raw sensor data was recorded with two devices: a chest-worn device (RespiBAN) and a wrist-worn device (Empatica E4). 



#### RespiBAN :

In order to convert the raw sensor values into SI units, each channel has to be transformed using the formulas given below (signal contains the raw sensor values, $vcc=3$, $\text{chan_bit}=2^{16}$).
- <b>ECG (mV):</b> $$(\frac{\text{signal}}{\text{chan_bit}}-0.5)*vcc$$
- <b>EDA (μS):</b> $$\frac{\frac{\text{signal}}{\text{chan_bit}}*vcc}{0.12}$$   
- <b>EMG (mV):</b> $$\frac{\text{signal}}{\text{chan_bit}}*vcc$$
- <b>TEMP (°C):</b>
$$v_{out} = \frac{\text{signal}*vcc}{\text{chan_bit}-1}$$
$$rntc = \frac{10^4*v_{out}}{vcc-v_{out}}$$
$$\text{temp} = -273.15 + \frac{1}{1.12*10^{-3} + 2.34*10^{-4}*\log(rntc)+ 8.77*10^{-8}*\log(rntc)^3}$$
- <b>XYZ (g):</b> $$\frac{\text{signal}-C_{min}}{C_{max}-C_{min}}*2-1$$ Where $C_{min} = 28000$ and $C_{max} = 38000$


- <b>RESPIRATION (%):</b> $$(\frac{\text{signal}}{\text{chan_bit}}-0.5)*100$$
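
The conversions for the ECG, EDA, and accelerometer channels listed above can be sketched in Python as follows. This is a minimal illustration, not the dataset authors' code: the function names are hypothetical, and the constants are the ones quoted in the formulas.

```python
import numpy as np

VCC = 3.0                      # supply voltage used in the formulas above
CHAN_BIT = 2 ** 16             # resolution of one channel
C_MIN, C_MAX = 28000, 38000    # accelerometer calibration constants


def ecg_mv(signal):
    """Raw ECG samples -> millivolts."""
    signal = np.asarray(signal, dtype=float)
    return (signal / CHAN_BIT - 0.5) * VCC


def eda_us(signal):
    """Raw EDA samples -> microsiemens."""
    signal = np.asarray(signal, dtype=float)
    return (signal / CHAN_BIT * VCC) / 0.12


def acc_g(signal):
    """Raw accelerometer samples (one axis) -> g."""
    signal = np.asarray(signal, dtype=float)
    return (signal - C_MIN) / (C_MAX - C_MIN) * 2 - 1


# A mid-scale raw ECG value maps to 0 mV; the calibration midpoint maps to 0 g.
ecg_mid = ecg_mv(32768)
acc_mid = acc_g(33000)
```

The functions accept scalars or arrays, so a whole channel can be converted in a single call.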

#### Empatica E4 :

The E4 device was worn on the subjects’ non-dominant wrist.

The double-tap signal pattern was used to manually synchronise the two devices’ raw data. The result is provided in the files SX.pkl, one file per subject. This file is a dictionary, with the following keys:
- subject: SX, the subject ID
- signal: includes all the raw data, in two fields:
    - Chest: RespiBAN data (all the modalities: ACC, ECG, EDA, EMG, RESP, TEMP)
    - Wrist: Empatica E4 data (all the modalities: ACC, BVP, EDA, TEMP)
- label: ID of the respective study protocol condition, sampled at 700 Hz. The following IDs are provided: 0 = not defined / transient, 1 = baseline, 2 = stress, 3 = amusement, 4 = meditation
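
As a rough sketch of how such a file could be inspected, the helpers below load one subject dictionary and summarise its contents. The function names are hypothetical, and `encoding="latin1"` is an assumption that the files were pickled under Python 2; adjust it if loading fails. The code iterates over whatever keys are present under `signal` rather than hard-coding their spelling.

```python
import pickle

import numpy as np

LABEL_NAMES = {
    0: "not defined / transient",
    1: "baseline",
    2: "stress",
    3: "amusement",
    4: "meditation",
}
LABEL_FS = 700  # labels are sampled at 700 Hz


def load_subject(path):
    """Load one SX.pkl file into a dictionary."""
    with open(path, "rb") as f:
        # Assumption: Python 2 pickle, hence the latin1 encoding.
        return pickle.load(f, encoding="latin1")


def describe_subject(data):
    """Return ({device: {modality: shape}}, {condition: seconds})."""
    shapes = {
        device: {mod: np.asarray(arr).shape for mod, arr in modalities.items()}
        for device, modalities in data["signal"].items()
    }
    ids, counts = np.unique(np.asarray(data["label"]), return_counts=True)
    seconds = {
        LABEL_NAMES.get(int(i), str(int(i))): float(c) / LABEL_FS
        for i, c in zip(ids, counts)
    }
    return shapes, seconds


# Example call (the path is hypothetical):
# shapes, seconds = describe_subject(load_subject("S2.pkl"))
```

The second return value gives the time spent in each protocol condition, which is a quick sanity check before building features.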

## Exploring the Dataset

In [1]:
import numpy as np
import pandas as pd
import scipy

In [2]:
import warnings
warnings.filterwarnings('ignore')

In [3]:
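# on_bad_lines='skip' (pandas >= 1.3) skips malformed rows; it replaces error_bad_lines=False, removed in pandas 2.0
agg_data = pd.read_csv("SUBJECTS_15_Data_sampled.csv", sep=',', encoding="utf-8", on_bad_lines='skip', low_memory=False)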

In [4]:
# Drop bookkeeping columns that are not used as features
agg_data = agg_data.drop(columns=['SUBJECT', 'Sample', 'GROUP_FREQ'])

# Add Gaussian noise (standard deviation 0.5) to every column except the label
feature_cols = agg_data.columns != 'Label'
noise = np.random.randn(*agg_data.loc[:, feature_cols].shape)
agg_data.loc[:, feature_cols] += 0.5 * noise

In [5]:
agg_data.head()

Unnamed: 0,ECG,EMG,RESP,Wrist_ACC_X,Wrist_ACC_Y,Wrist_ACC_Z,BVP,TEMP,Label
0,-2.056404,-1.386906,-49.01492,9.642655,-3.406866,6.454657,11.02511,33.04595,0
1,-2.616893,-0.526535,-50.008533,4.497229,4.606888,2.194279,7.585705,34.159354,0
2,-0.749616,-2.63205,-49.323822,8.890774,-2.89048,0.25773,5.47152,33.001818,0
3,-1.992701,-1.416784,-49.904727,6.096096,0.297534,0.732023,0.300146,33.021762,0
4,-1.719664,-0.959851,-50.044601,5.42468,2.520652,6.584089,-13.842268,33.228527,0


In [6]:
from sklearn.model_selection import train_test_split

data_train, data_test, labels_train, labels_test = train_test_split(agg_data.loc[:, agg_data.columns != 'Label'],
                                                                    agg_data["Label"],
                                                                    test_size=0.33, random_state=42)

In [7]:
test=pd.concat([data_test,labels_test],axis=1)
train=pd.concat([data_train,labels_train],axis=1)

In [8]:
test.to_csv('data/test.csv')
train.to_csv('data/train.csv')

## Workflow

In [9]:
from sklearn.base import BaseEstimator
from sklearn.base import TransformerMixin


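class FeatureExtractor(BaseEstimator, TransformerMixin):
    """Adds a rolling mean of the wrist accelerometer X axis as an extra feature."""

    def __init__(self):
        pass

    def fit(self, X_df, y=None):
        # Stateless transformer: there is nothing to learn
        return self

    def transform(self, X_df):
        X_df_new = X_df.copy()
        X_df_new = compute_rolling_mean(X_df_new, 'Wrist_ACC_X', '5')
        return X_df_new


def compute_rolling_mean(data, feature, row_window):
    """Append the rolling mean of `feature` over `row_window` rows as a new column."""
    name = '_'.join([feature, row_window, 'mean'])
    data[name] = data[feature].rolling(int(row_window)).mean()
    # The first row_window - 1 values are NaN; fill them from the nearest valid value
    data[name] = data[name].ffill().bfill()
    # Keep the same dtype as the source column
    data[name] = data[name].astype(data[feature].dtype)
    return data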
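
To make the rolling-mean feature concrete, here is a small standalone illustration on toy data (the values are made up). It uses the same pandas calls as the feature extractor above: a window of 5 rows, followed by forward and backward filling of the initial gaps.

```python
import pandas as pd

# Toy accelerometer column; the values are invented for illustration only.
toy = pd.DataFrame({"Wrist_ACC_X": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]})
window = 5

# The first window - 1 entries have no full window behind them and come out as NaN.
rolled = toy["Wrist_ACC_X"].rolling(window).mean()

# ffill leaves the leading NaNs untouched (nothing precedes them);
# bfill then copies the first valid mean backwards into them.
filled = rolled.ffill().bfill()

print(filled.tolist())  # [3.0, 3.0, 3.0, 3.0, 3.0, 4.0, 5.0]
```

The first four rows therefore all carry the mean of the first full window.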


In [56]:
from sklearn.base import BaseEstimator
from sklearn.tree import DecisionTreeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


class Classifier(BaseEstimator):
    def __init__(self):
        self.model = make_pipeline(StandardScaler(), DecisionTreeClassifier())

    def fit(self, X, y):
        self.model.fit(X, y)
        # Returning self follows the scikit-learn estimator convention
        return self

    def predict(self, X):
        return self.model.predict(X)

In [57]:
from sklearn.pipeline import make_pipeline

model = make_pipeline(FeatureExtractor(), Classifier())

In [58]:
model.fit(data_train, labels_train)

Pipeline(memory=None,
     steps=[('featureextractor', FeatureExtractor()), ('classifier', Classifier())])

In [59]:
y_pred = model.predict(data_test)
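
Predictions like these can be scored locally with scikit-learn. The sketch below is only an approximation of what an evaluation might report: it assumes macro averaging over classes and treats label 2 as the stress class (per the label IDs described earlier). The exact metric definitions used by the RAMP kit are not shown here and may differ; `score_predictions` is a hypothetical helper.

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score

STRESS_LABEL = 2  # label ID of the stress condition in the dataset description


def score_predictions(y_true, y_pred):
    """Macro-averaged scores plus precision restricted to the stress class."""
    return {
        "recall": recall_score(y_true, y_pred, average="macro", zero_division=0),
        "precision": precision_score(y_true, y_pred, average="macro", zero_division=0),
        "f1_score": f1_score(y_true, y_pred, average="macro", zero_division=0),
        # Restricting labels to the stress class yields that class's precision only.
        "precision_stress": precision_score(
            y_true, y_pred, labels=[STRESS_LABEL], average="macro", zero_division=0
        ),
    }


# Toy labels, invented for illustration only.
y_true_toy = np.array([1, 2, 2, 0, 2, 1])
y_pred_toy = np.array([1, 2, 1, 0, 2, 2])
scores = score_predictions(y_true_toy, y_pred_toy)
```

In the notebook, the same function could be called as `score_predictions(labels_test, y_pred)`.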

## Evaluation

In [62]:
!ramp_test_submission --submission=starting_kit

[38;5;178m[1mTesting stress rate prediction[0m
[38;5;178m[1mReading train and test files from ./data ...[0m
[38;5;178m[1mReading cv ...[0m
[38;5;178m[1mTraining ./submissions/starting_kit ...[0m
[38;5;178m[1mCV fold 0[0m
	[38;5;178m[1mscore  recall  precision  f1_score  Precision_Stress  final_score[0m
	[38;5;10m[1mtrain[0m    [38;5;10m[1m[38;5;150m0.92[0m[0m       [38;5;150m0.93[0m      [38;5;10m[1m[38;5;150m0.92[0m[0m              [38;5;150m[38;5;150m0.91[0m[0m         [38;5;150m[38;5;150m0.91[0m[0m
	[38;5;12m[1mvalid[0m    [38;5;12m[1m[38;5;105m0.60[0m[0m       [38;5;105m0.64[0m      [38;5;12m[1m[38;5;105m0.60[0m[0m              [38;5;105m0.54[0m         [38;5;105m0.56[0m
	[38;5;1m[1mtest[0m     [38;5;1m[1m[38;5;218m0.60[0m[0m       [38;5;218m0.64[0m      [38;5;1m[1m[38;5;218m0.60[0m[0m              [38;5;218m0.55[0m         [38;5;218m0.57[0m
[38;5;178m[1mCV fold 1[0m
	[38;5;178m[1mscore  recall  pre