# Overall Per-User Summary Across Stress, Aerobic, and Anaerobic Sessions (Multi-Signal)

This notebook builds a per-user summary from several wearable signals in the Empatica E4 dataset.  
For each user and each condition (stress, aerobic, anaerobic) we compute the mean and standard deviation of:  
EDA, HR, TEMP, ACC magnitude (movement), and BVP.  
The resulting tables make it possible to compare physiological responses across conditions and participants.



In [34]:
# import basic libraries
import os
import numpy as np
import pandas as pd


In [35]:
# main dataset folder path (update if needed)
BASE_PATH = r"C:\Users\KARAN\Downloads\Internship Project\Wearable_Dataset\Wearable_Dataset"

# create paths for each condition
STRESS_PATH = os.path.join(BASE_PATH, "STRESS")
AEROBIC_PATH = os.path.join(BASE_PATH, "AEROBIC")
ANAEROBIC_PATH = os.path.join(BASE_PATH, "ANAEROBIC")

print("Dataset paths loaded successfully")


Dataset paths loaded successfully
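
The confirmation message above is printed whether or not the folders actually exist. A minimal sketch of an explicit check follows; `missing_dirs` is a hypothetical helper that is not part of the notebook, and the demo runs on a temporary folder tree rather than the real dataset location.

```python
import os
import tempfile


def missing_dirs(paths):
    """Return the entries of `paths` that are not existing directories."""
    return [p for p in paths if not os.path.isdir(p)]


# Demo: create only two of the three condition folders in a temp directory.
with tempfile.TemporaryDirectory() as base:
    os.mkdir(os.path.join(base, "STRESS"))
    os.mkdir(os.path.join(base, "AEROBIC"))

    condition_paths = [
        os.path.join(base, name) for name in ("STRESS", "AEROBIC", "ANAEROBIC")
    ]
    missing_names = [os.path.basename(p) for p in missing_dirs(condition_paths)]

print("Missing condition folders:", missing_names)
```

In the notebook, the same helper could be called on `STRESS_PATH`, `AEROBIC_PATH`, and `ANAEROBIC_PATH` before printing the success message.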


In [36]:
# function to read EDA, HR, TEMP, BVP (1 column signals)
def read_empatica_1col(file_path):
    
    # read CSV file
    raw = pd.read_csv(file_path, header=None)
    
    # rows 1-2 hold the session start timestamp and the sample rate;
    # actual signal values start from row 3
    values = raw.iloc[2:, 0].astype(float).values
    
    return values
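
The reader above discards the two header rows. The sketch below keeps them, assuming the standard E4 export layout in which row 1 is the session start time and row 2 is the sample rate in Hz. `read_empatica_with_meta` is a hypothetical name, and the demo file is synthetic, not taken from the dataset.

```python
import os
import tempfile

import pandas as pd


def read_empatica_with_meta(file_path):
    """Return (start_time, sample_rate_hz, values) for a one-column E4 file.

    Assumes row 1 = start timestamp, row 2 = sample rate, rows 3+ = samples.
    """
    raw = pd.read_csv(file_path, header=None)
    start_time = float(raw.iloc[0, 0])
    sample_rate_hz = float(raw.iloc[1, 0])
    values = raw.iloc[2:, 0].astype(float).to_numpy()
    return start_time, sample_rate_hz, values


# Demo on a tiny synthetic file: 4 samples recorded at 4 Hz.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "EDA.csv")
    with open(demo_path, "w") as f:
        f.write("1600000000.0\n4.0\n0.10\n0.20\n0.30\n0.40\n")
    start_time, rate, values = read_empatica_with_meta(demo_path)

# Recording length in seconds follows from sample count and sample rate.
duration_s = len(values) / rate
print(rate, len(values), duration_s)
```

Keeping the sample rate makes it possible to report recording duration per session, which helps when sessions of very different lengths are summarized with a single mean.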


In [37]:
# function to compute accelerometer magnitude
def read_acc_magnitude(acc_path):
    
    # read raw accelerometer file
    acc_raw = pd.read_csv(acc_path, header=None)
    
    # get X, Y, Z columns (skip first 2 rows)
    acc_xyz = acc_raw.iloc[2:, :3].astype(float).values
    
    # compute magnitude = sqrt(x^2 + y^2 + z^2)
    magnitude = np.sqrt(
        acc_xyz[:, 0]**2 +
        acc_xyz[:, 1]**2 +
        acc_xyz[:, 2]**2
    )
    
    return magnitude
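
The raw magnitude includes gravity, so a still wrist already yields a large constant value. The sketch below assumes the E4 accelerometer unit of 1/64 g per count and measures movement as the deviation of the magnitude from 1 g. `COUNTS_PER_G` and `movement_from_counts` are hypothetical names, and the sample arrays are synthetic.

```python
import numpy as np

# Assumption: ACC.csv stores acceleration in units of 1/64 g per count.
COUNTS_PER_G = 64.0


def movement_from_counts(acc_xyz_counts):
    """Return |magnitude - 1 g| per sample for an (n, 3) array of raw counts."""
    acc_g = np.asarray(acc_xyz_counts, dtype=float) / COUNTS_PER_G
    magnitude_g = np.linalg.norm(acc_g, axis=1)
    return np.abs(magnitude_g - 1.0)


# A still sensor reads exactly 1 g regardless of orientation.
still = np.array([[0, 0, 64], [0, 0, 64], [64, 0, 0]])
# A moving sensor deviates from 1 g in both directions.
moving = np.array([[0, 0, 64], [0, 0, 96], [0, 0, 32]])

still_mean = movement_from_counts(still).mean()
moving_mean = movement_from_counts(moving).mean()
print(still_mean, moving_mean)
```

Under this assumption, a mean raw magnitude close to 64 mostly reflects gravity, so the deviation from 1 g may separate conditions more clearly than the raw mean does.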


In [38]:
# function to safely compute mean and std
# np.std returns the population standard deviation (ddof=0) by default
# if signal is missing or empty, return NaN
def safe_mean_std(values):
    
    if values is None or len(values) == 0:
        return np.nan, np.nan
    
    mean_val = float(np.mean(values))
    std_val = float(np.std(values))
    
    return mean_val, std_val
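
A single NaN sample makes both `np.mean` and `np.std` return NaN for the whole recording. The variant below is a sketch that drops NaN samples first and exposes `ddof` so the choice between population and sample standard deviation is explicit. `nan_safe_mean_std` is a hypothetical name, not a function from the notebook.

```python
import numpy as np


def nan_safe_mean_std(values, ddof=0):
    """Mean and std of the non-NaN samples; (NaN, NaN) if too few remain."""
    if values is None:
        return np.nan, np.nan

    arr = np.asarray(values, dtype=float)
    arr = arr[~np.isnan(arr)]  # drop missing samples

    # std with ddof needs more than `ddof` samples to be defined
    if arr.size <= ddof:
        return np.nan, np.nan

    return float(arr.mean()), float(arr.std(ddof=ddof))


mean_val, std_val = nan_safe_mean_std([1.0, np.nan, 3.0])
empty_mean, empty_std = nan_safe_mean_std([])
print(mean_val, std_val, empty_mean, empty_std)
```

Passing `ddof=1` gives the sample standard deviation, which matches the default used by pandas.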


In [39]:
# this function computes summary statistics for one subject
def get_subject_summary(subject_folder_path):
    
    # define file paths
    eda_path = os.path.join(subject_folder_path, "EDA.csv")
    hr_path = os.path.join(subject_folder_path, "HR.csv")
    temp_path = os.path.join(subject_folder_path, "TEMP.csv")
    bvp_path = os.path.join(subject_folder_path, "BVP.csv")
    acc_path = os.path.join(subject_folder_path, "ACC.csv")
    
    # if EDA is missing, skip this subject
    if not os.path.exists(eda_path):
        return None
    
    # read signals
    eda = read_empatica_1col(eda_path)
    hr = read_empatica_1col(hr_path) if os.path.exists(hr_path) else None
    temp = read_empatica_1col(temp_path) if os.path.exists(temp_path) else None
    bvp = read_empatica_1col(bvp_path) if os.path.exists(bvp_path) else None
    acc_mag = read_acc_magnitude(acc_path) if os.path.exists(acc_path) else None
    
    # compute mean and std for each signal
    eda_mean, eda_std = safe_mean_std(eda)
    hr_mean, hr_std = safe_mean_std(hr)
    temp_mean, temp_std = safe_mean_std(temp)
    bvp_mean, bvp_std = safe_mean_std(bvp)
    acc_mean, acc_std = safe_mean_std(acc_mag)
    
    # return dictionary
    return {
        "eda_mean": eda_mean, "eda_std": eda_std,
        "hr_mean": hr_mean, "hr_std": hr_std,
        "temp_mean": temp_mean, "temp_std": temp_std,
        "bvp_mean": bvp_mean, "bvp_std": bvp_std,
        "acc_mean": acc_mean, "acc_std": acc_std
    }


In [40]:
# function to collect summaries for all subjects in a condition
def collect_condition(condition_path, condition_name):
    
    rows = []
    
    # list all subject folders
    subjects = sorted([
        s for s in os.listdir(condition_path)
        if os.path.isdir(os.path.join(condition_path, s))
    ])
    
    # loop through each subject
    for subject in subjects:
        
        subj_path = os.path.join(condition_path, subject)
        
        summary = get_subject_summary(subj_path)
        
        if summary is None:
            continue
        
        # create row dictionary
        row = {"subject": subject, "condition": condition_name}
        row.update(summary)
        
        rows.append(row)
    
    return pd.DataFrame(rows)


In [41]:
# collect summaries for stress, aerobic, and anaerobic
stress_df = collect_condition(STRESS_PATH, "stress")
aerobic_df = collect_condition(AEROBIC_PATH, "aerobic")
anaerobic_df = collect_condition(ANAEROBIC_PATH, "anaerobic")

# combine all into one dataframe
overall_df = pd.concat(
    [stress_df, aerobic_df, anaerobic_df],
    ignore_index=True
)

print("Total rows in combined summary:", len(overall_df))
display(overall_df.head())


Total rows in combined summary: 100


,subject,condition,eda_mean,eda_std,hr_mean,hr_std,temp_mean,temp_std,bvp_mean,bvp_std,acc_mean,acc_std
0,S01,stress,0.64373,0.14306,64.518334,10.728472,33.42942,0.440674,0.001251,59.577254,63.770533,2.768202
1,S02,stress,1.149615,0.356382,97.605146,27.201427,34.025904,0.313551,0.007915,46.290369,61.804389,1.717103
2,S03,stress,2.725394,2.043679,76.825209,14.574857,31.478544,0.378386,0.000887,76.599326,61.359186,1.763488
3,S04,stress,1.097917,0.594482,62.228672,8.454042,32.875733,0.215317,-0.003054,32.391408,61.706599,1.870639
4,S05,stress,14.095163,3.822214,77.00247,13.278766,33.580589,0.181167,0.011029,75.442597,63.862339,4.515905


In [42]:
# create pivot tables for mean values per user per condition

pivot_eda = overall_df.pivot_table(
    index="subject", columns="condition", values="eda_mean"
)

pivot_hr = overall_df.pivot_table(
    index="subject", columns="condition", values="hr_mean"
)

pivot_temp = overall_df.pivot_table(
    index="subject", columns="condition", values="temp_mean"
)

pivot_acc = overall_df.pivot_table(
    index="subject", columns="condition", values="acc_mean"
)

pivot_bvp = overall_df.pivot_table(
    index="subject", columns="condition", values="bvp_mean"
)

# display clean rounded tables
print("EDA Mean Per User")
display(pivot_eda.round(3))

print("HR Mean Per User")
display(pivot_hr.round(3))

print("TEMP Mean Per User")
display(pivot_temp.round(3))

print("ACC Mean Per User")
display(pivot_acc.round(3))

print("BVP Mean Per User")
display(pivot_bvp.round(3))


EDA Mean Per User


subject,aerobic,anaerobic,stress
S01,3.465,6.072,0.644
S02,22.775,8.261,1.15
S03,,1.045,2.725
S03_,13.175,,
S04,21.884,3.343,1.098
S05,16.174,41.501,14.095
S06,3.27,,3.708
S06_,,11.684,
S07,,0.283,0.258
S07_,6.061,,


HR Mean Per User


subject,aerobic,anaerobic,stress
S01,107.575,88.029,64.518
S02,127.322,74.85,97.605
S03,,109.179,76.825
S03_,122.335,,
S04,116.12,94.0,62.229
S05,101.963,105.717,77.002
S06,88.644,,90.997
S06_,,102.424,
S07,,94.446,78.461
S07_,104.634,,


TEMP Mean Per User


subject,aerobic,anaerobic,stress
S01,32.468,32.192,33.429
S02,32.778,200.874,34.026
S03,,31.506,31.479
S03_,34.14,,
S04,32.626,32.178,32.876
S05,33.193,142.847,33.581
S06,31.458,,33.699
S06_,,31.959,
S07,,30.675,30.404
S07_,211.937,,
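
Three TEMP means in this table (200.874, 142.847, and 211.937) are far from the roughly 30 to 34 range of the other rows. A simple range check can flag such rows for inspection. The sketch below uses a few values copied from the table; the plausible range in `PLAUSIBLE_TEMP_C` is an assumption chosen for illustration, not a threshold from the dataset documentation.

```python
import pandas as pd

# Assumption: skin temperature means outside this range (in °C) are suspect.
PLAUSIBLE_TEMP_C = (20.0, 45.0)

# A few (subject, condition, temp_mean) rows copied from the table above.
temp_means = pd.DataFrame(
    [
        ("S02", "anaerobic", 200.874),
        ("S05", "anaerobic", 142.847),
        ("S07_", "aerobic", 211.937),
        ("S01", "stress", 33.429),
        ("S04", "aerobic", 32.626),
    ],
    columns=["subject", "condition", "temp_mean"],
)

low, high = PLAUSIBLE_TEMP_C
# Keep only the rows whose mean falls outside the plausible range.
suspect = temp_means[~temp_means["temp_mean"].between(low, high)]
print(suspect)
```

Flagged rows are candidates for a closer look at the raw `TEMP.csv` file rather than automatic removal.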


ACC Mean Per User


subject,aerobic,anaerobic,stress
S01,63.521,64.644,63.771
S02,63.863,64.731,61.804
S03,,62.332,61.359
S03_,64.642,,
S04,64.344,64.892,61.707
S05,64.553,64.277,63.862
S06,64.346,,63.606
S06_,,64.448,
S07,,64.062,62.55
S07_,64.216,,


BVP Mean Per User


subject,aerobic,anaerobic,stress
S01,-0.004,-0.006,0.001
S02,0.003,-0.002,0.008
S03,,-0.024,0.001
S03_,0.002,,
S04,0.005,-0.002,-0.003
S05,0.017,0.005,0.011
S06,-0.001,,-0.003
S06_,,0.027,
S07,,-0.011,0.002
S07_,0.003,,
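
Several folder names carry a trailing underscore (`S03_`, `S06_`, `S07_`), so the pivot tables place their values on separate rows from `S03`, `S06`, and `S07`. If those folders belong to the same participants, which the source does not confirm, normalizing the ID before pivoting merges the rows. The sketch below uses EDA means copied from the table above; the `participant` column is a hypothetical addition.

```python
import pandas as pd

# EDA means for S03 / S03_ copied from the EDA table above.
eda_rows = pd.DataFrame(
    [
        ("S03", "anaerobic", 1.045),
        ("S03", "stress", 2.725),
        ("S03_", "aerobic", 13.175),
    ],
    columns=["subject", "condition", "eda_mean"],
)

# Assumption: a trailing underscore marks the same participant.
eda_rows["participant"] = eda_rows["subject"].str.rstrip("_")

pivot = eda_rows.pivot_table(
    index="participant", columns="condition", values="eda_mean"
)
print(pivot)
```

Note that `pivot_table` averages duplicates by default, so if both `S03` and `S03_` had a session in the same condition, the two values would be silently combined into one mean.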


In [43]:
# save full summary
overall_df.to_csv("overall_per_user_multisignal_summary.csv", index=False)

# save pivot tables
pivot_eda.to_csv("pivot_EDA_mean.csv")
pivot_hr.to_csv("pivot_HR_mean.csv")
pivot_temp.to_csv("pivot_TEMP_mean.csv")
pivot_acc.to_csv("pivot_ACC_mean.csv")
pivot_bvp.to_csv("pivot_BVP_mean.csv")

print("All summary files saved successfully")


All summary files saved successfully


## Conclusion

We built a multi-signal summary for each participant across stress, aerobic, and anaerobic sessions. Computing the mean of each signal per condition lets us compare how wearable signals change between activity states.
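
One way to make the comparison concrete is to look at within-participant differences relative to the stress session. The sketch below uses the HR means of the four participants who have a value in all three conditions in the displayed rows (S01, S02, S04, S05); it is an illustration on those rows only, not an analysis of the full 100-row summary.

```python
import pandas as pd

# HR means copied from the "HR Mean Per User" table (complete rows only).
hr = pd.DataFrame(
    {
        "aerobic": [107.575, 127.322, 116.12, 101.963],
        "anaerobic": [88.029, 74.85, 94.0, 105.717],
        "stress": [64.518, 97.605, 62.229, 77.002],
    },
    index=["S01", "S02", "S04", "S05"],
)

# Per-participant difference of each exercise condition from stress.
diff = hr[["aerobic", "anaerobic"]].sub(hr["stress"], axis=0)

# Count how many participants have a higher mean HR than in the stress session.
n_higher = (diff > 0).sum()
print(diff.round(3))
print(n_higher)
```

Paired differences like these keep each participant as their own baseline, which matters because resting levels differ widely between people.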

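In the rows shown, mean HR and mean EDA are higher in the aerobic session than in the stress session for most participants, which suggests that these signals separate the conditions. Other signals are less informative as plain means: ACC magnitude stays between roughly 61 and 65 in every condition, and mean BVP stays close to zero. Three TEMP means (200.874, 142.847, and 211.937) are implausible for skin temperature in °C and should be checked before use, and subject IDs with a trailing underscore (S03_, S06_, S07_) appear on separate rows from S03, S06, and S07. Overall, the summaries indicate that multi-signal wearable data can capture physiological differences between conditions and may provide useful features for stress classification models, although classifier performance is not evaluated here.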
