# HR Calibration & Forecasting using PPG-Dalia Dataset

This notebook trains a model that predicts an individual's ECG-measured heart rate (HR) from signals recorded by a wrist-worn device. These signals include:
- Blood Volume Pulse (BVP)
- Accelerometer (ACC)
- Skin Temperature (TEMP)
- Electrodermal Activity (EDA)

We will also train the model to simultaneously predict HR in future periods given current HR trends.

Calibration and forecasting are both handled by one architecture: a shared encoder followed by two heads.

## Data Notes


### Data Process:
- For every 8-second window of wrist-measured features, there is a corresponding ECG-measured HR.
- The shared encoder compresses each window into a final hidden state, a representation intended to serve both calibration and forecasting.
- The calibration and forecasting heads are trained on these final states to predict HR for the current window and HR at *t+H*, each compared against the corresponding window label.

### Window Information:
- length: 8s | shift: 2s
- BVP: 512 samples per window (64Hz)
- ACC: 256 samples per window (32Hz)
- TEMP: 32 samples per window (4Hz)
- EDA: 32 samples per window (4Hz)

We will interpolate ACC, TEMP, and EDA to ensure equal input per feature.
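As a rough illustration of that resampling step, the sketch below upsamples one synthetic 4 Hz window to 512 samples. It assumes plain linear interpolation via `np.interp`; the helper name `upsample_window` and the synthetic TEMP values are hypothetical and only serve the example.

```python
import numpy as np

WIN_S = 8        # window length in seconds
TARGET_FS = 64   # BVP sampling rate in Hz

def upsample_window(x, target_len=WIN_S * TARGET_FS):
  """Linearly interpolate a 1-D window onto target_len evenly spaced points."""
  x = np.asarray(x, dtype=np.float32)
  t_in = np.linspace(0.0, WIN_S, len(x), endpoint=False)
  t_out = np.linspace(0.0, WIN_S, target_len, endpoint=False)
  return np.interp(t_out, t_in, x).astype(np.float32)

# samples per 8 s window at each native sampling rate
for name, fs in [("BVP", 64), ("ACC", 32), ("TEMP", 4), ("EDA", 4)]:
  print(name, WIN_S * fs)

# synthetic 4 Hz TEMP window (32 samples), for illustration only
temp_window = np.linspace(33.0, 33.5, WIN_S * 4)
temp_64 = upsample_window(temp_window)
print(temp_64.shape)
```

Linear interpolation adds no new information to the slower channels; it only places them on the same time grid as BVP so the features can be stacked per time step.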

### Calibration & Forecasting Labels:
- `y_now`
- `y_fut`: the label `H` windows after the corresponding `y_now` (the label at index *k+H* for window *k*), where `H` is the number of windows ahead
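The pairing of the two labels can be sketched as below, assuming one HR label per window and `H >= 1`. The helper `make_label_pairs` and the toy HR values are hypothetical.

```python
import numpy as np

SHIFT_S = 2  # seconds between consecutive window starts

def make_label_pairs(labels, H):
  """Pair each window's HR with the HR H windows later (H >= 1).

  The last H windows are dropped because they have no future label.
  """
  labels = np.asarray(labels, dtype=np.float32)
  y_now = labels[:-H]
  y_fut = labels[H:]
  return y_now, y_fut

labels = np.array([60, 62, 65, 70, 74, 73], dtype=np.float32)  # toy per-window HR (bpm)
y_now, y_fut = make_label_pairs(labels, H=2)
print(y_now)  # labels of windows 0..3
print(y_fut)  # labels of windows 2..5
print("horizon in seconds:", 2 * SHIFT_S)
```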

### Activities:
- Sitting (ID: 1) used as a motion-artefact-free baseline.
- Ascending & descending stairs (ID: 2)
- Table soccer (ID: 3)
- Cycling (ID: 4)
- Driving a car (ID: 5)
- Lunch break (ID: 6)
- Walking (ID: 7)
- Working (ID: 8)
- Transient (ID: 0)

### Retrieving the Data

In [None]:
import os
os.makedirs("/content/PPG_DaLia", exist_ok=True)

In [None]:
import os
import pickle
import numpy as np

# path where PPG_DaLia data resides
DATASET_PATH = "/content/drive/MyDrive/PPG_DaLia/subjects"

# dictionary of subject dictionaries
all_subjects = {}

for subject_id in range(1, 16):
  fname = f"S{subject_id}.pkl"
  fpath = os.path.join(DATASET_PATH, fname)

  if not os.path.exists(fpath):
    print(f"File {fname} missing, skipping...")
    continue


  with open(fpath, "rb") as f:
    data = pickle.load(f, encoding="latin1")

  print(f"{subject_id} extracted")

  # extract features
  wrist = data["signal"]["wrist"]
  bvp = wrist["BVP"]
  acc = wrist["ACC"]
  eda = wrist["EDA"]
  temp = wrist["TEMP"]

  # extract label
  labels = data["label"]

  # store in dict format; convert from list into array
  all_subjects[subject_id] = {
      "BVP": np.array(bvp, dtype=np.float32),
      "ACC": np.array(acc, dtype=np.float32),
      "EDA": np.array(eda, dtype=np.float32),
      "TEMP": np.array(temp, dtype=np.float32),
      "HR_labels": np.array(labels, dtype=np.float32)
  }

print("Loaded subjects:", list(all_subjects.keys()))

1 extracted
2 extracted
3 extracted
4 extracted
5 extracted
6 extracted
7 extracted
8 extracted
9 extracted
10 extracted
11 extracted
12 extracted
13 extracted
14 extracted
15 extracted
Loaded subjects: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]


Save `all_subjects` to a Drive file for easy access later.

In [None]:
import pickle

with open("/content/drive/MyDrive/PPG_DaLia/all_subjects.pkl", "wb") as f:
  pickle.dump(all_subjects, f)

Retrieve `all_subjects` dictionary

In [None]:
with open("/content/drive/MyDrive/PPG_DaLia/all_subjects.pkl", "rb") as f:
  all_subjects = pickle.load(f)

### Interpolating ACC, TEMP, & EDA

- ACC, TEMP, and EDA are upsampled to BVP's 64 Hz rate so that every feature has 512 values per 8 s window.

Note to self:
- NumPy uses `dtype=object` when an array's elements aren't uniform (e.g. one string and one integer), have varying lengths (e.g. `[1, 2]` followed by `[1, 2, 3]`), or are custom Python objects.
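A small sketch of the `dtype=object` cases described in the note, and of stacking an object array of equal-length rows into a numeric array. The arrays are toy values, and the object arrays are built explicitly because recent NumPy versions do not create ragged arrays implicitly.

```python
import numpy as np

uniform = np.array([[1, 2], [3, 4]])         # regular numeric array
mixed = np.array([1, "a"], dtype=object)     # mixed element types
ragged = np.empty(2, dtype=object)           # rows of different length
ragged[0] = [1, 2]
ragged[1] = [1, 2, 3]

print(uniform.dtype, mixed.dtype, ragged.dtype)

# an object array whose rows all have the same length can be stacked
rows = np.empty(2, dtype=object)
rows[0] = np.array([1.0, 2.0])
rows[1] = np.array([3.0, 4.0])
stacked = np.stack(rows.tolist(), axis=0).astype(np.float32)
print(stacked.shape, stacked.dtype)
```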

In [None]:
import numpy as np

TARGET_FS = 64.0

# conversion to float32 numpy numeric array
def to_float32_np(x):
  """
  In the case dtype=object and array isn't numeric; must convert to numpy float32 array
  """
  a = np.asarray(x)
  # if object dtype, try to stack if it's an array of arrays
  if a.dtype == object:
    try:
      a = np.stack(a.tolist(), axis=0) # convert a into list for which it is then stacked
    except Exception:
      a = a.astype('float32', copy=False) # in case a is not an array of containers

  return a.astype(np.float32, copy=False)

# ensure all array shapes (signals) are time-major, meaning that (samples, channels)
def make_time_major(arr, expect_channels=None):
  """
  Ensure array is time-major
  - 1D -> (N, 1)
  - If 2D and looks like (C, N) due to C being relatively small, transpose to (N, C)
  """
  a = to_float32_np(arr)
  # conditions to determine array shape
  if a.ndim == 1:
    a = a[:, None] # (N,) -> (N, 1)
  elif a.ndim == 2:
    N0, N1 = a.shape
    # if first dim is small and second much larger, transpose to meet (N, C) requirements
    if N0 <= 4 and N1 > N0:
      a = a.T

  else:
    a = a.reshape(a.shape[0], -1) # if array has > 2 dimensions, flatten to (N, all other dims)

  # in case where expected_channels of an array doesn't match
  if expect_channels is not None and a.shape[1] != expect_channels:
    pass # no hard fail; may want to add in logs

  return a # (N, C)

def interp_to_target_1d(x_1d, in_fs, target_len, target_fs=TARGET_FS):
  """
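  Using target_len as reference, interpolate x to ensure 512 values
  per feature per window. Input and output are assumed to span the same
  duration, so in_fs is informational only and is not used in the computation.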
  """
  # ensure 1-D arrays
  x = to_float32_np(x_1d).reshape(-1)

  # window length
  duration = target_len / target_fs

  # set desired distribution
  t_out = np.linspace(0.0, duration, target_len, endpoint=False)
  # set current distribution on target
  t_in = np.linspace(0.0, duration, len(x), endpoint=False)

  # interpolate x to match distribution
  return np.interp(t_out, t_in, x).astype(np.float32)

def resample_subject_to_64hz(subj):
  """
  Uses interp_to_target_1d() to interpolate ACC, EDA, and TEMP
  """
  # convert object into np.float32 as tensor prep
  bvp = to_float32_np(subj["BVP"]).reshape(-1)
  target_len = len(bvp)

  # ACC - could arrive as (N, 3), (3, N), or (N,); make_time_major returns (samples, channels)
  acc_raw = make_time_major(subj["ACC"])
  C = acc_raw.shape[1]

  acc_64 = np.stack(
      [interp_to_target_1d(acc_raw[:, i], in_fs=32.0, target_len=target_len) for i in range(C)],
      axis=1
  )

  # EDA
  eda_raw = to_float32_np(subj["EDA"]).reshape(-1)
  eda_64 = interp_to_target_1d(eda_raw, in_fs=4.0, target_len=target_len)

  # TEMP
  temp_raw = to_float32_np(subj["TEMP"]).reshape(-1)
  temp_64 = interp_to_target_1d(temp_raw, in_fs=4.0, target_len=target_len)

  # updated dictionary with interpolated values
  return {
      "BVP": bvp.astype(np.float32),
      "ACC": acc_64.astype(np.float32),
      "EDA": eda_64.astype(np.float32),
      "TEMP": temp_64.astype(np.float32),
      "HR_labels": to_float32_np(subj["HR_labels"]).reshape(-1)
  }

# create a new dict of similar format, just with interpolated signals
all_subjects_64hz = {}

for sid, subj in all_subjects.items():
  try:
    out = resample_subject_to_64hz(subj)
    all_subjects_64hz[sid] = out
    # shape of signal arrays
    print(f"S{sid}: BVP {out['BVP'].shape}, ACC {out['ACC'].shape}, "
          f"EDA {out['EDA'].shape}, TEMP {out['TEMP'].shape}, HR {out['HR_labels'].shape}")
  except Exception as e:
    print(f"Resample failed for subject {sid}: {e}")

S1: BVP (589568,), ACC (589568, 3), EDA (589568,), TEMP (589568,), HR (4603,)
S2: BVP (525120,), ACC (525120, 3), EDA (525120,), TEMP (525120,), HR (4099,)
S3: BVP (559424,), ACC (559424, 3), EDA (559424,), TEMP (559424,), HR (4367,)
S4: BVP (585600,), ACC (585600, 3), EDA (585600,), TEMP (585600,), HR (4572,)
S5: BVP (595520,), ACC (595520, 3), EDA (595520,), TEMP (595520,), HR (4649,)
S6: BVP (336000,), ACC (336000, 3), EDA (336000,), TEMP (336000,), HR (2622,)
S7: BVP (597952,), ACC (597952, 3), EDA (597952,), TEMP (597952,), HR (4668,)
S8: BVP (517120,), ACC (517120, 3), EDA (517120,), TEMP (517120,), HR (4037,)
S9: BVP (547840,), ACC (547840, 3), EDA (547840,), TEMP (547840,), HR (4277,)
S10: BVP (681472,), ACC (681472, 3), EDA (681472,), TEMP (681472,), HR (5321,)
S11: BVP (579072,), ACC (579072, 3), EDA (579072,), TEMP (579072,), HR (4521,)
S12: BVP (506496,), ACC (506496, 3), EDA (506496,), TEMP (506496,), HR (3954,)
S13: BVP (584704,), ACC (584704, 3), EDA (584704,), TEMP (584

As seen above, signal data is currently not split into its respective windows, which needs to be done for corresponding HR values.
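As a quick consistency check, the number of full 8 s windows with a 2 s shift can be computed from the signal length and compared with the number of HR labels. The helper `expected_windows` is hypothetical; the shapes are copied from the output above.

```python
FS = 64
T = 8 * FS        # 512 samples per window
STRIDE = 2 * FS   # 128 samples between window starts

def expected_windows(n_samples, win=T, stride=STRIDE):
  """Number of full windows that fit in n_samples."""
  if n_samples < win:
    return 0
  return (n_samples - win) // stride + 1

# (BVP length, number of HR labels) copied from the printed shapes
shapes = {"S1": (589568, 4603), "S2": (525120, 4099), "S6": (336000, 2622)}
for sid, (n, m) in shapes.items():
  print(sid, expected_windows(n), m)
```

For these three subjects the window count matches the label count, which supports indexing window `k` at sample `k * STRIDE`.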

## Train/Val/Test Split

### Leave-One-Subject-Out (LOSO)

- We take a LOSO approach: the train/validate/test cycle is run once per subject.
- Each fold (run) holds out a different subject as the test set, which shows how well the model generalizes to unseen people (e.g. across fitness levels and sexes).
- Performance is averaged over all folds.
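The fold structure can be sketched as follows. `loso_folds` and `summarize` are hypothetical helpers, and the metric values are placeholders for illustration only, not results from this notebook.

```python
import numpy as np

def loso_folds(subject_ids):
  """Yield (train_ids, test_id) with each subject held out exactly once."""
  ids = sorted(subject_ids)
  for test_id in ids:
    yield [s for s in ids if s != test_id], test_id

def summarize(fold_metric):
  """Mean and standard deviation of a per-fold metric."""
  vals = np.array(list(fold_metric.values()), dtype=np.float64)
  return float(vals.mean()), float(vals.std())

folds = list(loso_folds(range(1, 16)))
print(len(folds))
train_ids, test_id = folds[0]
print(test_id, train_ids)

# placeholder values, for illustration only
toy_metric = {1: 10.0, 2: 12.0, 3: 14.0}
mean_mae, std_mae = summarize(toy_metric)
print(mean_mae, std_mae)
```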

### Validation Set

- The validation set contains the final 20% of windows of every activity for every subject (except the test subject).
- This ensures variety in the HR of the validation set while maintaining chronology.
- An embargo of 8 seconds (4 windows at a 2 s shift) is placed between the train and validation data so that no training window overlaps a validation window.
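For a single activity segment, the split can be sketched as below. `split_segment` is a hypothetical helper that assumes window indices, a 2 s shift, 8 s windows, and a 4-window embargo.

```python
import numpy as np

def split_segment(start_k, end_k, val_frac=0.2, embargo=4):
  """Return (train_idx, embargo_idx, val_idx) for windows start_k..end_k-1 of one activity."""
  idx = np.arange(start_k, end_k)
  n_val = max(1, int(np.floor(val_frac * len(idx))))
  val_idx = idx[-n_val:]                                # final 20% of the segment
  emb_start = max(start_k, end_k - n_val - embargo)
  emb_idx = np.arange(emb_start, end_k - n_val)         # dropped from both sets
  train_idx = np.arange(start_k, emb_start)
  return train_idx, emb_idx, val_idx

train_idx, emb_idx, val_idx = split_segment(0, 50)
print(train_idx[-1], emb_idx.tolist(), val_idx[0], val_idx[-1])

# the last training window must end before the first validation window starts
SHIFT_S, WIN_S = 2, 8
print(train_idx[-1] * SHIFT_S + WIN_S, "<=", val_idx[0] * SHIFT_S)
```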

## Data Setup

### Signal Reformatting, Normalization, and Window Formation

In [None]:
import numpy as np
import torch

FS = 64
WIN_S = 8
SHIFT_S = 2 # with every window shifting 2 seconds, 6 of every 8 seconds (and their samples) overlap with the previous window
T = WIN_S * FS # 512 samples per window
STRIDE = SHIFT_S * FS # number of unique samples per window relative to previous window

# feature order used everywhere
FEATURE_ORDER = ["BVP", "ACCx", "ACCy", "ACCz", "TEMP", "EDA"]


# reformats each subject dictionary so all signals are within a single array of shape [N, 6]
def assemble_features_subject(subj_dict):
  """
  Returns:
  X_cont: (N, F=6) in FEATURE_ORDER
  labels: (M,) average ECG-HR per 8s window
  """
  # reshape to 1D and retrieve row shape for N
  bvp = subj_dict["BVP"].astype(np.float32).reshape(-1)
  N = bvp.shape[0]

  # ACC - (N,) OR (N, 1) --> (N, 3)
  acc = subj_dict["ACC"].astype(np.float32)

  if acc.ndim == 1:
    acc = acc[:, None]
  if acc.shape[1] == 1:
    acc = np.repeat(acc, 3, axis=1)
  # case if neither
  elif acc.shape[1] != 3:
    raise ValueError(f"ACC has {acc.shape[1]} channels; expected 1 or 3")

  # for TEMP and EDA
  temp = subj_dict["TEMP"].astype(np.float32).reshape(-1)
  eda = subj_dict["EDA"].astype(np.float32).reshape(-1)

  # sanity check
  assert acc.shape[0] == N == temp.shape[0] == eda.shape[0], "Signal length mismatch"

  # stack to form (N, 6)
  X_cont = np.stack([bvp, acc[:, 0], acc[:, 1], acc[:, 2], temp, eda], axis=1)

  # labels to (M,)
  labels = subj_dict["HR_labels"].astype(np.float32).reshape(-1)

  return X_cont, labels

# compute stats for normalization from training subjects only
def compute_norm_stats(train_subject_ids, all_subjects_64hz):
  """Returns: mean (F,), std (F,) for each feature"""
  feats_list = []
  # extract signal data for every subject
  for sid in train_subject_ids:
    X_cont, _ = assemble_features_subject(all_subjects_64hz[sid])
    feats_list.append(X_cont)

  # combine all signal data across all subjects and then calculate stats
  big = np.concatenate(feats_list, axis=0) # (sumN, F)
  mean = big.mean(axis=0)
  std = big.std(axis=0)
  std = np.where(std < 1e-6, 1.0, std) # avoid division by zero during normalization

  return mean.astype(np.float32), std.astype(np.float32)

# normalization
def apply_norm(X_cont, mean, std):
  return ((X_cont - mean) / std).astype(np.float32)

# 8s windows are created by taking the final 6s of the previous window, and adding 2s more
def make_windows_for_subject(X_cont, labels, H_segments=1):
  """
  Returns:
  Xw: all windows stacked in (no. windows, 512, 6) shape
  y_now: all corresponding HRs for every window (no. windows,)
  y_fut: all corresponding future HRs for all compatible windows (no. compatible windows,)
  """
  # number of windows expected
  M = len(labels)
  num_seg = M

  # windows for signal data, HR data, and future HR data
  Xw = []
  y_now = []
  y_fut = []

  # for every window k...
  for k in range(num_seg):

    # define the data range for one window
    start = int(k * STRIDE)
    end = start + T

    if end > X_cont.shape[0]:
      break # unlikely to happen as num_seg should correspond with signal data

    # forecasting criteria - final few windows not included due to t + H not existing in labels
    if (k + H_segments) >= M:
      break

    # addition of data for each window
    Xw.append(X_cont[start:end, :]) # (512, F)
    y_now.append(labels[k]) # ECG-HR @ k
    y_fut.append(labels[k + H_segments]) # ECG-HR @ segment k+H

  # format
  Xw = np.stack(Xw, axis=0).astype(np.float32) # (num_kept, 512, 6)
  y_now = np.array(y_now, dtype=np.float32) # (num_kept,)
  y_fut = np.array(y_fut, dtype=np.float32) # (num_kept,)

  return Xw, y_now, y_fut

### CSV Extraction and Activity Masks

In [None]:
import csv
import numpy as np
from collections import defaultdict

# extract per-activity window information for each subject
def load_activity_markers(csv_path):
  """
  Returns a list of (name, start_idx) sorted by start_idx, where 'name' is the activity
  and 'start_idx' is the window for which the activity begins.
  """
  markers = []
  # read path as csv
  with open(csv_path, "r") as f:
    reader = csv.reader(f)

    for row in reader:
      if not row or len(row) < 2:
        continue

      # extract activity (or subject if row 0)
      name = str(row[0]).strip()

      # skip if row 0
      if "SUBJECT" in name.upper():
        continue

      # extract start window
      try:
        start_idx = int(str(row[1]).strip()) # this would only work if only integer was in row[1]
      except ValueError:
        # in case in format "NAME, 1234" in row[1]
        parts = name.split(",")
        if len(parts) == 2:
          name, start_idx = parts[0].strip(), int(parts[1].strip())
        else:
          continue # ignore otherwise

      # normalize name
      if name.startswith("#"):
        name = name[1:].strip() # remove #

      markers.append((name, start_idx))

  # keep and store only rows that have numeric start indices
  markers = [(n, i) for (n, i) in markers if isinstance(i, int)]
  markers.sort(key=lambda x: x[1])
  return markers

# assign windows either train, validation, or embargo
def build_activity_masks_from_markers(markers, num_segments, val_frac=0.2, embargo_segments=4):
  """
  Returns: train_mask (num_segments,) (boolean), val_mask (num_segments,) (boolean), per_seg_activity (num_segments,) (activity)
  """
  # set up arrays that will hold train/val/activity information per window
  train_mask = np.ones(num_segments, dtype=bool)
  val_mask = np.zeros(num_segments, dtype=bool)
  per_seg_activity = np.array(["UNKNOWN"] * num_segments, dtype=object)

  # determine start and end window for all activities
  for j, (name, start_k) in enumerate(markers):
    # the end window is the next marker's start window (assuming a window is available)
    end_k = markers[j+1][1] if (j + 1) < len(markers) else num_segments
    start_k = max(0, min(start_k, num_segments))
    end_k = max(0, min(end_k, num_segments))
    # in case of malfunction
    if end_k <= start_k:
      continue

    # set up index range for corresponding activity windows
    idx = np.arange(start_k, end_k)

    # fill corresponding windows with activity name
    per_seg_activity[idx] = name

    # extract validation (final 20%) of windows
    n = len(idx)
    n_val = max(1, int(np.floor(val_frac * n)))
    val_idx = idx[-n_val:]

    # determine embargo windows; final 4 (default) windows of training data
    emb_start = max(start_k, end_k - n_val - embargo_segments)
    emb_idx = np.arange(emb_start, end_k - n_val) if (end_k - n_val) > emb_start else np.array([], dtype=int)

    # apply masks
    val_mask[val_idx] = True
    train_mask[val_idx] = False
    train_mask[emb_idx] = False

  return train_mask, val_mask, per_seg_activity

### Create LOSO Fold Builder

In [None]:
import os
import numpy as np
import torch
from torch.utils.data import TensorDataset, DataLoader

# build one LOSO fold by: splitting train & test subjects, computing normalization stats
# from the training subjects only (and applying them to all subjects), assigning train/val
# masks for each activity per subject, and then splitting into tensors and loaders
def build_loso_fold_with_activity(
    all_subjects_64hz,
    activity_dir, # location SX_activity.csv file
    test_sid,
    H_segments=30, # prediction 60s ahead
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256,
    num_workers=2,
    pin_memory=True):

  # order subject_ids and filter out test subject
  subject_ids = sorted(all_subjects_64hz.keys())
  train_sids = [s for s in subject_ids if s != test_sid]

  # normalization stats for train subjects
  mean, std = compute_norm_stats(train_sids, all_subjects_64hz)

  # reformat test subject to (N, 6) and (M,)
  Xc_test, lab_test = assemble_features_subject(all_subjects_64hz[test_sid])
  Xc_test = apply_norm(Xc_test, mean, std)
  # reformat to (no. windows, 512, 6), (no. windows,), (no. compatible windows,)
  Xw_test, y_now_test, y_fut_test = make_windows_for_subject(Xc_test, lab_test, H_segments)

  # number of windows (segments) to be embargoed for 8 second embargo
  embargo_segments = int(np.ceil(embargo_seconds / SHIFT_S))

  # train and val data following 20% split
  X_train_list, y_now_train_list, y_fut_train_list = [], [], []
  X_val_list, y_now_val_list, y_fut_val_list = [], [], []

  for sid in train_sids:
    Xc, lab = assemble_features_subject(all_subjects_64hz[sid]) # to (N, 6) and (M,)
    Xc = apply_norm(Xc, mean, std)
    Xw, y_now, y_fut = make_windows_for_subject(Xc, lab, H_segments) # to (no. windows, 512, 6)
    num_seg = len(y_now) # total windows

    # extract and assign activity markers
    act_csv = os.path.join(activity_dir, f"S{sid}_activity.csv")
    markers = load_activity_markers(act_csv)
    tr_mask, va_mask, _ = build_activity_masks_from_markers(
        markers, num_segments=num_seg, val_frac=val_frac, embargo_segments=embargo_segments
    )

    # add train and validation segments to respective array
    X_train_list.append(Xw[tr_mask]); y_now_train_list.append(y_now[tr_mask]); y_fut_train_list.append(y_fut[tr_mask])
    X_val_list.append(Xw[va_mask]); y_now_val_list.append(y_now[va_mask]); y_fut_val_list.append(y_fut[va_mask])

  # concatenate all subjects' train and validation segments along rows
  def _catX(lst): return np.concatenate(lst, axis=0).astype(np.float32) if len(lst) else np.empty((0, T, len(FEATURE_ORDER)), np.float32)
  def _caty(lst): return np.concatenate(lst, axis=0).astype(np.float32) if len(lst) else np.empty((0,), np.float32)

  X_train = _catX(X_train_list); y_now_train = _caty(y_now_train_list); y_fut_train = _caty(y_fut_train_list)
  X_val = _catX(X_val_list); y_now_val = _caty(y_now_val_list); y_fut_val = _caty(y_fut_val_list)

  # tensors
  train_ds = TensorDataset(torch.from_numpy(X_train), torch.from_numpy(y_now_train), torch.from_numpy(y_fut_train))
  val_ds = TensorDataset(torch.from_numpy(X_val), torch.from_numpy(y_now_val), torch.from_numpy(y_fut_val))
  test_ds = TensorDataset(torch.from_numpy(Xw_test), torch.from_numpy(y_now_test), torch.from_numpy(y_fut_test))

  # loaders
  train_loader = DataLoader(train_ds, batch_size=batch_train, shuffle=True, drop_last=True, num_workers=num_workers, pin_memory=pin_memory)
  val_loader = DataLoader(val_ds, batch_size=batch_eval, shuffle=False, drop_last=False, num_workers=num_workers, pin_memory=pin_memory)
  test_loader = DataLoader(test_ds, batch_size=batch_eval, shuffle=False, drop_last=False, num_workers=num_workers, pin_memory=pin_memory)

  print(f"[Fold S{test_sid}] Train {len(train_ds)} | Val {len(val_ds)} | Test {len(test_ds)} | H={H_segments} seg")

  return {
      "mean": mean, "std": std, "feature_order": FEATURE_ORDER,
      "train_loader": train_loader, "val_loader": val_loader, "test_loader": test_loader,
      "train": tuple(t for t in train_ds.tensors),
      "val": tuple(t for t in val_ds.tensors),
      "test": tuple(t for t in test_ds.tensors)
  }

## Building the Model

### GRU (shared encoder) --> Head A (calibration) | Head B (forecasting) --> `y_now` | `y_fut`

The shared encoder is a GRU because each window is a sequence (512 time steps per window). Patterns across those 512 samples are what determine both `y_now` and `y_fut`.

- calibration (`y_now`): a single sample covers only 1/64 of a second, far too little to determine HR, so the model must integrate over the whole window.
- forecasting (`y_fut`): the model needs to identify the trend within the 8-second window to predict future HR.

The shared encoder is responsible for packing the most useful information into its final hidden state so that both heads can derive their predictions from it.
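PyTorch's `nn.GRU` returns `h_n` with shape `(num_layers * num_directions, B, H)`, ordered layer by layer. The NumPy sketch below uses a stand-in array (not a real GRU) to show how the top layer's state can be selected and flattened per window. All names and sizes here are illustrative.

```python
import numpy as np

num_layers, num_dir, B, H = 3, 2, 4, 5

# stand-in for h_n: each slice is filled with its layer index so the selection is visible
h_n = np.stack([np.full((B, H), layer, dtype=np.float32)
                for layer in range(num_layers) for _ in range(num_dir)])
print(h_n.shape)   # (num_layers * num_dir, B, H)

# negative indexing selects the last layer regardless of num_layers
h_top = h_n[-num_dir:]                               # (num_dir, B, H)
h_last = h_top.transpose(1, 0, 2).reshape(B, -1)     # (B, H * num_dir)
print(h_last.shape)
print(np.unique(h_last))   # only the top layer's index remains
```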

In [None]:
import torch
import torch.nn as nn

class MultiTaskGRU(nn.Module):
  """
  Shared GRU encoder + two regression heads (calibration & forecasting)

  Input: x of shape (B, T=512, F=6)
  Output: y_now (B,), y_fut (B,)
  """
  def __init__(
      self,
      input_dim: int = 6,
      hidden_size: int = 96,
      num_layers: int = 2,
      dropout: float = 0.2,
      bidirectional: bool = False,
      head_hidden: int = 64,
      head_dropout: float = 0.1):

    super().__init__()

    self.input_dim = input_dim
    self.hidden_size = hidden_size
    self.num_layers = num_layers
    self.bidirectional = bidirectional
    self.dir_mult = 2 if bidirectional else 1
    enc_out_dim = hidden_size * self.dir_mult

    # shared encoder
    self.gru = nn.GRU(
        input_size=input_dim,
        hidden_size=hidden_size,
        num_layers=num_layers,
        dropout=dropout if num_layers > 1 else 0.0,
        batch_first=True,
        bidirectional=bidirectional
    )

    # head constructor
    def make_head():
      if head_hidden and head_hidden > 0:
        return nn.Sequential(
            nn.Linear(enc_out_dim, head_hidden),
            nn.ReLU(inplace=True),
            nn.Dropout(head_dropout),
            nn.Linear(head_hidden, 1)
        )
      else:
        return nn.Linear(enc_out_dim, 1)

    # creation of calibration and forecasting heads
    self.head_now = make_head()
    self.head_fut = make_head()

    # initialize parameters to 0 to prevent early instability
    nn.init.zeros_(self.head_now[-1].weight if isinstance(self.head_now, nn.Sequential) else self.head_now.weight)
    nn.init.zeros_(self.head_now[-1].bias if isinstance(self.head_now, nn.Sequential) else self.head_now.bias)
    nn.init.zeros_(self.head_fut[-1].weight if isinstance(self.head_fut, nn.Sequential) else self.head_fut.weight)
    nn.init.zeros_(self.head_fut[-1].bias if isinstance(self.head_fut, nn.Sequential) else self.head_fut.bias)

  def forward(self, x, return_hidden: bool = False):
    """
    x: (B, T, F)
    Return: y_now, y_fut, (+ h_last if return_hidden)
    """
    # GRU (B, T, F) --> (B, H*dir)

    # GRU outputs: out (B, T, H*dir), h_n (num_layers*dir, B, H)
    out, h_n = self.gru(x)

    # h_last: final hidden state of the top GRU layer for every window
    # (negative index selects the last layer for any num_layers)
    h_last = h_n[-self.dir_mult:, :, :] # (dir, B, H)
    h_last = h_last.transpose(0, 1).reshape(x.size(0), -1) # (B, H*dir)

    # Heads (B, H*dir) --> (B,) and (B,)

    y_now = self.head_now(h_last).squeeze(-1)
    y_fut = self.head_fut(h_last).squeeze(-1)

    if return_hidden:
      return y_now, y_fut, h_last
    return y_now, y_fut

## Loss, Optimizer, and Training Loop

Techniques used:
- weight decay: pulls weights toward 0 to reduce overfitting
- gradient clipping: caps the gradient norm to prevent exploding gradients
- cosine scheduler: after warmup, the LR follows a cosine decay across epochs
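The first two techniques can be illustrated with a simplified NumPy sketch. It is not the PyTorch implementation: `clip_by_global_norm` mirrors the idea behind norm-based clipping, and `decoupled_weight_decay` shows only the decay term of an AdamW-style update. Both helper names are hypothetical.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
  """Scale all gradients by one factor so their joint L2 norm is at most max_norm."""
  total = np.sqrt(sum(float((g ** 2).sum()) for g in grads))
  scale = min(1.0, max_norm / (total + 1e-6))
  return [g * scale for g in grads], total

def decoupled_weight_decay(w, lr, weight_decay):
  """Shrink the weights directly, independent of the gradient."""
  return w * (1.0 - lr * weight_decay)

grads = [np.array([3.0, 0.0]), np.array([0.0, 4.0])]   # joint norm is 5
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = np.sqrt(sum(float((g ** 2).sum()) for g in clipped))
print(norm_before, norm_after)

w = np.array([1.0, -2.0])
w_decayed = decoupled_weight_decay(w, lr=1e-3, weight_decay=1e-4)
print(w_decayed)
```

Clipping keeps the gradient direction and only rescales its length, which is why it limits the size of an update without changing where it points.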

### TrainConfig

In [None]:
import math
import os
from dataclasses import dataclass
import torch
import torch.nn as nn
from torch.cuda.amp import autocast, GradScaler

# hold all important info
@dataclass
class TrainConfig:
  lr: float = 1e-3
  weight_decay: float = 1e-4 # strength of the 'pull' of weights towards 0
  lambda_fut: float = 1.0
  max_epochs: int = 40
  grad_clip: float = 1.0 # max gradient norm; caps gradients to prevent explosion
  use_amp: bool = True # mixed precision (FP16 and FP32 operations) for faster training on gpu
  early_stopping: bool = False
  patience: int = 6
  scheduler: str = "cosine" # lr gradually decreases per epoch
  warmup_epochs: int = 2
  step_size: int = 15 # lr decreases by x0.1 per 15 epochs
  gamma: float = 0.1
  ckpt_dir: str = "/content/drive/MyDrive/PPG_DaLia/checkpoints"
  device: str = "cuda" if torch.cuda.is_available() else "cpu"

### MAE Loss (per head)

$$L = L_{\text{now}} + \lambda \cdot L_{\text{fut}}$$

In [None]:
mae = nn.L1Loss()

def multitask_loss(y_now_pred, y_fut_pred, y_now_true, y_fut_true, lambda_fut=1.0):
  l_now = mae(y_now_pred, y_now_true)
  l_fut = mae(y_fut_pred, y_fut_true)
  return l_now + lambda_fut * l_fut, l_now.item(), l_fut.item()

### Scheduler

Cosine, Step, or None
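To give a feel for the shape of a warmup-plus-cosine schedule, here is a closed-form, per-epoch sketch. It is a simplification under stated assumptions (epoch-level linear warmup, cosine decay over the remaining epochs) and does not reproduce the exact values produced by combining `LambdaLR` and `CosineAnnealingLR`. The function `lr_multiplier` is hypothetical.

```python
import math

def lr_multiplier(epoch, warmup_epochs, max_epochs):
  """LR as a fraction of the base LR for a 0-indexed epoch: linear warmup, then cosine decay."""
  if epoch < warmup_epochs:
    return (epoch + 1) / warmup_epochs
  progress = (epoch - warmup_epochs) / max(1, max_epochs - warmup_epochs)
  return 0.5 * (1.0 + math.cos(math.pi * progress))

base_lr = 1e-3
for epoch in [0, 1, 2, 21, 39]:
  print(epoch, base_lr * lr_multiplier(epoch, warmup_epochs=2, max_epochs=40))
```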

In [None]:
def build_scheduler(optimizer, cfg: TrainConfig, steps_per_epoch: int):
  sched_main = None

  # cosine
  if cfg.scheduler == "cosine":
    sched_main = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=cfg.max_epochs
    )

  # step
  elif cfg.scheduler == "step":
    sched_main = torch.optim.lr_scheduler.StepLR(
        optimizer, step_size=cfg.step_size, gamma=cfg.gamma
    )

  # none
  elif cfg.scheduler == "none":
    sched_main = None

  else:
    raise ValueError("scheduler must be one of: 'cosine', 'step', 'none'")

  # lr during warmup
  if cfg.warmup_epochs and cfg.warmup_epochs > 0:
    # how many potential updates under 'warmup'
    total_warm = cfg.warmup_epochs * steps_per_epoch

    # LR calculation
    def lr_lambda(step):
      if step >= total_warm:
        return 1.0 # 100% of base LR
      return max(1e-8, (step + 1) / float(total_warm)) # proportion of base LR given current step

    sched_warm = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=lr_lambda)

    return sched_warm, sched_main
  else:
    return None, sched_main

### Train Loop

In [None]:
def train_one_epoch(model, loader, optimizer, scaler, cfg: TrainConfig):
  model.train()
  total, sum_loss, sum_now, sum_fut = 0, 0.0, 0.0, 0.0

  for xb, y_now, y_fut in loader:

    # device agnostic; non_blocking speeds up CPU -> GPU transfer
    xb = xb.to(cfg.device, non_blocking=True)
    y_now = y_now.to(cfg.device, non_blocking=True)
    y_fut = y_fut.to(cfg.device, non_blocking=True)

    # zero grad
    optimizer.zero_grad(set_to_none=True)

    # FP16 and FP32 operations
    if cfg.use_amp:
      with autocast():
        # forward
        y_now_pred, y_fut_pred = model(xb)
        # loss
        loss, l_now, l_fut = multitask_loss(y_now_pred, y_fut_pred, y_now, y_fut, cfg.lambda_fut)

      # scale gradients up through scaled loss due to FP16 having a worse limit on small numbers
      # backprop
      scaler.scale(loss).backward()

      # bring gradients back down to true scale before clipping (now on FP32)
      if cfg.grad_clip is not None:
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), cfg.grad_clip)

      # gradient descent
      scaler.step(optimizer)

      # adjust scaling factor, assuming gradients are still finite (not NaN or Inf)
      scaler.update()

    # FP32 operations
    else:
      # forward
      y_now_pred, y_fut_pred = model(xb)
      # loss
      loss, l_now, l_fut = multitask_loss(y_now_pred, y_fut_pred, y_now, y_fut, cfg.lambda_fut)
      # backprop
      loss.backward()

      # gradient clipping
      if cfg.grad_clip is not None:
        torch.nn.utils.clip_grad_norm_(model.parameters(), cfg.grad_clip)

      # gradient descent
      optimizer.step()


    bs = xb.size(0) # number of windows (batch size)
    total += bs
    sum_loss += loss.item() * bs # as loss takes average, we multiply by no. windows
    sum_now += l_now * bs
    sum_fut += l_fut * bs

  return {
      "loss": sum_loss / max(1, total),
      "mae_now": sum_now / max(1, total),
      "mae_fut": sum_fut / max(1, total)
  }

### Evaluation Loop

In [None]:
@torch.no_grad() # evaluation needs no gradients; saves memory and time
def eval_one_epoch(model, loader, cfg: TrainConfig):
  model.eval()
  total, sum_loss, sum_now, sum_fut = 0, 0.0, 0.0, 0.0

  for xb, y_now, y_fut in loader:

    # device agnostic
    xb = xb.to(cfg.device, non_blocking=True)
    y_now = y_now.to(cfg.device, non_blocking=True)
    y_fut = y_fut.to(cfg.device, non_blocking=True)

    # forward
    y_now_pred, y_fut_pred = model(xb)
    # loss
    loss, l_now, l_fut = multitask_loss(y_now_pred, y_fut_pred, y_now, y_fut, cfg.lambda_fut)

    bs = xb.size(0) # number of windows (batch size)
    total += bs
    sum_loss += loss.item() * bs # as loss takes average, we multiply by no. windows
    sum_now += l_now * bs
    sum_fut += l_fut * bs

  return {
      "loss": sum_loss / max(1, total),
      "mae_now": sum_now / max(1, total),
      "mae_fut": sum_fut / max(1, total)
  }

### LOSO Fold

Each of the 15 subjects serves as the test subject once, and `fit_fold()` is run for every fold.

In [None]:
def fit_fold(model, train_loader, val_loader, cfg: TrainConfig, fold_tag="Sx", iterator=0):

  # create checkpoint folder
  os.makedirs(cfg.ckpt_dir, exist_ok=True)

  # model and optimizer
  model = model.to(cfg.device)
  optimizer = torch.optim.AdamW(model.parameters(), lr=cfg.lr, weight_decay=cfg.weight_decay)

  scaler = GradScaler(enabled=cfg.use_amp)

  # scheduler
  steps_per_epoch = max(1, len(train_loader))
  warm_sched, main_sched = build_scheduler(optimizer, cfg, steps_per_epoch)
  global_step = 0

  best_val = float("inf")
  best_path = os.path.join(cfg.ckpt_dir, f"best_{fold_tag}_{iterator}.pt")

  patience_left = cfg.patience

  for epoch in range(1, cfg.max_epochs + 1):

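    # warmup scheduler: only stepped during the warmup epochs; stepping it later
    # would reset the LR to the base value and cancel the epoch-level decay
    if warm_sched is not None and epoch <= cfg.warmup_epochs:
      for _ in range(steps_per_epoch):
        warm_sched.step()
        global_step += 1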

    # train and val
    tr = train_one_epoch(model, train_loader, optimizer, scaler, cfg)
    va = eval_one_epoch(model, val_loader, cfg)

    # epoch-level scheduler
    if main_sched is not None:
      main_sched.step()

    # val_MAE
    val_metric = va["mae_fut"]
    improved = val_metric < best_val - 1e-6
    # save and reset patience if improvement
    if improved:
      best_val = val_metric
      torch.save({"model": model.state_dict(), "cfg": cfg.__dict__}, best_path)
      patience_left = cfg.patience

    # performance log
    print(f"Epoch {epoch:03d} | {'** best **' if improved else ''} "
          f"train loss {tr['loss']:.4f} (now {tr['mae_now']:.3f}, fut {tr['mae_fut']:.3f}) | "
          f"val loss {va['loss']:.4f} (now {va['mae_now']:.3f}, fut {va['mae_fut']:.3f}) | ")

    # early stopping
    if cfg.early_stopping and (epoch >= cfg.warmup_epochs) and (not improved):
      patience_left -= 1
      if patience_left <= 0:
        print("Early stopping triggered")
        break

  # load best weights for downstream test
  if os.path.exists(best_path):
    state = torch.load(best_path, map_location=cfg.device)
    model.load_state_dict(state["model"])
  else:
    print("Warning: best checkpoint not found; using last epoch weights")

  return model, best_val

### Running Folds End-to-End

- produce the train/val/test dataloaders for a particular test subject using `build_loso_fold_with_activity()`
- instantiate `MultiTaskGRU`
- train with `fit_fold()`

### (PRACTICE) Running One LOSO Fold [TEST: SUBJECT 1]

- The forecast horizon is 60 seconds (`H_segments=30` at a 2 s shift), a good balance between feasibility and usefulness.

In [None]:
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"

# build dataloaders
fold = build_loso_fold_with_activity(
    all_subjects_64hz=all_subjects_64hz,
    activity_dir=ACTIVITY_DIR,
    test_sid=1,
    H_segments=30,
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256
)

train_loader = fold["train_loader"]
val_loader = fold["val_loader"]
test_loader = fold["test_loader"]

# instantiate model
model = MultiTaskGRU(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

# cfg
cfg = TrainConfig(
    lr=1e-3,
    weight_decay=1e-4,
    lambda_fut=1.0,
    max_epochs=40,
    grad_clip=1.0,
    use_amp=True,
    early_stopping=False,
    scheduler="cosine",
    warmup_epochs=2,
    ckpt_dir="/content/drive/MyDrive/PPG_DaLia/checkpoints"
)

# train
model, best_val = fit_fold(model, train_loader, val_loader, cfg, fold_tag="S1")
print(f"Best validation forecast MAE (S1 fold): {best_val:.4f}")

[Fold S1] Train 47188 | Val 11874 | Test 4573 | H=30 seg




Epoch 001 | ** best ** train loss 88.7476 (now 43.935, fut 44.813) | val loss 42.0691 (now 21.559, fut 20.510) | 
Epoch 002 | ** best ** train loss 49.1944 (now 28.575, fut 20.619) | val loss 33.3769 (now 17.212, fut 16.165) | 
Epoch 003 | ** best ** train loss 34.4689 (now 18.038, fut 16.431) | val loss 31.7877 (now 16.551, fut 15.236) | 
Epoch 004 | ** best ** train loss 32.1864 (now 16.700, fut 15.487) | val loss 30.1439 (now 15.752, fut 14.391) | 
Epoch 005 | ** best ** train loss 31.1174 (now 16.179, fut 14.938) | val loss 27.9135 (now 14.604, fut 13.309) | 
Epoch 006 | ** best ** train loss 29.5468 (now 15.302, fut 14.245) | val loss 25.9939 (now 13.518, fut 12.476) | 
Epoch 007 |  train loss 27.1259 (now 13.851, fut 13.275) | val loss 26.8550 (now 13.819, fut 13.036) | 
Epoch 008 | ** best ** train loss 23.3826 (now 11.560, fut 11.822) | val loss 24.3191 (now 12.366, fut 11.953) | 
Epoch 009 | ** best ** train loss 22.0379 (now 10.885, fut 11.153) | val loss 23.7879 (now 12.294,

Best combined val_MAE (now + fut), trained on S2-15: 19.7259

- now val_MAE: 10.143
- fut val_MAE: 9.583
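
The logged numbers are consistent with the loss being the sum of the two MAE terms, weighted by `lambda_fut` (1.0 in this config). The sketch below is an inference from the printed logs rather than the notebook's actual loss code; `combined_loss` is a hypothetical helper.

```python
def combined_loss(mae_now: float, mae_fut: float, lambda_fut: float = 1.0) -> float:
    """Assumed form of the multi-task loss: calibration MAE plus weighted forecast MAE."""
    return mae_now + lambda_fut * mae_fut


# Values taken from the summary above and from the first logged epoch.
best_total = combined_loss(10.143, 9.583)
epoch1_train = combined_loss(43.935, 44.813)
```

Here `best_total` lands on the reported 19.7259 up to rounding, and `epoch1_train` matches the logged train loss of 88.7476 in the same way.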

## Run 15 LOSOs w/ 100 epochs

notes:
- the first test subjects show overfitting beyond the mid-to-late epochs (roughly 50-70).
- possible responses: increase dropout, increase weight decay...
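
Another option already exposed by the config is `early_stopping` with a patience budget. The sketch below replays a list of per-epoch validation metrics through patience logic modelled on the one in `fit_fold` (same `1e-6` improvement threshold). It is a simplified stand-alone illustration: `early_stop_epoch` is a hypothetical helper, and 1-based epoch numbering is an assumption.

```python
def early_stop_epoch(val_metrics, patience, min_delta=1e-6, warmup_epochs=0):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch validation metrics.

    Epochs are numbered from 1 (an assumption). Patience is only consumed
    once `warmup_epochs` has been reached, and resets on every improvement.
    """
    best = float("inf")
    best_epoch = None
    patience_left = patience
    for epoch, metric in enumerate(val_metrics, start=1):
        improved = metric < best - min_delta
        if improved:
            best, best_epoch = metric, epoch
            patience_left = patience
        elif epoch >= warmup_epochs:
            patience_left -= 1
            if patience_left <= 0:
                return best_epoch, epoch
    return best_epoch, len(val_metrics)
```

Replaying the logged validation MAE of a finished 100-epoch run through this function shows how many epochs a given patience value would have saved.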

### [TEST: SUBJECT 1]

Updates:
- epochs: 100
- warmup_epochs: 5

In [None]:
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"

# build dataloaders
fold = build_loso_fold_with_activity(
    all_subjects_64hz=all_subjects_64hz,
    activity_dir=ACTIVITY_DIR,
    test_sid=1,
    H_segments=30,
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256
)

train_loader = fold["train_loader"]
val_loader = fold["val_loader"]
test_loader = fold["test_loader"]

# instantiate model
model = MultiTaskGRU(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

# cfg
cfg = TrainConfig(
    lr=1e-3,
    weight_decay=1e-4,
    lambda_fut=1.0,
    max_epochs=100,
    grad_clip=1.0,
    use_amp=True,
    early_stopping=False,
    scheduler="cosine",
    warmup_epochs=5,
    ckpt_dir="/content/drive/MyDrive/PPG_DaLia/checkpoints"
)

# train
model, best_val = fit_fold(model, train_loader, val_loader, cfg, fold_tag="S1", iterator=1)
print(f"Best validation forecast MAE (S1 fold): {best_val:.4f}")

[Fold S1] Train 47188 | Val 11874 | Test 4573 | H=30 seg




Epoch 001 | ** best ** train loss 150.6533 (now 75.387, fut 75.267) | val loss 99.9169 (now 50.842, fut 49.075) | 
Epoch 002 | ** best ** train loss 37.9861 (now 18.865, fut 19.121) | val loss 34.0173 (now 17.149, fut 16.869) | 
Epoch 003 | ** best ** train loss 32.1656 (now 15.800, fut 16.366) | val loss 29.7021 (now 15.078, fut 14.624) | 
Epoch 004 | ** best ** train loss 26.2636 (now 12.968, fut 13.295) | val loss 25.8470 (now 13.108, fut 12.739) | 
Epoch 005 | ** best ** train loss 22.4473 (now 11.028, fut 11.419) | val loss 22.8579 (now 11.523, fut 11.335) | 
Epoch 006 |  train loss 20.8052 (now 10.190, fut 10.615) | val loss 23.2427 (now 11.901, fut 11.342) | 
Epoch 007 | ** best ** train loss 19.8035 (now 9.646, fut 10.158) | val loss 21.8258 (now 11.143, fut 10.683) | 
Epoch 008 | ** best ** train loss 18.8489 (now 9.162, fut 9.687) | val loss 21.5510 (now 10.942, fut 10.609) | 
Epoch 009 | ** best ** train loss 18.1896 (now 8.835, fut 9.355) | val loss 20.5453 (now 10.287, fut

Subject 1

Best combined val_MAE (now + fut), trained on S2-15: 17.1195 (100 epochs; reached at epoch 51)

- now val_MAE: 8.145
- fut val_MAE: 8.9748

notes:
- overfitting beyond epoch 51 (train loss keeps decreasing while val loss stagnates)
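
To make the overfitting pattern easier to inspect than by reading the log, the printed epoch lines can be parsed back into numbers and the train/val gap tracked per epoch. This is a minimal sketch that assumes the exact log format printed by `fit_fold`; `parse_epoch_line` is a hypothetical helper, and truncated lines simply return `None`.

```python
import re

LOG_RE = re.compile(
    r"Epoch (\d+) \|.*?train loss ([\d.]+) \(now ([\d.]+), fut ([\d.]+)\) \| "
    r"val loss ([\d.]+) \(now ([\d.]+), fut ([\d.]+)\)"
)
KEYS = ["train_loss", "train_now", "train_fut", "val_loss", "val_now", "val_fut"]


def parse_epoch_line(line: str):
    """Parse one printed epoch line into a dict, or return None if it does not match."""
    m = LOG_RE.search(line)
    if m is None:
        return None
    rec = dict(zip(KEYS, (float(g) for g in m.groups()[1:])))
    rec["epoch"] = int(m.group(1))
    rec["gap"] = rec["val_loss"] - rec["train_loss"]  # a growing gap suggests overfitting
    return rec
```

A gap that keeps widening while `val_loss` stays flat is the pattern described in the notes above.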

### [TEST: SUBJECT 2]

- epochs: 100
- warmup epochs: 5

In [None]:
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"

# build dataloaders
fold = build_loso_fold_with_activity(
    all_subjects_64hz=all_subjects_64hz,
    activity_dir=ACTIVITY_DIR,
    test_sid=2, # test: subject 2
    H_segments=30,
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256
)

train_loader = fold["train_loader"]
val_loader = fold["val_loader"]
test_loader = fold["test_loader"]

# instantiate model
model = MultiTaskGRU(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

# cfg
cfg = TrainConfig(
    lr=1e-3,
    weight_decay=1e-4,
    lambda_fut=1.0,
    max_epochs=100,
    grad_clip=1.0,
    use_amp=True,
    early_stopping=False,
    scheduler="cosine",
    warmup_epochs=5,
    ckpt_dir="/content/drive/MyDrive/PPG_DaLia/checkpoints"
)

# train
model, best_val = fit_fold(model, train_loader, val_loader, cfg, fold_tag="S2", iterator=0)
print(f"Best validation forecast MAE (S2 fold): {best_val:.4f}")

[Fold S2] Train 47590 | Val 11976 | Test 4069 | H=30 seg




Epoch 001 | ** best ** train loss 149.8399 (now 75.048, fut 74.792) | val loss 101.6944 (now 51.827, fut 49.868) | 
Epoch 002 | ** best ** train loss 40.3009 (now 20.120, fut 20.181) | val loss 33.1882 (now 16.719, fut 16.469) | 
Epoch 003 | ** best ** train loss 33.2284 (now 16.098, fut 17.130) | val loss 32.2436 (now 16.336, fut 15.908) | 
Epoch 004 | ** best ** train loss 28.0981 (now 13.724, fut 14.374) | val loss 27.4649 (now 13.805, fut 13.660) | 
Epoch 005 | ** best ** train loss 25.8786 (now 12.573, fut 13.305) | val loss 25.7128 (now 13.051, fut 12.662) | 
Epoch 006 | ** best ** train loss 24.0947 (now 11.702, fut 12.393) | val loss 23.6987 (now 11.874, fut 11.825) | 
Epoch 007 |  train loss 23.5538 (now 11.369, fut 12.185) | val loss 23.8237 (now 11.964, fut 11.859) | 
Epoch 008 | ** best ** train loss 23.2883 (now 11.245, fut 12.044) | val loss 23.2498 (now 11.676, fut 11.574) | 
Epoch 009 | ** best ** train loss 21.6460 (now 10.481, fut 11.165) | val loss 22.7965 (now 11.45

Subject 2

Best combined val_MAE (now + fut), trained on S1, S3-15: 19.1966 (100 epochs; reached at epoch 68)

- now val_MAE: 9.473
- fut val_MAE: 9.7235

notes:
- overfitting beyond epoch 68 (train loss keeps decreasing while val loss stagnates)

### [TEST: SUBJECT 3]

- epochs: 100
- warmup epochs: 5

In [None]:
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"

# build dataloaders
fold = build_loso_fold_with_activity(
    all_subjects_64hz=all_subjects_64hz,
    activity_dir=ACTIVITY_DIR,
    test_sid=3, # test: subject 3
    H_segments=30,
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256
)

train_loader = fold["train_loader"]
val_loader = fold["val_loader"]
test_loader = fold["test_loader"]

# instantiate model
model = MultiTaskGRU(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

# cfg
cfg = TrainConfig(
    lr=1e-3,
    weight_decay=1e-4,
    lambda_fut=1.0,
    max_epochs=100,
    grad_clip=1.0,
    use_amp=True,
    early_stopping=False,
    scheduler="cosine",
    warmup_epochs=5,
    ckpt_dir="/content/drive/MyDrive/PPG_DaLia/checkpoints"
)

# train
model, best_val = fit_fold(model, train_loader, val_loader, cfg, fold_tag="S3", iterator=0)
print(f"Best validation forecast MAE (S3 fold): {best_val:.4f}")

[Fold S3] Train 47376 | Val 11922 | Test 4337 | H=30 seg




Epoch 001 | ** best ** train loss 148.6961 (now 74.631, fut 74.065) | val loss 98.6413 (now 50.466, fut 48.175) | 
Epoch 002 | ** best ** train loss 39.2944 (now 19.621, fut 19.674) | val loss 32.4865 (now 16.219, fut 16.268) | 
Epoch 003 | ** best ** train loss 32.8043 (now 15.511, fut 17.293) | val loss 30.9562 (now 15.448, fut 15.508) | 
Epoch 004 | ** best ** train loss 27.7383 (now 13.423, fut 14.315) | val loss 27.9904 (now 13.954, fut 14.037) | 
Epoch 005 | ** best ** train loss 26.7333 (now 12.793, fut 13.940) | val loss 26.4039 (now 13.049, fut 13.355) | 
Epoch 006 | ** best ** train loss 25.9851 (now 12.368, fut 13.617) | val loss 24.6305 (now 12.105, fut 12.526) | 
Epoch 007 |  train loss 24.1746 (now 11.544, fut 12.631) | val loss 24.9675 (now 12.220, fut 12.748) | 
Epoch 008 | ** best ** train loss 23.3610 (now 11.107, fut 12.254) | val loss 23.2147 (now 11.539, fut 11.676) | 
Epoch 009 | ** best ** train loss 22.1782 (now 10.567, fut 11.611) | val loss 22.9645 (now 11.403

Subject 3

Best combined val_MAE (now + fut), trained on S1-2, S4-15: 19.2010 (100 epochs; reached at epoch 96)

- now val_MAE: 9.294
- fut val_MAE: 9.9069

notes:
- the data for the subject 3 fold seems a little more difficult to decipher than for folds 1 and 2, given the larger number of epochs spent with train_loss in the mid-10s range.
- slightly stronger overfitting than in the S1 and S2 folds

### [TEST: SUBJECT 4]

- epochs: 100
- warmup epochs: 5

In [None]:
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"

# build dataloaders
fold = build_loso_fold_with_activity(
    all_subjects_64hz=all_subjects_64hz,
    activity_dir=ACTIVITY_DIR,
    test_sid=4, # test: subject 4
    H_segments=30,
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256
)

train_loader = fold["train_loader"]
val_loader = fold["val_loader"]
test_loader = fold["test_loader"]

# instantiate model
model = MultiTaskGRU(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

# cfg
cfg = TrainConfig(
    lr=1e-3,
    weight_decay=1e-4,
    lambda_fut=1.0,
    max_epochs=100,
    grad_clip=1.0,
    use_amp=True,
    early_stopping=False,
    scheduler="cosine",
    warmup_epochs=5,
    ckpt_dir="/content/drive/MyDrive/PPG_DaLia/checkpoints"
)

# train
model, best_val = fit_fold(model, train_loader, val_loader, cfg, fold_tag="S4", iterator=0)
print(f"Best validation forecast MAE (S4 fold): {best_val:.4f}")

[Fold S4] Train 47212 | Val 11881 | Test 4542 | H=30 seg




Epoch 001 | ** best ** train loss 148.9909 (now 74.213, fut 74.778) | val loss 99.2200 (now 49.710, fut 49.510) | 
Epoch 002 | ** best ** train loss 39.0780 (now 19.312, fut 19.766) | val loss 32.9716 (now 16.429, fut 16.542) | 
Epoch 003 | ** best ** train loss 35.7216 (now 17.538, fut 18.184) | val loss 29.1901 (now 14.673, fut 14.518) | 
Epoch 004 | ** best ** train loss 30.0897 (now 15.037, fut 15.053) | val loss 28.6126 (now 14.432, fut 14.180) | 
Epoch 005 | ** best ** train loss 28.9440 (now 14.544, fut 14.400) | val loss 26.3681 (now 13.257, fut 13.111) | 
Epoch 006 | ** best ** train loss 28.0555 (now 14.069, fut 13.987) | val loss 25.5146 (now 12.925, fut 12.590) | 
Epoch 007 | ** best ** train loss 26.4790 (now 13.269, fut 13.210) | val loss 25.5437 (now 13.159, fut 12.385) | 
Epoch 008 | ** best ** train loss 25.8299 (now 12.874, fut 12.956) | val loss 24.7512 (now 12.516, fut 12.235) | 
Epoch 009 | ** best ** train loss 24.7055 (now 12.350, fut 12.356) | val loss 23.7347 (

Subject 4

Best combined val_MAE (now + fut), trained on S1-3, S5-15: 19.3839 (100 epochs; reached at epoch 70)

- now val_MAE: 9.633
- fut val_MAE: 9.7505

notes:
- overfitting once more
- data seems more difficult to decipher than in folds 1 and 2, but easier than in fold 3.

### [TEST: SUBJECT 5]

- epochs: 100
- warmup epochs: 5

In [None]:
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"

# build dataloaders
fold = build_loso_fold_with_activity(
    all_subjects_64hz=all_subjects_64hz,
    activity_dir=ACTIVITY_DIR,
    test_sid=5, # test: subject 5
    H_segments=30,
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256
)

train_loader = fold["train_loader"]
val_loader = fold["val_loader"]
test_loader = fold["test_loader"]

# instantiate model
model = MultiTaskGRU(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

# cfg
cfg = TrainConfig(
    lr=1e-3,
    weight_decay=1e-4,
    lambda_fut=1.0,
    max_epochs=100,
    grad_clip=1.0,
    use_amp=True,
    early_stopping=False,
    scheduler="cosine",
    warmup_epochs=5,
    ckpt_dir="/content/drive/MyDrive/PPG_DaLia/checkpoints"
)

# train
model, best_val = fit_fold(model, train_loader, val_loader, cfg, fold_tag="S5", iterator=0)
print(f"Best validation forecast MAE (S5 fold): {best_val:.4f}")

[Fold S5] Train 47153 | Val 11867 | Test 4619 | H=30 seg




Epoch 001 | ** best ** train loss 143.7757 (now 71.393, fut 72.383) | val loss 94.5351 (now 46.668, fut 47.867) | 
Epoch 002 | ** best ** train loss 34.6932 (now 16.977, fut 17.716) | val loss 31.2007 (now 15.616, fut 15.585) | 
Epoch 003 | ** best ** train loss 33.5203 (now 18.061, fut 15.459) | val loss 26.2170 (now 13.217, fut 13.000) | 
Epoch 004 | ** best ** train loss 25.9132 (now 13.035, fut 12.878) | val loss 25.5372 (now 12.842, fut 12.695) | 
Epoch 005 | ** best ** train loss 23.7485 (now 11.848, fut 11.900) | val loss 23.1579 (now 11.619, fut 11.539) | 
Epoch 006 | ** best ** train loss 21.6733 (now 10.728, fut 10.946) | val loss 22.4039 (now 11.239, fut 11.165) | 
Epoch 007 |  train loss 21.0493 (now 10.379, fut 10.670) | val loss 25.6169 (now 13.223, fut 12.394) | 
Epoch 008 |  train loss 21.6648 (now 10.833, fut 10.832) | val loss 23.8985 (now 12.180, fut 11.718) | 
Epoch 009 | ** best ** train loss 20.7026 (now 10.297, fut 10.406) | val loss 22.0668 (now 11.431, fut 10.6

Subject 5

Best combined val_MAE (now + fut), trained on S1-4, S6-15: 15.9548 (100 epochs; reached at epoch 31)

- now val_MAE: 7.364
- fut val_MAE: 8.5903

notes:
- this dataset (all subjects except S5) seems to be the most decipherable by the model so far, given the low train and val loss relative to the previous four folds (val loss roughly 2-3 points lower). This potentially indicates that subject 5's data is particularly difficult (differing physiology, motion artefacts, activity effort?).
- again, overfitting beyond epoch 31

### [TEST: SUBJECT 6] - note: S6 has only about 1.5 hours of recorded data (vs. roughly 2.5 hours for the other subjects)

- epochs: 100
- warmup epochs: 5
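
The shorter S6 recording can be sanity-checked against the test-window counts printed for each fold. The sketch below assumes 8-second windows with a 2-second shift, as in the window notes, and ignores the windows dropped at the end for lack of a future label, so it slightly underestimates the true duration. `approx_hours` is a hypothetical helper.

```python
def approx_hours(n_windows: int, shift_s: float = 2.0, window_s: float = 8.0) -> float:
    """Approximate recording length in hours covered by n overlapping windows."""
    if n_windows <= 0:
        return 0.0
    return ((n_windows - 1) * shift_s + window_s) / 3600.0


s6_hours = approx_hours(2592)  # test windows reported for the S6 fold
s1_hours = approx_hours(4573)  # test windows reported for the S1 fold
```

This gives about 1.44 hours for S6 and about 2.54 hours for S1, in line with the note in the heading.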

In [None]:
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"

# build dataloaders
fold = build_loso_fold_with_activity(
    all_subjects_64hz=all_subjects_64hz,
    activity_dir=ACTIVITY_DIR,
    test_sid=6, # test: subject 6
    H_segments=30,
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256
)

train_loader = fold["train_loader"]
val_loader = fold["val_loader"]
test_loader = fold["test_loader"]

# instantiate model
model = MultiTaskGRU(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

# cfg
cfg = TrainConfig(
    lr=1e-3,
    weight_decay=1e-4,
    lambda_fut=1.0,
    max_epochs=100,
    grad_clip=1.0,
    use_amp=True,
    early_stopping=False,
    scheduler="cosine",
    warmup_epochs=5,
    ckpt_dir="/content/drive/MyDrive/PPG_DaLia/checkpoints"
)

# train
model, best_val = fit_fold(model, train_loader, val_loader, cfg, fold_tag="S6", iterator=0)
print(f"Best validation forecast MAE (S6 fold): {best_val:.4f}")

[Fold S6] Train 48762 | Val 12269 | Test 2592 | H=30 seg




Epoch 001 | ** best ** train loss 146.0683 (now 73.131, fut 72.938) | val loss 95.3617 (now 49.047, fut 46.315) | 
Epoch 002 | ** best ** train loss 37.5699 (now 18.820, fut 18.750) | val loss 43.3735 (now 21.669, fut 21.705) | 
Epoch 003 | ** best ** train loss 30.9593 (now 15.203, fut 15.756) | val loss 28.9218 (now 14.539, fut 14.383) | 
Epoch 004 | ** best ** train loss 26.2245 (now 12.859, fut 13.366) | val loss 25.1385 (now 12.679, fut 12.460) | 
Epoch 005 | ** best ** train loss 22.6684 (now 11.120, fut 11.549) | val loss 22.8228 (now 11.580, fut 11.243) | 
Epoch 006 | ** best ** train loss 21.5624 (now 10.521, fut 11.041) | val loss 21.0827 (now 10.570, fut 10.513) | 
Epoch 007 | ** best ** train loss 20.3672 (now 9.930, fut 10.437) | val loss 20.9132 (now 10.486, fut 10.427) | 
Epoch 008 | ** best ** train loss 20.6663 (now 10.067, fut 10.600) | val loss 20.4775 (now 10.334, fut 10.144) | 
Epoch 009 | ** best ** train loss 21.3337 (now 10.374, fut 10.960) | val loss 19.8737 (n

Subject 6

Best combined val_MAE (now + fut), trained on S1-5, S7-15: 17.3926 (100 epochs; reached at epoch 44)

- now val_MAE: 8.552
- fut val_MAE: 8.8401

notes:
- overfitting occurring again (beyond circa epoch 65)

### [TEST: SUBJECT 7]

- epochs: 100
- warmup epochs: 5

In [None]:
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"

# build dataloaders
fold = build_loso_fold_with_activity(
    all_subjects_64hz=all_subjects_64hz,
    activity_dir=ACTIVITY_DIR,
    test_sid=7, # test: subject 7
    H_segments=30,
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256
)

train_loader = fold["train_loader"]
val_loader = fold["val_loader"]
test_loader = fold["test_loader"]

# instantiate model
model = MultiTaskGRU(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

# cfg
cfg = TrainConfig(
    lr=1e-3,
    weight_decay=1e-4,
    lambda_fut=1.0,
    max_epochs=100,
    grad_clip=1.0,
    use_amp=True,
    early_stopping=False,
    scheduler="cosine",
    warmup_epochs=5,
    ckpt_dir="/content/drive/MyDrive/PPG_DaLia/checkpoints"
)

# train
model, best_val = fit_fold(model, train_loader, val_loader, cfg, fold_tag="S7", iterator=0)
print(f"Best validation forecast MAE (S7 fold): {best_val:.4f}")

[Fold S7] Train 47136 | Val 11861 | Test 4638 | H=30 seg




Epoch 001 | ** best ** train loss 150.1076 (now 74.593, fut 75.515) | val loss 102.1986 (now 50.896, fut 51.302) | 
Epoch 002 | ** best ** train loss 41.6237 (now 20.527, fut 21.096) | val loss 46.4067 (now 23.365, fut 23.042) | 
Epoch 003 | ** best ** train loss 36.8147 (now 18.994, fut 17.820) | val loss 31.8956 (now 16.010, fut 15.886) | 
Epoch 004 | ** best ** train loss 30.6596 (now 15.478, fut 15.182) | val loss 28.4163 (now 14.315, fut 14.101) | 
Epoch 005 | ** best ** train loss 28.5025 (now 14.324, fut 14.178) | val loss 27.2175 (now 13.786, fut 13.431) | 
Epoch 006 |  train loss 26.9465 (now 13.580, fut 13.366) | val loss 28.6623 (now 14.470, fut 14.192) | 
Epoch 007 | ** best ** train loss 28.2265 (now 14.247, fut 13.979) | val loss 24.8476 (now 12.562, fut 12.286) | 
Epoch 008 | ** best ** train loss 26.0212 (now 13.171, fut 12.850) | val loss 24.4218 (now 12.519, fut 11.903) | 
Epoch 009 | ** best ** train loss 26.5258 (now 13.426, fut 13.100) | val loss 23.4540 (now 12.15

Subject 7

Best combined val_MAE (now + fut), trained on S1-6, S8-15: 18.6568 (100 epochs; reached at epoch 51)

- now val_MAE: 9.306
- fut val_MAE: 9.3504

notes:
- unexpected spike in loss at around epoch 58, after which the loss began decreasing again; potentially due to gradient overshooting on outlier windows?

### [TEST: SUBJECT 8]

- epochs: 100
- warmup epochs: 5

In [None]:
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"

# build dataloaders
fold = build_loso_fold_with_activity(
    all_subjects_64hz=all_subjects_64hz,
    activity_dir=ACTIVITY_DIR,
    test_sid=8, # test: subject 8
    H_segments=30,
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256
)

train_loader = fold["train_loader"]
val_loader = fold["val_loader"]
test_loader = fold["test_loader"]

# instantiate model
model = MultiTaskGRU(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

# cfg
cfg = TrainConfig(
    lr=1e-3,
    weight_decay=1e-4,
    lambda_fut=1.0,
    max_epochs=100,
    grad_clip=1.0,
    use_amp=True,
    early_stopping=False,
    scheduler="cosine",
    warmup_epochs=5,
    ckpt_dir="/content/drive/MyDrive/PPG_DaLia/checkpoints"
)

# train
model, best_val = fit_fold(model, train_loader, val_loader, cfg, fold_tag="S8", iterator=0)
print(f"Best validation forecast MAE (S8 fold): {best_val:.4f}")

[Fold S8] Train 47641 | Val 11987 | Test 4007 | H=30 seg




Epoch 001 | ** best ** train loss 149.8617 (now 75.122, fut 74.740) | val loss 98.9670 (now 50.775, fut 48.192) | 
Epoch 002 | ** best ** train loss 39.6837 (now 19.898, fut 19.786) | val loss 32.4760 (now 16.237, fut 16.239) | 
Epoch 003 |  train loss 35.6752 (now 16.113, fut 19.562) | val loss 32.3265 (now 16.060, fut 16.266) | 
Epoch 004 | ** best ** train loss 30.5491 (now 14.744, fut 15.805) | val loss 28.7630 (now 14.200, fut 14.563) | 
Epoch 005 | ** best ** train loss 26.6215 (now 12.727, fut 13.895) | val loss 24.5898 (now 12.278, fut 12.311) | 
Epoch 006 | ** best ** train loss 23.5019 (now 11.335, fut 12.167) | val loss 23.3568 (now 11.502, fut 11.855) | 
Epoch 007 | ** best ** train loss 22.1649 (now 10.629, fut 11.536) | val loss 23.3406 (now 11.497, fut 11.844) | 
Epoch 008 | ** best ** train loss 21.0653 (now 10.125, fut 10.940) | val loss 20.6984 (now 10.198, fut 10.500) | 
Epoch 009 |  train loss 19.9689 (now 9.526, fut 10.443) | val loss 21.0938 (now 10.261, fut 10.83

Subject 8

Best combined val_MAE (now + fut), trained on S1-7, S9-15: 15.5903 (100 epochs; reached at epoch 42)

- now val_MAE: 6.896
- fut val_MAE: 8.6939

notes:
- the dataset without subject 8 is well decipherable and gives the best result so far, similar to the subject 5 fold.
- overfitting

### [TEST: SUBJECT 9]

- epochs: 100
- warmup epochs: 5

In [None]:
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"

# build dataloaders
fold = build_loso_fold_with_activity(
    all_subjects_64hz=all_subjects_64hz,
    activity_dir=ACTIVITY_DIR,
    test_sid=9, # test: subject 9
    H_segments=30,
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256
)

train_loader = fold["train_loader"]
val_loader = fold["val_loader"]
test_loader = fold["test_loader"]

# instantiate model
model = MultiTaskGRU(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

# cfg
cfg = TrainConfig(
    lr=1e-3,
    weight_decay=1e-4,
    lambda_fut=1.0,
    max_epochs=100,
    grad_clip=1.0,
    use_amp=True,
    early_stopping=False,
    scheduler="cosine",
    warmup_epochs=5,
    ckpt_dir="/content/drive/MyDrive/PPG_DaLia/checkpoints"
)

# train
model, best_val = fit_fold(model, train_loader, val_loader, cfg, fold_tag="S9", iterator=0)
print(f"Best validation forecast MAE (S9 fold): {best_val:.4f}")

[Fold S9] Train 47449 | Val 11939 | Test 4247 | H=30 seg




Epoch 001 | ** best ** train loss 149.3989 (now 74.341, fut 75.058) | val loss 99.7991 (now 49.976, fut 49.823) | 
Epoch 002 | ** best ** train loss 40.0179 (now 19.835, fut 20.183) | val loss 36.8562 (now 18.535, fut 18.321) | 
Epoch 003 | ** best ** train loss 31.8574 (now 15.890, fut 15.967) | val loss 31.6492 (now 15.846, fut 15.803) | 
Epoch 004 | ** best ** train loss 28.5001 (now 14.051, fut 14.449) | val loss 31.2060 (now 15.724, fut 15.482) | 
Epoch 005 | ** best ** train loss 29.7726 (now 14.738, fut 15.035) | val loss 30.1728 (now 15.154, fut 15.019) | 
Epoch 006 | ** best ** train loss 26.8577 (now 13.270, fut 13.588) | val loss 25.7356 (now 13.064, fut 12.672) | 
Epoch 007 | ** best ** train loss 24.5772 (now 12.181, fut 12.396) | val loss 24.2538 (now 12.196, fut 12.058) | 
Epoch 008 | ** best ** train loss 24.2524 (now 12.027, fut 12.226) | val loss 23.8139 (now 12.060, fut 11.754) | 
Epoch 009 | ** best ** train loss 22.9945 (now 11.410, fut 11.584) | val loss 22.3662 (

Subject 9

Best combined val_MAE (now + fut), trained on S1-8, S10-15: 15.5022 (100 epochs; reached at epoch 43)

- now val_MAE: 7.017
- fut val_MAE: 8.4853

notes:
- overfitting

### [TEST: SUBJECT 10]

- epochs: 100
- warmup epochs: 5

In [None]:
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"

# build dataloaders
fold = build_loso_fold_with_activity(
    all_subjects_64hz=all_subjects_64hz,
    activity_dir=ACTIVITY_DIR,
    test_sid=10, # test: subject 10
    H_segments=30,
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256
)

train_loader = fold["train_loader"]
val_loader = fold["val_loader"]
test_loader = fold["test_loader"]

# instantiate model
model = MultiTaskGRU(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

# cfg
cfg = TrainConfig(
    lr=1e-3,
    weight_decay=1e-4,
    lambda_fut=1.0,
    max_epochs=100,
    grad_clip=1.0,
    use_amp=True,
    early_stopping=False,
    scheduler="cosine",
    warmup_epochs=5,
    ckpt_dir="/content/drive/MyDrive/PPG_DaLia/checkpoints"
)

# train
model, best_val = fit_fold(model, train_loader, val_loader, cfg, fold_tag="S10", iterator=0)
print(f"Best validation forecast MAE (S10 fold): {best_val:.4f}")

[Fold S10] Train 46612 | Val 11732 | Test 5291 | H=30 seg




Epoch 001 | ** best ** train loss 148.7581 (now 74.676, fut 74.082) | val loss 99.6580 (now 51.681, fut 47.977) | 
Epoch 002 | ** best ** train loss 38.6867 (now 19.490, fut 19.197) | val loss 33.6226 (now 16.854, fut 16.768) | 
Epoch 003 |  train loss 33.7758 (now 15.118, fut 18.658) | val loss 38.2374 (now 15.880, fut 22.357) | 
Epoch 004 | ** best ** train loss 33.5676 (now 15.568, fut 18.000) | val loss 31.2262 (now 14.842, fut 16.385) | 
Epoch 005 | ** best ** train loss 30.4904 (now 14.225, fut 16.265) | val loss 30.7558 (now 14.895, fut 15.860) | 
Epoch 006 | ** best ** train loss 29.6940 (now 13.791, fut 15.903) | val loss 27.5895 (now 12.790, fut 14.799) | 
Epoch 007 | ** best ** train loss 27.4307 (now 12.697, fut 14.733) | val loss 25.8243 (now 12.122, fut 13.702) | 
Epoch 008 | ** best ** train loss 27.2823 (now 12.492, fut 14.791) | val loss 24.9932 (now 11.490, fut 13.504) | 
Epoch 009 | ** best ** train loss 27.2090 (now 12.403, fut 14.806) | val loss 23.6349 (now 10.788

Subject 10 - worst performer

Best combined val_MAE (now + fut), trained on S1-9, S11-15: 20.6867 (100 epochs; reached at epoch 39)

- now val_MAE: 9.333
- fut val_MAE: 11.3532

notes:
- worst performer; the only fold whose combined val_MAE exceeds 20
- the dataset without subject 10 is particularly difficult
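
With ten folds finished, the per-fold results can be ranked in code rather than by eye. The sketch below copies the best values reported in the fold notes above for S1 to S10 (each matches the sum of that fold's now and fut val_MAE); `summarize` is a hypothetical helper.

```python
# Best combined val_MAE (now + fut) per held-out subject, copied from the fold notes.
fold_val = {
    "S1": 17.1195, "S2": 19.1966, "S3": 19.2010, "S4": 19.3839, "S5": 15.9548,
    "S6": 17.3926, "S7": 18.6568, "S8": 15.5903, "S9": 15.5022, "S10": 20.6867,
}


def summarize(fold_scores):
    """Return the best fold, the worst fold, and the mean score across folds."""
    ranked = sorted(fold_scores.items(), key=lambda kv: kv[1])
    mean = sum(fold_scores.values()) / len(fold_scores)
    return {"best": ranked[0], "worst": ranked[-1], "mean": mean}


summary = summarize(fold_val)
print(summary)
```

On these values the lowest validation score belongs to the S9 fold and the highest to the S10 fold. These are validation metrics on the training subjects, so they say how hard each training pool is to fit, not how well the held-out subject is predicted.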

### [TEST: SUBJECT 11]

- epochs: 100
- warmup epochs: 5

In [None]:
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"

# build dataloaders
fold = build_loso_fold_with_activity(
    all_subjects_64hz=all_subjects_64hz,
    activity_dir=ACTIVITY_DIR,
    test_sid=11, # test: subject 11
    H_segments=30,
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256
)

train_loader = fold["train_loader"]
val_loader = fold["val_loader"]
test_loader = fold["test_loader"]

# instantiate model
model = MultiTaskGRU(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

# cfg
cfg = TrainConfig(
    lr=1e-3,
    weight_decay=1e-4,
    lambda_fut=1.0,
    max_epochs=100,
    grad_clip=1.0,
    use_amp=True,
    early_stopping=False,
    scheduler="cosine",
    warmup_epochs=5,
    ckpt_dir="/content/drive/MyDrive/PPG_DaLia/checkpoints"
)

# train
model, best_val = fit_fold(model, train_loader, val_loader, cfg, fold_tag="S11", iterator=0)
print(f"Best validation forecast MAE (S11 fold): {best_val:.4f}")

[Fold S11] Train 47257 | Val 11891 | Test 4491 | H=30 seg




Epoch 001 | ** best ** train loss 145.9332 (now 72.439, fut 73.494) | val loss 94.4938 (now 46.982, fut 47.512) | 
Epoch 002 | ** best ** train loss 38.1745 (now 18.865, fut 19.310) | val loss 32.2486 (now 16.340, fut 15.909) | 
Epoch 003 | ** best ** train loss 30.3349 (now 15.646, fut 14.689) | val loss 30.5356 (now 16.390, fut 14.146) | 
Epoch 004 | ** best ** train loss 26.0386 (now 13.065, fut 12.974) | val loss 25.7547 (now 13.133, fut 12.621) | 
Epoch 005 | ** best ** train loss 24.5580 (now 12.368, fut 12.190) | val loss 25.2481 (now 12.935, fut 12.313) | 
Epoch 006 | ** best ** train loss 21.3857 (now 10.571, fut 10.815) | val loss 22.6260 (now 11.381, fut 11.245) | 
Epoch 007 | ** best ** train loss 19.1336 (now 9.314, fut 9.820) | val loss 21.2969 (now 10.696, fut 10.601) | 
Epoch 008 | ** best ** train loss 18.8576 (now 9.193, fut 9.665) | val loss 21.0023 (now 10.507, fut 10.495) | 
Epoch 009 | ** best ** train loss 18.2206 (now 8.849, fut 9.371) | val loss 20.8369 (now 10

Subject 11

Best combined val_MAE (now + fut), trained on S1-10, S12-15: 18.4446 (100 epochs; reached at epoch 93)

- now val_MAE: 9.150
- fut val_MAE: 9.2949

notes:
- overfitting

### [TEST: SUBJECT 12] - REDACTED DUE TO HUMAN ERROR

- epochs: 100
- warmup epochs: 5

In [None]:
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"

# build dataloaders
fold = build_loso_fold_with_activity(
    all_subjects_64hz=all_subjects_64hz,
    activity_dir=ACTIVITY_DIR,
    test_sid=12, # test: subject 12
    H_segments=30,
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256
)

train_loader = fold["train_loader"]
val_loader = fold["val_loader"]
test_loader = fold["test_loader"]

# instantiate model
model = MultiTaskGRU(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

# cfg
cfg = TrainConfig(
    lr=1e-3,
    weight_decay=1e-4,
    lambda_fut=1.0,
    max_epochs=100,
    grad_clip=1.0,
    use_amp=True,
    early_stopping=False,
    scheduler="cosine",
    warmup_epochs=5,
    ckpt_dir="/content/drive/MyDrive/PPG_DaLia/checkpoints"
)

# train
model, best_val = fit_fold(model, train_loader, val_loader, cfg, fold_tag="S12", iterator=0)
print(f"Best validation forecast MAE (S12 fold): {best_val:.4f}")

[Fold S12] Train 47706 | Val 12005 | Test 3924 | H=30 seg




Epoch 001 | ** best ** train loss 151.1867 (now 76.038, fut 75.149) | val loss 101.4652 (now 53.094, fut 48.371) | 
Epoch 002 | ** best ** train loss 41.1511 (now 20.842, fut 20.309) | val loss 63.3922 (now 30.649, fut 32.744) | 
Epoch 003 | ** best ** train loss 34.2925 (now 15.701, fut 18.591) | val loss 30.3859 (now 15.298, fut 15.087) | 
Epoch 004 | ** best ** train loss 27.9781 (now 13.372, fut 14.607) | val loss 28.5744 (now 14.250, fut 14.324) | 
Epoch 005 | ** best ** train loss 24.5729 (now 11.704, fut 12.869) | val loss 25.1586 (now 12.639, fut 12.520) | 
Epoch 006 | ** best ** train loss 20.7012 (now 10.118, fut 10.584) | val loss 23.5192 (now 11.784, fut 11.735) | 
Epoch 007 | ** best ** train loss 19.7370 (now 9.624, fut 10.113) | val loss 22.0691 (now 11.026, fut 11.043) | 
Epoch 008 | ** best ** train loss 19.3201 (now 9.382, fut 9.938) | val loss 21.8305 (now 10.955, fut 10.876) | 
Epoch 009 | ** best ** train loss 18.6459 (now 9.032, fut 9.613) | val loss 20.7463 (now 

Subject 12

Best val_MAE for S1-11, S13-15: 17.8257 (100 EPOCHS; reached at epoch 48)

- now val_MAE: 8.769
- fut val_MAE: 9.0563

notes:
- stagnation around high 11/low 12 mark for train_loss

### [TEST: SUBJECT 12] - human error: forgot to update, so this is now the training loop for S12

- epochs: 100
- warmup epochs: 5

In [None]:
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"

# build dataloaders
fold = build_loso_fold_with_activity(
    all_subjects_64hz=all_subjects_64hz,
    activity_dir=ACTIVITY_DIR,
    test_sid=12, # test: subject 12
    H_segments=30,
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256
)

train_loader = fold["train_loader"]
val_loader = fold["val_loader"]
test_loader = fold["test_loader"]

# instantiate model
model = MultiTaskGRU(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

# cfg
cfg = TrainConfig(
    lr=1e-3,
    weight_decay=1e-4,
    lambda_fut=1.0,
    max_epochs=100,
    grad_clip=1.0,
    use_amp=True,
    early_stopping=False,
    scheduler="cosine",
    warmup_epochs=5,
    ckpt_dir="/content/drive/MyDrive/PPG_DaLia/checkpoints"
)

# train
model, best_val = fit_fold(model, train_loader, val_loader, cfg, fold_tag="S12", iterator=0)
print(f"Best validation forecast MAE (S12 fold): {best_val:.4f}")

[Fold S12] Train 47706 | Val 12005 | Test 3924 | H=30 seg




Epoch 001 | ** best ** train loss 149.9472 (now 74.693, fut 75.255) | val loss 97.9553 (now 49.211, fut 48.744) | 
Epoch 002 | ** best ** train loss 40.4445 (now 20.038, fut 20.407) | val loss 54.1719 (now 27.177, fut 26.995) | 
Epoch 003 | ** best ** train loss 33.9909 (now 16.883, fut 17.108) | val loss 29.0196 (now 14.531, fut 14.489) | 
Epoch 004 | ** best ** train loss 28.7043 (now 14.069, fut 14.635) | val loss 27.4501 (now 13.832, fut 13.618) | 
Epoch 005 | ** best ** train loss 28.3835 (now 13.965, fut 14.418) | val loss 25.7551 (now 12.852, fut 12.903) | 
Epoch 006 | ** best ** train loss 28.1610 (now 13.875, fut 14.285) | val loss 24.6664 (now 12.430, fut 12.237) | 
Epoch 007 |  train loss 25.9201 (now 12.775, fut 13.146) | val loss 24.9954 (now 12.657, fut 12.339) | 
Epoch 008 | ** best ** train loss 25.3565 (now 12.475, fut 12.882) | val loss 23.2026 (now 11.604, fut 11.598) | 
Epoch 009 |  train loss 24.2619 (now 11.943, fut 12.319) | val loss 23.6332 (now 11.864, fut 11.7

Subject 12

Best val_MAE for S1-11, S13-15: 18.6178 (100 EPOCHS; reached at epoch 74)

- now val_MAE: 9.229
- fut val_MAE: 9.3889

notes:
- overfitting beyond circa epoch 55

### [TEST: SUBJECT 13]

- epochs: 100
- warmup epochs: 5

In [None]:
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"

# build dataloaders
fold = build_loso_fold_with_activity(
    all_subjects_64hz=all_subjects_64hz,
    activity_dir=ACTIVITY_DIR,
    test_sid=13, # test: subject 13
    H_segments=30,
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256
)

train_loader = fold["train_loader"]
val_loader = fold["val_loader"]
test_loader = fold["test_loader"]

# instantiate model
model = MultiTaskGRU(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

# cfg
cfg = TrainConfig(
    lr=1e-3,
    weight_decay=1e-4,
    lambda_fut=1.0,
    max_epochs=100,
    grad_clip=1.0,
    use_amp=True,
    early_stopping=False,
    scheduler="cosine",
    warmup_epochs=5,
    ckpt_dir="/content/drive/MyDrive/PPG_DaLia/checkpoints"
)

# train
model, best_val = fit_fold(model, train_loader, val_loader, cfg, fold_tag="S13", iterator=0)
print(f"Best validation forecast MAE (S13 fold): {best_val:.4f}")

[Fold S13] Train 47217 | Val 11883 | Test 4535 | H=30 seg




Epoch 001 | ** best ** train loss 148.5371 (now 73.675, fut 74.862) | val loss 98.1190 (now 48.127, fut 49.992) | 
Epoch 002 | ** best ** train loss 39.1535 (now 19.142, fut 20.011) | val loss 37.5101 (now 18.878, fut 18.633) | 
Epoch 003 |  train loss 34.6419 (now 18.068, fut 16.574) | val loss 44.9472 (now 22.835, fut 22.112) | 
Epoch 004 | ** best ** train loss 33.5700 (now 16.891, fut 16.679) | val loss 30.6985 (now 15.773, fut 14.925) | 
Epoch 005 | ** best ** train loss 30.0206 (now 15.262, fut 14.759) | val loss 28.8703 (now 14.959, fut 13.911) | 
Epoch 006 | ** best ** train loss 26.9987 (now 13.666, fut 13.333) | val loss 24.0821 (now 12.395, fut 11.687) | 
Epoch 007 | ** best ** train loss 22.9295 (now 11.355, fut 11.575) | val loss 22.1600 (now 11.320, fut 10.840) | 
Epoch 008 | ** best ** train loss 22.7381 (now 11.355, fut 11.383) | val loss 21.7409 (now 11.079, fut 10.662) | 
Epoch 009 | ** best ** train loss 21.4833 (now 10.617, fut 10.866) | val loss 22.1513 (now 11.532

Subject 13 - highest performer

Best val_MAE for S1-12, S14-15: 14.7858 (100 EPOCHS; reached at epoch 36)

- now val_MAE: 6.607
- fut val_MAE: 8.1784

notes:
- best-performing fold on validation. Subject 13 could be particularly difficult to decipher: with it held out, validation only covers the remaining subjects. Very high fitness (6), so maybe HR was less reactive to activity than in other subjects
- overfitting

### [TEST: SUBJECT 14]

- epochs: 100
- warmup epochs: 5

In [None]:
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"

# build dataloaders
fold = build_loso_fold_with_activity(
    all_subjects_64hz=all_subjects_64hz,
    activity_dir=ACTIVITY_DIR,
    test_sid=14, # test: subject 14
    H_segments=30,
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256
)

train_loader = fold["train_loader"]
val_loader = fold["val_loader"]
test_loader = fold["test_loader"]

# instantiate model
model = MultiTaskGRU(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

# cfg
cfg = TrainConfig(
    lr=1e-3,
    weight_decay=1e-4,
    lambda_fut=1.0,
    max_epochs=100,
    grad_clip=1.0,
    use_amp=True,
    early_stopping=False,
    scheduler="cosine",
    warmup_epochs=5,
    ckpt_dir="/content/drive/MyDrive/PPG_DaLia/checkpoints"
)

# train
model, best_val = fit_fold(model, train_loader, val_loader, cfg, fold_tag="S14", iterator=0)
print(f"Best validation forecast MAE (S14 fold): {best_val:.4f}")

[Fold S14] Train 47288 | Val 11901 | Test 4446 | H=30 seg




Epoch 001 | ** best ** train loss 148.1065 (now 74.105, fut 74.001) | val loss 95.9489 (now 49.192, fut 46.757) | 
Epoch 002 | ** best ** train loss 40.8637 (now 20.458, fut 20.406) | val loss 32.5921 (now 16.430, fut 16.163) | 
Epoch 003 | ** best ** train loss 28.2389 (now 13.932, fut 14.307) | val loss 29.2577 (now 14.629, fut 14.629) | 
Epoch 004 | ** best ** train loss 24.5372 (now 12.042, fut 12.495) | val loss 25.4902 (now 12.771, fut 12.719) | 
Epoch 005 | ** best ** train loss 22.1321 (now 10.856, fut 11.276) | val loss 23.1301 (now 11.656, fut 11.474) | 
Epoch 006 | ** best ** train loss 20.6189 (now 10.131, fut 10.488) | val loss 21.6911 (now 10.949, fut 10.742) | 
Epoch 007 | ** best ** train loss 19.4747 (now 9.519, fut 9.956) | val loss 21.2511 (now 10.676, fut 10.575) | 
Epoch 008 | ** best ** train loss 18.4576 (now 8.982, fut 9.475) | val loss 20.1680 (now 10.152, fut 10.016) | 
Epoch 009 |  train loss 17.5156 (now 8.474, fut 9.042) | val loss 20.0787 (now 10.006, fut 

Subject 14

Best val_MAE for S1-13, S15: 16.3928 (100 EPOCHS; reached at epoch 38)

- now val_MAE: 7.755
- fut val_MAE: 8.6375

notes:
- overfitting

### [TEST: SUBJECT 15]

- epochs: 100
- warmup epochs: 5

In [None]:
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"

# build dataloaders
fold = build_loso_fold_with_activity(
    all_subjects_64hz=all_subjects_64hz,
    activity_dir=ACTIVITY_DIR,
    test_sid=15, # test: subject 15
    H_segments=30,
    val_frac=0.2,
    embargo_seconds=8,
    batch_train=128,
    batch_eval=256
)

train_loader = fold["train_loader"]
val_loader = fold["val_loader"]
test_loader = fold["test_loader"]

# instantiate model
model = MultiTaskGRU(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

# cfg
cfg = TrainConfig(
    lr=1e-3,
    weight_decay=1e-4,
    lambda_fut=1.0,
    max_epochs=100,
    grad_clip=1.0,
    use_amp=True,
    early_stopping=False,
    scheduler="cosine",
    warmup_epochs=5,
    ckpt_dir="/content/drive/MyDrive/PPG_DaLia/checkpoints"
)

# train
model, best_val = fit_fold(model, train_loader, val_loader, cfg, fold_tag="S15", iterator=0)
print(f"Best validation forecast MAE (S15 fold): {best_val:.4f}")

[Fold S15] Train 47697 | Val 12002 | Test 3936 | H=30 seg




Epoch 001 | ** best ** train loss 148.9155 (now 74.280, fut 74.635) | val loss 96.9368 (now 48.984, fut 47.953) | 
Epoch 002 | ** best ** train loss 37.9582 (now 18.845, fut 19.113) | val loss 38.1755 (now 19.160, fut 19.016) | 
Epoch 003 | ** best ** train loss 33.1620 (now 17.234, fut 15.928) | val loss 30.4458 (now 15.535, fut 14.911) | 
Epoch 004 |  train loss 33.6599 (now 17.145, fut 16.515) | val loss 33.7538 (now 17.517, fut 16.237) | 
Epoch 005 | ** best ** train loss 32.9916 (now 16.806, fut 16.186) | val loss 29.2647 (now 15.126, fut 14.139) | 
Epoch 006 | ** best ** train loss 29.4194 (now 14.944, fut 14.476) | val loss 27.9390 (now 14.351, fut 13.588) | 
Epoch 007 | ** best ** train loss 28.1121 (now 14.296, fut 13.816) | val loss 27.5364 (now 14.510, fut 13.026) | 
Epoch 008 | ** best ** train loss 25.5145 (now 12.796, fut 12.719) | val loss 26.6445 (now 13.664, fut 12.981) | 
Epoch 009 |  train loss 26.7577 (now 13.543, fut 13.215) | val loss 26.8039 (now 13.775, fut 13.0

Subject 15

Best val_MAE for S1-14: 19.8031 (100 EPOCHS; reached at epoch 59)

- now val_MAE: 10.027
- fut val_MAE: 9.7761

notes:
- overfitting

---------------------------------------------------
## VALIDATION SET RESULTS (bpm)

### Average val_MAE: 17.7819
### Average val now_MAE: 8.5187
### Average val fut_MAE: 9.2630
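
The three averages are related. The logged totals are consistent with the combined loss being the `now` MAE plus `lambda_fut` times the `fut` MAE (`lambda_fut=1.0` in `TrainConfig`). A minimal sketch of that arithmetic follows; `combined_mae` is a hypothetical helper, and the exact formula inside `fit_fold` is an assumption inferred from the logs rather than read from its source:

```python
def combined_mae(mae_now, mae_fut, lambda_fut=1.0):
    """Combined multi-task MAE: calibration term plus weighted forecasting term."""
    return mae_now + lambda_fut * mae_fut

# Per-fold values reported above for the S12 fold (bpm)
s12_total = combined_mae(9.229, 9.3889)

# Averages over all 15 validation folds (bpm)
avg_total = combined_mae(8.5187, 9.2630)

print(f"S12 combined val MAE: {s12_total:.4f}")
print(f"Average combined val MAE: {avg_total:.4f}")
```

Because the combined figure is a sum of two errors, it is the per-head values (`now_MAE`, `fut_MAE`) that are comparable to MAEs reported for wearables, not the total.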

Thoughts:
- There is natural variation in HR between individuals, whether or not they are performing physical activity.
- Literature suggests that HR can differ between individuals by 20-30 bpm at rest, 5-15 bpm during submaximal activity, and 10-20 bpm during high-intensity activity.
- Therefore, validation MAEs of 8.5 bpm (now) and 9.3 bpm (future) are strong, as much of that error is plausibly due to intrinsic differences between individuals.

- It would be interesting to compute MAE per activity to understand where the majority of the error is coming from.

### Calibration

Comparison to Consumer Wearables (vs ECG):
- Reported MAE ~ 7-10 bpm
- MAE at rest: ~ 3-5 bpm
- MAE during exercise: ~ 10-15 bpm

Comparison to Research-grade Wearables (vs ECG):
- Reported MAE ~ 3-6 bpm

-------------------------------------------

*consumer wearables include but are not limited to: Fitbit, Garmin, Apple Watch.

*research-grade wearables include but are not limited to: Polar, chest straps, clinical PPG.

## Evaluation

- The state_dict from the best-performing epoch of each training loop will now be run on its respective test dataset, which is an entirely new subject whose data was not seen during training.
- The results across all test subjects will then be averaged.

In reality, if we were to deploy this, we would train a new model with the same architecture and configuration, but on all of the subject data. The LOSO (leave-one-subject-out) method is simply a way to measure the model's performance on unseen data (15 times), and we would expect that performance to carry over to the deployed model.
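
The leave-one-subject-out idea can be sketched in a few lines. `loso_splits` below is a hypothetical helper for illustration only; it is not the notebook's `build_loso_fold_with_activity`, which also takes `val_frac` and `embargo_seconds` and returns data loaders:

```python
def loso_splits(subject_ids):
    """Yield (train_ids, test_id) pairs, holding out one subject per fold."""
    for test_id in subject_ids:
        train_ids = [sid for sid in subject_ids if sid != test_id]
        yield train_ids, test_id

subjects = list(range(1, 16))  # S1..S15
loso_folds = list(loso_splits(subjects))

# Inspect the fold that holds out subject 12
train_ids, test_id = loso_folds[11]
print(f"{len(loso_folds)} folds; test subject {test_id}; train subjects {train_ids}")
```

Each subject appears as the test subject exactly once, so every reported test metric comes from a person the corresponding model never trained on.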

### Function for loading saved state_dict into model for evaluation

In [None]:
import os
import torch

def load_model_from_ckpt(ckpt_path, model_cls, model_kwargs=None, device=None):
  """
  ckpt_path: location of checkpoint
  model_cls: class object
  model_kwargs: dict of infrastructure for model's GRU and heads

  loads saved state_dict into model for evaluation
  """
  # asserting infrastructure and device
  model_kwargs = model_kwargs or {}
  device = device or ("cuda" if torch.cuda.is_available() else "cpu")

  # fresh model with same architecture used during training
  model = model_cls(**model_kwargs).to(device) # ** converts the dictionary into keyword args

  # load checkpoint
  state = torch.load(ckpt_path, map_location=device)
  model.load_state_dict(state["model"], strict=True)
  model.eval()

  return model

### Path and Model Infrastructure

In [None]:
# paths
ACTIVITY_DIR = "/content/drive/MyDrive/PPG_DaLia/activities"
CKPT_DIR = "/content/drive/MyDrive/PPG_DaLia/checkpoints"

# training-time infrastructure
model_kwargs = dict(
    input_dim=6,
    hidden_size=96,
    num_layers=2,
    dropout=0.2,
    bidirectional=False,
    head_hidden=64,
    head_dropout=0.1
)

### Folds and Models Dictionaries Setup

In [None]:
# device agnostic
device = "cuda" if torch.cuda.is_available() else "cpu"

folds = {}
models = {}

# for every evaluation (test subject)
for sid in range(1, 16):
  # rebuild LOSO fold to retrieve test loader
  fold = build_loso_fold_with_activity(
      all_subjects_64hz=all_subjects_64hz,
      activity_dir=ACTIVITY_DIR,
      test_sid=sid,
      H_segments=30,
      val_frac=0.2,
      embargo_seconds=8,
      batch_train=128,
      batch_eval=256
  )

  # store fold for each test
  folds[sid] = fold

  # retrieve specific checkpoint
  ckpt_path = os.path.join(CKPT_DIR, f"best_S{sid}_0.pt")
  if not os.path.exists(ckpt_path):
    print(f"[WARN] Missing checkpoint for S{sid}: {ckpt_path}")
    continue

  # instantiate model with correct parameters
  model = load_model_from_ckpt(
      ckpt_path=ckpt_path,
      model_cls=MultiTaskGRU,
      model_kwargs=model_kwargs,
      device=device
  )

  models[sid] = model

[Fold S1] Train 47188 | Val 11874 | Test 4573 | H=30 seg
[Fold S2] Train 47590 | Val 11976 | Test 4069 | H=30 seg
[Fold S3] Train 47376 | Val 11922 | Test 4337 | H=30 seg
[Fold S4] Train 47212 | Val 11881 | Test 4542 | H=30 seg
[Fold S5] Train 47153 | Val 11867 | Test 4619 | H=30 seg
[Fold S6] Train 48762 | Val 12269 | Test 2592 | H=30 seg
[Fold S7] Train 47136 | Val 11861 | Test 4638 | H=30 seg
[Fold S8] Train 47641 | Val 11987 | Test 4007 | H=30 seg
[Fold S9] Train 47449 | Val 11939 | Test 4247 | H=30 seg
[Fold S10] Train 46612 | Val 11732 | Test 5291 | H=30 seg
[Fold S11] Train 47257 | Val 11891 | Test 4491 | H=30 seg
[Fold S12] Train 47706 | Val 12005 | Test 3924 | H=30 seg
[Fold S13] Train 47217 | Val 11883 | Test 4535 | H=30 seg
[Fold S14] Train 47288 | Val 11901 | Test 4446 | H=30 seg
[Fold S15] Train 47697 | Val 12002 | Test 3936 | H=30 seg


### Evaluation Function

In [None]:
def evaluate_test(model, test_loader, device=None):
  """
  model: trained model returning (y_now_pred, y_fut_pred)
  test_loader: DataLoader yielding (xb, y_now, y_fut) batches
  device: optional device override

  returns per-sample MAE (bpm) for both heads and their sum
  """
  device = device or ("cuda" if torch.cuda.is_available() else "cpu")
  model = model.to(device).eval()

  total, sum_now, sum_fut = 0, 0.0, 0.0

  # no gradients are needed for evaluation
  with torch.no_grad():
    for xb, y_now, y_fut in test_loader:
      # to device
      xb = xb.to(device, non_blocking=True)
      y_now = y_now.to(device, non_blocking=True)
      y_fut = y_fut.to(device, non_blocking=True)

      # forward prop
      y_now_pred, y_fut_pred = model(xb)

      # accumulate summed absolute errors so the final MAE is per-sample
      sum_now += torch.nn.functional.l1_loss(y_now_pred, y_now, reduction="sum").item()
      sum_fut += torch.nn.functional.l1_loss(y_fut_pred, y_fut, reduction="sum").item()
      total += xb.size(0)

  mae_now = sum_now / total
  mae_fut = sum_fut / total
  return {"mae_loss": mae_now + mae_fut, "mae_now": mae_now, "mae_fut": mae_fut}
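
The function accumulates summed absolute errors and divides by the sample count once at the end, rather than averaging per-batch means. The difference matters when the last batch is smaller than the others. A small pure-Python illustration with made-up error values (not taken from the dataset):

```python
# Hypothetical absolute errors (bpm) split into two uneven batches
batches = [[2.0, 4.0, 6.0, 8.0], [20.0]]

# Approach used in evaluate_test: sum all errors, divide by the total count
total_err = sum(sum(b) for b in batches)
total_n = sum(len(b) for b in batches)
mae_exact = total_err / total_n

# Naive alternative: average the per-batch means (over-weights the small batch)
mae_naive = sum(sum(b) / len(b) for b in batches) / len(batches)

print(mae_exact)  # 8.0
print(mae_naive)  # 12.5
```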

### Evaluation: testing on S1

In [None]:
test_metrics_S1 = evaluate_test(models[1], folds[1]["test_loader"], device)

print(f"[TEST S1] \nmae_loss: {test_metrics_S1['mae_loss']:.4f}\nmae_now: {test_metrics_S1['mae_now']:.4f}\nmae_fut: {test_metrics_S1['mae_fut']:.4f}")

[TEST S1] 
mae_loss: 24.0986
mae_now: 11.3743
mae_fut: 12.7242


### Evaluation: testing on S2

In [None]:
test_metrics_S2 = evaluate_test(models[2], folds[2]["test_loader"], device)

print(f"[TEST S2] \nmae_loss: {test_metrics_S2['mae_loss']:.4f}\nmae_now: {test_metrics_S2['mae_now']:.4f}\nmae_fut: {test_metrics_S2['mae_fut']:.4f}")

[TEST S2] 
mae_loss: 31.0810
mae_now: 15.4181
mae_fut: 15.6629


### Evaluation: testing on S3

In [None]:
test_metrics_S3 = evaluate_test(models[3], folds[3]["test_loader"], device)

print(f"[TEST S3] \nmae_loss: {test_metrics_S3['mae_loss']:.4f}\nmae_now: {test_metrics_S3['mae_now']:.4f}\nmae_fut: {test_metrics_S3['mae_fut']:.4f}")

[TEST S3] 
mae_loss: 23.8951
mae_now: 12.0433
mae_fut: 11.8518


### Evaluation: testing on S4

In [None]:
test_metrics_S4 = evaluate_test(models[4], folds[4]["test_loader"], device)

print(f"[TEST S4] \nmae_loss: {test_metrics_S4['mae_loss']:.4f}\nmae_now: {test_metrics_S4['mae_now']:.4f}\nmae_fut: {test_metrics_S4['mae_fut']:.4f}")

[TEST S4] 
mae_loss: 23.7217
mae_now: 11.4469
mae_fut: 12.2749


### Evaluation: testing on S5 - worst performing

A mae_loss of 59.0583 suggests that something about S5's data differs markedly from the other subjects: possibly something unusual physiologically, or the subject did not carry out the activities properly.

In [None]:
test_metrics_S5 = evaluate_test(models[5], folds[5]["test_loader"], device)

print(f"[TEST S5] \nmae_loss: {test_metrics_S5['mae_loss']:.4f}\nmae_now: {test_metrics_S5['mae_now']:.4f}\nmae_fut: {test_metrics_S5['mae_fut']:.4f}")

[TEST S5] 
mae_loss: 59.0583
mae_now: 28.3316
mae_fut: 30.7267


### Evaluation: testing on S6

- similarly poor performance to S5; S6 reports a fitness level of '1'

In [None]:
test_metrics_S6 = evaluate_test(models[6], folds[6]["test_loader"], device)

print(f"[TEST S6] \nmae_loss: {test_metrics_S6['mae_loss']:.4f}\nmae_now: {test_metrics_S6['mae_now']:.4f}\nmae_fut: {test_metrics_S6['mae_fut']:.4f}")

[TEST S6] 
mae_loss: 57.0260
mae_now: 28.2792
mae_fut: 28.7468


### Evaluation: testing on S7

In [None]:
test_metrics_S7 = evaluate_test(models[7], folds[7]["test_loader"], device)

print(f"[TEST S7] \nmae_loss: {test_metrics_S7['mae_loss']:.4f}\nmae_now: {test_metrics_S7['mae_now']:.4f}\nmae_fut: {test_metrics_S7['mae_fut']:.4f}")

[TEST S7] 
mae_loss: 21.3194
mae_now: 10.1404
mae_fut: 11.1790


### Evaluation: testing on S8

In [None]:
test_metrics_S8 = evaluate_test(models[8], folds[8]["test_loader"], device)

print(f"[TEST S8] \nmae_loss: {test_metrics_S8['mae_loss']:.4f}\nmae_now: {test_metrics_S8['mae_now']:.4f}\nmae_fut: {test_metrics_S8['mae_fut']:.4f}")

[TEST S8] 
mae_loss: 23.5757
mae_now: 11.0020
mae_fut: 12.5737


### Evaluation: testing on S9 - best performing

S9's physiology seems to best represent the variation amongst the 14 other subjects

In [None]:
test_metrics_S9 = evaluate_test(models[9], folds[9]["test_loader"], device)

print(f"[TEST S9] \nmae_loss: {test_metrics_S9['mae_loss']:.4f}\nmae_now: {test_metrics_S9['mae_now']:.4f}\nmae_fut: {test_metrics_S9['mae_fut']:.4f}")

[TEST S9] 
mae_loss: 20.4885
mae_now: 9.7095
mae_fut: 10.7790


### Evaluation: testing on S10

In [None]:
test_metrics_S10 = evaluate_test(models[10], folds[10]["test_loader"], device)

print(f"[TEST S10] \nmae_loss: {test_metrics_S10['mae_loss']:.4f}\nmae_now: {test_metrics_S10['mae_now']:.4f}\nmae_fut: {test_metrics_S10['mae_fut']:.4f}")

[TEST S10] 
mae_loss: 25.3724
mae_now: 12.0220
mae_fut: 13.3505


### Evaluation: testing on S11

In [None]:
test_metrics_S11 = evaluate_test(models[11], folds[11]["test_loader"], device)

print(f"[TEST S11] \nmae_loss: {test_metrics_S11['mae_loss']:.4f}\nmae_now: {test_metrics_S11['mae_now']:.4f}\nmae_fut: {test_metrics_S11['mae_fut']:.4f}")

[TEST S11] 
mae_loss: 35.8827
mae_now: 17.7575
mae_fut: 18.1252


### Evaluation: testing on S12

In [None]:
test_metrics_S12 = evaluate_test(models[12], folds[12]["test_loader"], device)

print(f"[TEST S12] \nmae_loss: {test_metrics_S12['mae_loss']:.4f}\nmae_now: {test_metrics_S12['mae_now']:.4f}\nmae_fut: {test_metrics_S12['mae_fut']:.4f}")

[TEST S12] 
mae_loss: 47.7843
mae_now: 23.8630
mae_fut: 23.9213


### Evaluation: testing on S13

- same as S9

In [None]:
test_metrics_S13 = evaluate_test(models[13], folds[13]["test_loader"], device)

print(f"[TEST S13] \nmae_loss: {test_metrics_S13['mae_loss']:.4f}\nmae_now: {test_metrics_S13['mae_now']:.4f}\nmae_fut: {test_metrics_S13['mae_fut']:.4f}")

[TEST S13] 
mae_loss: 20.8863
mae_now: 8.6240
mae_fut: 12.2623


### Evaluation: testing on S14

In [None]:
test_metrics_S14 = evaluate_test(models[14], folds[14]["test_loader"], device)

print(f"[TEST S14] \nmae_loss: {test_metrics_S14['mae_loss']:.4f}\nmae_now: {test_metrics_S14['mae_now']:.4f}\nmae_fut: {test_metrics_S14['mae_fut']:.4f}")

[TEST S14] 
mae_loss: 28.6058
mae_now: 13.8261
mae_fut: 14.7797


### Evaluation: testing on S15

In [None]:
test_metrics_S15 = evaluate_test(models[15], folds[15]["test_loader"], device)

print(f"[TEST S15] \nmae_loss: {test_metrics_S15['mae_loss']:.4f}\nmae_now: {test_metrics_S15['mae_now']:.4f}\nmae_fut: {test_metrics_S15['mae_fut']:.4f}")

[TEST S15] 
mae_loss: 25.9765
mae_now: 12.8689
mae_fut: 13.1076


---------------------------------------------------
## EVALUATION SET RESULTS (bpm)

### Average test_MAE: 31.2515 (std 12.5022)
### Average test now_MAE: 15.1138 (std 6.3127)
### Average test fut_MAE: 16.1377 (std 6.2255)

*no outliers as per Grubbs' test*
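
The summary figures can be recomputed from the per-subject `mae_loss` values printed above. The sketch below uses only the standard library. Recomputing suggests the reported std is the population form (ddof=0); that is an inference, not something the notebook states. It also computes the Grubbs statistic G, which would still need to be compared against a tabulated critical value for N=15 at the chosen significance level:

```python
import statistics

# Combined test MAE (mae_loss) per held-out subject, S1..S15, as reported above
mae_loss = [
    24.0986, 31.0810, 23.8951, 23.7217, 59.0583,
    57.0260, 21.3194, 23.5757, 20.4885, 25.3724,
    35.8827, 47.7843, 20.8863, 28.6058, 25.9765,
]

mean = statistics.fmean(mae_loss)
std_pop = statistics.pstdev(mae_loss)    # population std (ddof=0)
std_sample = statistics.stdev(mae_loss)  # sample std (ddof=1), used by Grubbs' test

# Grubbs' statistic: largest absolute deviation from the mean, in sample-std units
g_stat = max(abs(x - mean) for x in mae_loss) / std_sample

print(f"mean={mean:.4f}  std_pop={std_pop:.4f}  std_sample={std_sample:.4f}  G={g_stat:.3f}")
```

The same pattern applies to the `mae_now` and `mae_fut` columns by swapping in their per-subject values.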

Thoughts:
- As expected, test_MAE (31.25) is higher than val_MAE (17.78).

- Average test now_MAE is 15.1138 bpm, which is outside the overall MAE range reported for consumer wearables (~ 7-10 bpm).

Share of recording time by activity intensity:

- Rest (low motion, low artefacts): 40 mins -> 26.7%

- Moderate (some movement, moderate artefacts): 40 mins -> 26.7%

- High (strong artefacts): 23 mins (+ 47 mins from transient) -> 46.7%

However, nearly half (46.7%) of the data comes from high-motion activity (counting transient periods as high motion), so a now_MAE of 15.1 bpm compares more favourably against the ~ 10-15 bpm MAE that consumer wearables report during exercise.

- Forecasting is harder than calibration, so a fut_MAE only about 1 bpm higher than now_MAE is also encouraging.
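
The percentages above follow directly from the listed minutes. A short check, assuming (as the list does) that the 47 transient minutes count as high motion:

```python
# Minutes per intensity group, as listed above (transient counted as high motion)
minutes = {"rest": 40, "moderate": 40, "high": 23 + 47}

total = sum(minutes.values())
share = {name: 100 * m / total for name, m in minutes.items()}

for name, pct in share.items():
    print(f"{name}: {minutes[name]} min -> {pct:.1f}%")
```

If transient periods were instead excluded or assigned to another group, the high-motion share would drop, so the comparison with exercise-time MAE depends on that assumption.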

Things to consider:
- Report per-activity MAE to confirm that the high-motion activities, rather than the low-motion ones, are driving the higher MAE.
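
A per-activity breakdown could be sketched as follows. `mae_by_activity` is a hypothetical helper, and the example values are made up. It assumes per-window activity IDs can be aligned with the test predictions, which the current `test_loader` batches (`xb, y_now, y_fut`) do not expose:

```python
from collections import defaultdict

def mae_by_activity(y_true, y_pred, activity_ids):
    """Return {activity_id: MAE} for parallel sequences of labels, predictions, and IDs."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, p, a in zip(y_true, y_pred, activity_ids):
        sums[a] += abs(p - t)
        counts[a] += 1
    return {a: sums[a] / counts[a] for a in sums}

# Made-up example values (bpm); IDs follow the notebook's activity list
y_true = [60.0, 62.0, 110.0, 120.0, 90.0]
y_pred = [63.0, 61.0, 95.0, 130.0, 84.0]
activity_ids = [1, 1, 2, 2, 7]  # 1 = sitting, 2 = stairs, 7 = walking

per_activity = mae_by_activity(y_true, y_pred, activity_ids)
for act, mae in sorted(per_activity.items()):
    print(f"activity {act}: MAE {mae:.2f} bpm")
```

Running this once per head (`now` and `fut`) and per test subject would show whether the error concentrates in activities such as stairs or table soccer.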