In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import Image

# Progress Report

### Project description
We aim to develop a foundation model from bio-signals (ECG, heart rate) [1] and apply it to sleep stage classification on personal data from edge devices [2], such as an Apple Watch or Fitbit. Training a model only on a user's own data from an edge device rarely yields high performance, because the amount of training data is limited and edge devices have few hardware resources. We expect the foundation model to learn informative representations from a large bio-signal dataset, which can improve the downstream task in a restricted setting where people cannot share their bio-signal data with others.

### Overall tasks structure

In [None]:
Image(filename='../asset/overall_task_architecture.png')

### To do (remove later)
1. Data: Obtained all or most of the data
2. Model: Come up with a reasonable model
3. Result: Produced at least one promising result
4. Reasoning: Provide a convincing argument for the feasibility of completing the project within the time remaining

### 1. Dataset

In [None]:
from IPython.display import Image
Image(filename='../asset/dataset_description_table.png')

### 2. Model

#### 2-1. Foundation Model
Personalized models on edge devices lack sufficient data to be trained individually. Our proposed strategy is to build a foundation model from the large quantity of publicly available bio-signal data and use it to improve the performance of personalized models on edge devices. The foundation model is designed to address each of the three issues identified in our problem setting.

#### Problem setting

Our problem setting is characterized by three main issues:

*  Lack of labels for sleep stage classification.
*  Multiple modalities in the bio-signal data.
*  A downstream task that requires a personalized model.

#### Why this Foundation model?

To address these 3 primary issues, we aim to incorporate 3 characteristics into our foundation model:

**1. Self-supervised contrastive learning**

First, although sleep stage labels exist in MESA, which is publicly available, bio-signal labels are typically expensive because they are annotated manually by professionals. Given this, we plan to train the foundation model with self-supervised contrastive learning, which does not require labels.
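
As a minimal sketch of such a label-free objective, the snippet below computes an InfoNCE-style contrastive loss with NumPy. The helper name `info_nce_loss`, the temperature value, and the idea of pairing two views of the same epoch are illustrative assumptions, not the project's final implementation.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss between two batches of embeddings (hypothetical helper).

    z_a[i] and z_b[i] are embeddings of two views of the same epoch (a positive
    pair); every other row of z_b acts as a negative for z_a[i].
    """
    # L2-normalize so that the dot product equals cosine similarity
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # shape (N, N)
    # Numerically stable log-softmax over each row
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal
    return float(-np.mean(np.diag(log_prob)))

# Toy check: matched views give a lower loss than mismatched ones
rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
loss_matched = info_nce_loss(z, z)
loss_mismatched = info_nce_loss(z, z[::-1].copy())
```

No sleep stage labels appear anywhere in this loss; the only supervision is which two embeddings came from the same epoch.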

**2. Considering multimodalities**

Second, because we deal with bio-signals from two separate modalities, ECG (256 Hz) and heart rate (1 Hz), we want the foundation model to account for multimodality. The framework we refer to (FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space, 2023) can fully utilize multimodal signal information by separating, in the latent space, the features shared by the two modalities from the private features of each modality.
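
The shared/private separation can be illustrated with a small NumPy sketch. It assumes each modality encoder outputs a vector whose first half is the shared part and whose second half is the private part, and it uses a squared-cosine penalty to push parts toward orthogonality. The helper names and the exact penalty are assumptions for illustration, not the loss defined in the FOCAL paper.

```python
import numpy as np

def split_embedding(z, shared_dim):
    """Split a batch of embeddings into (shared, private) parts along the feature axis."""
    return z[:, :shared_dim], z[:, shared_dim:]

def orthogonality_penalty(u, v):
    """Mean squared cosine similarity between paired rows of u and v.

    Zero when every pair is orthogonal, one when every pair is parallel.
    """
    u = u / np.linalg.norm(u, axis=1, keepdims=True)
    v = v / np.linalg.norm(v, axis=1, keepdims=True)
    cos = np.sum(u * v, axis=1)
    return float(np.mean(cos ** 2))

rng = np.random.default_rng(0)
z_ecg = rng.normal(size=(4, 8))  # stand-in for the ECG encoder output
z_hr = rng.normal(size=(4, 8))   # stand-in for the heart-rate encoder output

ecg_shared, ecg_private = split_embedding(z_ecg, 4)
hr_shared, hr_private = split_embedding(z_hr, 4)

# Discourage overlap between shared and private parts, and between the two private parts
penalty = (orthogonality_penalty(ecg_shared, ecg_private)
           + orthogonality_penalty(hr_shared, hr_private)
           + orthogonality_penalty(ecg_private, hr_private))
```

In a full model, a term like `penalty` would be added to the contrastive loss that aligns `ecg_shared` with `hr_shared`.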

**3. Subject-aware learning**

Finally, because our downstream models are personalized, with a separate model for each subject, the foundation model should avoid the domain shift caused by inter-subject variability. We will therefore incorporate subject invariance into the contrastive learning framework so that it learns domain-invariant features.
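
One possible way to make a contrastive objective subject-aware is to draw negatives only from the same subject, so the task cannot be solved by recognizing who the signal came from. The sketch below builds such a mask; the helper name and this particular strategy are assumptions for illustration, and the final design may differ.

```python
import numpy as np

def same_subject_negative_mask(subject_ids):
    """Boolean (N, N) mask: True where sample j may serve as a negative for sample i.

    Negatives are restricted to other epochs from the same subject.
    """
    ids = np.asarray(subject_ids)
    same_subject = ids[:, None] == ids[None, :]
    not_self = ~np.eye(len(ids), dtype=bool)
    return same_subject & not_self

# Two epochs from subject 1 and three epochs from subject 2
mask = same_subject_negative_mask([1, 1, 2, 2, 2])
```

Such a mask could be applied to the similarity matrix of a contrastive loss before the softmax, for example by setting the excluded entries to a large negative value.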

#### Existing structure to refer to

- FOCAL Framework

In [None]:
Image(filename='../asset/FOCAL_figure.png')

The FOCAL framework, proposed in [FOCAL: Contrastive Learning for Multimodal Time-Series Sensing Signals in Factorized Orthogonal Latent Space, 2023], is a self-supervised multimodal contrastive framework. It extracts not only the information shared between sensory modalities but also explicitly considers the information exclusive to each modality, which could be essential to understanding the underlying sensing physics. We intend to integrate subject-aware learning into this strategy.

#### 2-2. Classification model for Downstream task

### 3. Progress Result

#### Preprocessing

1) MESA data
- We need to segment the bio-signals (heart rate and ECG) into 30-second windows (1 epoch each) because the downstream task assigns a sleep stage to each 30 seconds of data.
- We select only valid epochs, i.e., those without problems such as disconnection errors or mismatched collection times between bio-signals.
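
The segmentation and epoch selection can be sketched as follows. The sampling rates come from the description above (ECG at 256 Hz, heart rate at 1 Hz); the helper names and the rule that a NaN marks a dropout are hypothetical stand-ins for the actual validity checks.

```python
import numpy as np

EPOCH_SECONDS = 30

def segment_into_epochs(signal, sampling_rate_hz, epoch_seconds=EPOCH_SECONDS):
    """Reshape a 1-D signal into (n_epochs, samples_per_epoch), dropping a trailing partial epoch."""
    signal = np.asarray(signal)
    samples_per_epoch = int(sampling_rate_hz * epoch_seconds)
    n_epochs = len(signal) // samples_per_epoch
    return signal[: n_epochs * samples_per_epoch].reshape(n_epochs, samples_per_epoch)

def valid_epoch_mask(ecg_epochs, hr_epochs):
    """Keep epochs in which both signals are fully finite (assumed rule: NaN marks a dropout)."""
    return np.isfinite(ecg_epochs).all(axis=1) & np.isfinite(hr_epochs).all(axis=1)

ecg = np.zeros(256 * 95)  # 95 s of ECG at 256 Hz
hr = np.ones(95)          # 95 s of heart rate at 1 Hz
hr[40] = np.nan           # simulated dropout inside the second epoch

ecg_epochs = segment_into_epochs(ecg, 256)
hr_epochs = segment_into_epochs(hr, 1)
keep = valid_epoch_mask(ecg_epochs, hr_epochs)
```

With these toy inputs, each ECG epoch holds 7680 samples and each heart rate epoch holds 30, and the epoch containing the dropout is excluded.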

- Heart Rate
    - After selecting valid epochs, the heart rate was **interpolated** to have a value for every second, **smoothed** and **filtered** to amplify periods of high change by convolving with a difference-of-Gaussians filter, and **normalized** by dividing by the 90th percentile of the absolute difference between each heart rate measurement and the mean heart rate over the sleep period.
- Electrocardiogram (ECG, also written EKG)
    - After selecting valid epochs, the ECG was **smoothed** and **filtered** with a Gaussian filter to denoise it.
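
The heart rate steps above can be sketched with NumPy alone. The function names, the sigma defaults (in seconds), and the edge padding are assumptions chosen for illustration; this is one reading of the description above rather than the project's preprocessing code.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    """Discrete Gaussian kernel normalized to sum to 1."""
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()

def preprocess_heart_rate(timestamps_s, heart_rate, sigma_narrow=120.0, sigma_wide=600.0):
    """Interpolate to 1 Hz, apply a difference-of-Gaussians filter, then normalize.

    The sigma defaults are placeholder assumptions.
    """
    timestamps_s = np.asarray(timestamps_s, dtype=float)
    heart_rate = np.asarray(heart_rate, dtype=float)

    # 1) Interpolate so there is one value per second
    t_uniform = np.arange(np.ceil(timestamps_s[0]), np.floor(timestamps_s[-1]) + 1)
    hr = np.interp(t_uniform, timestamps_s, heart_rate)

    # 2) Difference-of-Gaussians kernel: it sums to zero, so it removes the
    #    baseline and emphasizes periods of high change
    radius = int(3 * sigma_wide)
    dog = gaussian_kernel(sigma_narrow, radius) - gaussian_kernel(sigma_wide, radius)
    padded = np.pad(hr, radius, mode="edge")  # keeps the output the same length as hr
    filtered = np.convolve(padded, dog, mode="valid")

    # 3) Normalize by the 90th percentile of |hr - mean(hr)|
    scale = np.percentile(np.abs(hr - hr.mean()), 90)
    if scale > 0:
        filtered = filtered / scale
    return t_uniform, filtered

# Irregularly sampled toy heart rate; small sigmas keep the example short
t = np.array([0.0, 2.5, 5.0, 9.0, 14.0, 20.0])
bpm = np.array([60.0, 62.0, 61.0, 70.0, 65.0, 63.0])
t_u, hr_features = preprocess_heart_rate(t, bpm, sigma_narrow=2.0, sigma_wide=5.0)
```

Because the two Gaussians are each normalized before subtraction, a constant heart rate maps to approximately zero, so the output reflects changes rather than the absolute level.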


2) Apple Watch data
- We collected Apple Watch data containing heart rate and acceleration.
- We plan to preprocess it following the previous study "**Sleep stage prediction with raw acceleration and photoplethysmography heart rate data derived from a consumer wearable device**" (Walch et al., 2019).

#### Preprocessing Result

The following images show the preprocessing results for the MESA dataset and the raw data collected from the Apple Watch.

##### MESA data

In [None]:
# Actigraphy
Image(filename='../asset/actiography_0001.png')

In [None]:
# ECG Image
Image(filename="../asset/ecg_0001.png")

In [None]:
# Heart Rate
Image("../asset/heartrate_0001.png")

In [None]:
# Actigraphy
Image("../asset/actiography_0001.png")

In [None]:
# PSG Status (Sleep Stages)
Image("../asset/psg_0001.png")

##### Apple Watch data
- We collected real Apple Watch data over 7 days.
- We have written preprocessing code for the Apple Watch data, but the preprocessing itself is not finished yet.

In [None]:
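# Load one Apple Watch recording (heart rate, 3-axis acceleration, PSG sleep stage)
df = pd.read_csv("../preproc/outputs/applewatch_public/c1_data.csv")

# One panel each for heart rate, acceleration, and sleep stage
fig, ax = plt.subplots(3, 1, figsize=(16, 5))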

ax[0].plot(df["heart_rate"])
ax[0].set_title("Heart Rate")

ax[1].plot(df["x_move"])
ax[1].plot(df["y_move"])
ax[1].plot(df["z_move"])
ax[1].legend(["X", "Y", "Z"])
ax[1].set_title("Acceleration")

ax[2].plot(df["psg_status"])
ax[2].set_title("PSG Status (Sleep Stages)")

plt.tight_layout()
plt.show()

### Downstream Modeling