luckyos-code/Privacy-Preserving-Smartwatch-Health-Data-Generation-Using-DP-GANs

 
 

Privacy-Preserving Smartwatch Health Data Generation For Stress Detection Using GANs

This repository contains the code and documentation for a research project on generating synthetic smartwatch health data while preserving the privacy of the original data owners. The project uses a combination of differential privacy and generative adversarial networks (DP-GANs) to create synthetic data that closely resembles the original data in terms of statistical properties and data distributions.

Abstract

Smartwatch health sensor data is increasingly utilized in smart health applications and patient monitoring, including stress detection. However, such medical data often comprises sensitive personal information and is resource-intensive to acquire for research purposes. In response to this challenge, we introduce the privacy-aware synthetization of multi-sensor smartwatch health readings related to moments of stress. Our method involves the generation of synthetic sequence data through Generative Adversarial Networks (GANs), coupled with the implementation of Differential Privacy (DP) safeguards for protecting patient information during model training. To ensure the integrity of our synthetic data, we employ a range of quality assessments and monitor the plausibility between synthetic and original data. To test the usefulness, we create private machine learning models on a commonly used, albeit small, stress detection dataset, exploring strategies for enhancing the existing data foundation with our synthetic data. Through our GAN-based augmentation methods, we observe improvements in model performance, both in non-private (0.45% F1) and private (11.90 to 15.48% F1) training scenarios. We underline the potential of differentially private synthetic data in optimizing utility-privacy trade-offs, especially with limited availability of real training samples.

Background

Smartwatch health data has become an increasingly popular source of information for healthcare research and personalized medicine. However, the use of such data raises concerns about privacy, as the data often contains sensitive information about individuals' health and fitness. In this project, we aim to address these privacy concerns by generating synthetic health data that can be used in research and analysis while protecting the privacy of the original data owners.

Our approach uses a combination of differential privacy and generative adversarial networks (GANs).
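The README does not spell out which DP mechanism is applied during training. A common way to make gradient-based training differentially private is DP-SGD: clip each example's gradient to a fixed L2 norm, add Gaussian noise to the sum, then average. The sketch below illustrates that idea only, under the assumption that DP-SGD-style training is used; the function names and parameter values are hypothetical and are not taken from this repository.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale a gradient down so its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Clip per-example gradients, add Gaussian noise to their sum, then average.

    noise_multiplier and clip_norm are illustrative hyperparameters (assumptions).
    """
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm  # noise scale is tied to the clipping bound
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(clipped) for v in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # two toy per-example gradients
noisy_grad = dp_average_gradient(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
```

Clipping bounds how much any single recording can influence an update, and the noise masks the remaining influence. A stronger privacy guarantee generally needs more noise, which is the utility-privacy trade-off mentioned in the abstract.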

Requirements

The following dataset is required to run the code in this repository:

Download the WESAD dataset here and save the WESAD directory inside the data directory.

Installation

To install the required dependencies for GAN training, run the following command:

pip install -r requirements.txt

To install the required dependencies for stress detection training, run the following command:

pip install -r stress_slurm/requirements-docker.txt

Usage

The repository consists of multiple notebooks, each representing one step of the workflow: data preprocessing, model training, synthesizing a new dataset, and evaluating that dataset with a newly trained stress detection model.

01-Data

The data is loaded from the original WESAD dataset, preprocessed, and saved to a new file named wesad_preprocessed_1hz.csv. You can skip downloading the 2.1 GB WESAD dataset and the preprocessing step by working with the already preprocessed data, which consists of two NumPy arrays: wesad_windows.npy and wesad_labels.npy.
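As a minimal sketch, the two preprocessed arrays can be loaded with NumPy as shown below. The file names come from this README, but the helper name, the directory passed in, and the assumption that the first axis of both arrays indexes windows are illustrative and not confirmed by the repository.

```python
from pathlib import Path

import numpy as np


def load_preprocessed_wesad(data_dir):
    """Load the preprocessed windows and labels and check that they line up.

    data_dir is a hypothetical location; point it at wherever the .npy files live.
    """
    data_dir = Path(data_dir)
    windows = np.load(data_dir / "wesad_windows.npy")
    labels = np.load(data_dir / "wesad_labels.npy")
    # Assumption: one label per window, so the first dimensions must match.
    if len(windows) != len(labels):
        raise ValueError(
            f"Got {len(windows)} windows but {len(labels)} labels."
        )
    return windows, labels


# Example call (the "data" directory is an assumption):
# windows, labels = load_preprocessed_wesad("data")
# print(windows.shape, labels.shape)
```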

02-cGAN

This notebook focuses on training the cGAN model. It loads the preprocessed data from the previous 01-Data notebook and runs the training for the cGAN model.

02-TimeGAN

This notebook focuses on training the TimeGAN model. It loads the preprocessed data from the previous 01-Data notebook and runs the training for the TimeGAN model.

02-DGAN

This notebook focuses on training the DGAN model. It loads the preprocessed data from the previous 01-Data notebook and runs the training for the DGAN model.

03-Generator

The generator notebook is responsible for synthesizing a new dataset based on the trained GAN model. The generated data is saved separately in the syn data folder.

04-Evaluation

In the evaluation notebook, we assess the quality of the synthetically generated dataset using visual and statistical metrics. The usefulness evaluation takes place in the 05-Stress_Detection notebook.
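The specific metrics are not listed here. As one simple example of a statistical check, the sketch below compares the per-channel mean and standard deviation of real and synthetic windows. The function names and the [window][timestep][channel] layout are assumptions for illustration, not the notebook's actual implementation.

```python
from statistics import mean, pstdev


def channel_stats(windows):
    """Return (mean, std) per channel for data shaped [window][timestep][channel]."""
    n_channels = len(windows[0][0])
    stats = []
    for c in range(n_channels):
        # Pool every timestep of every window for this sensor channel.
        values = [step[c] for window in windows for step in window]
        stats.append((mean(values), pstdev(values)))
    return stats


def compare_channel_stats(real, synthetic):
    """Absolute per-channel differences in mean and std between two datasets."""
    return [
        (abs(real_mean - syn_mean), abs(real_std - syn_std))
        for (real_mean, real_std), (syn_mean, syn_std) in zip(
            channel_stats(real), channel_stats(synthetic)
        )
    ]


# Toy data: one window, two timesteps, two channels.
real = [[[1.0, 10.0], [3.0, 10.0]]]
synthetic = [[[2.0, 10.0], [2.0, 12.0]]]
differences = compare_channel_stats(real, synthetic)
```

Small differences suggest that the synthetic data matches the original marginal distributions per sensor. They say nothing about temporal structure, which is why visual inspection and the downstream stress detection task complement such summary statistics.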

05-Stress_Detection

This notebook focuses on training a CNN model to perform stress detection on the synthetic dataset, simulating a real-world use case.
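The abstract reports stress detection performance as F1 scores. For reference, here is a minimal sketch of how a binary F1 score is computed from true and predicted labels. Encoding stress as 1 and non-stress as 0 is an assumption, and the function name is hypothetical.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels: 1 = stress, 0 = non-stress (assumed encoding).
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 1, 1]
score = f1_score_binary(y_true, y_pred)
```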

ATTENTION: The actual stress detection experiments were run on a server using Slurm. The most recent implementation is therefore the code that starts in main.py and the functions called from there. The slurm.job file shows how to run the program.

Generator Frontend

We have also developed a frontend for the generator using Streamlit, which provides a user-friendly interface to interact with the trained GAN model. You can specify different parameters, generate synthetic data, and visualize the results.

To run the Streamlit app, run the following command in your terminal from the repository root:

streamlit run streamlit_app/About.py

This will start the Streamlit server and open the app in your default web browser.

Deliverables

The research artifacts resulting from this work are available in a condensed format in this repository.

The results regarding synthetic datasets and stress detection are located in the results folder.

The trained models can be found in the model directory. These can be used in the Generator frontend to generate new synthetic data.

Acknowledgement

I would like to extend my sincere thanks to Maximilian Ehrhart and Bernd Resch for sharing their code related to their paper titled "A Conditional GAN for Generating Time Series Data for Stress Detection in Wearable Physiological Sensor Data". Their work on implementing the cGAN architecture and their insights on training it have been important to the success of our project.

License

MIT License
