
Data Augmentation for High Dimensional Multivariate Time-Series Data Using Generative Adversarial Networks (GANs)

Description

This work covers the generation and evaluation of synthetic human activity data produced by GANs. The overall aim is to generate realistic synthetic data that can be used to improve classification performance by extending the original dataset.
The generation pipeline takes real-world data as input and produces ten times as much synthetic data for each activity. This process is depicted in more detail in the following image:

[Figure: generation pipeline]
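
Conceptually, the generation step corresponds to one generative model per activity, each asked to produce ten times as many windows as the real data provides. The following is a simplified, hypothetical sketch; the actual pipeline (execute/ together with TimeGAN/) is considerably more involved and all names below are illustrative:

# Hypothetical sketch of the per-activity generation loop; not the repository's code.
import numpy as np

def generate_per_activity(windows_by_activity, train_generator, factor=10):
    # windows_by_activity: {activity: array of shape (n_windows, window_size, n_channels)}
    # train_generator: trains a model on the real windows and returns a sampling function
    synthetic = {}
    for activity, windows in windows_by_activity.items():
        sample = train_generator(windows)
        synthetic[activity] = sample(len(windows) * factor)  # ten times as many windows
    return synthetic

# Dummy stand-in generator, only to make the sketch executable end-to-end.
dummy_train = lambda w: (lambda n: w[np.random.randint(len(w), size=n)])
real = {"walking": np.random.randn(20, 300, 9)}
synth = generate_per_activity(real, dummy_train)
print(synth["walking"].shape)  # (200, 300, 9)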

The evaluation of the generated data is done in four ways:

  • Visualize how well the distribution of each synthetic activity resembles the original one using PCA and t-SNE
  • Apply MMD as a sample-based metric to quantify the similarity of the distributions (see the sketch after this list)
  • Use TSTR/TRTS to evaluate whether the synthetic data can serve as a substitute for real-world data
  • Mix real and synthetic data with the aim of improving classification performance
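
For reference, here is a minimal, self-contained sketch of a sample-based MMD estimate with an RBF kernel. The function names and the flattening of windows into vectors are assumptions made for illustration and do not reflect the implementation in evaluation/:

# Minimal sketch of a sample-based MMD^2 estimate with an RBF kernel (illustrative only).
import numpy as np

def rbf_kernel(a, b, sigma=1.0):
    # Pairwise squared Euclidean distances between the rows of a and b.
    sq_dists = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2 * a @ b.T
    return np.exp(-sq_dists / (2 * sigma ** 2))

def mmd(real, synthetic, sigma=1.0):
    # Biased estimate: within-real term + within-synthetic term - 2 * cross term.
    return (rbf_kernel(real, real, sigma).mean()
            + rbf_kernel(synthetic, synthetic, sigma).mean()
            - 2 * rbf_kernel(real, synthetic, sigma).mean())

# Windows are flattened to vectors before comparison, e.g. (n_windows, window_size * n_channels).
real_windows = np.random.randn(100, 300 * 9)
synth_windows = np.random.randn(100, 300 * 9)
print(mmd(real_windows, synth_windows))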

Datasets

Three different datasets are used as benchmarks, two of which were recorded in the course of this work.

  • PAMAP2 contains simple activities of daily living
  • SONAR/SONAR-LAB contain a variety of complex nursing activities with a high number of sensor channels
    • Links to the datasets will be added once they are published

The pipelines can be extended to include further HAR datasets, provided that they can be integrated into the Recording structure.
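
The exact definition of Recording lives in src/datatypes and may differ from what is shown here; the following is only a hypothetical sketch of what a loader for an additional HAR dataset would have to produce:

# Hypothetical sketch; see src/datatypes for the actual Recording definition.
from dataclasses import dataclass
import pandas as pd

@dataclass
class Recording:
    sensor_frame: pd.DataFrame  # one column per sensor channel
    time_frame: pd.Series       # timestamps aligned with the sensor_frame rows
    activities: pd.Series       # per-sample activity labels
    subject: str                # identifier of the recorded person

def load_my_har_dataset(path: str) -> list:
    # A new loader reads the raw files of a dataset and maps them onto Recording objects.
    raise NotImplementedError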

Getting Started

Dependencies

All requirements are listed in requirements.txt. Use the following command to install all dependencies automatically:

pip install -r requirements.txt

Code Explanation

The src folder contains the following directories / files:

  1. runner.py
  2. datatypes/
    • Contains the basic data types Recording and Window that are used to handle different datasets consistently.
  3. evaluation/
    • Metrics and utility functions for evaluation.
  4. execute/
    • Contains the actual pipelines executed by runner.py.
    • Each dataset has one pipeline for generating synthetic data and one for evaluation.
  5. loader/
    • Functions used to read datasets and to fit them into the Recording structure.
    • Preprocessing functions
  6. models/
    • Contains TensorFlow models.
  7. scripts/ and visualization/
    • Scripts to visualize and analyze the datasets.
  8. TimeGAN/
    • Contains a modified TimeGAN framework which is used to generate synthetic data. See Acknowledgments.
  9. utils/
    • Utility functions for reading, windowing and processing the data (see the windowing sketch after this list)
    • settings.py stores dataset-specific constants
  10. labels.json (and similar)
    • Contain all activities performed in SONAR/SONAR-LAB
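
For orientation, the windowing referenced in utils/ (item 9) boils down to a sliding window over the sensor matrix. The following is an illustrative sketch; the actual functions and the Window type in the repository may differ:

# Illustrative sliding-window sketch; the repository's own windowing lives in utils/.
import numpy as np

def sliding_windows(data, window_size, stride_size):
    # data: (n_samples, n_channels) -> (n_windows, window_size, n_channels)
    starts = range(0, data.shape[0] - window_size + 1, stride_size)
    return np.stack([data[s:s + window_size] for s in starts])

# With --window_size 300 --stride_size 300 the windows do not overlap.
signal = np.random.randn(3000, 70)  # 3000 samples, 70 channels (arbitrary example)
print(sliding_windows(signal, 300, 300).shape)  # (10, 300, 70)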

Executing Program

Run runner.py with the following options:

  • --dataset {pamap2,sonar,sonar_lab}: Dataset to use
  • --mode {gen,eval}: Pipeline to use (generation or evaluation)
  • --data_path DATA_PATH: Path to the dataset directory
  • --synth_data_path SYNTH_DATA_PATH: Path to directory where the generated data is stored (used for evaluation only)
  • --random_data_path RANDOM_DATA_PATH: Path to random data file (used for evaluation only)
  • --window_size WINDOW_SIZE: Window size
  • --stride_size STRIDE_SIZE: Stride size

Example command

$ python3 runner.py --dataset sonar_lab --mode eval --data_path PATH_TO_DATASET \
    --synth_data_path PATH_TO_SYNTHETIC_DATA --random_data_path PATH_TO_RANDOM_DATA_FILE \
    --window_size 300 --stride_size 300 > output.txt

Note: To run only a subset of the evaluations, the flags in the corresponding evaluation pipelines have to be set manually.
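
What such flags could look like (the actual flag names in the execute/ pipelines may differ; this is only an illustration):

# Illustrative only; the real flag names in the execute/ pipelines may differ.
RUN_PCA_TSNE = True          # distribution visualizations (PCA / t-SNE)
RUN_MMD = True               # sample-based similarity metric
RUN_TSTR_TRTS = False        # train-on-synthetic/test-on-real and vice versa
RUN_MIXED_TRAINING = False   # classification with real + synthetic data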

Some Results

[Figures: PCA and t-SNE visualizations of the real and synthetic activity distributions]

Acknowledgments

The generation of synthetic data builds on a modified version of the TimeGAN framework (Yoon, Jarrett and van der Schaar, "Time-series Generative Adversarial Networks", NeurIPS 2019).