GitHub - stephanieger/imbalanced-sequence-classification: Methods for generating synthetic minority data for multivariate temporal data to improve classification accuracy.

This repository contains implementations of the models discussed in the paper "Autoencoders and Generative Adversarial Networks for Anomaly Detection for Sequences" by Stephanie Ger and Diego Klabjan.

Dependencies

TensorFlow 1.12.0 (and all dependencies)
Keras 2.1.5 (and all dependencies)

Data

Models were evaluated on two public datasets, which are available here. The file norm-sentiment-0.01.tar.gz contains the sentiment dataset with 1% imbalance, and norm-sentiment-0.05.tar.gz contains the sentiment dataset with 5% imbalance. Files with power in the filename contain the power datasets; we provide ensembled power datasets for 5 different seeds. Each .zip file contains the ensembled training, validation, and test data. Minority and majority data are also included so that the GAN and autoencoder models can be trained for the oversampling methods described in the paper. All data files are stored as NumPy arrays.
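Since the data files are stored as NumPy arrays, a split can be loaded and its imbalance checked along the following lines. This is a minimal sketch: the file names x_train.npy and y_train.npy, the helper names, and the convention that the minority class is labeled 1 are all assumptions made for illustration, not names taken from the archives.

```python
import os

import numpy as np


def load_split(data_dir, x_name="x_train.npy", y_name="y_train.npy"):
    """Load one data split from a directory of .npy files.

    The default file names are hypothetical placeholders; substitute the
    names found in the extracted archive.
    """
    x = np.load(os.path.join(data_dir, x_name))
    y = np.load(os.path.join(data_dir, y_name))
    return x, y


def minority_fraction(y, minority_label=1):
    """Return the fraction of labels equal to the minority label.

    Assumes (for illustration) that the minority class is encoded as 1.
    """
    y = np.asarray(y)
    return float(np.mean(y == minority_label))
```

For a dataset with 1% imbalance, minority_fraction on the training labels should come out close to 0.01, which is a quick sanity check that the right archive was extracted.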

Baseline Models

The baseline model is run with run_seq2one.py when each sequence has a single label, or with run_seq2seq.py when the label vector is itself a sequence. The F1-score on the validation and test sets can then be computed with run_seq2one_output.py or run_seq2seq_output.py, respectively.
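For reference, the F1-score on binary labels can be computed as in the sketch below. It uses the standard definition F1 = 2TP / (2TP + FP + FN) with the minority class as the positive class; the function name and the label encoding are assumptions, and this is not the code from the repository's output scripts.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Compute the F1-score for the positive class from two label sequences.

    Returns 0.0 when there are no true positives, false positives, or
    false negatives, so the score is defined for degenerate inputs.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    tp = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # correctly predicted minority example
        elif p == positive and t != positive:
            fp += 1  # majority example predicted as minority
        elif p != positive and t == positive:
            fn += 1  # minority example that was missed

    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

On imbalanced data the F1-score is more informative than accuracy, since a model that always predicts the majority class scores high accuracy but an F1-score of 0 on the minority class.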

GAN Models

For novelty detection with either the GAN autoencoder or the GAN discriminator as the novelty detection method, first train a GAN on the majority data with the iwgan.py script. The two novelty detection methods can then be run with the iwgan-autoenc-novelty.py and iwgan-discrim-novelty.py scripts, respectively.
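One common way to turn a model trained on majority data into a novelty detector is to threshold a per-sequence score. The sketch below illustrates this for reconstruction error, assuming the reconstructions have already been produced by some autoencoder. The function names, the mean-squared-error score, and the percentile threshold are illustrative assumptions and may differ from what the novelty scripts in this repository do.

```python
import numpy as np


def reconstruction_errors(x, x_hat):
    """Mean squared error per sample, averaged over all non-batch axes."""
    x = np.asarray(x, dtype=float)
    x_hat = np.asarray(x_hat, dtype=float)
    return np.mean((x - x_hat) ** 2, axis=tuple(range(1, x.ndim)))


def fit_threshold(majority_errors, percentile=95.0):
    """Pick a threshold as a percentile of errors on held-out majority data.

    The 95th percentile is an arbitrary illustrative default.
    """
    return float(np.percentile(majority_errors, percentile))


def flag_novel(errors, threshold):
    """Flag samples whose error exceeds the threshold as novel (minority)."""
    return np.asarray(errors) > threshold
```

The same thresholding applies to a discriminator score in place of the reconstruction error, with the direction of the comparison chosen so that samples unlike the majority data are flagged.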

For GAN-based synthetic data generation, train a GAN on the minority data with the iwgan.py script when each sequence has a single label, or with the iwgan-seq2seq.py script when the label vector is a sequence. Synthetic data can then be generated with iwgan-synthetic-mult-min.py or iwgan-seq2seq-synthetic-mult-min.py, respectively, after which the seq2one or seq2seq model can be run.

ADASYN with Autoencoder Models

For ADASYN with Autoencoder, use the run_autoenc.py script to train the autoencoder model on the minority data, then use get_autoenc_adasyn_synthetic.py to generate the synthetic data. The training set augmented with the synthetic data can then be used to train a seq2one model with the run_seq2one.py script.
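The idea of ADASYN-style oversampling on autoencoder encodings can be sketched as follows. The sketch assumes that minority and majority sequences have already been encoded into fixed-length latent vectors, and it omits the decoding step that maps synthetic vectors back to sequences. The function name and parameters are hypothetical, and the per-point counts are drawn from a multinomial distribution (rather than rounded, as in the original ADASYN formulation) so that exactly n_new points are produced. It is not the code in get_autoenc_adasyn_synthetic.py.

```python
import numpy as np


def adasyn_latent(z_min, z_maj, n_new, k=5, seed=0):
    """Generate n_new synthetic latent vectors with an ADASYN-style scheme.

    z_min: (n_min, d) minority latent vectors, n_min >= 2.
    z_maj: (n_maj, d) majority latent vectors.
    """
    rng = np.random.default_rng(seed)
    z_min = np.asarray(z_min, dtype=float)
    z_maj = np.asarray(z_maj, dtype=float)
    n_min = len(z_min)
    if n_min < 2:
        raise ValueError("need at least two minority points to interpolate")

    # Step 1: for each minority point, the share of majority points among
    # its k nearest neighbours (over both classes) measures how hard it is.
    z_all = np.vstack([z_min, z_maj])
    is_maj = np.arange(len(z_all)) >= n_min
    d_all = np.linalg.norm(z_min[:, None, :] - z_all[None, :, :], axis=2)
    d_all[np.arange(n_min), np.arange(n_min)] = np.inf  # exclude self
    k_all = min(k, len(z_all) - 1)
    nn_all = np.argsort(d_all, axis=1)[:, :k_all]
    ratio = is_maj[nn_all].mean(axis=1)

    # Step 2: normalise the ratios into sampling weights; fall back to
    # uniform weights if no minority point has majority neighbours.
    if ratio.sum() == 0:
        weights = np.full(n_min, 1.0 / n_min)
    else:
        weights = ratio / ratio.sum()
    counts = rng.multinomial(n_new, weights)

    # Step 3: interpolate each chosen point towards a random one of its
    # k nearest minority neighbours.
    d_min = np.linalg.norm(z_min[:, None, :] - z_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    k_min = min(k, n_min - 1)
    nn_min = np.argsort(d_min, axis=1)[:, :k_min]

    synthetic = []
    for i, count in enumerate(counts):
        for _ in range(count):
            j = rng.choice(nn_min[i])
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append(z_min[i] + lam * (z_min[j] - z_min[i]))
    return np.array(synthetic).reshape(-1, z_min.shape[1])
```

Because hard-to-learn minority points (those surrounded by majority points) receive larger weights, more synthetic examples are placed near the class boundary than plain uniform interpolation would produce.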

