DROCC Code #196
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,86 @@ | ||
# Deep Robust One-Class Classification | ||
In this directory we present examples of how to use the `DROCCTrainer` to replicate the results in the [DROCC paper](https://proceedings.icml.cc/book/4293.pdf). | ||
|
||
`DROCCTrainer` is part of the `edgeml_pytorch` package. Please install the `edgeml_pytorch` package as follows: | ||
``` | ||
git clone https://github.com/microsoft/EdgeML | ||
cd EdgeML/pytorch | ||
pip install -r requirements-gpu.txt | ||
pip install -e . | ||
``` | ||
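To confirm that the editable install worked before running the examples, a quick check along these lines may help. This is only a sketch: `is_installed` is a hypothetical helper, and it relies solely on the standard-library `importlib` to test whether the `edgeml_pytorch` package named above can be found.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if `package_name` can be found on the current Python path."""
    return importlib.util.find_spec(package_name) is not None


if __name__ == "__main__":
    # `edgeml_pytorch` is the package installed by `pip install -e .` above.
    status = "found" if is_installed("edgeml_pytorch") else "NOT found"
    print(f"edgeml_pytorch: {status}")
```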
|
||
## Tabular Experiments | ||
Data is expected in the following format: | ||
``` | ||
train_data.npy: features of train data | ||
test_data.npy: features of test data | ||
train_labels.npy: labels for train data (Normal Class Labelled as 1) | ||
test_labels.npy: labels for test data | ||
``` | ||
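Before launching an experiment it can be useful to check that a data directory follows this layout. The sketch below assumes the four file names listed above and the convention that normal samples are labelled 1; `load_drocc_tabular` is a hypothetical helper, not part of `edgeml_pytorch`.

```python
import os

import numpy as np


def load_drocc_tabular(root_data: str) -> dict:
    """Load the four .npy files and run basic consistency checks."""
    names = ["train_data", "train_labels", "test_data", "test_labels"]
    arrays = {n: np.load(os.path.join(root_data, n + ".npy")) for n in names}

    # Each split must have exactly one label per sample.
    assert len(arrays["train_data"]) == len(arrays["train_labels"])
    assert len(arrays["test_data"]) == len(arrays["test_labels"])
    # The training split should contain only the normal class (label 1).
    assert np.all(arrays["train_labels"] == 1)
    return arrays
```

Calling `load_drocc_tabular("root_data")` returns a dictionary keyed by file stem, or raises an `AssertionError` if the shapes or training labels look wrong.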
|
||
### Arrhythmia and Thyroid | ||
* Download the datasets from the ODDS Repository, [Arrhythmia](http://odds.cs.stonybrook.edu/arrhythmia-dataset/) and [Thyroid](http://odds.cs.stonybrook.edu/annthyroid-dataset/). This will consist of `arrhythmia.mat` or `annthyroid.mat`. | ||
* The data is divided for training as presented in previous works: [DAGMM](https://openreview.net/forum?id=BJJLHbb0-) and [GOAD](https://openreview.net/forum?id=H1lK_lBtvS). | ||
* To generate the training and test data, use the `data_process_scripts/process_odds.py` script as follows | ||
``` | ||
python data_process_scripts/process_odds.py -d <path/to/downloaded_data/file_name.mat> -o <output path> | ||
``` | ||
The output path is referred to as "root_data" in the following section. | ||
|
||
### Abalone | ||
* Download the `abalone.data` file from the UCI Repository [here](http://archive.ics.uci.edu/ml/datasets/Abalone). | ||
* To generate the training and test data, use the `data_process_scripts/process_abalone.py` script as follows | ||
``` | ||
python data_process_scripts/process_abalone.py -d <path/to/data/abalone.data> -o <output path> | ||
``` | ||
The output path is referred to as "root_data" in the following section. | ||
|
||
### Command to run experiments to reproduce results | ||
#### Arrhythmia | ||
``` | ||
python3 main_tabular.py --hd 128 --lr 0.0001 --lamda 1 --gamma 2 --ascent_step_size 0.001 --radius 16 --batch_size 256 --epochs 200 --optim 0 --restore 0 --metric F1 -d "root_data" | ||
``` | ||
|
||
#### Thyroid | ||
``` | ||
python3 main_tabular.py --hd 128 --lr 0.001 --lamda 1 --gamma 2 --ascent_step_size 0.001 --radius 2.5 --batch_size 256 --epochs 100 --optim 0 --restore 0 --metric F1 -d "root_data" | ||
``` | ||
|
||
#### Abalone | ||
``` | ||
python3 main_tabular.py --hd 128 --lr 0.001 --lamda 1 --gamma 2 --ascent_step_size 0.001 --radius 3 --batch_size 256 --epochs 200 --optim 0 --restore 0 --metric F1 -d "root_data" | ||
``` | ||
Review comment: was able to follow instructions and run.
Review comment: added annotation on number of train samples and test samples.
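The three tabular commands differ only in learning rate, radius and number of epochs, so they can be driven from one small script. The following is a sketch under assumptions: `build_command` and the `data/<name>` directory layout are hypothetical, while the flag names and values are copied from the commands above.

```python
import subprocess
import sys

# Only the values that differ between the three tabular commands are listed here.
EXPERIMENTS = {
    "arrhythmia": {"lr": "0.0001", "radius": "16", "epochs": "200"},
    "thyroid": {"lr": "0.001", "radius": "2.5", "epochs": "100"},
    "abalone": {"lr": "0.001", "radius": "3", "epochs": "200"},
}


def build_command(name: str, root_data: str) -> list:
    """Assemble the main_tabular.py invocation for one dataset."""
    p = EXPERIMENTS[name]
    return [
        sys.executable, "main_tabular.py",
        "--hd", "128", "--lr", p["lr"], "--lamda", "1", "--gamma", "2",
        "--ascent_step_size", "0.001", "--radius", p["radius"],
        "--batch_size", "256", "--epochs", p["epochs"],
        "--optim", "0", "--restore", "0", "--metric", "F1",
        "-d", root_data,
    ]


if __name__ == "__main__":
    for name in EXPERIMENTS:
        # Hypothetical layout: one processed data directory per dataset.
        subprocess.run(build_command(name, f"data/{name}"), check=True)
```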
|
||
|
||
## Time-Series Experiments | ||
|
||
### Data Processing | ||
### Epilepsy | ||
* Download the dataset from the UCI Repository [here](https://archive.ics.uci.edu/ml/datasets/Epileptic+Seizure+Recognition). This will consist of a `data.csv` file. | ||
* To generate the training and test data, use the `data_process_scripts/process_epilepsy.py` script as follows: | ||
|
||
``` | ||
python data_process_scripts/process_epilepsy.py -d <path/to/data/data.csv> -o <output path> | ||
``` | ||
The output path is referred to as "root_data" in the following section. | ||
|
||
|
||
### Example Usage for Epilepsy Dataset | ||
``` | ||
python3 main_timeseries.py --hd 128 --lr 0.00001 --lamda 0.5 --gamma 2 --ascent_step_size 0.1 --radius 10 --batch_size 256 --epochs 200 --optim 0 --restore 0 --metric AUC -d "root_data" | ||
``` | ||
|
||
|
||
## CIFAR Experiments | ||
``` | ||
python3 main_cifar.py --lamda 1 --radius 8 --lr 0.001 --gamma 1 --ascent_step_size 0.001 --batch_size 256 --epochs 40 --optim 0 --normal_class 0 | ||
|
||
``` | ||
|
||
|
||
### Arguments Detail | ||
* `normal_class`: CIFAR10 class to be considered as normal | ||
* `lamda`: weight given to the loss from adversarially sampled negative points (\mu in the paper) | ||
* `radius`: radius corresponding to the definition of the set N_i(r) | ||
* `hd`: LSTM hidden dimension | ||
* `optim`: optimizer, 0: Adam, 1: SGD(M) | ||
* `ascent_step_size`: step size for gradient ascent to generate adversarial anomalies | ||
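To make `radius`, `gamma` and `ascent_step_size` more concrete, here is a rough NumPy sketch of a single search step for an adversarial point. It assumes a simplified reading in which the candidate must lie in the shell around a training point x_i whose distance from x_i is between `radius` and `gamma * radius`. The `grad` argument stands in for the gradient of the network loss, and `ascent_and_project` is a hypothetical name; this is an illustration, not the `DROCCTrainer` implementation.

```python
import numpy as np


def ascent_and_project(x, h, grad, ascent_step_size, radius, gamma):
    """Take one normalized gradient-ascent step on the perturbation h, then project it."""
    # Move h along the normalized gradient direction (grad is assumed to come from the model loss).
    h = h + ascent_step_size * grad / (np.linalg.norm(grad) + 1e-12)
    # Rescale h so that radius <= ||h|| <= gamma * radius.
    norm_h = np.linalg.norm(h)
    target_norm = np.clip(norm_h, radius, gamma * radius)
    h = h * (target_norm / (norm_h + 1e-12))
    # The candidate adversarial point is the original point plus the projected perturbation.
    return x + h, h
```

A perturbation that is too short is pushed out to norm `radius`, one that is too long is pulled back to norm `gamma * radius`, and anything already inside the shell is left unchanged.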
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,35 @@ | ||
import os | ||
import pandas as pd | ||
import numpy as np | ||
import argparse | ||
|
||
parser = argparse.ArgumentParser(description='Preprocess Abalone Data') | ||
parser.add_argument('-d', '--data_path', type=str, default='./abalone.data') | ||
parser.add_argument('-o', '--output_path', type=str, default='.') | ||
args = parser.parse_args() | ||
|
||
data = pd.read_csv(args.data_path, header=None, sep=',') | ||
|
||
data = data.rename(columns={8: 'y'}) | ||
|
||
# Normal class: 8, 9 or 10 rings (temporarily marked -1); anomalies: 3 or 21 rings (marked 0) | ||
data['y'] = data['y'].replace([8, 9, 10], -1) | ||
data['y'] = data['y'].replace([3, 21], 0) | ||
# Encode the categorical sex column (M/F/I) as integers; assignment avoids chained in-place replace | ||
data[0] = data[0].map({'M': 0, 'F': 1, 'I': 2}) | ||
|
||
test = data[data['y'] == 0] | ||
num_normal_samples_test = test.shape[0] | ||
|
||
normal = data[data['y'] == -1].sample(frac=1) | ||
|
||
test_data = np.concatenate((test.drop('y', axis=1), normal[:num_normal_samples_test].drop('y', axis=1)), axis=0) | ||
train = normal[num_normal_samples_test:] | ||
train_data = train.drop('y', axis=1).values | ||
train_labels = train['y'].replace(-1, 1) | ||
test_labels = np.concatenate((test['y'], normal[:num_normal_samples_test]['y'].replace(-1, 1)), axis=0) | ||
|
||
np.save(os.path.join(args.output_path,'train_data.npy'), train_data) | ||
np.save(os.path.join(args.output_path,'train_labels.npy'), train_labels) | ||
np.save(os.path.join(args.output_path,'test_data.npy'), test_data) | ||
np.save(os.path.join(args.output_path,'test_labels.npy'), test_labels) |
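The labelling rule applied by the script above can be summarised in a few lines. This is a sketch for illustration only; `abalone_label` is a hypothetical helper that mirrors the ring counts used in the script.

```python
def abalone_label(rings: int):
    """Map an abalone ring count to its DROCC label, or None if the sample is unused."""
    if rings in (8, 9, 10):
        return 1  # normal class
    if rings in (3, 21):
        return 0  # anomaly
    return None  # other ring counts end up in neither the train nor the test split


if __name__ == "__main__":
    for rings in (3, 8, 15, 21):
        print(rings, abalone_label(rings))
```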
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,155 @@ | ||
''' | ||
Code borrowed from https://github.com/lukasruff/Deep-SVDD-PyTorch | ||
''' | ||
from PIL import Image | ||
import numpy as np | ||
from random import sample | ||
from abc import ABC, abstractmethod | ||
import torch | ||
from torch.utils.data import Subset | ||
from torchvision.datasets import CIFAR10 | ||
import torchvision.transforms as transforms | ||
from torch.utils.data import DataLoader | ||
|
||
class BaseADDataset(ABC): | ||
    """Anomaly detection dataset base class.""" | ||
|
||
    def __init__(self, root: str): | ||
        super().__init__() | ||
        self.root = root  # root path to data | ||
|
||
        self.n_classes = 2  # binary problem: normal vs. outlier | ||
        self.normal_classes = None  # tuple with original class labels that define the normal class | ||
        self.outlier_classes = None  # tuple with original class labels that define the outlier class | ||
|
||
        self.train_set = None  # must be of type torch.utils.data.Dataset | ||
        self.test_set = None  # must be of type torch.utils.data.Dataset | ||
|
||
    @abstractmethod | ||
    def loaders(self, batch_size: int, shuffle_train=True, shuffle_test=False, num_workers: int = 0) -> ( | ||
            DataLoader, DataLoader): | ||
        """Implement data loaders of type torch.utils.data.DataLoader for train_set and test_set.""" | ||
        pass | ||
|
||
    def __repr__(self): | ||
        return self.__class__.__name__ | ||
|
||
class TorchvisionDataset(BaseADDataset): | ||
    """TorchvisionDataset class for datasets already implemented in torchvision.datasets.""" | ||
|
||
    def __init__(self, root: str): | ||
        super().__init__(root) | ||
|
||
    def loaders(self, batch_size: int, shuffle_train=True, shuffle_test=False, num_workers: int = 0) -> ( | ||
            DataLoader, DataLoader): | ||
        train_loader = DataLoader(dataset=self.train_set, batch_size=batch_size, shuffle=shuffle_train, | ||
                                  num_workers=num_workers) | ||
        test_loader = DataLoader(dataset=self.test_set, batch_size=batch_size, shuffle=shuffle_test, | ||
                                 num_workers=num_workers) | ||
        return train_loader, test_loader | ||
|
||
class CIFAR10_Dataset(TorchvisionDataset): | ||
|
||
    def __init__(self, root: str, normal_class=5): | ||
        super().__init__(root) | ||
|
||
        self.n_classes = 2  # 1: normal, 0: outlier (see target_transform below) | ||
        self.normal_classes = tuple([normal_class]) | ||
        self.outlier_classes = list(range(0, 10)) | ||
        self.outlier_classes.remove(normal_class) | ||
|
||
        # Pre-computed min and max values (after applying GCN) from train data per class | ||
        # min_max = [(-28.94083453598571, 13.802961825439636), | ||
        #            (-6.681770233365245, 9.158067708230273), | ||
        #            (-34.924463588638204, 14.419298165027628), | ||
        #            (-10.599172931391799, 11.093187820377565), | ||
        #            (-11.945022995801637, 10.628045447867583), | ||
        #            (-9.691969487694928, 8.948326776180823), | ||
        #            (-9.174940012342555, 13.847014686472365), | ||
        #            (-6.876682005899029, 12.282371383343161), | ||
        #            (-15.603507135507172, 15.2464923804279), | ||
        #            (-6.132882973622672, 8.046098172351265)] | ||
        # CIFAR-10 preprocessing: convert to tensor and normalize each channel by a fixed mean and std | ||
        transform = transforms.Compose([transforms.ToTensor(), | ||
                                        transforms.Normalize(mean=[0.4914, 0.4822, 0.4465], | ||
                                                             std=[0.247, 0.243, 0.261])]) | ||
|
||
        # Binary target: 1 for the normal class, 0 for every outlier class | ||
        target_transform = transforms.Lambda(lambda x: int(x not in self.outlier_classes)) | ||
|
||
        train_set = MyCIFAR10(root=self.root, train=True, download=True, | ||
                              transform=transform, target_transform=target_transform) | ||
|
||
        # Subset train set to normal class | ||
        train_idx_normal = get_target_label_idx(train_set.targets, self.normal_classes) | ||
        # train_idx_normal_train = sample(train_idx_normal, 4000) | ||
        # val_idx_normal = [x for x in train_idx_normal if x not in train_idx_normal_train] | ||
|
||
        # rest_train_classes = get_target_label_idx(train_set.train_labels, self.outlier_classes) | ||
        # rest_train_classes_subset = sample(rest_train_classes, 9000) | ||
        # val_idx = val_idx_normal + rest_train_classes_subset | ||
        self.train_set = Subset(train_set, train_idx_normal) | ||
        # self.test_set = Subset(train_set, val_idx) | ||
        self.test_set = MyCIFAR10(root=self.root, train=False, download=True, | ||
                                  transform=transform, target_transform=target_transform) | ||
|
||
|
||
class MyCIFAR10(CIFAR10): | ||
    """Torchvision CIFAR10 class with patch of __getitem__ method to also return the index of a data sample.""" | ||
|
||
    def __init__(self, *args, **kwargs): | ||
        super(MyCIFAR10, self).__init__(*args, **kwargs) | ||
|
||
    def __getitem__(self, index): | ||
        """Override the original method of the CIFAR10 class. | ||
        Args: | ||
            index (int): Index | ||
        Returns: | ||
            triple: (image, target, index) where target is index of the target class. | ||
        """ | ||
        img, target = self.data[index], self.targets[index] | ||
|
||
        # doing this so that it is consistent with all other datasets | ||
        # to return a PIL Image | ||
        img = Image.fromarray(img) | ||
|
||
        if self.transform is not None: | ||
            img = self.transform(img) | ||
|
||
        if self.target_transform is not None: | ||
            target = self.target_transform(target) | ||
|
||
        return img, target, index  # only line changed | ||
|
||
def get_target_label_idx(labels, targets): | ||
    """ | ||
    Get the indices of labels that are included in targets. | ||
    :param labels: array of labels | ||
    :param targets: list/tuple of target labels | ||
    :return: list with indices of target labels | ||
    """ | ||
    return np.argwhere(np.isin(labels, targets)).flatten().tolist() | ||
|
||
|
||
def global_contrast_normalization(x: torch.Tensor, scale='l2'): | ||
    """ | ||
    Apply global contrast normalization to tensor, i.e. subtract mean across features (pixels) and normalize by scale, | ||
    which is either the L1- or L2-norm across features (pixels). | ||
    Note this is a *per sample* normalization globally across features (and not across the dataset). | ||
    """ | ||
|
||
    assert scale in ('l1', 'l2') | ||
|
||
    n_features = int(np.prod(x.shape)) | ||
|
||
    mean = torch.mean(x)  # mean over all features (pixels) per sample | ||
    x -= mean | ||
|
||
    if scale == 'l1': | ||
        x_scale = torch.mean(torch.abs(x)) | ||
|
||
    if scale == 'l2': | ||
        x_scale = torch.sqrt(torch.sum(x ** 2)) / n_features | ||
|
||
    x /= x_scale | ||
|
||
    return x |
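The interplay between `get_target_label_idx` and the binary `target_transform` in this file can be seen on a toy label list. The sketch below copies the helper so that it runs on its own; the label values are made up for illustration and no CIFAR-10 download is needed.

```python
import numpy as np


def get_target_label_idx(labels, targets):
    """Return the indices of `labels` whose value appears in `targets`."""
    return np.argwhere(np.isin(labels, targets)).flatten().tolist()


labels = [0, 5, 3, 5, 9]  # toy CIFAR-10 class ids (illustrative values)
normal_class = 5
outlier_classes = [c for c in range(10) if c != normal_class]

# Indices of the samples that would be kept for training (normal class only).
train_idx_normal = get_target_label_idx(labels, (normal_class,))
# Same rule as the target_transform: 1 for the normal class, 0 for outliers.
binary_targets = [int(c not in outlier_classes) for c in labels]

print(train_idx_normal)  # [1, 3]
print(binary_targets)    # [0, 1, 0, 1, 0]
```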
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
import os | ||
import argparse | ||
import pandas as pd | ||
import numpy as np | ||
|
||
parser = argparse.ArgumentParser() | ||
parser.add_argument('-d', '--data_path', type=str, default='./data.csv') | ||
parser.add_argument('-o', '--output_path', type=str, default='.') | ||
args = parser.parse_args() | ||
|
||
data = pd.read_csv(args.data_path) | ||
|
||
data['y'] = data['y'].replace(1, 0) | ||
|
||
data['y'] = data['y'].replace([2, 3, 4, 5], 1) | ||
|
||
|
||
test = data[data['y'] == 0] | ||
normal = data[data['y'] == 1].sample(frac=1).reset_index(drop=True) | ||
|
||
test = pd.concat([test, normal.iloc[:2300]]) | ||
|
||
normal = normal.iloc[2300:] | ||
|
||
normal = normal.drop(['y', 'Unnamed: 0'], axis=1) | ||
np.save(os.path.join(args.output_path, 'train.npy'), normal.values) | ||
|
||
test = test.drop('Unnamed: 0', axis=1) | ||
test = test.sample(frac=1).reset_index(drop=True) | ||
|
||
labels = test['y'].values | ||
|
||
test = test.drop('y', axis=1).values | ||
np.save(os.path.join(args.output_path, 'test_data.npy'), test) | ||
np.save(os.path.join(args.output_path, 'test_labels.npy'), labels) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
import numpy as np | ||
|
||
train_f = np.load('train_seven.npz')['features'] # containing only the class marvin | ||
others_f = np.load('other_seven.npz')['features'] # containing classes other than marvin | ||
|
||
np.random.shuffle(train_f) | ||
np.random.shuffle(others_f) | ||
|
||
len_train = int(0.8 * len(train_f))  # slice bounds must be integers | ||
len_test = len(train_f) - len_train | ||
|
||
data = train_f[:len_train] | ||
np.save('train.npy', data) | ||
|
||
# Take len_test samples from the other classes (slice start assumed to be len_train) | ||
test_data = np.concatenate((train_f[len_train:], others_f[len_train:len_train+len_test]), axis=0) | ||
labels = [1] * len_test + [0] * len_test | ||
np.save('test_data.npy', test_data) | ||
np.save('test_labels.npy', labels) |
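The 80/20 split performed by this script can be tried on synthetic arrays without the `.npz` feature files. This is a sketch under assumptions: `split_keyword_features` is a hypothetical helper, the feature shapes are invented, and it takes the first `len_test` shuffled rows of the other-class array rather than reproducing the exact slice used above.

```python
import numpy as np


def split_keyword_features(target_f, others_f, train_frac=0.8, seed=0):
    """Split target-class features into train/test and pad the test set with other classes."""
    rng = np.random.default_rng(seed)
    target_f = rng.permutation(target_f)  # shuffles rows, returns a copy
    others_f = rng.permutation(others_f)

    len_train = int(train_frac * len(target_f))  # integer so it can be used as a slice bound
    len_test = len(target_f) - len_train
    assert len(others_f) >= len_test, "not enough other-class samples for a balanced test set"

    train = target_f[:len_train]
    test_data = np.concatenate((target_f[len_train:], others_f[:len_test]), axis=0)
    # Target class is labelled 1 (normal), other classes 0 (anomalous).
    test_labels = np.array([1] * len_test + [0] * len_test)
    return train, test_data, test_labels
```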
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
import os | ||
import numpy as np | ||
from scipy.io import loadmat | ||
import argparse | ||
|
||
parser = argparse.ArgumentParser(description='Preprocess Dataset from ODDS Repository') | ||
parser.add_argument('-d', '--data_path', type=str, default='./arrhythmia.mat') | ||
parser.add_argument('-o', '--output_path', type=str, default='.') | ||
args = parser.parse_args() | ||
|
||
dataset = loadmat(args.data_path) | ||
|
||
data = np.concatenate((dataset['X'], dataset['y']), axis=1) | ||
|
||
test = data[data[:,-1] == 1] | ||
num_normal_samples_test = test.shape[0] | ||
|
||
normal = data[data[:,-1] == 0] | ||
np.random.shuffle(normal) | ||
|
||
test = np.concatenate((test, normal[:num_normal_samples_test]), axis=0) | ||
|
||
train = normal[num_normal_samples_test:] | ||
train_data = train[:,:-1] | ||
# DROCC requires normal data to be labelled 1 | ||
train_labels = np.ones(train_data.shape[0]) | ||
|
||
test_data = test[:,:-1] | ||
# DROCC requires normal data to be labelled 1 and anomalies 0 | ||
test_labels = np.concatenate(( | ||
np.zeros(num_normal_samples_test), np.ones(num_normal_samples_test)), | ||
axis=0) | ||
|
||
np.save(os.path.join(args.output_path,'train_data.npy'), train_data) | ||
np.save(os.path.join(args.output_path,'train_labels.npy'), train_labels) | ||
np.save(os.path.join(args.output_path,'test_data.npy'), test_data) | ||
np.save(os.path.join(args.output_path,'test_labels.npy'), test_labels) |
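The split logic of this script can be checked on synthetic data before running it on a real `.mat` file. The sketch below assumes the ODDS convention used above (`y == 1` marks an outlier) and that there are more inliers than outliers; `one_class_split` is a hypothetical helper.

```python
import numpy as np


def one_class_split(X, y, seed=0):
    """Build a normal-only train set and a balanced test set (DROCC labels: 1 normal, 0 anomaly)."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y).ravel()

    anomalies = X[y == 1]
    normal = rng.permutation(X[y == 0])  # shuffled copy of the inliers
    n = len(anomalies)

    # All anomalies go to the test set, together with the same number of normal samples.
    test_data = np.concatenate((anomalies, normal[:n]), axis=0)
    test_labels = np.concatenate((np.zeros(n), np.ones(n)))

    # The remaining normal samples form the training set.
    train_data = normal[n:]
    train_labels = np.ones(len(train_data))
    return train_data, train_labels, test_data, test_labels
```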
add instructions to install drocc trainer from ROOT/pytorch
done