COLLIDER: A Robust Training Framework for Backdoor Data

Hadi M. Dolatabadi, Sarah Erfani, and Christopher Leckie 2022

arXiv | License: MIT

This repository contains the official PyTorch implementation of the ACCV 2022 paper COLLIDER: A Robust Training Framework for Backdoor Data.

Abstract: Deep neural network (DNN) classifiers are vulnerable to backdoor attacks. In such attacks, an adversary poisons some of the training data by installing a trigger. The goal is to make the trained DNN output the attacker's desired class whenever the trigger is activated, while performing as usual on clean data. Various approaches have recently been proposed to detect malicious backdoored DNNs. However, a robust, end-to-end training approach for backdoor-poisoned data, akin to adversarial training, has yet to be developed. In this paper, we take the first step toward such methods by developing a robust training framework, COLLIDER, that selects the most prominent samples by exploiting the underlying geometric structures of the data. Specifically, we effectively filter out candidate poisoned data at each training epoch by solving a geometrical coreset selection objective. We first argue that clean data samples exhibit (1) gradients similar to those of the clean majority of the data and (2) low local intrinsic dimensionality (LID). Based on these criteria, we define a novel coreset selection objective to find such samples, which are then used to train the DNN. We show the effectiveness of the proposed method for robust training of DNNs on various poisoned datasets, reducing the backdoor success rate significantly.
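For readers unfamiliar with the LID criterion mentioned above, LID is commonly estimated per sample with a maximum-likelihood estimator over the distances to its k nearest neighbors in feature space. The sketch below illustrates that estimator only; the function name, batch-wise neighbor search, and use of torch.cdist are our assumptions and not necessarily how this repository computes LID.

import torch

def estimate_lid(features, k=20, eps=1e-12):
    """Batch-wise maximum-likelihood LID estimate for each row of `features`.

    features: (N, D) tensor of feature representations (e.g., penultimate-layer
              activations of a mini-batch), with N > k.
    Returns an (N,) tensor; lower values suggest the sample lies on a
    lower-dimensional local manifold, a property associated with clean data.
    """
    n = features.size(0)
    # Pairwise Euclidean distances within the batch.
    dists = torch.cdist(features, features)                      # (N, N)
    # Mask out the zero self-distances before taking nearest neighbors.
    dists = dists + torch.eye(n, device=features.device) * 1e12
    knn, _ = torch.topk(dists, k, dim=1, largest=False)          # (N, k), ascending
    r_max = knn[:, -1:].clamp_min(eps)                           # distance to the k-th neighbor
    # MLE of LID: -( (1/k) * sum_i log(r_i / r_k) )^(-1)
    log_ratios = torch.log(knn.clamp_min(eps) / r_max)
    return -1.0 / (log_ratios.mean(dim=1) - eps)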

Requirements

To install requirements:

pip install -r requirements.txt

Generating Poisoned Datasets

To generate poisoned datasets, use the data_poisoning.ipynb notebook. Alternatively, you can load your own poisoned dataset and train a model with main.py. To do so, locate the existing data loaders in main.py and plug in your own data-loading pipeline; see the sketch below.
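If you bring your own poisoned data, a common pattern is to wrap a clean torchvision-style dataset, stamp a trigger onto a subset of samples, and relabel them to the target class. The wrapper below is only an illustrative BadNets-style sketch: the class name, corner-patch trigger, and sampling over the whole training set are our assumptions, and the notebook's actual procedure (e.g., poisoning only the target class for clean-label attacks) may differ. It assumes the wrapped dataset already returns image tensors (e.g., via transforms.ToTensor()).

import numpy as np
from torch.utils.data import Dataset

class BadNetsPoisonedDataset(Dataset):
    """Wraps a clean dataset and applies a BadNets-style patch trigger to a
    fraction of the samples, relabeling them to `target_class`."""

    def __init__(self, clean_dataset, target_class=0, injection_rate=0.1,
                 patch_size=3, seed=0):
        self.dataset = clean_dataset
        self.target_class = target_class
        self.patch_size = patch_size
        rng = np.random.default_rng(seed)
        n = len(clean_dataset)
        n_poison = int(injection_rate * n)
        # Indices of the samples that will carry the trigger.
        self.poison_idx = set(rng.choice(n, size=n_poison, replace=False).tolist())

    def __len__(self):
        return len(self.dataset)

    def __getitem__(self, idx):
        img, label = self.dataset[idx]          # img: (C, H, W) tensor after transforms
        if idx in self.poison_idx:
            p = self.patch_size
            img = img.clone()
            img[:, -p:, -p:] = img.max()        # bright corner patch as a simple trigger
            label = self.target_class
        return img, label

An instance of such a wrapper can then be passed to a standard torch.utils.data.DataLoader in place of the clean training set.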

Training

To train a neural network using COLLIDER, specify the arguments and run the following command:

python main.py \
      --gpu <GPU_DEVICE> \
      --dataset <DATASET_NAME> \
      --backdoor <ATTACK_TYPE> \
      --injection_rate <POISONING_RATE> \
      --target_class <ATTACK_TARGET_CLASS> \
      --data_seed <DATASET_SEED> \
      --arch <MODEL_ARCHITECTURE> \
      --epochs <TOTAL_TRAINING_EPOCHS> \
      --batch-size <BATCH_SIZE> \
      --lr <SGD_LEARNING_RATE> \
      --wd <SGD_WEIGHT_DECAY> \
      --momentum <SGD_MOMENTUM> \
      --enable_coresets \
      --fl-ratio <CORESET_SIZE> \
      --lid_start_epoch <WHEN_TO_START_LID_REG> \
      --lid_overlap <LID_NUMBER_OF_NEAREST_NEIGHBORS> \
      --lid_batch_size <LID_BATCH_SIZE> \
      --lid-lambda <LID_LAGRANGE_MULTIPLIER> \
      --lid_hist <LID_MOVING_AVERAGE_WINDOW>

Parameters:

  • GPU_DEVICE — name of the GPU device
  • DATASET_NAME — dataset name [cifar10/svhn/imagenet12]
  • ATTACK_TYPE — backdoor attack type [badnets/cl/sig/htba/wanet/no_backdoor]
  • POISONING_RATE — the ratio of poisoned data in the target class (between 0 and 1)
  • ATTACK_TARGET_CLASS — target class of the backdoor attack
  • DATASET_SEED — dataset seed
  • MODEL_ARCHITECTURE — neural network architecture
  • TOTAL_TRAINING_EPOCHS — training epochs
  • BATCH_SIZE — training batch size
  • SGD_LEARNING_RATE — SGD optimizer learning rate
  • SGD_WEIGHT_DECAY — SGD optimizer weight decay
  • SGD_MOMENTUM — SGD optimizer momentum
  • CORESET_SIZE — size of the coreset (between 0 and 1)
  • WHEN_TO_START_LID_REG — epoch to start LID regularization
  • LID_NUMBER_OF_NEAREST_NEIGHBORS — number of nearest neighbors in LID computation
  • LID_BATCH_SIZE — batch size to compute LID
  • LID_LAGRANGE_MULTIPLIER — Lagrange multiplier to add LID to the coreset selection coefficients
  • LID_MOVING_AVERAGE_WINDOW — moving average window to average LID
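As a concrete illustration, a BadNets run on CIFAR-10 might look like the command below. The architecture string and all numeric values are placeholders chosen for illustration, not the exact hyperparameters reported in the paper; consult main.py for the accepted choices.

python main.py \
      --gpu 0 \
      --dataset cifar10 \
      --backdoor badnets \
      --injection_rate 0.1 \
      --target_class 0 \
      --data_seed 0 \
      --arch resnet18 \
      --epochs 120 \
      --batch-size 128 \
      --lr 0.1 \
      --wd 5e-4 \
      --momentum 0.9 \
      --enable_coresets \
      --fl-ratio 0.3 \
      --lid_start_epoch 20 \
      --lid_overlap 20 \
      --lid_batch_size 128 \
      --lid-lambda 0.1 \
      --lid_hist 5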

Results

The primary results of this work are given in the table below. For each attack, we compare vanilla training, training with gradient-based coresets only, and the full COLLIDER objective. As shown, COLLIDER reduces the threat of backdoor attacks significantly.

Clean test accuracy (ACC) and attack success rate (ASR), in %, for backdoor data poisoning on the CIFAR-10 (BadNets, label-consistent, and WaNet) and SVHN (sinusoidal strips) datasets. Results are the mean and standard deviation over 5 different seeds. The poisoned data injection rate is 10% for BadNets, label-consistent, and sinusoidal strips, and 40% for WaNet. The coreset size is 0.3 for BadNets and label-consistent attacks, and 0.4 for WaNet and sinusoidal strips.

| Backdoor Attack   | Data     | Training | ACC (%)    | ASR (%)    |
|-------------------|----------|----------|------------|------------|
| BadNets           | CIFAR-10 | Vanilla  | 92.19±0.20 | 99.98±0.02 |
| BadNets           | CIFAR-10 | Coresets | 84.86±0.47 | 74.93±34.6 |
| BadNets           | CIFAR-10 | COLLIDER | 80.66±0.95 | 4.80±1.49  |
| Label-Consistent  | CIFAR-10 | Vanilla  | 92.46±0.16 | 100        |
| Label-Consistent  | CIFAR-10 | Coresets | 83.87±0.36 | 7.78±9.64  |
| Label-Consistent  | CIFAR-10 | COLLIDER | 82.11±0.62 | 5.19±1.08  |
| WaNet             | CIFAR-10 | Vanilla  | 91.63±0.28 | 92.24±1.74 |
| WaNet             | CIFAR-10 | Coresets | 86.04±0.89 | 5.73±2.78  |
| WaNet             | CIFAR-10 | COLLIDER | 84.27±0.55 | 4.29±2.54  |
| Sinusoidal Strips | SVHN     | Vanilla  | 95.79±0.20 | 77.35±3.68 |
| Sinusoidal Strips | SVHN     | Coresets | 92.30±0.19 | 24.30±8.15 |
| Sinusoidal Strips | SVHN     | COLLIDER | 89.74±0.31 | 6.20±3.69  |

Acknowledgement

This repository is mainly built upon CRUST; we thank its authors.

Citation

If you have found our code or paper beneficial to your research, please consider citing it as:

@inproceedings{dolatabadi2022collider,
  title={COLLIDER: A Robust Training Framework for Backdoor Data},
  author={Hadi Mohaghegh Dolatabadi and Sarah Erfani and Christopher Leckie},
  booktitle={Proceedings of the Asian Conference on Computer Vision ({ACCV})},
  year={2022}
}