Machine Learning Project

├── data
│   ├── clean_test_data.h5
│   ├── clean_validation_data.h5
│   ├── eyebrows_poisoned_data.h5
│   └── sunglasses_poisoned_data.h5
├── fine_pruning
│   ├── data
│   │   ├── clean_test_data.h5
│   │   ├── clean_validation_data.h5
│   │   ├── eyebrows_poisoned_data.h5
│   │   └── sunglasses_poisoned_data.h5
│   ├── models
│   │   ├── anonymous_bd_net.h5
│   │   ├── anonymous_bd_weights.h5
│   │   ├── multi_trigger_multi_target_bd_net.h5
│   │   ├── multi_trigger_multi_target_bd_weights.h5
│   │   ├── pruned_anonymous_bd_net.h5
│   │   ├── pruned_multi_trigger_multi_target_bd_net.h5
│   │   ├── pruned_sunglasses_bd_net.h5
│   │   ├── sunglasses_bd_net.h5
│   │   └── sunglasses_bd_weights.h5
│   ├── prune_anonymous.py
│   ├── prune_multitrgger.py
│   └── prune_sunglasses.py
├── gangsweep
│   ├── data
│   │   ├── clean_validation_data.h5
│   │   ├── clean_test_data.h5
│   │   ├── sunglasses_poisoned_data.h5
│   │   └── eyebrows_poisoned_data.h5
│   ├── models
│   │   ├── anonymous_bd_net.h5
│   │   ├── anonymous_bd_weights.h5
│   │   ├── sunglasses_bd_net.h5
│   │   ├── sunglasses_bd_weights.h5
│   │   ├── multi_trigger_multi_target_bd_net.h5
│   │   └── multi_trigger_multi_target_bd_weights.h5
│   └── train_gan.py
├── models
│   ├── anonymous_bd_net.h5
│   ├── anonymous_bd_weights.h5
│   ├── multi_trigger_multi_target_bd_net.h5
│   ├── multi_trigger_multi_target_bd_weights.h5
│   ├── sunglasses_bd_net.h5
│   └── sunglasses_bd_weights.h5
├── neural_cleanse
│   ├── data
│   │   ├── clean_test_data.h5
│   │   ├── clean_validation_data.h5
│   │   ├── eyebrows_poisoned_data.h5
│   │   └── sunglasses_poisoned_data.h5
│   ├── models
│   │   ├── anonymous_bd_net.h5
│   │   ├── anonymous_bd_weights.h5
│   │   ├── multi_trigger_multi_target_bd_net.h5
│   │   ├── multi_trigger_multi_target_bd_weights.h5
│   │   ├── sunglasses_bd_net.h5
│   │   └── sunglasses_bd_weights.h5
│   ├── triggers
│   │   ├── anonymous.h5
│   │   ├── multitrigger.h5
│   │   └── sunglasses.h5
│   ├── prune_anonymous.py
│   ├── prune_multitrgger.py
│   ├── prune_sunglasses.py
│   ├── train_trigger_anonymous.py
│   ├── train_trigger_multitrigger.py
│   ├── train_trigger_sunglasses.py
│   ├── trigger_detection.py
│   └── trigger_generation.py
├── retrain_with_pois
│   ├── data
│   │   ├── clean_test_data.h5
│   │   ├── clean_validation_data.h5
│   │   ├── eyebrows_poisoned_data.h5
│   │   └── sunglasses_poisoned_data.h5
│   ├── models
│   │   ├── anonymous_bd_net.h5
│   │   ├── anonymous_bd_weights.h5
│   │   ├── multi_trigger_multi_target_bd_net.h5
│   │   ├── multi_trigger_multi_target_bd_weights.h5
│   │   ├── retrain_with_pois_sunglasses_bd_net_temp.h5
│   │   ├── sunglasses_bd_net.h5
│   │   └── sunglasses_bd_weights.h5
│   └── retrain_with_pois.py
├── architecture.py
├── eval.py // this is the evaluation script
└── model_architecture.png // this is the architecture of the model

Technical Details

1. Retraining with clean data + poisoned data

One of the simpler methods to mitigate a backdoored model, when the poisoned dataset is known and available, is to retrain the model on the clean data together with the poisoned data, labeling every poisoned sample as an additional class N+1. In this project the method can only be applied to the sunglasses BadNet.
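
A minimal sketch of this approach in Keras is shown below. It assumes the clean and sunglasses-poisoned .h5 files have already been loaded into numpy arrays x_clean, y_clean, and x_pois, and that the penultimate layer of the loaded network feeds the softmax head; the hyperparameters and variable names are illustrative and not the exact contents of retrain_with_pois.py.

```python
import numpy as np
import tensorflow as tf

# Load the backdoored sunglasses BadNet.
model = tf.keras.models.load_model("models/sunglasses_bd_net.h5")
num_classes = model.output_shape[-1]          # the original N labels

# Relabel every poisoned image as the extra class N+1 (index num_classes).
y_pois = np.full(len(x_pois), num_classes, dtype=np.int64)
x_train = np.concatenate([x_clean, x_pois])
y_train = np.concatenate([y_clean, y_pois])

# Widen the classifier head from N to N+1 outputs (assumes layers[-2] feeds the
# softmax head), then retrain on the mixed clean + poisoned data.
features = model.layers[-2].output
outputs = tf.keras.layers.Dense(num_classes + 1, activation="softmax")(features)
repaired = tf.keras.Model(model.input, outputs)
repaired.compile(optimizer="adam",
                 loss="sparse_categorical_crossentropy",
                 metrics=["accuracy"])
repaired.fit(x_train, y_train, epochs=5, batch_size=64, shuffle=True)
```

After retraining, inputs carrying the trigger should land in the extra class, so they can be rejected at inference time.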

2. Neural Cleanse

In a normal neural network without any backdoors, there is some minimum perturbation (Δ) required to misclassify inputs of one label as a different label. The attacker's aim is to make this minimum Δ much smaller for the target label, which is achieved by stamping a trigger onto training data and training the model to classify images containing the trigger as the target label. Our aim is to detect whether such an abnormally small trigger exists that misclassifies all other labels as the target label and, if so, to identify and reverse engineer it.
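
The core of the method is an optimization over a trigger mask and pattern for each candidate target label. Below is a minimal sketch, assuming a Keras model and a batch of clean float32 images x_clean scaled to [0, 1]; target_label, lam, and the optimizer settings are illustrative assumptions rather than the exact values used in the neural_cleanse scripts.

```python
import tensorflow as tf

model = tf.keras.models.load_model("models/sunglasses_bd_net.h5")
h, w, c = model.input_shape[1:]

mask = tf.Variable(tf.zeros((h, w, 1)))      # where the trigger is applied
pattern = tf.Variable(tf.zeros((h, w, c)))   # what the trigger looks like
opt = tf.keras.optimizers.Adam(0.1)
lam = 0.01                                   # weight of the mask-size penalty
target_label = 0                             # repeated for every candidate label

def reverse_engineer(x_clean, steps=1000):
    x_clean = tf.convert_to_tensor(x_clean, tf.float32)
    y_target = tf.fill([x_clean.shape[0]], target_label)
    for _ in range(steps):
        with tf.GradientTape() as tape:
            m = tf.sigmoid(mask)             # keep the mask in [0, 1]
            p = tf.sigmoid(pattern)          # keep the pattern in valid pixel range
            x_adv = (1.0 - m) * x_clean + m * p
            probs = model(x_adv, training=False)
            ce = tf.keras.losses.sparse_categorical_crossentropy(y_target, probs)
            # Cross-entropy pushes triggered images to the target label;
            # the L1 term keeps the reverse-engineered trigger small.
            loss = tf.reduce_mean(ce) + lam * tf.reduce_sum(tf.abs(m))
        grads = tape.gradient(loss, [mask, pattern])
        opt.apply_gradients(zip(grads, [mask, pattern]))
    return tf.sigmoid(mask).numpy(), tf.sigmoid(pattern).numpy()
```

Running this optimization once per candidate label and flagging labels whose recovered mask has an anomalously small L1 norm (for example via a median-absolute-deviation outlier test) is how the infected target label is identified.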

3. Gangsweep

GangSweep was selected because the attacker and the defender are respectively trying to maximize and minimize the attack success rate of the backdoor in the network, which fits the GAN philosophy very well. Here the generator learns to produce masks (perturbations) that maximize the attack success rate. The discriminator is not trained in parallel as in a traditional GAN; instead it is retrained only in the later stages of the implementation.
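
The sketch below shows one generator update against the frozen backdoored model, which plays the role of the discriminator here. The generator architecture, loss weights, and variable names are illustrative assumptions and not the code in train_gan.py.

```python
import tensorflow as tf

bd_model = tf.keras.models.load_model("models/sunglasses_bd_net.h5")
bd_model.trainable = False                     # the backdoored model stays fixed

def build_generator(input_shape):
    # Small convolutional network that maps a clean image to a perturbation mask.
    inp = tf.keras.Input(shape=input_shape)
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    x = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    out = tf.keras.layers.Conv2D(input_shape[-1], 3, padding="same",
                                 activation="tanh")(x)
    return tf.keras.Model(inp, out)

generator = build_generator(bd_model.input_shape[1:])
opt = tf.keras.optimizers.Adam(1e-3)
target_label = 0                               # swept over all candidate labels
l2_weight = 0.01                               # keeps the perturbation small

def generator_step(x_clean):
    x_clean = tf.convert_to_tensor(x_clean, tf.float32)
    y_target = tf.fill([x_clean.shape[0]], target_label)
    with tf.GradientTape() as tape:
        delta = generator(x_clean, training=True)
        x_adv = tf.clip_by_value(x_clean + delta, 0.0, 1.0)
        probs = bd_model(x_adv, training=False)
        # Push triggered images towards the target label (maximizing the attack
        # success rate) while keeping the generated perturbation small.
        attack_loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(y_target, probs))
        loss = attack_loss + l2_weight * tf.reduce_mean(tf.square(delta))
    grads = tape.gradient(loss, generator.trainable_variables)
    opt.apply_gradients(zip(grads, generator.trainable_variables))
    return loss
```

The backdoored model itself is only retrained (patched) afterwards, which is the "discriminator retrained in the later stages" mentioned above.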

4. Pruning and retraining with clean data

Another straightforward method to mitigate backdoors is fine-pruning: pruning the model and then retraining it with clean data in order to remove the trigger behaviour. Every neuron produces some activation when a prediction is run, and in a backdoored network a few neurons activate only in the presence of the backdoor trigger. We use a clean dataset to find the neurons that stay dormant on clean inputs, prune them out, and then fine-tune on clean data, which removes the backdoor from the model. This may not work against pruning-aware attacks, which are much more sophisticated because the neurons that activate for the backdoor are the same neurons that activate for the clean data.
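
A minimal sketch of the pruning step, assuming the last convolutional layer of the loaded Keras model is named "conv_3" and that clean validation arrays x_val and y_val are available; the layer name and the pruning fraction are illustrative assumptions, not the values used in the prune_*.py scripts.

```python
import numpy as np
import tensorflow as tf

model = tf.keras.models.load_model("models/sunglasses_bd_net.h5")
layer = model.get_layer("conv_3")            # assumed name of the last conv layer

# Measure each channel's mean activation over the clean validation set.
feature_model = tf.keras.Model(model.input, layer.output)
acts = feature_model.predict(x_val, batch_size=64)   # shape (N, H, W, C)
mean_act = acts.mean(axis=(0, 1, 2))                  # one value per channel

# Zero out the channels that are least active on clean data; these are the
# neurons most likely to be dedicated to the backdoor trigger.
weights, biases = layer.get_weights()
prune_fraction = 0.6
n_prune = int(prune_fraction * len(mean_act))
for ch in np.argsort(mean_act)[:n_prune]:
    weights[..., ch] = 0.0
    biases[ch] = 0.0
layer.set_weights([weights, biases])

# Fine-tune briefly on clean data so clean accuracy recovers
# while the backdoor stays removed.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_val, y_val, epochs=5, batch_size=64)
```

In practice the pruning fraction is chosen by monitoring clean accuracy and the attack success rate on the poisoned test data as more channels are removed.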

References

  1. Liuwan Zhu, Rui Ning, Cong Wang, Chunsheng Xin, and Hongyi Wu. 2020. GangSweep: Sweep out Neural Backdoors by GAN. https://www.lions.odu.edu/~h1wu/paper/gangsweep.pdf
  2. Bolun Wang, Yuanshun Yao, Shawn Shan, Huiying Li, Bimal Viswanath, Haitao Zheng, and Ben Y. Zhao. 2019. Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks. In 2019 IEEE Symposium on Security and Privacy (SP). https://sites.cs.ucsb.edu/~bolunwang/assets/docs/backdoor-sp19.pdf
  3. Kang Liu, Brendan Dolan-Gavitt, and Siddharth Garg. 2018. Fine-Pruning: Defending Against Backdooring Attacks on Deep Neural Networks. https://arxiv.org/pdf/1805.12185.pdf
  4. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2014. Generative Adversarial Nets. In Advances in Neural Information Processing Systems. 2672–2680.
