SingleADV: Single-class Target-specific Attack Against Interpretable Deep Learning Systems

Deep learning techniques have achieved state-of-the-art performance across many domains. It is essential to verify that a model's high confidence on a task results from correctly modeling the underlying problem. Interpretation models, which help explain the inner workings of DNN models, therefore play a crucial role in developing deep learning systems. However, interpretations of deep learning models are susceptible to adversarial manipulation. We propose SingleADV, a single-class target-specific attack that generates a universal perturbation causing the target DNN model to misclassify an entire source category of objects as a chosen target category, in both white-box and black-box scenarios. At the same time, SingleADV misleads the interpretation model by producing adversarial attribution maps that resemble their benign counterparts. The attack also limits unintended fooling of samples from other categories.

Software Requirements

Foolbox 1.8.0 (https://github.com/bethgelab/foolbox)

The code is based on the CAM interpreter. The attack uses ResNet50 by default.

Data Downloading

Download the ImageNet ILSVRC2012 dataset (training and validation sets) from http://image-net.org/download. Registration is required before downloading. To set up the data for our code, create one directory per class, named by its wnid (WordNet ID), and inside each directory create two folders, one for training and one for testing. The training images go inside the training folder, and the validation images go inside the testing folder.

The testing images come from the ImageNet validation dataset, which provides 50 samples for each class. The training images come from the ImageNet training data; pick 50 samples from each class (wnid). Once the data is in this shape, open the code folder and edit the file config.ini so that it contains the relevant paths.
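The layout described above can be sketched as follows. This is a minimal illustration, not part of the repository: the function name `build_class_folders`, the folder names `training` and `testing`, and the assumption that `val_src` already holds only the validation images of the given class are all hypothetical and should be checked against what the code actually expects.

```python
import random
import shutil
from pathlib import Path


def build_class_folders(train_src, val_src, dataset_dir, wnid, n_train=50, seed=0):
    """Copy images for one wnid into <dataset_dir>/<wnid>/{training,testing}.

    train_src: folder with the ImageNet training images of this wnid.
    val_src:   folder assumed to hold only this wnid's validation images.
    """
    train_out = Path(dataset_dir) / wnid / "training"  # folder name is an assumption
    test_out = Path(dataset_dir) / wnid / "testing"    # folder name is an assumption
    train_out.mkdir(parents=True, exist_ok=True)
    test_out.mkdir(parents=True, exist_ok=True)

    # Pick a reproducible random subset of the class's training images.
    candidates = sorted(Path(train_src).glob("*.JPEG"))
    rng = random.Random(seed)
    for img in rng.sample(candidates, min(n_train, len(candidates))):
        shutil.copy2(img, train_out / img.name)

    # Copy every validation image of the class into the testing folder.
    for img in sorted(Path(val_src).glob("*.JPEG")):
        shutil.copy2(img, test_out / img.name)

    return train_out, test_out
```

Calling the function once per wnid listed in sourceIdentities and targetIdentities would produce the folder tree that datasetDir is meant to point at.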

Each field in config.ini is described below.

S/N	Field	Field Description
1	saveDir	Path of the directory where the results will be saved. The results are saved as “AdversarialAttackResults.db”.
2	datasetDir	Path of the dataset directory. It contains one folder per wnid, and each of those holds a training folder and a testing folder.
3	imageNetValidationDir	Path of the directory containing the ImageNet validation images (50,000 images).
4	imageNet2012ValidationGroundTruthFile	Path of the file “ILSVRC2012_validation_ground_truth.txt”. This comes with the ImageNet2012 validation dataset.
5	imageNet2012LabelMapFile	Path of the file “imagenet_2012_challenge_label_map_proto.pbtxt”. This comes with the ImageNet2012 validation dataset.
6	sourceIdentities	Comma-separated wnids to use as source classes. The data is picked based on these wnids and the path set in datasetDir.
7	targetIdentities	Comma-separated wnids to use as target classes.
8	attackModels	Comma-separated attack model IDs, identifying the deep models used to launch the targeted attack. See the attack model table below.
9	etas	Comma-separated values of eta, one for each algorithm ID.
10	algorithmId	Comma-separated algorithm IDs. The algorithms are launched one by one on each selected deep model, for each pair of source and target identities. See the algorithm table below.
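To make the field list concrete, the sketch below prints one possible config.ini using Python's standard `configparser`. The section name `Settings`, all paths, and the eta value are placeholders chosen for illustration; the config.ini shipped with the repository is the authoritative template, and its section name and value ranges may differ. The wnids shown are ordinary ImageNet class IDs used only as examples.

```python
import configparser
import io

# Placeholder values; replace every path and ID with your own.
FIELDS = {
    "saveDir": "/path/to/results",
    "datasetDir": "/path/to/dataset",
    "imageNetValidationDir": "/path/to/ILSVRC2012_img_val",
    "imageNet2012ValidationGroundTruthFile": "/path/to/ILSVRC2012_validation_ground_truth.txt",
    "imageNet2012LabelMapFile": "/path/to/imagenet_2012_challenge_label_map_proto.pbtxt",
    "sourceIdentities": "n01440764,n01443537",  # example source wnids
    "targetIdentities": "n01484850",            # example target wnid
    "attackModels": "2",                        # 2 = ResNet50
    "etas": "10",                               # placeholder, one eta per algorithm ID
    "algorithmId": "3",                         # 3 = LinfinityBounded
}

config = configparser.ConfigParser()
config.optionxform = str      # keep the camelCase field names unchanged
config["Settings"] = FIELDS   # hypothetical section name

buffer = io.StringIO()
config.write(buffer)
text = buffer.getvalue()
print(text)
```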

The algorithm IDs can be selected from the table below.

S/N	Algorithm Description	Algorithm ID
1	LinfinityBounded	3
2	L2Bounded	4
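The two algorithm names suggest that the perturbation is kept inside an L-infinity or an L2 ball. As a rough illustration of what such a bound means, the sketch below projects a flattened perturbation onto each ball. It assumes that eta is the radius of the ball, which the README does not state explicitly, and it uses plain Python lists for clarity; the repository's own implementation in AttackAlgorithms may work differently.

```python
import math


def project_linf(delta, eta):
    """Clip every component of the perturbation into [-eta, eta]."""
    return [max(-eta, min(eta, d)) for d in delta]


def project_l2(delta, eta):
    """Rescale the perturbation so that its Euclidean norm is at most eta."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eta:
        return list(delta)  # already inside the ball
    scale = eta / norm
    return [d * scale for d in delta]
```

The L-infinity projection bounds each pixel change independently, while the L2 projection bounds the total energy of the perturbation and lets individual pixels change more.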

The attack models can be selected from the table below.

S/N	AttackModel Description	Attack Model ID
1	VGG16	1
2	ResNet50	2
3	InceptionV3	3
4	MobileNetV2	4
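Before launching a long run, it can help to check that the comma-separated IDs in config.ini match the two tables above. The helper names `parse_ids` and `check_etas` below are hypothetical and not part of the repository; the sketch also assumes that etas must contain exactly one value per algorithm ID, following the field description.

```python
ALGORITHMS = {3: "LinfinityBounded", 4: "L2Bounded"}
ATTACK_MODELS = {1: "VGG16", 2: "ResNet50", 3: "InceptionV3", 4: "MobileNetV2"}


def parse_ids(raw, table, field):
    """Turn a comma-separated ID string into names, rejecting unknown IDs."""
    ids = [int(tok) for tok in raw.split(",") if tok.strip()]
    unknown = [i for i in ids if i not in table]
    if unknown:
        raise ValueError(f"{field}: unknown id(s) {unknown}")
    return [table[i] for i in ids]


def check_etas(raw_etas, raw_algorithm_ids):
    """Pair each algorithm with its eta, assuming one eta per algorithm ID."""
    etas = [float(tok) for tok in raw_etas.split(",") if tok.strip()]
    algorithms = parse_ids(raw_algorithm_ids, ALGORITHMS, "algorithmId")
    if len(etas) != len(algorithms):
        raise ValueError("etas must list one value per algorithm ID")
    return list(zip(algorithms, etas))
```

For example, `parse_ids("1, 2", ATTACK_MODELS, "attackModels")` yields the VGG16 and ResNet50 entries, and an ID such as 7 raises a `ValueError` before any model is loaded.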

Once you have set up the config.ini file, run the attack script:

python attack.py

The results are saved in the database. Check the tables attacktrainingperformance and attacktestingperformance for the training and testing accuracy. The perturbations are saved in the attack table, in the column perturbedimage.
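One way to look at the saved results is sketched below. It assumes that AdversarialAttackResults.db is a SQLite file, which the README does not confirm; if the project uses a different database engine, the same queries would need that engine's client instead. The helper `dump_table` is hypothetical and makes no assumption about column names: it returns whatever columns the table has.

```python
import sqlite3
from contextlib import closing


def dump_table(db_path, table, limit=5):
    """Return up to `limit` rows of `table` as dictionaries keyed by column name."""
    with closing(sqlite3.connect(db_path)) as conn:
        conn.row_factory = sqlite3.Row
        # Look the table up first so an unknown name gives a clear error.
        names = {
            row[0].lower(): row[0]
            for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
        }
        if table.lower() not in names:
            raise ValueError(f"table {table!r} not found; available: {sorted(names.values())}")
        query = f'SELECT * FROM "{names[table.lower()]}" LIMIT ?'
        rows = conn.execute(query, (limit,)).fetchall()
    return [dict(row) for row in rows]
```

Calling `dump_table("/path/to/results/AdversarialAttackResults.db", "attacktestingperformance")` would then list the first few testing-performance rows, with the path being a placeholder for your saveDir.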

InfoLab-SKKU/SingleClassADV
