To Destruct, or Not to Destruct: Defending Watermark Removal Attacks on DNN Models via Self-destruction

This repository is an implementation of the paper "To Destruct, or Not to Destruct: Defending Watermark Removal Attacks on DNN Models via Self-destruction". It demonstrates how to enhance pre-trained models with MagicGuard. The code is tested on two real-world tasks: image classification and face recognition. Three popular datasets are employed: ImageNet, CIFAR-100, and CelebA.

Dependencies

The required packages include:

python=3.7
tensorflow=1.14.0
numpy=1.21.5
imageio=2.16
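
One way to set up the environment is with pip inside a Python 3.7 environment (a conda environment works equally well); the exact package sources are an assumption, only the versions above come from this repository:

pip install tensorflow==1.14.0 numpy==1.21.5 imageio==2.16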

Introduction to MagicGuard

Recent studies have shown that existing DNN watermarking schemes are not robust: the watermarks can be easily removed once the model is fine-tuned. As a countermeasure, MagicGuard works as a safeguard that protects the model from fine-tuning attacks. It introduces proprietary neurons with an obfuscated activation function that masks the gradient during backpropagation and prevents gradient convergence. When a model is protected by MagicGuard, its ability to predict benign samples is destroyed as soon as fine-tuning of any magnitude is applied. Even in the extreme case where the adversary has full knowledge of the training dataset and is able to fine-tune all layers, MagicGuard still protects the model successfully.
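
For intuition, the following is a minimal sketch of a gradient-obfuscating activation in TensorFlow 1.x. It is not the exact function used in the paper: the forward pass behaves like a standard ReLU so benign predictions are untouched, while the backward pass returns a distorted gradient so fine-tuning cannot converge. The oscillating mask is a hypothetical choice for illustration only; see convert.py for the actual implementation.

import tensorflow as tf

@tf.custom_gradient
def obfuscated_activation(x):
    # Forward pass: identical to ReLU, so inference on benign samples is unchanged.
    y = tf.nn.relu(x)
    def grad(dy):
        # Backward pass: return a large, rapidly oscillating gradient instead of
        # the true ReLU gradient, so gradient descent cannot converge.
        return dy * tf.sin(1e3 * x) * 1e2
    return y, grad

# Usage sketch: route a layer's pre-activation through the obfuscated function.
# hidden = obfuscated_activation(tf.matmul(inputs, w_hidden) + b_hidden)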

The following figures compare the loss landscapes of a raw model without protection (left) and the same model enhanced by MagicGuard (right). An illustration of the optimization process is shown in the upper right of each figure.

How to use

  • Source code for experiments on CIFAR-100 is in MagicGuard/magic_guard1_cifar100/

  • Source code for experiments on CelebA is in MagicGuard/magic_guard2_celebA/

  • Source code for experiments on ImageNet is in MagicGuard/magic_guard3_imagenet/

  • Source code for experiments on physical watermarks is in MagicGuard/magic_guard4_physical/

  • Source code for the robustness evaluation is also in MagicGuard/magic_guard2_celebA/

Pipeline 1: The model owner first trains a model on a clean dataset and embeds a watermark with a trusted watermarking method. Then the adversary conducts a fine-tuning attack to remove the watermark.

The following commands run an example in which a model is embedded with a physical watermark and later attacked by an adversary through fine-tuning. The output of the code will show that the attack succeeds.

cd MagicGuard/magic_guard4_physical/
python run.py --MagicGuard False

Pipeline 2: The model owner first trains a model on a clean dataset and embeds a watermark with a trusted watermarking method. Next, the model owner enhances the model with MagicGuard. Then the adversary conducts a fine-tuning attack to remove the watermark.

The following commands run an example in which a model is embedded with a physical watermark and enhanced by MagicGuard, and is then attacked by an adversary through fine-tuning. The output of the code will show that the attack fails.

cd MagicGuard/magic_guard4_physical/
python run.py --MagicGuard True

An example script for enhancing a watermarked model with MagicGuard is provided in MagicGuard/magic_guard3_imagenet/convert.py. By setting the flag --mode [neuron/layer], MagicGuard will be injected at the neuron level or the layer level. To change the activation function of MagicGuard, add the flag --selected_func [0/1/2]. A sketch of the difference between the two injection levels is given after the commands below.

cd MagicGuard/magic_guard3_imagenet/
python convert.py --mode neuron --selected_func 0
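
The following is a minimal sketch of the two injection levels, again in TensorFlow 1.x and reusing the same hypothetical obfuscated activation from the Introduction; convert.py implements the actual logic. At the layer level every neuron of the chosen layer is routed through the obfuscated activation, while at the neuron level only a randomly selected subset is, and the remaining neurons keep their original ReLU.

import numpy as np
import tensorflow as tf

@tf.custom_gradient
def obfuscated_activation(x):
    # Same illustrative activation as above: ReLU forward, distorted gradient backward.
    def grad(dy):
        return dy * tf.sin(1e3 * x) * 1e2
    return tf.nn.relu(x), grad

def inject_layer_level(pre_activation):
    # Layer level: every neuron of the layer uses the obfuscated activation.
    return obfuscated_activation(pre_activation)

def inject_neuron_level(pre_activation, num_neurons, ratio=0.1, seed=0):
    # Neuron level: only a random subset of neurons is obfuscated; the rest keep ReLU.
    rng = np.random.RandomState(seed)
    chosen = rng.choice(num_neurons, max(1, int(ratio * num_neurons)), replace=False)
    mask = np.zeros(num_neurons, dtype=np.float32)
    mask[chosen] = 1.0
    mask = tf.constant(mask)
    return mask * obfuscated_activation(pre_activation) + (1.0 - mask) * tf.nn.relu(pre_activation)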

A number of additional arguments can be used to set the hyperparameters. The interpretation and configuration of these hyperparameters are explained in our paper.

Results

The following image shows the difference between the raw model without any protection and the same model enhanced by MagicGuard when predicting benign samples and verification samples under model fine-tuning. For the raw model without protection (left), the watermarks are easily removed while benign-sample prediction is well preserved. For the model enhanced by MagicGuard (right), both benign-sample and verification-sample prediction are destroyed.

The following image shows the accuracy drop of benign-sample prediction when MagicGuard is injected at the layer level (left) and the neuron level (right). The immediate drop under the fine-tuning attack demonstrates the effectiveness of MagicGuard.
