This is the implementation of the BCD method from the paper "Detection of Compromised Models Using Bayesian Optimization", AI 2019: https://link.springer.com/chapter/10.1007/978-3-030-35288-2_39
A developer trains a machine learning model locally and then deploys it to the cloud. The "cloud" model provides machine learning services, e.g. image classification, to end-users. However, hosting a model in the cloud exposes the risk that attackers may compromise the "cloud" model and alter it for their own purposes, e.g. a trojan or poisoning attack. Our goal is to use the "local" model and its training data to generate a sensitive sample that can be used to detect whether the "cloud" model has been modified or compromised. We formalize finding a sensitive sample as an optimization problem: the sensitive sample (e.g. an image) maximizes the difference in predictions between the "local" model and the "cloud" model.
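As a minimal sketch of this objective (the names `sensitivity`, `f_local`, and `f_cloud` are illustrative, not the repo's actual API), assuming both models expose a predict function that returns class probabilities:

```python
import numpy as np

# Illustrative objective: how "sensitive" an input x is, measured as the
# gap between the local model's and the cloud model's predictions.
def sensitivity(x, f_local, f_cloud):
    p_local = f_local(x)  # class probabilities from the trusted local copy
    p_cloud = f_cloud(x)  # class probabilities from the deployed cloud model
    return np.linalg.norm(p_local - p_cloud)  # a large gap suggests tampering
```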
We propose the method BCD (Bayesian Optimization for Compromise Detection) to solve this optimization problem. Our method has two main steps: (1) train a generative model (a Variational AutoEncoder, VAE) to map the high-dimensional data space to a non-linear, low-dimensional latent space, and (2) run Bayesian optimization in this latent space to find the optimal sensitive sample.
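For intuition, here is a hedged sketch of step (2) using scikit-optimize's `gp_minimize` as a stand-in Bayesian optimizer; the repo's actual optimizer, latent dimensionality, bounds, and the toy `decoder`/`f_local`/`f_cloud` stand-ins below are all assumptions made purely for illustration:

```python
import numpy as np
from skopt import gp_minimize  # stand-in Bayesian optimizer (an assumption)

LATENT_DIM = 2  # assumed VAE latent dimensionality, for illustration only

# Toy stand-ins; in the repo these come from step (1) and the cloud service.
def decoder(z):
    return np.tanh(z @ np.random.RandomState(0).randn(LATENT_DIM, 784))

def f_local(x):
    return np.exp(x[:, :10]) / np.exp(x[:, :10]).sum()

def f_cloud(x):  # pretend the deployed copy was slightly altered
    return np.exp(1.01 * x[:, :10]) / np.exp(1.01 * x[:, :10]).sum()

def objective(z):
    x = decoder(np.asarray(z)[None, :])            # latent point -> image
    gap = np.linalg.norm(f_local(x) - f_cloud(x))  # prediction difference
    return -gap  # gp_minimize minimizes, so negate to maximize the gap

result = gp_minimize(objective,
                     dimensions=[(-3.0, 3.0)] * LATENT_DIM,  # latent box bounds
                     n_calls=30)                             # evaluation budget
sensitive_image = decoder(np.asarray(result.x)[None, :])
```

Optimizing in the low-dimensional latent space is what makes Bayesian optimization tractable here; running it directly over raw pixels would be far too high-dimensional.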
- Run "python 1_nn_vae" to train the local model (a neural network) and train VAE to transform the high-dimensional data space to a low-dimensional data space
- Run "python 2_attack_detect" to compare the detection rates of three methods: Using a random training image (Random), Local optimization method (VerIDeep), and our method (BCD)
- Run "python 3_plot_result" to plot the results: detection rate and sensitive image
Deepthi Kuttichira, Sunil Gupta, Dang Nguyen, Santu Rana, Svetha Venkatesh (2019). Detection of Compromised Models Using Bayesian Optimization. AI 2019, Adelaide, Australia. Springer LNCS, vol. 11919, pp. 485-496.