Skip to content

mlech26l/qnn_robustness_benchmarks

Repository files navigation

Benchmark networks of the AAAI 2021 paper Scalable Verification of Quantized Neural Networks

Link to the arXiv version

When using or adapting the code and benchmarks, please cite

@inproceedings{henzinger2021scalable,
  title={Scalable Verification of Quantized Neural Networks},
  author={Henzinger, Thomas A and Lechner, Mathias and {\v{Z}}ikeli{\'c}, {\DJ}or{\dj}e},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={35},
  number={5},
  pages={3787--3795},
  year={2021}
}

Setup

Installing boolector

git clone https://github.com/boolector/boolector
cd boolector
./contrib/setup-cadical.sh
./contrib/setup-btor2tools.sh
./configure.sh --python --py3
cd build
make

Running the Bit-vector SMT encodings on the benchmarks

# Checking MNIST test set sample 1 with epsilon=1
python3 check_mnist_robustness.py --sample_id 1 --eps 1
# Checking MNIST test set sample 115 with epsilon=2
python3 check_mnist_robustness.py --sample_id 115 --eps 2
# Checking MNIST test set sample 200 with epsilon=3
python3 check_mnist_robustness.py --sample_id 200 --eps 3 --dataset fashion-mnist

Using the benchmarks for other method

The file iterate_samples.py shows how to load and use the networks. It also iterates over the exact samples of the benchmark and output the corresponding epsilons to use.

python3 iterate_samples.py --dataset fashion-mnist

When implementing some verification method of the networks of these benchmarks, make sure they exactly match the bit-exact quantized rounding semantics in each layer. The files quantization_layers.py and quantization_utils.py should serve as a good starting point.

Benchmark description

The benchmark consists of robustness verification queries.

Benchmark 1 MNIST
Network size 784-64-10
Input quantization 8-bit
Network quanization (weights and activations) 6-bit
Samples 400 (100 per epsilon value)
Epsilons (L-infinity norm) 1,2,3,4
Weights weights/mnist_mlp.h5
Benchmark 2 Fashion-MNIST
Network size 784-64-10
Input quantization 8-bit
Network quanization (weights and activations) 6-bit
Samples 300 (50/100 per epsilon value)
Epsilons (L-infinity norm) 1,2,3,4
Weights weights/fashion-mnist_mlp.h5

Quantitative results from the paper

MNIST samples 0-99

Type Test sample ID
Misclassified (1) 18
Timeout (0) -
Vulnerable (0) -
Robust (99) Remaining

MNIST samples 100-199

Type Test sample ID
Misclassified (1) 149
Timeout (5) 104, 119, 175, 193, 195
Vulnerable (3) 115, 151, 158
Robust (91) Remaining

MNIST samples 200-299

Type Test sample ID
Misclassified (4) 217, 241, 247, 259
Timeout (25) 204, 210, 211, 218, 221, 224, 227, 232, 233, 234, 235, 243, 244, 250, 251, 255, 257, 264, 266, 273, 274, 275, 289, 290, 299
Vulnerable (1) 282
Robust (70) Remaining

MNIST samples 300-399

Type Test sample ID
Misclassified (3) 321, 340, 381
Timeout (43) 300, 301, 303, 307, 308, 322, 324, 325, 326, 328, 329, 335, 336, 337, 339, 341, 344, 345, 349, 350, 352, 354, 357, 358, 359, 362, 366, 368, 370, 372, 373, 377, 376, 379, 383, 385, 386, 388, 389, 391, 393, 394, 397
Vulnerable (1) 320
Robust (53) Remaining

Fashion-MNIST samples 0-99

Type Test sample ID
Misclassified (13) 12, 17, 23, 25, 26, 29, 40, 48, 51, 66, 68, 89, 98
Timeout (11) 4, 21, 28, 42, 43, 44, 49, 57, 67, 73, 91
Vulnerable (0) -
Robust (76) Remaining

Fashion-MNIST samples 100-199

Type Test sample ID
Misclassified (10) 107, 127, 145, 147, 150, 153, 163, 183, 192, 193
Timeout (17) 101, 117, 122, 129, 136, 149, 151, 157, 166, 170, 172, 181, 188, 191, 195, 197, 198
Vulnerable (1) 135
Robust (72) Remaining

Fashion-MNIST samples 200-249

Type Test sample ID
Misclassified (7) 222, 226, 239, 241, 244, 247, 249
Timeout (16) 203, 205, 202, 213, 217, 219, 221, 224, 228, 230, 235, 238, 243, 245, 246, 248
Vulnerable (1) 227
Robust (26) Remaining

Fashion-MNIST samples 300-349

Type Test sample ID
Misclassified (6) 316, 322, 324, 325, 332, 344
Timeout (26) 300, 301, 302, 304, 312, 313, 309, 308, 311, 315, 318, 319, 320, 321, 323, 326, 329, 331, 333, 334, 336, 337, 338, 341, 342, 348
Vulnerable (0) -
Robust (18) Remaining

Using other SMT solvers

Each benchmark SMT formula can be exported and then checked by a different SMT solver than boolector

For instance, CVC4 (note: this will not terminate)

python3 check_mnist_robustness.py --sample_id 115 --eps 2 --export
./cvc4-1.8-x86_64-linux-opt  --lang smt smt2/mnist_0115.smt2

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages