In [19]:
# load necessary modules
%pylab inline
import sys

sys.path.insert(0, 'src')

# import white-box training modules
from mnist_nets.train_mnist_nets import main as build_mnist_nets
from mnist_nets.postprocess_mnist_nets import main as postprocess_mnist_nets

# import metamodel training modules
from mnist_metamodel.mnist_metamodel import config as config_metamodel
from mnist_metamodel.mnist_metamodel import main as train_metamodel



# Towards Reverse-Engineering Black-Box Neural Networks, ICLR'18
## Authors: Seong Joon Oh, Max Augustin, Bernt Schiele, Mario Fritz (Code available at https://github.com/coallaoh/WhitenBlackBox)

### Presented by Team Taiwan: James Ku, Hannah Chen, Li-Pang Huang

---



In [13]:
# install package
!pip3 install torch==1.3.1
!pip3 install torchvision==0.4.2



## What is the data?
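In this paper, the "data" are the models themselves. The authors train a large, diverse set of white-box MNIST digit classifiers whose model attributes (architecture, optimisation, and training data; see the attribute table below) are known. These serve as training and test examples for the metamodel.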

## How to crack the black-box model?
### 1. kennen-o
kennen-o first selects a fixed set of $n$ queries (images) from a dataset. The same queries are submitted during both training and testing.
kennen-o then learns a classifier $m_\theta$ that maps the outputs of $f$ on these queries, $[f(x^i)]_{i=1,\dots,n}$ ($n \times 10$ dimensions for MNIST), to a simultaneous prediction of the 12 attributes of $f$.
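
A minimal sketch of the kennen-o inference path follows. It assumes that $m_\theta$ ends in one linear output head per attribute. The heads here are random and untrained: a hypothetical stand-in for the trained MLP, which the config cell further down names `mlp_3_1000`. The sketch only shows the shapes. The outputs of the $n$ fixed queries are flattened into one $n \times 10$ feature vector, and all attributes are predicted at once. The value counts per attribute are taken from the attribute table in this notebook.

```python
import random

# Number of possible values per attribute, taken from the attribute table.
ATTRIBUTE_SIZES = {
    "act": 4, "drop": 2, "pool": 2, "ks": 2, "n_conv": 3, "n_fc": 3,
    "n_param": 8, "ens": 2, "alg": 3, "bs": 3, "split": 7, "size": 3,
}


def concat_query_outputs(outputs):
    """Flatten n query outputs (each a 10-dim score vector) into one n*10 vector."""
    return [score for output in outputs for score in output]


def make_linear_heads(in_dim, sizes, seed=0):
    """Hypothetical stand-in for m_theta: one random (untrained) linear head per attribute."""
    rng = random.Random(seed)
    return {
        attr: [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(k)]
        for attr, k in sizes.items()
    }


def predict_attributes(features, heads):
    """Predict every attribute simultaneously: argmax over each head's logits."""
    predictions = {}
    for attr, weights in heads.items():
        logits = [sum(w * x for w, x in zip(row, features)) for row in weights]
        predictions[attr] = max(range(len(logits)), key=logits.__getitem__)
    return predictions


# Toy usage: n = 3 fixed queries, each returning a 10-class score vector from f.
n_queries = 3
rng = random.Random(1)
outputs = [[rng.random() for _ in range(10)] for _ in range(n_queries)]
features = concat_query_outputs(outputs)
heads = make_linear_heads(len(features), ATTRIBUTE_SIZES)
print(len(features))  # 30 = n x 10
print(predict_attributes(features, heads))
```

In the real method the heads are trained on the meta-training models. Only the inference structure carries over from this sketch.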
### 2. kennen-i
kennen-i crafts a single query input $\widetilde{x}$ over the meta-training models. The input is trained to repurpose a digit classifier $f$ into a model-attribute classifier for a single attribute $a$: the crafted input drives the classifier to leak internal information through its digit prediction. The learned input is submitted to the test black-box model $g$, and the attribute is predicted by reading off its digit prediction $g(\widetilde{x})$. For example, kennen-i for max-pooling prediction crafts an input $\widetilde{x}$ that generic MNIST digit classifiers predict as “1” if they have max-pooling layers and as “0” if they do not.
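
The read-off step can be sketched as follows. The digit-to-value assignment (digit $i$ encodes value $i$) is an assumption made for illustration. It is chosen to match the max-pooling example: “1” means max-pooling, “0” means none. Crafting $\widetilde{x}$ itself, which requires gradient-based optimisation over the meta-training models, is not shown. The sketch also makes one limit explicit: an attribute with more than 10 values cannot be encoded in a single 10-way digit prediction.

```python
NUM_DIGITS = 10  # MNIST classifiers output 10 digit classes


def build_readoff(attribute_values):
    """Hypothetical encoding: digit i stands for attribute_values[i]."""
    if len(attribute_values) > NUM_DIGITS:
        raise ValueError("kennen-i can only encode attributes with at most 10 values")
    return {digit: value for digit, value in enumerate(attribute_values)}


def read_off(scores, readoff):
    """Take the black box's predicted digit (argmax) and look up the attribute value."""
    digit = max(range(len(scores)), key=scores.__getitem__)
    return readoff.get(digit)  # None if that digit encodes no value


# Max-pooling example from the text: digit 0 = "No", digit 1 = "Yes".
pool_readoff = build_readoff(["No", "Yes"])
scores = [0.05, 0.80, 0.02, 0.03, 0.01, 0.02, 0.03, 0.01, 0.02, 0.01]  # g(x~)
print(read_off(scores, pool_readoff))  # Yes
```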

### 3. kennen-io
kennen-io is the authors' final method. It combines the kennen-i and kennen-o approaches, using both the input generator and the output interpreter. Attaching an interpretation module on top of the outputs removes two drawbacks of kennen-i:

- it can predict only one attribute at a time;
- the number of predictable classes is limited by the number of output classes of $f$ (10 for MNIST).

Because it reasons over multiple query outputs via MLP layers, kennen-io also supports optimising multiple query inputs.
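
kennen-i and kennen-io both learn their query images by gradient descent. The sketch below shows one projected-gradient update of the queries. It assumes that the `clip=[0, 1]` entry in the config's `i` block keeps query pixels in $[0, 1]$, and it reuses that block's `lr=0.1`. The gradients are made-up numbers standing in for the gradient of the metamodel loss. Computing that gradient (backpropagating through the white-box models and the metamodel) is out of scope here.

```python
def clip(value, low=0.0, high=1.0):
    """Project a single pixel back into the box [low, high]."""
    return min(max(value, low), high)


def projected_step(queries, grads, lr=0.1, low=0.0, high=1.0):
    """One gradient-descent step on every query pixel, followed by clipping."""
    return [
        [clip(x - lr * g, low, high) for x, g in zip(query, grad)]
        for query, grad in zip(queries, grads)
    ]


# Two toy "images" with 3 pixels each and hypothetical loss gradients.
queries = [[0.50, 0.95, 0.02], [0.30, 0.60, 0.99]]
grads = [[1.0, -2.0, 1.0], [-1.0, 0.0, -0.5]]
updated = projected_step(queries, grads)
print(updated)
```

Pixels that would leave the valid range (0.95 + 0.2, 0.02 - 0.1, 0.99 + 0.05) are clipped to 1.0 or 0.0, so every query stays a valid image.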

In [14]:
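# Create the white-box models: digit classifiers trained on the MNIST dataset.
# Disabled by default; change `if False:` to `if True:` to train and post-process them from scratch.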

if False:
    build_mnist_nets()
    postprocess_mnist_nets()

Next, we train metamodels that predict the architecture, optimisation hyperparameters, and training data of the black-box models. The 12 attributes and their possible values are:

|              |  Code |    Attribute     |                   Values                   |
|:------------:|:-----:|:----------------:|:------------------------------------------:|
| Architecture |  act  |    Activation    |           ReLU, PReLU, ELU, Tanh           |
|              |  drop |      Dropout     |                   Yes, No                  |
|              |  pool |    Max pooling   |                   Yes, No                  |
|              |   ks  | Conv kernel size |                    3, 5                    |
|              | #conv |   #Conv layers   |                   2, 3, 4                  |
|              |  #FC  |    #FC layers    |                   2, 3, 4                  |
|              |  #par |    #Parameters   |           $2^{14}, \dots, 2^{21}$          |
|              |  ens  |     Ensemble     |                   Yes, No                  |
| Optimisation |  alg  |     Algorithm    |             SGD, ADAM, RMSprop             |
|              |   bs  |    Batch size    |                64, 128, 256                |
|     Data     | split |    Data split    | $\text{All}_{0}, \text{Half}_{0/1}, \text{Quarter}_{0/1/2/3}$ |
|              |  size |     Data size    |             All, Half, Quarter             |
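
The `RC` numbers in the training logs below are random-chance accuracies. A minimal sketch of how they follow from this table, assuming each listed value is equally likely, so the chance level is 1 / number of values. Most log entries match this (e.g. act 25.0%, data subset 14.3%). When a split leaves fewer values in the test set, the chance level is higher. An example is #conv at RC 100% under the E split, where every test model has #conv = 4. The logs also report #par at 14.3% rather than the 12.5% computed here, so the generated model set may not cover all eight listed #par values.

```python
ATTRIBUTE_VALUES = {
    "act": ["ReLU", "PReLU", "ELU", "Tanh"],
    "drop": ["Yes", "No"],
    "pool": ["Yes", "No"],
    "ks": [3, 5],
    "#conv": [2, 3, 4],
    "#FC": [2, 3, 4],
    "#par": [2 ** e for e in range(14, 22)],  # 2^14 ... 2^21
    "ens": ["Yes", "No"],
    "alg": ["SGD", "ADAM", "RMSprop"],
    "bs": [64, 128, 256],
    "split": ["All0", "Half0", "Half1", "Quarter0", "Quarter1", "Quarter2", "Quarter3"],
    "size": ["All", "Half", "Quarter"],
}


def chance_accuracy(values):
    """Accuracy of uniform random guessing over the values present: 1 / #values."""
    return 1.0 / len(values)


for attr, values in ATTRIBUTE_VALUES.items():
    print(f"{attr:>6}: {100 * chance_accuracy(values):.1f}%")
```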

In [15]:
# example_no = 1 indicates kennen-o method
example_no = 1

if example_no == 1:
    # kennen-o approach with 5000 training models and 100 queries with top-5 ranking outputs
    # under the Random (R) split.
    METHOD = 'm'  # Refers to kennen-o
    N_TRAIN = 5000  # Can be chosen in range [100,5000]
    N_EPOCH = 200  # Default number of epochs used in the paper
    N_QUERY = 100  # Can be chosen in range [1,1000]
    OUTPUT = 'ranking-5'  # ranking-k refers to top-k ranking output
    SPLIT = 'rand'
    SPLIT_TR = [1]  # Train on split 1
    SPLIT_TE = [0]  # Test on split 0
    GPU = None  # No GPU

```
Train Epoch: 200 [0/5000 (0%)]  Loss: 3.461459  Avg acc: 89.3%
Train Epoch: 200 [1100/5000 (22%)]      Loss: 2.787968  Avg acc: 92.0%
Train Epoch: 200 [2200/5000 (44%)]      Loss: 3.086313  Avg acc: 88.8%
Train Epoch: 200 [3300/5000 (66%)]      Loss: 3.748855  Avg acc: 83.9%
Train Epoch: 200 [4400/5000 (88%)]      Loss: 3.476222  Avg acc: 84.0%
Testing..
                 /etc/ens : 51.9% (RC 50.0%)
           /etc/data_size : 73.6% (RC 33.3%)
             /etc/n_param : 43.4% (RC 14.3%)
                     _____
                 /net/act : 67.2% (RC 25.0%)
                /net/drop : 95.5% (RC 50.0%)
                /net/n_fc : 72.6% (RC 33.3%)
              /net/n_conv : 63.0% (RC 33.3%)
                /net/pool : 96.9% (RC 50.0%)
                  /net/ks : 79.1% (RC 50.0%)
                     _____
          /opt/batch_size : 52.0% (RC 33.3%)
           /opt/optimiser : 66.5% (RC 33.3%)
                     _____
             /data/subset : 85.7% (RC 14.3%)
                     _____
                     _____
Test loss: 7.500472, avgacc: 67.37%
[51.9    73.6    43.4    67.2    95.5    72.6    63.0    96.9    79.1    52.0   66.5     85.7   ]
```
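
The kennen-o run above gives the metamodel only top-5 rankings (`OUTPUT = 'ranking-5'`) instead of full score vectors. A sketch of how a 10-class score vector is reduced to the `ranking-k` and `argmax` output types follows. It assumes `ranking-k` means the indices of the k highest-scoring classes in descending order of score. How the repository encodes these indices as metamodel features is not shown here and may differ.

```python
def ranking_k(scores, k):
    """Indices of the k highest-scoring classes, best first."""
    return sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)[:k]


def argmax_output(scores):
    """'argmax' output: only the top-1 class index is revealed."""
    return ranking_k(scores, 1)[0]


scores = [0.01, 0.02, 0.60, 0.06, 0.10, 0.03, 0.04, 0.08, 0.05, 0.01]
print(ranking_k(scores, 5))   # [2, 4, 7, 3, 8]
print(argmax_output(scores))  # 2
```

With `N_QUERY = 100`, each black-box model is then described by 100 such top-5 lists rather than 100 full 10-dimensional score vectors, which reveals less information to the metamodel.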

In [16]:
# example_no = 2 indicates kennen-i method
example_no = 2

if example_no == 2:
    # kennen-i approach with 3000 training models
    # under the Extrapolation (E) split, with splitting attribute {#conv}.
    METHOD = 'i'  # Refers to kennen-i
    N_TRAIN = 3000
    N_EPOCH = 200
    N_QUERY = 1  # kennen-i always submits a single query
    OUTPUT = 'argmax'  # kennen-i only requires argmax output
    SPLIT = 'ex^net/n_conv'  # Extrapolation (E) split; the format is 'ex^{attr1}^{attr2}',
    # where attr1 and attr2 are the splitting attributes. For the full list of attributes,
    # see the attribute table above.
    SPLIT_TR = [0, 1]  # Train on splits 0 and 1 (#conv = 2 or 3)
    SPLIT_TE = [2]  # Test on split 2 (#conv = 4)
    GPU = 1  # GPU ID

Result 1
```
Train Epoch: 200 [0/3000 (0%)]  Loss: 47.693123 Avg acc: 25.5%
Train Epoch: 200 [810/3000 (27%)]       Loss: 82.349965 Avg acc: 28.4%
Train Epoch: 200 [1620/3000 (54%)]      Loss: 42.829766 Avg acc: 37.1%
Train Epoch: 200 [2430/3000 (81%)]      Loss: 29.483426 Avg acc: 34.9%
Testing..
Test batch: [950/1000 (95%)]
                 /etc/ens : 43.2% (RC 100.0%)
             /etc/n_param : 20.2% (RC 33.3%)
           /etc/data_size : 36.5% (RC 33.3%)
                     _____
             /data/subset : 15.9% (RC 14.3%)
                     _____
           /opt/optimiser : 32.4% (RC 33.3%)
          /opt/batch_size : 33.0% (RC 33.3%)
                     _____
                 /net/act : 27.1% (RC 25.0%)
                  /net/ks : 53.1% (RC 50.0%)
                /net/pool : 57.1% (RC 50.0%)
                /net/drop : 53.6% (RC 50.0%)
                /net/n_fc : 35.1% (RC 33.3%)
              /net/n_conv : 29.0% (RC 100.0%)
                     _____
                     _____
Test loss: 49.714847, avgacc: 32.86%
[43.2    20.2    36.5    15.9    32.4    33.0    27.1    53.1    57.1    53.6   35.1     29.0   ]
```

Result 2
```
Train Epoch: 200 [0/3000 (0%)]	Loss: 36.355844	Avg acc: 44.5%
Train Epoch: 200 [810/3000 (27%)]	Loss: 56.740074	Avg acc: 40.7%
Train Epoch: 200 [1630/3000 (54%)]	Loss: 949.217090	Avg acc: 44.3%
Train Epoch: 200 [2430/3000 (81%)]	Loss: 28.991809	Avg acc: 30.4%
Testing..
Test batch: [950/1000 (95%)]
              /net/n_conv : 32.8% (RC 100.0%)
                /net/pool : 55.2% (RC 50.0%)
                /net/drop : 56.7% (RC 50.0%)
                 /net/act : 25.5% (RC 25.0%)
                  /net/ks : 55.7% (RC 50.0%)
                /net/n_fc : 38.4% (RC 33.3%)
                     _____
           /etc/data_size : 37.5% (RC 33.3%)
             /etc/n_param : 24.7% (RC 33.3%)
                 /etc/ens : 50.8% (RC 100.0%)
                     _____
          /opt/batch_size : 33.2% (RC 33.3%)
           /opt/optimiser : 33.4% (RC 33.3%)
                     _____
             /data/subset : 16.1% (RC 14.3%)
                     _____
                     _____
Test loss: 48.639595, avgacc: 41.78%
[32.8	 55.2	 56.7	 25.5	 55.7	 38.4	 37.5	 24.7	 50.8	 33.2	 33.4	 16.1	]
```
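
The E split trains on some values of the splitting attribute and tests on a held-out value, so it measures how well the metamodel extrapolates. A sketch of the `SPLIT` string format and the split ids used in the kennen-i cell follows. `parse_split`, `split_id` and `N_CONV_VALUES` are hypothetical helpers. The value-to-id mapping (split 0 is #conv = 2, split 1 is #conv = 3, split 2 is #conv = 4) is read off the cell comments, not taken from the repository code.

```python
def parse_split(split):
    """'ex^net/n_conv^net/n_fc' -> ('ex', ['net/n_conv', 'net/n_fc']); 'rand' -> ('rand', [])."""
    kind, *attrs = split.split("^")
    return kind, attrs


N_CONV_VALUES = [2, 3, 4]  # #conv values from the attribute table


def split_id(value, values=N_CONV_VALUES):
    """Hypothetical: split id = position of the value in the attribute's value list."""
    return values.index(value)


train_ids, test_ids = [0, 1], [2]  # same ids as SPLIT_TR / SPLIT_TE in the kennen-i cell
train_values = [v for v in N_CONV_VALUES if split_id(v) in train_ids]
test_values = [v for v in N_CONV_VALUES if split_id(v) in test_ids]

print(parse_split("ex^net/n_conv"))  # ('ex', ['net/n_conv'])
print(train_values, test_values)     # [2, 3] [4]
```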

In [17]:
# example_no = 3 indicates kennen-io method
example_no = 3

if example_no == 3:
    # kennen-io approach with 100 training models and 100 queries with score outputs
    # under the Extrapolation (E) split, with splitting attribute {#conv,#fc}.
    METHOD = 'mi'  # Refers to kennen-io
    N_TRAIN = 100
    N_EPOCH = 400  # Default number of epochs for kennen-io
    N_QUERY = 100
    OUTPUT = 'score'
    SPLIT = 'ex^net/n_conv^net/n_fc'  # Possible to set multiple splitting attributes separated via '^'
    SPLIT_TR = [0, 1]  # Train on #conv=#fc=2 or 3
    SPLIT_TE = [2]  # Test on #conv=#fc=4
    GPU = 0  # GPU ID

Result 1
```
Train Epoch: 400 [0/100 (0%)]   Loss: 3.739715  Avg acc: 91.1%
Testing..
Test batch: [870/1000 (87%)]
           /etc/data_size : 42.5% (RC 33.3%)
                 /etc/ens : 50.0% (RC 50.0%)
             /etc/n_param : 10.0% (RC 20.0%)
                     _____
           /opt/optimiser : 50.1% (RC 33.3%)
          /opt/batch_size : 39.2% (RC 33.3%)
                     _____
             /data/subset : 32.3% (RC 14.3%)
                     _____
                  /net/ks : 51.5% (RC 50.0%)
                /net/n_fc : 0.0% (RC 100.0%)
              /net/n_conv : 0.2% (RC 100.0%)
                 /net/act : 49.1% (RC 25.0%)
                /net/pool : 62.6% (RC 50.0%)
                /net/drop : 66.8% (RC 50.0%)
                     _____
                     _____
Test loss: 37.318427, avgacc: 36.78%
[42.5    50.0    10.0    50.1    39.2    32.3    51.5    0.0     0.2     49.1   62.6     66.8   ]
```

Result 2
```
Train Epoch: 400 [0/100 (0%)]   Loss: 3.576666  Avg acc: 95.3%
Testing..
Test batch: [860/1000 (86%)]
                /net/drop : 70.3% (RC 50.0%)
              /net/n_conv : 0.0% (RC 100.0%)
                  /net/ks : 52.6% (RC 50.0%)
                /net/n_fc : 0.0% (RC 100.0%)
                /net/pool : 62.9% (RC 50.0%)
                 /net/act : 45.9% (RC 25.0%)
                     _____
          /opt/batch_size : 34.9% (RC 33.3%)
           /opt/optimiser : 50.1% (RC 33.3%)
                     _____
             /etc/n_param : 12.5% (RC 20.0%)
           /etc/data_size : 39.7% (RC 33.3%)
                 /etc/ens : 50.0% (RC 50.0%)
                     _____
             /data/subset : 38.1% (RC 14.3%)
                     _____
                     _____
Test loss: 37.420744, avgacc: 38.66%
[70.3    0.0     52.6    0.0     62.9    45.9    34.9    50.1    12.5    39.7    50.0    38.1   ]
```

In [18]:
co = config_metamodel(
    control=dict(
        method=METHOD,
        data=dict(
            name='dnet10000',
            subset=N_TRAIN,
            eval=1000,
        ),
        seed=0,
        i=dict(  # query-input optimisation (used by kennen-i and kennen-io)
            init='randval',
            clip=[0, 1],
            noise='U1',
            opt=dict(
                optimiser='SGD',
                lr=0.1,
                weight_decay=0.0,
                batch_size=10,
            ),
        ),
        m=dict(  # metamodel m_theta (used by kennen-o and kennen-io)
            name='mlp_3_1000',
            opt=dict(
                optimiser='SGD',
                lr=1e-4,
                weight_decay=0.01,
                batch_size=100,
            ),
        ),
        opt=dict(
            epochs=N_EPOCH,
            sequence=['m', 200, 50, 50],
            # sequence=['m', 1, 1, 1],
        ),
        setup=dict(
            nquery=N_QUERY,
            qseed=0,
            target='all',
            outrep=OUTPUT,
            split=SPLIT,
            splitidtr=SPLIT_TR,
            splitidte=SPLIT_TE,
        ),
    ),
    conf=dict(
        exp_phase='mnist_metamodel',
        balanced_eval=True,
        test_batch_size=10,
        test_epoch=1,
        save=False,
        overridecache=True,
        mode='train',
        gpu=GPU,
    )
)

## Discussion of results

### 1. kennen-o
The average accuracy is 67.37%, close to the 69.7% reported in the paper. Note that our run uses the ranking-5 output, whereas the paper's result uses ranking-10.

| Run |  act  |  drop  |  pool  |  ks    |  #conv |  #fc   |  #par  |  ens   |  alg   |  bs  |  size  |  split |  avg|
|:------|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
| Implementation result |  67.2  |  95.5  |  96.9  |  79.1  |  63.0  |  72.6  |  43.4  |  51.9  |  66.5  |  52.0  |  73.6  |  85.7  | 67.37  |
| Paper |  63.7  |  93.8  |  90.8  |  80.0  |  63.0  |  73.7  |  44.1  |  62.4  |  65.3  |  47.0  |  66.2  |  86.6  | 69.7  |
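
The training log prints the 12 attributes in an order that changes from run to run (compare kennen-i results 1 and 2), while these tables use a fixed column order. Matching values by attribute name rather than by position avoids misaligned rows. The sketch below does this. The mapping from log keys to table columns is an assumption based on the attribute names and chance levels (e.g. `/data/subset` has 7 values, so it is the split column; `/etc/data_size` is the size column).

```python
import re

# One log line looks like "/net/act : 67.2% (RC 25.0%)".
LINE_RE = re.compile(r"/(\w+)/(\w+)\s*:\s*([\d.]+)%\s*\(RC\s*([\d.]+)%\)")

# Assumed mapping from log keys to table columns, in table order.
COLUMNS = [
    ("net/act", "act"), ("net/drop", "drop"), ("net/pool", "pool"),
    ("net/ks", "ks"), ("net/n_conv", "#conv"), ("net/n_fc", "#fc"),
    ("etc/n_param", "#par"), ("etc/ens", "ens"), ("opt/optimiser", "alg"),
    ("opt/batch_size", "bs"), ("etc/data_size", "size"), ("data/subset", "split"),
]


def parse_log(text):
    """Return {'group/attr': (accuracy, random_chance)} for every attribute line."""
    return {
        f"{group}/{attr}": (float(acc), float(rc))
        for group, attr, acc, rc in LINE_RE.findall(text)
    }


def table_row(text):
    """Accuracies in table column order, matched by attribute name, not position."""
    parsed = parse_log(text)
    return [parsed[key][0] for key, _ in COLUMNS]


# kennen-o test log from above, in the log's own attribute order.
log = """
                 /etc/ens : 51.9% (RC 50.0%)
           /etc/data_size : 73.6% (RC 33.3%)
             /etc/n_param : 43.4% (RC 14.3%)
                 /net/act : 67.2% (RC 25.0%)
                /net/drop : 95.5% (RC 50.0%)
                /net/n_fc : 72.6% (RC 33.3%)
              /net/n_conv : 63.0% (RC 33.3%)
                /net/pool : 96.9% (RC 50.0%)
                  /net/ks : 79.1% (RC 50.0%)
          /opt/batch_size : 52.0% (RC 33.3%)
           /opt/optimiser : 66.5% (RC 33.3%)
             /data/subset : 85.7% (RC 14.3%)
"""
print(table_row(log))
# [67.2, 95.5, 96.9, 79.1, 63.0, 72.6, 43.4, 51.9, 66.5, 52.0, 73.6, 85.7]
```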

### 2. kennen-i
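The average accuracies are 32.86% and 41.78%, both lower than the 52.7% reported in the paper, and most per-attribute accuracies stay close to the random-chance (RC) levels printed in the logs. kennen-i submits only a single query, so its results are unstable across runs.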

| Run |  act  |  drop  |  pool  |  ks    |  #conv |  #fc   |  #par  |  ens   |  alg   |  bs  |  size  |  split |  avg|
|:------|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
| Implementation result 1 |  27.1  |  53.6  |  57.1  |  53.1  |  29.0  |  35.1  |  20.2  |  43.2  |  32.4  |  33.0  |  36.5  |  15.9  | 32.86 |
| Implementation result 2 |  25.5  |  56.7  |  55.2  |  55.7  |  32.8  |  38.4  |  24.7  |  50.8  |  33.4  |  33.2  |  37.5  |  16.1  | 41.78 |
| Paper |  43.5  |  77.0  |  94.8  |  88.5  |  54.5  |  41.0  |  32.3  |  46.5  |  45.7  |  37.0  |  42.6  |  29.3  |  52.7  |

### 3. kennen-io
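The average accuracies are 36.78% and 38.66%, both far below the 80.1% reported in the paper. Our metamodel overfits: its training accuracy reaches 91.1% and 95.3% in the two runs, and it is trained on only 100 models. #conv and #fc are at or near 0% most likely because this E split tests exclusively on #conv = #fc = 4, a value the metamodel never sees during training.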

| Run |  act  |  drop  |  pool  |  ks    |  #conv |  #fc   |  #par  |  ens   |  alg   |  bs  |  size  |  split |  avg|
|:------|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
| Implementation result 1 |  49.1  |  66.8  |  62.6  |  51.5  |  0.2   |  0.0   |  10.0  |  50.0  |  50.1  |  39.2  |  42.5  |  32.3  | 36.78 |
| Implementation result 2 |  45.9  |  70.3  |  62.9  |  52.6  |  0.0   |  0.0   |  12.5  |  50.0  |  50.1  |  34.9  |  39.7  |  38.1  |  38.66  |
| Paper |  88.4  |  95.8  |  99.5  |  97.7  |  80.3  |  80.2  |  45.2  |  60.2  |  79.3  |  54.3  |  84.8  |  95.6  |  80.1  |