# Emergence of Latent Binary Encoding in Deep Neural Network Classifiers
---

### Supplementary material to reproduce results

The aim of this notebook is to show how to train a network with a binary encoding layer and to demonstrate the emergence of binary encoding. The results can be compared against other model architectures in an ablation study that highlights the benefits of a binary encoding layer. This notebook is for demonstration only: the results obtained here do not reproduce those presented in the manuscript, which require substantially longer training runs.

In [None]:
import pickle
import sys
import os 
import shutil
import yaml
import argparse
import copy
import importlib
import torch
import torchvision
import torchvision.transforms as transforms
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from binary_encoding.networks import ResNet
from binary_encoding.trainer import Trainer

In [None]:
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
if torch.cuda.is_available():
    num_gpus = torch.cuda.device_count()
    print(f"Number of GPUs available: {num_gpus}")
else:
    print("CUDA is not available. Training on CPU.")

In [None]:
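def compute_mean_std(dataset):
    """Return the per-channel mean and standard deviation of an image dataset."""
    # Load the whole dataset as a single batch of shape (N, C, H, W).
    loader = torch.utils.data.DataLoader(dataset, batch_size=len(dataset), shuffle=False)
    data = next(iter(loader))[0].numpy()

    # Reduce over samples, height and width, leaving one value per channel.
    mean = np.mean(data, axis=(0, 2, 3))
    std = np.std(data, axis=(0, 2, 3))
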
    return mean, std
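
The reduction over axes `(0, 2, 3)` used above can be checked on a small synthetic batch. The following is a minimal sketch, assuming images are stored as an array of shape `(N, C, H, W)`; the random data are a stand-in and not CIFAR10.

```python
import numpy as np

# Synthetic stand-in for an image batch of shape (N, C, H, W).
rng = np.random.default_rng(0)
batch = rng.random((8, 3, 32, 32)).astype(np.float32)

# Reduce over samples, height and width: one statistic per channel.
mean = batch.mean(axis=(0, 2, 3))
std = batch.std(axis=(0, 2, 3))

# Normalizing with these statistics gives each channel zero mean and unit variance.
normalized = (batch - mean[None, :, None, None]) / std[None, :, None, None]

print(mean.shape, std.shape)
print(normalized.mean(axis=(0, 2, 3)), normalized.std(axis=(0, 2, 3)))
```

This is the same operation that `transforms.Normalize` applies per channel once it is given the training-set mean and standard deviation.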

In this experiment we train on the CIFAR10 dataset. Data are normalized with the training-set statistics, and the training data are additionally augmented with random horizontal flips and padded random crops.

In [None]:
name_dataset = 'CIFAR10'

torch_module = importlib.import_module("torchvision.datasets")
torch_dataset = getattr(torch_module, name_dataset)
transform = transforms.Compose([transforms.ToTensor()])
trainset = torch_dataset('../datasets', train=True, download=True, transform=transform)
trainset_mean, trainset_std = compute_mean_std(trainset)

transform_train = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(trainset_mean, trainset_std),
    transforms.RandomHorizontalFlip(),
    transforms.RandomCrop(trainset[0][0].shape[-1], padding=4),  # crop back to the original image size
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(trainset_mean, trainset_std),
])


# Use the normalizing (and, for training, augmenting) transforms defined above.
trainset = torch_dataset('../datasets', train=True, download=True, transform=transform_train)
testset = torch_dataset('../datasets', train=False, download=True, transform=transform_test)

num_classes = len(set(trainset.classes))

We employ a ResNet18 architecture as backbone, using the SiLU activation function in the non-linear layers. In our ablation study the penultimate layer has 4 nodes, which is the smallest dimension for which a hypercube has enough vertices to assign one to each of the 10 classes of the dataset ($2^4 = 16 > 10$, whereas $2^3 = 8 < 10$).
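
The choice of 4 penultimate nodes follows from counting hypercube vertices: a $d$-dimensional hypercube has $2^d$ vertices. A minimal sketch of that arithmetic is given here; the helper name `min_hypercube_dim` is hypothetical and not part of the `binary_encoding` package.

```python
def min_hypercube_dim(num_classes: int) -> int:
    """Smallest d such that a d-dimensional hypercube has at least num_classes vertices."""
    # 2**d >= num_classes is equivalent to d >= bit length of (num_classes - 1).
    return (num_classes - 1).bit_length()


for n in (8, 10, 16, 100):
    d = min_hypercube_dim(n)
    print(f"{n} classes -> {d} nodes ({2**d} vertices)")

print(min_hypercube_dim(10))  # 4
```

Integer bit arithmetic is used instead of `math.log2` to avoid floating-point rounding at exact powers of two.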

Optimization is done with an AdamW optimizer with weight decay set to $5\times 10^{-4}$. The scalar $\gamma$, a positive coefficient which multiplies the compressing factor in the loss function, is initially set to $0.01$ and is scheduled to be multiplied by a factor of $1.3$ at each epoch. We use a batch size of $128$ and a learning rate of $5\times 10^{-4}$.
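
To get a feeling for how quickly $\gamma$ grows under this schedule, the values can be tabulated directly. This is a sketch under the assumption that the factor is applied once after every epoch, so that epoch $t$ (counting from 0) uses $\gamma_t = 0.01 \times 1.3^t$; the exact point at which the `Trainer` applies the update may differ.

```python
gamma0 = 0.01   # starting value of gamma
factor = 1.3    # multiplicative factor per scheduler step
epochs = 50

# Assumption: one scheduler step per epoch, applied after the epoch finishes.
gammas = [gamma0 * factor**t for t in range(epochs)]

for t in (0, 1, 10, 25, 49):
    print(f"epoch {t:2d}: gamma = {gammas[t]:.4g}")
```

Because the growth is geometric, $\gamma$ increases by several orders of magnitude over 50 epochs, so the compressing factor carries far more weight late in training than at the start.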

Each model is trained for $50$ epochs, and metrics are logged every $10$ epochs.

In [None]:
epochs = 50
logging = 10

In [None]:
architecture = {
    "backbone": "ResNet",
    "backbone_model": 18,
    "hypers": {
        "nodes_head": [],
        "penultimate_nodes": 4,
        "activation": 'SiLU',
    },
}

training = {
    "hypers": {
        "batch_size": 128,
        "epochs": epochs,
        "gamma": 0.01,
        "gamma_scheduler_factor": 1.3,
        "gamma_scheduler_step": 1,
        "logging_pen": True,
        "logging": logging,
        "optimizer": "AdamW",
        "weight_decay": 0.0005,
        "lr": 0.0005,
    },
}

In [None]:
input_dims = trainset[0][0].shape[0]*trainset[0][0].shape[1]*trainset[0][0].shape[2]

We train each of the 4 models separately and store the results returned by `trainer.fit()` under the model name.

In [None]:
models = ['bin_enc', 'no_pen', 'lin_pen', 'nonlin_pen']

results = {}

for model in models:
    
    print(model)
    
    classifier = ResNet(
        model=model,
        architecture=architecture,
        num_classes=num_classes,
        input_dims=input_dims,
    )
    classifier = classifier.to(device)
    trainer = Trainer(
        device=device, 
        network=classifier, 
        trainset=trainset,
        testset=testset,
        training_hypers=training['hypers'], 
        model=model, 
        encoding_metrics=True,
        store_penultimate=True,
        verbose=True)

    results[model] = trainer.fit()

In [None]:
color_bin = '#1f77b4'
color_lin = '#2ca02c'
color_no = '#ff7f0e'
color_nonlin = '#d62728'
sns.set(style="whitegrid")
alpha=0.3

The plots show the accuracy of each network on the training and test sets.

In [None]:
x = np.arange(logging, epochs + logging, logging)

fig, axes = plt.subplots(1, 2, figsize=(10, 4), sharex=True)

# (results key, legend label, line color) for each model, in plotting order
curves = [
    ('bin_enc', 'BinEnc', color_bin),
    ('lin_pen', 'LinPen', color_lin),
    ('nonlin_pen', 'NonlinPen', color_nonlin),
    ('no_pen', 'NoPen', color_no),
]

# Left panel: training accuracy; right panel: test accuracy.
for ax, split in zip(axes, ['accuracy_train', 'accuracy_test']):
    for key, label, color in curves:
        y = results[key][split].reshape(-1)
        sns.lineplot(x=x, y=y, marker='o', label=label, color=color, ax=ax)

axes[0].set_title('Trainset')
axes[1].set_title('Testset')
axes[0].set_ylabel('Accuracy')
axes[0].set_xlabel('Epochs')
axes[1].set_xlabel('Epochs')

AUROC values and perturbation scores, as defined in the manuscript.

In [None]:
print('Auroc')
for model in models:
    print(model, ': ', np.around(results[model]['mahalanobis_score']['auroc'],4))

In [None]:
print('Perturbation score')
for model in models:
    print(model, ': ', np.around(results[model]['perturbation_score'],4))
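
As a generic reminder of what an AUROC value measures, it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it from two lists of scores; the function `auroc` and its inputs are illustrative assumptions, and the way the manuscript builds positives and negatives from the Mahalanobis score is not reproduced here.

```python
import numpy as np


def auroc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = np.asarray(scores_pos, dtype=float)[:, None]
    neg = np.asarray(scores_neg, dtype=float)[None, :]
    # Compare every positive with every negative and average the outcomes.
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


print(auroc([2.0, 3.0], [0.0, 1.0]))  # 1.0: perfectly separated
print(auroc([1.0], [1.0]))            # 0.5: indistinguishable
```

A value of 0.5 corresponds to chance level and a value of 1 to perfect separation. The pairwise comparison is quadratic in the number of samples, so it is meant only for small illustrative inputs.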

Histograms of the penultimate-layer latent values for all elements of the training set. In the BinEnc model, we can observe the emergence of binary encoding.

In [None]:
fig, axes = plt.subplots(2, 2, figsize=(10, 5))

node_bin_enc = 0
node_lin_pen = 0
node_nonlin_pen = 0
node_no_pen = 0

axes[0][0].hist(results['bin_enc']['penultimate_train'][:,node_bin_enc], bins=100, density=True)
axes[0][1].hist(results['lin_pen']['penultimate_train'][:,node_lin_pen], bins=100, density=True)
axes[1][0].hist(results['nonlin_pen']['penultimate_train'][:,node_nonlin_pen], bins=100, density=True)
axes[1][1].hist(results['no_pen']['penultimate_train'][:,node_no_pen], bins=100, density=True)

axes[0][0].set_title('BinEnc')
axes[0][1].set_title('LinPen')
axes[1][0].set_title('NonlinPen')
axes[1][1].set_title('NoPen')

plt.tight_layout()
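
The histograms give a visual impression of binary encoding; a single number per node can complement them. The following is a minimal sketch of one possible measure, not the metric used in the manuscript: the helper `two_cluster_separation` is hypothetical, and the data are synthetic stand-ins for one penultimate node.

```python
import numpy as np


def two_cluster_separation(values):
    """Hypothetical helper: gap between the two halves' means over their pooled std."""
    values = np.asarray(values, dtype=float)
    # Split the activations at the midpoint of their range.
    threshold = 0.5 * (values.min() + values.max())
    low, high = values[values <= threshold], values[values > threshold]
    if low.size == 0 or high.size == 0:
        return 0.0
    pooled_std = np.sqrt(0.5 * (low.var() + high.var()))
    return float((high.mean() - low.mean()) / (pooled_std + 1e-12))


rng = np.random.default_rng(0)
# Synthetic stand-ins: a node with two tight modes vs. a single broad mode.
binary_like = np.concatenate([rng.normal(-1, 0.05, 500), rng.normal(1, 0.05, 500)])
unimodal = rng.normal(0, 1, 1000)

print("binary-like:", two_cluster_separation(binary_like))
print("unimodal:   ", two_cluster_separation(unimodal))
```

Large values indicate two well-separated, narrow clusters. In the notebook the same helper could be applied to a column such as `results['bin_enc']['penultimate_train'][:, 0]`, assuming that entry is a NumPy array or can be converted to one.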