# Differentially Private Image Classification With ResNet

Author: [Josh Messitte](https://joshmessitte.dev)

Date Created: 2/12/2022

Description: Training a ResNet to classify images from the CIFAR-10 dataset in an epsilon-differentially private manner using PyTorch and Opacus. 

⨁ **UGA CSCI 8960** ⨁

## Project Overview
This project will:


*   Identify & tune important parameters for ϵ-DP training.
*   Use Opacus to identify layers incompatible with ϵ-DP.
*   Train an ϵ-DP ResNet for CIFAR-10 image classification using RMSprop.
*   Maximize accuracy while maintaining privacy.



Ignore Warnings for Clarity

In [1]:
import warnings
warnings.simplefilter("ignore")

## Tuning Hyper-parameters

The standard (non-private) hyper-parameters for our model are:

*   Step Size: Also known as the learning rate. Controls how far the weights move on each update during model training.



In [2]:
STEP_SIZE = 1e-3

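# Number of passes over the training set and the logical batch size per optimizer step
EPOCHS = 20
BATCH_SIZE = 512
# Largest batch processed at once to limit memory use (passed to BatchMemoryManager during training)
MAX_PHYSICAL_BATCH_SIZE = 128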
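
The relationship between these batch settings can be checked with a little arithmetic. The sketch below assumes the 50,000-image CIFAR-10 training split, and it assumes that BATCH_SIZE is the (expected) logical batch behind each optimizer step while MAX_PHYSICAL_BATCH_SIZE caps how many samples go through one forward/backward pass. Names other than the three constants are illustrative.

```python
import math

# Constants copied from the cell above
EPOCHS = 20
BATCH_SIZE = 512
MAX_PHYSICAL_BATCH_SIZE = 128

# Assumption: the CIFAR-10 training split has 50,000 images
TRAIN_SET_SIZE = 50_000

# Optimizer steps (logical batches) per epoch and over the whole run
steps_per_epoch = math.ceil(TRAIN_SET_SIZE / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS

# Physical chunks needed to accumulate one full logical batch
chunks_per_step = math.ceil(BATCH_SIZE / MAX_PHYSICAL_BATCH_SIZE)

# Fraction of the training set that one logical batch represents
sample_rate = BATCH_SIZE / TRAIN_SET_SIZE

print(steps_per_epoch, total_steps, chunks_per_step, sample_rate)
```

Under these assumptions each optimizer step is assembled from about four physical chunks, so memory use is set by the smaller number while the privacy accounting is driven by the larger one.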

And the following privacy-specific hyper-parameters:
*   Clipping Threshold: The maximum L2 norm (square root of the sum of squared values) to which each per-sample gradient is clipped.
*   Epsilon: The target ϵ, i.e. the maximum privacy budget the model may spend by the end of training.
*   Delta: The target δ for our (ϵ, δ)-DP guarantee. This bounds the probability that the ϵ guarantee fails to hold. Set to 1e-5, which is below the common rule of thumb of 1/N for the 50,000 training images.

In [3]:
CLIPPING_THRESHOLD = 1.2
EPSILON = 10
DELTA = 1e-5
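
To make the clipping threshold concrete, here is a minimal pure-Python sketch of clipping a single per-sample gradient to a maximum L2 norm. The helper names l2_norm and clip_gradient are hypothetical, and this is the textbook rule rather than Opacus's exact implementation.

```python
import math

def l2_norm(values):
    """Return the L2 norm: the square root of the sum of squared values."""
    return math.sqrt(sum(v * v for v in values))

def clip_gradient(grad, max_norm):
    """Scale grad down so its L2 norm is at most max_norm; small gradients pass through."""
    norm = l2_norm(grad)
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

CLIPPING_THRESHOLD = 1.2

large = clip_gradient([3.0, 4.0], CLIPPING_THRESHOLD)  # norm 5.0, gets rescaled
small = clip_gradient([0.3, 0.4], CLIPPING_THRESHOLD)  # norm 0.5, left unchanged

print(l2_norm(large), l2_norm(small))
```

Clipping only changes the length of a gradient, never its direction, which is why it bounds each sample's influence without discarding the sample.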

## Loading the Data
Here we load the CIFAR-10 dataset. No data augmentation is used, as prior work suggests that training with data augmentation can lead to the risk of a privacy attack being underestimated (Yu et al., 2021).

In [4]:
import torch
import torchvision
import torchvision.transforms as transforms

# These values are specific to the CIFAR-10 dataset
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD_DEV = (0.2023, 0.1994, 0.2010)

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(CIFAR10_MEAN, CIFAR10_STD_DEV),
])
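
As a rough illustration of what the normalization step does, the sketch below applies the same per-channel formula, (value - mean) / std, to a single RGB pixel already scaled to [0, 1] (the range ToTensor produces). The function normalize_pixel is a hypothetical stand-in, not part of torchvision.

```python
CIFAR10_MEAN = (0.4914, 0.4822, 0.4465)
CIFAR10_STD_DEV = (0.2023, 0.1994, 0.2010)

def normalize_pixel(rgb, mean=CIFAR10_MEAN, std=CIFAR10_STD_DEV):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))

# A pixel equal to the dataset mean maps to zero in every channel
mean_pixel = normalize_pixel(CIFAR10_MEAN)
# A pure white pixel ends up roughly 2.5 to 2.8 standard deviations above zero
white_pixel = normalize_pixel((1.0, 1.0, 1.0))

print(mean_pixel)
print(white_pixel)
```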

We can then load the images; the transform defined above converts each PIL image to a Tensor and normalizes it.

In [5]:
from torchvision.datasets import CIFAR10

DATA_ROOT = '../cifar10'

train_dataset = CIFAR10(
    root=DATA_ROOT, train=True, download=True, transform=transform)

train_loader = torch.utils.data.DataLoader(
    train_dataset,
    batch_size=BATCH_SIZE,
)

test_dataset = CIFAR10(
    root=DATA_ROOT, train=False, download=True, transform=transform)

test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE,
    shuffle=False,
)

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ../cifar10/cifar-10-python.tar.gz



Extracting ../cifar10/cifar-10-python.tar.gz to ../cifar10
Files already downloaded and verified


## Model

Now we create our ResNet model using the torchvision package. We use ResNet-18, the 18-layer variant of the residual network architecture (He et al., 2015), with a 10-way output layer for the CIFAR-10 classes.

In [6]:
from torchvision import models

model = models.resnet18(num_classes=10)
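
The "18" can be recovered by counting weighted layers. The tally below assumes the standard ResNet-18 layout (one stem convolution, four stages of two basic blocks, two 3x3 convolutions per block, one final linear layer) and, following the usual convention, leaves out the 1x1 convolutions on the shortcut paths. The constant names are illustrative.

```python
# Assumed ResNet-18 layout; shortcut (downsample) convolutions are not counted
STEM_CONVS = 1
BLOCKS_PER_STAGE = [2, 2, 2, 2]
CONVS_PER_BLOCK = 2
FC_LAYERS = 1

weighted_layers = STEM_CONVS + sum(BLOCKS_PER_STAGE) * CONVS_PER_BLOCK + FC_LAYERS
print(weighted_layers)  # prints 18
```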

## Identifying Incompatible Layers

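Some layers are not compatible with Opacus due to privacy implications. For example, we discussed in class how BatchNorm layers cannot be used: BatchNorm normalizes each sample with statistics computed over the whole batch, so one sample's output (and gradient) depends on the other samples in the batch, which breaks the per-sample gradient clipping that DP training relies on.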

In [7]:
# If Opacus is not installed, install it from within the notebook:
!pip3 install opacus

from opacus.validators import ModuleValidator

errors = ModuleValidator.validate(model, strict=False)
errors[-5:]

Collecting opacus
  Downloading opacus-1.0.2-py3-none-any.whl (145 kB)
[?25l[K     |██▎                             | 10 kB 23.3 MB/s eta 0:00:01[K     |████▌                           | 20 kB 29.2 MB/s eta 0:00:01[K     |██████▊                         | 30 kB 25.2 MB/s eta 0:00:01[K     |█████████                       | 40 kB 15.0 MB/s eta 0:00:01[K     |███████████▎                    | 51 kB 9.4 MB/s eta 0:00:01[K     |█████████████▌                  | 61 kB 10.4 MB/s eta 0:00:01[K     |███████████████▉                | 71 kB 8.1 MB/s eta 0:00:01[K     |██████████████████              | 81 kB 8.9 MB/s eta 0:00:01[K     |████████████████████▎           | 92 kB 9.5 MB/s eta 0:00:01[K     |██████████████████████▋         | 102 kB 8.3 MB/s eta 0:00:01[K     |████████████████████████▉       | 112 kB 8.3 MB/s eta 0:00:01[K     |███████████████████████████     | 122 kB 8.3 MB/s eta 0:00:01[K     |█████████████████████████████▍  | 133 kB 8.3 MB/s eta 0:00:01[K

[opacus.validators.errors.ShouldReplaceModuleError("BatchNorm cannot support training with differential privacy. The reason for it is that BatchNorm makes each sample's normalized value depend on its peers in a batch, ie the same sample x will get normalized to a different value depending on who else is on its batch. Privacy-wise, this means that we would have to put a privacy mechanism there too. While it can in principle be done, there are now multiple normalization layers that do not have this issue: LayerNorm, InstanceNorm and their generalization GroupNorm are all privacy-safe since they don't have this property.We offer utilities to automatically replace BatchNorms to GroupNorms and we will release pretrained models to help transition, such as GN-ResNet ie a ResNet using GroupNorm, pretrained on ImageNet"),
 opacus.validators.errors.ShouldReplaceModuleError("BatchNorm cannot support training with differential privacy. The reason for it is that BatchNorm makes each sample's normal

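We can fix the incompatible layers using ModuleValidator.fix(), which replaces each BatchNorm layer with a GroupNorm layer. Validating the model again should then return an empty list of errors.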

In [8]:
model = ModuleValidator.fix(model)
ModuleValidator.validate(model, strict=False)

[]
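
To see why the validator objects to BatchNorm, consider this simplified pure-Python sketch. It normalizes a single scalar activation across a batch (a stand-in for BatchNorm) and, separately, normalizes one sample's features using only that sample's own statistics (a stand-in for GroupNorm with a single group). Both helpers are hypothetical simplifications that ignore learned scale and shift parameters.

```python
from statistics import mean, pstdev

def batch_normalize(batch, index):
    """Normalize one scalar activation with statistics taken over the whole batch."""
    return (batch[index] - mean(batch)) / pstdev(batch)

def per_sample_normalize(features):
    """Normalize one sample's features using only that sample's own statistics."""
    mu, sigma = mean(features), pstdev(features)
    return [(f - mu) / sigma for f in features]

# The same activation (1.0) lands on different values depending on its peers
with_peers_a = batch_normalize([1.0, 2.0, 3.0], index=0)
with_peers_b = batch_normalize([1.0, 5.0, 6.0], index=0)
print(with_peers_a, with_peers_b)

# Per-sample normalization never looks at other samples, so the result is
# the same no matter which batch the sample appears in
sample = [1.0, 2.0, 4.0]
normalized_sample = per_sample_normalize(sample)
print(normalized_sample)
```

Because the per-sample version has no cross-sample dependence, each sample's gradient can be computed and clipped in isolation.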

## Utilizing GPUs

Saturn Cloud provides GPUs, so we use a CUDA device when one is available and fall back to the CPU otherwise.



In [9]:
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = model.to(device)


## Optimization Method & Loss Criterion

We can specify our loss criterion (Cross Entropy Loss) and our optimization method (RMSprop):

In [10]:
import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.RMSprop(model.parameters(), lr=STEP_SIZE)
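
For intuition about what RMSprop does with each gradient, the following sketch performs the plain update for a single scalar weight. It assumes PyTorch's default settings of alpha=0.99 and eps=1e-8 with no momentum or weight decay; the function rmsprop_step is hypothetical and only mirrors the update rule, not the library code.

```python
import math

def rmsprop_step(weight, grad, square_avg, lr=1e-3, alpha=0.99, eps=1e-8):
    """One plain RMSprop update for a single scalar weight."""
    # Running average of squared gradients
    square_avg = alpha * square_avg + (1 - alpha) * grad * grad
    # Step size is scaled down where recent gradients have been large
    weight = weight - lr * grad / (math.sqrt(square_avg) + eps)
    return weight, square_avg

first_weight, first_square_avg = rmsprop_step(0.0, 1.0, 0.0)

weight, square_avg = 0.0, 0.0
for grad in [1.0, 1.0, 1.0]:
    weight, square_avg = rmsprop_step(weight, grad, square_avg)
    print(f"weight={weight:.5f} square_avg={square_avg:.5f}")
```

In DP training the gradient fed into this rule is the clipped and noised one, so the running average also absorbs the injected noise.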

## Privacy Engine

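We now pass our privacy-specific hyper-parameters to the privacy engine provided by Opacus. Given the target ϵ, the target δ, and the number of epochs, make_private_with_epsilon chooses the noise multiplier (sigma) for us and returns wrapped versions of the model, optimizer, and data loader.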

In [11]:
from opacus import PrivacyEngine

privacy_engine = PrivacyEngine()

model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
    module=model,
    optimizer=optimizer,
    data_loader=train_loader,
    epochs=EPOCHS,
    target_epsilon=EPSILON,
    target_delta=DELTA,
    max_grad_norm=CLIPPING_THRESHOLD,
)

print(f"Using sigma={optimizer.noise_multiplier} and C={CLIPPING_THRESHOLD}")

Using sigma=0.6249267578125 and C=1.2
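
The printed sigma and C together set the scale of the Gaussian noise added at every step. The sketch below shows the usual DP-SGD aggregation under the assumption that the noise standard deviation is sigma times C and that the noised sum is divided by the batch size. The function noisy_average and the example gradients are hypothetical; Opacus's internals may differ in detail.

```python
import random

def noisy_average(clipped_grads, noise_multiplier, max_grad_norm, batch_size, rng):
    """Sum clipped per-sample gradients, add Gaussian noise, and average."""
    noise_std = noise_multiplier * max_grad_norm
    dims = len(clipped_grads[0])
    summed = [sum(g[d] for g in clipped_grads) for d in range(dims)]
    noised = [s + rng.gauss(0.0, noise_std) for s in summed]
    return [n / batch_size for n in noised]

SIGMA = 0.6249267578125  # noise multiplier printed above
C = 1.2                  # CLIPPING_THRESHOLD

rng = random.Random(0)
# Hypothetical per-sample gradients, each already clipped to L2 norm <= C
clipped = [[0.6, -0.8], [0.3, 0.4], [-1.2, 0.0]]

noise_std = SIGMA * C
private_grad = noisy_average(clipped, SIGMA, C, len(clipped), rng)
noise_free_grad = noisy_average(clipped, 0.0, C, len(clipped), rng)

print("noise std:", noise_std)
print("private gradient:", private_grad)
print("noise-free gradient:", noise_free_grad)
```

With only three samples the noise dominates the signal; with an expected batch of 512 the same noise is divided by a much larger number, which is one reason larger batches tend to help private training.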


## Accuracy Calculator

This function measures the accuracy of our model (training & testing).

In [12]:
def accuracy(preds, labels):
    return (preds == labels).mean()

## Training Function

This function trains the model for one epoch.

In [13]:
import numpy as np
from opacus.utils.batch_memory_manager import BatchMemoryManager


def train(model, train_loader, optimizer, epoch, device):
    model.train()
    criterion = nn.CrossEntropyLoss()

    losses = []
    top1_acc = []
    passed = []
    
    with BatchMemoryManager(
        data_loader=train_loader, 
        max_physical_batch_size=MAX_PHYSICAL_BATCH_SIZE, 
        optimizer=optimizer
    ) as memory_safe_data_loader:

        for i, (images, target) in enumerate(memory_safe_data_loader):   
            
            optimizer.zero_grad()
            images = images.to(device)
            target = target.to(device)

            # compute output
            output = model(images)
            loss = criterion(output, target)

            preds = np.argmax(output.detach().cpu().numpy(), axis=1)
            labels = target.detach().cpu().numpy()

            # measure accuracy and record loss
            acc = accuracy(preds, labels)

            losses.append(loss.item())
            top1_acc.append(acc)

            loss.backward()
            optimizer.step()

            if (i+1) % 200 == 0 and epoch not in passed:
                epsilon = privacy_engine.get_epsilon(DELTA)
                passed.append(epoch)
                print(
                    f"\tTrain Epoch: {epoch} \t"
                    f"Loss: {np.mean(losses):.6f} "
                    f"Acc@1: {np.mean(top1_acc) * 100:.6f} "
                    f"(ε = {epsilon:.2f}, δ = {DELTA})"
                )

## Test Function

Our test function evaluates the model on the 10,000-image test set.

In [14]:
def test(model, test_loader, device):
    model.eval()
    criterion = nn.CrossEntropyLoss()
    losses = []
    top1_acc = []

    with torch.no_grad():
        for images, target in test_loader:
            images = images.to(device)
            target = target.to(device)

            output = model(images)
            loss = criterion(output, target)
            preds = np.argmax(output.detach().cpu().numpy(), axis=1)
            labels = target.detach().cpu().numpy()
            acc = accuracy(preds, labels)

            losses.append(loss.item())
            top1_acc.append(acc)

    top1_avg = np.mean(top1_acc)

    print(
        f"\tTest set:"
        f"Loss: {np.mean(losses):.6f} "
        f"Acc: {top1_avg * 100:.6f} "
    )
    return top1_avg
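
One subtlety: averaging per-batch accuracies gives a short final batch the same weight as a full one. With 10,000 test images and a batch size of 512, the last batch holds only 272 images, so the reported figure can differ slightly from the true fraction of correct predictions. The sketch below compares the two calculations on made-up (correct, batch_size) counts; both helper functions are hypothetical.

```python
def mean_of_batch_accuracies(batches):
    """Average the per-batch accuracies, giving every batch equal weight."""
    return sum(correct / size for correct, size in batches) / len(batches)

def sample_weighted_accuracy(batches):
    """Divide total correct predictions by total samples."""
    total_correct = sum(correct for correct, _ in batches)
    total_samples = sum(size for _, size in batches)
    return total_correct / total_samples

# Hypothetical counts: 19 full batches and one short final batch (10,000 images)
batches = [(300, 512)] * 19 + [(100, 272)]

batch_mean = mean_of_batch_accuracies(batches)
weighted = sample_weighted_accuracy(batches)
print(batch_mean, weighted)
```

The gap is small here, but counting correct predictions and samples directly avoids it altogether.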

## Train the ResNet

This loop trains the ResNet model on the 50,000 training images for EPOCHS epochs using our RMSprop optimizer.

In [None]:
from tqdm.notebook import tqdm

for epoch in tqdm(range(EPOCHS), desc="Epoch", unit="epoch"):
    train(model, train_loader, optimizer, epoch + 1, device)




	Train Epoch: 1 	Loss: 2.623895 Acc@1: 16.575254 (ε = 3.86, δ = 1e-05)
	Train Epoch: 2 	Loss: 1.756260 Acc@1: 38.197733 (ε = 4.65, δ = 1e-05)
	Train Epoch: 3 	Loss: 1.718924 Acc@1: 43.967947 (ε = 5.17, δ = 1e-05)
	Train Epoch: 4 	Loss: 1.715446 Acc@1: 46.374967 (ε = 5.61, δ = 1e-05)
	Train Epoch: 5 	Loss: 1.712425 Acc@1: 48.704685 (ε = 5.99, δ = 1e-05)
	Train Epoch: 6 	Loss: 1.700526 Acc@1: 50.244815 (ε = 6.35, δ = 1e-05)
	Train Epoch: 7 	Loss: 1.728286 Acc@1: 50.629706 (ε = 6.67, δ = 1e-05)
	Train Epoch: 8 	Loss: 1.681021 Acc@1: 52.934605 (ε = 6.99, δ = 1e-05)
	Train Epoch: 9 	Loss: 1.686935 Acc@1: 53.318167 (ε = 7.28, δ = 1e-05)
	Train Epoch: 10 	Loss: 1.700559 Acc@1: 53.395482 (ε = 7.56, δ = 1e-05)
	Train Epoch: 11 	Loss: 1.680947 Acc@1: 54.309624 (ε = 7.83, δ = 1e-05)
	Train Epoch: 12 	Loss: 1.709707 Acc@1: 54.660555 (ε = 8.09, δ = 1e-05)
	Train Epoch: 13 	Loss: 1.658979 Acc@1: 55.813932 (ε = 8.35, δ = 1e-05)
	Train Epoch: 14 	Loss: 1.660034 Acc@1: 56.476357 (ε = 8.59, δ = 1e-05)
	

## Test the ResNet on Test Data
Now we test the trained model on the 10k testing images.

In [None]:
top1_acc = test(model, test_loader, device)

	Test set:Loss: 1.775980 Acc: 55.926585 


## References
*   Yu, D., Zhang, H., Chen, W., Yin, J., & Liu, T.-Y. (2021). How Does Data Augmentation Affect Privacy in Machine Learning? arXiv [cs.LG]. Retrieved from http://arxiv.org/abs/2007.10567
*   He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. arXiv [cs.CV]. Retrieved from http://arxiv.org/abs/1512.03385

