# Chapter 1: PyTorch DP Demo

| Chapter  | Colab   | Kaggle          | Gradient      | Studio Lab             | Binder             |
|:---------|:--------|:----------------|:--------------|:-----------------------|:-------------------|
| [Chapter 1: Pytorch DP Demo](1_privacy/Chapter_1_Pytorch_DP_Demo.ipynb)                         | [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/matthew-mcateer/practicing_trustworthy_machine_learning/blob/main/1_privacy/Chapter_1_Pytorch_DP_Demo.ipynb)            | [![Kaggle](https://kaggle.com/static/images/open-in-kaggle.svg)](https://kaggle.com/kernels/welcome?src=https://github.com/matthew-mcateer/practicing_trustworthy_machine_learning/blob/main/1_privacy/Chapter_1_Pytorch_DP_Demo.ipynb)            | [![Gradient](https://assets.paperspace.io/img/gradient-badge.svg)](https://console.paperspace.com/github/matthew-mcateer/practicing_trustworthy_machine_learning/blob/main/1_privacy/Chapter_1_Pytorch_DP_Demo.ipynb)            | [![Open In SageMaker Studio Lab](https://studiolab.sagemaker.aws/studiolab.svg)](https://studiolab.sagemaker.aws/import/github/matthew-mcateer/practicing_trustworthy_machine_learning/blob/main/1_privacy/Chapter_1_Pytorch_DP_Demo.ipynb)            | [![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/matthew-mcateer/practicing_trustworthy_machine_learning/HEAD?urlpath=https%3A%2F%2Fgithub.com%2Fmatthew-mcateer%2Fpracticing_trustworthy_machine_learning%2Fblob%2Fmain%2F1_privacy%2FChapter_1_Pytorch_DP_Demo.ipynb)              |

<!--
Originally found on GitHub at https://github.com/matthew-mcateer/practicing_trustworthy_machine_learning/blob/main/1_privacy/Chapter_1_Pytorch_DP_Demo.ipynb
-->



In [1]:
!pip -qq install torchcsprng==0.1.3+cu101 -f https://download.pytorch.org/whl/torch_stable.html
!pip -qq install opacus
!pip -qq install datasets
!pip -qq install transformers
!pip -qq install watermark
%load_ext watermark
%watermark -a "Practicing Trustworthy Machine Learning" -u -d -v -m -p torchcsprng,opacus,datasets,transformers


In [2]:
!nvidia-smi

Tue Jan  3 10:53:50 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|   0  A100-SXM4-40GB      Off  | 00000000:00:04.0 Off |                    0 |
| N/A   32C    P0    54W / 400W |      0MiB / 40536MiB |      0%      Default |
|                               |                      |             Disabled |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Proces

In [3]:
import numpy as np
import torch
import torch.nn as nn
import torch.optim as optim
from datasets import load_dataset
from opacus import PrivacyEngine
from opacus.utils.uniform_sampler import UniformWithReplacementSampler
import torch.nn.functional as F
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader
from tqdm import tqdm, trange
from transformers import (
    DistilBertConfig,
    DistilBertForSequenceClassification,
    DistilBertTokenizerFast,
)

In [4]:
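# A much smaller model used for testing. It returns a plain logits tensor,
# so drop the `.logits` access in train() and evaluate() when swapping it in.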
class SampleNet(nn.Module):
    def __init__(self, vocab_size: int):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, 16)
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.fc1 = nn.Linear(16, 16)
        self.fc2 = nn.Linear(16, 2)

    def forward(self, x):
        x = self.emb(x)
        x = x.transpose(1, 2)
        x = self.pool(x).squeeze(-1)  # drop only the pooled dim; keeps the batch dim when batch size is 1
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        return x

    def name(self):
        return "SampleNet"


In [5]:
def binary_accuracy(preds, y):
    correct = (y.long() == torch.argmax(preds, dim=1)).float()
    acc = correct.sum() / len(correct)
    return acc


def padded_collate(batch, padding_idx=0):
    x = pad_sequence(
        [elem["input_ids"] for elem in batch],
        batch_first=True,
        padding_value=padding_idx,
    )
    y = torch.stack([elem["label"] for elem in batch]).long()
    return x, y
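
`padded_collate` relies on `pad_sequence` to turn variable-length token lists into one rectangular batch. The following is a minimal plain-Python sketch of that right-padding step, assuming the `batch_first=True` layout used above; `pad_batch` and the token ids are hypothetical names and values chosen for illustration, not part of the notebook.

```python
def pad_batch(sequences, padding_idx=0):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [list(seq) + [padding_idx] * (max_len - len(seq)) for seq in sequences]


# Three tokenized reviews of different lengths (made-up token ids).
batch = [[101, 7592, 102], [101, 102], [101, 2023, 2003, 102]]
padded = pad_batch(batch)

for row in padded:
    print(row)
```

Every row ends up with the length of the longest review in the batch, which is why batches of short reviews are cheaper to process than batches containing one long review.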


In [6]:

#@markdown input batch size for test (default: 64)
BATCH_SIZE_TEST=64 #@param {"type":"integer"}


#@markdown sample rate used for batch construction (default: 0.00256)
SAMPLE_RATE=0.00256  #@param {"type":"number"}


#@markdown number of epochs to train (default: 10)
EPOCHS=50  #@param {"type":"integer"}


#@markdown learning rate (default: .02)
LR=5e-5  #@param {"type":"number"}


#@markdown Noise multiplier (default 0.56). Not used below: `make_private_with_epsilon` derives the noise multiplier from the target epsilon
SIGMA=0.56  #@param {"type":"number"}


#@markdown Clip per-sample gradients to this norm (default 1.0)
MAX_PER_SAMPLE_GRAD_NORM=1.5  #@param {"type":"number"}

#@markdown Target epsilon (default: 1.0)
TARGET_EPSILON=12.0  #@param {"type":"number"}

#@markdown Target delta (default: 1e-5)
TARGET_DELTA=8e-5  #@param {"type":"number"}


#@markdown Longer sequences will be cut to this length (default: 256)
MAX_SEQUENCE_LENGTH=256  #@param {"type":"integer"}


#@markdown GPU ID for this process (default: 'cuda')
DEVICE="cuda"  #@param {"type":"string"}


#@markdown Save the trained model (default: false)
SAVE_MODEL=True  #@param {"type":"boolean"}


#@markdown Disable privacy training and just train with vanilla optimizer
DISABLE_DP=False  #@param {"type":"boolean"}


#@markdown Enable Secure RNG to have trustworthy privacy guarantees. Comes at a performance cost
SECURE_RNG=True  #@param {"type":"boolean"}


#@markdown Where IMDB is/will be stored
DATA_ROOT="../imdb"  #@param {"type":"string"}


#@markdown number of data loading workers (default: 2)
WORKERS=2 #@param {"type":"integer"}
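
The sample rate is easier to reason about once it is converted into a batch size and a step count. The sketch below does that arithmetic for the 25,000-example IMDB train split. It assumes the sampler yields `int(1 / sample_rate)` batches per pass, which agrees with the 390 steps per epoch shown in the training progress bars; the variable names are illustrative.

```python
NUM_TRAIN = 25_000     # size of the IMDB train split
SAMPLE_RATE = 0.00256  # same value as the form field above
EPOCHS = 50

# Each example is drawn with probability SAMPLE_RATE, so this is an average.
expected_batch_size = SAMPLE_RATE * NUM_TRAIN

# Assumption: one "epoch" is int(1 / sample_rate) sampled batches.
steps_per_epoch = int(1 / SAMPLE_RATE)
total_steps = steps_per_epoch * EPOCHS

print(f"expected batch size: {expected_batch_size:.1f}")
print(f"steps per epoch:     {steps_per_epoch}")
print(f"total noisy steps:   {total_steps}")
```

The total step count matters for privacy accounting because every step adds to the privacy budget.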

     

In [7]:
def train(model, train_loader, optimizer, epoch, privacy_engine):
    criterion = nn.CrossEntropyLoss()
    losses = []
    accuracies = []
    device = torch.device(DEVICE)
    model = model.train().to(device)

    for data, label in tqdm(train_loader):
        data = data.to(device)
        label = label.to(device)

        optimizer.zero_grad()
        predictions = model(data).logits  # .squeeze(1)
        loss = criterion(predictions, label)
        acc = binary_accuracy(predictions, label)

        loss.backward()
        optimizer.step()

        losses.append(loss.item())
        accuracies.append(acc.item())

    if not DISABLE_DP and privacy_engine is not None:
        alphas_list = [1 + x / 10.0 for x in range(1, 100)]
        alphas_list += list(range(12, 64))
        epsilon, best_alpha = privacy_engine.accountant.get_privacy_spent(
            delta=TARGET_DELTA, alphas=alphas_list
        )
        print(
            f"Train Epoch: {epoch} \t"
            f"Train Loss: {np.mean(losses):.6f} "
            f"Train Accuracy: {np.mean(accuracies):.6f} "
            f"(\u03B5 = {epsilon:.2f}, "
            f"\u03B4 = {TARGET_DELTA}) for \u03B1 = {best_alpha}"
        )
    else:
        print(
            f"Train Epoch: {epoch} \t"
            f" Loss: {np.mean(losses):.6f} \t"
            f" Accuracy: {np.mean(accuracies):.6f}"
        )
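
When DP is enabled, `optimizer.step()` does more than a plain gradient update. The sketch below illustrates the general DP-SGD recipe in plain Python: clip each per-sample gradient, sum, add Gaussian noise scaled by the clipping norm, then average. It is a conceptual illustration only, not Opacus code; `clip`, `dp_sgd_gradient`, and the toy gradients are hypothetical, and it averages over the actual batch size for simplicity.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector down so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / (norm + 1e-6))
    return [g * scale for g in grad]


def dp_sgd_gradient(per_sample_grads, max_norm, noise_multiplier, rng):
    """Clip each per-sample gradient, sum, add Gaussian noise, then average."""
    clipped = [clip(g, max_norm) for g in per_sample_grads]
    dim = len(per_sample_grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # Noise standard deviation is noise_multiplier * max_norm per coordinate.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    return [v / len(per_sample_grads) for v in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]  # toy per-sample gradients

no_noise = dp_sgd_gradient(grads, max_norm=1.5, noise_multiplier=0.0, rng=rng)
with_noise = dp_sgd_gradient(grads, max_norm=1.5, noise_multiplier=0.56, rng=rng)
print("clipped mean:      ", no_noise)
print("clipped noisy mean:", with_noise)
```

Clipping bounds how much any single review can move the model, and the noise hides whatever influence remains; together they are what the reported ε accounts for.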

In [8]:
def evaluate(model, test_loader):
    criterion = nn.CrossEntropyLoss()
    losses = []
    accuracies = []
    device = torch.device(DEVICE)
    model = model.eval().to(device)

    with torch.no_grad():
        for data, label in tqdm(test_loader):
            data = data.to(device)
            label = label.to(device)

            predictions = model(data).logits

            loss = criterion(predictions, label)
            acc = binary_accuracy(predictions, label)

            losses.append(loss.item())
            accuracies.append(acc.item())

    mean_loss = np.mean(losses)
    mean_accuracy = np.mean(accuracies)
    print(
        f"\nTest set: Average loss: {mean_loss:.4f}, "
        f"Accuracy: {mean_accuracy * 100:.2f}%\n"
    )
    return mean_accuracy
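
`evaluate` averages per-batch accuracies, so a smaller final batch counts as much as a full one. With 25,000 test reviews and a batch size of 64 there are 390 full batches plus one batch of 40. The sketch below compares the unweighted mean with the example-weighted mean; the per-batch accuracy values are made up for illustration.

```python
# 390 full batches of 64 reviews plus a final batch of 40 (25,000 in total).
batch_sizes = [64] * 390 + [40]

# Hypothetical per-batch accuracies: 56/64 correct in full batches, 20/40 in the last.
batch_accs = [0.875] * 390 + [0.5]

# What evaluate() reports: every batch counts equally.
unweighted = sum(batch_accs) / len(batch_accs)

# Accuracy over individual examples: batches weighted by their size.
weighted = sum(a * n for a, n in zip(batch_accs, batch_sizes)) / sum(batch_sizes)

print(f"unweighted mean of batch accuracies: {unweighted:.6f}")
print(f"example-weighted accuracy:           {weighted:.6f}")
```

With only one short batch out of 391 the gap is small, but it grows when the batch size is large relative to the dataset.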

In [9]:
device = torch.device(DEVICE)

raw_dataset = load_dataset("imdb", cache_dir=DATA_ROOT)
tokenizer = DistilBertTokenizerFast.from_pretrained(
    "distilbert-base-cased", do_lower_case=True
)
dataset = raw_dataset.map(
    lambda x: tokenizer(
        x["text"], truncation=True, max_length=MAX_SEQUENCE_LENGTH
    ),
    batched=True,
)
dataset.set_format(type="torch", columns=["input_ids", "label"])

train_dataset = dataset["train"]
test_dataset = dataset["test"]

Downloading and preparing dataset imdb/plain_text to /content/../imdb/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1...


Generating train split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/25000 [00:00<?, ? examples/s]

Generating unsupervised split:   0%|          | 0/50000 [00:00<?, ? examples/s]

Dataset imdb downloaded and prepared to /content/../imdb/imdb/plain_text/1.0.0/2fdd8b9bcadd6e7055e742a706876ba43f19faee861df134affd7a3f60fc38a1. Subsequent calls will reuse this data.


In [10]:
if SECURE_RNG:
    try:
        import torchcsprng as prng
    except ImportError as e:
        message = (
            "Need to install the torchcsprng package! "
            "Documentation: https://github.com/pytorch/csprng#installation"
        )
        raise ImportError(message) from e

    generator = prng.create_random_device_generator("/dev/urandom")

else:
    generator = None

In [11]:
train_loader = DataLoader(
    train_dataset,
    num_workers=WORKERS,
    generator=generator,
    batch_sampler=UniformWithReplacementSampler(
        num_samples=len(train_dataset),
        sample_rate=SAMPLE_RATE,
        generator=generator,
    ),
    collate_fn=padded_collate,
    pin_memory=True,
)
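
The batch sampler used above does not produce fixed-size batches. As I understand its documented behaviour, each example is included in a batch independently with probability `sample_rate`, which is the sampling scheme the privacy analysis assumes. Here is a minimal plain-Python sketch of that idea on a smaller toy dataset; `poisson_batches` is a hypothetical helper, not the Opacus implementation.

```python
import random


def poisson_batches(num_samples, sample_rate, rng):
    """Yield int(1 / sample_rate) batches; each index joins a batch independently."""
    steps = int(1 / sample_rate)
    for _ in range(steps):
        yield [i for i in range(num_samples) if rng.random() < sample_rate]


rng = random.Random(0)  # seeded for repeatability; not a secure RNG
batches = list(poisson_batches(num_samples=2_500, sample_rate=0.0256, rng=rng))

sizes = [len(b) for b in batches]
mean_size = sum(sizes) / len(sizes)
print(f"{len(batches)} batches, sizes from {min(sizes)} to {max(sizes)}, mean {mean_size:.1f}")
```

Batch sizes fluctuate around `sample_rate * num_samples`, and an example may appear in several batches of the same pass or in none.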

In [12]:
test_loader = torch.utils.data.DataLoader(
    test_dataset,
    batch_size=BATCH_SIZE_TEST,
    shuffle=False,
    num_workers=WORKERS,
    collate_fn=padded_collate,
    pin_memory=True,
)

In [13]:
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-cased",
    config=DistilBertConfig.from_pretrained(
        "distilbert-base-cased", num_labels=2
    ),
)
trainable_layers = [
    model.distilbert.transformer.layer[-1],
    model.pre_classifier,
    model.classifier,
]
total_params = 0
trainable_params = 0

for p in model.parameters():
    p.requires_grad = False
    total_params += p.numel()

for layer in trainable_layers:
    for p in layer.parameters():
        p.requires_grad = True
        trainable_params += p.numel()

print(f"Total parameters count: {total_params}")  # ~65M
print(f"Trainable parameters count: {trainable_params}")  # ~7M

Some weights of the model checkpoint at distilbert-base-cased were not used when initializing DistilBertForSequenceClassification: ['vocab_layer_norm.weight', 'vocab_transform.bias', 'vocab_layer_norm.bias', 'vocab_projector.weight', 'vocab_transform.weight', 'vocab_projector.bias']
- This IS expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing DistilBertForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of DistilBertForSequenceClassification were not initialized from the model checkpoint at distilbert-base-cased and are newly initialized: ['classifier.bias', 'pre_classifier.weight', 'pre_classifier

Total parameters count: 65783042
Trainable parameters count: 7680002
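
The trainable parameter count can be checked by hand. The sketch below rebuilds it from layer shapes, assuming the standard DistilBERT base dimensions (hidden size 768, feed-forward size 3072) and a transformer block made of four attention projections, a two-layer feed-forward network, and two layer norms; the helper names are illustrative.

```python
HIDDEN = 768      # assumed DistilBERT base hidden size
FFN = 3072        # assumed feed-forward inner size
NUM_LABELS = 2


def linear(n_in, n_out):
    """Parameters in a dense layer: weight matrix plus bias."""
    return n_in * n_out + n_out


def layer_norm(n):
    """Parameters in a layer norm: scale plus shift."""
    return 2 * n


attention = 4 * linear(HIDDEN, HIDDEN)                  # query, key, value, output
feed_forward = linear(HIDDEN, FFN) + linear(FFN, HIDDEN)
transformer_layer = attention + feed_forward + 2 * layer_norm(HIDDEN)

pre_classifier = linear(HIDDEN, HIDDEN)
classifier = linear(HIDDEN, NUM_LABELS)

trainable = transformer_layer + pre_classifier + classifier
print(f"last transformer layer: {transformer_layer:,}")
print(f"pre_classifier:         {pre_classifier:,}")
print(f"classifier:             {classifier:,}")
print(f"trainable total:        {trainable:,}")
```

Under these assumptions the total matches the 7,680,002 trainable parameters reported by the cell, roughly 12% of the full model.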


In [14]:
# Move the model to appropriate device
model = model.to(device)
# Set the model to train mode (HuggingFace models load in eval mode)
model = model.train()
optimizer = optim.Adam(model.parameters(), lr=LR)

In [17]:
if not DISABLE_DP:
    privacy_engine = PrivacyEngine(
        secure_mode=SECURE_RNG,
        accountant="rdp"
    )

    model, optimizer, train_loader = privacy_engine.make_private_with_epsilon(
        module=model,
        optimizer=optimizer,
        data_loader=train_loader,
        max_grad_norm=MAX_PER_SAMPLE_GRAD_NORM,
        target_delta=TARGET_DELTA,
        target_epsilon=TARGET_EPSILON,
        epochs=EPOCHS,
    )
else:
    privacy_engine = None

mean_accuracy = 0
for epoch in range(1, EPOCHS + 1):
    train(model, train_loader, optimizer, epoch, privacy_engine)
    mean_accuracy = evaluate(model, test_loader)

if SAVE_MODEL:
    # Save the trained weights (the original cell saved only the accuracy value)
    suffix = "nodp" if DISABLE_DP else "dp"
    torch.save(model.state_dict(), f"distilbert_imdb_class_{suffix}.pt")

100%|██████████| 390/390 [01:02<00:00,  6.25it/s]
Train Epoch: 1 	Train Loss: 0.705121 Train Accuracy: 0.501089 (ε = 3.81, δ = 8e-05) for α = 3.3
100%|██████████| 391/391 [00:38<00:00, 10.14it/s]
Test set: Average loss: 0.7052, Accuracy: 49.95%

Train Epoch: 2 	Train Loss: 0.706853 Train Accuracy: 0.500664 (ε = 4.28, δ = 8e-05) for α = 3.1
Test set: Average loss: 0.7065, Accuracy: 49.95%

Train Epoch: 3 	Train Loss: 0.718036 Train Accuracy: 0.505899 (ε = 4.63, δ = 8e-05) for α = 3.1
Test set: Average loss: 0.7346, Accuracy: 49.95%

Train Epoch: 4 	Train Loss: 0.728596 Train Accuracy: 0.500094 (ε = 4.93, δ = 8e-05) for α = 3.0
Test set: Average loss: 0.7290, Accuracy: 49.95%

Train Epoch: 5 	Train Loss: 0.722679 Train Accuracy: 0.499265 (ε = 5.21, δ = 8e-05) for α = 2.9
Test set: Average loss: 0.7281, Accuracy: 49.95%

Train Epoch: 6 	Train Loss: 0.729241 Train Accuracy: 0.500317 (ε = 5.46, δ = 8e-05) for α = 2.9
Test set: Average loss: 0.7330, Accuracy: 49.95%

Train Epoch: 7 	Train Loss: 0.732324 Train Accuracy: 0.501354 (ε = 5.70, δ = 8e-05) for α = 2.9
Test set: Average loss: 0.7443, Accuracy: 49.95%

Train Epoch: 8 	Train Loss: 0.745775 Train Accuracy: 0.498575 (ε = 5.92, δ = 8e-05) for α = 2.8
Test set: Average loss: 0.7334, Accuracy: 49.95%

Train Epoch: 9 	Train Loss: 0.748469 Train Accuracy: 0.502048 (ε = 6.13, δ = 8e-05) for α = 2.8
Test set: Average loss: 0.7418, Accuracy: 49.95%

Train Epoch: 10 	Train Loss: 0.737759 Train Accuracy: 0.502161 (ε = 6.34, δ = 8e-05) for α = 2.8
Test set: Average loss: 0.7470, Accuracy: 50.10%

Train Epoch: 11 	Train Loss: 0.775953 Train Accuracy: 0.509029 (ε = 6.53, δ = 8e-05) for α = 2.7
Test set: Average loss: 0.8231, Accuracy: 50.23%

Train Epoch: 12 	Train Loss: 0.787445 Train Accuracy: 0.522784 (ε = 6.72, δ = 8e-05) for α = 2.7
Test set: Average loss: 0.8064, Accuracy: 52.75%

Train Epoch: 13 	Train Loss: 0.729222 Train Accuracy: 0.594750 (ε = 6.90, δ = 8e-05) for α = 2.7
Test set: Average loss: 0.6436, Accuracy: 65.85%

Train Epoch: 14 	Train Loss: 0.595840 Train Accuracy: 0.706420 (ε = 7.08, δ = 8e-05) for α = 2.7
Test set: Average loss: 0.5334, Accuracy: 73.90%

Train Epoch: 15 	Train Loss: 0.565539 Train Accuracy: 0.733119 (ε = 7.26, δ = 8e-05) for α = 2.6
Test set: Average loss: 0.5372, Accuracy: 75.46%

Train Epoch: 16 	Train Loss: 0.596986 Train Accuracy: 0.742604 (ε = 7.42, δ = 8e-05) for α = 2.6
Test set: Average loss: 0.5645, Accuracy: 76.77%

Train Epoch: 17 	Train Loss: 0.610877 Train Accuracy: 0.763291 (ε = 7.59, δ = 8e-05) for α = 2.6
Test set: Average loss: 0.6031, Accuracy: 77.83%

Train Epoch: 18 	Train Loss: 0.616431 Train Accuracy: 0.778267 (ε = 7.75, δ = 8e-05) for α = 2.6
Test set: Average loss: 0.6032, Accuracy: 79.36%

Train Epoch: 19 	Train Loss: 0.644330 Train Accuracy: 0.785597 (ε = 7.91, δ = 8e-05) for α = 2.6
Test set: Average loss: 0.6170, Accuracy: 80.45%

Train Epoch: 20 	Train Loss: 0.640001 Train Accuracy: 0.794417 (ε = 8.08, δ = 8e-05) for α = 2.6
Test set: Average loss: 0.6344, Accuracy: 81.05%

Train Epoch: 21 	Train Loss: 0.657507 Train Accuracy: 0.805678 (ε = 8.23, δ = 8e-05) for α = 2.5
Test set: Average loss: 0.6495, Accuracy: 81.70%

Train Epoch: 22 	Train Loss: 0.656055 Train Accuracy: 0.809804 (ε = 8.38, δ = 8e-05) for α = 2.5
Test set: Average loss: 0.6579, Accuracy: 82.07%

Train Epoch: 23 	Train Loss: 0.651776 Train Accuracy: 0.818775 (ε = 8.52, δ = 8e-05) for α = 2.5
Test set: Average loss: 0.6665, Accuracy: 82.45%

Train Epoch: 24 	Train Loss: 0.670505 Train Accuracy: 0.814783 (ε = 8.67, δ = 8e-05) for α = 2.5
Test set: Average loss: 0.6775, Accuracy: 82.79%

Train Epoch: 25 	Train Loss: 0.685977 Train Accuracy: 0.817913 (ε = 8.81, δ = 8e-05) for α = 2.5
Test set: Average loss: 0.6788, Accuracy: 83.12%

Train Epoch: 26 	Train Loss: 0.666799 Train Accuracy: 0.826351 (ε = 8.96, δ = 8e-05) for α = 2.5
Test set: Average loss: 0.6750, Accuracy: 83.32%

Train Epoch: 27 	Train Loss: 0.662549 Train Accuracy: 0.829331 (ε = 9.10, δ = 8e-05) for α = 2.5
Test set: Average loss: 0.6780, Accuracy: 83.59%

Train Epoch: 28 	Train Loss: 0.678609 Train Accuracy: 0.827686 (ε = 9.25, δ = 8e-05) for α = 2.5
Test set: Average loss: 0.6796, Accuracy: 83.79%

Train Epoch: 29 	Train Loss: 0.650340 Train Accuracy: 0.835321 (ε = 9.38, δ = 8e-05) for α = 2.4
Test set: Average loss: 0.6824, Accuracy: 84.11%

Train Epoch: 30 	Train Loss: 0.657498 Train Accuracy: 0.837515 (ε = 9.51, δ = 8e-05) for α = 2.4
Test set: Average loss: 0.6824, Accuracy: 84.21%

Train Epoch: 31 	Train Loss: 0.673283 Train Accuracy: 0.834330 (ε = 9.65, δ = 8e-05) for α = 2.4
Test set: Average loss: 0.6828, Accuracy: 84.28%

Train Epoch: 32 	Train Loss: 0.656418 Train Accuracy: 0.837652 (ε = 9.78, δ = 8e-05) for α = 2.4
Test set: Average loss: 0.6765, Accuracy: 84.42%

Train Epoch: 33 	Train Loss: 0.670619 Train Accuracy: 0.837099 (ε = 9.91, δ = 8e-05) for α = 2.4
Test set: Average loss: 0.6718, Accuracy: 84.47%

Train Epoch: 34 	Train Loss: 0.668367 Train Accuracy: 0.839475 (ε = 10.04, δ = 8e-05) for α = 2.4
Test set: Average loss: 0.6707, Accuracy: 84.57%

Train Epoch: 35 	Train Loss: 0.649230 Train Accuracy: 0.841713 (ε = 10.17, δ = 8e-05) for α = 2.4
Test set: Average loss: 0.6697, Accuracy: 84.72%

Train Epoch: 36 	Train Loss: 0.656333 Train Accuracy: 0.841943 (ε = 10.30, δ = 8e-05) for α = 2.4
Test set: Average loss: 0.6660, Accuracy: 84.73%

Train Epoch: 37 	Train Loss: 0.652314 Train Accuracy: 0.846147 (ε = 10.43, δ = 8e-05) for α = 2.4
Test set: Average loss: 0.6637, Accuracy: 84.74%

Train Epoch: 38 	Train Loss: 0.653156 Train Accuracy: 0.842681 (ε = 10.56, δ = 8e-05) for α = 2.4
Test set: Average loss: 0.6628, Accuracy: 84.80%

Train Epoch: 39 	Train Loss: 0.662168 Train Accuracy: 0.842832 (ε = 10.69, δ = 8e-05) for α = 2.3
Test set: Average loss: 0.6594, Accuracy: 84.87%

Train Epoch: 40 	Train Loss: 0.646581 Train Accuracy: 0.847853 (ε = 10.81, δ = 8e-05) for α = 2.3
Test set: Average loss: 0.6571, Accuracy: 84.97%

Train Epoch: 41 	Train Loss: 0.643337 Train Accuracy: 0.849339 (ε = 10.93, δ = 8e-05) for α = 2.3
Test set: Average loss: 0.6541, Accuracy: 85.07%

Train Epoch: 42 	Train Loss: 0.638152 Train Accuracy: 0.847371 (ε = 11.04, δ = 8e-05) for α = 2.3
Test set: Average loss: 0.6511, Accuracy: 85.11%

Train Epoch: 43 	Train Loss: 0.652712 Train Accuracy: 0.844472 (ε = 11.16, δ = 8e-05) for α = 2.3
Test set: Average loss: 0.6449, Accuracy: 85.11%

Train Epoch: 44 	Train Loss: 0.625926 Train Accuracy: 0.849177 (ε = 11.28, δ = 8e-05) for α = 2.3
Test set: Average loss: 0.6447, Accuracy: 85.28%

Train Epoch: 45 	Train Loss: 0.646866 Train Accuracy: 0.846606 (ε = 11.40, δ = 8e-05) for α = 2.3
Test set: Average loss: 0.6408, Accuracy: 85.36%

Train Epoch: 46 	Train Loss: 0.656614 Train Accuracy: 0.843718 (ε = 11.52, δ = 8e-05) for α = 2.3
Test set: Average loss: 0.6357, Accuracy: 85.42%

Train Epoch: 47 	Train Loss: 0.642717 Train Accuracy: 0.846314 (ε = 11.64, δ = 8e-05) for α = 2.3
Test set: Average loss: 0.6330, Accuracy: 85.47%

Train Epoch: 48 	Train Loss: 0.626406 Train Accuracy: 0.850581 (ε = 11.76, δ = 8e-05) for α = 2.3
Test set: Average loss: 0.6321, Accuracy: 85.53%

Train Epoch: 49 	Train Loss: 0.613910 Train Accuracy: 0.849415 (ε = 11.88, δ = 8e-05) for α = 2.3
Test set: Average loss: 0.6304, Accuracy: 85.54%

Train Epoch: 50 	Train Loss: 0.606496 Train Accuracy: 0.854095 (ε = 12.00, δ = 8e-05) for α = 2.3
Test set: Average loss: 0.6335, Accuracy: 85.63%
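
In the log, ε grows with every epoch while the reported best α shrinks from 3.3 to 2.3. The sketch below shows where that pattern comes from, using a deliberately simplified setting: a plain Gaussian mechanism without subsampling and the classic RDP-to-(ε, δ) conversion. It will not reproduce the numbers above, and the function names are hypothetical rather than Opacus APIs.

```python
import math


def gaussian_rdp(alpha, sigma, steps):
    """RDP of order alpha for `steps` compositions of a Gaussian mechanism."""
    return steps * alpha / (2 * sigma**2)


def rdp_to_epsilon(alphas, sigma, steps, delta):
    """Classic conversion: eps = rdp(alpha) + log(1/delta) / (alpha - 1), minimized over alpha."""
    candidates = (
        (gaussian_rdp(a, sigma, steps) + math.log(1 / delta) / (a - 1), a)
        for a in alphas
    )
    return min(candidates)  # (epsilon, best_alpha)


# Same grid of orders that train() passes to the accountant.
alphas = [1 + x / 10.0 for x in range(1, 100)] + list(range(12, 64))

eps_1, alpha_1 = rdp_to_epsilon(alphas, sigma=4.0, steps=1, delta=8e-5)
eps_10, alpha_10 = rdp_to_epsilon(alphas, sigma=4.0, steps=10, delta=8e-5)
print(f"1 step:   eps = {eps_1:.3f} at alpha = {alpha_1}")
print(f"10 steps: eps = {eps_10:.3f} at alpha = {alpha_10}")
```

More steps raise the RDP term, which pushes the minimizing order toward smaller α and the resulting ε upward, the same qualitative trend as in the training log.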






## Resources

- [Opacus 0.x to 1.x Migration guide](https://github.com/pytorch/opacus/blob/main/Migration_Guide.md)
- [Opacus - Building LSTM Name Classifier](https://opacus.ai/tutorials/building_lstm_name_classifier)
- [HuggingFace - Text Classification](https://colab.research.google.com/github/huggingface/notebooks/blob/master/examples/text_classification.ipynb#scrollTo=n9qywopnIrJH)
- [HuggingFace Transformers - Fine-tuning with native PyTorch/TensorFlow](https://huggingface.co/docs/transformers/custom_datasets#fine-tuning-with-native-pytorch-tensorflow)
- [HuggingFace Transformers - Custom Datasets (PyTorch)](https://colab.research.google.com/github/huggingface/notebooks/blob/master/transformers_doc/pytorch/custom_datasets.ipynb)