## Training a differentially private LSTM model for name classification

In this tutorial we build a differentially private LSTM model that classifies names by their source language, the same task as in the tutorial **NLP From Scratch** (https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html). Because the goal is to demonstrate an LSTM trained with privacy guarantees, we use an LSTM in place of the bare-bones RNN model defined in the original tutorial. Specifically, we use the `DPLSTM` module from `opacus.layers.dp_lstm`, which makes it possible to compute per-example gradients; these are clipped and noised when differential privacy is applied. `DPLSTM` has the same API and functionality as `nn.LSTM`, with some restrictions (e.g. we currently support single layers, the full list is given below).

## Dataset

First, let us download the dataset of names and their associated language labels as given in https://pytorch.org/tutorials/intermediate/char_rnn_classification_tutorial.html. We train our differentially-private LSTM on the same dataset as in that tutorial.

In [1]:
import os
import urllib.request  # `import urllib` alone does not guarantee the `request` submodule is loaded
import zipfile

NAMES_DATASET_URL = "https://download.pytorch.org/tutorial/data.zip"
DATA_DIR = "names"

def download_and_extract(dataset_url, data_dir):
    print("Downloading and extracting ...")
    filename = "data.zip"

    urllib.request.urlretrieve(dataset_url, filename)
    with zipfile.ZipFile(filename) as zip_ref:
        zip_ref.extractall(data_dir)
    os.remove(filename)
    print("Completed!")

download_and_extract(NAMES_DATASET_URL, DATA_DIR)

Downloading and extracting ...
Completed!


In [2]:
names_folder = os.path.join(DATA_DIR, 'data', 'names')
all_filenames = []

for language_file in os.listdir(names_folder):
    all_filenames.append(os.path.join(names_folder, language_file))
    
print(os.listdir(names_folder))

['Arabic.txt', 'Chinese.txt', 'Czech.txt', 'Dutch.txt', 'English.txt', 'French.txt', 'German.txt', 'Greek.txt', 'Irish.txt', 'Italian.txt', 'Japanese.txt', 'Korean.txt', 'Polish.txt', 'Portuguese.txt', 'Russian.txt', 'Scottish.txt', 'Spanish.txt', 'Vietnamese.txt']


In [3]:
import torch
import torch.nn as nn

class CharByteEncoder(nn.Module):
    """
    This encoder takes a UTF-8 string and encodes its bytes into a Tensor. It can also
    perform the opposite operation to check a result.
    Examples:
    >>> encoder = CharByteEncoder()
    >>> t = encoder('Ślusàrski')  # returns tensor([256, 197, 154, 108, 117, 115, 195, 160, 114, 115, 107, 105, 257])
    >>> encoder.decode(t)  # returns "<s>Ślusàrski</s>"
    """

    def __init__(self):
        super().__init__()
        self.start_token = "<s>"
        self.end_token = "</s>"
        self.pad_token = "<pad>"

        self.start_idx = 256
        self.end_idx = 257
        self.pad_idx = 258

    def forward(self, s: str, pad_to=0) -> torch.LongTensor:
        """
        Encodes a string. It will append a start token <s> (id=self.start_idx) and an end token </s>
        (id=self.end_idx).
        Args:
            s: The string to encode.
            pad_to: If not zero, pad by appending self.pad_idx until string is of length `pad_to`.
                Defaults to 0.
        Returns:
            The encoded LongTensor of indices.
        """
        encoded = s.encode()
        n_pad = pad_to - len(encoded) if pad_to > len(encoded) else 0
        return torch.LongTensor(
            [self.start_idx]
            + [c for c in encoded]  # noqa
            + [self.end_idx]
            + [self.pad_idx for _ in range(n_pad)]
        )

    def decode(self, char_ids_tensor: torch.LongTensor) -> str:
        """
        The inverse of `forward`. Keeps the start, end and pad indices.
        """
        char_ids = char_ids_tensor.cpu().detach().tolist()

        out = []
        buf = []
        for c in char_ids:
            if c < 256:
                buf.append(c)
            else:
                if buf:
                    out.append(bytes(buf).decode())
                    buf = []
                if c == self.start_idx:
                    out.append(self.start_token)
                elif c == self.end_idx:
                    out.append(self.end_token)
                elif c == self.pad_idx:
                    out.append(self.pad_token)

        if buf:  # in case some are left
            out.append(bytes(buf).decode())
        return "".join(out)

    def __len__(self):
        """
        The length of our encoder space. This is fixed to 256 (one byte) + 3 special chars
        (start, end, pad).
        Returns:
            259
        """
        return 259
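
To make the encoding concrete, here is a minimal pure-Python sketch of what `forward` computes, without the tensor wrapper. The helper name `encode_bytes` is hypothetical and used only for illustration; the index values mirror `start_idx`, `end_idx` and `pad_idx` above.

```python
START_IDX, END_IDX, PAD_IDX = 256, 257, 258  # same values as in CharByteEncoder


def encode_bytes(s, pad_to=0):
    """Hypothetical list-based stand-in for CharByteEncoder.forward."""
    encoded = list(s.encode("utf-8"))  # one integer in [0, 255] per byte
    n_pad = max(pad_to - len(encoded), 0)
    return [START_IDX] + encoded + [END_IDX] + [PAD_IDX] * n_pad


ids = encode_bytes("Ślusàrski")
print(ids)
# [256, 197, 154, 108, 117, 115, 195, 160, 114, 115, 107, 105, 257]
```

The nine-character name becomes eleven bytes because `Ś` and `à` each take two bytes in UTF-8, so the sequence length seen by the model is the byte length plus the start and end tokens.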

## Training / Validation Set Preparation

In [4]:
from torch.nn.utils.rnn import pad_sequence

def padded_collate(batch, padding_idx=0):
    x = pad_sequence(
        [elem[0] for elem in batch], batch_first=True, padding_value=padding_idx
    )
    y = torch.stack([elem[1] for elem in batch]).long()

    return x, y

In [5]:
from collections import Counter  # used by NamesDataset.label_count
from pathlib import Path

from torch.utils.data import Dataset


class NamesDataset(Dataset):
    def __init__(self, root):
        self.root = Path(root)

        # Sort the labels so the label-to-index mapping is the same on every run.
        self.labels = sorted({langfile.stem for langfile in self.root.iterdir()})
        self.labels_dict = {label: i for i, label in enumerate(self.labels)}
        self.encoder = CharByteEncoder()
        self.samples = self.construct_samples()

    def __getitem__(self, i):
        return self.samples[i]

    def __len__(self):
        return len(self.samples)

    def construct_samples(self):
        samples = []
        for langfile in self.root.iterdir():
            label_name = langfile.stem
            label_id = self.labels_dict[label_name]
            # The name files contain non-ASCII characters, so read them as UTF-8 explicitly.
            with open(langfile, "r", encoding="utf-8") as fin:
                for row in fin:
                    samples.append(
                        (self.encoder(row.strip()), torch.tensor(label_id).long())
                    )
        return samples

    def label_count(self):
        cnt = Counter()
        for _x, y in self.samples:
            label = self.labels[int(y)]
            cnt[label] += 1
        return cnt


VOCAB_SIZE = 256 + 3  # 256 alternatives in one byte, plus 3 special characters.


We split the dataset 80-20 into training and validation sets.

In [6]:
secure_rng = False
train_split = 0.8
test_every = 5
batch_size = 800

ds = NamesDataset(names_folder)
train_len = int(train_split * len(ds))
test_len = len(ds) - train_len

print(f"{train_len} samples for training, {test_len} for testing")

if secure_rng:
    try:
        import torchcsprng as prng
    except ImportError as e:
        msg = (
            "To use secure RNG, you must install the torchcsprng package! "
            "Check out the instructions here: https://github.com/pytorch/csprng#installation"
        )
        raise ImportError(msg) from e

    generator = prng.create_random_device_generator("/dev/urandom")

else:
    generator = None

train_ds, test_ds = torch.utils.data.random_split(
    ds, [train_len, test_len], generator=generator
)

16059 samples for training, 4015 for testing


In [7]:
from torch.utils.data import DataLoader
from opacus.utils.uniform_sampler import UniformWithReplacementSampler

sample_rate = batch_size / len(train_ds)

train_loader = DataLoader(
    train_ds,
    num_workers=8,
    pin_memory=True,
    generator=generator,
    batch_sampler=UniformWithReplacementSampler(
        num_samples=len(train_ds),
        sample_rate=sample_rate,
        generator=generator,
    ),
    collate_fn=padded_collate,
)

test_loader = DataLoader(
    test_ds,
    batch_size=2 * batch_size,
    shuffle=False,
    num_workers=8,
    pin_memory=True,
    collate_fn=padded_collate,
)
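
The training loader draws batches whose size varies from step to step. As a rough illustration, assuming the sampler includes each training example independently with probability `sample_rate` (Poisson sampling, the scheme typically assumed by DP-SGD privacy accounting), the sketch below simulates it in plain Python. The function `poisson_batches` and the fixed seed are illustrative assumptions, not part of Opacus.

```python
import random


def poisson_batches(num_samples, sample_rate, num_batches, seed=0):
    """Yield index lists where each index is kept with probability sample_rate."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        yield [i for i in range(num_samples) if rng.random() < sample_rate]


num_samples = 16059                     # training-set size printed above
sample_rate = 800 / num_samples         # batch_size / len(train_ds)
steps_per_epoch = int(1 / sample_rate)  # about one pass over the data

sizes = [len(b) for b in poisson_batches(num_samples, sample_rate, steps_per_epoch)]
print(steps_per_epoch)                  # 20
print(min(sizes), max(sizes))           # batch sizes fluctuate around 800
```

Under this assumption, `batch_size` is an expected batch size rather than an exact one, and an "epoch" is about 20 sampled batches rather than a strict pass over every example.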

After splitting the dataset into a training and a validation set, we convert the data into a numeric form suitable for training the LSTM model. `CharByteEncoder` turns each name into a tensor of UTF-8 byte values (0 to 255) framed by a start token (256) and an end token (257). Names differ in length, so `padded_collate` uses `pad_sequence` to pad every sequence in a batch to the length of the longest one; note that its default `padding_idx` is 0 rather than the encoder's `pad_idx` (258). Training batches are drawn with `sample_rate = batch_size / len(train_ds)`, so they contain about `batch_size = 800` names on average, and the test loader uses batches of `2 * batch_size`.
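
As a small illustration of batching variable-length sequences, the sketch below pads lists of indices to a common length, which is the effect `pad_sequence(..., batch_first=True)` has on tensors inside `padded_collate`. The helper `pad_batch` is a hypothetical pure-Python stand-in, not a PyTorch function.

```python
def pad_batch(sequences, padding_value=0):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in sequences)
    return [seq + [padding_value] * (max_len - len(seq)) for seq in sequences]


batch = [
    [256, 65, 108, 257],          # "Al" with start/end tokens
    [256, 66, 111, 98, 98, 257],  # "Bobb" with start/end tokens
]
for row in pad_batch(batch):
    print(row)
# [256, 65, 108, 257, 0, 0]
# [256, 66, 111, 98, 98, 257]
```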

## Training/Evaluation Cycle 

The training and the evaluation functions `train()` and `test()` are defined below. During the training loop, the per-example gradients are computed and the parameters are updated subsequent to gradient clipping (to bound their sensitivity) and addition of noise.  
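
The clip-then-noise update can be sketched in a few lines of plain Python. This is a simplified illustration of one DP-SGD step on a flat parameter vector, not the Opacus implementation; the function `dp_sgd_step`, its arguments and the toy gradients are assumptions made for the example.

```python
import math
import random


def dp_sgd_step(params, per_sample_grads, lr, max_grad_norm, noise_multiplier, rng):
    """One simplified DP-SGD update on a flat list of parameters."""
    summed = [0.0] * len(params)
    for grad in per_sample_grads:
        # Clip each example's gradient so its L2 norm is at most max_grad_norm.
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, max_grad_norm / (norm + 1e-6))
        for j, g in enumerate(grad):
            summed[j] += g * scale
    # Add Gaussian noise calibrated to the clipping bound, then average.
    std = noise_multiplier * max_grad_norm
    batch_size = len(per_sample_grads)
    noisy_mean = [(s + rng.gauss(0.0, std)) / batch_size for s in summed]
    return [p - lr * g for p, g in zip(params, noisy_mean)]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
new_params = dp_sgd_step([0.0, 0.0], grads, lr=1.0, max_grad_norm=1.0,
                         noise_multiplier=0.0, rng=rng)
print(new_params)  # with zero noise: roughly [-0.45, -0.6]
```

In the toy call the first gradient is scaled down to norm 1 while the second is left unchanged, which is how clipping bounds the influence of any single example before noise is added.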

In [8]:
from statistics import mean

from tqdm import tqdm  # progress bars used inside train() and test()

def train(model, criterion, optimizer, train_loader, epoch, device="cuda:0"):
    accs = []
    losses = []
    for x, y in tqdm(train_loader):
        x = x.to(device)
        y = y.to(device)

        logits = model(x)
        loss = criterion(logits, y)
        loss.backward()

        optimizer.step()
        optimizer.zero_grad()

        preds = logits.argmax(-1)
        n_correct = float(preds.eq(y).sum())
        batch_accuracy = n_correct / len(y)

        accs.append(batch_accuracy)
        losses.append(float(loss))

    printstr = (
        f"\t Epoch {epoch}. Accuracy: {mean(accs):.6f} | Loss: {mean(losses):.6f}"
    )
    try:
        privacy_engine = optimizer.privacy_engine
        epsilon, best_alpha = privacy_engine.get_privacy_spent()
        printstr += f" | (ε = {epsilon:.2f}, δ = {privacy_engine.target_delta}) for α = {best_alpha}"
    except AttributeError:
        pass
    print(printstr)
    return


def test(model, test_loader, privacy_engine, device="cuda:0"):
    accs = []
    with torch.no_grad():
        for x, y in tqdm(test_loader):
            x = x.to(device)
            y = y.to(device)

            preds = model(x).argmax(-1)
            n_correct = float(preds.eq(y).sum())
            batch_accuracy = n_correct / len(y)

            accs.append(batch_accuracy)
    printstr = "\n----------------------------\n" f"Test Accuracy: {mean(accs):.6f}"
    if privacy_engine:
        epsilon, best_alpha = privacy_engine.get_privacy_spent()
        printstr += f" (ε = {epsilon:.2f}, δ = {privacy_engine.target_delta}) for α = {best_alpha}"
    print(printstr + "\n----------------------------\n")
    return


## Hyper-parameters

There are two sets of hyper-parameters associated with this model. The first are hyper-parameters we would expect in any machine learning training, such as the learning rate and batch size. The second set relates to the privacy engine: for example, the amount of noise added to the gradients (`noise_multiplier`) and the maximum L2 norm to which the per-sample gradients are clipped (`max_grad_norm`). In this tutorial we do not set `noise_multiplier` directly; instead we specify a target privacy budget (`epsilon`, `delta`) and pass it to the privacy engine together with the number of epochs.

In [9]:
# Training hyper-parameters
epochs = 50
learning_rate = 2.0

# Privacy engine hyper-parameters
max_per_sample_grad_norm = 1.5
delta = 8e-5
epsilon = 12.0
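
For reference, `epsilon` and `delta` are the two parameters of the standard (ε, δ)-differential-privacy guarantee. Stated for a randomized training mechanism $M$ (notation introduced here, not in the original tutorial), any two datasets $D$ and $D'$ that differ in a single example, and any set of outcomes $S$:

```latex
\Pr[M(D) \in S] \le e^{\varepsilon} \, \Pr[M(D') \in S] + \delta
```

Smaller values of ε and δ mean a stronger guarantee. With the values above, the privacy engine is asked to stay within ε = 12 at δ = 8e-5 over the 50 training epochs.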

## Model

We define the name classification model in the cell below. Note that it is a simple char-LSTM classifier, where the input characters are passed through an `nn.Embedding` layer, and are subsequently input to the DPLSTM. 

In [10]:
import torch
from torch import nn
from opacus.layers import DPLSTM

class CharNNClassifier(nn.Module):
    def __init__(
        self,
        embedding_size,
        hidden_size,
        output_size,
        num_lstm_layers=1,
        bidirectional=False,
        vocab_size=VOCAB_SIZE,
    ):
        super().__init__()

        self.embedding_size = embedding_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        self.vocab_size = vocab_size

        self.embedding = nn.Embedding(vocab_size, embedding_size)
        self.lstm = DPLSTM(
            embedding_size,
            hidden_size,
            num_layers=num_lstm_layers,
            bidirectional=bidirectional,
            batch_first=True,
        )
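        # A bidirectional LSTM concatenates both directions, doubling the feature size.
        self.out_layer = nn.Linear(
            hidden_size * (2 if bidirectional else 1), output_size
        )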

    def forward(self, x, hidden=None):
        x = self.embedding(x)  # -> [B, T, D]
        x, _ = self.lstm(x, hidden)  # -> [B, T, H]
        x = x[:, -1, :]  # -> [B, H]
        x = self.out_layer(x)  # -> [B, C]
        return x

We now instantiate the objects (privacy engine, model and optimizer) for our differentially private LSTM training. The model is an ordinary char-LSTM classifier, except that `nn.LSTM` is replaced with a `DPLSTM` module, which enables us to calculate per-example gradients.

In [11]:
# Run on a GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Define classifier parameters
embedding_size = 64
hidden_size = 128  # Size of the LSTM hidden state
n_lstm_layers = 1
bidirectional_lstm = False

model = CharNNClassifier(
    embedding_size,
    hidden_size,
    len(ds.labels),
    n_lstm_layers,
    bidirectional_lstm,
).to(device)

## Defining the privacy engine, optimizer and loss criterion for the problem

In [12]:
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

In [13]:
from opacus import PrivacyEngine

privacy_engine = PrivacyEngine(
    model,
    sample_rate=sample_rate,
    max_grad_norm=max_per_sample_grad_norm,
    target_delta=delta,
    target_epsilon=epsilon,
    epochs=epochs,
    secure_rng=secure_rng,
)
privacy_engine.attach(optimizer)

  "Secure RNG turned off. This is perfectly fine for experimentation as it allows "


## Training the name classifier with privacy

Finally we can start training! We train for 50 epochs (where each epoch corresponds to a pass over the whole dataset) and report the test accuracy and the privacy epsilon every `test_every` epochs. We also benchmark this differentially private model against a model without privacy and obtain comparable performance. Further, the private model trained with Opacus incurs only minimal overhead in training time, with the differentially private classifier only slightly slower (by a couple of minutes) than the non-private model.

In [14]:
from tqdm import tqdm

print("Train stats: \n")
for epoch in tqdm(range(epochs)):
    train(model, criterion, optimizer, train_loader, epoch, device=device)
    if test_every:
        if epoch % test_every == 0:
            test(model, test_loader, privacy_engine, device=device)

test(model, test_loader, privacy_engine, device=device)

Train stats: 



	 Epoch 0. Accuracy: 0.429625 | Loss: 2.185477 | (ε = 2.59, δ = 8e-05) for α = 5.6



----------------------------
Test Accuracy: 0.470559 (ε = 2.59, δ = 8e-05) for α = 5.6
----------------------------



	 Epoch 1. Accuracy: 0.468625 | Loss: 1.940962 | (ε = 3.07, δ = 8e-05) for α = 5.2


	 Epoch 2. Accuracy: 0.468188 | Loss: 1.928116 | (ε = 3.46, δ = 8e-05) for α = 5.0


	 Epoch 3. Accuracy: 0.468250 | Loss: 1.907378 | (ε = 3.81, δ = 8e-05) for α = 4.8


	 Epoch 4. Accuracy: 0.477125 | Loss: 1.833195 | (ε = 4.12, δ = 8e-05) for α = 4.6


	 Epoch 5. Accuracy: 0.538125 | Loss: 1.566213 | (ε = 4.41, δ = 8e-05) for α = 4.5



----------------------------
Test Accuracy: 0.552575 (ε = 4.41, δ = 8e-05) for α = 4.5
----------------------------



	 Epoch 6. Accuracy: 0.556937 | Loss: 1.512054 | (ε = 4.67, δ = 8e-05) for α = 4.3


	 Epoch 7. Accuracy: 0.562937 | Loss: 1.503791 | (ε = 4.93, δ = 8e-05) for α = 4.2


	 Epoch 8. Accuracy: 0.563813 | Loss: 1.510231 | (ε = 5.17, δ = 8e-05) for α = 4.1


	 Epoch 9. Accuracy: 0.568250 | Loss: 1.504014 | (ε = 5.41, δ = 8e-05) for α = 4.0


	 Epoch 10. Accuracy: 0.571063 | Loss: 1.497570 | (ε = 5.63, δ = 8e-05) for α = 3.9



----------------------------
Test Accuracy: 0.574104 (ε = 5.63, δ = 8e-05) for α = 3.9
----------------------------



	 Epoch 11. Accuracy: 0.576250 | Loss: 1.489521 | (ε = 5.85, δ = 8e-05) for α = 3.9


	 Epoch 12. Accuracy: 0.580812 | Loss: 1.485841 | (ε = 6.06, δ = 8e-05) for α = 3.8


	 Epoch 13. Accuracy: 0.589063 | Loss: 1.482238 | (ε = 6.27, δ = 8e-05) for α = 3.7


	 Epoch 14. Accuracy: 0.604000 | Loss: 1.459263 | (ε = 6.46, δ = 8e-05) for α = 3.7


	 Epoch 15. Accuracy: 0.622437 | Loss: 1.423232 | (ε = 6.66, δ = 8e-05) for α = 3.6



----------------------------
Test Accuracy: 0.630179 (ε = 6.66, δ = 8e-05) for α = 3.6
----------------------------



	 Epoch 16. Accuracy: 0.633563 | Loss: 1.397400 | (ε = 6.85, δ = 8e-05) for α = 3.6


	 Epoch 17. Accuracy: 0.647188 | Loss: 1.359415 | (ε = 7.03, δ = 8e-05) for α = 3.5


	 Epoch 18. Accuracy: 0.659375 | Loss: 1.327021 | (ε = 7.22, δ = 8e-05) for α = 3.5


	 Epoch 19. Accuracy: 0.668937 | Loss: 1.309529 | (ε = 7.39, δ = 8e-05) for α = 3.4


	 Epoch 20. Accuracy: 0.668500 | Loss: 1.327612 | (ε = 7.57, δ = 8e-05) for α = 3.4



----------------------------
Test Accuracy: 0.649833 (ε = 7.57, δ = 8e-05) for α = 3.4
----------------------------



	 Epoch 21. Accuracy: 0.672687 | Loss: 1.296074 | (ε = 7.74, δ = 8e-05) for α = 3.4


	 Epoch 22. Accuracy: 0.681250 | Loss: 1.267687 | (ε = 7.91, δ = 8e-05) for α = 3.3


	 Epoch 23. Accuracy: 0.684500 | Loss: 1.268492 | (ε = 8.07, δ = 8e-05) for α = 3.3


	 Epoch 24. Accuracy: 0.693063 | Loss: 1.245834 | (ε = 8.24, δ = 8e-05) for α = 3.3


	 Epoch 25. Accuracy: 0.698000 | Loss: 1.233152 | (ε = 8.40, δ = 8e-05) for α = 3.2



----------------------------
Test Accuracy: 0.691701 (ε = 8.40, δ = 8e-05) for α = 3.2
----------------------------



	 Epoch 26. Accuracy: 0.698812 | Loss: 1.231949 | (ε = 8.56, δ = 8e-05) for α = 3.2


	 Epoch 27. Accuracy: 0.701313 | Loss: 1.222030 | (ε = 8.72, δ = 8e-05) for α = 3.2


	 Epoch 28. Accuracy: 0.707562 | Loss: 1.209349 | (ε = 8.87, δ = 8e-05) for α = 3.1


	 Epoch 29. Accuracy: 0.708750 | Loss: 1.213605 | (ε = 9.02, δ = 8e-05) for α = 3.1


	 Epoch 30. Accuracy: 0.713750 | Loss: 1.192162 | (ε = 9.17, δ = 8e-05) for α = 3.1



----------------------------
Test Accuracy: 0.707720 (ε = 9.17, δ = 8e-05) for α = 3.1
----------------------------



	 Epoch 31. Accuracy: 0.719562 | Loss: 1.166927 | (ε = 9.32, δ = 8e-05) for α = 3.1


	 Epoch 32. Accuracy: 0.718938 | Loss: 1.181886 | (ε = 9.47, δ = 8e-05) for α = 3.0


	 Epoch 33. Accuracy: 0.724938 | Loss: 1.163102 | (ε = 9.62, δ = 8e-05) for α = 3.0


	 Epoch 34. Accuracy: 0.724000 | Loss: 1.162879 | (ε = 9.76, δ = 8e-05) for α = 3.0


	 Epoch 35. Accuracy: 0.726375 | Loss: 1.164932 | (ε = 9.90, δ = 8e-05) for α = 3.0



----------------------------
Test Accuracy: 0.702265 (ε = 9.90, δ = 8e-05) for α = 3.0
----------------------------



	 Epoch 36. Accuracy: 0.721313 | Loss: 1.173956 | (ε = 10.05, δ = 8e-05) for α = 3.0


	 Epoch 37. Accuracy: 0.735437 | Loss: 1.122051 | (ε = 10.19, δ = 8e-05) for α = 2.9


	 Epoch 38. Accuracy: 0.736687 | Loss: 1.125166 | (ε = 10.32, δ = 8e-05) for α = 2.9


	 Epoch 39. Accuracy: 0.736500 | Loss: 1.140181 | (ε = 10.46, δ = 8e-05) for α = 2.9


	 Epoch 40. Accuracy: 0.734750 | Loss: 1.132542 | (ε = 10.60, δ = 8e-05) for α = 2.9



----------------------------
Test Accuracy: 0.715653 (ε = 10.60, δ = 8e-05) for α = 2.9
----------------------------



	 Epoch 41. Accuracy: 0.734875 | Loss: 1.125319 | (ε = 10.74, δ = 8e-05) for α = 2.9


	 Epoch 42. Accuracy: 0.739062 | Loss: 1.114325 | (ε = 10.87, δ = 8e-05) for α = 2.8


	 Epoch 43. Accuracy: 0.740750 | Loss: 1.111604 | (ε = 11.00, δ = 8e-05) for α = 2.8


	 Epoch 44. Accuracy: 0.740125 | Loss: 1.099650 | (ε = 11.13, δ = 8e-05) for α = 2.8


	 Epoch 45. Accuracy: 0.750812 | Loss: 1.062885 | (ε = 11.26, δ = 8e-05) for α = 2.8



----------------------------
Test Accuracy: 0.739057 (ε = 11.26, δ = 8e-05) for α = 2.8
----------------------------



	 Epoch 46. Accuracy: 0.749750 | Loss: 1.076617 | (ε = 11.40, δ = 8e-05) for α = 2.8


	 Epoch 47. Accuracy: 0.749437 | Loss: 1.091281 | (ε = 11.53, δ = 8e-05) for α = 2.8


	 Epoch 48. Accuracy: 0.751625 | Loss: 1.067713 | (ε = 11.66, δ = 8e-05) for α = 2.8


	 Epoch 49. Accuracy: 0.752250 | Loss: 1.067498 | (ε = 11.78, δ = 8e-05) for α = 2.7



----------------------------
Test Accuracy: 0.732004 (ε = 11.78, δ = 8e-05) for α = 2.7
----------------------------



The differentially private name classification model obtains a test accuracy of 0.73 with an epsilon of just under 12 (ε = 11.78 at δ = 8e-5). This shows that we can achieve a good accuracy on this task, with minimal loss of privacy.

## Training the name classifier without privacy

We also run a comparison with a non-private model to see whether the performance obtained with privacy is comparable to it. To do this, we keep parameters such as the batch size and the number of epochs the same, and define a different instance of the model along with a separate optimizer. Note that the non-private optimizer below uses a learning rate of 0.5 rather than the 2.0 used for the private model.

In [17]:
model_nodp = CharNNClassifier(
    embedding_size,
    hidden_size,
    len(ds.labels),
    n_lstm_layers,
    bidirectional_lstm,
).to(device)


optimizer_nodp = torch.optim.SGD(model_nodp.parameters(), lr=0.5)

In [18]:
for epoch in tqdm(range(epochs)):
    train(model_nodp, criterion, optimizer_nodp, train_loader, epoch, device=device)
    if test_every:
        if epoch % test_every == 0:
            test(model_nodp, test_loader, None, device=device)

test(model_nodp, test_loader, None, device=device)

	 Epoch 0. Accuracy: 0.446188 | Loss: 1.975067



----------------------------
Test Accuracy: 0.470559
----------------------------



	 Epoch 1. Accuracy: 0.468625 | Loss: 1.851975


	 Epoch 2. Accuracy: 0.468438 | Loss: 1.851132


	 Epoch 3. Accuracy: 0.468750 | Loss: 1.860505


	 Epoch 4. Accuracy: 0.469000 | Loss: 1.852566


	 Epoch 5. Accuracy: 0.468500 | Loss: 1.851430



----------------------------
Test Accuracy: 0.470559
----------------------------



	 Epoch 6. Accuracy: 0.467375 | Loss: 1.847835


	 Epoch 7. Accuracy: 0.498937 | Loss: 1.702268


	 Epoch 8. Accuracy: 0.540625 | Loss: 1.550873


	 Epoch 9. Accuracy: 0.551125 | Loss: 1.507487


	 Epoch 10. Accuracy: 0.556312 | Loss: 1.488815



----------------------------
Test Accuracy: 0.558185
----------------------------



	 Epoch 11. Accuracy: 0.560312 | Loss: 1.477035


	 Epoch 12. Accuracy: 0.563562 | Loss: 1.456275


	 Epoch 13. Accuracy: 0.561875 | Loss: 1.466037


	 Epoch 14. Accuracy: 0.569187 | Loss: 1.442874


	 Epoch 15. Accuracy: 0.570063 | Loss: 1.443760



----------------------------
Test Accuracy: 0.571025
----------------------------



	 Epoch 16. Accuracy: 0.581313 | Loss: 1.416902


	 Epoch 17. Accuracy: 0.613812 | Loss: 1.352785


	 Epoch 18. Accuracy: 0.623250 | Loss: 1.322635


	 Epoch 19. Accuracy: 0.636062 | Loss: 1.275303


	 Epoch 20. Accuracy: 0.643125 | Loss: 1.246226



----------------------------
Test Accuracy: 0.655041
----------------------------



	 Epoch 21. Accuracy: 0.655000 | Loss: 1.214294


	 Epoch 22. Accuracy: 0.658250 | Loss: 1.188098


	 Epoch 23. Accuracy: 0.667500 | Loss: 1.159147


	 Epoch 24. Accuracy: 0.679438 | Loss: 1.136272


	 Epoch 25. Accuracy: 0.687438 | Loss: 1.102815



----------------------------
Test Accuracy: 0.695590
----------------------------



	 Epoch 26. Accuracy: 0.686562 | Loss: 1.093785


	 Epoch 27. Accuracy: 0.688063 | Loss: 1.083488


	 Epoch 28. Accuracy: 0.697000 | Loss: 1.054640


	 Epoch 29. Accuracy: 0.702250 | Loss: 1.028159


	 Epoch 30. Accuracy: 0.706187 | Loss: 1.015101



----------------------------
Test Accuracy: 0.682311
----------------------------



	 Epoch 31. Accuracy: 0.701688 | Loss: 1.024621


	 Epoch 32. Accuracy: 0.707500 | Loss: 1.002122


	 Epoch 33. Accuracy: 0.716375 | Loss: 0.976570


	 Epoch 34. Accuracy: 0.719875 | Loss: 0.966798


	 Epoch 35. Accuracy: 0.720125 | Loss: 0.964287



----------------------------
Test Accuracy: 0.695451
----------------------------



	 Epoch 36. Accuracy: 0.719375 | Loss: 0.954905


	 Epoch 37. Accuracy: 0.728688 | Loss: 0.932315


	 Epoch 38. Accuracy: 0.731250 | Loss: 0.928614


	 Epoch 39. Accuracy: 0.741875 | Loss: 0.894388


	 Epoch 40. Accuracy: 0.737875 | Loss: 0.901502



----------------------------
Test Accuracy: 0.744666
----------------------------



	 Epoch 41. Accuracy: 0.747875 | Loss: 0.869028


	 Epoch 42. Accuracy: 0.741750 | Loss: 0.880296


	 Epoch 43. Accuracy: 0.740563 | Loss: 0.890370


	 Epoch 44. Accuracy: 0.751375 | Loss: 0.849146


	 Epoch 45. Accuracy: 0.750125 | Loss: 0.851319



----------------------------
Test Accuracy: 0.757715
----------------------------



	 Epoch 46. Accuracy: 0.754750 | Loss: 0.838435


	 Epoch 47. Accuracy: 0.756875 | Loss: 0.828314


	 Epoch 48. Accuracy: 0.763813 | Loss: 0.809650


	 Epoch 49. Accuracy: 0.763750 | Loss: 0.806187



----------------------------
Test Accuracy: 0.751958
----------------------------






The training loop above is the same as before, this time run without privacy and for the same number of epochs.

The non-private classifier obtains a test accuracy of around 0.75 with the same architecture and number of epochs, compared with 0.73 for the private model. We are effectively trading off a small amount of performance on the name classification task for a lower loss of privacy.