In [1]:
import warnings
warnings.filterwarnings('ignore')
import pandas as pd
import numpy as np
import copy
from torch import nn
from torch import optim
import torch.nn.functional as F
import syft as sy
import torch as th
from helpers import Model, connect_to_workers

# BEWARE: ignoring warnings is not always a good idea.
# I am doing it here only for presentation.

W0821 23:03:26.408246 140031293081408 secure_random.py:26] Falling back to insecure randomness since the required custom op could not be found for the installed version of TensorFlow. Fix this by compiling custom ops. Missing file was '/home/mkucz/p_venv/lib/python3.6/site-packages/tf_encrypted/operations/secure_random/secure_random_module_tf_1.14.0.so'
W0821 23:03:26.417490 140031293081408 deprecation_wrapper.py:119] From /home/mkucz/p_venv/lib/python3.6/site-packages/tf_encrypted/session.py:26: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.



<a id="encrypted_dl"></a>
## Encrypted Deep Learning
Encrypted deep learning aims to preserve model accuracy and predictive power without compromising the privacy and identity of the individual users in the data. It provides privacy by encrypting the values that are being computed: this can mean encrypting the gradients, and it can extend to encrypting the data as well. I will walk through examples of encrypted deep learning using secure multi-party computation.

<a id="smpc"></a>
#### Secure Multi-Party Computation (SMPC)
PySyft implements encryption using secure multi-party computation (SMPC). To learn more about the basics of SMPC and differential privacy, [check out my SMPC (PySyft inspired) notebook](https://htmlpreview.github.io/?https://github.com/mkucz95/private_ai_finance/blob/master/secure_multi_party_computation.html). It will help you understand how the steps below encrypt data while preserving model accuracy.
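
As a quick reminder of the core idea, here is a minimal sketch of additive secret sharing in plain Python. It is not PySyft's implementation: the modulus `Q` and the helper names `share`, `reconstruct`, and `add_shared` are assumptions chosen for illustration.

```python
import random

Q = 2**61 - 1  # a large prime modulus; illustrative choice, not PySyft's actual field size


def share(secret, n_parties, q=Q):
    """Split an integer secret into n additive shares modulo q."""
    shares = [random.randrange(q) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the secret (mod q)
    shares.append((secret - sum(shares)) % q)
    return shares


def reconstruct(shares, q=Q):
    """Recover the secret; values in the upper half of the field map back to negatives."""
    value = sum(shares) % q
    return value - q if value > q // 2 else value


def add_shared(a_shares, b_shares, q=Q):
    """Each party adds its own two shares locally; no communication is needed."""
    return [(a + b) % q for a, b in zip(a_shares, b_shares)]


x_shares = share(25, 3)
y_shares = share(-7, 3)
total = reconstruct(add_shared(x_shares, y_shares))
print(total)  # 18
```

Any single share is a uniformly random number, so a party learns nothing about the secret unless all shares are combined.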

<a id="fl_encrypt_avg"></a>
### Encrypted Gradient Aggregation

The previous implementations of federated learning have all relied on a *'trusted aggregator'*. Unfortunately, in many scenarios we would not want to rely on such a third party, for instance because no third party can be deemed trustworthy enough.

Encrypted gradient aggregation largely follows the same process as unencrypted federated learning with a trusted aggregator. The difference lies in how training is conducted: we now use secure multi-party computation to aggregate the gradients, which are encrypted across multiple workers. Therefore, only the training function changes. Since it is largely the same as the previous step, I won't provide a worked example; visit [PySyft's tutorial to learn more](https://github.com/OpenMined/PySyft/blob/dev/examples/tutorials/Part%2010%20-%20Federated%20Learning%20with%20Secure%20Aggregation.ipynb).

To summarize encrypted gradient aggregation: each remote worker has its own model, and encrypting this model means sharing its parameters (the weights and biases of the network) across all the workers. Using SMPC, we can aggregate the encrypted parameters after each remote model has gone through a training run. Since only the aggregated model is ever decrypted, we cannot deduce any individual worker's model parameters or gradients, which ensures privacy without the need for a trusted third-party aggregator.
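
The full PySyft version is left to the linked tutorial, but the aggregation idea itself can be sketched in plain Python. The snippet below is a toy illustration under stated assumptions: the modulus `Q`, the fixed-point `SCALE`, and the function `secure_average` are hypothetical names, and real workers would exchange shares over a network rather than inside one process.

```python
import random

Q = 2**61 - 1   # illustrative prime modulus (assumption)
SCALE = 10**4   # fixed-point scale: keep 4 decimal places


def encode(x):
    """Map a float to a field element using fixed-point scaling."""
    return round(x * SCALE) % Q


def decode(v):
    """Map a field element back to a float, treating the upper half as negative."""
    v %= Q
    if v > Q // 2:
        v -= Q
    return v / SCALE


def share(value, n):
    """Split a field element into n additive shares."""
    parts = [random.randrange(Q) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % Q)
    return parts


def secure_average(worker_params):
    """Average per-worker parameter lists without revealing any single list."""
    n = len(worker_params)
    n_params = len(worker_params[0])
    # held[j][k] accumulates the shares of parameter k that worker j receives
    held = [[0] * n_params for _ in range(n)]
    for params in worker_params:
        for k, p in enumerate(params):
            for j, s in enumerate(share(encode(p), n)):
                held[j][k] = (held[j][k] + s) % Q
    # Only the per-worker partial sums are combined, so only the total is revealed
    totals = [sum(held[j][k] for j in range(n)) % Q for k in range(n_params)]
    return [decode(t) / n for t in totals]


local_models = [[0.10, -0.50], [0.30, 0.10], [0.20, 0.10]]
print(secure_average(local_models))  # approximately [0.2, -0.1]
```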

Instead, let's work out how to train a network where the data, model parameters, AND the gradients are all encrypted!

In [17]:
features = np.load('data/features.npy')
labels = np.load('data/labels_dim.npy')
data = th.tensor(features, dtype=th.float32, requires_grad=True)
target = th.tensor(labels, dtype=th.float32, requires_grad=True).reshape(-1,2)
hook = sy.TorchHook(th)

W0821 23:05:51.507230 140031293081408 hook.py:102] Torch was already hooked... skipping hooking process


In [18]:
class Arguments():
    def __init__(self, in_size, out_size, hidden_layers,
                       activation=F.softmax, dim=-1):
        self.batch_size = 1
        self.drop_p = None
        self.epochs = 10
        self.lr = 0.001
        self.in_size = in_size
        self.out_size = out_size
        self.hidden_layers = hidden_layers
        self.precision_fractional=20
        self.activation = activation
        self.dim = dim

In [19]:
dataset = [(data[i], target[i]) for i in range(len(data))]

#instantiate model
in_size = data[0].shape[0]
out_size = 1
hidden_layers=[30,15]

args = Arguments(in_size, out_size, hidden_layers, activation=None)
#PyTorch's softmax activation only works with floats

*Please note* that PyTorch's softmax activation function only works with float values. Float values, however, are incompatible with SMPC, since we have to fix the precision (convert to integers) before encrypting. We therefore compute the loss directly on the raw network outputs, without an activation function.

### End-to-End Encryption
There are certain scenarios where, for maximum privacy, it is ideal to keep the data encrypted as well as each federated model. This is **end-to-end encryption**.

In some scenarios a model will already have been trained, for example on past customer data (before the implementation of differentially private techniques). In others, we want to train a new secure model on entirely encrypted data.

In [20]:
workers = connect_to_workers(2, hook, secure_worker=False)
crypto_provider = sy.VirtualWorker(hook, id='crypto_provider')

The `crypto_provider` is needed to provide random numbers and the field quotient `Q`, as outlined in the [SMPC tutorial](https://github.com/mkucz95/private_ai_finance/blob/master/secure_multi_party_computation.ipynb). The `crypto_provider` never 'owns' or handles any data; it is simply there to ensure secure computation.
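
To make the role of those random numbers concrete, here is a sketch of multiplying two secret-shared values with a Beaver triple, a standard SMPC technique. Treating the crypto provider as the dealer of the triple is an assumption for illustration; `provider_triple`, `multiply_shared`, and the modulus `Q` are hypothetical names, not PySyft calls.

```python
import random

Q = 2**61 - 1  # illustrative prime modulus (assumption)


def share(value, n=2):
    """Split a field element into n additive shares."""
    parts = [random.randrange(Q) for _ in range(n - 1)]
    parts.append((value - sum(parts)) % Q)
    return parts


def reconstruct(shares):
    return sum(shares) % Q


def provider_triple(n=2):
    """Role assumed for the crypto provider: deal shares of random a, b and c = a*b."""
    a, b = random.randrange(Q), random.randrange(Q)
    return share(a, n), share(b, n), share(a * b % Q, n)


def multiply_shared(x_sh, y_sh):
    """Return shares of x*y without any party seeing x or y."""
    n = len(x_sh)
    a_sh, b_sh, c_sh = provider_triple(n)
    # Parties open the masked values d = x - a and e = y - b; a and b hide x and y
    d = reconstruct([(x - a) % Q for x, a in zip(x_sh, a_sh)])
    e = reconstruct([(y - b) % Q for y, b in zip(y_sh, b_sh)])
    # Locally: z_i = c_i + d*b_i + e*a_i, so that sum(z_i) + d*e = x*y
    z_sh = [(c + d * b + e * a) % Q for a, b, c in zip(a_sh, b_sh, c_sh)]
    z_sh[0] = (z_sh[0] + d * e) % Q  # the public term d*e is added by one party only
    return z_sh


product = reconstruct(multiply_shared(share(6), share(7)))
print(product)  # 42
```

The provider only ever generates randomness that is independent of the inputs, which is consistent with it never handling the data itself.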

In [21]:
# For SMPC we need to work with integers.
# Therefore we convert all decimals to integers, depending on the precision we want.
# This adds some noise/error to the data.
# precision_fractional=3 keeps three decimal places (values are scaled by 10**3).
print("Original Tensor \n", data[0][:5])
print("\n Fixed Precision Tensor\n", data.fix_precision(precision_fractional=3)[0][:5])

Original Tensor 
 tensor([ 0.0000, 30.8300,  0.0000,  1.2500,  0.0000], grad_fn=<SliceBackward>)

 Fixed Precision Tensor
 (Wrapper)>FixedPrecisionTensor>tensor([    0, 30830,     0,  1250,     0])
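
The output above corresponds to three fractional digits: 30.83 becomes 30830, i.e. the value multiplied by 10**3. A minimal sketch of this encoding in plain Python follows. The two functions mirror the names of the PySyft methods but are stand-ins written for illustration, and the `base` parameter is an assumption.

```python
def fix_precision(values, precision_fractional=3, base=10):
    """Encode floats as integers by scaling with base**precision_fractional."""
    scale = base ** precision_fractional
    return [int(round(v * scale)) for v in values]


def float_precision(encoded, precision_fractional=3, base=10):
    """Decode the integers back to floats."""
    scale = base ** precision_fractional
    return [e / scale for e in encoded]


row = [0.0, 30.83, 0.0, 1.25, 0.0]
encoded = fix_precision(row)
print(encoded)                   # [0, 30830, 0, 1250, 0]
print(float_precision(encoded))  # [0.0, 30.83, 0.0, 1.25, 0.0]
```

Anything beyond the chosen number of decimal places is rounded away, which is the source of the small error mentioned in the cell comments.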


In [22]:
# We don't use the whole dataset, for efficiency, but feel free to increase these numbers
n_train_items = 10  # len(dataset)
n_test_items = 10  # len(dataset)

def get_private_data_loaders(dataset, precision_fractional,
                             workers, crypto_provider):
    '''
    Encrypt the first n_train_items samples (both the features and targets)
    '''
    
    def secret_share(tensor):
        """
        Transform to fixed precision and secret share a tensor
        """
        return (
            tensor
            .fix_precision(precision_fractional=precision_fractional)
            .share(*workers, crypto_provider=crypto_provider,
                   requires_grad=True)
        )

    private_train_loader = [
        (secret_share(data), secret_share(target.reshape(1,2)))
        for i, (data, target) in enumerate(dataset)
        if i < n_train_items
    ]

    return private_train_loader


private_train_loader = get_private_data_loaders(
    dataset,
    precision_fractional=args.precision_fractional,
    workers=workers,
    crypto_provider=crypto_provider
)

RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.

Please note that the data is now also of type `AutogradTensor`. As PySyft explains, we require the data tensors to maintain gradients, but since we fix the precision and PyTorch's autograd only works on float tensors, PySyft provides a special `AutogradTensor` to build the gradient graph for backpropagation.

In [None]:
import sys

In [None]:
# new training logic to reflect federated learning
# generally speaking the training of fully encrypted networks is very similar
# to normal training

def encrypted_federated_train(model, datasets, optimizer, args):
    print(f'SMPC Training on {len(datasets)} remote workers (dataowners)')
    steps = 0
    model.train()  # training mode

    for e in range(1, args.epochs+1):
        running_loss = 0
        for ii, (data, target) in enumerate(datasets):
            # iterates over pointers to remote data
            #sys.exit()
            steps += 1
            # NB the steps below all happen remotely
            # zero out gradients so that one forward pass doesnt pick up
            # previous forward's gradients
            optimizer.zero_grad()
            outputs = model.forward(data)  # make prediction
            # get shape of (1,2) as we need at least two dimension
            outputs = outputs.reshape(1, -1)
            #MSELoss
            loss = ((outputs - target)**2).sum().refresh()
            loss.backward()
            optimizer.step()

            # get loss from the remote workers and decrypt it
            _loss = loss.get().float_precision().item()
            running_loss += _loss

            # Debugging aid, disabled by default: .get() pulls the shares back,
            # so calling it on `target` may consume the encrypted dataset.
            # print(outputs.get().float_precision())
            # print((outputs - target).get().float_precision())

        # report the mean loss over all samples seen in this epoch
        print('Train Epoch: {} \tLoss: {:.6f}'.format(e,
                                                    running_loss/len(datasets)))

In [None]:
#instantiate model with fixed precision, and share the model across workers

smpc_model = Model(args)
smpc_model = smpc_model\
                 .fix_precision(precision_fractional=args.precision_fractional)\
                 .share(*workers, crypto_provider=crypto_provider,
                        requires_grad=True)

smpc_opt = optim.SGD(params=smpc_model.parameters(), lr=args.lr)\
                .fix_precision(precision_fractional=args.precision_fractional)

In [66]:
%%time
encrypted_federated_train(smpc_model, private_train_loader, smpc_opt, args)

SMPC Training on 10 remote workers (dataowners)


RuntimeError: _thnn_mse_loss_forward not supported on CPUType for Long

###### Notes

**Loss Functions**
Negative log-likelihood loss is not yet supported for multi-party computation. This is due to the nature of the computation the loss requires: it relies on logarithms, which are not readily expressed with the additions and multiplications that SMPC works with.

_Options_
1. Train on non-encrypted data (which could still be differentially private) and then make predictions using encrypted data. This way we can use NLLLoss for training.
2. Train the model on federated, encrypted data using mean squared error.

The type of loss we use, [MSELoss](https://pytorch.org/docs/stable/nn.html#mseloss) or [NLLLoss](https://pytorch.org/docs/stable/nn.html#nllloss), determines how we need to handle our target tensors, because the two loss functions expect targets of different shapes: MSELoss expects a target with the same shape as the output, while NLLLoss expects class indices. Read the documentation if you want to find out more.
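
To illustrate why squared error fits encrypted training, the sketch below computes it on fixed-precision integers using only subtraction, multiplication, and addition. It runs on plain integers rather than secret shares, and `SCALE`, `encode`, `decode`, and `squared_error_fixed` are hypothetical names for this illustration.

```python
SCALE = 10**3  # fixed-point scale, i.e. three fractional digits


def encode(x):
    return int(round(x * SCALE))


def decode(v):
    return v / SCALE


def squared_error_fixed(outputs, targets):
    """Sum of squared errors using only integer subtraction, multiplication, addition."""
    total = 0
    for o, t in zip(outputs, targets):
        diff = o - t
        # A product of two scaled values carries SCALE**2, so divide once to rescale
        total += (diff * diff) // SCALE
    return total


outputs = [encode(0.8), encode(0.1)]  # raw network outputs, no activation
targets = [encode(1.0), encode(0.0)]  # target with the same shape as the output
loss = squared_error_fixed(outputs, targets)
print(decode(loss))  # 0.05
```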
***
**Feature Normalization**<br>
Normalization can be handled on a per-datum basis. When working with images, for example, you can pass in normalization parameters beforehand, so that each remote worker can normalize their own data. Normalization generally becomes difficult for encrypted data, however, since it is not possible to compute it over the whole dataset while ensuring total privacy. The data could be normalized by a trusted party, but this reintroduces the inherent privacy problems we are trying to avoid.
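
A minimal sketch of the first approach, where every worker applies the same pre-agreed parameters locally before encrypting anything. The statistics and the `normalize` helper are hypothetical values and names chosen for illustration.

```python
def normalize(rows, means, stds):
    """Standardize each feature with parameters agreed on before training."""
    return [
        [(x - m) / s for x, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Hypothetical statistics, e.g. distributed with the model or taken from public data
FEATURE_MEANS = [30.0, 1.0]
FEATURE_STDS = [10.0, 0.5]

# Each worker runs this on its own rows; no raw values leave the worker
worker_a_rows = [[30.83, 1.25], [20.0, 0.5]]
normalized = normalize(worker_a_rows, FEATURE_MEANS, FEATURE_STDS)
print(normalized)
```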

## Conclusion

Even though all the data here is encrypted, this does not prevent an adversarial attack in which shares are intentionally corrupted during computation. This is generally considered an open problem in SMPC and encrypted deep learning.

<a id="dp_dl"></a>
#### Differential Privacy for Deep Learning
Differential privacy techniques provide certain guarantees for privacy in the context of deep learning. Instead of encrypting data, we add noise to the data (local DP) or to the output of a query (global DP) such that privacy is preserved to an acceptable degree. To familiarize yourself with differential privacy, visit a short guide I have put together [here](https://htmlpreview.github.io/?https://github.com/mkucz95/private_ai_finance/blob/master/differential-privacy.html). For the purpose of this example, I have not implemented differential privacy, since the data is encrypted end-to-end anyway. One could instead build private deep learning on differential privacy at a local or global level, and then work with unencrypted data, gradients, and models.
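
The difference between local and global DP can be sketched with the Laplace mechanism on a simple sum query. This is an illustration only: the function names, the `epsilon` value, and the binary example data are assumptions, and nothing here is part of the notebook's training pipeline.

```python
import random


def laplace_noise(scale):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)


def global_dp_sum(values, epsilon, sensitivity=1.0):
    """Global DP: a trusted curator adds noise once, to the query result."""
    return sum(values) + laplace_noise(sensitivity / epsilon)


def local_dp_sum(values, epsilon, sensitivity=1.0):
    """Local DP: every data owner perturbs their own value before sharing it."""
    return sum(v + laplace_noise(sensitivity / epsilon) for v in values)


random.seed(0)
values = [1, 0, 1, 1, 0, 1, 0, 1]  # e.g. one binary attribute per user; true sum is 5
print(global_dp_sum(values, epsilon=1.0))
print(local_dp_sum(values, epsilon=1.0))
```

Local DP removes the need for a trusted curator, at the cost of a noisier result, because noise is added once per user rather than once per query.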