In [12]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
# BEWARE, ignoreing warnings is not always a good idea
# I am doing it for presentation

# Private and Encrypted AI - Credit Approval Application

This notebook is meant for my exploratory development of en encrypted federated deep learning approach.
I will develop a final model in a separate folder.

### Glossary
1. [Data Preparation & Setup](#data_prep)
2. [Classical Deep Learning](#classical_dl)
3. [Federated Deep Learning](#federated_dl)<br>
    3.1 [Model Averaging with Trusted Aggregator](#fl_model_avg)
4. [Encrypted Deep Learning](#encrypted_dl)<br>
   4.1 [Secured Multi-Party Computation (SMPC)](#smpc) <br>
   4.2 [Encrypted Gradient Averaging](#fl_encrypt_avg)<br>
   4.3 [Differential Privacy for DL](#dp_dl)
   
<hr>

_Notes_ <br>This project was inspired by lectures of [Andrew Trask](https://iamtrask.github.io/) in the [Private AI Scholarship Challenge on Udacity](https://www.udacity.com/facebook-AI-scholarship). Furthermore, segments of the code are inspired by the [PySyft tutorials on GitHub](https://github.com/OpenMined/PySyft/tree/dev/examples/tutorials); an excellent resource for people starting off with Private AI. 

<a id='data_prep'></a>
## Data Preparation
- only using non-NaN values. I drop NaN values because the dataset is not very big regardless, and we are not dropping very many values.
- Convert binary variables to a numeric representation, and one-hot-encode categorical variables. We do not want to use label encoder since a label encoder would make it 

In [2]:
cols = [ f"A{i}" for i in range(1,16)]
cols.append('label')

In [3]:
df = pd.read_csv('data/crx.data', names=cols)\
    .replace(to_replace='?', value=np.nan).dropna()
print(df.shape, "\n ------- \n")
print(df.head(2))

(653, 16) 
 ------- 

  A1     A2    A3 A4 A5 A6 A7    A8 A9 A10  A11 A12 A13    A14  A15 label
0  b  30.83  0.00  u  g  w  v  1.25  t   t    1   f   g  00202    0     +
1  a  58.67  4.46  u  g  q  h  3.04  t   t    6   f   g  00043  560     +


### Data Analysis

Let's check out what this data looks like first, so that we have an idea of what we are dealing with. In true encrypted, federated learning we would not have this luxury though...

In [4]:
def to_binary(df, col):
    u = df[col].unique()
    mapping =dict(zip(u, [i for i in range(0,len(u))]))
    return df[col].map(mapping)

In [5]:
df.A1.head()

0    b
1    a
2    a
3    b
4    b
Name: A1, dtype: object

In [6]:
#convert to float
for col in ['A2', 'A3', 'A8', 'A11', 'A14', 'A15']:
    df[col] = df[col].astype(float)
    
#binarize
for col in ['A1', 'A9', 'A10', 'A12', 'label']:
    df[col] = to_binary(df, col)
    
onehot_cols = ['A4', 'A5', 'A6', 'A7', 'A13']

#perform one hot encoding, and drop original columns
df  = df.join(pd.get_dummies(df[onehot_cols], dtype=int))\
                                .drop(onehot_cols, axis=1)

In [9]:
set(df.dtypes) #check that we have the data types we expect, no object types

{dtype('int64'), dtype('float64')}

In [8]:
#distribution of numeric-only columns
df[['A2', 'A3', 'A8', 'A11', 'A14', 'A15']].describe().iloc[1:, :10].round(3)

Unnamed: 0,A2,A3,A8,A11,A14,A15
mean,31.504,4.83,2.244,2.502,180.36,1013.761
std,11.838,5.027,3.371,4.968,168.297,5253.279
min,13.75,0.0,0.0,0.0,0.0,0.0
25%,22.58,1.04,0.165,0.0,73.0,0.0
50%,28.42,2.835,1.0,0.0,160.0,5.0
75%,38.25,7.5,2.625,3.0,272.0,400.0
max,76.75,28.0,28.5,67.0,2000.0,100000.0


In [9]:
df.head(2) #double check what our DF looks like

Unnamed: 0,A1,A2,A3,A8,A9,A10,A11,A12,A14,A15,...,A7_ff,A7_h,A7_j,A7_n,A7_o,A7_v,A7_z,A13_g,A13_p,A13_s
0,0,30.83,0.0,1.25,0,0,1.0,0,202.0,0.0,...,0,0,0,0,0,1,0,1,0,0
1,1,58.67,4.46,3.04,0,0,6.0,0,43.0,560.0,...,0,1,0,0,0,0,0,1,0,0


### Simulate Real People's Data

To illustrate how this model would work in real life, I want to simulate this data belonging to people. I am generating random names to be associated with each row. I know that this is not an ideal example since I am in fact starting with the data all collated on my computer with peoples names and data being directly exposed. Not private at all...

In [10]:
import names #used to get random names
names.get_first_name()+' ' +names.get_last_name() #call random name

'Maria Crowe'

In [11]:
users = []
used_names = set()
for idx in range(len(df)):
    name = names.get_first_name()+' ' +names.get_last_name()
    while name in used_names:
        name = names.get_first_name()+' ' +names.get_last_name()
        
    used_names.add(name)
    users.append(name)

In [12]:
df['name'] = users
df.head(2)

Unnamed: 0,A1,A2,A3,A8,A9,A10,A11,A12,A14,A15,...,A7_h,A7_j,A7_n,A7_o,A7_v,A7_z,A13_g,A13_p,A13_s,name
0,0,30.83,0.0,1.25,0,0,1.0,0,202.0,0.0,...,0,0,0,0,1,0,1,0,0,Rufus Owston
1,1,58.67,4.46,3.04,0,0,6.0,0,43.0,560.0,...,1,0,0,0,0,0,1,0,0,David Johnson


In [13]:
#get features and labels as numpy arrays which we can convert to tensors
features = df.drop(['label', 'name'], axis=1).values.astype(float)
labels = df['label'].values.astype(float)
#labels=pd.get_dummies(df['label']).values.astype(float)

_Please Note_ <br>
Normalization is not necessary for any machine learning algorithm, but it is recommended for deep learning for training purposes. Read more [here](https://datascience.stackexchange.com/a/13221/60648).

## Model Development
I am using PyTorch to create a neural network to classify whether someone is accepted for credit or not. PyTorch integrates will with PySyft, the package used to encrypt our deep learning model

In [11]:
from torch import nn
from torch import optim
import torch.nn.functional as F
import syft as sy
import torch as th
th.manual_seed(42) #so that dropout affects same layers

data = th.tensor(features, dtype=th.float32, requires_grad=True)
target = th.tensor(labels, dtype=th.int64, requires_grad=False).reshape(-1,1)

class Model(nn.Module):
    '''
    Neural Network Example Model
    
    Attributes
    :hidden_layers (nn.ModuleList) - hidden units and dimensions for each layer of network
    :output (nn.Linear) - final fully-connected layer that handles output for model
    :dropout (nn.Dropout) - handling of layer-wise drop-out parameter
    
    Functions
    :forward - handling of forward pass of datum through the network.
    '''
    def __init__(self, args):
        super(Model, self).__init__()
        self.hidden_layers = nn.ModuleList([nn.Linear(args.in_size, args.hidden_layers[0])])

        #create hidden layers
        layer_sizes = zip(args.hidden_layers[:-1], args.hidden_layers[1:]) #gives input/output sizes for each layer
        self.hidden_layers.extend([nn.Linear(h1, h2) for h1, h2 in layer_sizes])
        self.output = nn.Linear(args.hidden_layers[-1], args.out_size)
        self.dropout = None if args.drop_p is not None else nn.Dropout(p=args.drop_p)
        
    def forward(self, x):
        for each in self.hidden_layers:
            x = F.relu(each(x)) #apply relu to each hidden node
            
            if self.dropout is not None:
                x = self.dropout(x) #apply dropout
                
        x = self.output(x) #apply output weights
        return args.activation(x, dim=args.dim) #apply activation log softmax

  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
  return f(*args, **kwds)
  return f(*args, **kwds)
  _config = json.load(open(_config_path))
  return f(*args, **kwds)
  return f(*args, **kwds)
  _np_qint8 = np.dtype([("qint8", np.int8, 1)])
W0819 21:28:49.545506 140361539536704 secure_random.py:26] Falling back to insecure randomness since the required custom op could not be found for the installed version of TensorFlow. Fix this by compiling custom ops. Missing file was '/home/mkucz/p_venv/lib/python3.6/site-packages/tf_encrypted/operations/secure_random/secure_random_module_tf_1.14.0.so'
W0819 21:28:49.553843 140361539536704 deprecation_wrapper.py:119] From /home/mkucz/p_venv/lib/python3.6/site-packages/tf_encrypted/session.py:26: The name tf.Session is deprecated. Please use tf.compat.v1.Session instead.



NameError: name 'features' is not defined

<a id='classical_dl'></a>
## Classical Deep Learning
Here we train our network on data that is not distributed (therefore this is not yet a federated or encrypted problem). However, this exercise is useful in showing how we can transition from traditional deep learning to federated deep learning.

First create a dataset of batch size one. This is realistic since most people would only have their own credit score data. This might be different if we decide to use a secure or trusted third party to manage parts of the data, but we don't trust the credit rating company with our data.

In [15]:
class Arguments():
    def __init__(self, in_size, out_size, hidden_layers, activation=F.log_softmax, dim=-1):
        self.batch_size = 1
        self.drop_p = 0.2
        self.epochs = 5
        self.lr = 0.001
        self.in_size = in_size
        self.out_size = out_size
        self.hidden_layers = hidden_layers
        self.precision_fractional=10
        self.activation = activation
        self.dim = dim

In [16]:
dataset = [(data[i], target[i]) for i in range(len(data))]

#instantiate model
in_size = data[0].shape[0]
out_size = 2
hidden_layers=[30,20,10,5]

In [17]:
args = Arguments(in_size, out_size, hidden_layers)
base_model = Model(args)

In [18]:
_data, _target = dataset[0]
_data, _target

(tensor([  0.0000,  30.8300,   0.0000,   1.2500,   0.0000,   0.0000,   1.0000,
           0.0000, 202.0000,   0.0000,   0.0000,   1.0000,   0.0000,   1.0000,
           0.0000,   0.0000,   0.0000,   0.0000,   0.0000,   0.0000,   0.0000,
           0.0000,   0.0000,   0.0000,   0.0000,   0.0000,   0.0000,   0.0000,
           1.0000,   0.0000,   0.0000,   0.0000,   0.0000,   0.0000,   0.0000,
           0.0000,   0.0000,   1.0000,   0.0000,   1.0000,   0.0000,   0.0000],
        grad_fn=<SelectBackward>), tensor([0]))

In [19]:
def train(model, datasets, criterion):
    #use a simple stochastic gradient descent optimizer
    #define optimizer for each model
    optimizer = optim.SGD(params=model.parameters(), lr=args.lr)
    steps=0
    model.train() #training mode
    for e in range(1, args.epochs+1):
        running_loss=0
        for ii, (data,target) in enumerate(datasets): #iterates over pointers to remote data
            steps+=1
            optimizer.zero_grad()#zero out gradients so that one forward pass doesnt pick up previous forward's gradients
            outputs = model.forward(data) #make prediction
            outputs = outputs.reshape(1,-1) #get shape of (1,2) as we need at least two dimension
            loss = criterion(outputs, target)
            #loss = ((outputs - target.float())**2).sum()
            '''
            print(outputs, target, outputs-target.float())
            print(outputs.shape[0])
            print(loss)
            break
            '''
            
            #loss = criterion(outputs,target)
            loss.backward()
            optimizer.step()
            
            #print(f"step: {steps}", loss.item())
            running_loss+=loss.item()

            print_every = 200
            if (ii+1)%print_every==0:
                print(f'Epoch: {e} [{ii+1}/{len(datasets)}] \tLoss: {running_loss/print_every:.6f}')
                running_loss=0


In [596]:
model = base_model.copy()
train(model, dataset, nn.NLLLoss())

Epoch: 1 [200/653] 	Loss: 0.335800
Epoch: 1 [400/653] 	Loss: 4.242975
Epoch: 1 [600/653] 	Loss: 0.680553
Epoch: 2 [200/653] 	Loss: 0.779483
Epoch: 2 [400/653] 	Loss: 0.646549
Epoch: 2 [600/653] 	Loss: 0.680794
Epoch: 3 [200/653] 	Loss: 0.772166
Epoch: 3 [400/653] 	Loss: 0.649821
Epoch: 3 [600/653] 	Loss: 0.681052
Epoch: 4 [200/653] 	Loss: 0.766927
Epoch: 4 [400/653] 	Loss: 0.652247
Epoch: 4 [600/653] 	Loss: 0.681284
Epoch: 5 [200/653] 	Loss: 0.763166
Epoch: 5 [400/653] 	Loss: 0.654033
Epoch: 5 [600/653] 	Loss: 0.681475


We can also use PyTorch's `Dataset` class to make the processing of data a little easier, but for the purpose of this example it will not give any clear benefits. If you would like to read more about PyTorch's abstract `Dataset` class [read here](https://pytorch.org/tutorials/beginner/data_loading_tutorial.html), with another example [here](https://stanford.edu/~shervine/blog/pytorch-how-to-generate-data-parallel). Generally speaking, using `Dataset` and `DataLoader` makes the handling of training and testing data much easier.

In [597]:
from torch.utils.data import Dataset, DataLoader, TensorDataset
dataset_ = TensorDataset(data, target.view(-1))
data_loader = DataLoader(dataset_, batch_size=1, shuffle=False) #this gives us an identical implementation

In [598]:
%%time
#training loss will look a little different since the dataset is shuffled
model = base_model.copy()
train(model, data_loader, nn.NLLLoss())

Epoch: 1 [200/653] 	Loss: 0.335800
Epoch: 1 [400/653] 	Loss: 4.242975
Epoch: 1 [600/653] 	Loss: 0.680553
Epoch: 2 [200/653] 	Loss: 0.779483
Epoch: 2 [400/653] 	Loss: 0.646549
Epoch: 2 [600/653] 	Loss: 0.680794
Epoch: 3 [200/653] 	Loss: 0.772166
Epoch: 3 [400/653] 	Loss: 0.649821
Epoch: 3 [600/653] 	Loss: 0.681052
Epoch: 4 [200/653] 	Loss: 0.766927
Epoch: 4 [400/653] 	Loss: 0.652247
Epoch: 4 [600/653] 	Loss: 0.681284
Epoch: 5 [200/653] 	Loss: 0.763166
Epoch: 5 [400/653] 	Loss: 0.654033
Epoch: 5 [600/653] 	Loss: 0.681475
CPU times: user 4.3 s, sys: 69.6 ms, total: 4.37 s
Wall time: 4.36 s


Now we have a credit application model that is training on our data. However, this is by no means yet federated learning. The implementation above simply trains a model with a batch size of 1. We will federate the model in the upcoming section.

<a id="federated_dl"></a>
## Federated Deep Learning
The idea behind federated learning is that we train a model on subsets of data (encrypted or otherwise) that never leaves the ownership of an individual. In this example of credit rating scores it would allow people to submit claims without ever losing ownership of their data. It requires very little trust of the party to which the application is being submitted.

Even though we currently have our dataset located locally, we want to simulate having many people in our network who each maintain ownership of their data. Therefore we have to create a virtual worker for each datum. The work/data flow in this situation would be as follows:

- get pointers to training data on each remote worker <br>
**Training Steps:**
- send model to remote worker
- train model on data located with remote worker
- receive updated model from remote worker
- repeat for all workers

In [58]:
def connect_to_workers(n_workers):
    return [sy.VirtualWorker(hook, id=name) for name in df.name.str.replace(' ', '').values[:n_workers]]

In [59]:
hook = sy.TorchHook(th)
workers = connect_to_workers(len(dataset))

W0814 22:21:27.369957 140383773411136 hook.py:98] Torch was already hooked... skipping hooking process


In [60]:
workers[:5]

[<VirtualWorker id:PaulinePike #objects:10>,
 <VirtualWorker id:SharonReed #objects:12>,
 <VirtualWorker id:AlyshaWalker #objects:12>,
 <VirtualWorker id:StevenSpeziale #objects:12>,
 <VirtualWorker id:LouisLembke #objects:12>]

### Send Data to Remote Worker
In reality the data of each person would already be on a remote worker. Either each person's device or aggregated into multiple remote workers by a secure third party.

Here we have two options:
1. send the data to each worker individually
2. use PySyft's implementation of PyTorch's `Dataset` and `DataLoader`

I will use PySyft's `BaseDataset`, `FederatedDataset` and `FederatedDataLoader` since this simplifies dataprocessing for larger applications, even though it is not necessary for this example.


In [61]:
# Option 1
remote_dataset = []
for i in range(len(dataset)):
    d, t = dataset[i]
    
    r_d = d.send(workers[i])
    r_t = t.send(workers[i])
    
    remote_dataset.append((r_d, r_t))
    
r_d, r_t = remote_dataset[0]
r_d #this is now a pointer to remote data rather than an actual tensor on our device

(Wrapper)>[PointerTensor | me:89706398710 -> PaulinePike:18834545422]

In [324]:
# Option 2
# Cast the result in BaseDatasets
remote_dataset_list = []
for i in range(len(dataset)):
    d, t = dataset[i] #get data

    #send to worker before adding to dataset
    r_d = d.reshape(1,-1).send(workers[i])
    r_t = t.send(workers[i])
    
    dtset = sy.BaseDataset(r_d, r_t)
    remote_dataset_list.append(dtset)

# Build the FederatedDataset object
remote_dataset = sy.FederatedDataset(remote_dataset_list)
print(remote_dataset.workers[:5])


['MichaelBerry', 'SaraHoyos', 'KevinMack', 'DominickKern', 'SandraSmith']


In [325]:
train_loader = sy.FederatedDataLoader(remote_dataset, batch_size=1,
                                      shuffle=True, drop_last=False)

In [326]:
#new training logic to reflect federated learning
def federated_train(model, datasets, criterion):
    #use a simple stochastic gradient descent optimizer
    #define optimizer for each model
    optimizer = optim.SGD(params=model.parameters(), lr=args.lr)
    
    print(f'Federated Training on {len(datasets)} remote workers (dataowners)')
    steps=0
    model.train() #training mode

    for e in range(1, args.epochs+1):
        running_loss=0
        for ii, (data,target) in enumerate(datasets): #iterates over pointers to remote data
            steps+=1
            
            #FEDERATION STEP
            model.send(data.location) #send model to remote worker
            
            #NB the steps below all happen remotely
            optimizer.zero_grad()#zero out gradients so that one forward pass doesnt pick up previous forward's gradients
            outputs = model.forward(data) #make prediction
            outputs = outputs.reshape(1,-1) #get shape of (1,2) as we need at least two dimension
            loss = criterion(outputs,target)
            loss.backward()
            optimizer.step()
            
            #FEDERATION STEP
            model.get() #get model with new gradients back from remote worker
            
            #FEDERATION STEP
            _loss = loss.get() #get loss from remote worker
            running_loss+=_loss
            
            print_every= 200
            if (ii+1) % print_every == 0:
                print('Train Epoch: {} [{}/{}]  \tLoss: {:.6f}'.format(
                    e, ii+1, len(datasets), running_loss/print_every))
                
                running_loss=0
            

In [327]:
%%time
model = Model(args)
federated_train(model, train_loader, 1, nn.NLLLoss(), opt)

Federated Training on 653 remote workers (dataowners)
Train Epoch: 1 [100/653]  	Loss: 0.005786
Train Epoch: 1 [200/653]  	Loss: 0.004275
Train Epoch: 1 [300/653]  	Loss: 0.006068
Train Epoch: 1 [400/653]  	Loss: 0.008939
Train Epoch: 1 [500/653]  	Loss: 0.008683
Train Epoch: 1 [600/653]  	Loss: 0.003058
CPU times: user 6.17 s, sys: 3.42 ms, total: 6.17 s
Wall time: 6.19 s


_Viola!_ Now we have a federated model where the data never leaves the ownership of a remote device. We can implement this in a way where each user's device is a worker. The problem that occurs here, is that even though the data never leaves an owner's device, `model.get()` returns a new version of the model, which in turn violates privacy of the data owners by revealing information on their data through the updates that were made to the model. A solution to this problem is to use a **trusted third-party aggregator** to combine the remotely trained models into one, *before* sending it to the end-user (in this case me, the credit provider).

Notice how the federated model is about 6.5x slower than the non-federated model. This is simply one of the trade-offs that we have to be willing to make.

<a id="fl_model_avg"></a>
### Federated Learning with Model Averaging
We can perform federated learning in a way that trains a model on the data of each remote worker, and uses a *'trusted aggregator'* to combine the models into one. In this way, the non-trusted party, me for example, cannot tell which remote worker has updated gradients in what way. Gradient updates can be reverse engineered to understand what data has been passed through the network. This is an added layer of privacy protection in federated learning. The downside of this approach, however, is that it requires all parties to trust said aggregator.

In [219]:
# we want to use MSELoss because NLLLoss does not work well with federated aggregation
# transform the label to handle it better,

labels = pd.get_dummies(df['label']).values.astype(float)
target = th.tensor(labels, dtype=th.float32,
                   requires_grad=False).reshape(-1, 2)

In [261]:
def connect_to_workers(n_workers, secure_worker=False):
    workers = [sy.VirtualWorker(hook, id=name) for name in df.name.str/
                                        .replace(' ', '').values[:n_workers]]

    if secure_worker:
        return workers, sy.VirtualWorker(hook, id='trusted_aggregator')

    else:
        return workers

In [262]:
hook = sy.TorchHook(th)
workers, trusted_aggregator = connect_to_workers(len(dataset),
                                                 secure_worker=True)

W0814 23:31:57.275970 140383773411136 hook.py:98] Torch was already hooked... skipping hooking process


In [263]:
workers[:5]

[<VirtualWorker id:PaulinePike #objects:30>,
 <VirtualWorker id:SharonReed #objects:30>,
 <VirtualWorker id:AlyshaWalker #objects:30>,
 <VirtualWorker id:StevenSpeziale #objects:30>,
 <VirtualWorker id:LouisLembke #objects:30>]

In [None]:
#### Send Data to Remote Worker
In this step we need to send a copy of the model to each remote worker, as well as a new optimizer object


In [264]:
# Send data to remote workers
# Cast the result in BaseDatasets
remote_dataset_list = []
for i in range(len(dataset)):

    d, t = data[i], target[i]
    # send to worker before adding to dataset
    r_d = d.reshape(1, -1).send(workers[i])
    r_t = t.reshape(1, -1).send(workers[i])

    dtset = sy.BaseDataset(r_d, r_t)
    remote_dataset_list.append(dtset)

# Build the FederatedDataset object
remote_dataset = sy.FederatedDataset(remote_dataset_list)
print(remote_dataset.workers[:5])

['PaulinePike', 'SharonReed', 'AlyshaWalker', 'StevenSpeziale', 'LouisLembke']


In [265]:
args = Arguments(in_size, out_size, hidden_layers, activation=F.softmax, dim=1)
# for MSE loss, we want to use softmax and not log_softmax
base_model = Model(args)

models = [base_model.copy().send(w) for w in workers]
optimizers = [optim.SGD(params=m.parameters(), lr=args.lr) for m in models]

In [266]:
# new training logic to reflect PARALLEL federated learning with trusted aggregator
def federated_train_trusted_agg(models, datasets, optimizers):
    for e in range(1, args.epochs+1):
        running_loss = 0
        for i in range(len(models)):  # train each model concurrently
            model = models[i]
            opt = optimizers[i]
            _d = datasets.datasets[model.location.id]  # remote dataset

            # NB the steps below all happen remotely
            opt.zero_grad()  # zero out gradients so that one forward pass doesnt pick up previous forward's gradients
            outputs = model.forward(_d.data)  # make prediction
            # get shape of (1,2) as we need at least two dimension
            outputs = outputs.reshape(1, -1)
            # NllLoss does not work well here...
            loss = ((outputs - _d.targets)**2).sum()
            loss.backward()
            opt.step()

            # FEDERATION STEP
            _loss = loss.get().data  # get loss from remote worker
            if th.isnan(_loss) or _loss > 10:
                print(model.location.id, outputs.get(), _d.targets.get(), _loss)
                continue

            running_loss += _loss
        print('Epoch: {} \tLoss: {:.6f}'.format(
            e, running_loss/i))

    # move trained models to trusted thrid party
    for m in models:
        m.move(trusted_aggregator)

In [267]:
federated_train_trusted_agg(models, remote_dataset, optimizers)

Federated Training on 653 remote workers with Trusted Aggregator
Train Epoch: 1 	Loss: 0.559299
Train Epoch: 2 	Loss: 0.515398
Train Epoch: 3 	Loss: 0.479686
Train Epoch: 4 	Loss: 0.449864
Train Epoch: 5 	Loss: 0.424561


In [304]:
def set_model_avg(base_model, models):
    '''
    Average weights and biases of models on trusted aggregator

    Parameters
    ::models - list: pointers to remote models, should be on trusted aggregator

    Returns
    ::avg_weights
    ::avg_bias
    '''
    # average out each hidden layer individually
    for i in range(len(base_model.hidden_layers)):
        weights, biases = zip(*[(m.hidden_layers[i].weight.data,
                                 m.hidden_layers[i].bias.data) for m in models])
        base_model.hidden_layers[i].weight/
                                   .set_((sum(weights)/len(models)).get())
        base_model.hidden_layers[i].bias.set_((sum(biases)/len(models)).get())

    # average out output layer
    weights, biases = zip(*[(m.output.weight.data,
                             m.output.bias.data) for m in models])
    base_model.output.weight.set_((sum(weights)/len(models)).get())
    base_model.output.bias.set_((sum(biases)/len(models)).get())

In [305]:
# Average the model on trusted aggregator
with th.no_grad():
    set_model_avg(base_model, models)

In [307]:
# We now have to put the training and averaging steps together, so that we see an improvement in the whole model

print((f'Federated Training on {len(remote_dataset)}'
       ' remote workers with Trusted Aggregator'))
for i in range(1, args.epochs+1):
    print(f"Model Training Iteration {i}")
    base_model = Model(args)

    models = [base_model.copy().send(w) for w in workers]
    optimizers = [optim.SGD(params=m.parameters(), lr=args.lr) for m in models]

    federated_train_trusted_agg(models, remote_dataset, optimizers)

    # Average the model on trusted aggregator
    with th.no_grad():
        set_model_avg(base_model, models)

Federated Training on 653 remote workers with Trusted Aggregator
Model Training Iteration 1
Federated Training on 653 remote workers with Trusted Aggregator
Train Epoch: 1 	Loss: 0.484676
Train Epoch: 2 	Loss: 0.459549
Train Epoch: 3 	Loss: 0.448780
Train Epoch: 4 	Loss: 0.440922
Train Epoch: 5 	Loss: 0.433375
Model Training Iteration 2
Federated Training on 653 remote workers with Trusted Aggregator
Train Epoch: 1 	Loss: 0.641504
Train Epoch: 2 	Loss: 0.547193
Train Epoch: 3 	Loss: 0.528542
Train Epoch: 4 	Loss: 0.508749
Train Epoch: 5 	Loss: 0.499939
Model Training Iteration 3
Federated Training on 653 remote workers with Trusted Aggregator
Train Epoch: 1 	Loss: 0.625306
Train Epoch: 2 	Loss: 0.593438
Train Epoch: 3 	Loss: 0.563196
Train Epoch: 4 	Loss: 0.541335
Train Epoch: 5 	Loss: 0.527835
Model Training Iteration 4
Federated Training on 653 remote workers with Trusted Aggregator
Train Epoch: 1 	Loss: 0.489112
Train Epoch: 2 	Loss: 0.441764
Train Epoch: 3 	Loss: 0.402509
Train Epo

We have not trained a deep learning model using federated learning with a trusted aggregator! Make sure to test the model on a hold-out dataset. For the purpose of these examples, I will exclude testing sets for the sake of time.
Nevertheless, this **data is not yet encrypted** and we could deduce things specific to the applicant just by getting or looking at the remote data. <br>
In comes **encrypted deep learning**! Here we want to encrypt gradients such that no trusted aggregator is needed!

<a id="encrypted_dl"></a>
## Encrypted Deep Learning
Encrypted Deep Learning aims to preserve model accuracy and predictive power, without compromising the privacy and identity of individual users in the data. Encrypted deep learning provides privacy by enciphering the values that are being computed. Encrypted deep learning can involve encrypting the gradients or encrypting the data as well. I will walk through examples of encrypted deep learning using secure multi-party computation.

<a id="smpc"></a>
#### Secure Multi-Party Computation (SMPC)
PySyft has employed encryption using secure multi-party computation (SMPC). To learn more about the basics of SMPC and differential privacy [check out my SMPC (PySyft inspired) notebook](https://htmlpreview.github.io/?https://github.com/mkucz95/private_ai_finance/blob/master/secure_multi_party_computation.html). This will help you understand how the steps below successfully encrypt data while preserving model accuracy.

<a id="fl_encrypt_avg"></a>
### Encrypted Gradient Aggregation

The previous implementations of federated learning have all relied on a *'trusted aggregator'*. Unfortunately, in many scenarios we would probably not want to have to rely on such a third-party, potentially because no third-party can be deemed trustworthy enough.

Encrypted gradient aggregation follows largely the same process that unencrypted federated learning with trusted aggregator does. The difference exists in how training is conducted, since now we employ secure multi-party computation to aggregate the gradients (the gradients are encrypted across multiple workers). Therefore, only the training function changes. Since it is largely the same as the previous step, I won't provide a worked example, however visit [PySyft's tutorial to learn more](https://github.com/OpenMined/PySyft/blob/dev/examples/tutorials/Part%2010%20-%20Federated%20Learning%20with%20Secure%20Aggregation.ipynb). To summarize encrypted gradient aggregation, since each remote worker has their own model, encrypting this model includes sharing the parameters (weights and biases of the network) across all the workers. Using SMPC, we can aggregate the encrypted parameters after each remote model has passed through a training run. Since we would only get the aggregated model, we are unable to deduce individual worker's model parameters or gradients, ensuring privacy without the need for a trusted third-party aggregator.

Instead, let's work out how to train a network where the data, model parameters, AND the gradients are all encrypted!

### End-to-End Encryption
There are certain scenarios where for maximum privacy it is ideal to keep data encrypted as well as keep each federated model encrypted. **end-to-end encryption**

There are scenarios in which a model will have already been trained, for example from past customer data (before the implementation of differentially private techniques), or that we want to train a new secure model on entirely encrypted data.

In [194]:
crypto_provider = sy.VirtualWorker(hook, id='crypto_provider')

The `crypto_provider` is needed to provide random numbers and the field quotient `Q` as outlined in the [SMPC tutorial](https://github.com/mkucz95/private_ai_finance/blob/master/secure_multi_party_computation.ipynb). The `crypto_provider` never 'owns' or handles any data, it is simply there to ensure secure computation.

In [195]:
# for SMPC we need to work with integers.
# Therefore we convert all decimals to integers depending on the precision we want.
# this adds some noise/error to the data
data[0][:5], data.fix_precision(5)[0][:5]

(tensor([ 0.0000, 30.8300,  0.0000,  1.2500,  0.0000], grad_fn=<SliceBackward>),
 (Wrapper)>FixedPrecisionTensor>tensor([    0, 30830,     0,  1250,     0]))

In [197]:
# We don't use the whole dataset for efficiency purpose, but feel free to increase these numbers
n_train_items = 10  # len(dataset)
n_test_items = 10  # len(dataset)


def get_private_data_loaders(precision_fractional, workers, crypto_provider):
    '''
    Encrypt training and test data (both the features and targets)
    '''
    def secret_share(tensor):
        """
        Transform to fixed precision and secret share a tensor
        """
        return (
            tensor
            .fix_precision(precision_fractional=precision_fractional)
            .share(*workers, crypto_provider=crypto_provider,
                   requires_grad=True)
        )

    private_train_loader = [
        (secret_share(data), secret_share(target))
        for i, (data, target) in enumerate(dataset)
        if i < n_train_items
    ]

    # TODO iterate on this
    private_test_loader = [
        (secret_share(data), secret_share(target.float()))
        for i, (data, target) in enumerate(dataset)
        if i < n_test_items
    ]

    return private_train_loader, private_test_loader


private_train_loader, private_test_loader = get_private_data_loaders(
    precision_fractional=args.precision_fractional,
    workers=workers,
    crypto_provider=crypto_provider
)

In [198]:
private_train_loader[0]

((Wrapper)>AutogradTensor>FixedPrecisionTensor>[AdditiveSharingTensor]
 	-> [PointerTensor | me:14753523827 -> MichaelBerry:81453547002]
 	-> [PointerTensor | me:27673415801 -> SaraHoyos:6285248345]
 	*crypto provider: crypto_provider*,
 (Wrapper)>AutogradTensor>FixedPrecisionTensor>[AdditiveSharingTensor]
 	-> [PointerTensor | me:98745188905 -> MichaelBerry:63953560831]
 	-> [PointerTensor | me:76257879237 -> SaraHoyos:58040980791]
 	*crypto provider: crypto_provider*)

In [181]:
smpc_remote_dataset = []
for i in range(10):
    d, t = dataset[i]

    # send to worker before adding to dataset
    # securely encrypt across all workers
    r_d = d.fix_precision().share(*workers, crypto_provider=crypto_provider,
                                  requires_grad=True)
    r_t = t.fix_precision().share(*workers, crypto_provider=crypto_provider,
                                  requires_grad=True)

    smpc_remote_dataset.append((r_d, r_t))

print(r_d, r_t)

(Wrapper)>AutogradTensor>FixedPrecisionTensor>[AdditiveSharingTensor]
	-> [PointerTensor | me:54906963309 -> MichaelBerry:93683502389]
	-> [PointerTensor | me:63042872172 -> SaraHoyos:73389774872]
	-> [PointerTensor | me:48900505983 -> KevinMack:49273037583]
	*crypto provider: crypto_provider* (Wrapper)>AutogradTensor>FixedPrecisionTensor>[AdditiveSharingTensor]
	-> [PointerTensor | me:23710369807 -> MichaelBerry:54339387032]
	-> [PointerTensor | me:10964909446 -> SaraHoyos:62114046832]
	-> [PointerTensor | me:40775904429 -> KevinMack:1070431350]
	*crypto provider: crypto_provider*


Please note, that the data now also is also type `AutogradTensor`. As is explained by PySyft, we require the data tensors to maintain gradients, but since we fix the precision and PyTorch's autograd only works on float type tensors, PySyft has a special `AutogradTensor` to compute the gradient graph for backpropagation.

In [42]:
# new training logic to reflect federated learning
# generally speaking the training of fully encrypted networks is very similar
# to normal training
def encrypted_federated_train(model, datasets, optimizer, args):
    print(f'SMPC Training on {len(datasets)} remote workers (dataowners)')
    steps = 0
    model.train()  # training mode
    
    for e in range(1, args.epochs+1):
        running_loss = 0
        for ii, (data, target) in enumerate(datasets):  # iterates over pointers to remote data
            steps += 1
            # TODO model.send()?
            # NB the steps below all happen remotely
            # zero out gradients so that one forward pass doesnt pick up previous forward's gradients
            optimizer.zero_grad()
            outputs = model.forward(data)  # make prediction
            # get shape of (1,2) as we need at least two dimension
            outputs = outputs.reshape(1, -1)
            loss = ((outputs - target)**2).sum().refresh()
            loss.backward()
            optimizer.step()

            # get loss from remote worker and unencrypt
            _loss = loss.get().float_precision()
            running_loss += _loss

            print_every = 100
            if steps % print_every == 0:
                print('Train Epoch: {} [{}/{}]  \tLoss: {:.6f}'.format(
                    e, ii+1, len(datasets), _loss/print_every))

                running_loss = 0

In [201]:
# arguments for a MSE Loss Network
args = Arguments(in_size, out_size, hidden_layers, activation=F.softmax, dim=1)

# create new model, fix the precision and share the gradients across workers
smpc_model = Model(args)/
            .fix_precision(precision_fractional=args.precision_fractional) /
            .share(*workers, crypto_provider=crypto_provider,
                   requires_grad=True)

smpc_opt = opt.fix_precision(precision_fractional=args.precision_fractional)

In [203]:
%%time
encrypted_federated_train(smpc_model, private_train_loader, opt, args)

SMPC Training on 10 remote workers (dataowners)


RuntimeError: expected device cpu and dtype Float but got device cpu and dtype Long

###### Notes

**Loss Functions**
Using negative log-likelihood loss is not yet supported for multi-party computation. This is due to the nature of computation required for the loss function calculation.

_Options_
1. train on non-encrypted data (could be differentially private though) and then make predictions using encrypted data. This way we can use NLLLoss for training
2. Train the model on federated, encrypted data using mean squared error

The type of loss we use [MSELoss](https://pytorch.org/docs/stable/nn.html#mseloss) vs [NLLLoss](https://pytorch.org/docs/stable/nn.html#nllloss) would indicate that we need to handle our target tensors a little differently. These loss functions expect different shapes as the target inputs. Read the documentation if you want to find out more.
***
**Feature Normalization**<br>
Normalization can be handled on a per-datum basis. When working with images, for example, you can pass in normalization parameters before hand, so that each remote worker can normalize their data. However, normalization generally becomes difficult for encrypted data since it is not possible to ensure total privacy. However, data could generally be normalized with such a trusted party, but this introduces inherent privacy problems.

## Conclusion

Even though all the data here is encrypted it does not prevent an adversarial attack where shares are intentionally corrupted during computation. This is generally considered an open problem in SMPC and encrypted deep learning.

<a id="dp_dl"></a>
## Differential Privacy for Deep Learning
Differential privacy techniques provide certain guarantees for privacy in the context of deep learning. Instead of encrypting data, we add noise to the data (local DP) or to the output of a query (global DP) such that privacy is preserved to an acceptable degree. To familiarize yourself with Differential Privacy, visit a short guide I have put together [here](https://htmlpreview.github.io/?https://github.com/mkucz95/private_ai_finance/blob/master/differential-privacy.html). For the purpose of this example, however, I have not implemented differential privacy since data will be encrypted end-to-end anyway. However, one could have private deep learning employing differential privacy on a local or global level, and then work with unencrypted data, gradients, and models.

# TODO
- training and validation set
- loss graph

- https://github.com/OpenMined/PySyft/tree/dev/examples/tutorials/advanced/websockets-example-MNIST

- create final websocket live code