In [4]:
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings('ignore')
import copy
from torch import nn
from torch import optim
import torch.nn.functional as F
import syft as sy
import torch as th
from helpers import Model

# BEWARE: ignoring warnings is not always a good idea.
# I am doing it here only to keep the presentation clean.

<a id="federated_dl"></a>
## Federated Deep Learning
The idea behind federated learning is to train a model on subsets of data (encrypted or otherwise) that never leave the ownership of the individuals who hold them. In this credit-rating example, it would allow people to submit applications without ever giving up ownership of their data, so very little trust is required in the party that receives the application.

Even though we currently have our dataset located locally, we want to simulate having many people in our network who each maintain ownership of their data. Therefore we have to create a virtual worker for each datum. The work/data flow in this situation would be as follows:

- get pointers to the training data on each remote worker

**Training Steps:**
- send model to remote worker
- train model on data located with remote worker
- receive updated model from remote worker
- repeat for all workers
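
The steps above can be sketched without any federated library at all. The block below is a toy simulation under stated assumptions, not PySyft code: the "workers" are a plain dictionary, the model is a two-weight logistic regression, and "sending" the model just means copying the weights. All names (`train_on_worker`, `mean_loss`, the synthetic data) are hypothetical and exist only for illustration.

```python
import math
import random

def predict(weights, x):
    """Logistic-regression probability for one feature vector."""
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_on_worker(weights, x, y, lr=0.1):
    """One SGD step on a single datum; conceptually runs on the worker."""
    p = predict(weights, x)
    # gradient of binary cross-entropy w.r.t. the weights is (p - y) * x
    return [w - lr * (p - y) * xi for w, xi in zip(weights, x)]

# Each hypothetical worker owns exactly one (features, label) pair.
random.seed(0)
workers = {}
for i in range(20):
    x = [1.0, random.uniform(-1, 1)]  # bias term plus one feature
    y = 1 if x[1] > 0 else 0
    workers[f"w_{i}"] = (x, y)

def mean_loss(weights):
    """Average loss over all workers (for demonstration only: a real
    model owner could not read the remote data like this)."""
    eps = 1e-12
    total = 0.0
    for x, y in workers.values():
        p = predict(weights, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(workers)

weights = [0.0, 0.0]
loss_before = mean_loss(weights)

for epoch in range(50):
    for worker_id, (x, y) in workers.items():
        remote_copy = list(weights)                   # send model to worker
        weights = train_on_worker(remote_copy, x, y)  # train there, receive update

loss_after = mean_loss(weights)
print(loss_before, loss_after)
```

The point of the sketch is the control flow: the raw data is only ever touched inside `train_on_worker`, while the model owner sees nothing but the weights going out and coming back.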

In [52]:
features = np.load('data/features.npy')
labels = np.load('data/labels.npy')
data = th.tensor(features, dtype=th.float32, requires_grad=True)
target = th.tensor(labels, dtype=th.int64, requires_grad=False).reshape(-1,1)

In [28]:
class Arguments():
    def __init__(self, in_size, out_size, hidden_layers,
                       activation=F.softmax, dim=-1):
        self.batch_size = 1
        self.drop_p = None
        self.epochs = 10
        self.lr = 0.001
        self.in_size = in_size
        self.out_size = out_size
        self.hidden_layers = hidden_layers
        self.precision_fractional=10
        self.activation = activation
        self.dim = dim

In [29]:
hook = sy.TorchHook(th)

def connect_to_workers(n_workers, secure_worker=False):
    '''
    Connect to remote workers with PySyft
    
    Inputs
        n_workers (int) - how many workers to connect to
        secure_worker (bool) - whether to return a trusted aggregator as well
        
    Outputs
        workers (list, <PySyft.VirtualWorker>)
    '''
    workers = [sy.VirtualWorker(hook, id=f'w_{i}') for i in range(n_workers)]

    if secure_worker:
        return workers, sy.VirtualWorker(hook, id='trusted_aggregator')

    else:
        return workers

W0821 22:10:23.372497 140304801736512 hook.py:102] Torch was already hooked... skipping hooking process


In [30]:
dataset = [(data[i], target[i]) for i in range(len(data))]

#instantiate model
in_size = data[0].shape[0]
out_size = 2
hidden_layers=[30,15]
workers = connect_to_workers(len(dataset))

In [31]:
workers[:5] 
# each individual worker corresponds to a person, or rather their device
# on a fresh run these workers hold no objects; the non-zero counts below
# are most likely left over from earlier runs of this notebook

[<VirtualWorker id:w_0 #objects:12>,
 <VirtualWorker id:w_1 #objects:4>,
 <VirtualWorker id:w_2 #objects:4>,
 <VirtualWorker id:w_3 #objects:4>,
 <VirtualWorker id:w_4 #objects:4>]

### Send Data to Remote Worker
In reality, each person's data would already live on a remote worker: either that person's own device, or one of several remote workers onto which a secure third party has aggregated the data.

Here we have two options:
1. send the data to each worker individually
2. use PySyft's implementation of PyTorch's `Dataset` and `DataLoader`

I will use PySyft's `BaseDataset`, `FederatedDataset` and `FederatedDataLoader`, since this simplifies data processing for larger applications, even though it is not necessary for this example.
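
To make the idea of "sending" data concrete, here is a toy model of a worker and a pointer. It is a sketch under my own assumptions, not PySyft's implementation: `ToyWorker`, `ToyPointer` and `send` are hypothetical names, and the real library does considerably more (serialisation, tensor hooks, garbage collection).

```python
class ToyWorker:
    """Hypothetical stand-in for a remote device that stores objects by id."""

    def __init__(self, worker_id):
        self.id = worker_id
        self._objects = {}
        self._next_id = 0

    def store(self, obj):
        obj_id = self._next_id
        self._next_id += 1
        self._objects[obj_id] = obj
        return obj_id

    def fetch(self, obj_id):
        # Hand the object back and forget it (in this toy version,
        # retrieving an object removes it from the worker).
        return self._objects.pop(obj_id)

    def n_objects(self):
        return len(self._objects)


class ToyPointer:
    """Holds only a location and an id, never the data itself."""

    def __init__(self, location, obj_id):
        self.location = location
        self.obj_id = obj_id

    def get(self):
        return self.location.fetch(self.obj_id)


def send(obj, worker):
    """Store obj on the worker and return a pointer to it."""
    return ToyPointer(worker, worker.store(obj))


alice = ToyWorker("w_0")
ptr = send([0.2, 0.7, 0.1], alice)
objects_while_remote = alice.n_objects()  # the worker now holds one object
values = ptr.get()                        # explicitly pull the data back
print(ptr.location.id, objects_while_remote, values, alice.n_objects())
```

What matters for the rest of this notebook is that the local side only ever holds the pointer; the values are not available locally unless someone explicitly calls `get()`.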


In [62]:
# Option 1
remote_dataset = []
for i in range(len(dataset)):
    d, t = dataset[i]
    
    r_d = d.reshape(1,-1).send(workers[i])
    r_t = t.reshape(1,-1).send(workers[i])
    
    remote_dataset.append((r_d, r_t))
    
r_d, r_t = remote_dataset[0]
r_d #this is now a pointer to remote data rather than an actual tensor on our device

(Wrapper)>[PointerTensor | me:69228889286 -> w_0:35953420875]

In [63]:
# Option 2
# Cast the result in BaseDatasets
remote_dataset_list = []
for i in range(len(dataset)):
    d, t = dataset[i] #get data

    #send to worker before adding to dataset
    r_d = d.reshape(1,-1).send(workers[i])
    r_t = t.reshape(1,-1).send(workers[i])
    
    dtset = sy.BaseDataset(r_d, r_t)
    remote_dataset_list.append(dtset)

# Build the FederatedDataset object
remote_dataset = sy.FederatedDataset(remote_dataset_list)
print(remote_dataset.workers[:5])


['w_0', 'w_1', 'w_2', 'w_3', 'w_4']


In [64]:
train_loader = sy.FederatedDataLoader(remote_dataset, batch_size=1,
                                      shuffle=True, drop_last=False)

In [75]:
#new training logic to reflect federated learning
def federated_train(model, datasets, criterion, args):
    #use a simple stochastic gradient descent optimizer
    #a single optimizer is shared across all workers, since there is only one model
    optimizer = optim.SGD(params=model.parameters(), lr=args.lr)
    
    print(f'Federated Training on {len(datasets)} remote workers (dataowners)')
    steps=0
    model.train() #training mode

    for e in range(1, args.epochs+1):
        running_loss=0
        for ii, (data,target) in enumerate(datasets): 
            #iterates over pointers to remote data
            steps+=1
            
            #FEDERATION STEP
            model.send(data.location) 
            #send model to remote worker
            
            #NB the steps below all happen remotely
            #zero out gradients so they do not accumulate from the previous step
            optimizer.zero_grad()
            outputs = model.forward(data) #make prediction
            outputs = outputs.reshape(1,-1) #reshape to (1,2), as the loss expects a batch dimension
 
            loss = criterion(outputs,target[0])
            loss.backward()
            optimizer.step()
            
            #FEDERATION STEP
            model.get() #get the model with its updated weights back from the remote worker
            
            #FEDERATION STEP
            _loss = loss.get() #get loss from remote worker
            running_loss+=_loss
            
            print_every= 200
            if (ii+1) % print_every == 0:
                print('Train Epoch: {} [{}/{}]  \tLoss: {:.6f}'.format(
                    e, ii+1, len(datasets), running_loss/print_every))
                
                running_loss=0
            

In [76]:
%%time
args = Arguments(in_size, out_size, hidden_layers, 
                 activation=F.log_softmax, dim=1)
model = Model(args)

federated_train(model, train_loader, nn.NLLLoss(), args)

Federated Training on 653 remote workers (dataowners)
Train Epoch: 1 [200/653]  	Loss: 1.117102
Train Epoch: 1 [400/653]  	Loss: 483.192688
Train Epoch: 1 [600/653]  	Loss: 41.180523
Train Epoch: 2 [200/653]  	Loss: 0.591983
Train Epoch: 2 [400/653]  	Loss: 0.804055
Train Epoch: 2 [600/653]  	Loss: 0.728580
Train Epoch: 3 [200/653]  	Loss: 0.627162
Train Epoch: 3 [400/653]  	Loss: 0.756000
Train Epoch: 3 [600/653]  	Loss: 0.709238
Train Epoch: 4 [200/653]  	Loss: 0.657211
Train Epoch: 4 [400/653]  	Loss: 0.724889
Train Epoch: 4 [600/653]  	Loss: 0.698314
Train Epoch: 5 [200/653]  	Loss: 0.681328
Train Epoch: 5 [400/653]  	Loss: 0.704380
Train Epoch: 5 [600/653]  	Loss: 0.692041
Train Epoch: 6 [200/653]  	Loss: 0.699990
Train Epoch: 6 [400/653]  	Loss: 0.690608
Train Epoch: 6 [600/653]  	Loss: 0.688358
Train Epoch: 7 [200/653]  	Loss: 0.714111
Train Epoch: 7 [400/653]  	Loss: 0.681206
Train Epoch: 7 [600/653]  	Loss: 0.686137
Train Epoch: 8 [200/653]  	Loss: 0.724644
Train Epoch: 8 [400

_Voilà!_ Now we have a federated model where the data never leaves the ownership of a remote device, and we could implement this so that each user's device is a worker. One problem remains: even though the data never leaves an owner's device, `model.get()` returns an updated version of the model after every single worker, and the changes made to the model reveal information about that worker's data, which violates the data owner's privacy. A solution is to use a **trusted third-party aggregator** that combines the remotely trained models into one *before* sending the result to the end user (in this case me, the credit provider).
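
As a rough illustration of what "combining" could mean, the sketch below averages the weights returned by several workers. Plain element-wise averaging is my assumption here (the text above only says the models are combined), and the weight vectors and the `average_models` helper are made up for the example.

```python
def average_models(models):
    """Element-wise mean of several weight vectors of equal length."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]

# Hypothetical weight vectors, one per remote worker. Only the trusted
# aggregator would ever see these individual models.
worker_models = [
    [0.10, -0.40, 0.30],
    [0.20, -0.20, 0.60],
    [0.30, -0.60, 0.00],
]

# The model owner receives only this combined result.
aggregated = average_models(worker_models)
print(aggregated)
```

Because the model owner only sees the average, it is much harder to attribute any particular change in the weights to one individual's data.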

Notice how the federated model is about 6.5x slower than the non-federated model. This is simply one of the trade-offs that we have to be willing to make.

The next step in this journey is **Federated Learning with Model Averaging** which you can find [here]()