## Prerequisites

First, we install PySyft and PyTorch:

In [0]:
!pip install syft torch==1.3.0 torchvision==0.4.1 -f https://download.pytorch.org/whl/torch_stable.html

Because this version of PySyft does not yet support TensorFlow 2, we must pin the TensorFlow backend used by Google Colab to version 1.x as follows:

In [0]:
%tensorflow_version 1.x

We then check whether CUDA is available to PyTorch and store the device in a variable for later reference:

In [3]:
import torch as tc

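if tc.cuda.is_available():
  print(tc.cuda.get_device_name(0))
  print(tc.cuda.device_count())
  tc.set_default_tensor_type(tc.cuda.FloatTensor)
  cuda_device = tc.device('cuda:0')
else:
  # Fall back to the CPU so that later cells that reference cuda_device still run
  cuda_device = tc.device('cpu')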

Tesla P100-PCIE-16GB
1


We now import the PySyft library and hook it into the PyTorch module that we imported into the notebook earlier. This allows PySyft to use PyTorch as its backend. We also create two worker nodes (worker1 and worker2) that will be used for our training later. To simplify setup, we create these workers as virtual nodes that run inside the same Python runtime:

In [0]:
import syft as sf

hook = sf.TorchHook(tc)
w1 = sf.VirtualWorker(hook, id="worker1")
w2 = sf.VirtualWorker(hook, id="worker2")

## Worker Sanity Check

This section checks that the worker nodes are working and can be used for computation. We send two tensors to worker1, add them remotely, and retrieve the result:

In [0]:
x = tc.Tensor([3,2,1]).send(w1)
print(x)
print(x.location)

In [0]:
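y = tc.Tensor([1,2,3]).send(w1)
# Named `total` rather than `sum` so that the built-in sum(), used later when reporting losses, is not shadowed
total = x + y
print(total)

In [0]:
total = total.get()
print(total)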
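
To make the send/get round trip above more concrete, here is a toy, pure-Python model of the idea. It is only a conceptual sketch: `ToyWorker`, `ToyPointer` and `send` are hypothetical names invented for illustration, and they do not reflect how PySyft actually implements its workers or pointer tensors.

```python
class ToyWorker:
    """Hypothetical stand-in for a remote node: it just stores objects by id."""

    def __init__(self, name):
        self.name = name
        self._store = {}
        self._next_id = 0

    def receive(self, value):
        # Store the value and hand back an id the caller can refer to later
        obj_id = self._next_id
        self._next_id += 1
        self._store[obj_id] = value
        return obj_id


class ToyPointer:
    """Hypothetical handle to data that lives on a ToyWorker."""

    def __init__(self, location, obj_id):
        self.location = location
        self.obj_id = obj_id

    def __add__(self, other):
        # The addition runs "on the worker"; the caller only receives a new pointer
        if other.location is not self.location:
            raise ValueError("both operands must live on the same worker")
        a = self.location._store[self.obj_id]
        b = other.location._store[other.obj_id]
        result = [u + v for u, v in zip(a, b)]
        return ToyPointer(self.location, self.location.receive(result))

    def get(self):
        # Move the value back to the caller and remove it from the worker
        return self.location._store.pop(self.obj_id)


def send(value, worker):
    """Place a value on a worker and return a pointer to it."""
    return ToyPointer(worker, worker.receive(value))


toy_w1 = ToyWorker("worker1")
px = send([3, 2, 1], toy_w1)
py = send([1, 2, 3], toy_w1)
total = (px + py).get()
print(total)  # [4, 4, 4]
```

The point of the sketch is that the notebook only ever holds handles; the values themselves stay on the worker until `get()` is called.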

## Actual Trainer

We upload the pre-processed msgtexts and msgtypes data into the notebook to be used for training:

In [5]:
from google.colab import files

upl = files.upload()

Saving msgtexts.npy to msgtexts.npy
Saving msgtypes.npy to msgtypes.npy


Once the data has been uploaded to the notebook, we load it into two tensors, texts for the text messages and types for the message types:

In [0]:
import io
import numpy as np

# Loading data
texts = np.load(io.BytesIO(upl["msgtexts.npy"]))
texts = tc.tensor(texts).to(cuda_device, tc.long)
types = np.load(io.BytesIO(upl["msgtypes.npy"]))
types = tc.tensor(types).to(cuda_device, tc.long)

We split the data into 80% for training and 20% for testing, then split each set into two halves, one for Worker1 and one for Worker2:

In [0]:
# Split training and test data
test_pct = 0.2
train_types = types[:-int(len(types)*test_pct)]
train_texts = texts[:-int(len(types)*test_pct)]
test_types = types[-int(len(types)*test_pct):]
test_texts = texts[-int(len(types)*test_pct):]

# Dataset split (one half for w1, other half for w2)
train_idx = int(len(train_types)/2)
test_idx = int(len(test_types)/2)
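
The slicing arithmetic above can be checked on a toy list. The following is a minimal sketch in plain Python; `split_sizes` is a hypothetical helper written for this illustration and is not part of the notebook. It also guards one edge case of the negative-index slicing: if `int(len(data) * test_pct)` evaluates to 0, then `data[:-0]` is an empty slice rather than the full dataset.

```python
def split_sizes(n_samples, test_pct):
    """Return (n_train, n_test, train_half, test_half) for the slicing scheme above."""
    n_test = int(n_samples * test_pct)
    if n_test == 0:
        # data[:-0] is data[:0], an empty slice, so guard against it explicitly
        raise ValueError("test_pct is too small for this dataset size")
    n_train = n_samples - n_test
    return n_train, n_test, int(n_train / 2), int(n_test / 2)


data = list(range(10))
n_train, n_test, train_half, test_half = split_sizes(len(data), 0.2)

# Same negative-index slicing as in the cell above
train, test = data[:-n_test], data[-n_test:]
w1_train, w2_train = train[:train_half], train[train_half:]

print(len(train), len(test))  # 8 2
print(w1_train, w2_train)     # [0, 1, 2, 3] [4, 5, 6, 7]
```

Note that this scheme takes the last 20% of the rows in file order, so it assumes the rows are not sorted by message type.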

We then send both the training and test datasets to the workers for processing. Note that the data must not be processed in the notebook directly, as that would not be federated learning. Processing has to happen on the separate worker nodes, where the data is not visible to the master (the notebook):

In [0]:
w1_train_dataset = sf.BaseDataset(train_texts[:train_idx],
                                  train_types[:train_idx]).send(w1)
w2_train_dataset = sf.BaseDataset(train_texts[train_idx:],
                                  train_types[train_idx:]).send(w2)
w1_test_dataset = sf.BaseDataset(test_texts[:test_idx],
                                 test_types[:test_idx]).send(w1)
w2_test_dataset = sf.BaseDataset(test_texts[test_idx:],
                                 test_types[test_idx:]).send(w2)

We define a global batch size that each worker will use when processing its data:

In [0]:
BATCH_SIZE = 32

We combine the datasets that were sent to the workers into federated datasets and wrap them in federated data loaders, so that batches are drawn from the data held by the workers:

In [0]:
federated_train_dataset = sf.FederatedDataset([w1_train_dataset, w2_train_dataset])
federated_test_dataset = sf.FederatedDataset([w1_test_dataset, w2_test_dataset])

federated_train_loader = sf.FederatedDataLoader(federated_train_dataset, 
                                                shuffle=True, batch_size=BATCH_SIZE)
federated_test_loader = sf.FederatedDataLoader(federated_test_dataset, 
                                               shuffle=False, batch_size=BATCH_SIZE)
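
As a rough mental model of what the training loop receives from these loaders, the sketch below yields batches that each come from exactly one worker. It is a simplified illustration in plain Python: `toy_federated_batches` is a hypothetical function, there is no shuffling, and the real `FederatedDataLoader` may order its batches differently.

```python
def toy_federated_batches(datasets, batch_size):
    """Yield (worker_id, batch) pairs; every batch comes from exactly one worker."""
    for worker_id, samples in datasets.items():
        for start in range(0, len(samples), batch_size):
            yield worker_id, samples[start:start + batch_size]


# Two toy "remote" datasets of 70 samples each
toy_datasets = {"worker1": list(range(70)), "worker2": list(range(70, 140))}

batch_sizes = [(worker_id, len(batch))
               for worker_id, batch in toy_federated_batches(toy_datasets, batch_size=32)]
print(batch_sizes)
# [('worker1', 32), ('worker1', 32), ('worker1', 6), ('worker2', 32), ('worker2', 32), ('worker2', 6)]
```

The short final batch per worker is the case that the model's `forward` method handles when it trims the hidden state to the current batch size.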

This is the GRU RNN model that will be used for training:

In [0]:
from torch import nn
import torch as tc

class GRUCell(nn.Module):

    def __init__(self, input_size, hidden_size, bias=True):
        super(GRUCell, self).__init__()
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.bias = bias

        # reset gate
        self.fc_ir = nn.Linear(input_size, hidden_size, bias=bias)
        self.fc_hr = nn.Linear(hidden_size, hidden_size, bias=bias)

        # update gate
        self.fc_iz = nn.Linear(input_size, hidden_size, bias=bias)
        self.fc_hz = nn.Linear(hidden_size, hidden_size, bias=bias)

        # new gate
        self.fc_in = nn.Linear(input_size, hidden_size, bias=bias)
        self.fc_hn = nn.Linear(hidden_size, hidden_size, bias=bias)

        self.init_parameters()

    def init_parameters(self):
        std = 1.0 / np.sqrt(self.hidden_size)
        for w in self.parameters():
            w.data.uniform_(-std, std)

    def forward(self, x, h):

        x = x.view(-1, x.shape[1])

        i_r = self.fc_ir(x)
        h_r = self.fc_hr(h)
        i_z = self.fc_iz(x)
        h_z = self.fc_hz(h)
        i_n = self.fc_in(x)
        h_n = self.fc_hn(h)

        resetgate = tc.sigmoid(i_r + h_r)
        inputgate = tc.sigmoid(i_z + h_z)
        newgate = tc.tanh(i_n + (resetgate * h_n))

        hy = newgate + inputgate * (h - newgate)

        return hy


class GRU(nn.Module):
    def __init__(self, vocab_size, output_size=1, embedding_dim=50, hidden_dim=10, bias=True, dropout=0.2):
        super(GRU, self).__init__()

        self.hidden_dim = hidden_dim
        self.output_size = output_size

        # Dropout layer
        self.dropout = nn.Dropout(p=dropout)
        # Embedding layer
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        # GRU Cell
        self.gru_cell = GRUCell(embedding_dim, hidden_dim)
        # Fully-connected layer
        self.fc = nn.Linear(hidden_dim, output_size)
        # Sigmoid layer
        self.sigmoid = nn.Sigmoid()

    def forward(self, x, h):

        batch_size = x.shape[0]

        # Deal with cases where the current batch size differs from the general batch size
        # This occurs at the end of an iteration over the DataLoaders
        if h.shape[0] != batch_size:
            h = h[:batch_size, :].contiguous()

        # Apply embedding
        x = self.embedding(x)

        # GRU cells
        for t in range(x.shape[1]):
            h = self.gru_cell(x[:,t,:], h)

        # Output corresponds to the last hidden state
        out = h.contiguous().view(-1, self.hidden_dim)

        # Dropout and fully-connected layers
        out = self.dropout(out)
        sig_out = self.sigmoid(self.fc(out))

        return sig_out, h
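
To see what a single `GRUCell.forward` call computes, the step can be reproduced with scalars. This is a minimal sketch using only the standard library: `gru_step_scalar` and the weight values are made up for illustration, and biases are omitted. It also checks that the update `newgate + inputgate * (h - newgate)` is the same as the interpolation `(1 - z) * n + z * h` between the candidate state and the previous hidden state.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def gru_step_scalar(x, h, w):
    """One GRU step for a scalar input x and a scalar hidden state h.

    w maps each gate name to an (input_weight, hidden_weight) pair.
    """
    r = sigmoid(w["r"][0] * x + w["r"][1] * h)          # reset gate
    z = sigmoid(w["z"][0] * x + w["z"][1] * h)          # update gate ("inputgate" above)
    n = math.tanh(w["n"][0] * x + r * (w["n"][1] * h))  # candidate state ("newgate" above)
    return n + z * (h - n), (r, z, n)


weights = {"r": (0.5, -0.3), "z": (0.8, 0.1), "n": (-0.4, 0.9)}  # arbitrary illustrative values
h_prev = 0.25
h_new, (r, z, n) = gru_step_scalar(x=1.0, h=h_prev, w=weights)

# The update is algebraically an interpolation between n and h_prev
assert abs(h_new - ((1.0 - z) * n + z * h_prev)) < 1e-12
```

Because `z` lies strictly between 0 and 1, the new hidden state always falls between the candidate state and the previous hidden state.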

We define the training hyperparameters (number of epochs, gradient clipping threshold, and learning rate) and the parameters of the GRU model defined earlier, then instantiate the model:

In [0]:
EPOCHS = 15
CLIP = 5
LR = 0.1

# Model parameters
VOCAB_SIZE = int(texts.max()) + 1
EMBEDDING_DIM = 50
HIDDEN_DIM = 10
DROPOUT = 0.2

# Initiating the model
model = GRU(vocab_size=VOCAB_SIZE, hidden_dim=HIDDEN_DIM, embedding_dim=EMBEDDING_DIM, dropout=DROPOUT)
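
The `CLIP` value is later passed to `nn.utils.clip_grad_norm_` as the maximum gradient norm. The effect of norm-based clipping can be sketched in plain Python as follows. This is an illustration of the idea on a flat list of gradient values, assuming the L2 norm; `clip_by_global_norm` is a hypothetical helper, not the PyTorch implementation.

```python
import math


def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Rescale a flat list of gradient values so their L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # eps guards against division by zero when all gradients are zero
    scale = max_norm / (total_norm + eps)
    if scale < 1.0:
        grads = [g * scale for g in grads]
    return grads, total_norm


clipped, norm_before = clip_by_global_norm([6.0, 8.0], max_norm=5.0)
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before)           # 10.0
print(round(norm_after, 3))  # 5.0
```

Gradients whose norm is already below the threshold are left unchanged, so clipping only changes the size of unusually large updates and never their direction.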

We then train the model defined earlier and evaluate it on the test data after every epoch:

In [13]:
from sklearn.metrics import roc_auc_score

# Defining the loss and optimizer
criterion = nn.BCELoss()
optimizer = tc.optim.SGD(model.parameters(), lr=LR)

for e in range(EPOCHS):
  # To track the amount of loss
  losses = []

  # Batch processing loop for training
  for texts, types in federated_train_loader:
      # Location of current batch
      worker = texts.location
      # Initialize hidden state and send it to the worker
      h = tc.Tensor(tc.zeros((BATCH_SIZE, HIDDEN_DIM))).send(worker)
      # Send the model to the current worker
      model.send(worker)
      # Reset accumulated gradients before the optimization step
      optimizer.zero_grad()
      # Output from the model
      output, _ = model(texts, h)
      # Calculate the loss and perform backprop
      loss = criterion(output.squeeze(), types.float())
      loss.backward()
      # Clipping the gradient to avoid explosion
      nn.utils.clip_grad_norm_(model.parameters(), CLIP)
      # Optimizer step: update the model weights
      optimizer.step()
      # Get the model back to the master
      model.get()
      losses.append(loss.get())

  # Evaluate the model
  model.eval()

  with tc.no_grad():
    test_preds = []
    test_types_list = []
    eval_losses = []

    for texts, types in federated_test_loader:
      # Location of current batch
      worker = texts.location
      # Initialize hidden state and send it to worker
      h = tc.Tensor(tc.zeros((BATCH_SIZE, HIDDEN_DIM))).send(worker)    
      # Send the model to the worker
      model.send(worker)
      output, _ = model(texts, h)
      loss = criterion(output.squeeze(), types.float())
      eval_losses.append(loss.get())
      preds = output.squeeze().get()
      test_preds += list(preds.cpu().numpy())
      test_types_list += list(types.get().cpu().numpy().astype(int))
      # Get the model back to the master
      model.get()
            
    score = roc_auc_score(test_types_list, test_preds)
    
  print("Epoch {}/{}...  \    AUC: {:.3%}...  \    Training loss: {:.5f}...  \
  Validation loss: {:.5f}".format(e+1, EPOCHS, score, sum(losses)/len(losses), sum(eval_losses)/len(eval_losses)))
    
  model.train()

Epoch 1/15...  \    AUC: 59.536%...  \    Training loss: 0.43978...    Validation loss: 0.38470
Epoch 2/15...  \    AUC: 69.168%...  \    Training loss: 0.38347...    Validation loss: 0.36768
Epoch 3/15...  \    AUC: 77.022%...  \    Training loss: 0.36520...    Validation loss: 0.34497
Epoch 4/15...  \    AUC: 83.198%...  \    Training loss: 0.33711...    Validation loss: 0.30929
Epoch 5/15...  \    AUC: 88.115%...  \    Training loss: 0.30012...    Validation loss: 0.26688
Epoch 6/15...  \    AUC: 92.157%...  \    Training loss: 0.25172...    Validation loss: 0.21880
Epoch 7/15...  \    AUC: 94.885%...  \    Training loss: 0.19680...    Validation loss: 0.17415
Epoch 8/15...  \    AUC: 96.057%...  \    Training loss: 0.15761...    Validation loss: 0.14885
Epoch 9/15...  \    AUC: 96.581%...  \    Training loss: 0.11808...    Validation loss: 0.14219
Epoch 10/15...  \    AUC: 96.774%...  \    Training loss: 0.10079...    Validation loss: 0.13527
Epoch 11/15...  \    AUC: 97.057%...  \