# Section: Federated Learning

# Lesson: Introducing Federated Learning

Federated Learning is a technique for training Deep Learning models on data to which you do not have access. Basically:

Federated Learning: Instead of bringing all the data to one machine and training a model, we bring the model to the data, train it locally, and merely upload "model updates" to a central server.

Use Cases:

    - app company (Texting prediction app)
    - predictive maintenance (automobiles / industrial engines)
    - wearable medical devices
    - ad blockers / autocomplete in browsers (Firefox/Brave)
    
Challenge Description: data is distributed amongst sources but we cannot aggregate it because of:

    - privacy concerns: legal, user discomfort, competitive dynamics
    - engineering: the bandwidth/storage requirements of aggregating the larger dataset
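
The idea in this lesson (bring the model to the data, train locally, upload only model updates) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not PySyft code: the 1-D linear model, the helper names `local_update` and `federated_round`, and the synthetic client datasets are all hypothetical.

```python
import random

def local_update(w, b, data, lr=0.1, steps=20):
    """Train a 1-D linear model y = w*x + b on one client's private data."""
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def federated_round(w, b, client_datasets):
    """Send the current model to every client and average what comes back."""
    updates = [local_update(w, b, data) for data in client_datasets]
    avg_w = sum(u[0] for u in updates) / len(updates)
    avg_b = sum(u[1] for u in updates) / len(updates)
    return avg_w, avg_b

rng = random.Random(0)

def make_client(n=20):
    # Hypothetical private dataset: samples of y = 2x + 1 that never leave the client.
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    return [(x, 2 * x + 1) for x in xs]

clients = [make_client() for _ in range(3)]

w, b = 0.0, 0.0
for _ in range(30):
    w, b = federated_round(w, b, clients)
print(round(w, 2), round(b, 2))
```

The server-side loop only ever handles `(w, b)` pairs; the raw `(x, y)` samples stay inside `local_update`.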

# Lesson: Introducing / Installing PySyft

In order to perform Federated Learning, we need to be able to use Deep Learning techniques on remote machines. This will require a new set of tools. Specifically, we will use an extension of PyTorch called PySyft.

### Install PySyft

The easiest way to install the required libraries is with [Conda](https://docs.conda.io/projects/conda/en/latest/user-guide/overview.html). Create a new environment, then install the dependencies in that environment. In your terminal:

```bash
conda create -n pysyft python=3
conda activate pysyft # some older versions of conda require "source activate pysyft" instead.
conda install jupyter notebook
pip install syft
pip install numpy
```

If you hit any errors relating to zstd, run the following (skip this step if everything above installed fine):

```
pip install --upgrade --force-reinstall zstd
```

and then retry installing syft (pip install syft).

If you are using Windows, I suggest installing [Anaconda and using the Anaconda Prompt](https://docs.anaconda.com/anaconda/user-guide/getting-started/) to work from the command line. 

With this environment activated and in the repo directory, launch Jupyter Notebook:

```bash
jupyter notebook
```

and re-open this notebook on the new Jupyter server.

If any part of this doesn't work for you (or any of the tests fail), first check the [README](https://github.com/OpenMined/PySyft.git) for installation help, then open a GitHub issue or ping the #beginner channel in our Slack: [slack.openmined.org](http://slack.openmined.org/)

In [1]:
import torch as th

In [2]:
x = th.tensor([1,2,3,4,5])
x

tensor([1, 2, 3, 4, 5])

In [3]:
y = x + x

In [4]:
print(y)

tensor([ 2,  4,  6,  8, 10])


In [2]:
import syft as sy



In [5]:
# create a hook that extends PyTorch with PySyft's functionality
# we pass in a reference to the PyTorch library;
# behind the scenes, the hook modifies the PyTorch API in place
hook = sy.TorchHook(th)

In [9]:
th.tensor([1,2,3,4,5])

tensor([1, 2, 3, 4, 5])

# Lesson: Basic Remote Execution in PySyft

## PySyft => Remote PyTorch

The essence of Federated Learning is the ability to train models in parallel on a large number of machines. Thus, we need the ability to tell remote machines to execute the operations required for Deep Learning.

So, instead of using Torch tensors, we're now going to work with **pointers** to tensors. Let me show you what I mean. First, let's create a "pretend" machine owned by a "pretend" person - we'll call him Bob.

In [6]:
# what does the interface to another machine look like? (this is called a worker)

# create a worker: it simulates the interface we might have to another machine

# tensors in PySyft are owned by a worker

bob = sy.VirtualWorker(hook, id="bob")

In [12]:
# a worker holds a collection of objects (tensors and others)

bob._objects

{}

In [13]:
# create a tensor x to send
x = th.tensor([1,2,3,4,5])

In [14]:
# send it to Bob
x = x.send(bob)

In [15]:
# now Bob contains that tensor
bob._objects

{86320504910: tensor([1, 2, 3, 4, 5])}

In [16]:
# after sending x, what we get back is a pointer to the remote object
# a pointer is a kind of tensor and has the full tensor API at its disposal
# instead of executing commands locally like a normal tensor, each command is serialized
# (to JSON or a tuple format) and sent to Bob, who executes it on our behalf and returns a pointer to the new object
x

(Wrapper)>[PointerTensor | me:29637109465 -> bob:86320504910]

In [17]:
# a few attributes of the pointer
# this pointer is pointing to Bob
x.location

<VirtualWorker id:bob #objects:1>

In [18]:
# check whether this location is Bob
x.location == bob

True

In [19]:
# x has an id at location
x.id_at_location

86320504910

In [20]:
# x has an id as well
x.id

29637109465

In [21]:
# the owner is me by default
# this owner was created when we first imported and hooked PySyft into PyTorch
x.owner

<VirtualWorker id:me #objects:0>

In [22]:
# whenever we execute a command on x, the local worker contacts the remote machine (Bob)
# and tells it what to run
hook.local_worker

<VirtualWorker id:me #objects:0>

In [23]:
x

(Wrapper)>[PointerTensor | me:29637109465 -> bob:86320504910]

In [24]:
# get information back from Bob
x = x.get()
x

tensor([1, 2, 3, 4, 5])

In [25]:
bob._objects

{}
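
To make the mechanics of the cells above concrete, here is a toy model of a worker and a pointer in plain Python. It is only a sketch of the idea, not how PySyft is implemented: the `Worker`, `Pointer`, and `send` names below are hypothetical stand-ins, and plain lists play the role of tensors.

```python
import itertools

class Worker:
    """A stand-in for a remote machine: it just stores objects by id."""
    _ids = itertools.count(1)

    def __init__(self, name):
        self.name = name
        self.objects = {}

    def store(self, obj):
        obj_id = next(Worker._ids)
        self.objects[obj_id] = obj
        return obj_id

class Pointer:
    """A local handle that records where the real object lives."""
    def __init__(self, location, id_at_location):
        self.location = location
        self.id_at_location = id_at_location

    def get(self):
        # Fetching removes the object from the remote store, as in the cells above.
        return self.location.objects.pop(self.id_at_location)

def send(obj, worker):
    """Place obj on the worker and return a pointer to it."""
    return Pointer(worker, worker.store(obj))

bob = Worker("bob")
ptr = send([1, 2, 3, 4, 5], bob)
print(bob.objects)         # the data now lives on bob
data = ptr.get()
print(data, bob.objects)   # the data is back with us and bob's store is empty
```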

# Project: Playing with Remote Tensors

In this project, I want you to .send() and .get() a tensor to TWO workers by calling .send(bob,alice). This will first require the creation of another VirtualWorker called alice.

In [7]:
# create Alice virtual worker
alice = sy.VirtualWorker(hook, id="alice")

In [41]:
# create a tensor
y = th.rand(5)
y

tensor([0.3927, 0.0777, 0.9927, 0.5059, 0.3514])

In [42]:
# verify that Bob and Alice hold nothing yet
print(bob._objects)
print(alice._objects)

{}
{}


In [52]:
# send y to Bob and Alice
y_ptr = y.send(bob, alice)

In [44]:
# verify what's in Bob and Alice
print(bob._objects)
print(alice._objects)

{21300190183: tensor([0.3927, 0.0777, 0.9927, 0.5059, 0.3514])}
{21300190183: tensor([0.3927, 0.0777, 0.9927, 0.5059, 0.3514])}


In [46]:
# y_ptr is a multi-pointer: one pointer per worker
y_ptr

(Wrapper)>[MultiPointerTensor]
	-> [PointerTensor | me:82754682940 -> bob:21300190183]
	-> [PointerTensor | me:93377228317 -> alice:21300190183]

In [48]:
y_ptr.child.child

{'bob': [PointerTensor | me:82754682940 -> bob:21300190183],
 'alice': [PointerTensor | me:93377228317 -> alice:21300190183]}

In [50]:
# get information back from Bob and Alice: this results in a list of two tensors
y_ptr.get()

[tensor([0.3927, 0.0777, 0.9927, 0.5059, 0.3514]),
 tensor([0.3927, 0.0777, 0.9927, 0.5059, 0.3514])]

In [53]:
y_ptr.get(sum_results=True)

tensor([0.7854, 0.1554, 1.9855, 1.0119, 0.7028])
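
The difference between the two `.get()` calls above can be mimicked with plain lists. This is a hypothetical sketch (the `get_all` helper and the `copies` dictionary are made up for illustration); it only shows what "a list of results" versus "an element-wise sum of results" means.

```python
def get_all(copies, sum_results=False):
    """Collect one copy per worker; optionally add them element-wise."""
    results = list(copies.values())
    if sum_results:
        return [sum(values) for values in zip(*results)]
    return results

# Hypothetical stores: both workers hold the same data after sending to two workers.
y = [0.5, 0.25, 1.0]
copies = {"bob": list(y), "alice": list(y)}

print(get_all(copies))                     # one result per worker
print(get_all(copies, sum_results=True))   # a single, summed result
```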

# Lesson: Introducing Remote Arithmetic

In [54]:
x = th.tensor([1,2,3,4,5]).send(bob)
y = th.tensor([1,1,1,1,1]).send(bob)

In [55]:
x

(Wrapper)>[PointerTensor | me:25409265332 -> bob:43180661480]

In [56]:
y

(Wrapper)>[PointerTensor | me:49370737240 -> bob:90223188463]

In [57]:
# x and y are like normal tensors
z = x + y

In [58]:
z

(Wrapper)>[PointerTensor | me:82820814238 -> bob:65744497304]

In [59]:
z = z.get()
z

tensor([2, 3, 4, 5, 6])

In [60]:
z = th.add(x,y)
z

(Wrapper)>[PointerTensor | me:22943599373 -> bob:94098637390]

In [61]:
z = z.get()
z

tensor([2, 3, 4, 5, 6])

In [62]:
x = th.tensor([1.,2,3,4,5], requires_grad=True).send(bob)
y = th.tensor([1.,1,1,1,1], requires_grad=True).send(bob)

In [63]:
z = (x + y).sum()

In [64]:
# compute the gradients of z with respect to x and y
z.backward()

(Wrapper)>[PointerTensor | me:6035729596 -> bob:96683259137]

In [65]:
x = x.get()

In [66]:
x

tensor([1., 2., 3., 4., 5.], requires_grad=True)

In [67]:
x.grad

tensor([1., 1., 1., 1., 1.])
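
The gradient of ones is what we should expect: with z = sum(x + y), each element of x contributes to z with coefficient 1. A quick finite-difference check in plain Python (a sketch that does not use PySyft or autograd; `numerical_grad` is a hypothetical helper) agrees:

```python
def z(x, y):
    """z = sum(x + y), the same scalar as in the cells above."""
    return sum(a + b for a, b in zip(x, y))

def numerical_grad(f, x, y, eps=1e-6):
    """Approximate df/dx_i by nudging one element of x at a time."""
    grads = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += eps
        grads.append((f(bumped, y) - f(x, y)) / eps)
    return grads

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 1.0, 1.0, 1.0, 1.0]
grad_x = numerical_grad(z, x, y)
print([round(g, 3) for g in grad_x])
```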

# Project: Learn a Simple Linear Model

In this project, I'd like you to create a simple linear model that fits the dataset below. You should use only Variables and .backward() to do so (no optimizers or nn.Modules). Furthermore, both the data and the model must be located on Bob's machine.

- create a fake dataset: x and y (add noise to y)
- create a linear model
- train model

In [125]:
# create a fake dataset
x = th.randn((100, 1)).send(bob)
epsilon = th.rand((100, 1)).send(bob)
y = 1.5*x - 15.0 + epsilon  # keep shape (100, 1) so it matches the model output in MSELoss

In [128]:
# verify that x and y are pointers
print(x)
print(y)

(Wrapper)>[PointerTensor | me:12443852645 -> bob:86486077599]
(Wrapper)>[PointerTensor | me:61538641793 -> bob:38919652707]


In [121]:
import torch.nn as nn
from torch.optim import SGD

In [129]:
model = nn.Linear(1, 1).send(bob)

In [130]:
criterion = nn.MSELoss()
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)

In [132]:
epochs = 100
for e in range(epochs):
    optimizer.zero_grad()
    output = model(x)
    loss = criterion(output, y)
    loss.backward()
    optimizer.step()
    print("Epoch: {}/100 \t Training loss: {:.4f}".format(e+1, loss.get()))

Epoch: 1/100 	 Training loss: 226.6617
Epoch: 2/100 	 Training loss: 209.2473
Epoch: 3/100 	 Training loss: 186.0948
Epoch: 4/100 	 Training loss: 159.4500
Epoch: 5/100 	 Training loss: 131.4760
Epoch: 6/100 	 Training loss: 104.0860
Epoch: 7/100 	 Training loss: 78.8279
Epoch: 8/100 	 Training loss: 56.8231
Epoch: 9/100 	 Training loss: 38.7534
Epoch: 10/100 	 Training loss: 24.8866
Epoch: 11/100 	 Training loss: 15.1328
Epoch: 12/100 	 Training loss: 9.1198
Epoch: 13/100 	 Training loss: 6.2769
Epoch: 14/100 	 Training loss: 5.9197
Epoch: 15/100 	 Training loss: 7.3269
Epoch: 16/100 	 Training loss: 9.8056
Epoch: 17/100 	 Training loss: 12.7411
Epoch: 18/100 	 Training loss: 15.6295
Epoch: 19/100 	 Training loss: 18.0955
Epoch: 20/100 	 Training loss: 19.8947
Epoch: 21/100 	 Training loss: 20.9053
Epoch: 22/100 	 Training loss: 21.1107
Epoch: 23/100 	 Training loss: 20.5772
Epoch: 24/100 	 Training loss: 19.4290
Epoch: 25/100 	 Training loss: 17.8241
Epoch: 26/100 	 Training loss: 15
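
The project statement asks for a solution without optimizers or nn.Modules, while the cells above use `nn.Linear` and `SGD`. As a rough sketch of what the manual version involves, here is plain gradient descent on the same kind of data, written in pure Python rather than with remote tensors. The learning rate, step count, and seed are arbitrary choices for illustration.

```python
import random

rng = random.Random(0)
# Fake dataset: y = 1.5*x - 15 plus uniform noise in [0, 1).
xs = [rng.gauss(0, 1) for _ in range(100)]
ys = [1.5 * x - 15.0 + rng.random() for x in xs]

w, b, lr = 0.0, 0.0, 0.1
for step in range(200):
    errors = [w * x + b - y for x, y in zip(xs, ys)]
    # Gradients of the mean squared error with respect to w and b.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2 * sum(errors) / len(xs)
    # Manual parameter update: this is the step an optimizer would do for us.
    w -= lr * grad_w
    b -= lr * grad_b

loss = sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
print(f"w={w:.2f} b={b:.2f} loss={loss:.4f}")
```

Because the noise has mean 0.5, the learned intercept should land near -14.5 rather than -15.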

# Lesson: Garbage Collection and Common Errors


In [15]:
bob = bob.clear_objects()

In [16]:
bob._objects

{}

In [17]:
x = th.tensor([1,2,3,4,5]).send(bob)

In [18]:
bob._objects

{91666742428: tensor([1, 2, 3, 4, 5])}

In [19]:
# deleting x also deletes the object on Bob
del x

In [20]:
bob._objects

{}

In [22]:
x = th.tensor([1,2,3,4,5]).send(bob)

In [24]:
# True by default: deleting the pointer also deletes the object it points to on Bob
x.child.garbage_collect_data

True

In [25]:
x = th.tensor([1,2,3,4,5]).send(bob)

In [26]:
bob._objects

{22099308590: tensor([1, 2, 3, 4, 5])}

In [27]:
x = "asdf"

In [28]:
bob._objects

{}

In [29]:
x = th.tensor([1,2,3,4,5]).send(bob)

In [30]:
x

(Wrapper)>[PointerTensor | me:67732117937 -> bob:88168650421]

In [31]:
bob._objects

{88168650421: tensor([1, 2, 3, 4, 5])}

In [32]:
x = "asdf"

In [33]:
# garbage collection can fail to trigger after we have run a few cells:
# here the pointer was displayed by an earlier cell, and Jupyter's output cache still holds a reference to it
bob._objects

{88168650421: tensor([1, 2, 3, 4, 5])}

In [34]:
del x

In [35]:
bob._objects

{88168650421: tensor([1, 2, 3, 4, 5])}

In [36]:
bob = bob.clear_objects()
bob._objects

{}

In [37]:
for i in range(1000):
    x = th.tensor([1,2,3,4,5]).send(bob)

In [38]:
# Bob contains only one object:
# on every iteration x is re-assigned, so the previous pointer (and its remote tensor) is garbage-collected
bob._objects

{53987541927: tensor([1, 2, 3, 4, 5])}

In [39]:
x = th.tensor([1,2,3,4,5]).send(bob)
y = th.tensor([1,1,1,1,1])

In [40]:
z = x + y

TensorsNotCollocatedException: You tried to call a method involving two tensors where one tensor is actually located on another machine (is a PointerTensor). Call .get() on a the PointerTensor or .send({tensor_b.location.id}) on the other tensor.
Tensor A: {tensor_a}
Tensor B: {tensor_b}

In [44]:
x = th.tensor([1,2,3,4,5]).send(bob)
y = th.tensor([1,1,1,1,1]).send(alice)

In [45]:
z = x + y

TensorsNotCollocatedException: You tried to call __add__ involving two tensors which are not on the same machine! One tensor is on {tensor_a.location} while the other is on {tensor_b.location}. Use a combination of .move(), .get(), and/or .send() to co-locate them to the same machine.
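
Both errors come from the same rule: the two operands of an operation must live on the same machine. A minimal sketch of such a check follows; `NotCollocatedError`, `check_collocated`, and the string-based locations are hypothetical names for illustration, not PySyft's API.

```python
class NotCollocatedError(Exception):
    """Raised when two operands do not live on the same worker."""

class Pointer:
    def __init__(self, location):
        self.location = location  # name of the worker holding the data

def check_collocated(a, b):
    """Return the shared location, or raise if the operands are apart."""
    # Anything that is not a pointer is treated as living on the local worker "me".
    loc_a = getattr(a, "location", "me")
    loc_b = getattr(b, "location", "me")
    if loc_a != loc_b:
        raise NotCollocatedError(f"operands live on {loc_a!r} and {loc_b!r}")
    return loc_a

x = Pointer("bob")
y_local = [1, 1, 1, 1, 1]      # never sent anywhere
y_alice = Pointer("alice")     # sent to a different worker

for other in (y_local, y_alice):
    try:
        check_collocated(x, other)
    except NotCollocatedError as err:
        print(err)
```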

# Lesson: Toy Federated Learning

Let's start by training a toy model the centralized way. This is about as simple as models get. We first need:

- a toy dataset
- a model
- some basic training logic for training a model to fit the data.

In [46]:
from torch import nn, optim

In [47]:
# A Toy Dataset
data = th.tensor([[1.,1],[0,1],[1,0],[0,0]], requires_grad=True)
target = th.tensor([[1.],[1], [0], [0]], requires_grad=True)

In [48]:
# A Toy Model
model = nn.Linear(2,1)

In [49]:
opt = optim.SGD(params=model.parameters(), lr=0.1)

In [50]:
def train(iterations=20):
    for iter in range(iterations):
        opt.zero_grad()

        pred = model(data)

        loss = ((pred - target)**2).sum()

        loss.backward()

        opt.step()

        print(loss.data)
        
train()

tensor(2.2232)
tensor(0.3527)
tensor(0.1526)
tensor(0.0979)
tensor(0.0670)
tensor(0.0466)
tensor(0.0326)
tensor(0.0231)
tensor(0.0164)
tensor(0.0118)
tensor(0.0085)
tensor(0.0062)
tensor(0.0045)
tensor(0.0033)
tensor(0.0025)
tensor(0.0018)
tensor(0.0014)
tensor(0.0010)
tensor(0.0008)
tensor(0.0006)


In [51]:
data_bob = data[0:2].send(bob)
target_bob = target[0:2].send(bob)

In [52]:
data_alice = data[2:4].send(alice)
target_alice = target[2:4].send(alice)

In [53]:
datasets = [(data_bob, target_bob), (data_alice, target_alice)]

In [54]:
def train(iterations=20):

    model = nn.Linear(2,1)
    opt = optim.SGD(params=model.parameters(), lr=0.1)
    
    for iter in range(iterations):

        for _data, _target in datasets:

            # send model to the data
            model = model.send(_data.location)

            # do normal training
            opt.zero_grad()
            pred = model(_data)
            loss = ((pred - _target)**2).sum()
            loss.backward()
            opt.step()

            # get smarter model back
            model = model.get()

            print(loss.get())

In [55]:
train()

tensor(2.6568, requires_grad=True)
tensor(1.4959, requires_grad=True)
tensor(0.8012, requires_grad=True)
tensor(0.9099, requires_grad=True)
tensor(0.4418, requires_grad=True)
tensor(0.5266, requires_grad=True)
tensor(0.2557, requires_grad=True)
tensor(0.3036, requires_grad=True)
tensor(0.1485, requires_grad=True)
tensor(0.1749, requires_grad=True)
tensor(0.0864, requires_grad=True)
tensor(0.1007, requires_grad=True)
tensor(0.0503, requires_grad=True)
tensor(0.0580, requires_grad=True)
tensor(0.0293, requires_grad=True)
tensor(0.0334, requires_grad=True)
tensor(0.0171, requires_grad=True)
tensor(0.0192, requires_grad=True)
tensor(0.0100, requires_grad=True)
tensor(0.0110, requires_grad=True)
tensor(0.0058, requires_grad=True)
tensor(0.0063, requires_grad=True)
tensor(0.0034, requires_grad=True)
tensor(0.0036, requires_grad=True)
tensor(0.0020, requires_grad=True)
tensor(0.0021, requires_grad=True)
tensor(0.0012, requires_grad=True)
tensor(0.0012, requires_grad=True)
tensor(0.0007, requi
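
For readers without a PySyft install, the same "model visits each worker in turn" loop can be imitated in plain Python. This is a sketch under stated assumptions: the dictionary-based model, the `train_on_worker` helper, the zero initialisation, and the iteration count are illustrative choices, and nothing here is actually sent over a network.

```python
# Each worker's private slice of the toy dataset: (features, target).
datasets = {
    "bob":   [([1.0, 1.0], 1.0), ([0.0, 1.0], 1.0)],
    "alice": [([1.0, 0.0], 0.0), ([0.0, 0.0], 0.0)],
}

def predict(model, x):
    return model["w"][0] * x[0] + model["w"][1] * x[1] + model["b"]

def train_on_worker(model, data, lr=0.1):
    """One gradient step on a worker's data, using the summed squared error."""
    errors = [predict(model, x) - t for x, t in data]
    loss = sum(e * e for e in errors)
    grad_w = [2 * sum(e * x[i] for e, (x, _) in zip(errors, data)) for i in range(2)]
    grad_b = 2 * sum(errors)
    model["w"] = [w - lr * g for w, g in zip(model["w"], grad_w)]
    model["b"] -= lr * grad_b
    return loss

model = {"w": [0.0, 0.0], "b": 0.0}
losses = []
for _ in range(50):
    for worker, data in datasets.items():   # the model "visits" each worker in turn
        losses.append(train_on_worker(model, data))

print(losses[0], losses[-1])
```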

# Lesson: Advanced Remote Execution Tools

In the last section we trained a toy model using Federated Learning. We did this by calling .send() and .get() on our model: sending it to the location of the training data, updating it, and then bringing it back. However, at the end of the example we realized that we needed to go a bit further to protect people's privacy. Namely, we want to average the gradients BEFORE calling .get(). That way, we never see anyone's exact gradient, which better protects their privacy.

But, in order to do this, we need a few more pieces:

- use a pointer to send a Tensor directly to another worker

While we're here, we're also going to learn a few more advanced tensor operations, which will help us with this example and with a few later ones!
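
Before looking at the PySyft tools, here is the goal in miniature: gradients are combined on a worker, and only the average travels back to us. This is a plain-Python sketch with hypothetical helpers (`move`, `average_on`) and made-up gradient values. Note that in this simple scheme the worker doing the averaging still sees the other worker's gradient; only the central party is shielded.

```python
class Worker:
    def __init__(self, name):
        self.name = name
        self.objects = {}

def move(obj_id, source, destination):
    """Transfer an object between workers without routing it through us."""
    destination.objects[obj_id] = source.objects.pop(obj_id)

def average_on(worker, ids):
    """Ask a worker to average objects it holds and keep only the result."""
    vectors = [worker.objects.pop(i) for i in ids]
    worker.objects["avg"] = [sum(vals) / len(vals) for vals in zip(*vectors)]
    return "avg"

bob, alice = Worker("bob"), Worker("alice")
# Hypothetical per-worker gradients computed on private data.
bob.objects["grad_bob"] = [0.5, -1.0]
alice.objects["grad_alice"] = [1.5, 3.0]

move("grad_bob", bob, alice)                        # bob -> alice, never via us
avg_id = average_on(alice, ["grad_bob", "grad_alice"])
averaged = alice.objects.pop(avg_id)                # the only value we retrieve
print(averaged)
```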

In [56]:
bob.clear_objects()
alice.clear_objects()

<VirtualWorker id:alice #objects:0>

In [57]:
x = th.tensor([1,2,3,4,5]).send(bob)

In [58]:
x

(Wrapper)>[PointerTensor | me:35468219460 -> bob:2707117597]

In [59]:
x = x.send(alice)

In [60]:
bob._objects

{2707117597: tensor([1, 2, 3, 4, 5])}

In [61]:
alice._objects

{35468219460: (Wrapper)>[PointerTensor | alice:35468219460 -> bob:2707117597]}

In [62]:
x

(Wrapper)>[PointerTensor | me:91219527131 -> alice:35468219460]

In [63]:
y = x + x

In [64]:
y

(Wrapper)>[PointerTensor | me:95299019691 -> alice:26753865732]

In [65]:
bob._objects

{2707117597: tensor([1, 2, 3, 4, 5]),
 21016841917: tensor([ 2,  4,  6,  8, 10])}

In [66]:
alice._objects

{35468219460: (Wrapper)>[PointerTensor | alice:35468219460 -> bob:2707117597],
 26753865732: (Wrapper)>[PointerTensor | alice:26753865732 -> bob:21016841917]}

In [67]:
jon = sy.VirtualWorker(hook, id="jon")

In [68]:
x = th.tensor([1,2,3,4,5]).send(bob).send(alice)
y = th.tensor([1,2,3,4,5]).send(bob).send(jon)

z = x + y

TensorsNotCollocatedException: You tried to call __add__ involving two tensors which are not on the same machine! One tensor is on {tensor_a.location} while the other is on {tensor_b.location}. Use a combination of .move(), .get(), and/or .send() to co-locate them to the same machine.

In [69]:
bob.clear_objects()
alice.clear_objects()

x = th.tensor([1,2,3,4,5]).send(bob).send(alice)

In [70]:
bob._objects

{41072031909: tensor([1, 2, 3, 4, 5])}

In [71]:
alice._objects

{16584679333: (Wrapper)>[PointerTensor | alice:16584679333 -> bob:41072031909]}

In [72]:
# get the pointer back from Alice: x now points directly to Bob
x = x.get()
x

(Wrapper)>[PointerTensor | me:16584679333 -> bob:41072031909]

In [73]:
bob._objects

{41072031909: tensor([1, 2, 3, 4, 5])}

In [74]:
alice._objects

{}

In [75]:
x = x.get()
x

tensor([1, 2, 3, 4, 5])

In [76]:
bob._objects

{}

In [77]:
bob.clear_objects()
alice.clear_objects()

x = th.tensor([1,2,3,4,5]).send(bob).send(alice)

In [78]:
bob._objects

{72645605028: tensor([1, 2, 3, 4, 5])}

In [79]:
alice._objects

{84774712008: (Wrapper)>[PointerTensor | alice:84774712008 -> bob:72645605028]}

In [80]:
del x

In [81]:
bob._objects

{}

In [82]:
alice._objects

{}

# Lesson: Pointer Chain Operations

In [8]:
bob.clear_objects()
alice.clear_objects()

<VirtualWorker id:alice #objects:0>

In [9]:
x = th.tensor([1,2,3,4,5]).send(bob)

In [10]:
bob._objects

{22292189940: tensor([1, 2, 3, 4, 5])}

In [11]:
alice._objects

{}

In [12]:
# move function: send the pointer down to the machine we want to move the data to
# and call remote_get()
x.move(alice)

(Wrapper)>[PointerTensor | me:5727224057 -> alice:22292189940]

In [13]:
bob._objects

{}

In [14]:
alice._objects

{22292189940: tensor([1, 2, 3, 4, 5])}

In [15]:
x = th.tensor([1,2,3,4,5]).send(bob).send(alice)

In [16]:
bob._objects

{23861656330: tensor([1, 2, 3, 4, 5])}

In [17]:
alice._objects

{22292189940: tensor([1, 2, 3, 4, 5]),
 41626396504: (Wrapper)>[PointerTensor | alice:41626396504 -> bob:23861656330]}

In [18]:
x.remote_get()

(Wrapper)>[PointerTensor | me:49658759749 -> alice:41626396504]

In [19]:
bob._objects

{}

In [20]:
alice._objects

{22292189940: tensor([1, 2, 3, 4, 5]), 41626396504: tensor([1, 2, 3, 4, 5])}

In [21]:
x.move(bob)

(Wrapper)>[PointerTensor | me:49658759749 -> bob:41626396504]

In [22]:
x

(Wrapper)>[PointerTensor | me:49658759749 -> bob:41626396504]

In [23]:
bob._objects

{41626396504: tensor([1, 2, 3, 4, 5])}

In [24]:
alice._objects # the tensor from the first move (id 22292189940) is still stored: a garbage collection issue

{22292189940: tensor([1, 2, 3, 4, 5])}
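
The operations in this lesson (a pointer to a pointer, `remote_get`, and `move`) can be imitated with a few lines of plain Python. The classes below are hypothetical stand-ins written to mirror the behaviour seen in the cells, under the assumption stated in the comment above that `move` amounts to sending a pointer to the destination and calling `remote_get`. They are not PySyft's implementation.

```python
class Worker:
    def __init__(self, name):
        self.name = name
        self.objects = {}

class Pointer:
    def __init__(self, location, obj_id):
        self.location = location
        self.obj_id = obj_id

    def get(self):
        """Pull the pointed-to object one level closer to the caller."""
        return self.location.objects.pop(self.obj_id)

    def remote_get(self):
        """Tell our location to call get() on the pointer it is holding."""
        inner = self.location.objects[self.obj_id]
        self.location.objects[self.obj_id] = inner.get()
        return self

    def move(self, destination):
        """Relocate the data: place a pointer on the destination, then remote_get."""
        destination.objects[self.obj_id] = Pointer(self.location, self.obj_id)
        moved = Pointer(destination, self.obj_id)
        return moved.remote_get()

bob, alice = Worker("bob"), Worker("alice")
bob.objects[1] = [1, 2, 3, 4, 5]
x = Pointer(bob, 1)

x = x.move(alice)                       # the data leaves bob and lands on alice
print(bob.objects, alice.objects)

# A two-level chain: we -> alice -> bob.
bob.objects[2] = [9, 9]
alice.objects[3] = Pointer(bob, 2)
chain = Pointer(alice, 3)
inner = chain.get()                     # one get() returns the pointer alice was holding
print(inner.location.name)
```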

# Section Project:

For the final project for this section, you're going to train on the MNIST dataset using federated learning. However, the gradients should not come up to the central server in raw form.

In [1]:
import torch
from torch import nn, optim
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader, Subset
from torch.utils.data.sampler import SubsetRandomSampler
from sklearn.model_selection import KFold

import numpy as np
import matplotlib.pyplot as plt

In [2]:
transform = transforms.Compose([transforms.ToTensor(),
                               transforms.Normalize((0.5,), (0.5,))])
mnist_trainset = datasets.MNIST(root='./data', train=True, download=True, transform=transform)

In [3]:
print("Number of examples in training data: " + str(len(mnist_trainset)))

Number of examples in training data: 60000


In [4]:
import syft as sy
hook = sy.TorchHook(torch)



In [5]:
bob = sy.VirtualWorker(hook, id="bob")
alice = sy.VirtualWorker(hook, id="alice")

- divide the dataset into 2 parts
- train on Bob's machine first
- train on Alice's machine
- move the model from Bob to Alice and sum the parameters
- get the result back to the local worker

In [6]:
# define network architecture

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(784, 256)
        self.fc2 = nn.Linear(256, 128)
        self.fc3 = nn.Linear(128, 64)
        self.fc4 = nn.Linear(64, 10)
    
    def forward(self, x):
        # flatten input tensor
        x = x.view(x.shape[0], -1)
        
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = F.log_softmax(self.fc4(x), dim=1)
        
        return x

In [7]:
# define some hyperparameters
batch_size = 64
epochs = 50
print_every = 5

In [8]:
# split training data into 2 parts: Bob and Alice
kf = KFold(n_splits=2)

for bob_index, alice_index in kf.split(mnist_trainset):
    bob_subset = Subset(mnist_trainset, bob_index)
    alice_subset = Subset(mnist_trainset, alice_index)
    break

print("Length of Bob's subset: {}".format(len(bob_subset)))
print("Length of Alice's subset: {}".format(len(alice_subset)))

Length of Bob's subset: 30000
Length of Alice's subset: 30000


In [9]:
# create dataloader
bob_loader = DataLoader(bob_subset, batch_size=batch_size)
alice_loader = DataLoader(alice_subset, batch_size=batch_size)

In [10]:
# initialize the models
bob_model = Network()
alice_model = Network()

In [11]:
# training on remote machine
def train(model, dataloader, worker):
    model = model.send(worker)
    model.train()
    criterion = nn.NLLLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.003)
    for e in range(epochs):
        train_loss = 0
        for data, label in dataloader:
            # send data and labels to worker machine
            data = data.send(worker)
            label = label.send(worker)

            optimizer.zero_grad()

            output = model(data)
            loss = criterion(output, label)
            loss.backward()
            optimizer.step()
            train_loss += loss

        if (e+1) % print_every==0:
            print("Epoch: {}/{}\tTraining loss: {:.4f}".format(e+1, epochs, train_loss.get()))
    return model

In [12]:
# train on Bob machine
bob_model = train(bob_model, bob_loader, bob)

Epoch: 5/50	Training loss: 66.4311
Epoch: 10/50	Training loss: 42.9918
Epoch: 15/50	Training loss: 38.3678
Epoch: 20/50	Training loss: 28.2546
Epoch: 25/50	Training loss: 27.2353
Epoch: 30/50	Training loss: 24.8982
Epoch: 35/50	Training loss: 18.2560
Epoch: 40/50	Training loss: 25.7563
Epoch: 45/50	Training loss: 12.6226
Epoch: 50/50	Training loss: 16.8756


In [13]:
alice_model = train(alice_model, alice_loader, alice)

Epoch: 5/50	Training loss: 60.8719
Epoch: 10/50	Training loss: 38.9868
Epoch: 15/50	Training loss: 28.8466
Epoch: 20/50	Training loss: 26.7248
Epoch: 25/50	Training loss: 22.8317
Epoch: 30/50	Training loss: 22.3633
Epoch: 35/50	Training loss: 21.8604
Epoch: 40/50	Training loss: 20.3788
Epoch: 45/50	Training loss: 13.5609
Epoch: 50/50	Training loss: 17.9747


In [14]:
print(bob_model.fc1.weight)
print(bob_model.fc1.bias)
print(bob_model.fc2.weight)
print(bob_model.fc2.bias)
print(bob_model.fc3.weight)
print(bob_model.fc3.bias)
print(bob_model.fc4.weight)
print(bob_model.fc4.bias)

(Wrapper)>[PointerTensor | me:15484316397 -> bob:45421884765]
(Wrapper)>[PointerTensor | me:43796411415 -> bob:76301588481]
(Wrapper)>[PointerTensor | me:19952644714 -> bob:91446991860]
(Wrapper)>[PointerTensor | me:98086699140 -> bob:80122908263]
(Wrapper)>[PointerTensor | me:38518621147 -> bob:10067488261]
(Wrapper)>[PointerTensor | me:6569421856 -> bob:43653066101]
(Wrapper)>[PointerTensor | me:9410400812 -> bob:90017736205]
(Wrapper)>[PointerTensor | me:47822334926 -> bob:79764111795]


In [15]:
print(alice_model.fc1.weight)
print(alice_model.fc1.bias)
print(alice_model.fc2.weight)
print(alice_model.fc2.bias)
print(alice_model.fc3.weight)
print(alice_model.fc3.bias)
print(alice_model.fc4.weight)
print(alice_model.fc4.bias)

(Wrapper)>[PointerTensor | me:46938450556 -> alice:16491595925]
(Wrapper)>[PointerTensor | me:71366605710 -> alice:32304953521]
(Wrapper)>[PointerTensor | me:94862650919 -> alice:4882748875]
(Wrapper)>[PointerTensor | me:47592026295 -> alice:81107653018]
(Wrapper)>[PointerTensor | me:44112525496 -> alice:23037489030]
(Wrapper)>[PointerTensor | me:2321855208 -> alice:39825305127]
(Wrapper)>[PointerTensor | me:3318904528 -> alice:2924485064]
(Wrapper)>[PointerTensor | me:15859973993 -> alice:29656787133]


In [16]:
model = Network()

In [17]:
model_state_dict = model.state_dict()
bob_model_state_dict = bob_model.state_dict()
alice_model_state_dict = alice_model.state_dict()

In [18]:
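# combine the two models: move a copy of Bob's parameter to Alice, add it to Alice's there,
# and only then .get() the result (note: this sums the parameters; it does not average them)
for parameter in model.state_dict().keys():
    model_state_dict[parameter] = ((bob_model_state_dict[parameter].copy().move(alice) + alice_model_state_dict[parameter])).get()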

model.load_state_dict(model_state_dict)

<All keys matched successfully>

In [19]:
mnist_testset = datasets.MNIST(root='./data', train=False, download=True, transform=transform)

In [20]:
test_loader = DataLoader(mnist_testset, batch_size=batch_size)

In [21]:
def test(model, dataloader):
    model.eval()
    with torch.no_grad():
        accuracy = 0
        for data, label in dataloader:

            output = model(data)
            # calculate accuracy
            ps = torch.exp(output)
            top_p, top_class = ps.topk(1, dim=1)
            equals = top_class == label.view(*top_class.shape)
            accuracy += torch.mean(equals.type(torch.FloatTensor)).item()

        print("Accuracy: {:.4f}".format(accuracy/len(dataloader)))

In [22]:
test(model, test_loader)

Accuracy: 0.3812


In [24]:
test(bob_model.get(), test_loader)

Accuracy: 0.9644


In [25]:
test(alice_model.get(), test_loader)

Accuracy: 0.9677
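
The merge step in cell 18 adds Bob's and Alice's parameters together. A common alternative is to divide the sum by the number of workers, so that the combined model holds the average of the parameters. The sketch below shows both variants on tiny hypothetical state dicts made of plain lists; `combine_state_dicts` is an illustrative helper, and this sketch does not demonstrate what accuracy either variant would reach on MNIST.

```python
def combine_state_dicts(state_dicts, average=True):
    """Combine parameters key by key; each value is a flat list of floats."""
    scale = len(state_dicts) if average else 1
    combined = {}
    for key in state_dicts[0]:
        columns = zip(*(sd[key] for sd in state_dicts))
        combined[key] = [sum(col) / scale for col in columns]
    return combined

# Hypothetical, tiny stand-ins for the two trained models' state dicts.
bob_params = {"fc.weight": [1.0, 2.0], "fc.bias": [0.5]}
alice_params = {"fc.weight": [3.0, 4.0], "fc.bias": [1.5]}

summed = combine_state_dicts([bob_params, alice_params], average=False)
averaged = combine_state_dicts([bob_params, alice_params])
print(summed)
print(averaged)
```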
