# Federated Learning Demo

This is a brief demo of federated learning with PySyft. Check out [the main tutorials](https://github.com/OpenMined/PySyft/tree/dev/examples/tutorials) to learn more.


In [1]:
import torch
import syft as sy
hook = sy.TorchHook(torch)

Now create some tensors.

In [2]:
x = torch.tensor([1,2,3,4,5])
y = torch.tensor([1,1,1,1,1])

Create a worker and send the tensors.

In [3]:
mat = sy.VirtualWorker(hook, id="mat")

In [4]:
x_ptr = x.send(mat)
y_ptr = y.send(mat)

This creates PointerTensors on our client that point to the tensors on the worker.

In [5]:
print(x_ptr)
mat._objects

(Wrapper)>[PointerTensor - 3507829112@mat]


{3507829112: tensor([1, 2, 3, 4, 5]), 9150654653: tensor([1, 1, 1, 1, 1])}

The tensors are now on our worker. Operations we call on the PointerTensors from our machine are executed on the worker, and the result stays there: we get back another pointer.

In [6]:
z = x_ptr + x_ptr
print(z)
mat._objects

(Wrapper)>[PointerTensor - 29706455715@mat]


{3507829112: tensor([1, 2, 3, 4, 5]),
 9150654653: tensor([1, 1, 1, 1, 1]),
 29706455715: tensor([ 2,  4,  6,  8, 10])}

We can get tensors back from the worker using `.get()`. Note that the retrieved tensor is then removed from the worker's `_objects`.

In [7]:
z = z.get()
print(z)
mat._objects

tensor([ 2,  4,  6,  8, 10])


{3507829112: tensor([1, 2, 3, 4, 5]), 9150654653: tensor([1, 1, 1, 1, 1])}
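The send / operate / get cycle above can be imitated in a few lines of plain Python. This is only a toy model of the idea, not how PySyft implements it: `ToyWorker` and `ToyPointer` are hypothetical names, lists stand in for tensors, and IDs are simple counters rather than random integers.

```python
import itertools


class ToyWorker:
    """Hypothetical stand-in for a worker: it owns an object store."""

    def __init__(self, name):
        self.name = name
        self.objects = {}               # plays the role of `_objects`
        self._ids = itertools.count(1)  # simple sequential IDs

    def store(self, value):
        """Keep `value` on the worker and hand back a pointer to it."""
        obj_id = next(self._ids)
        self.objects[obj_id] = value
        return ToyPointer(self, obj_id)


class ToyPointer:
    """Hypothetical stand-in for a pointer: it holds an ID, not the data."""

    def __init__(self, worker, obj_id):
        self.worker = worker
        self.obj_id = obj_id

    def __add__(self, other):
        if other.worker is not self.worker:
            raise ValueError("both operands must live on the same worker")
        a = self.worker.objects[self.obj_id]
        b = other.worker.objects[other.obj_id]
        # The sum is computed and stored on the worker; we return a pointer.
        return self.worker.store([p + q for p, q in zip(a, b)])

    def get(self):
        """Fetch the value and remove it from the worker's store."""
        return self.worker.objects.pop(self.obj_id)

    def __repr__(self):
        return f"ToyPointer({self.obj_id}@{self.worker.name})"


toy_mat = ToyWorker("mat")
x_toy = toy_mat.store([1, 2, 3, 4, 5])
y_toy = toy_mat.store([1, 1, 1, 1, 1])

z_toy = x_toy + x_toy                    # runs "on the worker"
stored_after_add = len(toy_mat.objects)  # x, y and the new sum
z_value = z_toy.get()                    # the sum leaves the worker
print(z_value, toy_mat.objects)
```

As in the real output, the store holds three objects after the addition and two again after `get()`.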

You can do **backpropagation** on the worker as well!

In [8]:
x = torch.tensor([1,2,3,4,5.], requires_grad=True).send(mat)
y = torch.tensor([1,1,1,1,1.], requires_grad=True).send(mat)

In [9]:
z = (x + y).sum()
z.backward()

In [10]:
x = x.get()
print(x)
print(x.grad)

tensor([1., 2., 3., 4., 5.], requires_grad=True)
tensor([1., 1., 1., 1., 1.])
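The gradient is all ones because `z` is the sum of every element of `x + y`, so each `x[i]` contributes to `z` with slope 1. As a quick sanity check that does not rely on PySyft or autograd, here is a finite-difference estimate in plain Python (the helper names are made up for this illustration):

```python
def z_fn(x, y):
    """Same scalar as (x + y).sum(), written for plain lists."""
    return sum(a + b for a, b in zip(x, y))


def numerical_grad(f, x, y, eps=1e-6):
    """Estimate df/dx[i] by nudging one element of x at a time."""
    base = f(x, y)
    grads = []
    for i in range(len(x)):
        bumped = list(x)
        bumped[i] += eps
        grads.append((f(bumped, y) - base) / eps)
    return grads


x_vals = [1.0, 2.0, 3.0, 4.0, 5.0]
y_vals = [1.0, 1.0, 1.0, 1.0, 1.0]
grad_x = numerical_grad(z_fn, x_vals, y_vals)
print([round(g, 3) for g in grad_x])
```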


## Training models remotely

Since we can do backprop on the worker, we should be able to train a model on the worker as well.

In [11]:
from torch import nn
from syft import optim

In [12]:
# Create a new worker
alice = sy.VirtualWorker(hook, id="alice")

In [13]:
# A Toy Dataset
data = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1.]])
target = torch.tensor([[0], [0], [1], [1.]])

# Initialize A Toy Model
model = nn.Linear(2, 1)

# send data to alice
data_alice = data.send(alice)
target_alice = target.send(alice)

# send model to alice
model.send(data_alice.location)

Linear(in_features=2, out_features=1, bias=True)

In [14]:
print(alice._objects)

{41907534508: tensor([[0., 0.],
        [0., 1.],
        [1., 0.],
        [1., 1.]]), 42445027957: tensor([[0.],
        [0.],
        [1.],
        [1.]]), 98559516141: Parameter containing:
tensor([[-0.3651, -0.1132]], requires_grad=True), 52943628406: Parameter containing:
tensor([-0.1512], requires_grad=True)}


In [15]:
def train(data, target, model):
    # Training Logic
    opt = optim.SGD(params=model.parameters(),lr=0.1)
    for step in range(10):
        # 1) erase previous gradients (if they exist)
        opt.zero_grad()

        # 2) make a prediction
        pred = model(data)

        # 3) calculate how much we missed
        loss = ((pred - target)**2).sum()

        # 4) figure out which weights caused us to miss
        loss.backward()

        # 5) change those weights
        opt.step(data.shape[0])

        # 6) print our progress
        print(loss.get())

In [16]:
train(data_alice, target_alice, model)
# Get the model back after training
model.get()

tensor(5.0467, requires_grad=True)
tensor(2.8338, requires_grad=True)
tensor(1.7639, requires_grad=True)
tensor(1.2283, requires_grad=True)
tensor(0.9444, requires_grad=True)
tensor(0.7804, requires_grad=True)
tensor(0.6751, requires_grad=True)
tensor(0.5995, requires_grad=True)
tensor(0.5401, requires_grad=True)
tensor(0.4903, requires_grad=True)


Linear(in_features=2, out_features=1, bias=True)
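The `train` function passes `data.shape[0]` to `opt.step`, which the text does not explain. The sketch below replays the same loop in plain Python under one assumption, inferred from the printed losses rather than from PySyft's source: the step divides the summed gradient by that batch size before a standard SGD update. The starting weights are the rounded values printed from `alice._objects`, so the losses match the output above only approximately.

```python
# Toy dataset from the notebook, as plain lists.
data_rows = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
targets = [0.0, 0.0, 1.0, 1.0]

# Initial parameters as printed above (rounded to four decimals).
w = [-0.3651, -0.1132]
b = -0.1512
lr = 0.1
n = len(data_rows)  # the value passed to opt.step in the notebook

losses = []
for step in range(10):
    # Forward pass and squared-error loss summed over the batch.
    preds = [w[0] * r[0] + w[1] * r[1] + b for r in data_rows]
    errors = [p - t for p, t in zip(preds, targets)]
    losses.append(sum(e * e for e in errors))

    # Gradients of the summed loss.
    grad_w = [2 * sum(e * r[j] for e, r in zip(errors, data_rows))
              for j in range(2)]
    grad_b = 2 * sum(errors)

    # ASSUMPTION: the gradient is averaged over the batch before the update.
    w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    b = b - lr * grad_b / n

print([round(l, 4) for l in losses])
```

The first two values come out near 5.05 and 2.83, in line with the notebook output, which is what motivates the assumption about the batch-size argument.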

## Model Update Averaging

While this example is a nice introduction to federated learning, it still has a major shortcoming. When we call `model.get()` and receive the updated model from Alice, we can learn a lot about Alice's training data by comparing the weights before and after training, since the difference is determined by her gradients. In some cases, we can reconstruct her training data perfectly!
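To make the leak concrete, here is a minimal plain-Python sketch of the simplest case: a linear model with a bias, one SGD step, and a single training sample. For squared error the weight gradient is `2 * err * x` and the bias gradient is `2 * err`, so dividing the weight change by the bias change gives back `x` exactly. The function name and the "secret" sample are invented for this illustration, and real attacks on larger batches or deeper models are harder than this.

```python
def sgd_step(w, b, x, y, lr):
    """One SGD step on squared error for a single (x, y) sample."""
    pred = sum(wi * xi for wi, xi in zip(w, x)) + b
    err = pred - y
    grad_w = [2 * err * xi for xi in x]
    grad_b = 2 * err
    new_w = [wi - lr * g for wi, g in zip(w, grad_w)]
    new_b = b - lr * grad_b
    return new_w, new_b


# What the server knows: the model it sent out and the model it got back.
w_before, b_before = [-0.3651, -0.1132], -0.1512

# What only the worker knows (hypothetical private sample).
secret_x, secret_y = [1.0, 0.0], 1.0
w_after, b_after = sgd_step(w_before, b_before, secret_x, secret_y, lr=0.1)

# The server's reconstruction, using only the before/after parameters.
delta_w = [after - before for after, before in zip(w_after, w_before)]
delta_b = b_after - b_before
recovered_x = [dw / delta_b for dw in delta_w]
print(recovered_x)
```

The reconstruction fails only if the bias did not change, that is, if the prediction was already exact.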

So, what is there to do? Well, the first strategy people employ is to **average the gradient across multiple individuals before uploading it to the central server**. This strategy, however, will require some more sophisticated use of PointerTensor objects.

I'll create a new worker, Bob, and use Mat as a trusted third party where I'll do the averaging.

In [17]:
# Create a new worker
bob = sy.VirtualWorker(hook, id="bob")

First we need to tell our workers about each other.

In [18]:
bob.add_workers([alice, mat])
alice.add_workers([bob, mat])
mat.add_workers([alice, bob])

Now we can split up our data and send it to the workers.

In [19]:
# get pointers to training data on each worker by
# sending some training data to bob and alice
bobs_data = data[0:2].send(bob)
bobs_target = target[0:2].send(bob)

alices_data = data[2:].send(alice)
alices_target = target[2:].send(alice)

Create the model and send it to our two workers.

In [20]:
# Initialize A Toy Model
model = nn.Linear(2,1)

bobs_model = model.copy().send(bob)
alices_model = model.copy().send(alice)

bobs_opt = optim.SGD(params=bobs_model.parameters(), lr=0.1)
alices_opt = optim.SGD(params=alices_model.parameters(), lr=0.1)

Train the models on the two workers. Each loop iteration takes one training step on Bob's model and then one on Alice's.

In [21]:
for i in range(10):
    # Train Bob's Model
    bobs_opt.zero_grad()
    bobs_pred = bobs_model(bobs_data)
    bobs_loss = ((bobs_pred - bobs_target)**2).sum()
    bobs_loss.backward()

    bobs_opt.step(bobs_data.shape[0])
    bobs_loss = bobs_loss.get().data

    # Train Alice's Model
    alices_opt.zero_grad()
    alices_pred = alices_model(alices_data)
    alices_loss = ((alices_pred - alices_target)**2).sum()
    alices_loss.backward()

    alices_opt.step(alices_data.shape[0])
    alices_loss = alices_loss.get().data
    
    print("Bob: " + str(bobs_loss) + "  Alice: " + str(alices_loss))

Bob: tensor(0.1923)  Alice: tensor(1.2013)
Bob: tensor(0.1084)  Alice: tensor(0.3670)
Bob: tensor(0.0624)  Alice: tensor(0.1192)
Bob: tensor(0.0371)  Alice: tensor(0.0450)
Bob: tensor(0.0231)  Alice: tensor(0.0222)
Bob: tensor(0.0152)  Alice: tensor(0.0147)
Bob: tensor(0.0107)  Alice: tensor(0.0118)
Bob: tensor(0.0081)  Alice: tensor(0.0103)
Bob: tensor(0.0065)  Alice: tensor(0.0093)
Bob: tensor(0.0055)  Alice: tensor(0.0084)


Now we'll move the models to our trusted worker and average the parameters.

In [22]:
alices_model.move(mat)
bobs_model.move(mat)

avg_weights = ((alices_model.weight.data + bobs_model.weight.data) / 2)
avg_bias = ((alices_model.bias.data + bobs_model.bias.data) / 2)

Finally, move those averaged weights back to our centralized model.

In [23]:
model.weight.data.set_(avg_weights.get())
model.bias.data.set_(avg_bias.get())

tensor([0.3639])
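The averaging step itself is simple arithmetic. Below is a small plain-Python sketch of it, using flat lists in place of weight and bias tensors. The function name, the parameter values, and the optional weighting by sample count are assumptions added for illustration; the notebook uses a plain mean, which is the same thing when every worker holds the same number of samples, as Bob and Alice do here.

```python
def average_params(param_sets, sample_counts=None):
    """Average several workers' parameter lists, element by element.

    With `sample_counts` given, each worker is weighted by how many
    samples it trained on; otherwise all workers count equally.
    """
    if sample_counts is None:
        sample_counts = [1] * len(param_sets)
    total = sum(sample_counts)
    n_params = len(param_sets[0])
    return [
        sum(count * params[i]
            for params, count in zip(param_sets, sample_counts)) / total
        for i in range(n_params)
    ]


# Hypothetical parameters (two weights and a bias) from each worker.
bob_params = [0.2, 0.4, 0.1]
alice_params = [0.6, 0.0, 0.5]

avg_params = average_params([bob_params, alice_params])
weighted = average_params([[0.0], [1.0]], sample_counts=[1, 3])
print(avg_params, weighted)
```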

In a non-demo setting you would repeat this whole train-and-average cycle many times. Also, it is generally preferred to average the gradients rather than the model parameters. Again, please visit [the main tutorials](https://github.com/OpenMined/PySyft/tree/dev/examples/tutorials) to learn more about federated learning with PySyft.