# Section: Securing Federated Learning

- Lesson 1: Trusted Aggregator
- Lesson 2: Intro to Additive Secret Sharing
- Lesson 3: Intro to Fixed Precision Encoding
- Lesson 4: Secret Sharing + Fixed Precision in PySyft
- Final Project: Federated Learning with Encrypted Gradient Aggregation

# Lesson: Federated Learning with a Trusted Aggregator

In the last section, we learned how to train a model on a distributed dataset using Federated Learning. In particular, the last project aggregated gradients directly from one data owner to another. 

While that can be acceptable in some cases, it would be even better to choose a neutral third party to perform the aggregation.

As it turns out, we can use the same tools we used previously to accomplish this.
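
Before the PySyft version, here is a minimal sketch of what the aggregator's job amounts to, assuming each "model" is just a list of parameters. The helper `average_parameters` and the parameter values are hypothetical, and no PySyft calls are involved.

```python
def average_parameters(models):
    """Element-wise mean of several equally sized parameter lists."""
    n = len(models)
    return [sum(params) / n for params in zip(*models)]

# Hypothetical parameters after local training on each data owner.
user1_params = [0.5, 1.0, -0.25]
user2_params = [1.5, 0.0, 0.25]

# The neutral third party receives both parameter sets and hands back
# only their average, which becomes the next global model.
global_params = average_parameters([user1_params, user2_params])
print(global_params)  # [1.0, 0.5, 0.0]
```

The data owners never exchange parameters with each other in this arrangement; they only have to trust the aggregator.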

# Project: Federated Learning with a Trusted Aggregator

In [1]:
# try this project here!

In [2]:
import torch as th
from torch import nn, optim
import syft as sy

hook = sy.TorchHook(th)



In [3]:
user1 = sy.VirtualWorker(hook, id='user1')
user2 = sy.VirtualWorker(hook, id='user2')
concentrator = sy.VirtualWorker(hook, id='concentrator')

user1.add_workers([user2, concentrator])
user2.add_workers([user1, concentrator])
concentrator.add_workers([user1, user2])

print(user1._objects)
print(user2._objects)
print(concentrator._objects)



{}
{}
{}


In [4]:
# A Toy Dataset
data = th.tensor([[1.,1],[0,1],[1,0],[0,0]], requires_grad=True)
target = th.tensor([[1.],[1], [0], [0]], requires_grad=True)

# Distribute the datasets
data1 = data[:2].send(user1)
target1 = target[:2].send(user1)

data2 = data[2:].send(user2)
target2 = target[2:].send(user2)

datasets = [(data1, target1, user1), (data2, target2, user2)]

In [5]:
def train(iterations=20, steps_per_round=10):

    model = nn.Linear(2,1)
    
    for round_iter in range(iterations):
        # Distribute the models
        user1_model = model.copy().send(user1)
        user1_opt = optim.SGD(params=user1_model.parameters(), lr=0.1)
        user2_model = model.copy().send(user2)
        user2_opt = optim.SGD(params=user2_model.parameters(), lr=0.1)

        for i in range(steps_per_round):
            # user1 training
            user1_opt.zero_grad()
            user1_pred = user1_model(data1)
            user1_loss = ((user1_pred - target1)**2).sum()
            user1_loss.backward()
            user1_opt.step()

            # user2 training
            user2_opt.zero_grad()
            user2_pred = user2_model(data2)
            user2_loss = ((user2_pred - target2)**2).sum()
            user2_loss.backward()
            user2_opt.step()

        # Aggregate the models
        user1_model.move(concentrator)
        user2_model.move(concentrator)
        with th.no_grad():
            model.weight.set_(((user1_model.weight.data + user2_model.weight.data) / 2).get())
            model.bias.set_(((user1_model.bias.data + user2_model.bias.data) / 2).get())
        
        # Print the loss
        user1_loss = user1_loss.get()
        user2_loss = user2_loss.get()
        print('user1 loss: {},   user2 loss: {}'.format(user1_loss, user2_loss))
    
    concentrator.clear_objects()

    return model

In [6]:
model = train(50, 10)

user1 loss: 0.014233489520847797,   user2 loss: 0.005528910551220179
user1 loss: 0.006420702673494816,   user2 loss: 0.00020944108837284148
user1 loss: 0.002157557522878051,   user2 loss: 0.00021038809791207314
user1 loss: 0.0008175794500857592,   user2 loss: 0.0006332125049084425
user1 loss: 0.00037761038402095437,   user2 loss: 0.0007557716453447938
user1 loss: 0.00020870684238616377,   user2 loss: 0.0006956604192964733
user1 loss: 0.00013125743134878576,   user2 loss: 0.0005762622458860278
user1 loss: 8.947880996856838e-05,   user2 loss: 0.00045474895159713924
user1 loss: 6.38752753729932e-05,   user2 loss: 0.00035048704012297094
user1 loss: 4.675448144553229e-05,   user2 loss: 0.00026696251006796956
user1 loss: 3.468142676865682e-05,   user2 loss: 0.000202131865080446
user1 loss: 2.5904009817168117e-05,   user2 loss: 0.0001525801490060985
user1 loss: 1.9417399016674608e-05,   user2 loss: 0.00011499658285174519
user1 loss: 1.4582522453565616e-05,   user2 loss: 8.660159073770046e-05


In [7]:
model(data)

tensor([[9.9964e-01],
        [9.9966e-01],
        [2.4950e-04],
        [2.7023e-04]], grad_fn=<AddmmBackward>)

In [8]:
target

tensor([[1.],
        [1.],
        [0.],
        [0.]], requires_grad=True)

# Lesson: Intro to Additive Secret Sharing

While having a trusted third party perform the aggregation is certainly nice, in an ideal setting we wouldn't have to trust anyone at all. This is where cryptography can provide an interesting alternative.

Specifically, we're going to look at a simple protocol for Secure Multi-Party Computation called Additive Secret Sharing. This protocol allows a group of three or more parties to aggregate their gradients without a trusted third party performing the aggregation. In other words, we can add 3 numbers together from 3 different people without anyone ever learning the inputs of any other actor.

Let's start by considering the number 5, which we'll put into a variable x.

In [9]:
x = 5

Let's say we wanted to SHARE the ownership of this number between two people, Alice and Bob. We could split this number into two shares, 2 and 3, and give one to Alice and one to Bob.

In [10]:
bob_x_share = 2
alice_x_share = 3

decrypted_x = bob_x_share + alice_x_share
decrypted_x

5

Note that neither Bob nor Alice knows the value of x. Each only knows the value of their own SHARE of x. Thus, the true value of x is hidden (i.e., encrypted).

The truly amazing thing, however, is that Alice and Bob can still compute using this value! They can perform arithmetic over the hidden value! Let's say Bob and Alice wanted to multiply this value by 2! If each of them multiplied their respective share by 2, then the hidden number between them is also multiplied! Check it out!

In [11]:
bob_x_share = 2 * 2
alice_x_share = 3 * 2

decrypted_x = bob_x_share + alice_x_share
decrypted_x

10

This even works for addition between two shared values!!

In [12]:
# encrypted "5"
bob_x_share = 2
alice_x_share = 3

# encrypted "7"
bob_y_share = 5
alice_y_share = 2

# encrypted 5 + 7
bob_z_share = bob_x_share + bob_y_share
alice_z_share = alice_x_share + alice_y_share

decrypted_z = bob_z_share + alice_z_share
decrypted_z

12

As you can see, we just added two numbers together while they were still encrypted!!!

One small tweak is needed. Since all of our shares are positive, each share reveals a little information about the hidden value: the value is always at least as large as the share. Thus, if Bob holds the share "3", he knows that the encrypted value is at least 3.

This would be quite bad, but it can be solved with a simple fix: decryption sums all of the shares together modulo some large constant, i.e.

In [13]:
x = 5

Q = 23740629843760239486723

bob_x_share = 23552870267 # <- a random number
alice_x_share = Q - bob_x_share + x
alice_x_share

23740629843736686616461

In [14]:
(bob_x_share + alice_x_share) % Q

5

So now, as you can see, both shares are wildly larger than the number being shared, meaning that individual shares no longer leak this information. However, all the properties we discussed earlier still hold! (addition, encryption, decryption, etc.)
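
To check that claim, here is a small sketch that repeats the earlier addition and multiply-by-2 examples with random shares and the modulus. The helper `share_two_ways` is a hypothetical name used only for this illustration; Q is the constant from the cell above.

```python
import random

Q = 23740629843760239486723  # same modulus as in the cell above

def share_two_ways(secret):
    """Split `secret` into two shares that sum to it modulo Q."""
    first = random.randrange(Q)    # a random number in [0, Q)
    second = (secret - first) % Q  # whatever is needed to reach `secret`
    return first, second

bob_x, alice_x = share_two_ways(5)
bob_y, alice_y = share_two_ways(7)

# Addition: each person adds the two shares they hold, locally.
bob_z = (bob_x + bob_y) % Q
alice_z = (alice_x + alice_y) % Q
sum_result = (bob_z + alice_z) % Q

# Multiplication by a public constant: each person scales their own share.
bob_d = (bob_x * 2) % Q
alice_d = (alice_x * 2) % Q
double_result = (bob_d + alice_d) % Q

print(sum_result, double_result)  # 12 10
```

The shares themselves differ on every run, but the decrypted results do not.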

# Project: Build Methods for Encrypt, Decrypt, and Add 

In this project, you must take the lessons we learned in the last section and write general methods for encrypt, decrypt, and add. Store shares for a variable in a tuple like so.

In [15]:
x_share = (2,5,7)

Even though normally those shares would be distributed amongst several workers, you can store them in ordered tuples like this for now :)

In [16]:
# try this project here!
import numpy as np

In [17]:
def encrypt(x, n_splits, Q):
    """
    Args:
        x(int): The number to 'encrypt'
        n_splits(int): The number of workers to split the shares.
        Q(int): The large number to use for the modulus operation
    Returns:
        tuple: The tuple of shares
    """
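    # Draw random shares in [0, Q); int64 is requested explicitly so that a
    # large Q also works on platforms where NumPy defaults to 32-bit ints.
    shares = np.random.randint(0, Q, size=n_splits, dtype=np.int64)
    # Overwrite the last share so that all shares sum to x modulo Q
    shares[-1] = Q - (shares[:-1].sum() % Q) + x
    return tuple(shares)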

In [18]:
def decrypt(shares, Q):
    """
    Args:
        shares(tuple): The 'encrypted' shares
        Q(int): The large number used for the modulus operation
    Returns:
        int: The decrypted value
    """
    return sum(shares) % Q

In [19]:
def add(share1, share2):
    # Return a tuple (not a one-shot generator) so the result can be reused
    return tuple(s1 + s2 for s1, s2 in zip(share1, share2))

In [20]:
Q = 23740629843760
shares = encrypt(675, 2, Q)
shares

(12155319478627, 11585310365808)

In [21]:
decrypt(shares, Q)

675

In [22]:
a = 342
b = 765
a + b

1107

In [23]:
Q = 23740629843760
n_splits = 50
a_e = encrypt(a, n_splits, Q)
b_e = encrypt(b, n_splits, Q)
s_e = add(a_e, b_e)
decrypt(s_e, Q)

1107

### Course solution

In [24]:
import random

In [25]:
Q = 23740629843760239486723
x = 5

In [26]:
def encrypt(x, n_shares=3):
    # Draw n_shares - 1 random shares
    shares = list()
    for i in range(n_shares - 1):
        shares.append(random.randint(0, Q))

    # The final share makes the total congruent to x modulo Q
    shares.append(Q - (sum(shares) % Q) + x)
    return tuple(shares)

In [27]:
def decrypt(shares):
    return sum(shares) % Q

In [28]:
def add(a, b):
    # Return a tuple (not a one-shot generator) so the result can be reused
    return tuple(s1 + s2 for s1, s2 in zip(a, b))

In [29]:
a = 342
b = 765
a + b

1107

In [30]:
decrypt(add(encrypt(a), encrypt(b)))

1107
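
To connect this back to aggregation, here is a hedged sketch of three parties summing private numbers with the same idea. The party names, their input values, and the helper `make_shares` are hypothetical, and everything runs in one process rather than on separate workers.

```python
import random

Q = 23740629843760239486723  # modulus from the lesson

def make_shares(secret, n_shares):
    """Split `secret` into n_shares values that sum to it modulo Q."""
    shares = [random.randrange(Q) for _ in range(n_shares - 1)]
    shares.append((secret - sum(shares)) % Q)
    return shares

# Hypothetical private inputs, one per party.
private_inputs = {"alice": 10, "bob": 20, "carol": 12}
parties = list(private_inputs)

# Step 1: every party splits its input and sends one share to each party
# (keeping one for itself). received[p] holds the shares party p ends up with.
received = {p: [] for p in parties}
for owner, value in private_inputs.items():
    for holder, share in zip(parties, make_shares(value, len(parties))):
        received[holder].append(share)

# Step 2: every party adds up the shares it holds, locally.
partial_sums = {p: sum(held) % Q for p, held in received.items()}

# Step 3: only the partial sums are published and combined.
total = sum(partial_sums.values()) % Q
print(total)  # 42
```

Each party only ever sees random-looking shares of the other inputs, yet the published partial sums combine to the true total.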

# Lesson: Intro to Fixed Precision Encoding

As you may remember, our goal is to aggregate gradients using this new Secret Sharing technique. However, the protocol we've just explored in the last section uses positive integers, and our neural network weights are NOT integers. Instead, our weights are decimals (floating point numbers).

Not a huge deal! We just need to use a fixed precision encoding, which lets us do computation over decimal numbers using integers!

In [31]:
BASE=10
PRECISION=4

In [32]:
def encode(x):
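    # Convert to an integer first, then wrap into [0, Q); taking the modulus
    # of a float would lose precision for negative x because Q is so large
    return int(x * (BASE ** PRECISION)) % Q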

def decode(x):
    return (x if x <= Q/2 else x - Q) / BASE**PRECISION

In [33]:
encode(3.5)

35000

In [34]:
decode(35000)

3.5

In [35]:
x = encrypt(encode(5.5))
y = encrypt(encode(2.3))
z = add(x,y)
decode(decrypt(z))

7.8
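
The comparison against Q/2 in decode is what lets this encoding carry negative numbers: values in the upper half of the range [0, Q) are read back as negatives. The sketch below illustrates this with a slightly different variant that rounds before wrapping; `encode_fixed` and `decode_fixed` are hypothetical names, not the functions defined above.

```python
BASE = 10
PRECISION = 4
Q = 23740629843760239486723
SCALE = BASE ** PRECISION

def encode_fixed(x):
    # Round to the nearest integer at 4 decimal places, then wrap into [0, Q).
    return round(x * SCALE) % Q

def decode_fixed(n):
    # Values in the upper half of the range stand for negative numbers.
    return (n if n <= Q // 2 else n - Q) / SCALE

a = encode_fixed(-1.5)   # wraps around to Q - 15000
b = encode_fixed(4.25)   # 42500

print(a == Q - 15000)                     # True
print(decode_fixed(a))                    # -1.5
print(decode_fixed((a + b) % Q))          # 2.75
print(decode_fixed(encode_fixed(0.123456)))  # 0.1235
```

The last line shows the cost of the encoding: anything beyond PRECISION decimal places is lost.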

# Lesson: Secret Sharing + Fixed Precision in PySyft

While writing things from scratch is certainly educational, PySyft makes a great deal of this much easier for us through its abstractions.

In [37]:
bob = sy.VirtualWorker(hook, id='bob')
alice = sy.VirtualWorker(hook, id='alice')
secure_worker = sy.VirtualWorker(hook, id='secure_worker')

In [84]:
x = th.tensor([1,2,3,4,5])

### Secret Sharing Using PySyft

We can share using the simple .share() method!

In [81]:
bob.clear_objects()
alice.clear_objects()
secure_worker.clear_objects()

<VirtualWorker id:secure_worker #tensors:0>

In [85]:
x = x.share(bob, alice, secure_worker)

In [86]:
bob._objects

{70359730503: tensor([3374866408227520394, 3485889385843155371, 3352805840149520209,
         1054496370300055269, 1715230210032799941])}

In [88]:
alice._objects

{38443201291: tensor([ -330039577961216550,   842410172400154683, -2810460732890040224,
          3512318971339790530, -1064772342309215860])}

In [90]:
secure_worker._objects

{23200242275: tensor([-3044826830266303843, -4328299558243310052,  -542345107259479982,
         -4566815341639845795,  -650457867723584076])}

As you can see, Bob now has one of the shares of x! Furthermore, we can still call addition in this state, and PySyft will automatically perform the remote execution for us!

In [41]:
y = x + x

In [42]:
y

(Wrapper)>[AdditiveSharingTensor]
	-> (Wrapper)>[PointerTensor | me:98727775431 -> bob:59554037850]
	-> (Wrapper)>[PointerTensor | me:8149739957 -> alice:96012424387]
	-> (Wrapper)>[PointerTensor | me:69173357206 -> secure_worker:91577878194]
	*crypto provider: me*

In [43]:
y.get()

tensor([ 2,  4,  6,  8, 10])

### Fixed Precision using PySyft

We can also convert a tensor to fixed precision using .fix_prec()

In [44]:
x = th.tensor([0.1,0.2,0.3])

In [45]:
x

tensor([0.1000, 0.2000, 0.3000])

In [46]:
x = x.fix_prec()

In [47]:
x.child.child

tensor([100, 200, 300])

In [48]:
y = x + x

In [49]:
y = y.float_prec()
y

tensor([0.2000, 0.4000, 0.6000])

### Shared Fixed Precision

And of course, we can combine the two!

In [50]:
x = th.tensor([0.1, 0.2, 0.3])

In [51]:
x = x.fix_prec().share(bob, alice, secure_worker)

In [52]:
y = x + x

In [53]:
y.get().float_prec()

tensor([0.2000, 0.4000, 0.6000])

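Note that the aggregated result is not hidden: once .get() is called on the sum, anyone who receives it can see the model averages in the clear. Only the individual contributions stay encrypted.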
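
A small sketch of what being "in the clear" means for each participant, using plain integers in place of encrypted tensors. The user names and weight values are hypothetical, and the weights are assumed to be already scaled by 10**4 as in the fixed precision lesson.

```python
# Hypothetical fixed-precision weights; each is private to its owner.
weights = {"user1": 3000, "user2": 5000, "user3": 1000}

# What the protocol reveals to everyone: the aggregate only.
revealed_sum = sum(weights.values())
revealed_average = revealed_sum / len(weights)

# What user1 can deduce from the aggregate and its own weight: the combined
# contribution of the others, but not how it splits between user2 and user3.
others_combined = revealed_sum - weights["user1"]

print(revealed_average, others_combined)  # 3000.0 6000
```

With only two participants, the same subtraction would reveal the other party's exact value, which is why the protocol is described for three or more parties.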

# Final Project: Federated Learning with Encrypted Gradient Aggregation

In [60]:
user1 = user1.clear_objects()
user2 = user2.clear_objects()
user3 = sy.VirtualWorker(hook, id='user3')

In [66]:
# A Toy Dataset
data = th.tensor([[1.,1],[0,1],[1,0],[0,0],[1,1],[0,0]], requires_grad=True)
target = th.tensor([[1.],[1],[0],[0],[1],[0]], requires_grad=True)

# Distribute the datasets
data1 = data[:2].send(user1)
target1 = target[:2].send(user1)

data2 = data[2:4].send(user2)
target2 = target[2:4].send(user2)

data3 = data[4:].send(user3)
target3 = target[4:].send(user3)

users = [user1, user2, user3]
data_train = [data1, data2, data3]
data_target = [target1, target2, target3]

In [120]:
def train(iterations=20, steps_per_round=10):

    model = nn.Linear(2,1)
    
    for round_iter in range(iterations):
        # Distribute the models
        models = list()
        optimizers = list()
        for user in users:
            user_model = model.copy().send(user)
            user_opt = optim.SGD(params=user_model.parameters(), lr=0.1)
            models.append(user_model)
            optimizers.append(user_opt)
            
        for user_id, data, target, m, opt in zip(range(len(users)), data_train, data_target, models, optimizers):
            for i in range(steps_per_round):
                # user training
                opt.zero_grad()
                user_pred = m(data)
                user_loss = ((user_pred - target)**2).sum()
                user_loss.backward()
                opt.step()
            
            user_loss = user_loss.get()
            print('User {} loss = {}'.format(user_id, user_loss))

        # Aggregate the models
        
        # Share the models between all the users
        weights = list()
        bias = list()
        for m in models:
            weights.append(m.weight.data.fix_prec().share(*users).get())
            bias.append(m.bias.data.fix_prec().share(*users).get())
        
        # Sum the encrypted tensors (the division happens after decryption)
        weights_sum = sum(weights)
        bias_sum = sum(bias)
            
        # Set the global model
        with th.no_grad():
            weights_sum = weights_sum.get()
            bias_sum = bias_sum.get()
            model.weight.set_(weights_sum.float_prec() / len(users))
            model.bias.set_(bias_sum.float_prec() / len(users))

    return model

In [121]:
model = train(50, 10)

User 0 loss = 0.032475389540195465
User 1 loss = 0.005831535439938307
User 2 loss = 0.012939343228936195
User 0 loss = 0.018524419516324997
User 1 loss = 0.002802141709253192
User 2 loss = 0.009432634338736534
User 0 loss = 0.011404258199036121
User 1 loss = 0.0015035668620839715
User 2 loss = 0.006692942231893539
User 0 loss = 0.007210960146039724
User 1 loss = 0.0008627597126178443
User 2 loss = 0.004617930389940739
User 0 loss = 0.004640923347324133
User 1 loss = 0.0005205414490774274
User 2 loss = 0.003135288367047906
User 0 loss = 0.003021321492269635
User 1 loss = 0.00032552957418374717
User 2 loss = 0.0021075396798551083
User 0 loss = 0.0019711581990122795
User 1 loss = 0.0002064485161099583
User 2 loss = 0.0014055421343073249
User 0 loss = 0.001289341482333839
User 1 loss = 0.0001323226315435022
User 2 loss = 0.0009337920346297324
User 0 loss = 0.0008468147716484964
User 1 loss = 8.542939758626744e-05
User 2 loss = 0.0006213450687937438
User 0 loss = 0.0005548321641981602
User 