<a href="https://colab.research.google.com/github/soumyadip1995/Federated-Learning/blob/master/Introduction_to_Federated_Learning_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction To Federated Learning (Continued)




## Federated Learning

In the previous post, we were introduced to Federated Learning. Let's go deeper.

To avoid collecting the training data, the learning algorithm has to run in a distributed way, as mentioned in the previous blog post. The parts of the algorithm that directly use the data need to be executed on the users' computers. These are the sections of the algorithm that compute the previously mentioned updates. In Federated Learning, users compute updates based on their locally available training data and send them to a server. These updates are much harder to interpret than the raw data, so this is a major improvement for privacy. For some applications with huge amounts of data, it might also be cheaper to communicate the updates than to send the data directly. While users' computers generally have much less computational power than servers in a data center, they also hold less data. With a large number of users, the required computation is spread across many devices, so there is not much work to do on any individual computer.



## Optimization


Conventional machine learning can be seen as a centralized system where all the work is performed on one server. In the process described so far, responsibilities are moved from the server to the clients. This is not a fully decentralized system because the server still runs a part of the algorithm. Instead, it is a federated system: a federation of clients takes over a significant amount of work, but there is still one central entity, a server, coordinating everything.

Before the server starts the distributed learning process, it needs to initialize the model. Theoretically, this can be done randomly. In practice, it makes sense to initialize the model with sensible default values. If some data is already available on the server, it can be used to pretrain the model. In other cases, there might be a known configuration of model parameters that already leads to acceptable results. Having a good first model gives the training process a head start and can reduce the time it takes until convergence.

(Img: Florian)

![alt text](https://florian.github.io/assets/posts/federated-learning/iteration.png)

After the model has been initialized, the iterative training process is kicked off. A visualization of the steps performed in each iteration is shown in the figure above. At the beginning of an iteration, a subset of *K* clients is randomly selected by the server. They receive a copy of the current model parameters $θ$ and use their locally available training data to compute an update. The update of the $i$-th client is denoted by $H_i$. The updates are then sent back to the server.

We assume $θ$ and $H_i$ to be vectors. However, the same concepts transfer directly to any sequence of vectors since they can be concatenated into one long vector. The server waits until it has received all updates and then combines them into one final update. This is usually done by computing an average of all updates, weighted by how many training examples the respective clients used.
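
The weighted average described above can be written as $H = \frac{\sum_i n_i H_i}{\sum_i n_i}$, where $n_i$ is the number of training examples on client $i$ (a symbol introduced here for illustration). Below is a minimal sketch in plain Python, assuming each update is a list of floats of the same length. The function name `aggregate_updates` is hypothetical and not part of any library.

```python
def aggregate_updates(updates, num_examples):
    """Combine client updates into one update by a weighted average.

    updates: list of equal-length lists of floats (one H_i per client)
    num_examples: list of ints, the number of training examples n_i
                  behind each update (hypothetical parameter name)
    """
    if not updates or len(updates) != len(num_examples):
        raise ValueError("need exactly one example count per update")
    total = sum(num_examples)
    combined = [0.0] * len(updates[0])
    for update, n in zip(updates, num_examples):
        for j, value in enumerate(update):
            # Each client contributes in proportion to its data size.
            combined[j] += n * value / total
    return combined


# Two clients: the first trained on 1 example, the second on 3.
H = aggregate_updates([[1.0, 0.0], [0.0, 1.0]], [1, 3])
print(H)  # [0.25, 0.75]
```

The client with three times as much data pulls the combined update three times as strongly in its direction.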

## Federated Learning Example
Let's start by training a toy model the centralized way. This is about as simple as models get. We first need:

- a toy dataset
- a model
- some basic training logic for training a model to fit the data

Note: If this API is unfamiliar to you, head on over to [fast.ai](http://fast.ai/) and take their course before continuing in this tutorial.

## Let's take a look at some code!




```python
import torch
from torch import nn
from torch import optim

# A Toy Dataset
data = torch.tensor([[0,0],[0,1],[1,0],[1,1.]], requires_grad=True)
target = torch.tensor([[0],[0],[1],[1.]], requires_grad=True)

# A Toy Model
model = nn.Linear(2,1)

def train():
    # Training Logic
    opt = optim.SGD(params=model.parameters(),lr=0.1)
    for iter in range(20):

        # 1) erase previous gradients (if they exist)
        opt.zero_grad()

        # 2) make a prediction
        pred = model(data)

        # 3) calculate how much we missed
        loss = ((pred - target)**2).sum()

        # 4) figure out which weights caused us to miss
        loss.backward()

        # 5) change those weights
        opt.step()

        # 6) print our progress
        print(loss.data)

train()
```

```
tensor(5.8433)
tensor(0.6510)
tensor(0.2085)
tensor(0.1366)
tensor(0.1020)
tensor(0.0774)
tensor(0.0588)
tensor(0.0448)
tensor(0.0341)
tensor(0.0260)
tensor(0.0198)
tensor(0.0151)
tensor(0.0115)
tensor(0.0088)
tensor(0.0067)
tensor(0.0051)
tensor(0.0039)
tensor(0.0030)
tensor(0.0023)
tensor(0.0018)
```


And there you have it! We've trained a basic model in the conventional manner. All our data is aggregated on our local machine, and we can use it to make updates to our model. Federated Learning, however, doesn't work this way. So, let's modify this example to do it the Federated Learning way!

So, what do we need?

- create a couple of workers
- get pointers to training data on each worker
- updated training logic to do federated learning

New Training Steps:

- send model to correct worker
- train on the data located there
- get the model back and repeat with next worker

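```python
import syft as sy

# Hook PyTorch so that tensors gain .send() and .get().
# TorchHook and VirtualWorker come from older PySyft releases and
# may not be available in newer ones.
hook = sy.TorchHook(torch)

# The optimizer is still torch's `optim`, imported in the first cell.
```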

```python
# create a couple workers
bob = sy.VirtualWorker(hook, id="bob")
alice = sy.VirtualWorker(hook, id="alice")

# A Toy Dataset
data = torch.tensor([[0,0],[0,1],[1,0],[1,1.]], requires_grad=True)
target = torch.tensor([[0],[0],[1],[1.]], requires_grad=True)

# get pointers to training data on each worker by
# sending some training data to bob and alice
data_bob = data[0:2]
target_bob = target[0:2]

data_alice = data[2:]
target_alice = target[2:]

# Initialize A Toy Model
model = nn.Linear(2,1)

data_bob = data_bob.send(bob)
data_alice = data_alice.send(alice)
target_bob = target_bob.send(bob)
target_alice = target_alice.send(alice)

# organize pointers into a list
datasets = [(data_bob,target_bob),(data_alice,target_alice)]

def train():
    # Training Logic
    opt = optim.SGD(params=model.parameters(),lr=0.1)
    for iter in range(10):

        # NEW) iterate through each worker's dataset
        for data,target in datasets:

            # NEW) send model to correct worker
            model.send(data.location)

            # 1) erase previous gradients (if they exist)
            opt.zero_grad()

            # 2) make a prediction
            pred = model(data)

            # 3) calculate how much we missed
            loss = ((pred - target)**2).sum()

            # 4) figure out which weights caused us to miss
            loss.backward()

            # 5) change those weights (torch's SGD.step() takes no batch-size argument)
            opt.step()

            # NEW) get model (with gradients)
            model.get()

            # 6) print our progress
            print(loss.get()) # NEW) slight edit... need to call .get() on loss
```

```python
train()
```

```
tensor(0.0013)
tensor(0.0010)
tensor(0.0008)
tensor(0.0006)
tensor(0.0005)
tensor(0.0003)
tensor(0.0003)
tensor(0.0002)
tensor(0.0002)
tensor(0.0001)
tensor(9.1281e-05)
tensor(6.9796e-05)
tensor(5.3368e-05)
tensor(4.0807e-05)
tensor(3.1203e-05)
tensor(2.3859e-05)
tensor(1.8244e-05)
tensor(1.3950e-05)
tensor(1.0667e-05)
tensor(8.1569e-06)
```


### Cool!

And voilà! We are now training a very simple Deep Learning model using Federated Learning! We send the model to each worker, compute a gradient and update the weights there, and then bring the updated model back to our local machine before sending it on to the next worker. Never in this process do we see or request access to the underlying training data! We preserve the privacy of Bob and Alice!!!

### Shortcomings of this Example

So, while this example is a nice introduction to Federated Learning, it still has some major shortcomings. Most notably, when we call `model.get()` and receive the updated model from Bob or Alice, we can actually learn a lot about Bob and Alice's training data by looking at their gradients. In some cases, we can restore their training data perfectly!
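
To see why gradients can give training data away, consider the simplest possible case: a linear model with squared error, trained on a single example. The gradient with respect to the weights is `2 * residual * x` and the gradient with respect to the bias is `2 * residual`, so dividing one by the other returns the input exactly. The sketch below uses plain Python and made-up values; the function names are hypothetical, and with larger batches or deeper models recovery is generally harder than shown here.

```python
def gradients(w, b, x, y):
    """Gradients of the squared error (w.x + b - y)**2 for one example."""
    residual = sum(wi * xi for wi, xi in zip(w, x)) + b - y
    grad_w = [2 * residual * xi for xi in x]
    grad_b = 2 * residual
    return grad_w, grad_b


def reconstruct_input(grad_w, grad_b):
    """Recover the input x from the gradients alone (needs grad_b != 0)."""
    return [g / grad_b for g in grad_w]


def reconstruct_target(w, b, x, grad_b):
    """Recover the target y from the known model and the bias gradient."""
    pred = sum(wi * xi for wi, xi in zip(w, x)) + b
    return pred - grad_b / 2


# The model the server sent out (made-up values).
w, b = [0.5, -0.5], 0.0

# A client's private example; the server never sees these two values.
secret_x, secret_y = [1.0, 1.0], 1.0
grad_w, grad_b = gradients(w, b, secret_x, secret_y)

# The server only receives grad_w and grad_b, yet recovers both.
x_hat = reconstruct_input(grad_w, grad_b)
print(x_hat)                                    # [1.0, 1.0]
print(reconstruct_target(w, b, x_hat, grad_b))  # 1.0
```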



## A few Key Properties of Federated Learning


Federated Learning is a vastly more distributed way of collaboratively training machine learning models than training in a data center. It can be distinguished by several key properties, which also describe some of the challenges in Federated Learning:



**1. A huge number of users**: In a data center, there might be thousands
of compute nodes. Popular consumer software has several orders of
magnitude more users than that. All of these users should be able to
participate in training the model at some point, so Federated Learning
needs to scale to millions of users.

**2. Unbalanced number of data points**: It is easy to guarantee that
compute nodes have a similar number of data points in a data center.
In Federated Learning, there is no control over the location of data at
all. It is likely that some users generate vastly more data than others.

**3. Different data distributions**: Even worse, no assumptions about
the data distributions themselves can be made. While some users
probably generate similar data, two randomly picked users are likely
to compute very different updates. 

**4. Slow communication**: Since compute nodes in Federated Learning correspond to users’ computers, the network connections are often bad. This is especially the case if the training happens on mobile phones. Updates for complex models can be large, so this is problematic when training more sophisticated models.

**5. Unstable communication**: Some clients might not even be connected to the internet at all when the server asks them to send back model updates. In a data center, it is much easier to guarantee that compute nodes stay online.

In a nutshell, Federated Learning is a massively distributed way of training machine learning models where very little control over the compute nodes and the distribution of data can be exercised.
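
Some of these properties can be imitated in a small simulation: many clients, very different amounts of data per client, and clients that go offline before answering. The sketch below is plain Python with made-up numbers. The function name `run_round`, the `dropout_prob` parameter, and the rule that an offline client is simply skipped for that round are all assumptions made for illustration, not part of any particular protocol.

```python
import random


def run_round(clients, k, dropout_prob, rng):
    """Simulate one round: sample k clients, lose some, average the rest.

    clients: dict mapping a client id to (update, num_examples), where
             update is a list of floats. All values are simulated.
    Returns the weighted average update, or None if nobody responded.
    """
    selected = rng.sample(sorted(clients), k)
    # Assumption: a client that is offline is skipped for this round.
    responded = [cid for cid in selected if rng.random() >= dropout_prob]
    if not responded:
        return None
    total = sum(clients[cid][1] for cid in responded)
    combined = [0.0] * len(clients[responded[0]][0])
    for cid in responded:
        update, n = clients[cid]
        for j, value in enumerate(update):
            combined[j] += n * value / total
    return combined


rng = random.Random(0)
# 1000 simulated clients holding between 1 and 500 examples each.
clients = {
    f"client{i}": ([rng.uniform(-1, 1), rng.uniform(-1, 1)], rng.randint(1, 500))
    for i in range(1000)
}
print(run_round(clients, k=10, dropout_prob=0.3, rng=rng))
```

Raising `dropout_prob` leaves fewer updates in each round, so the combined update rests on less data and varies more from round to round.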

## Applications

The protocol introduced so far is fairly abstract and it remains to be discussed what exactly can be implemented with it. In general, it is possible to use Federated Learning with any type of model for which some notion of updates can be defined. It turns out that most popular learning algorithms can be described in that way.


Some algorithms, however, cannot be reformulated for Federated Learning. For example, k-NN requires memorizing the data points themselves, which is not possible here. Non-parametric models in general can be problematic since their configurations often heavily depend on the exact data that was used to train them.

In terms of data, Federated Learning is especially useful in situations where users generate and label data themselves implicitly. This is the case for the application of trying to predict the next word. The model can then automatically update itself without having to store the data permanently. In such a situation, Federated Learning is extremely powerful because models can be trained with a huge amount of data that is not stored and not directly shared with a server at all. We can thus make use of a lot of data that we could otherwise not have used without violating the users’ privacy.
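
To make "implicitly labeled" concrete for next-word prediction: every word a user types is the label for the words that came before it, so training pairs can be derived on the device without anyone annotating anything. The sketch below is plain Python; the function name `next_word_examples` and the fixed context size of two words are assumptions chosen for illustration.

```python
def next_word_examples(text, context_size=2):
    """Turn typed text into (context, next_word) training pairs."""
    words = text.lower().split()
    examples = []
    for i in range(context_size, len(words)):
        # The preceding words are the input, the current word is the label.
        context = tuple(words[i - context_size:i])
        examples.append((context, words[i]))
    return examples


for context, label in next_word_examples("see you at the office"):
    print(context, "->", label)
# ('see', 'you') -> at
# ('you', 'at') -> the
# ('at', 'the') -> office
```

In a federated setting, pairs like these would stay on the user's device and only the resulting model update would be sent to the server.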