<a href="https://colab.research.google.com/github/soumyadip1995/Federated-Learning/blob/master/The_Basic_tools_of_Privacy%2C_decentralized_data_.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction


### Centralized vs Decentralized vs Distributed systems

![alt text](https://cdn-images-1.medium.com/max/1094/1*WG5_xDDwHv0lMaVUYLNbVA.png)



### Centralized
Centralized systems directly control the operation of the individual units and flow of information from a single center. All individuals are directly dependent on the central power to send and receive information, and to be commanded.

*   Single server
*   Easy to publish
*   Difficult to scale
*   Single point of failure
*   Examples (Google, Facebook, Amazon, etc.)

### Distributed
Distributed systems spread computation across multiple nodes instead of just one. Google, for example, has adopted a distributed architecture internally to speed up computation and reduce data latency. This means that a system can be both centralized and distributed.

Examples (Google, Facebook, Amazon, etc.)

### Decentralized

Decentralized systems are ones where no node tells any other node what to do. Bitcoin is distributed because its timestamped public ledger, the blockchain, resides on multiple computers, and decentralized because no single node is in charge, so the network keeps operating if one node goes down.

*   Multiple servers
*   Demand and failures better handled
*   Examples (Bitcoin, Ethereum, Steemit)




## Prerequisites


*   Know PyTorch. If not, take the http://fast.ai course and come back.
*   Read the PySyft framework paper (https://arxiv.org/pdf/1811.04017.pdf). This will give you a thorough background on how PySyft is constructed, which will help things make more sense.



### Now let's look at some code that can help us understand the basic building blocks of private data

In [0]:
!git clone "https://github.com/OpenMined/PySyft.git"

Cloning into 'PySyft'...
remote: Enumerating objects: 169, done.
remote: Counting objects: 100% (169/169), done.
remote: Compressing objects: 100% (116/116), done.
remote: Total 19909 (delta 85), reused 107 (delta 53), pack-reused 19740
Receiving objects: 100% (19909/19909), 26.68 MiB | 20.22 MiB/s, done.
Resolving deltas: 100% (12609/12609), done.


In [3]:
# Run this cell to see if things work
import syft as sy
from syft.frameworks.torch.tensors.interpreters import PointerTensor
from syft.frameworks.torch.tensors.decorators import LoggingTensor
import sys
import torch
hook = sy.TorchHook(torch)
from torch.nn import Parameter
import torch.nn as nn
import torch.nn.functional as F

torch.tensor([1,2,3,4,5])

tensor([1, 2, 3, 4, 5])

So the first question you may be wondering about is: how in the world do we train a model on data we don't have access to?

Well, the answer is surprisingly simple. If you're used to working in PyTorch, then you're used to working with torch.Tensor objects.

### Sending Tensors to Bob's Machine

Whereas normally we would perform data science / deep learning on the machine which holds the data, now we want to perform this kind of computation on some other machine. More specifically, we can no longer simply assume that the data is on our local machine.

Thus, instead of using Torch tensors, we're now going to work with pointers to tensors. Let me show you what I mean. First, let's create a "pretend" machine owned by a "pretend" person - we'll call him Bob.

Let's say Bob's machine is on another planet - perhaps on Mars! But, at the moment the machine is empty. Let's create some data so that we can send it to Bob and learn about pointers!


And now - let's send our tensors to Bob!!

In [4]:
x = torch.tensor([1,2,3,4,5])
y = x + x
print(y)

# Obviously, using these super fancy (and powerful!) tensors is important, but it also requires you to have the data on your local machine. This is where our journey begins.

bob = sy.VirtualWorker(hook, id="bob")

x = torch.tensor([1,2,3,4,5])
y = torch.tensor([1,1,1,1,1])

x_ptr = x.send(bob)  # send the tensors to bob
y_ptr = y.send(bob)

bob._objects

tensor([ 2,  4,  6,  8, 10])


{5436207437: tensor([1, 1, 1, 1, 1]), 58235070220: tensor([1, 2, 3, 4, 5])}


BOOM! Now Bob has two tensors!

In [5]:
z = x_ptr + x_ptr

z

bob._objects

{4033011783: tensor([ 2,  4,  6,  8, 10]),
 5436207437: tensor([1, 1, 1, 1, 1]),
 58235070220: tensor([1, 2, 3, 4, 5])}

Now notice something. When we called x.send(bob) it returned a new object that we called x_ptr. This is our first pointer to a tensor. Pointers to tensors do NOT actually hold data themselves. Instead, they simply contain the metadata about a tensor (with data) stored on another machine. The purpose is to give us an intuitive API to tell the other machine to compute functions using this tensor. Let's take a look at the metadata that pointers contain.

In [6]:
x_ptr

(Wrapper)>[PointerTensor | me:1480744114 -> bob:58235070220]

Check out that metadata!

There are two main attributes specific to pointers:

*   x_ptr.location : bob, a reference to the worker that the pointer is pointing to
*   x_ptr.id_at_location : <random integer>, the id under which the tensor is stored at that location

In the printout above they appear after the arrow, in the format <location>:<id_at_location>.

There are also other more generic attributes:

*   x_ptr.id : <random integer>, the id of our pointer tensor, which was allocated randomly
*   x_ptr.owner : me, the worker which owns the pointer tensor; here it's the local worker, named "me"

In [7]:
x_ptr.location




<VirtualWorker id:bob #tensors:3>

In [8]:
bob

<VirtualWorker id:bob #tensors:3>

In [9]:
bob == x_ptr.location

True

In [10]:
x_ptr.id_at_location

58235070220

In [11]:

x_ptr.owner

<VirtualWorker id:me #tensors:0>

You may wonder why the local worker which owns the pointer is also a VirtualWorker, although we didn't create it. Fun fact: just like we had a VirtualWorker object for Bob, we (by default) always have one for us as well. This worker was automatically created when we called hook = sy.TorchHook(torch), so you don't usually have to create it yourself.

In [12]:
me = sy.local_worker
me


<VirtualWorker id:me #tensors:0>

In [13]:
me == x_ptr.owner

True

And finally, just like we can call .send() on a tensor, we can call .get() on a pointer to a tensor to get it back!!!



In [14]:
x_ptr

(Wrapper)>[PointerTensor | me:1480744114 -> bob:58235070220]

In [15]:
x_ptr.get()

tensor([1, 2, 3, 4, 5])

In [16]:
y_ptr


(Wrapper)>[PointerTensor | me:83054688871 -> bob:5436207437]

In [17]:
y_ptr.get()

tensor([1, 1, 1, 1, 1])

In [18]:
z.get()

tensor([ 2,  4,  6,  8, 10])

In [19]:
bob._objects

{}

And as you can see... Bob no longer has the tensors!!! They've moved back to our machine!
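To make the send/get mechanics above concrete, here is a toy model in plain Python. It is only a sketch of the idea and not how PySyft is implemented: the names ToyWorker, ToyPointer, send and bob_toy are hypothetical and exist only for this illustration. The point is that the pointer carries nothing but a location and an id, while the data sits in the worker's object store until .get() pulls it back.

```python
import random


class ToyWorker:
    """Hypothetical stand-in for a machine that stores objects by id."""

    def __init__(self, name):
        self.name = name
        self.objects = {}  # id -> data, loosely mirroring bob._objects


class ToyPointer:
    """Holds only metadata: where the data lives and under which id."""

    def __init__(self, location, id_at_location):
        self.location = location
        self.id_at_location = id_at_location

    def get(self):
        # Remove the data from the remote store and hand it back to the caller.
        return self.location.objects.pop(self.id_at_location)

    def __repr__(self):
        return f"[ToyPointer -> {self.location.name}:{self.id_at_location}]"


def send(data, worker):
    """Store data on the worker and return a pointer to it."""
    new_id = random.randrange(10**10)
    worker.objects[new_id] = data
    return ToyPointer(worker, new_id)


bob_toy = ToyWorker("bob")
ptr = send([1, 2, 3, 4, 5], bob_toy)
print(ptr.location.name, len(bob_toy.objects))  # bob 1

data = ptr.get()
print(data, bob_toy.objects)  # [1, 2, 3, 4, 5] {}
```

Just as in the cells above, the store is empty again once the data has been fetched.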

### Using Tensor Pointers
So, sending and receiving tensors from Bob is great, but this is hardly Deep Learning! We want to be able to perform tensor operations on remote tensors. Fortunately, tensor pointers make this quite easy! You can just use pointers like you would normal tensors!

In [20]:
x = torch.tensor([1,2,3,4,5]).send(bob)
y = torch.tensor([1,1,1,1,1]).send(bob)

z = x + y
z

(Wrapper)>[PointerTensor | me:81564910974 -> bob:81564910974]

And voilà!

Behind the scenes, something very powerful happened. Instead of x and y computing an addition locally, a command was serialized and sent to Bob, who performed the computation, created a tensor z, and then returned the pointer to z back to us!

If we call .get() on the pointer, we will then receive the result back on our machine!

In [21]:
z.get()

tensor([2, 3, 4, 5, 6])
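The "command was serialized and sent to Bob" step can be sketched in plain Python as well. This is a toy under stated assumptions, not PySyft's wire format: the JSON message layout, the OPS table and the ToyRemote class are all made up for illustration. What it does show is that only a small command and an id cross the boundary, while the operands and the result stay in the remote store.

```python
import json
import operator

# Hypothetical table of operations the remote side agrees to run.
OPS = {"add": operator.add, "mul": operator.mul}


class ToyRemote:
    """Toy stand-in for Bob's machine: an object store plus a command handler."""

    def __init__(self):
        self.objects = {}
        self._next_id = 0

    def store(self, data):
        # Keep the data under a fresh id and return that id.
        self._next_id += 1
        self.objects[self._next_id] = data
        return self._next_id

    def handle(self, message):
        # Decode the command, look up the operands by id, compute remotely.
        command = json.loads(message)
        left = self.objects[command["args"][0]]
        right = self.objects[command["args"][1]]
        op = OPS[command["op"]]
        result = [op(a, b) for a, b in zip(left, right)]
        # Only the id of the result travels back, never the data itself.
        return self.store(result)


remote = ToyRemote()
x_id = remote.store([1, 2, 3, 4, 5])
y_id = remote.store([1, 1, 1, 1, 1])

# The "local" side only builds and sends a small text message.
message = json.dumps({"op": "add", "args": [x_id, y_id]})
z_id = remote.handle(message)

print(remote.objects[z_id])  # [2, 3, 4, 5, 6]
```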

### Torch Functions


This API also covers Torch's functions, so calling torch.add(x, y) on two pointers behaves just like x + y!

In [22]:
x

(Wrapper)>[PointerTensor | me:53443883314 -> bob:22174642389]

In [23]:
y

(Wrapper)>[PointerTensor | me:76221350567 -> bob:36008129613]

In [24]:
z = torch.add(x, y)
z


(Wrapper)>[PointerTensor | me:79838982842 -> bob:79838982842]

In [25]:
z.get()


tensor([2, 3, 4, 5, 6])

### Variables (including backpropagation!)

In [26]:
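# Create two float tensors that track gradients and send them to bob
x = torch.tensor([1,2,3,4,5.], requires_grad=True).send(bob)
y = torch.tensor([1,1,1,1,1.], requires_grad=True).send(bob)

# The forward and backward passes both run on bob's machine
z = (x + y).sum()
z.backward()

# Bring x back; its gradient comes back with it
x = x.get()
x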

tensor([1., 2., 3., 4., 5.], requires_grad=True)

In [27]:
x.grad

tensor([1., 1., 1., 1., 1.])
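Why is the gradient all ones? Since z is the sum of every x_i + y_i, nudging any single x_i changes z by exactly the same amount, so dz/dx_i = 1 for each i. The short check below confirms this numerically in plain Python with a central finite difference. It is a sanity check and not how autograd computes gradients; the helper names z_fn and numeric_grad_x are hypothetical.

```python
def z_fn(x, y):
    """Same scalar as in the cell above: the sum of the elementwise x + y."""
    return sum(a + b for a, b in zip(x, y))


def numeric_grad_x(x, y, eps=1e-6):
    """Central finite-difference estimate of dz/dx_i for each i."""
    grads = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grads.append((z_fn(up, y) - z_fn(down, y)) / (2 * eps))
    return grads


x_vals = [1.0, 2.0, 3.0, 4.0, 5.0]
y_vals = [1.0, 1.0, 1.0, 1.0, 1.0]
grad = numeric_grad_x(x_vals, y_vals)
print([round(g, 4) for g in grad])  # [1.0, 1.0, 1.0, 1.0, 1.0]
```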

So as you can see, the API is really quite flexible and capable of performing nearly any operation you would normally perform in Torch on remote data. This lays the groundwork for our more advanced privacy-preserving protocols such as Federated Learning, Secure Multi-Party Computation, and Differential Privacy!



## Introduction to Federated Learning


Federated Learning is an alternative approach to machine learning in which the raw data is never collected centrally. In a nutshell, the parts of the algorithm that touch the data are moved to the users' computers. Users collaboratively help to train a model by using their locally available data to compute model improvements. Instead of sharing their data, users send only these abstract improvements back to the server.

This approach is much more privacy-friendly and flexible. Applications on mobile phones are an example where this is especially evident. Users generate vast amounts of data through interaction with the device. This data is often deeply private in nature and should not be shared in full with a server. Federated Learning still allows training a common model using all this data, without necessarily sacrificing computational power or missing out on smarter algorithms.

The figure below, from the Google AI blog, shows an example of how federated learning works.


![alt text](https://1.bp.blogspot.com/-K65Ed68KGXk/WOa9jaRWC6I/AAAAAAAABsM/gglycD_anuQSp-i67fxER1FOlVTulvV2gCLcB/s1600/FederatedLearning_FinalFiles_Flow%2BChart1.png)
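The round trip described above (train locally, send back only the improvement, combine on the server) can be sketched in a few lines of plain Python. The text does not say how the server combines the updates, so this sketch assumes a size-weighted average of the locally trained weights, an approach commonly called federated averaging. The one-parameter model, the toy data and the function names local_update and federated_round are all hypothetical.

```python
def local_update(weight, data, lr=0.05, steps=20):
    """Run a few gradient steps of 1-D least squares (y = weight * x) on one user's data."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_round(global_weight, users):
    """One round: each user trains locally, the server averages the results."""
    # Only the trained weight and the example count leave each user.
    updates = [(local_update(global_weight, data), len(data)) for data in users]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


# Three users, each holding private (x, y) pairs generated with y = 2 * x.
users = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
    [(0.5, 1.0), (1.5, 3.0), (2.5, 5.0)],
]

weight = 0.0
for _ in range(5):
    weight = federated_round(weight, users)

print(round(weight, 3))  # approaches 2.0, the slope behind the toy data
```

The server-side function never touches the (x, y) pairs; it sees only one number and one count per user, which is the property the paragraph above describes.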

## Summary


In this blog post we have learnt how to use the basic building blocks of privacy: sending and receiving tensors, and running torch functions on remote tensors. As mentioned before, this lays the groundwork for more advanced privacy-preserving techniques such as federated learning, which we have just been introduced to. In the next blog post we will discuss federated learning in detail.


More reading material and some of my sources

1. Google AI blog post: https://ai.googleblog.com/2017/04/federated-learning-collaborative.html
2. Andrew Trask, OpenMined: https://www.openmined.org/
3. Siraj Raval