# Federated Learning - MNIST Example

## Train a remote Deep Learning model
In this notebook, we will show how to train a Federated Deep Learning model on data hosted in PyGrid Nodes.

We will assume that you are a Data Scientist who does not know where the data lives; you only have access to the Grid Network.

## 0 - Prerequisites

Components:

 - PyGrid Network    (http://network:7000)
 - PyGrid Node Alice (http://alice:5000)
 - PyGrid Node Bob   (http://bob:5001)

This tutorial assumes that these components are running in background. See [instructions](https://github.com/OpenMined/PyGrid/tree/dev/examples#how-to-run-this-tutorial) for more details.

### Import dependencies
Here we import the core dependencies.

In [None]:
import syft as sy
from syft.grid.public_grid import PublicGridNetwork

import torch as th

import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

import torchvision
from torchvision import datasets, transforms


### Syft and client configuration
Now we hook Torch and connect to the Grid Network. The network's address is the only one you need to know, since the network keeps track of the node addresses. But first, let's define some useful parameters.

In [None]:
grid_address = "http://network:7000"  # address of the PyGrid Network
N_EPOCHS = 100  # number of training epochs
N_TEST   = 10   # batch size used when evaluating on the test set

In [None]:
hook = sy.TorchHook(th)


# Connect to the Grid Network, which knows where the nodes are
my_grid = PublicGridNetwork(hook, grid_address)

## 1 - Define our Neural Network Architecture

Now we will define a Deep Learning Network. Feel free to write your own model!

In [None]:
class Arguments():
    def __init__(self):
        self.test_batch_size = N_TEST
        self.epochs = N_EPOCHS
        self.lr = 0.01
        self.log_interval = 5
        self.device = th.device("cpu")
        
args = Arguments()

In [None]:
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 50, 5, 1)
        self.fc1 = nn.Linear(4*4*50, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2, 2)
        x = x.view(-1, 4*4*50)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return F.log_softmax(x, dim=1)
    
model = Net()
model.to(args.device)

optimizer = optim.SGD(model.parameters(), lr=args.lr)  # use the learning rate from Arguments
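The `4*4*50` input size of `fc1` follows from how the two conv/pool stages shrink a 28x28 MNIST image. The plain-Python sketch below retraces those shapes and counts the parameters of each layer; `conv_out`, `pool_out`, `conv_params` and `linear_params` are illustrative helpers, not PyTorch functions. The parameter count is worth knowing here because the model (not the data) is what travels to each node on every `model.send()` / `model.get()` round trip.

```python
# Illustrative helpers (not PyTorch APIs): retrace Net's shapes and sizes.

def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output size of a 2-D convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size: int, kernel: int, stride: int) -> int:
    """Output size of max pooling (no padding) along one spatial dimension."""
    return (size - kernel) // stride + 1


def conv_params(c_in: int, c_out: int, k: int) -> int:
    return c_out * c_in * k * k + c_out  # weights + biases


def linear_params(n_in: int, n_out: int) -> int:
    return n_out * n_in + n_out  # weights + biases


side = 28                       # MNIST images are 28x28
side = conv_out(side, 5, 1)     # conv1 -> 24
side = pool_out(side, 2, 2)     # max_pool2d(x, 2, 2) -> 12
side = conv_out(side, 5, 1)     # conv2 -> 8
side = pool_out(side, 2, 2)     # max_pool2d(x, 2, 2) -> 4
flat_features = side * side * 50  # 50 channels come out of conv2

layer_params = {
    "conv1": conv_params(1, 20, 5),
    "conv2": conv_params(20, 50, 5),
    "fc1": linear_params(flat_features, 500),
    "fc2": linear_params(500, 10),
}
total_params = sum(layer_params.values())
print(side, flat_features, layer_params, total_params)
```

With these layer sizes the sketch gives 800 flattened features and 431,080 parameters in total, roughly 1.7 MB as 32-bit floats.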


## 2 - Search for remote data

Once we have defined our Deep Learning Network, we need some data to train it. Thanks to PyGridNetwork this is very easy: you just need to search for your tags of interest.

Notice that the _search()_ method does not return the data itself but pointers to it, grouped by the node that holds them. We work with these pointer tensors while the real tensors stay hosted on Alice and Bob.

In [None]:
data = my_grid.search("#X", "#mnist", "#dataset")  # images
target = my_grid.search("#Y", "#mnist", "#dataset")  # labels

data = list(data.values())  # one list of pointer tensors per node
target = list(target.values())  # one list of pointer tensors per node
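The `data[i][j]` indexing used later comes from the shape of this result: after `list(...values())` there is one inner list per node, holding that node's matching pointers. The toy sketch below mimics that shape with an in-memory dictionary. `toy_search`, the node contents and the sample counts are all made up, and it assumes a tensor matches only if it carries every requested tag; it is not the PySyft implementation.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for a remote tensor: (tags, number of samples)
ToyTensor = Tuple[Set[str], int]


def toy_search(nodes: Dict[str, List[ToyTensor]], *query: str) -> Dict[str, List[ToyTensor]]:
    """Per node, return the tensors that carry every tag in `query`.

    In this sketch, nodes with no match are left out of the result."""
    wanted = set(query)
    result: Dict[str, List[ToyTensor]] = {}
    for node_id, tensors in nodes.items():
        matches = [t for t in tensors if wanted <= t[0]]
        if matches:
            result[node_id] = matches
    return result


# Made-up contents of two nodes
nodes = {
    "alice": [({"#X", "#mnist", "#dataset"}, 600), ({"#Y", "#mnist", "#dataset"}, 600)],
    "bob": [({"#X", "#mnist", "#dataset"}, 400), ({"#Y", "#mnist", "#dataset"}, 400),
            ({"#X", "#other", "#dataset"}, 100)],
}

data = list(toy_search(nodes, "#X", "#mnist", "#dataset").values())
target = list(toy_search(nodes, "#Y", "#mnist", "#dataset").values())

# Same double loop as epoch_total_size(), over the toy sample counts
total = sum(n for node_tensors in data for _, n in node_tensors)
print(len(data), [len(d) for d in data], total)
```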

If we print the pointers, we can check that the metadata (such as the tags) attached to the data when it was uploaded to the nodes is included.

In [None]:
print(data)
print(target)

## 3 - Train the model

Now we are ready to train. As you will see, this is very similar to standard PyTorch syntax.

Let's first load the local MNIST test set, which we will use to evaluate the model.

In [None]:
transform = transforms.Compose([
                              transforms.ToTensor(),
                              transforms.Normalize((0.1307,), (0.3081,)),  # MNIST mean and std
                              ])
# download=False assumes the MNIST test set is already stored in ./dataset
testset = datasets.MNIST('./dataset', download=False, train=False, transform=transform)
testloader = th.utils.data.DataLoader(testset, batch_size=args.test_batch_size, shuffle=True)
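`ToTensor` scales 8-bit pixel values from [0, 255] to [0.0, 1.0], and `Normalize` then computes `(x - mean) / std` per channel. The sketch below applies both steps by hand to the darkest and brightest possible pixels, using the mean and std from the cell above; the function names are illustrative only.

```python
MNIST_MEAN, MNIST_STD = 0.1307, 0.3081  # values used in the transform above


def to_tensor_scale(pixel: int) -> float:
    """Mimic ToTensor on one uint8 pixel: map [0, 255] to [0.0, 1.0]."""
    return pixel / 255.0


def normalize(x: float, mean: float = MNIST_MEAN, std: float = MNIST_STD) -> float:
    """Mimic Normalize on one value: (x - mean) / std."""
    return (x - mean) / std


black = normalize(to_tensor_scale(0))
white = normalize(to_tensor_scale(255))
print(f"black -> {black:.4f}, white -> {white:.4f}")
```

Black pixels map to about -0.4242 and white pixels to about 2.8215, so the normalized inputs are no longer confined to [0, 1].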

In [None]:
# Total number of training samples across all remote pointers (one epoch)
def epoch_total_size(data):
    total = 0
    for node_ptrs in data:  # one list of pointers per node
        for ptr in node_ptrs:
            total += ptr.shape[0]
    return total

In [None]:
def train(args):
    # Relies on the global `epoch` set by the training loop below
    model.train()
    epoch_total = epoch_total_size(data)

    current_epoch_size = 0
    for i in range(len(data)):
        for j in range(len(data[i])):

            current_epoch_size += data[i][j].shape[0]
            worker = data[i][j].location  # the node that hosts this batch

            model.send(worker)  # send the model to the PyGrid Node
            optimizer.zero_grad()

            pred = model(data[i][j])
            loss = F.nll_loss(pred, target[i][j])
            loss.backward()

            optimizer.step()
            model.get()  # get the updated model back

            loss = loss.get()  # retrieve the loss value from the node

        if epoch % args.log_interval == 0:
            print('Train Epoch: {} | Node: {} | [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, worker.id, current_epoch_size, epoch_total,
                100. * current_epoch_size / epoch_total, loss.item()))
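The loop above trains sequentially: a single model visits each node in turn, takes an optimizer step on that node's data, and comes back, while the data never moves. The plain-Python simulation below reproduces that pattern with a hypothetical one-weight linear model `y = w * x` and squared-error loss instead of the CNN; the node names, data and learning rate are made up. Note that this is sequential training across nodes, not federated averaging.

```python
import random


def make_batch(rng, n, true_w):
    """Hypothetical noise-free samples (x, y) with y = true_w * x, x in [-1, 1]."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, true_w * x) for x in xs]


def local_sgd_step(w, batch, lr):
    """One full-batch gradient step on mean squared error, run 'on the node'."""
    n = len(batch)
    loss = sum((w * x - y) ** 2 for x, y in batch) / n
    grad = sum(2 * (w * x - y) * x for x, y in batch) / n
    return w - lr * grad, loss


rng = random.Random(0)
TRUE_W = 3.0
# Each node keeps its own list of batches, like data[i][j] in the notebook
nodes = {
    "alice": [make_batch(rng, 20, TRUE_W)],
    "bob": [make_batch(rng, 20, TRUE_W), make_batch(rng, 10, TRUE_W)],
}

w = 0.0  # the model, owned by the data scientist
for epoch in range(50):
    for batches in nodes.values():
        for batch in batches:
            # "send": only w travels to the node; the batch stays there.
            # "get": the updated w comes back together with the loss value.
            w, loss = local_sgd_step(w, batch, lr=0.5)

print(f"learned w = {w:.4f}, last loss = {loss:.2e}")
```

Because every node's data follows the same `y = 3x` rule, the shared weight converges to about 3.0 even though no node ever shares its samples.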



In [None]:
def test(args):
    # Evaluate locally on the MNIST test set every `log_interval` epochs
    if epoch % args.log_interval == 0:

        model.eval()
        test_loss = 0
        correct = 0
        with th.no_grad():
            # `images`/`labels` avoid shadowing the global `data`/`target` pointers
            for images, labels in testloader:
                images, labels = images.to(args.device), labels.to(args.device)
                output = model(images)
                test_loss += F.nll_loss(output, labels, reduction='sum').item()  # sum up batch loss
                pred = output.argmax(1, keepdim=True)  # index of the max log-probability
                correct += pred.eq(labels.view_as(pred)).sum().item()

        test_loss /= len(testloader.dataset)

        print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
            test_loss, correct, len(testloader.dataset),
            100. * correct / len(testloader.dataset)))
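`F.nll_loss(..., reduction='sum')` adds up the negative log-probability assigned to each true class, and `argmax` picks the predicted class. The sketch below computes both quantities in plain Python for a made-up batch of three samples; `evaluate` is a hypothetical helper, shown only to make the metric arithmetic concrete.

```python
import math


def evaluate(log_probs, targets):
    """Summed NLL loss and number of correct predictions for one batch."""
    # NLL with reduction='sum': minus the log-probability of each true class
    total_loss = -sum(row[t] for row, t in zip(log_probs, targets))
    # argmax over classes for every sample
    preds = [max(range(len(row)), key=row.__getitem__) for row in log_probs]
    correct = sum(p == t for p, t in zip(preds, targets))
    return total_loss, correct


# Made-up batch: 3 samples, 3 classes
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.5, 0.3, 0.2]]
log_probs = [[math.log(p) for p in row] for row in probs]
targets = [0, 1, 2]

loss_sum, correct = evaluate(log_probs, targets)
print(f"summed loss = {loss_sum:.4f}, average = {loss_sum / len(targets):.4f}, correct = {correct}")
```

Here the first two samples are classified correctly, so `correct` is 2 and the summed loss is -(ln 0.7 + ln 0.8 + ln 0.2), about 2.189.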

In [None]:
for epoch in range(N_EPOCHS):
    train(args)
    test(args)

Et voilà! You have trained a model on remote data using Federated Learning!

# Congratulations!!! - Time to Join the Community!

Congratulations on completing this notebook tutorial! If you enjoyed this and would like to join the movement toward privacy preserving, decentralized ownership of AI and the AI supply chain (data), you can do so in the following ways!

### Star PyGrid on GitHub

The easiest way to help our community is just by starring the GitHub repos! This helps raise awareness of the cool tools we're building.

- [Star PyGrid](https://github.com/OpenMined/PyGrid)

### Join our Slack!

The best way to keep up to date on the latest advancements is to join our community! You can do so by filling out the form at [http://slack.openmined.org](http://slack.openmined.org)

### Join a Code Project!

The best way to contribute to our community is to become a code contributor! At any time you can go to the PySyft GitHub Issues page and filter for "Projects". This will show you all the top-level Tickets, giving an overview of what projects you can join! If you don't want to join a project, but you would like to do a bit of coding, you can also look for more "one off" mini-projects by searching for GitHub issues marked "good first issue".

- [PySyft Projects](https://github.com/OpenMined/PySyft/issues?q=is%3Aopen+is%3Aissue+label%3AProject)
- [Good First Issue Tickets](https://github.com/OpenMined/PyGrid/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)

### Donate

If you don't have time to contribute to our codebase, but would still like to lend support, you can also become a Backer on our Open Collective. All donations go toward our web hosting and other community expenses such as hackathons and meetups!

[OpenMined's Open Collective Page](https://opencollective.com/openmined)