# Federated Learning - MNIST Example

## Populate remote GridNodes with labeled tensors
In this notebook, we show how to populate a GridNode with labeled data so that it can be used later (see the [second part](02-FL-mnist-train-model.ipynb)) by people interested in training models.

In particular, we will consider that two Data Owners (Alice & Bob) want to populate their nodes with some data from the well-known MNIST dataset.

## 0 - Previous setup

Components:

 - PyGrid Network (http://network:7000)
 - PyGrid Node Alice (http://alice:5000)
 - PyGrid Node Bob (http://bob:5001)

This tutorial assumes that these components are running in the background. See the [instructions](https://github.com/OpenMined/PyGrid/tree/dev/examples#how-to-run-this-tutorial) for more details.

### Import dependencies
Here we import core dependencies

In [1]:
import syft as sy
from syft.grid.clients.data_centric_fl_client import DataCentricFLClient  # websocket client. It sends commands to the node servers

import torch
import torchvision
from torchvision import datasets, transforms

import requests

### Syft and client configuration
Now we hook Torch and connect the clients to the servers

In [2]:
# address
alice_address = "http://alice:5000" 
bob_address   = "http://bob:5001"

In [3]:
hook = sy.TorchHook(torch)

# Connect directly to grid nodes
compute_nodes = {}

compute_nodes["Alice"] = DataCentricFLClient(hook, alice_address)
compute_nodes["Bob"]   = DataCentricFLClient(hook, bob_address) 

# Check if they are connected
for key, value in compute_nodes.items(): 
    print("Is " + key + " connected?: " + str(value.ws.connected))

Is Alice connected?: True
Is Bob connected?: True
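
The loop above only prints the connection status. A minimal sketch of a fail-fast variant follows, assuming each client exposes the same `ws.connected` flag used above. The helper name `disconnected_nodes` and the stand-in objects are hypothetical and not part of PySyft.

```python
from types import SimpleNamespace


def disconnected_nodes(clients):
    """Return the names of clients whose websocket is not connected.

    Assumes every client exposes `ws.connected`, as in the cell above.
    """
    return [name for name, client in clients.items() if not client.ws.connected]


# Stand-in objects for illustration only; with real nodes, pass `compute_nodes`.
fake_nodes = {
    "Alice": SimpleNamespace(ws=SimpleNamespace(connected=True)),
    "Bob": SimpleNamespace(ws=SimpleNamespace(connected=False)),
}

missing = disconnected_nodes(fake_nodes)
print("Disconnected nodes:", missing)
```

With the real clients, `disconnected_nodes(compute_nodes)` should return an empty list before you continue to the next step.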


## 1 - Load dataset
Download (and load) the MNIST dataset

In [4]:
N_SAMPLES = 10000  # Number of samples
MNIST_PATH = './dataset'  # Path to save MNIST dataset

# Define a transformation.
transform = transforms.Compose([
                              transforms.ToTensor(),
                              transforms.Normalize((0.1307,), (0.3081,)),  #  mean and std 
                              ])

# Download and load MNIST dataset
trainset = datasets.MNIST(MNIST_PATH, download=True, train=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=N_SAMPLES, shuffle=True)

dataiter = iter(trainloader)
images_train_mnist, labels_train_mnist = next(dataiter)  # Train images and their labels (one batch of N_SAMPLES)
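
`transforms.Normalize` applies `(x - mean) / std` to each channel. As a small worked example in plain Python, the sketch below uses the mean and standard deviation from the cell above; the function name `normalize_pixel` is ours, for illustration only.

```python
MNIST_MEAN = 0.1307  # values passed to transforms.Normalize above
MNIST_STD = 0.3081


def normalize_pixel(value, mean=MNIST_MEAN, std=MNIST_STD):
    """Apply the same formula as transforms.Normalize to one pixel value."""
    return (value - mean) / std


# ToTensor scales pixels to [0, 1], so these are the two extremes.
black = normalize_pixel(0.0)
white = normalize_pixel(1.0)
print(f"black -> {black:.4f}, white -> {white:.4f}")
```

A pixel equal to the mean maps to zero, so the normalized images are roughly centered around zero.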

## 2 - Split dataset
We split our dataset ...

In [5]:
images_train_mnist = torch.split(images_train_mnist, int(len(images_train_mnist) / len(compute_nodes)), dim=0 ) #tuple of chunks (dataset / number of nodes)
labels_train_mnist   = torch.split(labels_train_mnist, int(len(labels_train_mnist) / len(compute_nodes)), dim=0 )  #tuple of chunks (labels / number of nodes)

... and we add tags to them so that we can search them later

In [6]:
for index, _ in enumerate(compute_nodes):
        
    images_train_mnist[index]\
        .tag("#X", "#mnist", "#dataset")\
        .describe("The input datapoints to the MNIST dataset.") 
    
    
    labels_train_mnist[index]\
        .tag("#Y", "#mnist", "#dataset") \
        .describe("The input labels to the MNIST dataset.")
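
A note on the split in step 2: `torch.split` with an integer size produces chunks of that size, with a smaller final chunk when the length is not evenly divisible. The sketch below reproduces that arithmetic in plain Python to show what the split yields for different node counts. The helper `chunk_sizes` is hypothetical.

```python
def chunk_sizes(n_samples, n_nodes):
    """Sizes of the chunks produced by splitting n_samples into pieces of
    int(n_samples / n_nodes) elements, mirroring the cell in step 2."""
    size = int(n_samples / n_nodes)
    full, remainder = divmod(n_samples, size)
    return [size] * full + ([remainder] if remainder else [])


print(chunk_sizes(10000, 2))
print(chunk_sizes(10000, 3))
```

With two nodes, each owner gets 5000 samples. With three nodes there would be a fourth chunk holding the single leftover sample, which the loops in this notebook would never tag or send.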


## 3 - Sending our tensor to grid nodes

The previous steps can be considered data preparation. In a more realistic scenario, Alice and Bob would already have their own data, so they would only need to load their tensors into their nodes.

In [7]:
for index, key in enumerate(compute_nodes):
    
    print("Sending data to", key)
    
    images_train_mnist[index].send(compute_nodes[key], garbage_collect_data=False)
    labels_train_mnist[index].send(compute_nodes[key], garbage_collect_data=False)

Sending data to Alice
Sending data to Bob


If everything went well, the tensors are now hosted on the nodes. Each GridNode has a specific endpoint for listing the tags of the tensors it hosts. Let's check it!

In [8]:
print("Alice's tags: ", requests.get(alice_address + "/data-centric/dataset-tags").json())
print("Bob's tags: ",   requests.get(bob_address   + "/data-centric/dataset-tags").json())

Alice's tags:  ['#mnist', '#Y', '#dataset', '#X']
Bob's tags:  ['#Y', '#dataset', '#X', '#mnist']
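
The tags appear in a different order for Alice and Bob in the output above, so a check should not depend on order. Below is a minimal sketch of an order-independent check. The helper name `missing_tags` is hypothetical, and the lists are copied from the output above rather than fetched from the nodes.

```python
EXPECTED_TAGS = {"#X", "#Y", "#mnist", "#dataset"}


def missing_tags(returned_tags, expected=EXPECTED_TAGS):
    """Return the expected tags absent from a node's response, sorted."""
    return sorted(set(expected) - set(returned_tags))


alice_tags = ['#mnist', '#Y', '#dataset', '#X']
bob_tags = ['#Y', '#dataset', '#X', '#mnist']

print("Alice is missing:", missing_tags(alice_tags))
print("Bob is missing:", missing_tags(bob_tags))
```

With live nodes, the same helper could be applied to the JSON list returned by the `requests.get(...)` calls in the cell above.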


**Now go ahead and continue with the [second part](02-FL-mnist-train-model.ipynb), where we will train a Federated Deep Learning model from scratch without having direct access to the data!**

# Congratulations!!! - Time to Join the Community!

Congratulations on completing this notebook tutorial! If you enjoyed this and would like to join the movement toward privacy-preserving, decentralized ownership of AI and the AI supply chain (data), you can do so in the following ways!

### Star PyGrid on GitHub

The easiest way to help our community is just by starring the GitHub repos! This helps raise awareness of the cool tools we're building.

- [Star PyGrid](https://github.com/OpenMined/PyGrid)

### Join our Slack!

The best way to keep up to date on the latest advancements is to join our community! You can do so by filling out the form at [http://slack.openmined.org](http://slack.openmined.org)

### Join a Code Project!

The best way to contribute to our community is to become a code contributor! At any time you can go to the PySyft GitHub Issues page and filter for "Projects". This will show you all the top-level tickets, giving an overview of the projects you can join! If you don't want to join a project but would like to do a bit of coding, you can also look for more "one-off" mini-projects by searching for GitHub issues marked "good first issue".

- [PySyft Projects](https://github.com/OpenMined/PySyft/issues?q=is%3Aopen+is%3Aissue+label%3AProject)
- [Good First Issue Tickets](https://github.com/OpenMined/PyGrid/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)

### Donate

If you don't have time to contribute to our codebase, but would still like to lend support, you can also become a Backer on our Open Collective. All donations go toward our web hosting and other community expenses such as hackathons and meetups!

[OpenMined's Open Collective Page](https://opencollective.com/openmined)