In this notebook we sketch the implementation of [Artemis](https://arxiv.org/pdf/2006.14591.pdf). In Artemis, every device $i$ keeps a memory variable $h_{i}$ that tracks its local gradient in order to ensure convergence, and sends the server a compressed version of the difference between its current gradient and this memory.
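To make the memory mechanism concrete, here is a minimal sketch of one uplink round in plain Python, under our reading of the paper. Device $i$ sends $\mathcal{C}(g_i - h_i)$, the server averages $\mathcal{C}(g_i - h_i) + h_i$ over devices, and each device updates $h_i \leftarrow h_i + \alpha\,\mathcal{C}(g_i - h_i)$. The function `artemis_uplink_round`, the list-of-floats vectors and the step size `alpha` are illustrative assumptions, not the API of `src.deeplearning.FederatedLearningAlgo`. The paper also compresses the server-to-device message, which this sketch leaves out.

```python
from typing import Callable, List

Vector = List[float]


def artemis_uplink_round(grads: List[Vector], memories: List[Vector],
                         compress: Callable[[Vector], Vector], alpha: float) -> Vector:
    """Hypothetical sketch of one Artemis-style uplink round.

    Device i sends delta_i = compress(g_i - h_i) and updates its memory h_i in place.
    Returns the server estimate of the mean gradient, mean_i(delta_i + h_i),
    computed with the memories as they were *before* this round's update.
    """
    n, d = len(grads), len(grads[0])
    estimate = [0.0] * d
    for g, h in zip(grads, memories):
        delta = compress([gj - hj for gj, hj in zip(g, h)])  # the only message sent up
        for j in range(d):
            estimate[j] += (delta[j] + h[j]) / n  # server adds back the memory h_i
            h[j] += alpha * delta[j]              # memory moves toward g_i
    return estimate


# Usage: with the identity "compressor" and alpha = 1, one round stores g_i in h_i.
grads = [[1.0, 2.0], [3.0, -1.0]]
memories = [[0.0, 0.0], [0.0, 0.0]]
print(artemis_uplink_round(grads, memories, lambda v: v, alpha=1.0))  # [2.0, 0.5]
print(memories)  # [[1.0, 2.0], [3.0, -1.0]]
```

In practice the server does not need every $h_i$: it only needs their average $\frac{1}{N}\sum_i h_i$, which it can update from the messages it receives.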

In [2]:
import sys; sys.path.insert(0, '../..')

%matplotlib inline
%load_ext autoreload
%autoreload 2

import torch
from torch.utils.data import DataLoader, Subset
from torchvision.datasets import MNIST
import torchvision.transforms as transforms
from tqdm import tqdm

from src.utils.Constants import NB_EPOCH, NB_RUN, DEVICE_TYPE, LR, BATCH_SIZE, NB_WORKERS
from src.utils.Utilities import pickle_loader, pickle_saver
from src.deeplearning.Compression import QuantizationCompressor
from src.deeplearning.Utilities import multiple_run_descent, ResultsOfSeveralDescents
from src.deeplearning.Parameters import Parameters
from src.deeplearning.FederatedLearningAlgo import *  # provides VanillaSGD, Diana, Artemis used below


filename = "mnist"



# Compression 


We first need a compression scheme; in this example we use quantization, implemented in `QuantizationCompressor`.
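As a point of comparison for `QuantizationCompressor` (still untested, see the TODO list), here is a minimal sketch of unbiased stochastic quantization in the style of QSGD. Each coordinate $x_j$ is mapped to $\operatorname{sign}(x_j)\,\|x\|_2\,\xi_j/s$, where $\xi_j$ rounds $s|x_j|/\|x\|_2$ up or down at random so that $\mathbb{E}[Q(x)] = x$; unbiasedness matters because, as far as we understand, the Artemis analysis assumes unbiased compressors. Whether `QuantizationCompressor` uses exactly this scheme is an assumption, and the name `quantize` and its parameters `s` and `rng` are our own.

```python
import math
import random
from typing import List


def quantize(x: List[float], s: int, rng: random.Random) -> List[float]:
    """Unbiased stochastic quantization of x onto s levels per coordinate (QSGD-style)."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return [0.0] * len(x)
    out = []
    for v in x:
        level = s * abs(v) / norm  # real number in [0, s]
        lower = math.floor(level)
        # Round up with probability equal to the fractional part, so E[q] = level.
        q = lower + (1 if rng.random() < level - lower else 0)
        out.append(math.copysign(norm * q / s, v))
    return out


rng = random.Random(0)
x = [0.3, -1.2, 0.5]
print(quantize(x, s=4, rng=rng))  # each entry is +/- k * ||x|| / 4 for an integer k in 0..4
```

Only the norm, the signs and one small integer per coordinate need to be transmitted, which is where the savings come from.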

# Dataset

We simply use i.i.d. MNIST here; it can be replaced by any other dataset later. Note that you only need to provide a list of `torch.utils.data.DataLoader` objects, one per client, each loading that client's local dataset.


In [3]:
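trans = transforms.Compose([
    transforms.ToTensor(),                      # pixels scaled to [0, 1]
    transforms.Normalize((0.5,), (1.0,)),       # then shifted to [-0.5, 0.5]
])

# Keep only the first 300 MNIST training images so that each run stays fast.
dataset = Subset(MNIST(root="./data", download=True, transform=trans), range(300))

# One DataLoader per client. Every client currently samples from the same 300 images,
# i.e. all clients hold identical (hence trivially i.i.d.) local datasets.
loaders = [DataLoader(dataset, shuffle=True, batch_size=BATCH_SIZE) for _ in range(NB_WORKERS)]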
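The cell above gives every client the same 300 images. If disjoint i.i.d. shards are wanted instead, one option is to shuffle the indices once and deal them round-robin to the clients, then wrap each shard in `Subset` and a `DataLoader`. The helper `iid_split_indices` below is hypothetical (it is not part of `src`) and uses only the standard library so it can be checked on its own.

```python
import random
from typing import List


def iid_split_indices(n_samples: int, n_clients: int, seed: int = 0) -> List[List[int]]:
    """Shuffle range(n_samples) and deal it into n_clients disjoint, near-equal shards."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[k::n_clients] for k in range(n_clients)]


shards = iid_split_indices(300, n_clients=4)
print([len(s) for s in shards])  # [75, 75, 75, 75]

# With the notebook's objects this would become (not executed here):
# loaders = [DataLoader(Subset(dataset, idx), shuffle=True, batch_size=BATCH_SIZE)
#            for idx in iid_split_indices(len(dataset), NB_WORKERS)]
```

Dealing round-robin after a single shuffle keeps shard sizes within one sample of each other, and the fixed `seed` makes the split reproducible across runs.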


# Learner

We also implement a `Learner` class, which is used to train and evaluate a deep-learning model.

# Models

We use a two-layer neural network; it can be replaced by any other model.

## Artemis


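This is the core class of this project.

As a reference, we first approximate the optimal value of the objective: we run `VanillaSGD` for 100 rounds and save its final loss as `obj_min`.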

In [4]:
%%time
reference = VanillaSGD(Parameters(), loaders)  # plain SGD, used only to estimate obj_min
for _ in range(100):
    reference.step()
print("---> final loss:", reference.losses[-1])
obj_min = reference.losses[-1]
pickle_saver(obj_min, filename + "-obj_min")

---> final loss: 11.417719316482543
CPU times: user 2min 34s, sys: 2min 57s, total: 5min 32s
Wall time: 1min 1s


In [None]:
%%time
all_descent = {}
# Keys 1, 2, 3 follow the order of the list below (all_descent[3] is Artemis).
for cpt, fl_algo in enumerate(tqdm([VanillaSGD, Diana, Artemis]), start=1):
    all_descent[cpt] = multiple_run_descent(fl_algo=fl_algo, parameters=Parameters(), loaders=loaders)
res = ResultsOfSeveralDescents(all_descent, NB_WORKERS)
pickle_saver(res, filename + "-descent")

  0%|          | 0/3 [00:00<?, ?it/s]

---> final loss: 11.51985592842102


 33%|███▎      | 1/3 [00:31<01:02, 31.05s/it]

---> final loss: 11.51985592842102
---> final loss: 11.516035795211792


 67%|██████▋   | 2/3 [01:04<00:31, 31.73s/it]

---> final loss: 11.516035795211792


In [None]:
from src.utils.ErrorPlotter import *
obj_min = pickle_loader(filename + "-obj_min")
res = pickle_loader(filename + "-descent")


plot_error_dist(res.get_loss(obj_min), res.names, res.nb_devices_for_the_run, res.n_dimensions,
                all_error=res.get_std(obj_min), x_legend="Number of passes on data\n(iid)")
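`res.get_loss(obj_min)` and `res.get_std(obj_min)` are defined in `src` and are not shown here. A common way to build such curves, sketched below under that assumption, is to take for each iteration the mean and standard deviation over runs of $\log_{10}(f_t - f_{\min})$, where $f_{\min}$ is `obj_min`. Because `obj_min` is only the loss reached after 100 rounds of `VanillaSGD`, a later run can go below it, so the sketch clips the gap at a small `eps` before taking the logarithm. The function name `log_excess_loss_stats` and the clipping are our own choices.

```python
import math
import statistics
from typing import List, Tuple


def log_excess_loss_stats(runs: List[List[float]], obj_min: float,
                          eps: float = 1e-12) -> Tuple[List[float], List[float]]:
    """Per-iteration mean and population std, over runs, of log10(loss - obj_min)."""
    n_iter = min(len(r) for r in runs)  # truncate to the shortest run
    means, stds = [], []
    for t in range(n_iter):
        # Clip at eps: obj_min is only an approximation of the true minimum.
        values = [math.log10(max(r[t] - obj_min, eps)) for r in runs]
        means.append(statistics.mean(values))
        stds.append(statistics.pstdev(values))
    return means, stds


means, stds = log_excess_loss_stats([[12.0, 11.1], [12.0, 11.01]], obj_min=11.0)
print(means, stds)  # means ~ [0.0, -1.5], stds ~ [0.0, 0.5]
```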

In [None]:
all_descent[3].losses

# TODO tasks

* Complete the #TODO tasks left inside the code
* Use only one learner instead of a list of learners for memory efficiency
* Check that the compression class works (I copied it directly from the choco-SGD repository without any tests)
* Test the evolution of the parameters on a toy example, e.g. linear regression.
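For the last item, here is a minimal sketch of such a toy test, written with the standard library only and independent of `src`. Four workers hold different least-squares problems (shifted feature means and target vectors, so their local optima differ). Each round, every worker compresses $g_i - h_i$ with an unbiased random sparsifier (keep a coordinate with probability $p$, rescale by $1/p$; we swap this in for quantization only to keep the code short), and the server steps along $\frac{1}{N}\sum_i(\mathcal{C}(g_i - h_i) + h_i)$. Setting `alpha=0` freezes the memories at zero, which turns the same loop into plain compressed gradient descent, so both variants can be compared. All names, sizes and step sizes are illustrative choices.

```python
import random
from typing import List, Tuple


def sparsify(x: List[float], p: float, rng: random.Random) -> List[float]:
    """Unbiased random sparsification: keep each coordinate w.p. p, rescaled by 1/p."""
    return [v / p if rng.random() < p else 0.0 for v in x]


def run_toy(alpha: float, n_workers: int = 4, n_points: int = 20, rounds: int = 400,
            lr: float = 0.05, p: float = 0.5, seed: int = 0) -> List[float]:
    """Distributed least squares with compressed uplink; returns the global loss per round."""
    rng = random.Random(seed)
    w_true = [1.0, -2.0, 0.5]
    dim = len(w_true)
    data: List[Tuple[List[List[float]], List[float]]] = []
    for i in range(n_workers):
        # Worker-specific feature mean and target vector -> heterogeneous local optima.
        w_i = [wt + 0.3 * i for wt in w_true]
        X = [[rng.gauss(0.5 * i, 1.0) for _ in range(dim)] for _ in range(n_points)]
        y = [sum(a * b for a, b in zip(x, w_i)) + rng.gauss(0.0, 0.1) for x in X]
        data.append((X, y))

    def residuals(w, X, y):
        return [sum(a * b for a, b in zip(x, w)) - t for x, t in zip(X, y)]

    def local_grad(w, X, y):
        # Gradient of the local loss (1 / 2n) * sum_k (x_k . w - y_k)^2
        r = residuals(w, X, y)
        return [sum(rk * x[j] for rk, x in zip(r, X)) / len(X) for j in range(dim)]

    def global_loss(w):
        return sum(sum(rk * rk for rk in residuals(w, X, y)) / (2 * len(X))
                   for X, y in data) / n_workers

    w = [0.0] * dim
    memories = [[0.0] * dim for _ in range(n_workers)]
    losses = [global_loss(w)]
    for _ in range(rounds):
        estimate = [0.0] * dim
        for (X, y), h in zip(data, memories):
            g = local_grad(w, X, y)
            delta = sparsify([gj - hj for gj, hj in zip(g, h)], p, rng)  # uplink message
            for j in range(dim):
                estimate[j] += (delta[j] + h[j]) / n_workers
                h[j] += alpha * delta[j]
        w = [wj - lr * ej for wj, ej in zip(w, estimate)]
        losses.append(global_loss(w))
    return losses


with_memory = run_toy(alpha=0.5)
without_memory = run_toy(alpha=0.0)
print(with_memory[0], with_memory[-1], without_memory[-1])
```

Because the local gradients do not vanish at the global optimum, we expect the `alpha=0` run to keep fluctuating around the minimum, while the memories in the `alpha=0.5` run absorb those gradients and let it settle; plotting the two returned lists is a quick way to check this.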