In [None]:
##! IGNORE THIS if running on Google Colab
%load_ext notexbook

%texify

<img src="https://colab.research.google.com/img/colab_favicon_256px.png" width="5%" class="badges" />

In [None]:
# UNCOMMENT THIS ONLY if running on Google Colab

# !pip install syft==0.5.1
# !pip install protobuf==3.20

**ORIGINAL NOTEBOOK** [here](https://github.com/OpenMined/courses/tree/foundations-of-private-computation/federated-learning/duet_iris_classifier) from the PrivateAI Series

## Part 1: Join the Duet Server

In [None]:
import syft as sy

SERVER_ID = ""  # paste the Data Owner's server ID here

duet = sy.duet(SERVER_ID)
# If running both notebooks locally, replace the line above with:
# duet = sy.join_duet(loopback=True)

#### <img src="https://github.com/OpenMined/design-assets/raw/master/logos/OM/mark-primary-light.png" alt="he-black-box" width="5%" class="maxw5"/> Now STOP & Run the Data Owner notebook.

## Part 2: Search for Available Data


In [None]:
# The Data Scientist can list the searchable datasets in the Data Owner's Duet store
duet.store.pandas

The Data Scientist wants to use the Iris dataset. They need a pointer to the data and
a pointer to the target labels used for prediction.

In [None]:
data_ptr = duet.store["iris-data"]
target_ptr = duet.store["iris-target"]

`data_ptr` is a reference (a pointer) to the Iris feature data stored remotely on the Data Owner's server.

`target_ptr` is a reference to the corresponding Iris LABELS, also stored remotely on the Data Owner's server.
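To build intuition for what a pointer is, here is a toy model of the idea, not Syft's implementation: the Data Owner keeps objects in a store keyed by name, the Data Scientist only holds the key, and reading the actual value requires the owner's approval. `ToyStore`, `ToyPointer` and the `approve` policy callback are hypothetical names used only for illustration.

```python
from typing import Any, Callable, Dict, Optional


class ToyStore:
    """Hypothetical owner-side store: objects never leave it without approval."""

    def __init__(self, approve: Callable[[str, str], bool]) -> None:
        self._objects: Dict[str, Any] = {}
        self._approve = approve  # owner's policy: (key, reason) -> allow?

    def put(self, key: str, obj: Any) -> "ToyPointer":
        self._objects[key] = obj
        return ToyPointer(self, key)

    def __getitem__(self, key: str) -> "ToyPointer":
        if key not in self._objects:
            raise KeyError(key)
        return ToyPointer(self, key)

    def _release(self, key: str, reason: str) -> Optional[Any]:
        # Hand over the real object only if the owner's policy allows it.
        return self._objects[key] if self._approve(key, reason) else None


class ToyPointer:
    """Hypothetical pointer: holds a reference (store + key), never the data."""

    def __init__(self, store: ToyStore, key: str) -> None:
        self._store = store
        self.key = key

    def __repr__(self) -> str:
        return f"<ToyPointer -> {self.key}>"

    def get(self, reason: str) -> Optional[Any]:
        return self._store._release(self.key, reason)


# Toy owner policy: release the labels, never release the raw features.
toy_store = ToyStore(approve=lambda key, reason: key == "iris-target")
toy_store.put("iris-data", [[5.1, 3.5, 1.4, 0.2]])
toy_store.put("iris-target", [0])

toy_data_ptr = toy_store["iris-data"]
toy_target_ptr = toy_store["iris-target"]
print(toy_data_ptr)                         # <ToyPointer -> iris-data>
print(toy_data_ptr.get("peek"))             # None: the policy refuses
print(toy_target_ptr.get("check labels"))   # [0]
```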

In [None]:
print(data_ptr)
print(target_ptr)

## Part 3: Perform Logistic Regression on the Iris Dataset
Now the Data Scientist can run machine learning on the data held in the Data Owner's Duet server, without the Data Owner ever having to share the raw data.

#### Basic analysis

First, the Data Scientist needs some basic information about the dataset:
1. The length of the dataset (number of samples)
2. The input dimension (number of features)
3. The output dimension (number of classes)

This information has to be shared explicitly by the Data Owner. Let's look for it in the dataset descriptions.

In [None]:
print(duet.store.pandas["Description"][0])
print()
print(duet.store.pandas["Description"][1])

### Train model

In [None]:
import torch

In [None]:
in_dim = 4
out_dim = 3
n_samples = 150

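First, let's create our model. Although this part is framed as `Logistic Regression`, the network below has two hidden layers with ReLU activations, so it is a small multilayer perceptron; what it shares with (multinomial) logistic regression is its `log_softmax` output over the classes.

The model will be a `SyNet`, which is very similar to a standard `torch.nn.Module`:

- The main difference is that here we inherit from `sy.Module` instead of `nn.Module`.
- We also need to pass in a variable called `torch_ref`, which the model uses internally for any call you would normally make to `torch`.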
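The point of `torch_ref` is that the same model code can run against different backends: the local `torch`, or a remote proxy that forwards each call to the Data Owner. The toy below illustrates that dependency-injection pattern with hypothetical `LocalRef` and `RecordingRef` classes (the latter stands in for a remote backend and only records which operations would run); it is a sketch of the idea, not how Syft implements it.

```python
from typing import List


class LocalRef:
    """Hypothetical backend that computes results directly."""

    @staticmethod
    def relu(xs: List[float]) -> List[float]:
        return [max(0.0, x) for x in xs]


class RecordingRef:
    """Hypothetical 'remote' backend: records the call instead of computing it."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def relu(self, xs: str) -> str:
        self.calls.append("relu")
        return f"ptr(relu({xs}))"  # placeholder for a remote result


def tiny_forward(torch_ref, x):
    # The model code never names a backend; it only uses the torch_ref it was given.
    return torch_ref.relu(x)


print(tiny_forward(LocalRef, [-1.0, 2.0]))   # [0.0, 2.0]
recording_ref = RecordingRef()
print(tiny_forward(recording_ref, "x_ptr"))  # ptr(relu(x_ptr))
print(recording_ref.calls)                   # ['relu']
```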

In [None]:
class SyNet(sy.Module):
    def __init__(self, torch_ref):
        super(SyNet, self).__init__(torch_ref=torch_ref)
        self.layer1 = self.torch_ref.nn.Linear(in_dim, 20)
        self.layer2 = self.torch_ref.nn.Linear(20, 30)
        self.out = self.torch_ref.nn.Linear(30, out_dim)

    def forward(self, x):
        x = self.torch_ref.nn.functional.relu(self.layer1(x))
        x = self.torch_ref.nn.functional.relu(self.layer2(x))
        output = self.torch_ref.nn.functional.log_softmax(self.out(x), dim=1)
        return output
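
The layer sizes fix how many trainable values the Data Owner will hold for this model. Each `Linear(in, out)` layer has an `in × out` weight matrix plus `out` biases. A short sketch of that arithmetic for the 4 → 20 → 30 → 3 network defined above (`linear_param_count` is a hypothetical helper):

```python
def linear_param_count(in_features: int, out_features: int) -> int:
    # Weight matrix (out_features x in_features) plus one bias per output.
    return in_features * out_features + out_features


layer_sizes = [4, 20, 30, 3]  # in_dim, SyNet's hidden sizes, out_dim
per_layer = [
    linear_param_count(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])
]
print(per_layer)       # [100, 630, 93]
print(sum(per_layer))  # 823
```

For a plain `torch.nn.Module`, the same total is given by `sum(p.numel() for p in model.parameters())`.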

Now we can create a local model by passing our local copy of torch.

In [None]:
local_model = SyNet(torch)

Now we will send the local copy of the model to our partner's duet server.

In [None]:
remote_model = local_model.send(duet)  # send the model to the Data Owner for Remote Computation

Let's create an alias for our partner's torch called `remote_torch`, so that we can refer to the local torch as `torch` and to any operation we want to run remotely as `remote_torch`.

Remember, the return values from `remote_torch` are **Pointers**, not the real objects.

They mostly behave the same when used with other Pointers, but they cannot be mixed with local torch objects.
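As a rough mental model (a toy, not Syft's actual classes), an operation between two pointers yields another pointer describing a remote computation, while combining a pointer with a local value is simply refused, mirroring the rule above. `ToyRemoteValue` is a hypothetical name.

```python
class ToyRemoteValue:
    """Hypothetical pointer-like object: arithmetic yields another pointer."""

    def __init__(self, expr: str) -> None:
        self.expr = expr  # description of the remote computation

    def __add__(self, other: object) -> "ToyRemoteValue":
        if not isinstance(other, ToyRemoteValue):
            # In this toy model, mixing with a local object is rejected.
            raise TypeError("cannot mix a remote pointer with a local value")
        return ToyRemoteValue(f"({self.expr} + {other.expr})")


ptr_a = ToyRemoteValue("a")
ptr_b = ToyRemoteValue("b")
print((ptr_a + ptr_b).expr)  # (a + b)
try:
    ptr_a + 1.0
except TypeError as err:
    print("TypeError:", err)
```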

In [None]:
remote_torch = duet.torch

We will get a pointer to our remote model's parameters and then set up our optimizer.

Here, we use the Adam optimizer:
- `params` is a pointer to the list of the remote model's parameters.
- `optim` is a pointer to an Adam optimizer, created on the Data Owner's side, that can be used to optimize the remote model.
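For reference, a minimal sketch of a single Adam update for one scalar parameter, using PyTorch's default `betas=(0.9, 0.999)` and `eps=1e-8` together with the `lr=0.01` used in the next cell. The helper `adam_step` is hypothetical and only illustrates the math; in the notebook the real update happens on the Data Owner's side when `optim.step()` is called.

```python
import math
from typing import Tuple


def adam_step(
    param: float, grad: float, m: float, v: float, t: int,
    lr: float = 0.01, beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8,
) -> Tuple[float, float, float]:
    """One Adam update for a scalar; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias corrections
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


toy_param, toy_m, toy_v = 1.0, 0.0, 0.0
for step, g in enumerate([0.5, 0.5, -0.2], start=1):
    toy_param, toy_m, toy_v = adam_step(toy_param, g, toy_m, toy_v, step)
    print(f"step {step}: param={toy_param:.5f}")
```

On the first step `m_hat / sqrt(v_hat)` equals the sign of the gradient, so the parameter moves by about `lr` regardless of the gradient's magnitude; this per-parameter scaling is a common reason for choosing Adam over plain SGD on small problems like this one.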

In [None]:
params = remote_model.parameters()
optim = remote_torch.optim.Adam(params=params, lr=0.01)
print("params:", params)
print("optim:", optim)

Now we will create our `train` function.

It takes the number of `iterations`, the model to train (`remote_model`), `torch_ref`, `optim`, `data_ptr` and `target_ptr`.
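The `train` function calls `optim.zero_grad()` at the start of each iteration because PyTorch accumulates gradients: every `backward()` adds into each parameter's `.grad` instead of overwriting it. The toy below mimics that accumulation with a hypothetical `ToyParam` for the loss `(w * x - y) ** 2`, just to show what skipping the reset would do.

```python
class ToyParam:
    """Hypothetical scalar parameter whose gradient accumulates like torch's .grad."""

    def __init__(self, value: float) -> None:
        self.value = value
        self.grad = 0.0

    def backward_squared_error(self, x: float, y: float) -> None:
        # d/dw (w*x - y)^2 = 2 * x * (w*x - y); added to .grad, not assigned.
        self.grad += 2 * x * (self.value * x - y)

    def zero_grad(self) -> None:
        self.grad = 0.0


toy_w = ToyParam(1.0)
toy_w.backward_squared_error(x=2.0, y=1.0)  # grad = 2*2*(2-1) = 4.0
toy_w.backward_squared_error(x=2.0, y=1.0)  # no reset in between: grad = 8.0
print(toy_w.grad)  # 8.0
toy_w.zero_grad()
toy_w.backward_squared_error(x=2.0, y=1.0)
print(toy_w.grad)  # 4.0
```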

In [None]:
from tqdm.notebook import tqdm

def train(iterations, model, torch_ref, optim, data_ptr, target_ptr):

    losses = []

    for i in tqdm(range(iterations), desc="Epochs: "):

        optim.zero_grad()

        output = model(data_ptr)

        # nll_loss = negative log-likelihood loss
        loss = torch_ref.nn.functional.nll_loss(output, target_ptr.long())

        loss_item = loss.item()

        loss_value = loss_item.get(
            reason="To evaluate training progress", request_block=True, timeout_secs=5
        )

        if i % 10 == 0:
            print("Epoch", i, "loss", loss_value)

        losses.append(loss_value)

        loss.backward()

        optim.step()

    return losses
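
The model outputs `log_softmax` scores and the loss is `nll_loss`, which for each sample picks the log-probability of the true class, negates it, and averages over the batch (PyTorch's default `reduction="mean"`). A stdlib-only sketch of that computation, with hypothetical helpers named after the PyTorch functions:

```python
import math
from typing import List


def log_softmax(logits: List[float]) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    mx = max(logits)
    log_sum = mx + math.log(sum(math.exp(z - mx) for z in logits))
    return [z - log_sum for z in logits]


def nll_loss(log_probs: List[List[float]], targets: List[int]) -> float:
    # Mean over the batch of -log p(true class).
    return -sum(row[t] for row, t in zip(log_probs, targets)) / len(targets)


batch_logits = [[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]]
batch_targets = [0, 2]
loss_sketch = nll_loss([log_softmax(z) for z in batch_logits], batch_targets)
print(round(loss_sketch, 4))
```

With 3 classes, a model that predicts all classes with equal probability has a loss of ln 3 ≈ 1.0986, so the first printed losses of a freshly initialized model should be roughly in that range.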

In [None]:
iteration = 50
losses = train(iteration, remote_model, remote_torch, optim, data_ptr, target_ptr)

In [None]:
import matplotlib.pyplot as plt

In [None]:
plt.plot(range(iteration), losses)
plt.ylabel("Loss")
plt.xlabel("Iteration")
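
Each `loss_value` comes from a `.get()` request that the Data Owner must approve within `timeout_secs=5`. We assume here that a denied or timed-out request makes `.get()` return `None` instead of a number (worth checking against your Syft version), which would break the plot above. A defensive sketch that keeps only the iterations whose loss was actually received; `received_points` is a hypothetical helper:

```python
from typing import List, Optional, Tuple


def received_points(values: List[Optional[float]]) -> Tuple[List[int], List[float]]:
    # Keep (iteration, loss) pairs where the owner actually released a value.
    pairs = [(i, v) for i, v in enumerate(values) if v is not None]
    xs = [i for i, _ in pairs]
    ys = [v for _, v in pairs]
    return xs, ys


xs, ys = received_points([1.1, None, 0.9, 0.7])
print(xs, ys)  # [0, 2, 3] [1.1, 0.9, 0.7]
```

In the notebook, this would be used as `plt.plot(*received_points(losses))`.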

### Download model

In [None]:
def get_local_model(model):
    if not model.is_local:
        local_model = model.get(
            request_block=True,
            reason="To run test and inference locally",
            timeout_secs=5,
        )
    else:
        local_model = model

    return local_model


local_model = get_local_model(remote_model)

### Test on local data

In [None]:
import torch
import pandas as pd
import numpy as np
from sklearn.metrics import accuracy_score

In [None]:
SEED = 12345

from sklearn.model_selection import train_test_split
from sklearn import datasets


iris = datasets.load_iris()
X, y = iris.data, iris.target
_, X_test, _, y_test = train_test_split(X, y, random_state=SEED, test_size=0.2)
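
With a float `test_size`, `train_test_split` holds out `ceil(test_size * n_samples)` rows after a seeded shuffle, so `test_size=0.2` keeps 30 of Iris's 150 samples for testing. The stdlib sketch below (a hypothetical `split_indices` helper) reproduces that size arithmetic and the shuffle-then-slice idea, but not scikit-learn's exact permutation, which comes from NumPy's random generator.

```python
import math
import random
from typing import List, Tuple


def split_indices(n: int, test_size: float, seed: int) -> Tuple[List[int], List[int]]:
    # Shuffle all row indices reproducibly, then slice off the test portion.
    n_test = math.ceil(test_size * n)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[n_test:], idx[:n_test]


train_idx, test_idx = split_indices(150, 0.2, seed=12345)
print(len(train_idx), len(test_idx))  # 120 30
```

Note that if the Data Owner's `iris-data` contains all 150 samples (as `n_samples = 150` suggests), these test rows were also seen during training, so the accuracy measured below is likely optimistic.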

In [None]:
X_test = torch.FloatTensor(np.array(X_test))
y_test = torch.LongTensor(np.array(y_test))

In [None]:
preds = []
with torch.no_grad():
    for i in range(len(X_test)):
        sample = X_test[i]
        y_hat = local_model(sample.unsqueeze(0))
        pred = y_hat.argmax().item()
        print(f"Prediction: {pred} Ground Truth: {y_test[i].item()}")
        preds.append(pred)
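
Taking `argmax` of the `log_softmax` output gives the same class as taking it over the raw logits or the probabilities, because log-softmax subtracts the same constant from every logit in a row and so preserves their order. A stdlib check of that, with a hypothetical `argmax` helper:

```python
import math
from typing import List


def argmax(values: List[float]) -> int:
    # Index of the largest value (the first one wins on ties).
    return max(range(len(values)), key=lambda i: values[i])


example_logits = [0.3, 2.1, -0.7]
log_norm = math.log(sum(math.exp(z) for z in example_logits))
example_log_probs = [z - log_norm for z in example_logits]   # log-softmax
example_probs = [math.exp(lp) for lp in example_log_probs]   # softmax

print(argmax(example_logits), argmax(example_log_probs), argmax(example_probs))  # 1 1 1
```

Because the model's `forward` applies `log_softmax` along `dim=1`, the per-sample loop above should also be replaceable by one batched call, `local_model(X_test).argmax(dim=1)`.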

In [None]:
acc = accuracy_score(y_test, preds)
print(f"Overall test accuracy: {acc * 100:.2f}%")
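
`accuracy_score` is the fraction of predictions that match the ground truth. Since a single accuracy number can hide which classes get confused with each other, here is a stdlib sketch that computes both accuracy and a confusion matrix; `accuracy` and `confusion_counts` are hypothetical helpers, and `sklearn.metrics.confusion_matrix` provides the latter directly.

```python
from typing import List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    # Fraction of positions where the prediction equals the label.
    return sum(int(t == p) for t, p in zip(y_true, y_pred)) / len(y_true)


def confusion_counts(
    y_true: Sequence[int], y_pred: Sequence[int], n_classes: int
) -> List[List[int]]:
    # counts[t][p] = number of samples of true class t predicted as class p.
    counts = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    return counts


demo_true = [0, 1, 2, 2, 1]
demo_pred = [0, 1, 1, 2, 1]
print(accuracy(demo_true, demo_pred))  # 0.8
for row in confusion_counts(demo_true, demo_pred, 3):
    print(row)
```

With the notebook's variables, pass `y_test.tolist()` and `preds`, and use `n_classes=out_dim`.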