# Client Collaboration

[Previous notebook](./Exercise%202%20-%20User%202.ipynb)

Run the following cells to get everything set up in preparation for model training and prediction.

```python
import securexgboost as mc2
from Utils import *

server_ip = "127.0.0.1"
server_port = "50052"
remote_addr = server_ip + ":" + server_port
```

**TODO:** Below, enter the username you chose for User 2 in Exercise 1.

```python
# TODO: Enter the username you used for Exercise 1
# below as a string. Ensure that your username doesn't
# contain any spaces.
username = # ...

data_dir = "/home/mc2/skycamp/mc2/tutorial/data/"
# Despite its name, PUB_KEY holds the path of the .pem file that is
# passed as `priv_key_file` to mc2.init_client() below.
PUB_KEY = "config/{0}.pem".format(username)
CERT_FILE = "config/{0}.crt".format(username)
KEY_FILE = "{}_key.txt".format(username)
enc_training_data = data_dir + "{}_train.enc".format(username)
enc_test_data = data_dir + "{}_test.enc".format(username)
```
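
The cell above derives every per-user path from `username`. The sketch below wraps that derivation in a function that also enforces the no-spaces rule before any path is built. It is a minimal sketch: the helper name `user_paths` is hypothetical and is not part of the tutorial's `Utils` module.

```python
def user_paths(username: str, data_dir: str) -> dict:
    """Build the per-user file paths used in this notebook (hypothetical helper)."""
    # Reject empty names and names containing any whitespace character.
    if not username or any(ch.isspace() for ch in username):
        raise ValueError("username must be non-empty and contain no whitespace")
    return {
        "priv_key_file": "config/{0}.pem".format(username),
        "cert_file": "config/{0}.crt".format(username),
        "sym_key_file": "{}_key.txt".format(username),
        "train": data_dir + "{}_train.enc".format(username),
        "test": data_dir + "{}_test.enc".format(username),
    }


paths = user_paths("bob", "/home/mc2/skycamp/mc2/tutorial/data/")
print(paths["train"])  # /home/mc2/skycamp/mc2/tutorial/data/bob_train.enc
```

Centralizing the naming scheme in one place keeps the key, certificate, and data paths consistent if the scheme ever changes.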

## 1. Data Transfer
Next, centralize the data by sending your training and test data to the enclave server. In practice, this would mean uploading your encrypted data to a remote server. Because this entire tutorial runs on a single machine, each of the two mushroom enthusiasts you are acting on behalf of will instead copy their encrypted data to a central directory.

The `transfer_data()` function below is a Python wrapper around the `scp` command-line utility.

```python
transfer_data(enc_training_data, server_ip)
transfer_data(enc_test_data, server_ip)
```
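
The tutorial does not show the body of `transfer_data()`. The sketch below shows what a Python wrapper around `scp` might look like. The function names `scp_command` and `transfer_data_sketch`, the `remote_dir` argument, and its default value are all assumptions, not the tutorial's actual implementation. Building the argument list separately from running it makes the command easy to inspect before anything is copied.

```python
import shlex
import subprocess


def scp_command(local_path: str, host: str, remote_dir: str = "/tmp/mc2_data/") -> list:
    # Hypothetical: the real transfer_data() may use a different
    # destination directory, remote user, or scp options.
    return ["scp", local_path, "{}:{}".format(host, remote_dir)]


def transfer_data_sketch(local_path: str, host: str, dry_run: bool = True) -> list:
    cmd = scp_command(local_path, host)
    if dry_run:
        # Only show the command that would be run.
        print(shlex.join(cmd))
    else:
        # check=True raises CalledProcessError if scp exits with a non-zero status.
        subprocess.run(cmd, check=True)
    return cmd


cmd = transfer_data_sketch("/home/mc2/skycamp/mc2/tutorial/data/bob_train.enc", "127.0.0.1")
# prints: scp /home/mc2/skycamp/mc2/tutorial/data/bob_train.enc 127.0.0.1:/tmp/mc2_data/
```

Passing the command as a list rather than a shell string avoids quoting problems if a path ever contains special characters.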

## 2. Client Initialization and Authentication
Once both mushroom enthusiasts have uploaded their data, we can initialize the client. To do so, you'll need the usernames you gave both mushroom enthusiasts.

**TODO:** Fill in the usernames of all parties in your collaboration.

```python
# Run this cell to initialize your client

# TODO: fill out `clients`

###########################################################
# `clients` is a Python list of the usernames you
# specified in each Exercise 1 notebook, e.g.
#
# clients = ["alice", "bob"]
###########################################################
clients = # ...
mc2.init_client(user_name=username, client_list=clients, sym_key_file=KEY_FILE,
                priv_key_file=PUB_KEY, cert_file=CERT_FILE, remote_addr=remote_addr)
```

Before we perform any computation, we want to verify, through remote attestation, that the enclave on the untrusted server has loaded the expected code. Secure XGBoost provides this functionality through the `attest()` API.

```python
# Verify that the enclave has been set up correctly
mc2.attest()
```
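
Real remote attestation typically relies on a hardware-signed report from the enclave, which is what `mc2.attest()` handles. The core idea can still be shown with a toy sketch. The client compares a measurement (a hash) of the code the enclave reports having loaded against a value it expects, and it refuses to proceed if they differ. Everything below is hypothetical, including `measure`, `check_measurement`, and `expected_measurement`. It deliberately omits the signature verification that a real attestation protocol requires.

```python
import hashlib
import hmac


def measure(code: bytes) -> str:
    # Toy "measurement": a SHA-256 digest of the enclave code.
    return hashlib.sha256(code).hexdigest()


def check_measurement(reported: str, expected: str) -> bool:
    # Constant-time comparison of the reported and expected measurements.
    return hmac.compare_digest(reported, expected)


trusted_code = b"enclave binary v1"
expected_measurement = measure(trusted_code)

print(check_measurement(measure(trusted_code), expected_measurement))        # True
print(check_measurement(measure(b"tampered binary"), expected_measurement))  # False
```

Any change to the code changes the digest, so a tampered enclave fails the check.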

## 3. Collaborative Training
Once we've attested the enclave, we can begin making requests to the enclave server. MC<sup>2</sup> lets users make requests through a Python API, but it executes a request only once all users in the collaboration have submitted that same request. Consequently, to compute collaboratively, all users must submit exactly the same requests in exactly the same order.

In particular, when you submit a request, the RPC orchestrator queues it and relays it to the enclave server only after every member of the collaboration has submitted the same request. As a result, a cell containing an MC<sup>2</sup> API call finishes executing only once all parties have called the same function and the enclave server has returned from it.
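
The consensus rule described above can be sketched in plain Python. This is a toy model, not MC<sup>2</sup>'s actual RPC orchestrator, and the class name `ToyOrchestrator` and its methods are hypothetical. Each client's requests are queued in order. A request is released to the simulated enclave only once every client's next pending request is identical.

```python
from collections import deque


class ToyOrchestrator:
    """Hypothetical model of the 'run only when everyone agrees' rule."""

    def __init__(self, clients):
        self.queues = {name: deque() for name in clients}
        self.executed = []  # requests released to the enclave, in order

    def submit(self, client, request):
        if client not in self.queues:
            raise KeyError("unknown client: " + client)
        self.queues[client].append(request)
        self._release_ready()

    def _release_ready(self):
        # Release requests while every client has a pending request at the front.
        while all(self.queues[c] for c in self.queues):
            heads = {self.queues[c][0] for c in self.queues}
            if len(heads) != 1:
                raise RuntimeError("clients submitted different requests: %s" % sorted(heads))
            request = heads.pop()
            for c in self.queues:
                self.queues[c].popleft()
            self.executed.append(request)


orch = ToyOrchestrator(["alice", "bob"])
orch.submit("alice", "DMatrix(train)")
print(orch.executed)  # [] (still waiting for bob)
orch.submit("bob", "DMatrix(train)")
print(orch.executed)  # ['DMatrix(train)']
```

In the notebook, the waiting state shows up as a cell that keeps running until the other party submits the same call. A mismatched request, which raises an error here, is why all parties must issue identical calls in the same order.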

Let's first prepare for training by loading everyone's encrypted training data within the enclave. 

**TODO:** Fill in the paths to each party's training data. Your training data is at `<your_username>_train.enc`.

MC<sup>2</sup>'s `DMatrix()` function takes a dictionary that maps each username to the path of that user's encrypted data:

`{"username1": "user1.data", "username2": "user2.data"}`. 
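
Every party must submit an identical request, so it can help to build this dictionary programmatically from the same list of usernames used in `mc2.init_client()`. The sketch below assumes the `<username>_train.enc` naming used in this tutorial. The helper name `training_inputs` is hypothetical, and the example uses `example_clients` so it does not overwrite your notebook's `clients` variable.

```python
def training_inputs(clients, suffix="_train.enc"):
    # Map each username to that party's encrypted data file.
    if len(set(clients)) != len(clients):
        raise ValueError("duplicate usernames in clients")
    return {name: name + suffix for name in clients}


example_clients = ["alice", "bob"]
print(training_inputs(example_clients))
# {'alice': 'alice_train.enc', 'bob': 'bob_train.enc'}
```

The resulting dictionary has the same shape as the literal shown above, so it could be passed to `mc2.DMatrix()` in place of writing the dictionary out by hand.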

```python
# TODO: fill in usernames and training data paths

###########################################################################################
# For example, given that you used usernames `alice` and `bob` in each notebook,
# the call would look like
#
# dtrain = mc2.DMatrix({"alice": "alice_train.enc", "bob": "bob_train.enc"})
###########################################################################################

dtrain = mc2.DMatrix({<****>: <****>,
                      <****>: <****>})
```

**Note:** The cell above finishes executing only once every party has run it. Until then, the cell blocks.

Next, jointly train a model over all mushroom samples shared by your group!

```python
# Set parameters
params = {
    "tree_method": "hist",
    "objective": "binary:logistic",
    "min_child_weight": "1",
    "gamma": "0.1",
    "max_depth": "3",
    "verbosity": "1"
}

num_rounds = 20
booster = mc2.train(params, dtrain, num_rounds)
```

**Note:** The cell above finishes executing only once every party has run it. Until then, the cell blocks.

## 4. Prediction Serving
Once we've jointly trained a model, we'll use the model to serve predictions on each party's test data. Each party should load its data into a separate object so that the model will output a set of predictions on only that party's test data. Predictions served by MC<sup>2</sup> are encrypted and can only be decrypted by the owner of the test data.

Remember that a request can only be executed if every party allows it. As a result, we'll need to submit a request to load test data for _every party_. 

**Make sure that every party in your collaboration loads the test data in the same order: `dtest1` must refer to the same party's data in every notebook, as must `dtest2`, and so on.**

**TODO:** Fill in usernames and paths to test data for each user. Your test data is at `<your_username>_test.enc`.

```python
# TODO: fill in usernames and test data paths
dtest1 = mc2.DMatrix({<****>: <****>})
dtest2 = mc2.DMatrix({<****>: <****>})
```

**Note:** The cell above finishes executing only once every party has run it (i.e., once you have run this cell in both Exercise 3 notebooks). Until then, the cell blocks.

Once we've loaded each party's test data, we'll need MC<sup>2</sup> to serve predictions on each set of test data.

The `predict()` function returns two values: `(encrypted_predictions, num_predictions)`.

```python
enc_preds1, num_preds1 = booster.predict(dtest1)
enc_preds2, num_preds2 = booster.predict(dtest2)
```

**Note:** The cell above finishes executing only once every party has run it. Until then, the cell blocks.

At this point, each party has obtained a set of encrypted predictions indicating whether their mysterious mushroom samples are likely edible. Decrypt the predictions to reap the benefits of the collaboration and of being a member of the mushroom enthusiast group.

**TODO:** Replace the arguments to `decrypt_predictions()` with the variables storing your test data's predictions. You can also try decrypting another party's predictions, but that will fail because you don't have the proper key.

```python
# Decrypt our predictions
# TODO: pass the variables that hold *your* test data's predictions
preds = booster.decrypt_predictions(enc_preds2, num_preds2)
print(preds[:10])
```

Next, compare the decrypted predictions with the true labels, which are in the leftmost column of the test data. Run the following cell to view the first ten records of your test data, which is stored locally.

```
!head -n 10 /home/mc2/skycamp/mc2/tutorial/data/agaricus2.txt.test
```
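
The sketch below scores predictions against the labels. It assumes each test line follows the LibSVM layout: the label (`0` or `1`) first, followed by `index:value` feature pairs. It also assumes the decrypted values are probabilities for class `1`, as `"objective": "binary:logistic"` produces in standard XGBoost, and it uses a 0.5 decision threshold. The helper names are hypothetical, and the sample lines and probabilities are made up for illustration, not taken from the real data or model.

```python
def read_labels(lines):
    # The label is the first whitespace-separated token on each LibSVM line.
    return [int(float(line.split()[0])) for line in lines if line.strip()]


def accuracy(probabilities, labels, threshold=0.5):
    # Convert probabilities to 0/1 class predictions, then compare with labels.
    if len(probabilities) != len(labels):
        raise ValueError("predictions and labels differ in length")
    predicted = [1 if p > threshold else 0 for p in probabilities]
    correct = sum(p == y for p, y in zip(predicted, labels))
    return correct / len(labels)


# Illustrative values only (not real model output).
sample_lines = ["0 1:1 9:1 19:1", "1 3:1 10:1 11:1", "0 4:1 7:1 11:1"]
sample_probs = [0.08, 0.93, 0.61]
labels = read_labels(sample_lines)
print(labels)                          # [0, 1, 0]
print(accuracy(sample_probs, labels))  # 0.6666666666666666
```

In the notebook, you could apply `read_labels()` to the first ten lines of your test file and pass the result to `accuracy()` together with `preds[:10]`.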

# Conclusion and Feedback

In this tutorial, you generated a key and used it to encrypt your sensitive training data. You and the other members of the collaboration transferred your respective encrypted data to a central location. You then collectively trained a model on the aggregated data and used the model to serve predictions on your test data.

Thank you for attending our tutorial. If you have a few minutes, we'd really appreciate it if you could give us feedback and fill out this [form](https://forms.gle/mRZNqMHa9Xgcrg9F6).