# MC<sup>2</sup>
This tutorial demonstrates how to use [MC<sup>2</sup>](https://github.com/mc2-project/mc2) (<b>M</b>ultiparty <b>C</b>ollaboration and <b>C</b>ompetition), our platform that enables collaborating parties to jointly perform analytics and train machine learning models on their sensitive data without sharing the contents of the data. In particular, this tutorial focuses on a module of MC<sup>2</sup> that supports gradient boosted decision tree learning, [Secure XGBoost](https://github.com/mc2-project/secure-xgboost).

Secure XGBoost uses secure enclaves, such as Intel SGX, to perform computation in a protected environment. Parties send their encrypted data to an untrusted server hosting Secure XGBoost, which loads the data into an enclave before decrypting it. Because enclaves provide encrypted regions of memory, even the OS, the hypervisor, and other (privileged) processes on the same machine cannot read the unencrypted data or the intermediate results of the computation.

However, secure enclaves have been shown to be vulnerable to many side-channel attacks. To combat this, Secure XGBoost redesigns GBDT learning algorithms to be data-oblivious, i.e. to make memory accesses independent of the input data. The use of data-oblivious algorithms eliminates a large class of leakage that side-channel attacks rely on to extract information.
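
To make "memory accesses independent of the input" concrete, here is a minimal sketch of an oblivious array read. It is not Secure XGBoost's implementation: `TracedList`, `direct_read`, and `oblivious_read` are hypothetical names introduced for illustration, and plain Python gives no real guarantees about timing or memory traces. The sketch only shows the access pattern: the oblivious version reads every position in the same order, whichever index is secret.

```python
class TracedList:
    """List wrapper that records the index of every read (illustration only)."""

    def __init__(self, items):
        self._items = list(items)
        self.trace = []

    def __len__(self):
        return len(self._items)

    def __getitem__(self, i):
        self.trace.append(i)
        return self._items[i]


def direct_read(values, secret_index):
    """Ordinary lookup: the position read reveals secret_index."""
    return values[secret_index]


def oblivious_read(values, secret_index):
    """Scan-based lookup: every position is read, in the same order."""
    result = 0
    for i in range(len(values)):
        v = values[i]
        # mask is 1 only at the secret index; the update is arithmetic,
        # so no branch depends on the secret
        mask = int(i == secret_index)
        result = mask * v + (1 - mask) * result
    return result


leaky = TracedList([10, 20, 30, 40])
hidden = TracedList([10, 20, 30, 40])
print(direct_read(leaky, 2), leaky.trace)
print(oblivious_read(hidden, 2), hidden.trace)
```

The trace of `direct_read` contains only the secret index, while the trace of `oblivious_read` is identical for every index. The cost is a full scan per read, which is the kind of overhead oblivious algorithms trade for reduced leakage.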

Secure XGBoost's architecture is shown below. Clients make requests to a central untrusted RPC orchestrator. The orchestrator queues incoming requests and relays a request to every enclave server only once all parties have submitted that request. Computation then happens in a distributed manner across the enclave cluster.

![Secure XGBoost architecture](figures/sys-arch.png)
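
The relay rule in the architecture can be sketched as a toy model. This is an assumption-laden illustration, not the MC<sup>2</sup> RPC orchestrator: `ToyOrchestrator`, the party names, and the enclave stand-ins are all hypothetical, and real requests would be authenticated RPC calls rather than strings.

```python
from collections import defaultdict


class ToyOrchestrator:
    """Hypothetical model of the relay rule; not the MC2 RPC orchestrator."""

    def __init__(self, parties, enclaves):
        self.parties = set(parties)
        self.enclaves = list(enclaves)    # callables standing in for enclave servers
        self.pending = defaultdict(set)   # request -> parties that have submitted it

    def submit(self, party, request):
        """Queue a request; relay it once every party has submitted the same one."""
        if party not in self.parties:
            raise ValueError(f"unknown party: {party}")
        self.pending[request].add(party)
        if self.pending[request] != self.parties:
            return None                   # still waiting on other parties
        del self.pending[request]
        return [enclave(request) for enclave in self.enclaves]


log = []


def make_enclave(name):
    """Build a stand-in enclave that just records what it was asked to run."""
    def handle(request):
        log.append((name, request))
        return f"{name} ran {request}"
    return handle


orch = ToyOrchestrator(
    parties=["alice", "bob"],
    enclaves=[make_enclave("enclave0"), make_enclave("enclave1")],
)
first = orch.submit("alice", "train")   # held: bob has not asked yet
second = orch.submit("bob", "train")    # all parties agree, so it is relayed
print(first)
print(second)
```

In this model nothing reaches an enclave until every party has asked for the same operation, which mirrors the requirement that all group members submit a request before it is executed.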

In this tutorial, we'll break everyone into small groups; each group will collaborate to jointly train a decision tree model. In practice, the central enclave server would be controlled by no single member of the collaboration. In this tutorial, however, one member of each group will start the RPC enclave server, which lets clients jointly orchestrate a training pipeline that runs inside an enclave. All group members will then submit requests to jointly execute the pipeline.

MC<sup>2</sup> is open source and available on [GitHub](https://github.com/mc2-project/mc2).

## 1. User Setup
First, we'll set you up as a user: enter a username, then generate a keypair, a certificate, and a symmetric key.

In [None]:
import securexgboost as mc2
from Utils import *

# TODO: Enter your username below
username = "chief"
cwd = "/home/mc2/risecamp/mc2/tutorial/"

In [None]:
# Run this cell to generate a keypair and a certificate
generate_certificate(username)
PUB_KEY = "config/{0}.pem".format(username)
CERT_FILE = "config/{0}.crt".format(username)

In [None]:
# Run this cell to generate a symmetric key
KEY_FILE = "key.txt"
mc2.generate_client_key(KEY_FILE)

## 2. Data Encryption


Let's first take a look at our training data. Our data is located at `/home/mc2/risecamp/mc2/tutorial/data/1_2agaricus.txt.train` and is in LibSVM format.

In [None]:
!tail -n 10 /home/mc2/risecamp/mc2/tutorial/data/1_2agaricus.txt.train
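
Each line in LibSVM format is a label followed by sparse `index:value` pairs. The sketch below parses one such line; `parse_libsvm_line` is a hypothetical helper written for this explanation, and the sample row is made up rather than read from the data file. It assumes a non-empty line with no trailing comment.

```python
def parse_libsvm_line(line):
    """Parse one LibSVM line into (label, {feature_index: value})."""
    fields = line.split()
    label = float(fields[0])
    features = {}
    for field in fields[1:]:
        # Each remaining field is "index:value"; unlisted features are zero
        index, value = field.split(":", 1)
        features[int(index)] = float(value)
    return label, features


sample = "1 3:1 10:1 11:1 21:1"  # made-up row, not taken from the dataset
label, features = parse_libsvm_line(sample)
print(label, features)
```

Only nonzero features appear in a row, which keeps sparse datasets compact.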

Next, use the symmetric key generated above to encrypt your data. The paths to your training data and test data are set in the cells below.

In [None]:
# Run this cell to encrypt your training data
training_data = "data/1_2agaricus.txt.train"
enc_training_data = cwd + "data/{}_train.enc".format(username)

# Encrypt training data
mc2.encrypt_file(training_data, enc_training_data, KEY_FILE)

In [None]:
# Run this cell to encrypt your test data
test_data = "data/agaricus.txt.test"
enc_test_data = cwd + "data/{}_test.enc".format(username)

# Encrypt test data
mc2.encrypt_file(test_data, enc_test_data, KEY_FILE)

Once we've encrypted our data, let's take a look to confirm it's encrypted.

In [None]:
# Inspect the encrypted training file (IPython expands {enc_training_data})
!tail -n 10 {enc_training_data}

In [None]:
# Store variables for use in subsequent notebooks
%store username
%store PUB_KEY 
%store CERT_FILE 
%store KEY_FILE 
%store enc_training_data 
%store enc_test_data
%store cwd

Once you've finished this step, wait for breakout rooms to reconverge. 

## 3. Enclave Server Setup
In practice, the enclave server is controlled by no single party. To complete this tutorial, however, one party in the collaboration will have to act as both a party and the enclave server. Designate one person in your group to control the enclave server.

If you've been designated as the enclave server, click [here](./rpc-orchestrator.ipynb) to go to the next notebook. You'll have to set up the enclave server before everyone can begin training. 

Otherwise, click [here](./tutorial-client.ipynb).