# MC<sup>2</sup>
This tutorial demonstrates how to use [MC<sup>2</sup>](https://github.com/mc2-project/mc2) (<b>M</b>ultiparty <b>C</b>ollaboration and <b>C</b>ompetition), our platform that enables collaborating parties to jointly perform analytics and train machine learning models on their sensitive data without sharing the contents of the data. In particular, this tutorial focuses on a module of MC<sup>2</sup> that supports gradient boosted decision tree learning, [Secure XGBoost](https://github.com/mc2-project/secure-xgboost).

Secure XGBoost leverages secure enclaves, e.g., Intel SGX, to perform computation in a secure environment. Parties can send their encrypted data to an untrusted server hosting Secure XGBoost, which will then load the data into an enclave before decrypting it. **Since enclaves provide encrypted regions of memory, even the OS, hypervisor, and other (privileged) processes on the same machine won't be able to see the unencrypted data or intermediate results during computation.**

Secure XGBoost's architecture is shown below. Clients make requests to a central, untrusted RPC orchestrator, which queues the requests and relays a given request to every enclave server only once all parties have submitted it. Computation then happens in a distributed manner across the enclave cluster.

![Secure XGBoost architecture](figures/sys-arch.png)
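The queue-then-relay behavior of the orchestrator can be illustrated with a small toy model. This is only a sketch of the idea, not the Secure XGBoost API: the `ToyOrchestrator` class, the `make_enclave` helper, and the party and request names are all hypothetical, and a real deployment uses RPC calls and authenticated, encrypted requests rather than plain Python function calls.

```python
from collections import defaultdict


class ToyOrchestrator:
    """Illustrative stand-in for the RPC orchestrator (hypothetical, not the real API)."""

    def __init__(self, parties, enclave_servers):
        self.parties = set(parties)
        # Each "enclave server" is modeled as a callable that accepts a request.
        self.enclave_servers = list(enclave_servers)
        # Maps a request to the set of parties that have submitted it so far.
        self.pending = defaultdict(set)

    def submit(self, party, request):
        """Record a party's request; relay it once every party has made it."""
        if party not in self.parties:
            raise ValueError(f"unknown party: {party}")
        self.pending[request].add(party)
        if self.pending[request] != self.parties:
            return None  # still waiting on at least one party
        del self.pending[request]
        # All parties agree, so relay the request to each enclave server.
        return [server(request) for server in self.enclave_servers]


def make_enclave(name):
    # Hypothetical enclave server that only reports what it was asked to run.
    return lambda request: f"{name} ran {request}"


orchestrator = ToyOrchestrator(
    parties=["user1", "user2"],
    enclave_servers=[make_enclave("enclave-0"), make_enclave("enclave-1")],
)
first = orchestrator.submit("user1", "train")   # queued: user2 has not asked yet
second = orchestrator.submit("user2", "train")  # both parties agree, so it is relayed
print(first)
print(second)
```

The first call returns `None` because only one party has made the request; the second call completes the set of parties, so the request is relayed to both toy enclaves.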

In this tutorial, you will play the roles of two different parties that want to collaborate without sharing the contents of their data. The two parties want to train a model on their pooled data, since a larger combined dataset typically yields a more robust model than either party could train alone.

In practice, all computation occurs on a central enclave cluster that no single party controls. For this tutorial, you will start the enclave server yourself; it enables clients to jointly orchestrate a training pipeline that runs inside an enclave. All parties will submit requests to execute the pipeline together.

MC<sup>2</sup> is open source and available on [GitHub](https://github.com/mc2-project/mc2).

## Mushroom Dataset
In this tutorial we'll be using the [Mushroom Dataset](https://archive.ics.uci.edu/ml/datasets/mushroom). This dataset contains 22 features, each of which represents a physical characteristic of a particular mushroom sample. Labels in this dataset are binary and represent whether a mushroom sample is edible. As a result, the dataset lends itself nicely to a binary classification task.

<img src="figures/mushroom.png" width="100"/>
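The features in this dataset are categorical, while gradient boosted trees operate on numeric inputs, so categorical values are commonly turned into 0/1 indicator columns (one-hot encoding) before training. The sketch below shows that idea on two made-up samples. The `one_hot_encode` helper, the feature values, and the label encoding are assumptions for illustration only; the data files used in this tutorial may already be preprocessed differently.

```python
def one_hot_encode(samples):
    """Map each (feature, value) pair to its own 0/1 indicator column."""
    # Collect every distinct (feature, value) pair and fix a column order.
    columns = sorted({(name, value) for s in samples for name, value in s.items()})
    index = {col: i for i, col in enumerate(columns)}
    rows = []
    for s in samples:
        row = [0] * len(columns)
        for name, value in s.items():
            row[index[(name, value)]] = 1
        rows.append(row)
    return columns, rows


# Hypothetical samples using three of the 22 features; values are illustrative.
samples = [
    {"cap-shape": "convex", "odor": "almond", "gill-size": "broad"},
    {"cap-shape": "bell", "odor": "foul", "gill-size": "narrow"},
]
labels = [1, 0]  # assumed encoding for this sketch: 1 = edible, 0 = not edible

columns, rows = one_hot_encode(samples)
for column in columns:
    print(column)
for row, label in zip(rows, labels):
    print(label, row)
```

Each sample becomes a fixed-length numeric vector paired with a binary label, which is the shape of input a binary classifier expects.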

Imagine that you're part of a mushroom enthusiast group and have stumbled across some mushroom samples whose edibility is unknown even after much examination. You could of course decide to try eating them, but eating even one poisonous mushroom would end your mushroom collecting career. Instead, you decide to team up with a few other mushroom enthusiasts and combine your data to train a more robust mushroom edibility classification model.

However, collecting all your mushroom samples was hard work -- you don't want other mushroom enthusiasts to have access to your hard-earned data, and consequently don't want to share your data in plaintext.

## Configuration
For this tutorial, you will play the role of two distinct mushroom enthusiasts who will work together to train a model on their aggregated data. Each enthusiast will 1) set up their user with MC<sup>2</sup>, 2) launch or await the enclave server, and 3) collaborate with the other enthusiast to jointly train a model on their pooled data.

From here on, each mushroom enthusiast has their own notebook and workflow:
* [Click here to get started with mushroom enthusiast 1](./Exercise%201%20-%20User%201.ipynb)
* [Click here to get started with mushroom enthusiast 2](./Exercise%201%20-%20User%202.ipynb)