# MC<sup>2</sup>: Multiparty Collaboration & Coopetition

MC<sup>2</sup> contains a series of subprojects in the RISE Lab, all pertaining to <strong><u>M</u></strong>ultiparty <strong><u>C</u></strong>ollaboration and <strong><u>C</u></strong>oopetition. 
The particular project we'll be demonstrating today is Federated XGBoost, an extension of the <a href="https://xgboost.readthedocs.io/en/latest/">XGBoost</a> framework to the federated setting. 

The federated setting allows multiple parties to train a model over their collective data with the help of a centralized *aggregator* server. Instead of sending their entire data to the aggregator, the parties only send summary statistics. This provides two benefits: (a) it conserves bandwidth, and (b) it limits the amount of information leaked to other parties in the federation.
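
To make "summary statistics" concrete, here is a minimal sketch in which each party buckets one feature into bins and sends only the per-bin gradient sums, which the aggregator then adds together. This is an illustration under assumptions, not the actual MC<sup>2</sup> protocol: the helper names `local_histogram` and `aggregate`, the bin edges, and the toy values are all hypothetical.

```python
from bisect import bisect_right

def local_histogram(values, gradients, bin_edges):
    """Party side: sum gradients per feature bin; only these sums are sent."""
    hist = [0.0] * (len(bin_edges) + 1)
    for value, gradient in zip(values, gradients):
        hist[bisect_right(bin_edges, value)] += gradient
    return hist

def aggregate(histograms):
    """Aggregator side: element-wise sum of the parties' histograms."""
    return [sum(column) for column in zip(*histograms)]

# Hypothetical bin edges and toy data for two parties
bin_edges = [10.0, 20.0]
party_a = local_histogram([5.0, 12.0, 25.0], [0.5, -0.25, 0.25], bin_edges)
party_b = local_histogram([8.0, 30.0], [0.25, -0.5], bin_edges)
combined = aggregate([party_a, party_b])
```

The aggregator sees three numbers per party rather than the individual rows, which is where both the bandwidth saving and the reduced leakage come from.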

You can find the codebase for the MC<sup>2</sup> project here: https://github.com/mc2-project/mc2

## Dataset
### Allstate Claim Prediction Dataset
This dataset is used in the original XGBoost paper and is taken from a Kaggle competition.
The goal of the competition is to predict insurance claim payments given multiple datapoints about the insured vehicle.
Further information can be found [here](https://www.kaggle.com/c/ClaimPredictionChallenge).
We propose a use case in which an insurance company has multiple departments specializing in different makes of cars, and these departments are unable to share data with one another.
As such, a sample of the original Allstate Claim Prediction dataset is partitioned here into four groups, each of which represents one of these departments.
In the following exercises, you will represent one such department. Your task is to use the data provided to predict whether a new insurance claim will be greater than 0 or equal to 0 (binary classification).
You will then collaborate with the other departments, using our Federated XGBoost to collectively train a model without revealing all of your department's data to one another.
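
The labeling and partitioning described above can be sketched as follows. The field names `make` and `claim_amount`, the make-to-department mapping, and the sample rows are hypothetical; they are not the real column names of the Allstate dataset, and the tutorial's own partitioning may differ.

```python
def to_binary_label(claim_amount):
    """Return 1 if the claim payment is greater than 0, else 0."""
    return 1 if claim_amount > 0 else 0

def partition_by_department(rows, make_to_department):
    """Group rows by the department responsible for each vehicle make."""
    partitions = {}
    for row in rows:
        department = make_to_department[row["make"]]
        partitions.setdefault(department, []).append(row)
    return partitions

# Hypothetical sample rows and mapping
rows = [
    {"make": "A", "claim_amount": 0.0},
    {"make": "B", "claim_amount": 125.5},
    {"make": "A", "claim_amount": 40.0},
]
make_to_department = {"A": 1, "B": 2}

labels = [to_binary_label(row["claim_amount"]) for row in rows]
partitions = partition_by_department(rows, make_to_department)
```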

## Setup

During the setup phase, the parties first form a federation.

To simulate a federation, please get into groups of 3 or 4. Choose one member of the team to act as the centralized aggregator. 
<!--
Assign all other members of the federation a party ID from 1 to 3.
Create a Slack channel or group message and add all members of your federation.
-->

### 1. Public key infrastructure (PKI) ###

We have created a mock PKI service for the purpose of this tutorial. In this section of the tutorial, you will upload your public key to the PKI service and obtain the public keys of the other members of the federation. 

To use our mock service, you can use the following API:

```python 
class PKI:
    # No input arguments; connects to the PKI service
    def __init__(self):
        ...

    # Uploads the user's IP and public key to the PKI service; returns None
    def upload(self, username, ip_address, public_key):
        ...

    # Retrieves the IP and public key for user <username>
    def lookup(self, username):
        ...

    # Retrieves the public key for user <username> and saves it
    def save_key(self, username):
        ...
```
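
To get a feel for the call pattern before touching the real service, here is a hypothetical in-memory stand-in that mirrors the method names above. It is only a sketch: the class name `InMemoryPKI`, the `saved_keys` attribute, the assumption that `lookup()` returns an `(ip_address, public_key)` tuple, and the sample username, IP, and key are all made up for illustration and may not match the behavior of `Utils.PKI`.

```python
class InMemoryPKI:
    """Hypothetical local stand-in that mirrors the PKI method names."""

    def __init__(self):
        self._directory = {}  # username -> (ip_address, public_key)
        self.saved_keys = {}  # keys "saved" by save_key()

    def upload(self, username, ip_address, public_key):
        self._directory[username] = (ip_address, public_key)

    def lookup(self, username):
        # Assumption: returns an (ip_address, public_key) tuple
        return self._directory[username]

    def save_key(self, username):
        # Assumption: the real service persists the key; here it stays in memory
        _, public_key = self.lookup(username)
        self.saved_keys[username] = public_key

pki = InMemoryPKI()
pki.upload("alice", "203.0.113.7", "ssh-rsa AAAA... alice@example")
ip_address, public_key = pki.lookup("alice")
pki.save_key("alice")
```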

**Exercise**: Upload your username, IP address, and public key to the PKI service. Verify that your information has been uploaded using `lookup()`.

In [None]:
from Utils import PKI
from requests import get
from os.path import expanduser

# TODO: Add your username 
# username = ""

# Get your public IP address
IP = get('https://api.ipify.org').text

# Get your public key
with open(expanduser("~/.ssh/id_rsa.pub")) as key_file:
    pubkey = key_file.read().strip()

# TODO: Connect to the PKI service and upload your IP address and public key
# ...

# TODO: Verify that your information has been uploaded by using the lookup() API 
# ...

### 2. Federation ###

You will now use the Federation API to create and join a federation of participants. One participant will act as the centralized aggregator for the purposes of this tutorial. Before proceeding, confer with the other members of your federation and elect the aggregator.

The federation API for the aggregator is as follows.

```python 
class FederationAggregator:
    # Initialize a Federation instance using your username
    def __init__(self, username):
        ...

    # Creates a federation with self.username as the aggregator;
    # <members> is a list of usernames who may participate in the federation
    def create_federation(self, members):
        ...

    # Check to see if all the members of the federation have joined
    def check_federation(self):
        ...
```


The federation API for a participant is as follows.

```python 
class FederationMember:
    # Initialize a Federation instance using your username
    def __init__(self, username):
        ...

    # Join the federation created by user <aggregator_username> as a participant
    def join_federation(self, aggregator_username):
        ...

    # Check to see if all the members of the federation have joined
    def check_federation(self):
        ...
```
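
The create, join, and check flow implied by these two classes can be sketched with a small in-memory model. Everything here is hypothetical: `InMemoryFederation` and its methods are not part of `Utils`, and the sketch assumes that the `members` list names only the non-aggregator participants, which the tutorial does not state.

```python
class InMemoryFederation:
    """Hypothetical shared state standing in for the federation service."""

    def __init__(self):
        self.aggregator = None
        self.allowed = set()
        self.joined = set()

    def create(self, aggregator, members):
        # Aggregator side: record who may participate
        self.aggregator = aggregator
        self.allowed = set(members)

    def join(self, username, aggregator):
        # Member side: only listed users may join the named aggregator
        if aggregator != self.aggregator or username not in self.allowed:
            raise PermissionError(f"{username} may not join this federation")
        self.joined.add(username)

    def all_joined(self):
        # True once every allowed member has joined
        return self.aggregator is not None and self.joined == self.allowed

federation = InMemoryFederation()
federation.create("alice", ["bob", "carol"])
federation.join("bob", "alice")
before = federation.all_joined()
federation.join("carol", "alice")
after = federation.all_joined()
```

In this model the check only succeeds after the last listed member has joined, which is why everyone in your group needs to finish the join step before moving on.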



**Exercise**: If you've been elected the central aggregator, create a federation.
Otherwise, join the federation created by the aggregator.

In [None]:
from Utils import FederationAggregator

# Initialize a federation
fed = FederationAggregator(username)

# TODO: Add the usernames of the federation members to the list
# members = [""]

# TODO: Create a federation 
# ...

Join a federation as a member. Only do this step if you are not the aggregator.

In [1]:
from Utils import FederationMember

# Initialize a federation
fed = FederationMember(username)

# TODO: Add the username of the central aggregator
# aggregator = ""

# TODO: Join the federation created by the aggregator
# ...

Check whether everyone has joined the federation.

In [4]:
fed.check_federation()

No federation to check. Please create or join a federation first.


**Exercise**: If you're the aggregator, save the public keys of all the federation members. Otherwise, save the aggregator's public key. Use the PKI API.

In [None]:
# TODO: If you're the aggregator, add the usernames of all members of your federation to the members list
# members = [""]

# TODO: If you're a member (not the aggregator), add the username of the aggregator to the members list
# members = [aggregator]

# TODO: Save the keys of all users in the <members> list
# ...

## Table of Contents
Now that we've finished with setup, it's time to start the exercises. This tutorial consists of three exercises:
1. [Single Party XGBoost on Data Subset](Exercise%201.ipynb)
2. [Multiparty XGBoost with Centralized Training](Exercise%202.ipynb)
3. [Multiparty XGBoost with Federated Training](Exercise%203.ipynb)

Let's start with [Exercise 1](Exercise%201.ipynb).