# MC<sup>2</sup>: Multiparty Collaboration & Coopetition
MC<sup>2</sup> is a collection of subprojects from the RISE Lab, all pertaining to multiparty collaboration and coopetition. This tutorial covers Federated XGBoost, an extension of the XGBoost gradient boosting framework that enables training in the federated setting. This is particularly important for use cases that require low-bandwidth training across multiple parties.

You can find the codebase here: https://github.com/mc2-project/mc2

## Dataset
### Allstate Claim Prediction Dataset
This dataset is used in the original XGBoost paper and is taken from a Kaggle competition.
The goal of the competition is to predict insurance claim payments given multiple datapoints about the insured vehicle.
Further information can be found [here](https://www.kaggle.com/c/ClaimPredictionChallenge).
We propose a use case in which an insurance company has multiple departments, each specializing in different makes of cars, and these departments are unable to share data with one another.
As such, a sample of the original Allstate Claim Prediction dataset is partitioned here into four groups, each of which represents one of these departments.
In the following exercises, you will represent one such department, and your task will be to use the information provided to predict whether a new insurance claim will be greater than 0 or equal to 0 (binary classification).
You will then collaborate with the other departments, using our federated, distributed XGBoost to collectively train a model without revealing all of your department's data to one another.
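
The partitioning and labeling described above can be sketched in plain Python. This is only an illustration on invented rows: the column names `make` and `claim_amount`, the helper names, and the round-robin assignment of makes to four departments are all assumptions, not the preprocessing actually used to prepare the tutorial data.

```python
from collections import defaultdict

NUM_DEPARTMENTS = 4  # the tutorial splits the sample into four groups


def binary_label(claim_amount):
    """Return 1 if the claim payment is greater than 0, else 0."""
    return 1 if claim_amount > 0 else 0


def partition_by_make(rows, num_departments=NUM_DEPARTMENTS):
    """Assign each car make to one department and group the rows accordingly.

    Makes are assigned round-robin in sorted order (an assumption made
    here so that the example is deterministic).
    """
    makes = sorted({row["make"] for row in rows})
    department_of = {make: i % num_departments for i, make in enumerate(makes)}
    partitions = defaultdict(list)
    for row in rows:
        # Copy the row and attach the binary target derived from the payment.
        labeled = dict(row, label=binary_label(row["claim_amount"]))
        partitions[department_of[row["make"]]].append(labeled)
    return dict(partitions)


# Toy rows; the values are invented for illustration.
rows = [
    {"make": "A", "claim_amount": 0.0},
    {"make": "B", "claim_amount": 120.5},
    {"make": "C", "claim_amount": 0.0},
    {"make": "D", "claim_amount": 42.0},
    {"make": "E", "claim_amount": 0.0},
]
partitions = partition_by_make(rows)
for department, dept_rows in sorted(partitions.items()):
    print(department, [(r["make"], r["label"]) for r in dept_rows])
```

Each department only ever sees its own entry in `partitions`, which mirrors the constraint that departments cannot share raw data.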

## Setup

To simulate a federation, please get into groups of 3 or 4. Choose one member of the team to act as the trusted central server. Assign all other members of the federation a party ID from 1 to 3.

Create a Slack channel or group message and add all members of your federation.

Obtain your IP address and your SSH public key. Keep these on hand, as we'll need them in later parts of the tutorial.

In [None]:
# External IP
!dig +short myip.opendns.com @resolver1.opendns.com

In [None]:
# SSH public key
!cat ~/.ssh/id_rsa.pub
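
If you prefer to read the key from Python rather than the shell, the following is a minimal sketch. The helper names `read_public_key` and `split_public_key` are hypothetical, and the sketch assumes the usual one-line OpenSSH public key layout of key type, base64 data, and an optional comment.

```python
from pathlib import Path


def read_public_key(path="~/.ssh/id_rsa.pub"):
    """Read an OpenSSH public key file and return its contents as one line."""
    key_path = Path(path).expanduser()  # Path does not expand "~" on its own
    if not key_path.is_file():
        raise FileNotFoundError(f"No public key found at {key_path}")
    return key_path.read_text().strip()


def split_public_key(line):
    """Split '<key type> <base64 data> [comment]' into its three parts."""
    parts = line.split(None, 2)
    if len(parts) < 2:
        raise ValueError("expected '<key type> <base64 data> [comment]'")
    key_type, data = parts[0], parts[1]
    comment = parts[2] if len(parts) == 3 else ""
    return key_type, data, comment


# Example with a made-up, truncated key string (not a real key).
print(split_public_key("ssh-rsa AAAAB3Nza user@host"))
```

Calling `read_public_key()` with no arguments reads the same file as the `cat` command above.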

Add your username, IP address, and public key to the lookup service.

In [33]:
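import importlib

# Import the module first; reload() raises NameError on a name that is not yet bound.
import Federation
importlib.reload(Federation)  # pick up any local edits to the module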

pki = Federation.PKI()
pki.upload_key("rishabh", "127.0.0.1")

Retrieve user information from the lookup service.

In [40]:
import importlib
importlib.reload(Federation)

IP, key = pki.lookup("wenting")
print(IP, key)

No such user found
None None


Create a federation (you will be the master).

In [32]:
import Federation
import importlib
importlib.reload(Federation)

fed = Federation.Federation()
members = ["alice", "bob", "chris"]
fed.create_federation("mike", members)

unhashable type: 'dict'


Join a federation as a member.

In [17]:
import Federation
import importlib
importlib.reload(Federation)

fed = Federation.Federation()
fed.join_federation("bob", "mike")

Check whether everyone has joined the federation.

In [24]:
import Federation
import importlib
importlib.reload(Federation)

fed = Federation.Federation()
check = fed.check_federation("mike")
print(check)

['alice', 'bob']
True
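
Because members may join at different times, the check may need to be repeated. The following is a minimal polling sketch. `wait_until` and its parameters are hypothetical helpers, and it assumes the check returns a truthy value once everyone has joined, which is consistent with the `True` printed above but is not confirmed here. A stand-in function replaces the real call so the example runs on its own.

```python
import time


def wait_until(check, timeout=60.0, interval=2.0):
    """Call check() repeatedly until it returns a truthy value or time runs out.

    Returns True if check() succeeded, False if the timeout was reached.
    """
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for the real federation check: succeeds on the third call.
attempts = {"count": 0}


def fake_check():
    attempts["count"] += 1
    return attempts["count"] >= 3


joined = wait_until(fake_check, timeout=5.0, interval=0.01)
print(joined, attempts["count"])
```

In the notebook, the stand-in would presumably be replaced by something like `lambda: fed.check_federation("mike")`.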


If you're the central server, add all parties' SSH public keys to your `authorized_keys` file. Otherwise, add the central server's SSH public key to your `authorized_keys` file.

In [None]:
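import os

# TODO: add the usernames of all members of your federation to the members list.
members = []
# open() does not expand "~", so resolve the home directory explicitly.
authorized_keys_path = os.path.expanduser("~/.ssh/authorized_keys")
with open(authorized_keys_path, "a") as authorized_keys:
    for member in members:
        IP, key = pki.lookup(member)
        if key is None:  # lookup returns None when the user is not found
            continue
        authorized_keys.write(key + "\n")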
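
Re-running the cell above appends the same keys again. One way to avoid duplicates is sketched below. `append_unique_keys` is a hypothetical helper, not part of the `Federation` module, and the demonstration writes to a temporary file with made-up key strings rather than to the real `authorized_keys`.

```python
import os
import tempfile


def append_unique_keys(path, keys):
    """Append each key to the file at path unless it is missing or already present.

    Returns the list of keys that were actually written.
    """
    path = os.path.expanduser(path)
    existing = set()
    if os.path.exists(path):
        with open(path) as f:
            existing = {line.strip() for line in f if line.strip()}
    written = []
    with open(path, "a") as f:
        for key in keys:
            if key is None:  # e.g. a failed lookup
                continue
            key = key.strip()
            if not key or key in existing:
                continue
            f.write(key + "\n")
            existing.add(key)
            written.append(key)
    return written


# Demonstration on a temporary file with invented key strings.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "authorized_keys")
    first = append_unique_keys(demo_path, ["ssh-rsa AAAA alice", None, "ssh-rsa BBBB bob"])
    second = append_unique_keys(demo_path, ["ssh-rsa AAAA alice", "ssh-rsa CCCC chris"])
    with open(demo_path) as f:
        contents = f.read().splitlines()
print(first, second, contents)
```

The sketch assumes the existing file ends with a newline; if it does not, the first appended key would be joined to the last existing line.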

## Table of Contents
Now that we've finished with setup, it's time to start the exercises. This tutorial consists of three exercises:
1. [Single Party XGBoost on Data Subset](Exercise%201.ipynb)
2. [Multiparty XGBoost with Centralized Training](Exercise%202.ipynb)
3. [Multiparty XGBoost with Federated Training](Exercise%203.ipynb)

Let's start with [Exercise 1](Exercise%201.ipynb).