## Multiparty XGBoost with Federated Training
We will now discuss running XGBoost in the federated setting. In each iteration, each party sends the central server a summary of the update made to its model. The server aggregates these updates, applies the aggregated update to its model, and broadcasts the new model to all parties. The parties then train locally on the new model and send their next updates to the central server.

![title](img/exercise2.png)
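
The round structure described above can be sketched with a toy example. This is only an illustration of the "local summary, aggregate, broadcast" loop on a one-parameter linear model; it is not the protocol our Federated XGBoost implementation uses, and the names `local_update`, `aggregate`, and `federated_round` are hypothetical.

```python
# Toy illustration of federated rounds on a 1-D linear model y = w * x.
# Not the project's real protocol; all names here are hypothetical.

def local_update(weight, data):
    """Party side: summarize local data as (summed gradient, sample count)."""
    grad = sum(2 * (weight * x - y) * x for x, y in data)
    return grad, len(data)

def aggregate(updates):
    """Server side: combine per-party summaries into one averaged gradient."""
    total_grad = sum(g for g, _ in updates)
    total_n = sum(n for _, n in updates)
    return total_grad / total_n

def federated_round(weight, parties, lr=0.05):
    """One round: collect summaries, aggregate, and return the new shared model."""
    updates = [local_update(weight, data) for data in parties]
    return weight - lr * aggregate(updates)

# Two parties whose private data both follow y = 2 * x
parties = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
]

weight = 0.0
for _ in range(200):
    weight = federated_round(weight, parties)

print("Learned weight: %.4f" % weight)
```

Note that the raw data never leaves a party in this sketch; only the gradient summary and the sample count are shared with the server.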

In our project, all this is abstracted away. The central server simply starts the training, and everything else is performed automatically.

Import some helper functions.

In [None]:
import pandas as pd
import xgboost as xgb
import subprocess
from Utils import start_job
import warnings
warnings.filterwarnings('ignore')

### Edit hosts.config
The `hosts.config` file should contain the IPs and ports of all parties in the federation.
Retrieve the IPs of all members in the federation from the PKI and write them to the `hosts.config` file. We'll need this file for communication during federated training.

In [None]:
# TODO: add your username here. Make sure this is the same username you used 
# to create the federation
username = # ...

In [None]:
# Get the IPs of all members in your federation and add them to hosts.config
from Utils import PKI, Federation

pki = PKI()
fed = Federation()

members = fed.get_federation_members(username)

with open("hosts.config", "w+") as hosts:
    for member in members:
        IP, key = pki.lookup(member)
        
        # Write the member's IP address and port 5522 to hosts.config
        hosts.write(IP +":5522\n")
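
Since a malformed entry in `hosts.config` may only surface once training starts, it can help to validate the file first. The sketch below assumes one `IP:port` entry per line with IPv4 addresses, as written by the cell above; `parse_hosts` is a hypothetical helper and is not part of `Utils.py`.

```python
import ipaddress

def parse_hosts(text):
    """Parse 'IP:port' lines and return a list of (ip, port) tuples.

    Raises ValueError on a malformed line. Hypothetical helper, not in Utils.py.
    """
    hosts = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        ip, sep, port = line.rpartition(":")
        if not sep:
            raise ValueError("line %d: expected IP:port, got %r" % (line_no, line))
        ipaddress.ip_address(ip)  # raises ValueError if the IP is invalid
        port_num = int(port)      # raises ValueError if the port is not a number
        if not 0 < port_num < 65536:
            raise ValueError("line %d: port out of range: %d" % (line_no, port_num))
        hosts.append((ip, port_num))
    return hosts

# Made-up sample contents, in the format the cell above writes
demo = "10.0.0.1:5522\n10.0.0.2:5522\n"
hosts = parse_hosts(demo)
print(hosts)
```

To check the real file, pass in `open("hosts.config").read()` instead of the sample string.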

### Training Script
We will now examine the script that will be run for federated training. We've written the training script for this part for you, but have included a copy of it here to give you an idea of our Federated XGBoost API. For the full code, see the `FederatedXGBoost` class in `Utils.py`.

```python
from Utils import FederatedXGBoost

# Instantiate a FederatedXGBoost instance
fxgb = FederatedXGBoost()

# Get number of federating parties
print("Number of parties in federation: ", fxgb.get_num_parties())

# Load training data
training_data_path = "/data/hb/hb_train.csv"
fxgb.load_training_data(training_data_path)

# Train a model
params = {"max_depth": 3, "objective": "binary:logistic", "eval_metric": ["error", "auc"]}
num_rounds = 100
fxgb.train(params, num_rounds)

# Save the model
fxgb.save_model("ex2_model.model")

# Shutdown
fxgb.shutdown()
```

### Start Job
After modifying the script, we can start our job! We can use the `start_job()` helper function to do so.
`start_job(num_parties)` takes one parameter:
* `num_parties`: The number of parties in the federation. This should equal the number of IPs written to `hosts.config`.

The training process should take less than one minute.
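
Because `num_parties` must match the number of entries in `hosts.config`, a quick count before launching can catch a mismatch early. This is a minimal sketch assuming one entry per non-blank line; `count_hosts` is a hypothetical helper, and the demo writes a throwaway file with made-up addresses rather than touching your real `hosts.config`.

```python
import os
import tempfile

def count_hosts(path):
    """Count the non-blank lines in a hosts file (hypothetical helper)."""
    with open(path) as f:
        return sum(1 for line in f if line.strip())

# Demo on a temporary file with made-up addresses
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "hosts.config")
    with open(demo_path, "w") as f:
        f.write("10.0.0.1:5522\n10.0.0.2:5522\n\n")
    num_hosts = count_hosts(demo_path)

print("Entries in demo hosts file:", num_hosts)
```

In the notebook, comparing `count_hosts("hosts.config")` against `len(members)` before calling `start_job` would serve the same purpose.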

In [None]:
print("Training underway")

start_job(len(members))

print("Training finished")

## Model Evaluation
We'll now use the model we trained in the previous step to make predictions on our test data. Load in the federated model, preprocess your test data, and evaluate the model with the test data.

* Test data for the Higgs boson dataset is at `/data/hb/hb_test.csv`

In [None]:
model_path = "ex2_model.model"
multiparty_model = xgb.Booster()
multiparty_model.load_model(model_path)

In [None]:
test_data_path = "/data/hb/hb_test.csv"
test_data = pd.read_csv(test_data_path, sep=",", header=None)
y_test = test_data.iloc[:, 0]
x_test = test_data.iloc[:, 1:]
test_data = xgb.DMatrix(x_test, label=y_test)
x_test.head()

In [None]:
error_str = multiparty_model.eval(test_data)
_, error, auc = error_str.split("\t")

# Parse the metric values out of the evaluation string
error = float(error.split("error:", 1)[1])
accuracy = 1 - error
print("Your model achieved %.2f%% accuracy" % (accuracy * 100))

auc = float(auc.split("auc:", 1)[1])
print("Your model achieved an AUC of %.3f" % auc)
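
The parsing above relies on the evaluation string having exactly two metrics in a fixed order. As a sketch of a more general approach, the hypothetical helper below turns the tab-separated string into a dictionary keyed by metric name. It assumes the `[iteration]<TAB>name-metric:value` layout that the cell above splits on; the sample string uses made-up values, not real results.

```python
def parse_eval_string(eval_str):
    """Map each metric name in an evaluation string to its float value.

    Hypothetical helper; assumes tab-separated 'name-metric:value' fields
    after a leading '[iteration]' field.
    """
    metrics = {}
    for field in eval_str.split("\t")[1:]:  # skip the '[iteration]' field
        name, _, value = field.rpartition(":")
        # Drop the dataset prefix, e.g. 'eval-error' -> 'error'
        metric = name.split("-", 1)[1] if "-" in name else name
        metrics[metric] = float(value)
    return metrics

# Made-up sample string for illustration only
sample = "[0]\teval-error:0.250000\teval-auc:0.810000"
metrics = parse_eval_string(sample)

print("Accuracy: %.2f%%" % ((1 - metrics["error"]) * 100))
print("AUC: %.3f" % metrics["auc"])
```

With this helper, adding or reordering entries in `eval_metric` would not require changing the parsing code.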