## Multiparty XGBoost with Centralized Training
In this exercise, we'll demonstrate a workflow in which each party sends a copy of its local training data to a central server. All of the training data therefore travels over the network to the central server, which collects it and trains a single model locally on the combined data. The central server then broadcasts the trained model back to the parties, who load it and evaluate it on their local test datasets.

![title](img/exercise2.png)


We will also measure the number of bytes sent over the network to show how much bandwidth this workflow requires.
At the same time, the exercise shows the benefit of training on as much data as possible to make the model more robust.

Import the necessary libraries

In [None]:
import xgboost as xgb
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error

Make sure you've set up SSH credentials as described in exercise 0. Then use `scp` to copy the training data you used in exercise 1 to the central server, and note how many bytes are transferred over the network.

In [None]:
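# If you are a worker, run this cell
# scp the same training data you used in exercise 1, located at /data/training_data_1.csv
# -v prints verbose output, including the total number of bytes transferred
# If other parties' files share this filename, give your copy a unique name on the server
# (e.g., append your party name) so it doesn't overwrite theirs in ~/shared_data/
!scp -v -P 5522 -o StrictHostKeyChecking=no /data/training_data_1.csv <central server ip>:~/shared_data/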
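To read the byte count off the verbose log, look for the summary line the OpenSSH client prints when the connection closes. It typically looks like `Transferred: sent 12345, received 6789 bytes, in 0.4 seconds`, but the exact wording may differ between OpenSSH versions, so the pattern below is an assumption. A minimal sketch that extracts those numbers from captured output, using a hypothetical helper named `parse_transferred_bytes`:

```python
import re
from typing import Optional, Tuple

# Assumed format of OpenSSH's summary line (may vary across versions), e.g.
# "debug1: Transferred: sent 12345, received 6789 bytes, in 0.4 seconds"
_TRANSFER_RE = re.compile(r"Transferred: sent (\d+), received (\d+) bytes")


def parse_transferred_bytes(log: str) -> Optional[Tuple[int, int]]:
    """Return (bytes_sent, bytes_received) parsed from scp -v output, or None if absent."""
    match = _TRANSFER_RE.search(log)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


sample = "debug1: Transferred: sent 1048912, received 3420 bytes, in 0.8 seconds"
print(parse_transferred_bytes(sample))  # (1048912, 3420)
```

Verbose SSH messages go to standard error, so one way to capture them is to run `scp` through `subprocess.run(..., capture_output=True, text=True)` and pass `result.stderr` to the helper. The "sent" figure includes SSH protocol overhead, so it will be somewhat larger than the CSV file itself.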

If you are the central server, load all the data that has been sent to your machine. For example, if three other parties sent you data, make 4 calls to `read_csv()`: one for your own data and three for the other parties' data.

In [None]:
# TODO: load in all the training data that the parties have sent
# The data should've been sent to the ~/shared_data directory
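One way to fill in the TODO above is to read the server's own CSV and then every CSV in the shared directory, which amounts to one `read_csv()` call per party. The sketch below assumes the server's own data sits at `/data/training_data_1.csv` (the same path the workers use) and that every received file ends in `.csv`; the helper name `load_shared_training_data` is hypothetical. If the server's own file was also copied into `~/shared_data`, it would be loaded twice, so check the directory first.

```python
import glob
import os
from typing import List

import pandas as pd


def load_shared_training_data(own_data_path: str, shared_dir: str) -> List[pd.DataFrame]:
    """Load the server's own training CSV plus every CSV received in shared_dir."""
    frames = [pd.read_csv(own_data_path)]
    # sorted() makes the load order deterministic across runs
    for path in sorted(glob.glob(os.path.join(shared_dir, "*.csv"))):
        frames.append(pd.read_csv(path))
    return frames


# Hypothetical usage with this exercise's paths; glob does not expand "~" by itself
# frames = load_shared_training_data("/data/training_data_1.csv",
#                                    os.path.expanduser("~/shared_data"))
```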

Concatenate all the data in preparation for training

In [None]:
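# Include every DataFrame you loaded in the previous cell, not just these two
# ignore_index=True gives the combined frame a fresh 0..n-1 index instead of duplicated labels
aggregated_training_data = pd.concat([master_training_data, p1_training_data], ignore_index=True)
aggregated_training_data.shape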

In [None]:
# TODO: Split the aggregated training data into features and labels
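A possible shape for this step: drop the label column to get the feature matrix and keep it separately as the target. The notebook doesn't name its label column, so `LABEL_COLUMN = "label"` below is a hypothetical placeholder, as is the helper name `split_features_labels`.

```python
import pandas as pd

# Hypothetical label column name; replace it with the target column in your dataset
LABEL_COLUMN = "label"


def split_features_labels(df: pd.DataFrame, label_column: str = LABEL_COLUMN):
    """Return (X, y): every column except label_column, and label_column itself."""
    X = df.drop(columns=[label_column])
    y = df[label_column]
    return X, y


# Usage in this notebook:
# X_train, y_train = split_features_labels(aggregated_training_data)
```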

In [None]:
# TODO: fit a model to the aggregated training data
multiparty_model = xgb.XGBRegressor()

Save the trained model and send it to all parties in the federation

In [None]:
multiparty_model.save_model("multiparty_model.model")

In [None]:
# If you're the central server, run this cell as many times as needed to send the saved model
# to all parties in the federation
!scp -v -P 5522 -o StrictHostKeyChecking=no multiparty_model.model <party_ip>:~
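To put the bandwidth numbers in context, compare what flows in each direction: every party's training data goes up to the server, while only the saved model comes back down, once per receiving party. File sizes on disk give a rough estimate of these payloads (the `-v` byte counts also include SSH protocol overhead). A minimal sketch, where the helper name `estimate_traffic` and its arguments are hypothetical:

```python
import os
from typing import Iterable, Tuple


def estimate_traffic(data_paths: Iterable[str], model_path: str,
                     n_receiving_parties: int) -> Tuple[int, int]:
    """Return (upload_bytes, download_bytes) from on-disk file sizes.

    upload_bytes: total size of the training data files sent to the server.
    download_bytes: model file size times the number of parties it is sent to.
    """
    upload_bytes = sum(os.path.getsize(p) for p in data_paths)
    download_bytes = os.path.getsize(model_path) * n_receiving_parties
    return upload_bytes, download_bytes


# Hypothetical usage on the central server:
# import glob
# up, down = estimate_traffic(glob.glob(os.path.expanduser("~/shared_data/*.csv")),
#                             "multiparty_model.model", n_receiving_parties=3)
```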

In [None]:
# If you're not the central server, ensure that you received the model and load it in
multiparty_model = xgb.XGBRegressor()
multiparty_model.load_model("multiparty_model.model")

In [None]:
# TODO: evaluate the model on your local test data
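The notebook already imports `mean_absolute_error` and `mean_squared_error`, so one way to evaluate is to predict on your local test features and score the predictions against the test labels. The helper below (a hypothetical name) accepts any object with a `predict` method, so you can run it on both the multiparty model and the model you trained locally and compare the numbers directly. `X_test` and `y_test` are assumed to come from your local test set, split the same way as the training data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error


def evaluate_regressor(model, X_test, y_test) -> dict:
    """Compute MAE and RMSE of model.predict(X_test) against y_test."""
    predictions = model.predict(X_test)
    mae = mean_absolute_error(y_test, predictions)
    # RMSE as the square root of MSE, so it is in the same units as the labels
    rmse = float(np.sqrt(mean_squared_error(y_test, predictions)))
    return {"mae": float(mae), "rmse": rmse}


# Hypothetical usage:
# print(evaluate_regressor(multiparty_model, X_test, y_test))
```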

Discuss the results with other members of your federation. How did the centrally trained model perform on your local test data compared with the locally trained model? Did adding more data help?