## Federated XGBoost Tutorial

### Edit hosts.config
The `hosts.config` file should list the IP address and port of every worker in the federation, one `ip:port` entry per line. After loading `hosts.config` into the cell below, edit it to contain the addresses of the other parties in your federation. Then write the new addresses back to the file by adding the following cell magic as the first line of the cell:

`%%writefile hosts.config`

Make sure to delete the `# %load hosts.config` line from the cell before running it; otherwise that line is written to the file as well. We will keep using the `%load` and `%%writefile` magics throughout this tutorial to edit files.

In [None]:
# %load hosts.config
35.167.132.178:22
34.222.205.126:22
34.222.177.218:22
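
Before writing the file back, it can help to check that every line really is an `ip:port` entry. The following is a minimal sketch of such a check, not part of the Federated XGBoost tooling: the helper name `parse_hosts` is hypothetical, and the one-entry-per-line format is assumed from the example cell above. A leftover `# %load hosts.config` line is rejected as well, since it is not a valid entry.

```python
import ipaddress


def parse_hosts(text):
    """Parse `ip:port` lines into (ip, port) tuples (hypothetical helper).

    Blank lines are skipped; any other line that is not `ip:port`
    raises ValueError, including a leftover `# %load` line.
    """
    hosts = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        ip, sep, port = line.rpartition(":")
        if not sep:
            raise ValueError(f"line {lineno}: expected ip:port, got {line!r}")
        ipaddress.ip_address(ip)  # raises ValueError on a malformed address
        port_num = int(port)      # raises ValueError on a non-numeric port
        if not 0 < port_num < 65536:
            raise ValueError(f"line {lineno}: port out of range: {port_num}")
        hosts.append((ip, port_num))
    return hosts


# The three example workers from the cell above.
sample = """35.167.132.178:22
34.222.205.126:22
34.222.177.218:22
"""

hosts = parse_hosts(sample)
print(len(hosts), "workers")
```

The number of parsed entries should match the value later passed to `--num-parties`.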


### Modifying the Training/Eval Script
We will now modify the script that runs training and evaluation. Load it by running the following cell; the contents of the script should then appear in the cell.

In [2]:
%%writefile tutorial.py
from FederatedXGBoost import FederatedXGBoost

# Instantiate a FederatedXGBoost instance
fxgb = FederatedXGBoost()

# Get number of federating parties
print(fxgb.get_num_parties())

# Load training data
fxgb.load_training_data('/home/ubuntu/data/msd_training_data_split.csv')

# Train a model
params = {'max_depth': 3, 'min_child_weight': 1.0, 'lambda': 1.0}
num_rounds = 100
fxgb.train(params, num_rounds)

# Save the model
fxgb.save_model("tutorial_model.model")

# Load the test data
fxgb.load_test_data('/home/ubuntu/data/msd_test_data_split.csv')

# Evaluate the model
print(fxgb.eval())

# Get predictions
ypred = fxgb.predict()

# Shutdown
fxgb.shutdown()


Overwriting tutorial.py


### Start Job
After modifying the script, we can start the job with the `start_job.sh` script.

All of the following flags must be specified when running the script.

`./start_job.sh`

* `-m | --worker-memory` string, specified as `<memory>g`, e.g. `3g`
    * Amount of memory on workers allocated to job
* `-p | --num-parties` integer
    * Number of parties in the federation
* `-d | --dir` string
    * Path to created subdirectory containing job script, e.g. `/home/ubuntu/mc2/federated-xgboost/risecamp`
* `-j | --job` string
    * Path to job script. This should be the parameter passed into the `--dir` option concatenated with the job script file name, e.g. `/home/ubuntu/mc2/federated-xgboost/risecamp/tutorial.py`

In [3]:
!./start_job.sh -p 3 -m 3g -d /home/ubuntu/mc2/federated-xgboost/risecamp/ -j /home/ubuntu/mc2/federated-xgboost/risecamp/tutorial.py

2019-09-18 08:26:24,877 INFO start listen on 172.31.41.140:9091
2019-09-18 08:26:24,885 INFO rsync /home/ubuntu/mc2/federated-xgboost/camp/ -> 35.167.132.178:/home/ubuntu/mc2/federated-xgboost/camp/
2019-09-18 08:26:24,885 INFO rsync /home/ubuntu/mc2/federated-xgboost/camp/ -> 34.222.205.126:/home/ubuntu/mc2/federated-xgboost/camp/
2019-09-18 08:26:24,885 INFO rsync /home/ubuntu/mc2/federated-xgboost/camp/ -> 34.222.177.218:/home/ubuntu/mc2/federated-xgboost/camp/
2019-09-18 08:26:27,375 INFO @tracker All of 3 nodes getting started
3
[08:26:43] Tree method is automatically selected to be 'approx' for distributed training.
[0]	eval-rmse:9.168171
  "because it will generate extra copies and increase memory consumption")
3
[08:26:43] Tree method is automatically selected to be 'approx' for distributed training.
[0]	eval-rmse:9.168171
  "because it will generate extra copies and increase memory consumption")
2019-09-18 08:27:29,154 INFO @tracker All nodes finishes job
3
[08:26:43] Tree met