## Multiparty XGBoost with Federated Training
We will now discuss running XGBoost in the federated setting. Unlike in the previous exercise, all data stays on its respective machine. This eliminates the need to transfer raw data over the network, which incurs high overhead and requires significant bandwidth. Instead, in each iteration each party sends a summary of the update made to its local model. The central server aggregates these updates, applies the aggregated update to its model, and broadcasts the new model to all parties. The parties then train locally with the new model and send their updates back to the central server.

![title](img/exercise3.png)
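
For intuition, one round of this loop can be sketched in a few lines. This is a toy under loud assumptions, not how the project or XGBoost aggregates tree updates: the "model" is a single number, and `local_update` and `federated_round` are hypothetical names introduced only for illustration.

```python
from typing import List


def local_update(model: float, data: List[float], lr: float = 0.5) -> float:
    """Toy local training step (assumption): propose a move toward the local data mean."""
    local_mean = sum(data) / len(data)
    return lr * (local_mean - model)


def federated_round(model: float, parties: List[List[float]]) -> float:
    """One round: each party computes an update, the server averages and applies it."""
    # Computed on each party's own machine; the raw data never leaves the party.
    updates = [local_update(model, data) for data in parties]
    # The central server only ever sees the updates, not the data.
    aggregated = sum(updates) / len(updates)
    # The returned value is the new model that would be broadcast to all parties.
    return model + aggregated


# Three parties with private (made-up) data.
parties = [[1.0, 3.0], [5.0], [9.0, 11.0]]
model = 0.0
for _ in range(20):
    model = federated_round(model, parties)
print(model)
```

The only values that cross the network in this sketch are one number per party per round and the broadcast model, which is the property the federated setting relies on.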

In our project, all this is abstracted away. The central server simply starts the training, and everything else is performed automatically.

Import some helper functions.

In [1]:
import pandas as pd
import subprocess
from start_job import start_job
from network_analysis import network_analysis

### Edit hosts.config (central server)
The `hosts.config` file should contain the IPs and ports of all workers in the federation.
Retrieve the IPs of all members in the federation from the PKI and write them to the `hosts.config` file.

**Only the central server has to do this step.**

In [None]:
# Get the IPs of all members in your federation and add them to hosts.config
import importlib
import Federation
importlib.reload(Federation)

members = ["chester", "rishabh", "wenting"]
pki = Federation.PKI()
with open("hosts.config", "w") as hosts:
    for member in members:
        # pki.lookup returns the member's IP address and key; only the IP is needed here
        ip, _key = pki.lookup(member)

        # Write the member's IP address and port 5522 to hosts.config
        hosts.write(f"{ip}:5522\n")
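
Before starting a job, it can help to sanity-check what was written. The sketch below assumes the one `IP:port` entry per line format produced by the cell above. `parse_hosts` is a hypothetical helper and not part of the project, and the addresses in the example are placeholders from a documentation range.

```python
import ipaddress
from typing import Iterable, List, Tuple


def parse_hosts(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse 'IP:port' lines, raising ValueError on any malformed entry."""
    hosts = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        ip, sep, port = line.rpartition(":")
        if not sep:
            raise ValueError(f"missing ':port' in {line!r}")
        ipaddress.ip_address(ip)  # raises ValueError if this is not a valid IP
        port_number = int(port)   # raises ValueError if this is not an integer
        if not 1 <= port_number <= 65535:
            raise ValueError(f"port out of range in {line!r}")
        hosts.append((ip, port_number))
    return hosts


# In-memory example with placeholder addresses.
example = ["192.0.2.10:5522\n", "192.0.2.11:5522\n", "\n"]
print(parse_hosts(example))
```

To check the real file, pass an open file object instead of the list, for example `parse_hosts(open("hosts.config"))`. The length of the result is the number of parties listed in the file.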
    

### Set Variables For Network Analysis
We'll also walk you through inspecting packets during this tutorial to confirm that the network topology is indeed federated. For each variable below, fill in the corresponding IP address (the ordering of the worker nodes doesn't matter).

In [2]:
master = '0'
worker_1 = '1'
worker_2 = '2'
worker_3 = '3'

### Modifying the Training Script (central server)
We will now modify the script that will be run for federated training. Load it in by running the following cell. The contents of the script should appear in the cell. 

The central server controls the training. If you're the central server, you can play with the `params` argument passed into the `train()` function. A list of possible parameters and their descriptions can be found [here](https://xgboost.readthedocs.io/en/latest/parameter.html).

**Only the central server has to do this step.**

In [None]:
%load train_model.py

### Using tcpdump to Capture Packets
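We will use `tcpdump` to monitor network traffic during training. The cell below spawns a process that records all traffic seen on the `en0` network interface, both incoming and outgoing, to `capture.pcap`. If your machine uses a different interface name, change `en0` accordingly.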

In [None]:
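import shlex

tcpdump_cmd = 'tcpdump -ni en0 -s0 -w capture.pcap'
# Run tcpdump directly (no intermediate shell) so that terminate() signals tcpdump itself
tcpdump_process = subprocess.Popen(shlex.split(tcpdump_cmd), stdout=subprocess.PIPE)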

### Start Job (central server)
After modifying the script, we can start our job! We can use the `start_job()` helper function to do so.
`start_job(num_parties, memory, script_path)` takes three parameters:
* `num_parties`: The number of parties in the federation. This should be the same as the number of IPs added to `hosts.config`.
* `memory`: The amount of memory to use for this job on each party's machine.
* `script_path`: The absolute path to the script we want to run.
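
Since `script_path` must be absolute, one way to build it without hard-coding a user name is sketched below. `resolve_script_path` is a hypothetical helper that uses only the standard library. Whether `start_job` itself expands `$USER` is not stated in this tutorial, so treat this as an optional precaution.

```python
import os


def resolve_script_path(path: str) -> str:
    """Expand environment variables and '~', then return an absolute path."""
    expanded = os.path.expanduser(os.path.expandvars(path))
    return os.path.abspath(expanded)


# If USER is not set in the environment, "$USER" is left in the string unchanged.
print(resolve_script_path("/home/$USER/train_model.py"))
```

The result can then be passed as the third argument to `start_job`.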

**Only run this cell if you're the central server.**

In [None]:
start_job(2, 3, "/home/$USER/train_model.py")

Kill the `tcpdump` process once training has finished, as we no longer need to monitor network traffic.

In [None]:
tcpdump_process.terminate()
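
Note that `terminate()` only sends the termination signal and returns immediately, so the capture file may still be in the process of being written when the next cell runs. A minimal sketch of a more defensive shutdown is shown below. `stop_process` is a hypothetical helper, and the sleeping child process merely stands in for `tcpdump` so that the example runs anywhere.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, timeout: float = 5.0) -> int:
    """Ask the process to exit, wait for it, and force-kill it if it does not stop."""
    proc.terminate()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.kill()  # last resort if the process ignored the termination request
        return proc.wait()


# Stand-in for the tcpdump process: a child that would otherwise sleep for a minute.
demo = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
exit_code = stop_process(demo)
```

In this notebook the equivalent call would be `stop_process(tcpdump_process)`.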

## Network Analysis
In the federated setting, parties don't communicate with each other -- they only communicate with the central server. We'll monitor network traffic in this section to show this isolated communication, and also to show that the communication of updates (as in the federated setting) requires less bandwidth than the transfer of whole raw datasets (as in the centralized training scenario from Exercise 2). 

Running the cell below converts the `.pcap` file captured above and preprocesses it with pandas to produce a table of transmissions. Understanding the code behind it isn't essential for this tutorial, but feel free to look at `network_analysis.py` if you're curious.

In [3]:
counts = network_analysis(master, worker_1, worker_2, worker_3)
counts

| Transmission | Number of Packets | Total Bytes Transmitted |
|---|---|---|
| 104.244.42.66 -> 10.142.39.202 | 136 | 136183 |
| 104.244.42.1 -> 10.142.39.202 | 60 | 50606 |
| 10.142.39.202 -> 104.244.42.66 | 129 | 22153 |
| 10.142.39.202 -> 104.244.42.1 | 64 | 11541 |
| 104.244.43.131 -> 10.142.39.202 | 14 | 10569 |
| 10.142.33.29 -> 224.0.0.251 | 7 | 7922 |
| 104.244.42.5 -> 10.142.39.202 | 7 | 4018 |
| 10.142.35.37 -> 224.0.0.251 | 3 | 3256 |
| 10.142.39.202 -> 104.244.43.131 | 18 | 3057 |
| 10.142.36.155 -> 224.0.0.251 | 2 | 2589 |
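
The idea behind such a table, and behind the check that workers never talk to each other, can be sketched with the standard library alone. This is not the code in `network_analysis.py` (which reads the `.pcap` and uses pandas): `summarize` and `worker_to_worker` are hypothetical helpers, and the packets are made-up `(source, destination, bytes)` records with placeholder addresses.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]


def summarize(packets: Iterable[Tuple[str, str, int]]) -> Dict[Pair, Tuple[int, int]]:
    """Map each (source, destination) pair to (number of packets, total bytes)."""
    totals = defaultdict(lambda: [0, 0])
    for src, dst, size in packets:
        entry = totals[(src, dst)]
        entry[0] += 1
        entry[1] += size
    return {pair: (count, total) for pair, (count, total) in totals.items()}


def worker_to_worker(summary: Dict[Pair, Tuple[int, int]], workers: Iterable[str]) -> List[Pair]:
    """Return transmissions whose source and destination are both workers."""
    workers = set(workers)
    return [pair for pair in summary if pair[0] in workers and pair[1] in workers]


# Made-up capture: the master (192.0.2.1) exchanges traffic with two workers.
master_ip = "192.0.2.1"
worker_ips = ["192.0.2.10", "192.0.2.11"]
packets = [
    (master_ip, "192.0.2.10", 1200),
    ("192.0.2.10", master_ip, 300),
    (master_ip, "192.0.2.11", 1200),
    ("192.0.2.11", master_ip, 280),
]
summary = summarize(packets)
print(summary)
print(worker_to_worker(summary, worker_ips))
```

In a federated topology, the list returned by `worker_to_worker` should be empty, since every transmission has the central server on one end.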


## Modifying the Evaluation Script
We'll now use the model we trained in the previous step to make predictions on our test data. Load the test script the same way you loaded the training script.

**Again, only the central server has to do these steps.**

In [None]:
%load test_model.py

In [None]:
start_job(2, 3, "/home/$USER/test_model.py")