# Secure XGBoost Demo Notebook
This notebook demonstrates an end-to-end Secure XGBoost workflow. The example consists of the following steps:
* Key generation
* Data encryption
* Enclave preparation (creation, attestation, key transfer)
* Data loading
* Training
* Prediction


This example simulates a scenario in which a client outsources learning on sensitive data to a remote machine equipped with a hardware enclave. The remote machine is completely untrusted, so nothing may be left in plaintext outside the enclave. The client encrypts the data before transferring it, and the data stays encrypted on the remote machine. The data is then loaded into the enclave, decrypted there, and used for training. At inference time, the model's predictions are encrypted inside the enclave before being sent back to the client, which decrypts them.


In [1]:
import securexgboost as xgb
import os
from client import *

In [2]:
HOME_DIR = os.getcwd() + "/../../"

# Define OE flags
OE_ENCLAVE_FLAG_DEBUG = 1
OE_ENCLAVE_FLAG_SIMULATE = 2
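The two flags are distinct bit values, so they can be combined with bitwise OR. A minimal sketch, assuming you want to fall back to Open Enclave's simulation mode on a machine without SGX hardware; the `USE_SIMULATION` switch is hypothetical and not part of Secure XGBoost:

```python
# Open Enclave flag values, as defined above
OE_ENCLAVE_FLAG_DEBUG = 1
OE_ENCLAVE_FLAG_SIMULATE = 2

# Hypothetical switch: set to True on machines without SGX hardware
USE_SIMULATION = True

flags = OE_ENCLAVE_FLAG_DEBUG
if USE_SIMULATION:
    # The flags are bit values, so they combine with bitwise OR
    flags |= OE_ENCLAVE_FLAG_SIMULATE

print(flags)  # 3
```

The combined value would presumably be passed as `flags=flags` when creating the enclave with `xgb.Enclave(...)` below.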

## Key Generation
Generate a key to be used for encryption.

In [3]:
KEY_FILE = "key.txt"

# Generate a key you will be using for encryption
generate_client_key(KEY_FILE)

Generating client key...
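`generate_client_key()` comes from the demo's `client` module. Conceptually, generating a symmetric key means drawing random bytes from a cryptographically secure source and storing them where only the client can read them. A minimal sketch, assuming a 256-bit (32-byte) key stored as raw bytes; the key length and file format actually used by `generate_client_key()` may differ, and `generate_key_sketch` is a hypothetical name:

```python
import os
import secrets

def generate_key_sketch(path, num_bytes=32):
    """Write a random symmetric key to `path` (hypothetical helper).

    Assumes a 256-bit key stored as raw bytes; the demo's
    generate_client_key() may use a different length or encoding.
    """
    key = secrets.token_bytes(num_bytes)  # bytes from the OS CSPRNG
    # Create the file with owner-only permissions, since the key is secret
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as f:
        f.write(key)
    return key

key = generate_key_sketch("sketch_key.bin")
print(len(key))  # 32
```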


## Data Encryption
Use the key generated above to encrypt our data.

In [4]:
# TODO: Should we set a path variable for the encryption/decryption Python files?

training_data = HOME_DIR + "demo/data/agaricus.txt.train"
enc_training_data = "train.enc"

# Encrypt training data
encrypt_file(training_data, enc_training_data, KEY_FILE)

Encrypting file /home/xgb/secure-xgboost/demo/enclave/../../demo/data/agaricus.txt.train
0


In [5]:
test_data = HOME_DIR + "demo/data/agaricus.txt.test"
enc_test_data = "test.enc"

# Encrypt test data
encrypt_file(test_data, enc_test_data, KEY_FILE)

Encrypting file /home/xgb/secure-xgboost/demo/enclave/../../demo/data/agaricus.txt.test
0
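Both encryption cells print `0`, which appears to be a status code. Before shipping the files to the untrusted machine, you could additionally confirm that no plaintext line from a source file survives verbatim in its encrypted counterpart. A minimal sketch; `assert_no_plaintext_leak` is a hypothetical helper, and it only catches verbatim leaks of sufficiently long lines, not weak encryption:

```python
import os

def assert_no_plaintext_leak(plain_path, enc_path, sample_lines=100, min_len=16):
    """Raise ValueError if the encrypted file is empty or contains a plaintext line.

    Only the first `sample_lines` lines are checked, and lines shorter than
    `min_len` bytes are skipped to avoid chance matches inside ciphertext.
    """
    if os.path.getsize(enc_path) == 0:
        raise ValueError(f"{enc_path} is empty")
    with open(enc_path, "rb") as f:
        encrypted = f.read()
    with open(plain_path, "rb") as f:
        for i, line in enumerate(f):
            if i >= sample_lines:
                break
            line = line.strip()
            if len(line) >= min_len and line in encrypted:
                raise ValueError(f"plaintext line {i} found in {enc_path}")

# Usage with the variables defined above:
# assert_no_plaintext_leak(training_data, enc_training_data)
# assert_no_plaintext_leak(test_data, enc_test_data)
```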


## Prepare Enclave
We need to create an enclave, verify it through remote attestation, and finally give the enclave the key we used to encrypt the data.

In [9]:
# Create an enclave
enclave = xgb.Enclave(HOME_DIR + "enclave/build/xgboost_enclave.signed", flags=(OE_ENCLAVE_FLAG_DEBUG))

In [10]:
# Remote Attestation
enclave.get_remote_report_with_pubkey()
enclave.verify_remote_report_and_set_pubkey()
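These two calls share their names with the Open Enclave remote attestation sample, in which the enclave places a SHA-256 hash of its public key in the report data of a signed report, and the verifier accepts the key only if its hash matches that field. Assuming Secure XGBoost follows the same pattern, the key-binding part of the check can be sketched with the standard library. The report is modelled as a plain dict, verifying the report's signature and enclave measurement is out of scope, and `make_report` and `verify_pubkey_binding` are hypothetical:

```python
import hashlib
import hmac

def make_report(enclave_pubkey_pem: bytes) -> dict:
    """Enclave side (simulated): bind the public key's hash into the report data."""
    return {"report_data": hashlib.sha256(enclave_pubkey_pem).digest()}

def verify_pubkey_binding(report: dict, received_pubkey_pem: bytes) -> bool:
    """Client side: accept the key only if its hash matches the attested report data.

    Assumes the report's signature and enclave measurement were verified separately.
    """
    expected = hashlib.sha256(received_pubkey_pem).digest()
    return hmac.compare_digest(report["report_data"], expected)

# Placeholder key material, for illustration only
pem = b"-----BEGIN PUBLIC KEY-----\nexample\n-----END PUBLIC KEY-----\n"
report = make_report(pem)
print(verify_pubkey_binding(report, pem))                # True
print(verify_pubkey_binding(report, pem + b"tampered"))  # False
```

Binding the key hash into the signed report is what lets the client trust that the public key it later uses to wrap the data key really belongs to the attested enclave.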

In [None]:
# Add key to enclave


## Load Data
Load the encrypted data into a `DMatrix`. 

In [11]:
# Create training matrix from the encrypted training file
dtrain = xgb.DMatrix(enc_training_data, encrypted=True)

In [12]:
# Create test matrix from the encrypted test file
dtest = xgb.DMatrix(enc_test_data, encrypted=True)

## Perform Training

In [9]:
# Set parameters
params = {
    "tree_method": "hist",
    "n_gpus": "0",
    "objective": "binary:logistic",
    "min_child_weight": "1",
    "gamma": "0.1",
    "max_depth": "3",
    "verbosity": "3",
}

In [10]:
# Train
num_rounds = 10
booster = xgb.train(params, dtrain, num_rounds, evals=[(dtrain, "train"), (dtest, "test")])

Tree finished
[0]	train-error:0.014433	test-error:0.016139
Tree finished
[1]	train-error:0.014433	test-error:0.016139
Tree finished
[2]	train-error:0.014433	test-error:0.016139
Tree finished
[3]	train-error:0.008598	test-error:0.009932
Tree finished
[4]	train-error:0.001228	test-error:0.000000
Tree finished
[5]	train-error:0.001228	test-error:0.000000
Tree finished
[6]	train-error:0.001228	test-error:0.000000
Tree finished
[7]	train-error:0.001228	test-error:0.000000
Tree finished
[8]	train-error:0.001228	test-error:0.000000
Tree finished
[9]	train-error:0.001228	test-error:0.000000
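Each `[i]` line in the log above reports the `error` metric (the fraction of misclassified rows) for every set passed in `evals`. A small sketch that parses these lines, for example to find the first round at which the test error reaches its minimum; the parser assumes exactly the tab-separated format printed above:

```python
import re

# Eval lines copied from the training output above
# (one "Tree finished" line is kept to show that it is skipped)
LOG = """\
Tree finished
[0]\ttrain-error:0.014433\ttest-error:0.016139
[1]\ttrain-error:0.014433\ttest-error:0.016139
[2]\ttrain-error:0.014433\ttest-error:0.016139
[3]\ttrain-error:0.008598\ttest-error:0.009932
[4]\ttrain-error:0.001228\ttest-error:0.000000
[5]\ttrain-error:0.001228\ttest-error:0.000000
[6]\ttrain-error:0.001228\ttest-error:0.000000
[7]\ttrain-error:0.001228\ttest-error:0.000000
[8]\ttrain-error:0.001228\ttest-error:0.000000
[9]\ttrain-error:0.001228\ttest-error:0.000000
"""

def parse_eval_log(text):
    """Return {round: {metric_name: value}} from XGBoost-style eval lines."""
    history = {}
    for line in text.splitlines():
        m = re.match(r"\[(\d+)\]\t(.*)", line)
        if not m:
            continue  # not an eval line, e.g. "Tree finished"
        metrics = {}
        for item in m.group(2).split("\t"):
            name, value = item.rsplit(":", 1)
            metrics[name] = float(value)
        history[int(m.group(1))] = metrics
    return history

history = parse_eval_log(LOG)
# Lowest test error; ties are broken by the earliest round
best = min(history, key=lambda r: (history[r]["test-error"], r))
print(best, history[best]["test-error"])  # 4 0.0
```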


## Predict
Secure XGBoost's `predict()` returns predictions in encrypted form. The returned buffer must be decrypted with the same key that was used to encrypt the original data.

In [11]:
# Get Encrypted Predictions
print("\n\nModel Predictions: ")
print(booster.predict(dtest)[:20])



Model Predictions: 
[0.02386593 0.9543875  0.02386593 0.02386593 0.04897502 0.10559791
 0.9543875  0.02876541 0.9543875  0.02423424 0.9543875  0.02876541
 0.02340852 0.02386593 0.02340852 0.02920706 0.02876541 0.9543875
 0.04897502 0.02876541]


True Labels: 
[0. 1. 0. 0. 0. 0. 1. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 0.]
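With `objective` set to `binary:logistic`, the values above are probabilities of the positive class, and XGBoost's `error` metric counts a row as positive when its prediction exceeds 0.5. Applying that rule by hand to the 20 predictions and labels printed above:

```python
# First 20 predictions and labels, copied from the output above
preds = [0.02386593, 0.9543875, 0.02386593, 0.02386593, 0.04897502, 0.10559791,
         0.9543875, 0.02876541, 0.9543875, 0.02423424, 0.9543875, 0.02876541,
         0.02340852, 0.02386593, 0.02340852, 0.02920706, 0.02876541, 0.9543875,
         0.04897502, 0.02876541]
labels = [0., 1., 0., 0., 0., 0., 1., 0., 1., 0., 1., 0., 0., 0., 0., 0., 0., 1., 0., 0.]

def error_rate(predictions, true_labels, threshold=0.5):
    """Fraction of rows whose thresholded prediction disagrees with the label."""
    wrong = sum(
        (p > threshold) != (y == 1.0)
        for p, y in zip(predictions, true_labels)
    )
    return wrong / len(true_labels)

print(error_rate(preds, labels))  # 0.0
```

All five positives score about 0.954 and every negative stays below 0.11, which is consistent with the final `test-error:0.000000` reported during training.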


In [None]:
# Decrypt Predictions