# Secure XGBoost Demo Notebook
This notebook walks through an example use of Secure XGBoost. The example consists of the following steps:
* Key generation
* Data encryption
* Enclave preparation (creation, attestation, key transfer)
* Data loading
* Training
* Prediction


This example simulates a scenario in which a client outsources sensitive data to a remote machine with a hardware enclave for learning. The remote machine is completely untrusted, so nothing may be left in plaintext outside the enclave. The data is encrypted on the client and transferred, and it stays encrypted on the remote machine until it is loaded into the enclave, where it is decrypted and used for training. During inference, the model's predictions are encrypted inside the enclave before being sent back to the client, where they can be decrypted.


In [1]:
%load_ext autoreload
%autoreload 2

In [2]:
import securexgboost as xgb
import os
from client import *

In [3]:
HOME_DIR = os.getcwd() + "/../../"

# Define OE flags
OE_ENCLAVE_FLAG_DEBUG = 1
OE_ENCLAVE_FLAG_SIMULATE = 2

## Key Generation
Generate a key to be used for encryption.

In [4]:
KEY_FILE = "key.txt"

# Generate a key you will be using for encryption
generate_client_key(KEY_FILE)

Generating client key...
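To make the key generation step concrete, here is a minimal sketch of what writing a random symmetric key to disk could look like. It is not the implementation of `generate_client_key`; the function name `generate_key_sketch`, the file name `key_sketch.bin`, and the owner-only file permissions are assumptions made for illustration. The 32-byte length matches the key printed later in this notebook.

```python
import os
import secrets
import tempfile


def generate_key_sketch(path, num_bytes=32):
    """Write num_bytes of cryptographically secure random bytes to path.

    Hypothetical helper for illustration; not part of securexgboost.
    """
    key = secrets.token_bytes(num_bytes)
    # Create the file with owner-only permissions so other users cannot read it.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as keyfile:
        keyfile.write(key)
    return key


# Write the key to a temporary directory and read it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "key_sketch.bin")
    key = generate_key_sketch(demo_path)
    with open(demo_path, "rb") as keyfile:
        key_on_disk = keyfile.read()

print(len(key), key_on_disk == key)
```

Using `secrets` rather than `random` matters here: the key protects all of the data sent to the untrusted machine, so it has to come from a cryptographically secure source.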


## Data Encryption
Use the key generated above to encrypt our data.

In [5]:
# TODO: Should we set a path variable for the encryption/decryption Python files?

training_data = HOME_DIR + "demo/data/agaricus.txt.train"
enc_training_data = "train.enc"

# Encrypt training data
encrypt_file(training_data, enc_training_data, KEY_FILE)

Encrypting file /home/xgb/secure-xgboost/demo/enclave/../../demo/data/agaricus.txt.train


In [6]:
test_data = HOME_DIR + "demo/data/agaricus.txt.test"
enc_test_data = "test.enc"

# Encrypt test data
encrypt_file(test_data, enc_test_data, KEY_FILE)

Encrypting file /home/xgb/secure-xgboost/demo/enclave/../../demo/data/agaricus.txt.test


## Prepare Enclave
We'll need to create an enclave, attest it, and lastly give it the key we used to encrypt the data.

In [7]:
# Create an enclave
enclave = xgb.Enclave(HOME_DIR + "enclave/build/xgboost_enclave.signed", flags=OE_ENCLAVE_FLAG_DEBUG, log_verbosity=3)
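The two OE flags defined at the top of the notebook are distinct bits, so they can be combined with a bitwise OR. The sketch below shows only the bit arithmetic; whether `xgb.Enclave` accepts a combined value for `flags` is an assumption here, and `has_flag` is a hypothetical helper.

```python
# Flag values as defined earlier in this notebook.
OE_ENCLAVE_FLAG_DEBUG = 1
OE_ENCLAVE_FLAG_SIMULATE = 2

# Combine both flags into a single integer with bitwise OR.
debug_and_simulate = OE_ENCLAVE_FLAG_DEBUG | OE_ENCLAVE_FLAG_SIMULATE


def has_flag(flags, flag):
    """Return True if every bit of `flag` is set in `flags`."""
    return flags & flag == flag


print(debug_and_simulate)
print(has_flag(debug_and_simulate, OE_ENCLAVE_FLAG_SIMULATE))
print(has_flag(OE_ENCLAVE_FLAG_DEBUG, OE_ENCLAVE_FLAG_SIMULATE))
```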

In [8]:
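# Remote attestation
# Ask the enclave for a remote report together with its public key
enclave.get_remote_report_with_pubkey()
# Fetch the public key and the report (with their sizes) for use on the client
enclave_pem_key, enclave_key_size, remote_report, remote_report_size = enclave.get_report_attrs()
# Verify the report and, if it checks out, set the enclave's public key
enclave.verify_remote_report_and_set_pubkey()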
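Attestation matters because the client is about to encrypt its symmetric key under `enclave_pem_key`, and the report is what ties that public key to a genuine enclave. One common way to make that binding, assumed here rather than taken from the Secure XGBoost source, is for the report to carry a SHA-256 digest of the public key. The sketch below shows only that digest comparison on made-up byte strings; `key_matches_report` and its arguments are hypothetical, and a real verifier would also validate the report's signature and the enclave's identity.

```python
import hashlib
import hmac


def key_matches_report(pem_key: bytes, report_data: bytes) -> bool:
    """Check that report_data equals the SHA-256 digest of pem_key.

    Hypothetical helper: illustrates key binding only, not full report verification.
    """
    expected = hashlib.sha256(pem_key).digest()
    # Constant-time comparison avoids leaking how many leading bytes matched.
    return hmac.compare_digest(expected, report_data)


# Placeholder values standing in for a real public key and report field.
fake_pem_key = b"-----BEGIN PUBLIC KEY-----\n(placeholder)\n-----END PUBLIC KEY-----\n"
fake_report_data = hashlib.sha256(fake_pem_key).digest()

print(key_matches_report(fake_pem_key, fake_report_data))
print(key_matches_report(b"some other key", fake_report_data))
```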

In [9]:
crypto = xgb.CryptoUtils()

# Read the key into memory
with open(KEY_FILE, "rb") as keyfile:
    sym_key = keyfile.read()

# Debug only: print the raw key and each byte value.
# Never print a real key like this.
print(sym_key)
for byte in sym_key:
    print(byte)

# Encrypt symmetric key
enc_sym_key, enc_sym_key_size = crypto.encrypt_data_with_pk(sym_key, len(sym_key), 
                                                            enclave_pem_key, enclave_key_size)
# Sign encrypted symmetric key
sig, sig_size = crypto.sign_data("keypair.pem", enc_sym_key, enc_sym_key_size)

# Add key to enclave
crypto.add_client_key(enc_sym_key, enc_sym_key_size, sig, sig_size)

b'w\xfd\x90sZ\x84\xa2+\xbc\xa1N\xc2W`{d\xda\x18h\xde\xe1Llq\xd9\xa3\xd1\x14\xbaN\xf6\x87'
119
253
144
115
90
132
162
43
188
161
78
194
87
96
123
100
218
24
104
222
225
76
108
113
217
163
209
20
186
78
246
135


1

## Load Data
Load the encrypted data into a `DMatrix`. 

In [10]:
# Create training matrix
dtrain = xgb.DMatrix(os.path.join(os.getcwd(), enc_training_data), encrypted=True)

In [11]:
# Create test matrix
dtest = xgb.DMatrix(os.path.join(os.getcwd(), enc_test_data), encrypted=True)

## Perform Training

In [12]:
# Set parameters
params = {
    "tree_method": "hist",
    "n_gpus": "0",
    "objective": "binary:logistic",
    "min_child_weight": "1",
    "gamma": "0.1",
    "max_depth": "3",
    "verbosity": "3",
}
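A note on the `"binary:logistic"` objective: in upstream XGBoost, this objective makes the model output a probability by passing the summed tree outputs (the margin) through the logistic function. We assume Secure XGBoost behaves the same way. The sketch below shows that mapping on a few hand-picked margins; `sigmoid` is a helper defined here, not a library call.

```python
import math


def sigmoid(margin):
    """Map a raw margin (sum of tree outputs) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-margin))


# Negative margins map below 0.5, zero maps to exactly 0.5, positive above 0.5.
for margin in (-2.0, 0.0, 2.0):
    print(margin, round(sigmoid(margin), 3))
```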

In [13]:
# Train
num_rounds = 5
booster = xgb.train(params, dtrain, num_rounds, evals=[(dtrain, "train"), (dtest, "test")])

[0]	train-error:0.014433	test-error:0.016139
[1]	train-error:0.014433	test-error:0.016139
[2]	train-error:0.014433	test-error:0.016139
[3]	train-error:0.008598	test-error:0.009932
[4]	train-error:0.001228	test-error:0
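The `train-error` and `test-error` columns report XGBoost's `error` metric, which for binary classification is the fraction of rows that are misclassified when predictions above 0.5 are treated as the positive class. The sketch below computes that quantity on a toy example; the probabilities and labels are made up, and `error_rate` is a hypothetical helper rather than a library function.

```python
def error_rate(probabilities, labels, threshold=0.5):
    """Fraction of rows whose thresholded prediction disagrees with the label."""
    wrong = sum(
        1
        for p, y in zip(probabilities, labels)
        if (p > threshold) != (y == 1)
    )
    return wrong / len(labels)


# Toy data: the third row is predicted positive (0.6 > 0.5) but labeled 0.
toy_probs = [0.9, 0.2, 0.6, 0.1]
toy_labels = [1, 0, 0, 0]

print(error_rate(toy_probs, toy_labels))  # 0.25
```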


## Predict
`predict()` returns the predictions in encrypted form. The returned buffer must be decrypted with the same key that was used to encrypt the original data.

In [14]:
# Get Encrypted Predictions
enc_preds, num_preds = booster.predict(dtest)

In [15]:
# Decrypt Predictions
preds = crypto.decrypt_predictions(sym_key, enc_preds, num_preds)
print(preds[:40])

[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.
 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
