# Secure XGBoost Demo Notebook
This notebook provides an example of how one could use Secure XGBoost. The example consists of the following steps:
* Key generation
* Data encryption
* Enclave preparation (creation, attestation, key transfer)
* Data loading
* Training
* Prediction


This example simulates a scenario in which sensitive data on a client is outsourced to a remote machine with a hardware enclave for learning. The remote machine is completely untrusted, so nothing should be left in plaintext outside the enclave. In this scenario, the data is encrypted on the client, transferred, and kept encrypted on the remote machine. It is then loaded into the enclave, decrypted there, and used for learning. Once a model has been trained, the predictions it produces during inference must be encrypted inside the enclave before being transferred back to the client, where they can be decrypted.


In [1]:
import securexgboost as xgb
import os
from client import generate_client_key, encrypt_file

In [2]:
HOME_DIR = os.getcwd() + "/../../"

# Define OE flags
OE_ENCLAVE_FLAG_DEBUG = 1
OE_ENCLAVE_FLAG_SIMULATE = 2

## Key Generation
Generate a key to be used for encryption.

In [3]:
KEY_FILE = "key.txt"

# Generate a key you will be using for encryption
generate_client_key(KEY_FILE)

Generating client key...
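
The implementation of `generate_client_key` is not shown in this notebook. As a rough sketch of what such a helper might do, the hypothetical `write_random_key` below writes 32 random bytes to a file that only its owner can read. The 32-byte size is taken from the key printed later in this notebook; the function name and the permission handling are assumptions, not the library's code.

```python
import os
import secrets

KEY_SIZE_BYTES = 32  # assumption: matches the 32-byte key printed later in this notebook


def write_random_key(path, size=KEY_SIZE_BYTES):
    """Hypothetical stand-in for generate_client_key: write `size` random bytes to `path`."""
    key = secrets.token_bytes(size)  # cryptographically secure random bytes
    # Create (or truncate) the file with owner-only permissions where the OS supports them.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as keyfile:
        keyfile.write(key)
    return key
```

Using `secrets` rather than `random` matters here, because `random` is not designed for generating key material.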


## Data Encryption
Use the key generated above to encrypt our data.

In [4]:
# TODO: Should we set a path variable for the encryption/decryption Python files?

training_data = HOME_DIR + "demo/data/agaricus.txt.train"
enc_training_data = "train.enc"

# Encrypt training data
encrypt_file(training_data, enc_training_data, KEY_FILE)

Encrypting file /home/xgb/secure-xgboost/demo/enclave/../../demo/data/agaricus.txt.train


In [5]:
test_data = HOME_DIR + "demo/data/agaricus.txt.test"
enc_test_data = "test.enc"

# Encrypt test data
encrypt_file(test_data, enc_test_data, KEY_FILE)

Encrypting file /home/xgb/secure-xgboost/demo/enclave/../../demo/data/agaricus.txt.test
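
Because nothing should leave the client in plaintext, it can be worth checking the encrypted files before transferring them. The sketch below assumes the agaricus inputs are LibSVM text (a label followed by `index:value` pairs) and tests whether a file still parses as such. `looks_like_plaintext_libsvm` is a hypothetical helper, not part of Secure XGBoost, and it is only a heuristic: it says nothing about the strength of the encryption.

```python
import re

# One LibSVM line: a numeric label followed by one or more index:value pairs.
_LIBSVM_LINE = re.compile(
    rb"^\s*[-+]?\d+(\.\d+)?(\s+\d+:[-+]?\d+(\.\d+)?([eE][-+]?\d+)?)+\s*$"
)


def looks_like_plaintext_libsvm(path, max_lines=5):
    """Return True if the first non-empty lines of `path` all parse as LibSVM text."""
    checked = 0
    with open(path, "rb") as f:
        for raw in f:
            line = raw.rstrip(b"\r\n")
            if not line.strip():
                continue  # skip blank lines
            if not _LIBSVM_LINE.match(line):
                return False
            checked += 1
            if checked >= max_lines:
                break
    return checked > 0  # an empty file is not treated as plaintext data
```

In this notebook, one might assert `not looks_like_plaintext_libsvm(enc_training_data)` and likewise for `enc_test_data` before sending the files to the remote machine.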


## Prepare Enclave
We need to create an enclave, attest it (verify that it is genuine and running the expected code), and finally give it the key we used to encrypt the data.

In [6]:
# Create an enclave
enclave = xgb.Enclave(HOME_DIR + "enclave/build/xgboost_enclave.signed", flags=OE_ENCLAVE_FLAG_DEBUG, log_verbosity=3)

In [7]:
# Remote Attestation
enclave.get_remote_report_with_pubkey()
enclave_pem_key, enclave_key_size, remote_report, remote_report_size = enclave.get_report_attrs()
enclave.verify_remote_report_and_set_pubkey()

In [8]:
crypto = xgb.CryptoUtils()

sym_key = None

# Read the key into memory
with open(KEY_FILE, "rb") as keyfile:
    sym_key = keyfile.read()
    
# Demo only: never print a real key
print(sym_key)

# Encrypt symmetric key
enc_sym_key, enc_sym_key_size = crypto.encrypt_data_with_pk(sym_key, len(sym_key), 
                                                            enclave_pem_key, enclave_key_size)
# Sign encrypted symmetric key
sig, sig_size = crypto.sign_data("keypair.pem", enc_sym_key, enc_sym_key_size)

# Add key to enclave
crypto.add_client_key(enc_sym_key, enc_sym_key_size, sig, sig_size)

b'\xb8\x8fVPR \r\xbb\x9c5\xb2\xefs\x84E\xbc\xfai\xa7\x0c\x9b\x1b\xb3\x97\x89~\xdfnN\xd7\xfb)'
184 143 86 80 82 32 13 187 156 53 178 239 115 132 69 188 250 105 167 12 155 27 179 151 137 126 223 110 78 215 251 41 

1
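
The first two output lines above appear to show the same key in two forms: the Python `bytes` repr from `print(sym_key)`, and a list of decimal byte values (presumably from debug logging; where it is printed is not shown here). The sketch below re-derives the decimal form from the printed key and confirms that the key is 32 bytes, i.e. 256 bits. `as_decimal_bytes` is a hypothetical helper introduced only for this check.

```python
# The key exactly as printed by print(sym_key) above.
printed_key = (
    b'\xb8\x8fVPR \r\xbb\x9c5\xb2\xefs\x84E\xbc'
    b'\xfai\xa7\x0c\x9b\x1b\xb3\x97\x89~\xdfnN\xd7\xfb)'
)


def as_decimal_bytes(data):
    """Render a bytes object as space-separated decimal byte values."""
    return " ".join(str(b) for b in data)


print(len(printed_key), "bytes =", len(printed_key) * 8, "bits")  # 32 bytes = 256 bits
print(as_decimal_bytes(printed_key))
```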

## Load Data
Load the encrypted data into a `DMatrix`. 

In [9]:
# Create training matrix
dtrain = xgb.DMatrix(os.getcwd() + "/" + enc_training_data, encrypted=True)

In [10]:
# Create test matrix
dtest = xgb.DMatrix(os.getcwd() + "/" + enc_test_data, encrypted=True)

## Perform Training

In [11]:
# Set parameters
params = {
        "tree_method": "hist",
        "n_gpus": "0",
        "objective": "binary:logistic",
        "min_child_weight": "1",
        "gamma": "0.1",
        "max_depth": "3",
        "verbosity": "3" 
}

In [12]:
# Train
num_rounds = 10
booster = xgb.train(params, dtrain, num_rounds, evals=[(dtrain, "train"), (dtest, "test")])

[0]	train-error:0.014433	test-error:0.016139
[1]	train-error:0.014433	test-error:0.016139
[2]	train-error:0.014433	test-error:0.016139
[3]	train-error:0.008598	test-error:0.009932
[4]	train-error:0.001228	test-error:0
[5]	train-error:0.001228	test-error:0
[6]	train-error:0.001228	test-error:0
[7]	train-error:0.001228	test-error:0
[8]	train-error:0.001228	test-error:0
[9]	train-error:0.001228	test-error:0
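
The evaluation log above can also be read programmatically, for example to find the first round at which the test error reaches its minimum. The sketch below parses a copy of that log (whitespace normalized); `parse_eval_log` and `first_best_round` are hypothetical helpers, not part of the library.

```python
import re

# Evaluation log copied from the training output above.
LOG = """\
[0] train-error:0.014433 test-error:0.016139
[1] train-error:0.014433 test-error:0.016139
[2] train-error:0.014433 test-error:0.016139
[3] train-error:0.008598 test-error:0.009932
[4] train-error:0.001228 test-error:0
[5] train-error:0.001228 test-error:0
[6] train-error:0.001228 test-error:0
[7] train-error:0.001228 test-error:0
[8] train-error:0.001228 test-error:0
[9] train-error:0.001228 test-error:0
"""

_LINE = re.compile(r"^\[(\d+)\]\s+(.*)$")


def parse_eval_log(text):
    """Parse lines like '[3] train-error:0.0086 test-error:0.0099' into dicts."""
    rounds = []
    for line in text.splitlines():
        m = _LINE.match(line.strip())
        if not m:
            continue  # ignore anything that is not an evaluation line
        metrics = {"round": int(m.group(1))}
        for field in m.group(2).split():
            name, _, value = field.rpartition(":")
            metrics[name] = float(value)
        rounds.append(metrics)
    return rounds


def first_best_round(rounds, metric="test-error"):
    """Return the earliest round with the lowest value of `metric`."""
    return min(rounds, key=lambda r: r[metric])["round"]


print(first_best_round(parse_eval_log(LOG)))  # prints 4
```

Here the test error stops improving after round 4, so the later rounds do not change either reported metric.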


## Predict
Our `predict()` function returns encrypted predictions. The buffer it returns must be decrypted with the same key that was used to encrypt the original data.

In [11]:
# Get Encrypted Predictions
enc_preds, num_preds = booster.predict(dtest)



Model Predictions: 
[0.02386593 0.9543875  0.02386593 0.02386593 0.04897502 0.10559791
 0.9543875  0.02876541 0.9543875  0.02423424 0.9543875  0.02876541
 0.02340852 0.02386593 0.02340852 0.02920706 0.02876541 0.9543875
 0.04897502 0.02876541]


True Labels: 
[0. 1. 0. 0. 0. 0. 1. 0. 1. 0. 1. 0. 0. 0. 0. 0. 0. 1. 0. 0.]


In [None]:
# Decrypt Predictions
preds = crypto.decrypt_predictions(sym_key, enc_preds, num_preds)
print(preds)
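
With the `binary:logistic` objective, each decrypted prediction is the predicted probability of the positive class. A minimal sketch of turning those probabilities into class labels and an error rate follows. It uses the 20 values shown in the "Model Predictions" and "True Labels" output above, and the 0.5 threshold that XGBoost's `error` metric uses by default. `error_rate` is a hypothetical helper.

```python
# Values copied from the "Model Predictions" and "True Labels" output above.
shown_preds = [
    0.02386593, 0.9543875, 0.02386593, 0.02386593, 0.04897502, 0.10559791,
    0.9543875, 0.02876541, 0.9543875, 0.02423424, 0.9543875, 0.02876541,
    0.02340852, 0.02386593, 0.02340852, 0.02920706, 0.02876541, 0.9543875,
    0.04897502, 0.02876541,
]
shown_labels = [0, 1, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0]


def error_rate(probabilities, labels, threshold=0.5):
    """Fraction of examples whose thresholded prediction differs from the label."""
    predicted = [1 if p > threshold else 0 for p in probabilities]
    wrong = sum(1 for y_hat, y in zip(predicted, labels) if y_hat != y)
    return wrong / len(labels)


print(error_rate(shown_preds, shown_labels))  # prints 0.0
```

All 20 of the displayed examples are classified correctly, which is consistent with the test error of 0 reported during training.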