## Cardiac risk score by regression tree

This notebook shows how to construct a regression tree with TFHE using Concrete for Python, taking risk factors (features) in the form of tabular data as input. It is a stand-alone example that illustrates the concepts in a single notebook on a single server. In a real application, the secret key would stay with the client and one would have to distribute the public (evaluation) keys to the executing server.

The notebook also presents some simple numerical investigations and benchmarks of the resulting TFHE circuit.

Please familiarize yourself with Zama's TFHE library Concrete for Python (documentation linked below) to better follow this example.
- https://docs.zama.ai/concrete

The risk-factor-based algorithm is taken from Carpov et al. (2016):
- S. Carpov, T. H. Nguyen, R. Sirdey, G. Constantino and F. Martinelli, "Practical Privacy-Preserving Medical Diagnosis Using Homomorphic Encryption," https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7820321
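As a cleartext baseline, the rules that this notebook's `score` function encodes can be written with ordinary `if` branches, since nothing is encrypted here. The thresholds are copied from that function and have not been re-checked against the paper; the name `risk_score_reference` is hypothetical and only used for this sketch.

```python
# Hypothetical cleartext reference of the scoring rules used in this notebook.
# Ordinary branches are fine here because the inputs are not encrypted.

FEATURES = ["man", "smoking", "diabetic", "high_blood_pressure", "alco",
            "age", "HDL_chol", "weight", "height", "exercise"]


def risk_score_reference(sample):
    """Count how many risk conditions hold for one patient (list of 10 ints)."""
    f = dict(zip(FEATURES, sample))
    # binary risk indicators count directly
    y = f["smoking"] + f["diabetic"] + f["high_blood_pressure"]
    # alcohol: for women (man == 0) the threshold is effectively one unit lower
    if f["alco"] + 1 - f["man"] > 3:
        y += 1
    # age: men are treated as ten years older
    if f["age"] + 10 * f["man"] > 60:
        y += 1
    # low "good" (HDL) cholesterol
    if f["HDL_chol"] < 40:
        y += 1
    # weight that is high relative to height
    if f["height"] - f["weight"] < 90:
        y += 1
    # insufficient exercise
    if f["exercise"] < 30:
        y += 1
    return y


print(risk_score_reference([1, 0, 0, 0, 2, 46, 50, 60, 173, 50]))   # 0
print(risk_score_reference([0, 1, 0, 1, 4, 67, 39, 110, 163, 98]))  # 6
```

The encrypted version below has to express each of these branches without `if`, which is why it uses `np.clip` instead.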

In [1]:
# install requirements (change False to True to run)

if False:
    #!pip install -U pip wheel setuptools 
    !pip install concrete-python


In [2]:
import numpy as np
import matplotlib.pyplot as plt
import time
import pandas as pd

from itertools import product
from glob import glob

from concrete import fhe


In [3]:
# the risk score assessment is based on these features

features = [
    "man", "smoking", "diabetic", "high_blood_pressure", "alco",
    "age", "HDL_chol", "weight", "height", "exercise"
]

x = [1, 0, 0, 0, 2, 46, 50, 60, 173, 50]

pd.DataFrame(dict(zip(features,x)), index = ["x"])


|   | man | smoking | diabetic | high_blood_pressure | alco | age | HDL_chol | weight | height | exercise |
|---|---|---|---|---|---|---|---|---|---|---|
| x | 1 | 0 | 0 | 0 | 2 | 46 | 50 | 60 | 173 | 50 |


In [4]:
# construct a random input set; Concrete uses it to determine the ranges
# (and hence the integer bit widths) of the inputs and intermediate values

rnd = lambda lower, upper: np.random.randint(lower,upper+1)
inputset = [
    [rnd(0,1), rnd(0,1), rnd(0,1), rnd(0,1), rnd(0,6), 
     rnd(20,100), rnd(20,60), rnd(50,120), rnd(150,190), rnd(0,120)]
    for i in range(20)
]

pd.DataFrame(inputset, columns = features)


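|   | man | smoking | diabetic | high_blood_pressure | alco | age | HDL_chol | weight | height | exercise |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 1 | 0 | 0 | 2 | 46 | 42 | 60 | 175 | 50 |
| 1 | 1 | 1 | 1 | 1 | 1 | 78 | 56 | 108 | 157 | 45 |
| 2 | 1 | 0 | 0 | 0 | 5 | 34 | 23 | 111 | 161 | 59 |
| 3 | 1 | 1 | 0 | 0 | 6 | 41 | 42 | 79 | 161 | 11 |
| 4 | 0 | 1 | 1 | 1 | 0 | 26 | 55 | 85 | 176 | 95 |
| 5 | 1 | 0 | 0 | 1 | 3 | 69 | 23 | 104 | 178 | 60 |
| 6 | 1 | 0 | 0 | 1 | 4 | 90 | 31 | 89 | 185 | 76 |
| 7 | 1 | 1 | 1 | 0 | 0 | 82 | 56 | 62 | 175 | 59 |
| 8 | 0 | 0 | 1 | 1 | 4 | 90 | 55 | 96 | 155 | 101 |
| 9 | 1 | 1 | 0 | 1 | 5 | 65 | 60 | 79 | 189 | 68 |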
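A random inputset of 20 rows may miss the ends of the declared ranges: the compiled circuit printed further down reports age ∈ [25, 98], although `rnd(20,100)` allows 20 to 100. Since the integer types are inferred from the values seen on the inputset, later inputs outside those values might not fit the compiled types. A minimal sketch, assuming the bounds in the `rnd(...)` calls above are the intended valid domain: add every combination of per-feature bounds (2**10 = 1024 rows, built with `itertools.product`) to a few random rows. `RANGES` and `corner_inputset` are hypothetical names.

```python
import numpy as np
from itertools import product

# Assumed valid domain, copied from the rnd(...) calls above (inclusive bounds).
RANGES = {
    "man": (0, 1), "smoking": (0, 1), "diabetic": (0, 1),
    "high_blood_pressure": (0, 1), "alco": (0, 6), "age": (20, 100),
    "HDL_chol": (20, 60), "weight": (50, 120), "height": (150, 190),
    "exercise": (0, 120),
}


def corner_inputset(ranges, n_random=20, seed=0):
    """All 2**d combinations of per-feature (lower, upper) bounds plus random rows."""
    bounds = list(ranges.values())
    # each corner picks either the lower or the upper bound of every feature
    corners = [list(corner) for corner in product(*bounds)]
    # seeded generator so the random part is reproducible
    rng = np.random.default_rng(seed)
    randoms = [[int(rng.integers(lo, hi + 1)) for lo, hi in bounds]
               for _ in range(n_random)]
    return corners + randoms


inputset_with_corners = corner_inputset(RANGES)
print(len(inputset_with_corners))  # 1044
```

Every quantity passed to `np.clip` in `score` is a linear combination of the features (for example `height - weight`), so its minimum and maximum over this box of ranges are attained at corners. The trade-off is a larger inputset, which makes compilation slower because the function is evaluated on every row.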


In [5]:
# define the accumulation tree structure

# branching on encrypted values, e.g. with "if", is not possible,
# so each threshold test is replaced by an np.clip expression

@fhe.compiler({"x": "encrypted"})
def score(x):

    input = dict(zip(features,x))

    # these are risk indicators in themselves
    y = input["smoking"] + input["diabetic"] + input["high_blood_pressure"]
    
    # high alcohol consumption is a risk, but depends on sex 
    x = input["alco"] + 1 - input["man"]
    #y += 1 if x > 3 else 0
    y += np.clip(x-3,0,1)
    
    # risk increases with age, especially for men
    x = input["age"] + 10*input["man"]
    #y += 1 if x > 60 else 0
    y += np.clip(x-60,0,1)
    
    # high levels of "good" (HDL) cholesterol decrease risk
    x = input["HDL_chol"]
    #y += 1 if x < 40 else 0
    y += np.clip(40-x,0,1)
    
    # taller people can weigh more without being at risk
    x = input["height"] - input["weight"]
    #y += 1 if x < 90 else 0
    y += np.clip(90-x,0,1)
    
    # insufficient exercise is a risk factor
    x = input["exercise"]
    #y += 1 if x < 30 else 0
    y += np.clip(30-x,0,1)

    return y

# test on the example input (evaluated in the clear, without encryption)
score(x)


0
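The `np.clip` substitution is exact only for integers: for an integer t and threshold c, `clip(t - c, 0, 1)` is 1 when t > c and 0 otherwise, and `clip(c - t, 0, 1)` is the indicator of t < c. The sketch below checks this exhaustively on a range of integers, shows that it breaks for fractional inputs, and adds a vectorized cleartext version of `score` (`score_batch`, a hypothetical helper) that produces expected values for a whole inputset at once, e.g. to compare against decrypted results.

```python
import numpy as np

# Exhaustive check of the clip-as-indicator identity on integers.
t = np.arange(-200, 201)
for c in (3, 30, 40, 60, 90):  # thresholds used in `score`
    assert np.array_equal(np.clip(t - c, 0, 1), (t > c).astype(int))
    assert np.array_equal(np.clip(c - t, 0, 1), (t < c).astype(int))

# With a fractional gap the "indicator" is no longer 0 or 1.
print(np.clip(60.5 - 60, 0, 1))  # 0.5, not 1


def score_batch(X):
    """Cleartext, vectorized version of `score` for an (n, 10) integer array."""
    X = np.asarray(X, dtype=np.int64)
    man, smoking, diabetic, hbp, alco, age, hdl, weight, height, exercise = X.T
    y = smoking + diabetic + hbp
    y += np.clip(alco + 1 - man - 3, 0, 1)       # alcohol, sex-dependent
    y += np.clip(age + 10 * man - 60, 0, 1)      # age, men shifted by 10 years
    y += np.clip(40 - hdl, 0, 1)                 # low HDL cholesterol
    y += np.clip(90 - (height - weight), 0, 1)   # weight relative to height
    y += np.clip(30 - exercise, 0, 1)            # insufficient exercise
    return y


print(score_batch([[1, 0, 0, 0, 2, 46, 50, 60, 173, 50],
                   [1, 1, 1, 1, 1, 78, 56, 108, 157, 45]]))  # [0 5]
```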

In [6]:
# compile the function into a TFHE circuit

# the integer types (bit widths) are inferred from the values seen on the inputset

circuit = score.compile(inputset)

print(circuit)


 %0 = x                          # EncryptedTensor<uint8, shape=(10,)>        ∈ [0, 190]
 %1 = %0[0]                      # EncryptedScalar<uint1>                     ∈ [0, 1]
 %2 = %0[1]                      # EncryptedScalar<uint1>                     ∈ [0, 1]
 %3 = %0[2]                      # EncryptedScalar<uint1>                     ∈ [0, 1]
 %4 = %0[3]                      # EncryptedScalar<uint1>                     ∈ [0, 1]
 %5 = %0[4]                      # EncryptedScalar<uint3>                     ∈ [0, 6]
 %6 = %0[5]                      # EncryptedScalar<uint7>                     ∈ [25, 98]
 %7 = %0[6]                      # EncryptedScalar<uint6>                     ∈ [22, 60]
 %8 = %0[7]                      # EncryptedScalar<uint7>                     ∈ [50, 114]
 %9 = %0[8]                      # EncryptedScalar<uint8>                     ∈ [154, 190]
%10 = %0[9]                      # EncryptedScalar<uint7>                     ∈ [8, 111]
%11 = add(%2, %3)           
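The input types in the printout are consistent with a simple rule: an unsigned value in [lo, hi] needs `hi.bit_length()` bits (at least one), and a range that contains negative values needs a two's-complement signed type. The helper `bit_width` below is an assumption that reproduces the printed input types; it is not Concrete's actual bit-width assignment, which can take further constraints into account.

```python
def bit_width(lo: int, hi: int) -> tuple[str, int]:
    """Smallest (signedness, bits) whose integer type holds every value in [lo, hi]."""
    if lo > hi:
        raise ValueError("empty range")
    if lo >= 0:
        # unsigned: enough bits for the largest value, at least 1 bit
        return "uint", max(1, hi.bit_length())
    # signed n-bit two's complement covers [-2**(n-1), 2**(n-1) - 1]
    magnitude_bits = max((-lo - 1).bit_length(), hi.bit_length() if hi > 0 else 0)
    return "int", magnitude_bits + 1


# Ranges and types exactly as printed for the compiled circuit above.
printed = {
    "x": ((0, 190), ("uint", 8)),
    "man": ((0, 1), ("uint", 1)),
    "smoking": ((0, 1), ("uint", 1)),
    "diabetic": ((0, 1), ("uint", 1)),
    "high_blood_pressure": ((0, 1), ("uint", 1)),
    "alco": ((0, 6), ("uint", 3)),
    "age": ((25, 98), ("uint", 7)),
    "HDL_chol": ((22, 60), ("uint", 6)),
    "weight": ((50, 114), ("uint", 7)),
    "height": ((154, 190), ("uint", 8)),
    "exercise": ((8, 111), ("uint", 7)),
}
for name, (value_range, expected) in printed.items():
    assert bit_width(*value_range) == expected, name

# Illustrative only (not read from the printout): 40 - HDL_chol with
# HDL_chol in [22, 60] spans [-20, 18] and would need a signed type.
print(bit_width(40 - 60, 40 - 22))  # ('int', 6)
```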

In [7]:
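# encrypt, evaluate and decrypt each sample of the inputset, timing each call
# (the first call presumably also generates the keys, hence its longer runtime)

for sample in inputset:
    t0 = time.time()
    ans = circuit.encrypt_run_decrypt(sample)
    #ans = circuit.simulate(sample)  # alternative: simulation without encryption
    print("x:", sample, "y:", ans, "t:", time.time()-t0)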


x: [1, 1, 0, 0, 2, 46, 42, 60, 175, 50] y: 1 t: 52.86190938949585
x: [1, 1, 1, 1, 1, 78, 56, 108, 157, 45] y: 5 t: 4.271595239639282
x: [1, 0, 0, 0, 5, 34, 23, 111, 161, 59] y: 3 t: 4.33566427230835
x: [1, 1, 0, 0, 6, 41, 42, 79, 161, 11] y: 4 t: 4.341582775115967
x: [0, 1, 1, 1, 0, 26, 55, 85, 176, 95] y: 3 t: 4.267025470733643
x: [1, 0, 0, 1, 3, 69, 23, 104, 178, 60] y: 4 t: 4.384103298187256
x: [1, 0, 0, 1, 4, 90, 31, 89, 185, 76] y: 4 t: 4.411282300949097
x: [1, 1, 1, 0, 0, 82, 56, 62, 175, 59] y: 3 t: 4.54086709022522
x: [0, 0, 1, 1, 4, 90, 55, 96, 155, 101] y: 5 t: 4.364011287689209
x: [1, 1, 0, 1, 5, 65, 60, 79, 189, 68] y: 4 t: 4.256637334823608
x: [1, 0, 1, 1, 6, 98, 45, 60, 170, 25] y: 5 t: 4.276859521865845
x: [1, 1, 1, 1, 3, 25, 34, 63, 170, 104] y: 4 t: 4.281804084777832
x: [0, 0, 1, 0, 6, 58, 39, 114, 161, 20] y: 5 t: 4.273319482803345
x: [1, 1, 0, 1, 6, 98, 53, 64, 176, 109] y: 4 t: 4.30555534362793
x: [0, 1, 0, 1, 4, 67, 39, 110, 163, 98] y: 6 t: 4.303948879241943
x: [0