# Graph Features Extraction for Anti-Money Laundering

This is an example notebook adapted from IBM Snap ML's [graph_feature_preprocessor.ipynb](https://github.com/IBM/snapml-examples/blob/main/examples/graph_feature_preprocessor/graph_feature_preprocessor.ipynb) that demonstrates how [Graph Feature Preprocessor](https://snapml.readthedocs.io/en/latest/graph_preprocessor.html) works on our Anti-Money Laundering dataset.

In [1]:
# Import the Graph Feature Preprocessor from Snap ML
from snapml import GraphFeaturePreprocessor

# Import other libraries
import numpy as np
import time
import json
import pandas as pd
from IPython.display import display

pd.options.display.max_columns = None

Here we assume that the user has access to a set of (labeled) transactions with raw features that could be used to train a machine learning (ML) model, e.g., for fraud detection. The Graph Feature Preprocessor extracts graph features and appends them to the raw features already present in each transaction. The enriched feature set is then used to train an ML model. The main steps of this use case are shown below:

<div> <img src="img/gfp-use-case1.png" width="1000"> </div>


In [2]:
graph_path = "../data/HI-Small_Balanced_Formatted.csv"

df = pd.read_csv(graph_path)
df

Unnamed: 0,EdgeID,SourceAccountId,TargetAccountId,Timestamp,Amount Sent,Sent Currency,Amount Received,Receiving Currency,Payment Format,Is Laundering
0,748,1354,1354,10,1134.43,8,1134.43,8,5,0
1,1536,2760,2761,10,2786.09,1,2786.09,1,2,0
2,1621,2908,2908,10,16.19,0,16.19,0,5,0
3,2268,4052,4052,10,35296.17,1,35296.17,1,5,0
4,5069,8887,8887,10,14.14,4,14.14,4,5,0
...,...,...,...,...,...,...,...,...,...,...
10349,10249,15096,15121,1504510,2391.92,9,2391.92,9,0,1
10350,10053,14861,14879,1504930,3749.14,1,3749.14,1,0,1
10351,10054,14861,14880,1509490,1785.27,0,1785.27,0,0,1
10352,10055,14861,14881,1515490,2154.54,1,2154.54,1,0,1


In [3]:
df_simple = df[["EdgeID", "SourceAccountId", "TargetAccountId", "Timestamp"]]

colnames_original = df_simple.columns.tolist()

#convert df to numpy
data = df_simple.to_numpy()
print(data.shape)
df_simple

(10354, 4)


Unnamed: 0,EdgeID,SourceAccountId,TargetAccountId,Timestamp
0,748,1354,1354,10
1,1536,2760,2761,10
2,1621,2908,2908,10
3,2268,4052,4052,10
4,5069,8887,8887,10
...,...,...,...,...
10349,10249,15096,15121,1504510
10350,10053,14861,14879,1504930
10351,10054,14861,14880,1509490
10352,10055,14861,14881,1515490


In [4]:
# The following dictionary defines the configuration parameters of the Graph Feature Preprocessor

params = {
    "num_threads": 4,             # number of software threads to be used (important for performance)
    "time_window": 16,            # time window used if no pattern was specified
    
    "vertex_stats": True,         # produce vertex statistics
    "vertex_stats_cols": [3],     # produce vertex statistics using the selected input columns
    
    # features: 0:fan,1:deg,2:ratio,3:avg,4:sum,5:min,6:max,7:median,8:var,9:skew,10:kurtosis
    "vertex_stats_feats": [0, 1, 2, 3, 4, 8, 9, 10],  # fan,deg,ratio,avg,sum,var,skew,kurtosis
    
    # fan in/out parameters
    "fan": True,
    "fan_tw": 16,
    "fan_bins": [y+2 for y in range(2)],
    
    # in/out degree parameters
    "degree": True,
    "degree_tw": 16,
    "degree_bins": [y+2 for y in range(2)],
    
    # scatter gather parameters
    "scatter-gather": True,
    "scatter-gather_tw": 16,
    "scatter-gather_bins": [y+2 for y in range(2)],
    
    # temporal cycle parameters
    "temp-cycle": True,
    "temp-cycle_tw": 16,
    "temp-cycle_bins": [y+2 for y in range(2)],
    
    # length-constrained simple cycle parameters
    "lc-cycle": False,
    "lc-cycle_tw": 16,
    "lc-cycle_len": 8,
    "lc-cycle_bins": [y+2 for y in range(2)],
}
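
The `*_bins` entries above are written as list comprehensions, which can hide the actual values. The short sketch below evaluates the expression and shows how the resulting bin edges map to the `2-3` and `3-inf` labels used for the column names later in this notebook. The `assign_bin` helper is hypothetical and only illustrates one plausible reading of the bin edges (a value falls into the last bin whose lower edge it reaches); it is not taken from the Snap ML implementation.

```python
from bisect import bisect_right

# The expression used for every "*_bins" entry in the configuration
bin_edges = [y + 2 for y in range(2)]
print(bin_edges)  # [2, 3]

def bin_labels(edges):
    """Build labels such as '2-3' and '3-inf' from a sorted list of bin edges."""
    labels = [f"{lo}-{hi}" for lo, hi in zip(edges[:-1], edges[1:])]
    labels.append(f"{edges[-1]}-inf")
    return labels

def assign_bin(value, edges):
    """Hypothetical helper: label of the bin containing `value`, or None if below the first edge."""
    idx = bisect_right(edges, value) - 1
    return bin_labels(edges)[idx] if idx >= 0 else None

print(bin_labels(bin_edges))     # ['2-3', '3-inf']
print(assign_bin(5, bin_edges))  # 3-inf
```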

In [7]:
# Create a Graph Feature Preprocessor, set its configuration using the above dictionary and verify it

print("Creating a graph feature preprocessor ")
gp = GraphFeaturePreprocessor()

print("Setting the parameters of the graph feature preprocessor ")
gp.set_params(params)

print("Graph feature preprocessor parameters: ", json.dumps(gp.get_params(), indent=4))

Creating a graph feature preprocessor 
Setting the parameters of the graph feature preprocessor 
Graph feature preprocessor parameters:  {
    "num_threads": 4,
    "time_window": 16,
    "max_no_edges": -1,
    "vertex_stats": true,
    "vertex_stats_tw": 1728000,
    "vertex_stats_cols": [
        3
    ],
    "vertex_stats_feats": [
        0,
        1,
        2,
        3,
        4,
        8,
        9,
        10
    ],
    "fan": true,
    "fan_tw": 16,
    "fan_bins": [
        2,
        3
    ],
    "degree": true,
    "degree_tw": 16,
    "degree_bins": [
        2,
        3
    ],
    "scatter-gather": true,
    "scatter-gather_tw": 16,
    "scatter-gather_bins": [
        2,
        3
    ],
    "temp-cycle": true,
    "temp-cycle_tw": 16,
    "temp-cycle_bins": [
        2,
        3
    ],
    "lc-cycle": false,
    "lc-cycle_tw": 16,
    "lc-cycle_len": 8,
    "lc-cycle_bins": [
        2,
        3
    ]
}


In [8]:
print("Enriching the transactions with new graph features ")
print("Raw dataset shape: ", data.shape)

# the fit_transform and transform functions are equivalent
# these functions can run on single transactions or on batches of transactions
data = np.ascontiguousarray(data)
data_enriched = gp.fit_transform(data.astype("float64")) 

print("Enriched dataset shape: ", data_enriched.shape)

Enriching the transactions with new graph features 
Raw dataset shape:  (10354, 4)
Enriched dataset shape:  (10354, 48)
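
The jump from 4 to 48 columns can be checked by hand. The sketch below is a plain-Python count that mirrors the column-naming convention of the helper function defined later in this notebook: one column per bin edge for each enabled pattern (doubled for `fan` and `degree`, which have "in" and "out" variants), plus the selected vertex statistics for each combination of endpoint (source, destination) and direction (out, in). The function name `count_enriched_columns` is illustrative and not part of Snap ML.

```python
# Only the configuration entries that affect the number of output columns
params = {
    "vertex_stats": True,
    "vertex_stats_cols": [3],
    "vertex_stats_feats": [0, 1, 2, 3, 4, 8, 9, 10],
    "fan": True, "fan_bins": [2, 3],
    "degree": True, "degree_bins": [2, 3],
    "scatter-gather": True, "scatter-gather_bins": [2, 3],
    "temp-cycle": True, "temp-cycle_bins": [2, 3],
    "lc-cycle": False, "lc-cycle_bins": [2, 3],
}

def count_enriched_columns(params, n_raw_cols):
    """Count raw + graph-pattern + vertex-statistics columns (illustrative helper)."""
    n_pattern = 0
    for pattern in ["fan", "degree", "scatter-gather", "temp-cycle", "lc-cycle"]:
        if params.get(pattern):
            # fan and degree have "in" and "out" variants; other patterns have one set
            directions = 2 if pattern in ("fan", "degree") else 1
            n_pattern += directions * len(params[pattern + "_bins"])

    n_vertex = 0
    if params.get("vertex_stats"):
        feats = params["vertex_stats_feats"]
        structural = sum(1 for k in feats if k in (0, 1, 2))  # fan, deg, ratio
        per_column = sum(1 for k in feats if k >= 3)          # avg, sum, var, ...
        per_group = structural + per_column * len(params["vertex_stats_cols"])
        n_vertex = 4 * per_group  # (source, dest) x (out, in)

    return n_raw_cols + n_pattern + n_vertex

total = count_enriched_columns(params, n_raw_cols=4)
print(total)  # 48
```

With this configuration the count is 4 raw columns, 12 pattern columns, and 32 vertex-statistics columns, which matches the enriched shape printed above.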


We define a helper function to inspect the newly generated graph-based features for a given transaction:

In [9]:
def print_enriched_transaction(transaction, params, colnames):
    '''
    Input: 
    - transaction: enriched data with graph features (in the form of a numpy array)
    - params: dictionary with the configuration parameters of the Graph Feature Preprocessor
    - colnames: list of column names of the raw input data (graph feature names are appended to a copy)
    '''
    # work on a copy so that the caller's list is not modified in place
    colnames = list(colnames)

    # add feature names for the graph patterns
    for pattern in ['fan', 'degree', 'scatter-gather', 'temp-cycle', 'lc-cycle']:
        if params.get(pattern):  # the pattern is present and enabled
            bin_edges = params[pattern + '_bins']
            # fan and degree have separate "in" and "out" features; other patterns have one set
            prefixes = [pattern + "_in", pattern + "_out"] if pattern in ['fan', 'degree'] else [pattern]
            for prefix in prefixes:
                # one column per pair of consecutive bin edges, plus a final open-ended bin
                for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
                    colnames.append(prefix + "_bins_" + str(lo) + "-" + str(hi))
                colnames.append(prefix + "_bins_" + str(bin_edges[-1]) + "-inf")

    vert_feat_names = ["fan","deg","ratio","avg","sum","min","max","median","var","skew","kurtosis"]

    # add features names for the vertex statistics
    for orig in ['source', 'dest']:
        for direction in ['out', 'in']:
            # add fan, deg, and ratio features
            for k in [0, 1, 2]:
                if k in params["vertex_stats_feats"]:
                    feat_name = orig + "_" + vert_feat_names[k] + "_" + direction
                    colnames.append(feat_name)
            for col in params["vertex_stats_cols"]:
                # add avg, sum, min, max, median, var, skew, and kurtosis features
                for k in [3, 4, 5, 6, 7, 8, 9, 10]:
                    if k in params["vertex_stats_feats"]:
                        feat_name = orig + "_" + vert_feat_names[k] + "_col" + str(col) + "_" + direction
                        colnames.append(feat_name)

    df = pd.DataFrame(transaction, columns=colnames)
    display(df)

    return df

print("Enriched transactions: ")
df_enriched = print_enriched_transaction(data_enriched, gp.get_params(), colnames_original)

Enriched transactions: 


Unnamed: 0,EdgeID,SourceAccountId,TargetAccountId,Timestamp,fan_in_bins_2-3,fan_in_bins_3-inf,fan_out_bins_2-3,fan_out_bins_3-inf,degree_in_bins_2-3,degree_in_bins_3-inf,degree_out_bins_2-3,degree_out_bins_3-inf,scatter-gather_bins_2-3,scatter-gather_bins_3-inf,temp-cycle_bins_2-3,temp-cycle_bins_3-inf,source_fan_out,source_deg_out,source_ratio_out,source_avg_col3_out,source_sum_col3_out,source_var_col3_out,source_skew_col3_out,source_kurtosis_col3_out,source_fan_in,source_deg_in,source_ratio_in,source_avg_col3_in,source_sum_col3_in,source_var_col3_in,source_skew_col3_in,source_kurtosis_col3_in,dest_fan_out,dest_deg_out,dest_ratio_out,dest_avg_col3_out,dest_sum_col3_out,dest_var_col3_out,dest_skew_col3_out,dest_kurtosis_col3_out,dest_fan_in,dest_deg_in,dest_ratio_in,dest_avg_col3_in,dest_sum_col3_in,dest_var_col3_in,dest_skew_col3_in,dest_kurtosis_col3_in
0,748.0,1354.0,1354.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.000000,10.0,10.0,0.000000e+00,0.000000,0.000000,1.0,1.0,1.0,1.000000e+01,10.0,0.000000e+00,0.000000,0.000000,1.0,1.0,1.0,10.0,10.0,0.0,0.0,0.0,1.0,1.0,1.0,10.0,10.0,0.0,0.0,0.0
1,1536.0,2760.0,2761.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.000000,10.0,10.0,0.000000e+00,0.000000,0.000000,0.0,0.0,0.0,0.000000e+00,0.0,0.000000e+00,0.000000,0.000000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,10.0,10.0,0.0,0.0,0.0
2,1621.0,2908.0,2908.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.000000,10.0,10.0,0.000000e+00,0.000000,0.000000,1.0,1.0,1.0,1.000000e+01,10.0,0.000000e+00,0.000000,0.000000,1.0,1.0,1.0,10.0,10.0,0.0,0.0,0.0,1.0,1.0,1.0,10.0,10.0,0.0,0.0,0.0
3,2268.0,4052.0,4052.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.000000,10.0,10.0,0.000000e+00,0.000000,0.000000,1.0,1.0,1.0,1.000000e+01,10.0,0.000000e+00,0.000000,0.000000,1.0,1.0,1.0,10.0,10.0,0.0,0.0,0.0,1.0,1.0,1.0,10.0,10.0,0.0,0.0,0.0
4,5069.0,8887.0,8887.0,10.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.000000,10.0,10.0,0.000000e+00,0.000000,0.000000,1.0,1.0,1.0,1.000000e+01,10.0,0.000000e+00,0.000000,0.000000,1.0,1.0,1.0,10.0,10.0,0.0,0.0,0.0,1.0,1.0,1.0,10.0,10.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10349,10249.0,15096.0,15121.0,1504510.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,13.0,13.0,1.000000,1347730.0,17520490.0,1.195573e+10,-0.167331,1.823783,13.0,13.0,1.0,1.032725e+06,13425430.0,9.046389e+09,-0.142320,2.284675,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1504510.0,1504510.0,0.0,0.0,0.0
10350,10053.0,14861.0,14879.0,1504930.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,14.0,15.0,1.071429,1398898.0,20983470.0,1.293296e+10,-0.329838,1.600035,8.0,8.0,1.0,9.674125e+05,7739300.0,1.705210e+10,0.194747,1.539931,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1504930.0,1504930.0,0.0,0.0,0.0
10351,10054.0,14861.0,14880.0,1509490.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,14.0,15.0,1.071429,1398898.0,20983470.0,1.293296e+10,-0.329838,1.600035,8.0,8.0,1.0,9.674125e+05,7739300.0,1.705210e+10,0.194747,1.539931,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1509490.0,1509490.0,0.0,0.0,0.0
10352,10055.0,14861.0,14881.0,1515490.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,14.0,15.0,1.071429,1398898.0,20983470.0,1.293296e+10,-0.329838,1.600035,8.0,8.0,1.0,9.674125e+05,7739300.0,1.705210e+10,0.194747,1.539931,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1515490.0,1515490.0,0.0,0.0,0.0
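
In the last rows above, `source_fan_out` is 14, `source_deg_out` is 15, and `source_ratio_out` is 1.071429, which equals 15/14. This is consistent with reading `fan` as the number of distinct counterparties, `deg` as the number of transactions, and `ratio` as `deg / fan`. The sketch below computes these three quantities for a toy edge list under that assumption; it ignores time windows and is not the Snap ML implementation.

```python
from collections import defaultdict

def out_stats(edges):
    """Return {source: (fan, deg, ratio)} for an iterable of (source, target) pairs.

    Assumed interpretation: fan = distinct targets, deg = outgoing edges,
    ratio = deg / fan.
    """
    targets = defaultdict(list)
    for source, target in edges:
        targets[source].append(target)

    stats = {}
    for source, ts in targets.items():
        fan = len(set(ts))  # distinct counterparties
        deg = len(ts)       # all outgoing transactions, repeats included
        stats[source] = (fan, deg, deg / fan)
    return stats

# Toy graph: account A pays C twice, so its degree exceeds its fan-out
toy_edges = [("A", "B"), ("A", "C"), ("A", "C"), ("D", "A")]
stats = out_stats(toy_edges)
print(stats["A"])  # (2, 3, 1.5)
```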


In [9]:
#drop EdgeID, SourceAccountId, TargetAccountId, Timestamp from df
df_enriched_2 = df_enriched.drop(["EdgeID", "SourceAccountId", "TargetAccountId", "Timestamp"], axis=1)

#concat with original df
df_enriched_2 = pd.concat([df, df_enriched_2], axis=1)

df_enriched_2.to_csv("../data/HI-Small_Balanced_Enriched.csv", index=False)
df_enriched_2

Unnamed: 0,EdgeID,SourceAccountId,TargetAccountId,Timestamp,Amount Sent,Sent Currency,Amount Received,Receiving Currency,Payment Format,Is Laundering,fan_in_bins_2-3,fan_in_bins_3-inf,fan_out_bins_2-3,fan_out_bins_3-inf,degree_in_bins_2-3,degree_in_bins_3-inf,degree_out_bins_2-3,degree_out_bins_3-inf,scatter-gather_bins_2-3,scatter-gather_bins_3-inf,temp-cycle_bins_2-3,temp-cycle_bins_3-inf,source_fan_out,source_deg_out,source_ratio_out,source_avg_col3_out,source_sum_col3_out,source_var_col3_out,source_skew_col3_out,source_kurtosis_col3_out,source_fan_in,source_deg_in,source_ratio_in,source_avg_col3_in,source_sum_col3_in,source_var_col3_in,source_skew_col3_in,source_kurtosis_col3_in,dest_fan_out,dest_deg_out,dest_ratio_out,dest_avg_col3_out,dest_sum_col3_out,dest_var_col3_out,dest_skew_col3_out,dest_kurtosis_col3_out,dest_fan_in,dest_deg_in,dest_ratio_in,dest_avg_col3_in,dest_sum_col3_in,dest_var_col3_in,dest_skew_col3_in,dest_kurtosis_col3_in
0,748,1354,1354,10,1134.43,8,1134.43,8,5,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,1.0,5772.5,11545.0,12281520.25,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,6956.0,6956.0,0.0,0.0,0.0,1.0,1.0,1.0,2268.0,2268.0,0.0,0.0,0.0
1,1536,2760,2761,10,2786.09,1,2786.09,1,2,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,1.0,1509.0,1509.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1509.0,1509.0,0.0,0.0,0.0
2,1621,2908,2908,10,16.19,0,16.19,0,5,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,4615.0,4615.0,0.00,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,13051.0,13051.0,0.0,0.0,0.0,1.0,1.0,1.0,4615.0,4615.0,0.0,0.0,0.0
3,2268,4052,4052,10,35296.17,1,35296.17,1,5,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,873.0,1746.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,873.0,1746.0,0.0,0.0,0.0
4,5069,8887,8887,10,14.14,4,14.14,4,5,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,1.0,3296.0,3296.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,2.0,2.0,1.0,3296.0,3296.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
10349,10249,15096,15121,1504510,2391.92,9,2391.92,9,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1414390.0,1414390.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1414390.0,1414390.0,0.0,0.0,0.0
10350,10053,14861,14879,1504930,3749.14,1,3749.14,1,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1433170.0,1433170.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1433170.0,1433170.0,0.0,0.0,0.0
10351,10054,14861,14880,1509490,1785.27,0,1785.27,0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1453210.0,1453210.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1453210.0,1453210.0,0.0,0.0,0.0
10352,10055,14861,14881,1515490,2154.54,1,2154.54,1,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1504510.0,1504510.0,0.00,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1504510.0,1504510.0,0.0,0.0,0.0


This newly enriched set of transactions can now be used to train an ML model.

# Train-test-split

In [10]:
#train test split
from sklearn.model_selection import train_test_split

#split df
df_train, df_test = train_test_split(df_enriched_2, test_size=0.2, random_state=42)

#x and y
X_train = df_train.drop(["Is Laundering"], axis=1)
y_train = df_train["Is Laundering"]

X_test = df_test.drop(["Is Laundering"], axis=1)
y_test = df_test["Is Laundering"]
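
As a quick sanity check on the split sizes, the arithmetic below reproduces how a fractional `test_size` translates into row counts for the 10354 transactions in this dataset. It assumes scikit-learn's rule of rounding the test set size up and giving the remaining rows to the training set.

```python
import math

n_samples = 10354  # number of rows in the enriched dataset
test_size = 0.2    # fraction passed to train_test_split

# Assumed rounding rule: the test set size is rounded up,
# and the training set receives the remaining rows
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

print(n_train, n_test)  # 8283 2071
```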

# XGBoost

In [11]:
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

from concrete.ml.sklearn.xgb import XGBClassifier



In [12]:

# Define our model
model = XGBClassifier(n_jobs=1, n_bits=3)

# Define the pipeline
# We normalize the data and apply a PCA before fitting the model
pipeline = Pipeline(
    [("standard_scaler", StandardScaler()), ("pca", PCA(random_state=0)), ("model", model)]
)

# Define the parameters to tune
param_grid = {
    "pca__n_components": [2, 5, 10, 15],
    "model__max_depth": [2, 3, 5],
    "model__n_estimators": [5, 10, 20],
}
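
To get a feel for the cost of the search, the sketch below enumerates the candidate combinations in this grid using only the standard library. With the 5-fold cross validation used in this notebook, each candidate is fitted five times; with scikit-learn's default `refit=True`, one more fit of the best candidate on the full training set follows.

```python
from itertools import product

param_grid = {
    "pca__n_components": [2, 5, 10, 15],
    "model__max_depth": [2, 3, 5],
    "model__n_estimators": [5, 10, 20],
}

# Every combination of one value per parameter
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

n_candidates = len(candidates)
cv_folds = 5
n_fits = n_candidates * cv_folds  # cross-validation fits, excluding the final refit

print(n_candidates, n_fits)  # 36 180
```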


In [13]:
# Instantiate the grid search with 5-fold cross validation on all available cores
grid = GridSearchCV(pipeline, param_grid, cv=5, n_jobs=-1, scoring="accuracy")

# Launch the grid search
grid.fit(X_train, y_train)

# Print the best parameters found
print(f"Best parameters found: {grid.best_params_}")

best_pipeline = grid.best_estimator_
data_transformation_pipeline = best_pipeline[:-1]
model = best_pipeline[-1]

Best parameters found: {'model__max_depth': 5, 'model__n_estimators': 20, 'pca__n_components': 15}


Once trained, the model can be used for prediction (e.g., to detect anomalies) on new (unlabeled) transactions.
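
When labels are available for a held-out set, two simple checks are useful: the accuracy of the predictions against the labels, and the agreement between two sets of predictions (for example, clear versus encrypted inference). The sketch below shows both on toy lists; the variable names and values are illustrative and are not results from this notebook.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the two label sequences agree."""
    if len(y_true) != len(y_pred):
        raise ValueError("inputs must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Toy labels and predictions (illustrative values only)
y_true_toy = [1, 0, 1, 1, 0]
y_pred_clear_toy = [1, 0, 0, 1, 0]
y_pred_fhe_toy = [1, 0, 0, 1, 0]

print(accuracy(y_true_toy, y_pred_clear_toy))      # 0.8
print(accuracy(y_pred_clear_toy, y_pred_fhe_toy))  # 1.0
```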

In [14]:
# Apply the fitted scaler and PCA to the training and test sets
X_train_transformed = data_transformation_pipeline.transform(X_train)
X_test_transformed = data_transformation_pipeline.transform(X_test)

# Evaluate the model on the test set in clear
start_time_clear = time.time()
y_pred_clear = model.predict(X_test_transformed)
end_time_clear = time.time()
elapsed_time_clear = end_time_clear - start_time_clear

# Note: the test accuracy in clear is not printed by this cell; it can be computed as (y_pred_clear == y_test).mean()

# Compile the model to FHE
model.compile(X_train_transformed)

# Run the inference in FHE on encrypted inputs
start_time_fhe = time.time() 
y_pred_fhe = model.predict(X_test_transformed, fhe="execute")
end_time_fhe = time.time()
elapsed_time_fhe = end_time_fhe - start_time_fhe

ratio_elapsed_time = elapsed_time_fhe / elapsed_time_clear

# Print the results
print("In unencrypted:", y_pred_clear)
print("In FHE        :", y_pred_fhe)
print(f"Results similarity between FHE and unencrypted: {int((y_pred_fhe == y_pred_clear).mean()*100)}%")

print(f"Prediction time for unencrypted: {elapsed_time_clear:.6f}s")
print(f"Prediction time in FHE         : {elapsed_time_fhe:.6f}s")
print("Prediction time of FHE / unencrypted: {:.2f}x".format(ratio_elapsed_time))

In unencrypted: [1 1 1 ... 0 1 0]
In FHE        : [1 1 1 ... 0 1 0]
Results similarity between FHE and unencrypted: 100%
Prediction time for unencrypted: 0.086140s
Prediction time in FHE         : 14808.949812s
Prediction time of FHE / unencrypted: 171916.44x
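
To put the timings above in perspective, the sketch below converts them into hours and an average per-transaction latency. The test set size of 2071 is an assumption (20% of 10354 rows, rounded up). The slowdown recomputed from the rounded timings differs slightly from the printed ratio, which was computed from the unrounded values.

```python
# Timings copied from the output above (seconds, rounded to 6 decimals)
elapsed_time_clear = 0.086140
elapsed_time_fhe = 14808.949812

# Assumption: the test set holds 20% of the 10354 rows, rounded up
n_test = 2071

fhe_hours = elapsed_time_fhe / 3600
fhe_seconds_per_sample = elapsed_time_fhe / n_test
slowdown = elapsed_time_fhe / elapsed_time_clear

print(f"FHE inference, total     : {fhe_hours:.2f} h")               # 4.11 h
print(f"FHE inference, per sample: {fhe_seconds_per_sample:.2f} s")  # 7.15 s
print(f"Slowdown (from rounded timings): {slowdown:.0f}x")
```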


The enriched transactions thus serve as input to the trained ML model, both in the clear and under FHE.