# Credit card fraud detection with Federated XGBoost

This notebook shows how to take an existing tabular credit card dataset, enrich and pre-process it on a single site (as if it were a centralized dataset), and then convert that centralized process into federated ETL steps. It then builds a federated XGBoost model; the only component the user needs to define is the XGBoost data loader.


## Step 1: Data Preparation 
First, we prepare the data by adding random transactional information to the base credit card dataset, as shown in the notebook below:

* [prepare data](./notebooks/1.1.prepare_data.ipynb)

## Step 2: Feature Analysis

For this stage, we would like to analyze the data, understand the features, and derive (and encode) secondary features that can be more useful for building the model.

Towards this goal, there are two options:
1. **Feature Enrichment**: This process involves adding new features based on the existing data. For example, we can calculate the average transaction amount for each currency and add this as a new feature. 
2. **Feature Encoding**: This process involves encoding the current features and transforming them into an embedding space via machine learning models. The model can be either pre-trained or trained on the candidate dataset.
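
As a minimal sketch of the enrichment idea in option 1, the snippet below computes the average transaction amount per currency and attaches it to every row. It uses only the Python standard library so that it runs anywhere. The `Currency` column and the `avg_amount_by_currency` feature name are assumptions made for illustration and may not match the columns produced by the actual enrichment notebook.

```python
from collections import defaultdict


def add_currency_average(rows):
    """Return copies of `rows` with an extra per-currency average amount."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    # First pass: accumulate the sum and count of amounts for each currency.
    for row in rows:
        totals[row["Currency"]] += row["Amount"]
        counts[row["Currency"]] += 1
    averages = {cur: totals[cur] / counts[cur] for cur in totals}
    # Second pass: attach the average as a new feature without mutating the input.
    return [{**row, "avg_amount_by_currency": averages[row["Currency"]]} for row in rows]


# Hypothetical toy transactions.
transactions = [
    {"Amount": 100.0, "Currency": "USD"},
    {"Amount": 300.0, "Currency": "USD"},
    {"Amount": 50.0, "Currency": "EUR"},
]
enriched = add_currency_average(transactions)
for row in enriched:
    print(row)
```

With a DataFrame, the same feature would typically come from a group-by followed by a broadcast back to the rows.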

Considering the fact that the only two numerical features in the dataset are "Amount" and "Time", we will perform feature enrichment first. Optionally, we can also perform feature encoding. In this example, we use a graph neural network (GNN); we will train the GNN model in a federated, unsupervised fashion and then use the model to encode the features for all sites. 

### Step 2.1: Rule-based Feature Enrichment

#### Single-site Enrichment and Additional Processing
The detailed feature enrichment step is illustrated using one site as an example:

* [feature_enrichments with-one-site](./notebooks/2.1.1.feature_enrichment.ipynb)

Similarly, we examine the additional pre-processing step using one site: 

* [pre-processing with one-site](./notebooks/2.1.2.pre_process.ipynb)


#### Federated Job to Perform on All Sites
To run the feature enrichment and pre-processing jobs on every site, as in the single-site steps above, we wrote client-side federated ETL job scripts based on the single-site implementations.

* [enrichment script](./src/enrich.py)
* [pre-processing script](./src/pre_process.py) 
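
To make the shape of such a client-side script concrete, here is a hedged sketch of a per-site ETL step that accepts the same `-i`/`-o` arguments passed to the scripts later in this notebook. It is not the content of `src/enrich.py`: the `is_large` feature, the threshold of 1000, the `train_enriched.csv` output name, and the way the site name is passed in are all assumptions for illustration. In the real job, the site identity comes from the federated runtime rather than from a function argument.

```python
import argparse
import csv
import os


def parse_args(argv=None):
    """Parse the input/output folders, mirroring the `-i ... -o ...` script arguments."""
    parser = argparse.ArgumentParser(description="Per-site ETL step (illustrative)")
    parser.add_argument("-i", "--input_dir", required=True)
    parser.add_argument("-o", "--output_dir", required=True)
    return parser.parse_args(argv)


def run_site_etl(input_dir, output_dir, site_name, threshold=1000.0):
    """Read <input_dir>/<site_name>/train.csv, add a feature, and write the result."""
    in_path = os.path.join(input_dir, site_name, "train.csv")
    out_path = os.path.join(output_dir, site_name, "train_enriched.csv")  # hypothetical name
    with open(in_path, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames = list(reader.fieldnames or []) + ["is_large"]
        rows = list(reader)
    # Placeholder transform: flag transactions above the (assumed) threshold.
    for row in rows:
        row["is_large"] = "1" if float(row["Amount"]) > threshold else "0"
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return out_path


# Parsing an explicit argument list keeps the sketch runnable inside a notebook.
args = parse_args(["-i", "/tmp/in", "-o", "/tmp/out"])
print(args.input_dir, args.output_dir)
```

Keeping the transform in a plain function makes it easy to reuse the single-site logic unchanged inside the federated script.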


### (Optional) Step 2.2: GNN-based Feature Encoding
Starting from the raw features, optionally combined with the derived features from **Step 2.1**, we can use machine learning models to encode the features. 
In this example, we use a federated GNN to learn and generate the feature embeddings.

First, we construct a graph based on the transaction data. Each node represents a transaction, and the edges represent the relationships between transactions. We then use the GNN to learn the embeddings of the nodes, which represent the transaction features.
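
The graph idea described above can be sketched as follows. The edge rule used here, linking two transactions when they share a sender and occur within a time window, is only an assumption for illustration; the actual rule lives in the graph construction notebook and script. The `sender` and `time` field names are likewise hypothetical.

```python
from itertools import combinations


def build_edges(transactions, max_gap):
    """Return undirected edges (i, j) between related transactions.

    Each transaction is a node identified by its index in the list.
    """
    edges = []
    # Pairwise comparison is O(n^2); fine for a sketch, too slow for large data.
    for i, j in combinations(range(len(transactions)), 2):
        a, b = transactions[i], transactions[j]
        same_sender = a["sender"] == b["sender"]
        close_in_time = abs(a["time"] - b["time"]) <= max_gap
        if same_sender and close_in_time:
            edges.append((i, j))
    return edges


# Hypothetical toy transactions (time in seconds).
transactions = [
    {"sender": "A", "time": 0},
    {"sender": "A", "time": 30},
    {"sender": "B", "time": 40},
    {"sender": "A", "time": 500},
]
edges = build_edges(transactions, max_gap=60)
print(edges)
```

An edge list of this form, together with a per-node feature matrix, is the usual input to a GNN library.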

#### Single-site operation example: graph construction
The detailed graph construction step is illustrated using one site as an example:

* [graph_construction with one-site](./notebooks/graph_construct.ipynb)

The detailed GNN training and encoding step is illustrated using one site as an example:

* [gnn_training_encoding with one-site](./notebooks/gnn_train_encode.ipynb)

#### Federated Job to Perform on All Sites
To run the graph construction and GNN training/encoding jobs on every site, as with the enrichment and pre-processing steps, we wrote client-side federated ETL job scripts based on the single-site implementations.

* [graph_construction script](./src/graph_construct.py)
* [gnn_train_encode script](./src/gnn_train_encode.py)
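
The GNN training job in this notebook is coordinated by a FedAvg workflow. As a rough illustration of what federated averaging does conceptually, the sketch below averages client parameter vectors weighted by the number of local samples. The weighting scheme and the flat-list representation of weights are assumptions for illustration; the real aggregation is handled by NVIDIA FLARE and operates on full model state.

```python
def federated_average(client_updates):
    """Average client weights, weighted by each client's sample count.

    client_updates: list of (weights, num_samples) pairs, where `weights`
    is a list of floats of equal length across clients.
    """
    total = sum(num_samples for _, num_samples in client_updates)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    dim = len(client_updates[0][0])
    averaged = [0.0] * dim
    for weights, num_samples in client_updates:
        if len(weights) != dim:
            raise ValueError("all clients must report weights of the same length")
        for k, w in enumerate(weights):
            averaged[k] += w * num_samples / total
    return averaged


# Two hypothetical clients with 100 and 300 local samples.
updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
global_weights = federated_average(updates)
print(global_weights)
```

The client with more data pulls the global weights closer to its own update, which is the core intuition behind the weighting.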


The resulting GNN encodings are then merged with the normalized data to enhance the feature set.
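
A merge of this kind can be sketched as a key-based join. The snippet below is not the logic of `utils/merge_feat.py`; the `txn_id` key and the `emb_0`/`emb_1` column names are hypothetical, and an inner join (dropping rows without an embedding) is an assumption.

```python
def merge_embeddings(normalized_rows, embedding_rows, key="txn_id"):
    """Inner-join normalized feature rows with embedding rows on `key`."""
    embeddings_by_key = {row[key]: row for row in embedding_rows}
    merged = []
    for row in normalized_rows:
        emb = embeddings_by_key.get(row[key])
        if emb is None:
            continue  # no embedding for this transaction: skip it
        # Copy every embedding column except the join key itself.
        extra = {name: value for name, value in emb.items() if name != key}
        merged.append({**row, **extra})
    return merged


# Hypothetical toy rows.
normalized = [
    {"txn_id": "t1", "Amount": 0.2},
    {"txn_id": "t2", "Amount": 0.9},
]
embeddings = [
    {"txn_id": "t1", "emb_0": 0.11, "emb_1": -0.42},
]
merged = merge_embeddings(normalized, embeddings)
print(merged)
```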

## Step 3: Federated XGBoost 

Now that the data is ready, as either enriched and normalized features or GNN feature embeddings, we can fit an XGBoost model to it. NVIDIA FLARE already provides the XGBoost controller and executor for the job. All we need to provide is the data loader that feeds the data to XGBoost.

Below we define a [```CreditCardDataLoader```](./src/xgb_data_loader.py), an `XGBDataLoader` subclass that loads the credit card dataset.

```py
import os
from typing import Tuple

import pandas as pd
import xgboost as xgb
from xgboost.core import DataSplitMode

from nvflare.app_opt.xgboost.data_loader import XGBDataLoader


class CreditCardDataLoader(XGBDataLoader):
    def __init__(self, root_dir: str, file_postfix: str):
        self.dataset_names = ["train", "test"]
        self.base_file_names = {}
        self.root_dir = root_dir
        self.file_postfix = file_postfix
        for name in self.dataset_names:
            self.base_file_names[name] = name + file_postfix
        self.numerical_columns = [
            "Timestamp",
            "Amount",
            "trans_volume",
            "total_amount",
            "average_amount",
            "hist_trans_volume",
            "hist_total_amount",
            "hist_average_amount",
            "x2_y1",
            "x3_y2",
        ]

    def load_data(self, client_id: str, split_mode: int) -> Tuple[xgb.DMatrix, xgb.DMatrix]:
        data = {}
        for ds_name in self.dataset_names:
            print("\nloading for site = ", client_id, f"{ds_name} dataset \n")
            file_name = os.path.join(self.root_dir, client_id, self.base_file_names[ds_name])
            df = pd.read_csv(file_name)
            data_num = len(df)  # number of rows loaded for this split

            # split to feature and label
            y = df["Class"]
            x = df[self.numerical_columns]
            data[ds_name] = (x, y, data_num)


        # training
        x_train, y_train, total_train_data_num = data["train"]
        data_split_mode = DataSplitMode(split_mode)
        dmat_train = xgb.DMatrix(x_train, label=y_train, data_split_mode=data_split_mode)

        # validation
        x_valid, y_valid, total_valid_data_num = data["test"]
        dmat_valid = xgb.DMatrix(x_valid, label=y_valid, data_split_mode=data_split_mode)

        return dmat_train, dmat_valid
```

We are now ready to run all the code.

## Run All the Jobs End-to-end
Here we run each job in sequence. For a real-world use case:

* The data preparation step is not needed, since you already have the data.
* The feature enrichment / encoding scripts need to be defined based on your own techniques.
* For the XGBoost job, you will need to write your own data loader.
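
When adapting the data loader to your own data, a quick header check can catch column mismatches before a federated job is launched. The sketch below is a hypothetical helper, not part of NVIDIA FLARE or of this example's source; the required column list simply mirrors the `numerical_columns` and `Class` label used by `CreditCardDataLoader` above.

```python
import csv
import io

# Columns the loader above selects as features, plus the label column.
REQUIRED_COLUMNS = [
    "Timestamp",
    "Amount",
    "trans_volume",
    "total_amount",
    "average_amount",
    "hist_trans_volume",
    "hist_total_amount",
    "hist_average_amount",
    "x2_y1",
    "x3_y2",
    "Class",
]


def missing_columns(csv_file, required=REQUIRED_COLUMNS):
    """Return the required column names absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    present = set(header)
    return [name for name in required if name not in present]


# In-memory stand-in for a site's CSV file.
sample = io.StringIO("Timestamp,Amount,Class\n1,9.5,0\n")
print(missing_columns(sample))
```

For a file on disk, the same function can be called with `open(path, newline="")` in place of the `io.StringIO` object.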

### Prepare Data

In [40]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("samayashar/fraud-detection-transactions-dataset")
input_csv = f"{path}/synthetic_fraud_dataset.csv"


# only generate config file, or also run the simulated job (on the same machine)
config_only = False
# the workdir is used to store the job config and the simulated job results for each node
work_dir = "/tmp/czt/jobs/workdir"
# the processed dataset folder is used to store the processed data, preparing for each node, and also output the results
output_folder = "/tmp/czt/dataset"

!mkdir -p {output_folder}
!mkdir -p {work_dir}


In [41]:
! python3 ./utils/prepare_data.py -i {input_csv} -o {output_folder}

Historical DataFrame size: 27500
Training DataFrame size: 17500
Testing DataFrame size: 5000
Saved HCBHSGSG history transactions to /tmp/czt/dataset/HCBHSGSG_Bank_9/history.csv
Saved XITXUS33 history transactions to /tmp/czt/dataset/XITXUS33_Bank_10/history.csv
Saved YSYCESMM history transactions to /tmp/czt/dataset/YSYCESMM_Bank_7/history.csv
Saved YXRXGB22 history transactions to /tmp/czt/dataset/YXRXGB22_Bank_3/history.csv
Saved ZNZZAU3M history transactions to /tmp/czt/dataset/ZNZZAU3M_Bank_8/history.csv
Saved HCBHSGSG train transactions to /tmp/czt/dataset/HCBHSGSG_Bank_9/train.csv
Saved XITXUS33 train transactions to /tmp/czt/dataset/XITXUS33_Bank_10/train.csv
Saved YSYCESMM train transactions to /tmp/czt/dataset/YSYCESMM_Bank_7/train.csv
Saved YXRXGB22 train transactions to /tmp/czt/dataset/YXRXGB22_Bank_3/train.csv
Saved ZNZZAU3M train transactions to /tmp/czt/dataset/ZNZZAU3M_Bank_8/train.csv
Saved HCBHSGSG test transactions to /tmp/czt/dataset/HCBHSGSG_Bank_9/test.csv
Saved X

In [42]:
site_names = [
    "HCBHSGSG_Bank_9",
    "XITXUS33_Bank_10",
    "YSYCESMM_Bank_7",
    "YXRXGB22_Bank_3",
    "ZNZZAU3M_Bank_8",
]

!echo {' '.join(site_names)}

HCBHSGSG_Bank_9 XITXUS33_Bank_10 YSYCESMM_Bank_7 YXRXGB22_Bank_3 ZNZZAU3M_Bank_8


In [43]:
from nvflare import FedJob
from nvflare.app_common.workflows.etl_controller import ETLController
from nvflare.job_config.script_runner import ScriptRunner

### Enrich data

In [44]:
job = FedJob(name="enrich_job")

enrich_ctrl = ETLController(task_name="enrich")
job.to(enrich_ctrl, "server", id="enrich")

# Add clients
for site_name in site_names:
    executor = ScriptRunner(
        # for this, we output the enriched data to the same folder
        script="src/enrich.py", script_args=f"-i {output_folder} -o {output_folder}"
    )
    job.to(executor, site_name, tasks=["enrich"])

if work_dir:
    print(f"{work_dir=}")
    job.export_job(work_dir)

if not config_only:
    job.simulator_run(work_dir)

work_dir='/tmp/czt/jobs/workdir'
2025-08-05 14:00:23,485 - ETLController - INFO - Initializing BaseModelController workflow.
2025-08-05 14:00:23,486 - ETLController - INFO - Beginning model controller run.
2025-08-05 14:00:23,486 - ETLController - INFO - enrich task started.
2025-08-05 14:00:23,487 - ETLController - INFO - Sampled clients: ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:00:23,487 - ETLController - INFO - Sending task enrich to ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:00:28,606 - TaskScriptRunner - INFO - start task run() with full path: /tmp/czt/jobs/workdir/YSYCESMM_Bank_7/simulate_job/app_YSYCESMM_Bank_7/custom/src/enrich.py
2025-08-05 14:00:28,615 - TaskScriptRunner - INFO - start task run() with full path: /tmp/czt/jobs/workdir/YXRXGB22_Bank_3/simulate_job/app_YXRXGB22_Bank_3/

### Pre-Process Data

In [45]:

job = FedJob(name="pre_processing_job")

pre_process_ctrl = ETLController(task_name="pre_process")
job.to(pre_process_ctrl, "server", id="pre_process")

# Add clients
for site_name in site_names:
    executor = ScriptRunner(script="src/pre_process.py", script_args=f"-i {output_folder} -o {output_folder}")
    job.to(executor, site_name, tasks=["pre_process"])

if work_dir:
    print(f"{work_dir=}")
    job.export_job(work_dir)

if not config_only:
    job.simulator_run(work_dir)


work_dir='/tmp/czt/jobs/workdir'
2025-08-05 14:00:37,586 - ETLController - INFO - Initializing BaseModelController workflow.
2025-08-05 14:00:37,587 - ETLController - INFO - Beginning model controller run.
2025-08-05 14:00:37,587 - ETLController - INFO - pre_process task started.
2025-08-05 14:00:37,588 - ETLController - INFO - Sampled clients: ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:00:37,588 - ETLController - INFO - Sending task pre_process to ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:00:42,612 - TaskScriptRunner - INFO - start task run() with full path: /tmp/czt/jobs/workdir/YXRXGB22_Bank_3/simulate_job/app_YXRXGB22_Bank_3/custom/src/pre_process.py
2025-08-05 14:00:42,616 - TaskScriptRunner - INFO - start task run() with full path: /tmp/czt/jobs/workdir/ZNZZAU3M_Bank_8/simulate_job/app_Z

### Construct Graph

In [46]:
job = FedJob(name="graph_construct_job")

graph_construct_ctrl = ETLController(task_name="graph_construct")
job.to(graph_construct_ctrl, "server", id="graph_construct")

# Add clients
for site_name in site_names:
    executor = ScriptRunner(script="src/graph_construct.py", script_args=f"-i {output_folder} -o {output_folder}")
    job.to(executor, site_name, tasks=["graph_construct"])

if work_dir:
    print(f"{work_dir=}")
    job.export_job(work_dir)

if not config_only:
    job.simulator_run(work_dir)

work_dir='/tmp/czt/jobs/workdir'
2025-08-05 14:00:51,700 - ETLController - INFO - Initializing BaseModelController workflow.
2025-08-05 14:00:51,701 - ETLController - INFO - Beginning model controller run.
2025-08-05 14:00:51,701 - ETLController - INFO - graph_construct task started.
2025-08-05 14:00:51,702 - ETLController - INFO - Sampled clients: ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:00:51,702 - ETLController - INFO - Sending task graph_construct to ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:00:56,802 - TaskScriptRunner - INFO - start task run() with full path: /tmp/czt/jobs/workdir/YSYCESMM_Bank_7/simulate_job/app_YSYCESMM_Bank_7/custom/src/graph_construct.py
2025-08-05 14:00:56,824 - TaskScriptRunner - INFO - start task run() with full path: /tmp/czt/jobs/workdir/XITXUS33_Bank_10/simul

### GNN Training and Encoding

In [47]:
from torch_geometric.nn import GraphSAGE

from nvflare import FedJob
from nvflare.app_common.workflows.fedavg import FedAvg
from nvflare.app_opt.pt.job_config.model import PTModel
from nvflare.job_config.script_runner import ScriptRunner

job = FedJob(name="gnn_train_encode_job")

# Define the controller workflow and send to server
controller = FedAvg(
    num_clients=len(site_names),
    num_rounds=2,
)
job.to(controller, "server")

# Define the model
model = GraphSAGE(
    in_channels=10,
    hidden_channels=64,
    num_layers=2,
    out_channels=64,
)
job.to(PTModel(model), "server")

# Add clients
for site_name in site_names:
    executor = ScriptRunner(script="src/gnn_train_encode.py", script_args=f"-i {output_folder} -o {output_folder}")
    job.to(executor, site_name)

if work_dir:
    print(f"{work_dir=}")
    job.export_job(work_dir)

if not config_only:
    job.simulator_run(work_dir)


work_dir='/tmp/czt/jobs/workdir'
2025-08-05 14:01:14,232 - FedAvg - INFO - Initializing BaseModelController workflow.
2025-08-05 14:01:14,232 - FedAvg - INFO - Beginning model controller run.
2025-08-05 14:01:14,233 - FedAvg - INFO - Start FedAvg.
2025-08-05 14:01:14,233 - FedAvg - INFO - loading initial model from persistor
2025-08-05 14:01:14,233 - PTFileModelPersistor - INFO - Both source_ckpt_file_full_name and ckpt_preload_path are not provided. Using the default model weights initialized on the persistor side.
2025-08-05 14:01:14,234 - FedAvg - INFO - Round 0 started.
2025-08-05 14:01:14,234 - FedAvg - INFO - Sampled clients: ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:01:14,234 - FedAvg - INFO - Sending task train to ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:01:19,049 -

### GNN Encoding Merge

In [48]:
! python3 ./utils/merge_feat.py -i {output_folder}

Processing folder:  ZNZZAU3M_Bank_8
Processing folder:  YXRXGB22_Bank_3
Processing folder:  YSYCESMM_Bank_7
Processing folder:  XITXUS33_Bank_10
Processing folder:  HCBHSGSG_Bank_9


### Run XGBoost Job
#### Without GNN embeddings

In [49]:
from nvflare.app_opt.xgboost.histogram_based_v2.fed_controller import XGBFedController
from nvflare.app_opt.xgboost.histogram_based_v2.fed_executor import (
    FedXGBHistogramExecutor,
)

from xgb_data_loader import CreditCardDataLoader


num_rounds = 10
early_stopping_rounds = 10
xgb_params = {
    "max_depth": 8,
    "eta": 0.1,
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "tree_method": "hist",
    "nthread": 16,
}

job = FedJob(name="xgb_job")

# Define the controller workflow and send to server
controller = XGBFedController(
    num_rounds=num_rounds,
    data_split_mode=0,
    secure_training=False,
    xgb_params=xgb_params,
    xgb_options={"early_stopping_rounds": early_stopping_rounds},
)
job.to(controller, "server")

# Add clients
for site_name in site_names:
    executor = FedXGBHistogramExecutor(data_loader_id="data_loader")
    job.to(executor, site_name)
    data_loader = CreditCardDataLoader(root_dir=output_folder, file_postfix="_normalized.csv")
    job.to(data_loader, site_name, id="data_loader")

if work_dir:
    print("work_dir=", work_dir)
    job.export_job(work_dir)

if not config_only:
    job.simulator_run(work_dir)


work_dir= /tmp/czt/jobs/workdir
2025-08-05 14:01:35,927 - XGBFedController - INFO - Waiting for clients to be ready: ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:01:35,927 - XGBFedController - INFO - Configuring clients ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:01:35,928 - XGBFedController - INFO - sending task config to clients ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:01:40,794 - FedXGBHistogramExecutor - INFO - got my rank: 3
2025-08-05 14:01:40,799 - FedXGBHistogramExecutor - INFO - got my rank: 0
2025-08-05 14:01:40,806 - FedXGBHistogramExecutor - INFO - got my rank: 4
2025-08-05 14:01:40,814 - XGBFedController - INFO - successfully configured client YXRXGB22_Bank_3
2025-08-05 14:01:40,817 - XGBFedController - INF

[14:01:41] Insecure federated server listening on 0.0.0.0:25345, world size 5


2025-08-05 14:01:41,781 - XGBFedController - INFO - Starting clients ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:01:41,783 - XGBFedController - INFO - sending task start to clients ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:01:42,824 - GrpcClientAdaptor - INFO - Start internal server at 127.0.0.1:53586
2025-08-05 14:01:42,828 - GrpcClientAdaptor - INFO - Start internal server at 127.0.0.1:53897
2025-08-05 14:01:42,829 - GrpcServer - INFO - XGBServer: added insecure port at 127.0.0.1:53586
2025-08-05 14:01:42,830 - GrpcServer - INFO - starting gRPC Server
2025-08-05 14:01:42,831 - GrpcClientAdaptor - INFO - Started internal server at 127.0.0.1:53586
2025-08-05 14:01:42,832 - GrpcClientAdaptor - INFO - starting XGBoost Server in another thread
2025-08-05 14:01:42,832 - XGBClientR

[14:01:45] [0]	eval-auc:0.48822	train-auc:0.58014
[14:01:45] [0]	eval-auc:0.48822	train-auc:0.58014
[14:01:45] [0]	eval-auc:0.48822	train-auc:0.58014
[14:01:45] [0]	eval-auc:0.48822	train-auc:0.58014
[14:01:45] [0]	eval-auc:0.48822	train-auc:0.58014


2025-08-05 14:01:45,911 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=29 finished processing
2025-08-05 14:01:45,911 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=1 seq=29 finished processing
2025-08-05 14:01:45,924 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=29 finished processing
2025-08-05 14:01:45,970 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:45,971 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:45,971 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:45,972 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:45,973 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:45,979 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=30 finished processing
2025-08-05 14:01:45,993 - GrpcClientAd

[14:01:47] [1]	eval-auc:0.49518	train-auc:0.59616
[14:01:47] [1]	eval-auc:0.49518	train-auc:0.59616
[14:01:47] [1]	eval-auc:0.49518	train-auc:0.59616
[14:01:47] [1]	eval-auc:0.49518	train-auc:0.59616
[14:01:47] [1]	eval-auc:0.49518	train-auc:0.59616


2025-08-05 14:01:47,272 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=44 finished processing
2025-08-05 14:01:47,280 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=44 finished processing
2025-08-05 14:01:47,325 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:47,326 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:47,327 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:47,328 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:47,329 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:47,331 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=45 finished processing
2025-08-05 14:01:47,340 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=1 seq=45 finished processing
2025-08-05 14:01:47,352 - GrpcClientAd

[14:01:48] [2]	eval-auc:0.50104	train-auc:0.64051
[14:01:48] [2]	eval-auc:0.50104	train-auc:0.64051
[14:01:48] [2]	eval-auc:0.50104	train-auc:0.64051
[14:01:48] [2]	eval-auc:0.50104	train-auc:0.64051
[14:01:48] [2]	eval-auc:0.50104	train-auc:0.64051


2025-08-05 14:01:48,613 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:48,614 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:48,615 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:48,617 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:48,618 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:48,632 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=1 seq=59 finished processing
2025-08-05 14:01:48,632 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=59 finished processing
2025-08-05 14:01:48,644 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=59 finished processing
2025-08-05 14:01:48,649 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=59 finished processing
2025-08-05 14:01:48,659 - GrpcClientAd

[14:01:49] [3]	eval-auc:0.50185	train-auc:0.65675
[14:01:49] [3]	eval-auc:0.50185	train-auc:0.65675
[14:01:49] [3]	eval-auc:0.50185	train-auc:0.65675
[14:01:49] [3]	eval-auc:0.50185	train-auc:0.65675
[14:01:49] [3]	eval-auc:0.50185	train-auc:0.65675


2025-08-05 14:01:49,921 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=73 finished processing
2025-08-05 14:01:49,925 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=1 seq=73 finished processing
2025-08-05 14:01:49,940 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=73 finished processing
2025-08-05 14:01:49,945 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=73 finished processing
2025-08-05 14:01:49,988 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:49,989 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:49,990 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:49,990 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:49,991 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:50,003 - GrpcClientAd

[14:01:51] [4]	eval-auc:0.50390	train-auc:0.67678
[14:01:51] [4]	eval-auc:0.50390	train-auc:0.67678
[14:01:51] [4]	eval-auc:0.50390	train-auc:0.67678
[14:01:51] [4]	eval-auc:0.50390	train-auc:0.67678
[14:01:51] [4]	eval-auc:0.50390	train-auc:0.67678


2025-08-05 14:01:51,464 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=90 finished processing
2025-08-05 14:01:51,468 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=90 finished processing
2025-08-05 14:01:51,468 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=90 finished processing
2025-08-05 14:01:51,468 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=90 finished processing
2025-08-05 14:01:51,516 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:51,516 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:51,517 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:51,518 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:51,519 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:51,543 - GrpcClientAd

[14:01:52] [5]	eval-auc:0.50245	train-auc:0.68385
[14:01:52] [5]	eval-auc:0.50245	train-auc:0.68385
[14:01:52] [5]	eval-auc:0.50245	train-auc:0.68385
[14:01:52] [5]	eval-auc:0.50245	train-auc:0.68385
[14:01:52] [5]	eval-auc:0.50245	train-auc:0.68385


2025-08-05 14:01:52,824 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=105 finished processing
2025-08-05 14:01:52,832 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=105 finished processing
2025-08-05 14:01:52,832 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=1 seq=105 finished processing
2025-08-05 14:01:52,836 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=105 finished processing
2025-08-05 14:01:52,842 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=105 finished processing
2025-08-05 14:01:52,892 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:52,892 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:52,893 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:52,893 - XGBFedController - INFO - received reply for 'all_reduce'
202

[14:01:53] [6]	eval-auc:0.50901	train-auc:0.69236
[14:01:53] [6]	eval-auc:0.50901	train-auc:0.69236
[14:01:54] [6]	eval-auc:0.50901	train-auc:0.69236
[14:01:54] [6]	eval-auc:0.50901	train-auc:0.69236
[14:01:54] [6]	eval-auc:0.50901	train-auc:0.69236


2025-08-05 14:01:54,157 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:54,158 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:54,159 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:54,160 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:54,161 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:54,175 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=1 seq=120 finished processing
2025-08-05 14:01:54,180 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=120 finished processing
2025-08-05 14:01:54,191 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=120 finished processing
2025-08-05 14:01:54,192 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=120 finished processing
2025-08-05 14:01:54,209 - GrpcClie

[14:01:55] [7]	eval-auc:0.51034	train-auc:0.70627
[14:01:55] [7]	eval-auc:0.51034	train-auc:0.70627
[14:01:55] [7]	eval-auc:0.51034	train-auc:0.70627
[14:01:55] [7]	eval-auc:0.51034	train-auc:0.70627
[14:01:55] [7]	eval-auc:0.51034	train-auc:0.70627


2025-08-05 14:01:55,429 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=134 finished processing
2025-08-05 14:01:55,437 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=134 finished processing
2025-08-05 14:01:55,481 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:55,481 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:55,483 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:55,483 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:55,485 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:55,491 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=135 finished processing
2025-08-05 14:01:55,495 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=135 finished processing
2025-08-05 14:01:55,503 - GrpcClie

[14:01:56] [8]	eval-auc:0.50742	train-auc:0.71378
[14:01:56] [8]	eval-auc:0.50742	train-auc:0.71378
[14:01:56] [8]	eval-auc:0.50742	train-auc:0.71378
[14:01:56] [8]	eval-auc:0.50742	train-auc:0.71378
[14:01:56] [8]	eval-auc:0.50742	train-auc:0.71378


2025-08-05 14:01:56,779 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=1 seq=149 finished processing
2025-08-05 14:01:56,783 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=149 finished processing
2025-08-05 14:01:56,788 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=149 finished processing
2025-08-05 14:01:56,792 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=149 finished processing
2025-08-05 14:01:56,796 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=149 finished processing
2025-08-05 14:01:56,840 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:56,841 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:56,842 - XGBFedController - INFO - received reply for 'all_reduce'
2025-08-05 14:01:56,842 - XGBFedController - INFO - received reply for 'all_reduce'
202

[14:01:58] [9]	eval-auc:0.50278	train-auc:0.73504
[14:01:58] Finished training
[14:01:58] [9]	eval-auc:0.50278	train-auc:0.73504
[14:01:58] [9]	eval-auc:0.50278	train-auc:0.73504
[14:01:58] [9]	eval-auc:0.50278	train-auc:0.73504
[14:01:58] Finished training
[14:01:58] [9]	eval-auc:0.50278	train-auc:0.73504
[14:01:58] Finished training
[14:01:58] Finished training
[14:01:58] Finished training


2025-08-05 14:01:58,113 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=163 finished processing
2025-08-05 14:01:58,524 - FedXGBHistogramExecutor - INFO - XGB Client Stopped
2025-08-05 14:01:58,526 - XGBFedController - INFO - XGB client is done with exit code 0
2025-08-05 14:01:58,532 - FedXGBHistogramExecutor - INFO - XGB Client Stopped
2025-08-05 14:01:58,534 - XGBFedController - INFO - XGB client is done with exit code 0
2025-08-05 14:01:58,535 - FedXGBHistogramExecutor - INFO - XGB Client Stopped
2025-08-05 14:01:58,536 - FedXGBHistogramExecutor - INFO - XGB Client Stopped
2025-08-05 14:01:58,537 - XGBFedController - INFO - XGB client is done with exit code 0
2025-08-05 14:01:58,538 - XGBFedController - INFO - XGB client is done with exit code 0
2025-08-05 14:01:58,544 - FedXGBHistogramExecutor - INFO - XGB Client Stopped
2025-08-05 14:01:58,546 - XGBFedController - INF

#### With GNN embeddings

In [50]:
from xgb_embed_data_loader import CreditCardEmbedDataLoader

from nvflare import FedJob
from nvflare.app_opt.xgboost.histogram_based_v2.fed_controller import XGBFedController
from nvflare.app_opt.xgboost.histogram_based_v2.fed_executor import (
    FedXGBHistogramExecutor,
)

num_rounds = 10
early_stopping_rounds = 10
xgb_params = {
    "max_depth": 8,
    "eta": 0.1,
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "tree_method": "hist",
    "nthread": 16,
}

job = FedJob(name="xgb_job_embed")

# Define the controller workflow and send to server
controller = XGBFedController(
    num_rounds=num_rounds,
    data_split_mode=0,
    secure_training=False,
    xgb_params=xgb_params,
    xgb_options={"early_stopping_rounds": early_stopping_rounds},
)
job.to(controller, "server")

# Add clients
for site_name in site_names:
    executor = FedXGBHistogramExecutor(data_loader_id="data_loader")
    job.to(executor, site_name)
    data_loader = CreditCardEmbedDataLoader(
        root_dir=output_folder, file_postfix="_embedding.csv"
    )
    job.to(data_loader, site_name, id="data_loader")

if work_dir:
    print("work_dir=", work_dir)
    job.export_job(work_dir)

if not config_only:
    job.simulator_run(work_dir)


work_dir= /tmp/czt/jobs/workdir
2025-08-05 14:02:05,679 - XGBFedController - INFO - Waiting for clients to be ready: ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:02:05,679 - XGBFedController - INFO - Configuring clients ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:02:05,679 - XGBFedController - INFO - sending task config to clients ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:02:10,675 - FedXGBHistogramExecutor - INFO - got my rank: 1
2025-08-05 14:02:10,693 - FedXGBHistogramExecutor - INFO - got my rank: 3
2025-08-05 14:02:10,697 - XGBFedController - INFO - successfully configured client XITXUS33_Bank_10
2025-08-05 14:02:10,710 - FedXGBHistogramExecutor - INFO - got my rank: 2
2025-08-05 14:02:10,715 - XGBFedController - IN

[14:02:11] Insecure federated server listening on 0.0.0.0:62378, world size 5


2025-08-05 14:02:12,067 - XGBFedController - INFO - Starting clients ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:02:12,069 - XGBFedController - INFO - sending task start to clients ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-08-05 14:02:12,748 - GrpcClientAdaptor - INFO - Start internal server at 127.0.0.1:60669
2025-08-05 14:02:12,764 - GrpcServer - INFO - XGBServer: added insecure port at 127.0.0.1:60669
2025-08-05 14:02:12,765 - GrpcServer - INFO - starting gRPC Server
2025-08-05 14:02:12,768 - GrpcClientAdaptor - INFO - Started internal server at 127.0.0.1:60669
2025-08-05 14:02:12,769 - GrpcClientAdaptor - INFO - starting XGBoost Server in another thread
2025-08-05 14:02:12,771 - XGBClientRunner - INFO - XGB data_split_mode: 0 secure_training: False params: {'max_depth': 8, 'eta': 0.1, 'obje

## Prepare Job for POC and Production

With the job running well in the simulator, we are ready to run it in POC mode, which simulates a deployment on localhost, or to deploy it to production.

All we need is the job definition. The `job.export_job()` method generates the job configuration and exports it to a given directory. For example, `xgb_job.py` contains the following:

```
    if work_dir:
        print("work_dir=", work_dir)
        job.export_job(work_dir)

    if not args.config_only:
        job.simulator_run(work_dir)
```

Let's try this out and then look at the directory. Use the `tree` command if you have it; otherwise, simply use `ls -al`. The cell below uses `find` to list the files generated under each site's `simulate_job` folder.

In [51]:
!find {work_dir} -type f -path "*/simulate_job/*"

/tmp/czt/jobs/workdir/ZNZZAU3M_Bank_8/simulate_job/meta.json
/tmp/czt/jobs/workdir/ZNZZAU3M_Bank_8/simulate_job/app_ZNZZAU3M_Bank_8/custom/xgb_embed_data_loader.py
/tmp/czt/jobs/workdir/ZNZZAU3M_Bank_8/simulate_job/app_ZNZZAU3M_Bank_8/config/config_fed_client.json
/tmp/czt/jobs/workdir/YXRXGB22_Bank_3/simulate_job/meta.json
/tmp/czt/jobs/workdir/YXRXGB22_Bank_3/simulate_job/app_YXRXGB22_Bank_3/custom/xgb_embed_data_loader.py
/tmp/czt/jobs/workdir/YXRXGB22_Bank_3/simulate_job/app_YXRXGB22_Bank_3/config/config_fed_client.json
/tmp/czt/jobs/workdir/YSYCESMM_Bank_7/simulate_job/meta.json
/tmp/czt/jobs/workdir/YSYCESMM_Bank_7/simulate_job/app_YSYCESMM_Bank_7/custom/xgb_embed_data_loader.py
/tmp/czt/jobs/workdir/YSYCESMM_Bank_7/simulate_job/app_YSYCESMM_Bank_7/config/config_fed_client.json
/tmp/czt/jobs/workdir/XITXUS33_Bank_10/simulate_job/meta.json
/tmp/czt/jobs/workdir/XITXUS33_Bank_10/simulate_job/app_XITXUS33_Bank_10/custom/xgb_embed_data_loader.py
/tmp/czt/jobs/workdir/XITXUS33_Bank_10
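
If neither `tree` nor `find` is available (for example on Windows), the same listing can be approximated in pure Python. This is a minimal sketch using only the standard library; the helper name `list_simulate_job_files` is hypothetical and not part of NVFlare. It mirrors the `find ... -path "*/simulate_job/*"` filter by keeping only files that sit below a directory named `simulate_job`.

```python
from pathlib import Path


def list_simulate_job_files(work_dir):
    """Return files located under any `simulate_job` directory below work_dir.

    Paths are returned relative to work_dir, POSIX-style and sorted.
    """
    root = Path(work_dir)
    return sorted(
        p.relative_to(root).as_posix()
        for p in root.rglob("*")
        # Keep regular files whose parent directories include `simulate_job`.
        if p.is_file() and "simulate_job" in p.relative_to(root).parts[:-1]
    )
```

In the notebook, calling `list_simulate_job_files(work_dir)` should give one `meta.json`, one data loader script, and one `config_fed_client.json` per site, matching the layout printed above.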

In [52]:
!cat /tmp/czt/workspace/xgb/credit_card/config/xgb_job/meta.json

cat: /tmp/czt/workspace/xgb/credit_card/config/xgb_job/meta.json: No such file or directory


Now that we have the job definition, you can run it either in POC mode or in a production setup.

* Set up POC:

  ```
  nvflare poc prepare -c <list of clients>
  nvflare poc start -ex admin@nvidia.com
  ```

* Submit the job using the NVFLARE console. From a different terminal, start the console:

  ```
  nvflare poc start -p admin@nvidia.com
  ```

  Then use the `submit_job` command in the console.

* Use the `nvflare job submit` command to submit the job.

* Use the NVFLARE API to submit the job.
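
The API option can be sketched as follows. This is a minimal, hedged example that assumes the FLARE API entry point `new_secure_session` and the session methods `submit_job` and `close` found in recent NVFlare releases; check them against your installed version. The helper `submit_exported_job`, its `session_factory` parameter, and all path arguments are hypothetical names introduced here for illustration.

```python
def submit_exported_job(job_dir, username, startup_kit_dir, session_factory=None):
    """Submit an exported job folder and return the job ID reported by the server.

    job_dir:          folder produced by job.export_job() (assumed path)
    username:         admin user, e.g. the one passed to `nvflare poc start -p`
    startup_kit_dir:  that admin user's startup kit folder (assumed path)
    session_factory:  optional callable(username, startup_kit_dir) -> session;
                      defaults to NVFlare's new_secure_session
    """
    if session_factory is None:
        # Imported lazily so the helper can be loaded without NVFlare installed.
        from nvflare.fuel.flare_api.flare_api import new_secure_session

        session_factory = new_secure_session

    sess = session_factory(username, startup_kit_dir)
    try:
        return sess.submit_job(job_dir)
    finally:
        # Always release the connection, even if submission fails.
        sess.close()
```

With a POC system running, the call would look like `submit_exported_job("<exported job folder>", "admin@nvidia.com", "<admin startup kit folder>")`, where both folders depend on your local setup.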
