# Credit card fraud detection with Federated XGBoost

This notebook shows how to take an existing tabular credit card dataset, enrich and pre-process it on a single site (as you would with a centralized dataset), and then convert that centralized process into federated ETL steps. It then trains a federated XGBoost model; the only component the user needs to define is the XGBoost data loader.

## Step 1: Data Preparation 
First, we prepare the data by adding random transactional information to the base credit card dataset, using the notebook below:

* [prepare data](./notebooks/1.1.prepare_data.ipynb)

## Step 2: Feature Analysis

For this stage, we would like to analyze the data, understand the features, and derive (and encode) secondary features that can be more useful for building the model.

Towards this goal, there are two options:
1. **Feature Enrichment**: This process involves adding new features based on the existing data. For example, we can calculate the average transaction amount for each currency and add this as a new feature. 
2. **Feature Encoding**: This process involves encoding the current features and transforming them into an embedding space via a machine learning model. The model can be either pre-trained or trained on the candidate dataset.
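
The per-currency average mentioned under Feature Enrichment can be sketched as follows. This is a minimal, standard-library illustration, not the code in `enrich.py`; the field names `Currency` and `Amount` and the toy rows are assumptions made for the example.

```python
from collections import defaultdict

# Toy transactions; the field names are illustrative assumptions.
transactions = [
    {"Currency": "USD", "Amount": 10.0},
    {"Currency": "USD", "Amount": 30.0},
    {"Currency": "EUR", "Amount": 5.0},
    {"Currency": "EUR", "Amount": 10.0},
    {"Currency": "EUR", "Amount": 15.0},
]


def add_average_amount(rows):
    """Return copies of the rows with the per-currency mean amount attached."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row["Currency"]] += row["Amount"]
        counts[row["Currency"]] += 1
    return [
        {**row, "average_amount": totals[row["Currency"]] / counts[row["Currency"]]}
        for row in rows
    ]


enriched = add_average_amount(transactions)
for row in enriched:
    print(row)
```

With pandas, the same derived column is usually written as `df.groupby("Currency")["Amount"].transform("mean")`.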

Because the only two numerical features in the dataset are "Amount" and "Time", we perform feature enrichment first. Optionally, we can also perform feature encoding. In this example, we use a graph neural network (GNN): we train the GNN model in a federated, unsupervised fashion and then use it to encode the features for all sites.

### Step 2.1: Rule-based Feature Enrichment

#### Single-site Enrichment and Additional Processing
The detailed feature enrichment step is illustrated using one site as an example:

* [feature_enrichments with-one-site](./notebooks/2.1.1.feature_enrichment.ipynb)

Similarly, we examine the additional pre-processing step using one site: 

* [pre-processing with one-site](./notebooks/2.1.2.pre_process.ipynb)

#### Federated Job to Perform on All Sites
To run the feature enrichment and pre-processing jobs on every site in the same way as the steps above, we wrote client-side federated ETL job scripts based on the single-site implementations.

* [enrichment script](./src/enrich.py)
* [pre-processing script](./src/pre_process.py)

### (Optional) Step 2.2: GNN-based Feature Encoding
Based on the raw features, optionally combined with the derived features from **Step 2.1**, we can use machine learning models to encode the features. 
In this example, we use a federated GNN to learn and generate the feature embeddings.

First, we construct a graph based on the transaction data. Each node represents a transaction, and the edges represent the relationships between transactions. We then use the GNN to learn the embeddings of the nodes, which represent the transaction features.
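
The text above does not say which relationships become edges, so the following is only a sketch under an assumed rule: two transactions are linked when they share a sender and occur within a fixed time window. The field names `id`, `sender`, and `time`, and the `max_gap` parameter, are hypothetical and may differ from what `graph_construct.py` does.

```python
from itertools import combinations

# Toy transactions; "id", "sender" and "time" are hypothetical field names.
transactions = [
    {"id": 0, "sender": "A", "time": 0},
    {"id": 1, "sender": "A", "time": 50},
    {"id": 2, "sender": "B", "time": 60},
    {"id": 3, "sender": "A", "time": 500},
    {"id": 4, "sender": "B", "time": 100},
]


def build_edges(rows, max_gap):
    """Link transactions that share a sender and lie within max_gap time units."""
    edges = []
    for a, b in combinations(rows, 2):
        same_sender = a["sender"] == b["sender"]
        close_in_time = abs(a["time"] - b["time"]) <= max_gap
        if same_sender and close_in_time:
            edges.append((a["id"], b["id"]))
    return edges


edges = build_edges(transactions, max_gap=100)
print(edges)
```

The pairwise loop is quadratic in the number of transactions; it is meant to show the idea, and a real dataset would likely need grouping or sorting first.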

#### Single-site operation example: graph construction
The detailed graph construction step is illustrated using one site as an example:

* [graph_construction with one-site](./notebooks/graph_construct.ipynb)

The detailed GNN training and encoding step is illustrated using one site as an example:

* [gnn_training_encoding with one-site](./notebooks/gnn_train_encode.ipynb)

#### Federated Job to Perform on All Sites
To run the graph construction and GNN training/encoding jobs on every site, as with the enrichment and pre-processing steps, we wrote client-side federated ETL job scripts based on the single-site implementations.

* [graph_construction script](./src/graph_construct.py)
* [gnn_train_encode script](./src/gnn_train_encode.py)
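
The GNN training job later in this notebook uses the `FedAvg` workflow to combine the models trained at each site. As a conceptual sketch only (this is not NVIDIA FLARE's implementation, and the function and variable names are made up for illustration), federated averaging weights each site's parameters by its local sample count:

```python
def federated_average(client_weights, client_sizes):
    """Average per-client parameter vectors, weighted by local sample count."""
    total = sum(client_sizes)
    num_params = len(client_weights[0])
    averaged = [0.0] * num_params
    for weights, size in zip(client_weights, client_sizes):
        for i, value in enumerate(weights):
            averaged[i] += value * size / total
    return averaged


# Two hypothetical sites with two parameters each.
client_weights = [[1.0, 2.0], [3.0, 4.0]]
client_sizes = [1, 3]  # the second site has three times as much data

global_weights = federated_average(client_weights, client_sizes)
print(global_weights)
```

In each round the server sends the averaged parameters back to the sites, which continue training from them.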


The resulting GNN encodings are then merged with the normalized data to enrich the feature set.
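
A minimal sketch of such a merge is shown below, assuming both tables share a transaction identifier. The key values, the `gnn_<i>` column naming, and the inner-join behaviour are assumptions for illustration; `merge_feat.py` may handle keys and unmatched rows differently.

```python
# Hypothetical normalized features, keyed by a shared transaction ID.
normalized = {
    "t1": {"Amount": 0.1, "Timestamp": 0.5},
    "t2": {"Amount": 0.7, "Timestamp": 0.9},
    "t3": {"Amount": 0.4, "Timestamp": 0.2},  # no embedding available
}

# Hypothetical GNN embeddings for the same transactions.
embeddings = {
    "t1": [0.3, -0.2],
    "t2": [0.0, 0.8],
}


def merge_features(normalized, embeddings):
    """Inner-join the two tables on transaction ID and append embedding columns."""
    combined = {}
    for txn_id, features in normalized.items():
        if txn_id not in embeddings:
            continue  # skip transactions without an embedding
        row = dict(features)
        for i, value in enumerate(embeddings[txn_id]):
            row[f"gnn_{i}"] = value
        combined[txn_id] = row
    return combined


combined = merge_features(normalized, embeddings)
print(combined)
```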

## Step 3: Federated XGBoost 

Now that the data is ready, either as enriched and normalized features or as GNN feature embeddings, we can fit XGBoost on it. NVIDIA FLARE already provides the XGBoost Controller and Executor for the job. All we need to provide is the data loader that feeds the data into XGBoost.

Below we define a [```CreditCardDataLoader```](./src/xgb_data_loader.py), an `XGBDataLoader` subclass that loads the credit card dataset.

```py
import os
from typing import Optional, Tuple

import pandas as pd
import xgboost as xgb
from xgboost.core import DataSplitMode

from nvflare.app_opt.xgboost.data_loader import XGBDataLoader


class CreditCardDataLoader(XGBDataLoader):
    def __init__(self, root_dir: str, file_postfix: str):
        self.dataset_names = ["train", "test"]
        self.base_file_names = {}
        self.root_dir = root_dir
        self.file_postfix = file_postfix
        for name in self.dataset_names:
            self.base_file_names[name] = name + file_postfix
        self.numerical_columns = [
            "Timestamp",
            "Amount",
            "trans_volume",
            "total_amount",
            "average_amount",
            "hist_trans_volume",
            "hist_total_amount",
            "hist_average_amount",
            "x2_y1",
            "x3_y2",
        ]

    def load_data(self, client_id: str, split_mode: int) -> Tuple[xgb.DMatrix, xgb.DMatrix]:
        data = {}
        for ds_name in self.dataset_names:
            print("\nloading for site = ", client_id, f"{ds_name} dataset \n")
            file_name = os.path.join(self.root_dir, client_id, self.base_file_names[ds_name])
            df = pd.read_csv(file_name)
            data_num = len(df)  # number of rows in this split

            # split to feature and label
            y = df["Class"]
            x = df[self.numerical_columns]
            data[ds_name] = (x, y, data_num)


        # training
        x_train, y_train, total_train_data_num = data["train"]
        data_split_mode = DataSplitMode(split_mode)
        dmat_train = xgb.DMatrix(x_train, label=y_train, data_split_mode=data_split_mode)

        # validation
        x_valid, y_valid, total_valid_data_num = data["test"]
        dmat_valid = xgb.DMatrix(x_valid, label=y_valid, data_split_mode=data_split_mode)

        return dmat_train, dmat_valid
```
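
The loader above reads `<root_dir>/<client_id>/<train|test><file_postfix>` for each site. The sketch below mirrors that path logic with a hypothetical helper, `expected_files`, so you can check which CSV files must exist before a job is submitted; the helper is not part of the example's source.

```python
import os


def expected_files(root_dir, site_names, file_postfix, dataset_names=("train", "test")):
    """List the CSV paths a loader with this layout would read for each site."""
    return {
        site: [os.path.join(root_dir, site, name + file_postfix) for name in dataset_names]
        for site in site_names
    }


paths = expected_files("/tmp/czt/dataset", ["YXRXGB22_Bank_3"], "_normalized.csv")
for site, files in paths.items():
    for file_name in files:
        print(site, file_name)
```

Calling `os.path.exists` on each returned path is a cheap way to catch a missing pre-processing step before the federated job starts.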

We are now ready to run all the code.

## Run All the Jobs End-to-end
Here we run each job in sequence. For a real-world use case:

* preparing data is not needed, as you already have the data
* the feature enrichment / encoding scripts need to be defined based on your own technique
* for the XGBoost job, you will need to write your own data loader

### Prepare Data

In [1]:
import kagglehub

# Download latest version
path = kagglehub.dataset_download("samayashar/fraud-detection-transactions-dataset")
input_csv = f"{path}/synthetic_fraud_dataset.csv"


# only generate config file, or also run the simulated job (on the same machine)
config_only = False
# the workdir is used to store the job config and the simulated job results for each node
work_dir = "/tmp/czt/jobs/workdir"
# the processed dataset folder is used to store the processed data, preparing for each node, and also output the results
output_folder = "/tmp/czt/dataset"

!mkdir -p {work_dir}
!mkdir -p {output_folder}

import sys
PY = sys.executable



In [2]:
! {PY} ./utils/prepare_data.py -i {input_csv} -o {output_folder}

Historical DataFrame size: 27500
Training DataFrame size: 17500
Testing DataFrame size: 5000
Saved HCBHSGSG history transactions to /tmp/czt/dataset/HCBHSGSG_Bank_9/history.csv
Saved XITXUS33 history transactions to /tmp/czt/dataset/XITXUS33_Bank_10/history.csv
Saved YSYCESMM history transactions to /tmp/czt/dataset/YSYCESMM_Bank_7/history.csv
Saved YXRXGB22 history transactions to /tmp/czt/dataset/YXRXGB22_Bank_3/history.csv
Saved ZNZZAU3M history transactions to /tmp/czt/dataset/ZNZZAU3M_Bank_8/history.csv
Saved HCBHSGSG train transactions to /tmp/czt/dataset/HCBHSGSG_Bank_9/train.csv
Saved XITXUS33 train transactions to /tmp/czt/dataset/XITXUS33_Bank_10/train.csv
Saved YSYCESMM train transactions to /tmp/czt/dataset/YSYCESMM_Bank_7/train.csv
Saved YXRXGB22 train transactions to /tmp/czt/dataset/YXRXGB22_Bank_3/train.csv
Saved ZNZZAU3M train transactions to /tmp/czt/dataset/ZNZZAU3M_Bank_8/train.csv
Saved HCBHSGSG test transactions to /tmp/czt/dataset/HCBHSGSG_Bank_9/test.csv
Saved X

In [3]:
site_names = [
    "HCBHSGSG_Bank_9",
    "XITXUS33_Bank_10",
    "YSYCESMM_Bank_7",
    "YXRXGB22_Bank_3",
    "ZNZZAU3M_Bank_8",
]

!echo {' '.join(site_names)}

HCBHSGSG_Bank_9 XITXUS33_Bank_10 YSYCESMM_Bank_7 YXRXGB22_Bank_3 ZNZZAU3M_Bank_8
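
The site list above is hard-coded. Since the data preparation step writes one folder per site, the list could also be derived from the dataset folder. The helper below, `discover_sites`, is a hypothetical sketch that is demonstrated on a temporary directory rather than on the real output folder.

```python
import os
import tempfile


def discover_sites(dataset_dir):
    """Return the sorted names of the per-site subfolders in dataset_dir."""
    return sorted(
        entry
        for entry in os.listdir(dataset_dir)
        if os.path.isdir(os.path.join(dataset_dir, entry))
    )


# Demonstrate on a throwaway directory instead of the real dataset folder.
with tempfile.TemporaryDirectory() as tmp:
    for name in ["YXRXGB22_Bank_3", "HCBHSGSG_Bank_9"]:
        os.makedirs(os.path.join(tmp, name))
    open(os.path.join(tmp, "notes.txt"), "w").close()  # plain files are ignored
    demo_sites = discover_sites(tmp)

print(demo_sites)
```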


In [4]:
from nvflare import FedJob
from nvflare.app_common.workflows.etl_controller import ETLController
from nvflare.job_config.script_runner import ScriptRunner

### Enrich data

In [5]:
job = FedJob(name="enrich_job")

enrich_ctrl = ETLController(task_name="enrich")
job.to(enrich_ctrl, "server", id="enrich")

# Add clients
for site_name in site_names:
    executor = ScriptRunner(
        # for this, we output the enriched data to the same folder
        script="src/enrich.py", script_args=f"-i {output_folder} -o {output_folder}"
    )
    job.to(executor, site_name, tasks=["enrich"])

if work_dir:
    print(f"{work_dir=}")
    job.export_job(work_dir)

if not config_only:
    job.simulator_run(work_dir)

work_dir='/tmp/czt/jobs/workdir'
2025-09-06 13:27:20,784 - ETLController - INFO - Initializing BaseModelController workflow.
2025-09-06 13:27:20,785 - ETLController - INFO - Beginning model controller run.
2025-09-06 13:27:20,785 - ETLController - INFO - enrich task started.
2025-09-06 13:27:20,786 - ETLController - INFO - Sampled clients: ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-09-06 13:27:20,786 - ETLController - INFO - Sending task enrich to ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-09-06 13:27:26,053 - TaskScriptRunner - INFO - start task run() with full path: /tmp/czt/jobs/workdir/YSYCESMM_Bank_7/simulate_job/app_YSYCESMM_Bank_7/custom/src/enrich.py
2025-09-06 13:27:26,059 - TaskScriptRunner - INFO - start task run() with full path: /tmp/czt/jobs/workdir/HCBHSGSG_Bank_9/simulate_job/app_HCBHSGSG_Bank_9/

### Pre-Process Data

In [6]:
job = FedJob(name="pre_processing_job")

pre_process_ctrl = ETLController(task_name="pre_process")
job.to(pre_process_ctrl, "server", id="pre_process")

# Add clients
for site_name in site_names:
    executor = ScriptRunner(script="src/pre_process.py", script_args=f"-i {output_folder} -o {output_folder}")
    job.to(executor, site_name, tasks=["pre_process"])

if work_dir:
    print(f"{work_dir=}")
    job.export_job(work_dir)

if not config_only:
    job.simulator_run(work_dir)

work_dir='/tmp/czt/jobs/workdir'
2025-09-06 13:27:34,959 - ETLController - INFO - Initializing BaseModelController workflow.
2025-09-06 13:27:34,960 - ETLController - INFO - Beginning model controller run.
2025-09-06 13:27:34,960 - ETLController - INFO - pre_process task started.
2025-09-06 13:27:34,960 - ETLController - INFO - Sampled clients: ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-09-06 13:27:34,960 - ETLController - INFO - Sending task pre_process to ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-09-06 13:27:40,234 - TaskScriptRunner - INFO - start task run() with full path: /tmp/czt/jobs/workdir/YSYCESMM_Bank_7/simulate_job/app_YSYCESMM_Bank_7/custom/src/pre_process.py
2025-09-06 13:27:40,245 - TaskScriptRunner - INFO - start task run() with full path: /tmp/czt/jobs/workdir/XITXUS33_Bank_10/simulate_job/app_

### Construct Graph

In [7]:
job = FedJob(name="graph_construct_job")

graph_construct_ctrl = ETLController(task_name="graph_construct")
job.to(graph_construct_ctrl, "server", id="graph_construct")

# Add clients
for site_name in site_names:
    executor = ScriptRunner(script="src/graph_construct.py", script_args=f"-i {output_folder} -o {output_folder}")
    job.to(executor, site_name, tasks=["graph_construct"])

if work_dir:
    print(f"{work_dir=}")
    job.export_job(work_dir)

if not config_only:
    job.simulator_run(work_dir)

work_dir='/tmp/czt/jobs/workdir'
2025-09-06 13:27:50,213 - ETLController - INFO - Initializing BaseModelController workflow.
2025-09-06 13:27:50,214 - ETLController - INFO - Beginning model controller run.
2025-09-06 13:27:50,214 - ETLController - INFO - graph_construct task started.
2025-09-06 13:27:50,215 - ETLController - INFO - Sampled clients: ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-09-06 13:27:50,215 - ETLController - INFO - Sending task graph_construct to ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-09-06 13:27:55,358 - TaskScriptRunner - INFO - start task run() with full path: /tmp/czt/jobs/workdir/HCBHSGSG_Bank_9/simulate_job/app_HCBHSGSG_Bank_9/custom/src/graph_construct.py
2025-09-06 13:27:55,371 - PTInProcessClientAPIExecutor - INFO - execute for task (graph_construct)
2025-09-06 13:27:55,3

### GNN Training and Encoding

In [8]:
from torch_geometric.nn import GraphSAGE

from nvflare import FedJob
from nvflare.app_common.workflows.fedavg import FedAvg
from nvflare.app_opt.pt.job_config.model import PTModel
from nvflare.job_config.script_runner import ScriptRunner

job = FedJob(name="gnn_train_encode_job")

# Define the controller workflow and send to server
controller = FedAvg(
    num_clients=len(site_names),
    num_rounds=40,
)
job.to(controller, "server")

# Define the model
model = GraphSAGE(
    in_channels=30,
    hidden_channels=32,
    num_layers=2,
    out_channels=32,
)
job.to(PTModel(model), "server")

# Add clients
for site_name in site_names:
    executor = ScriptRunner(script="src/gnn_train_encode.py", script_args=f"-i {output_folder} -o {output_folder}")
    job.to(executor, site_name)

if work_dir:
    print(f"{work_dir=}")
    job.export_job(work_dir)

if not config_only:
    job.simulator_run(work_dir)

work_dir='/tmp/czt/jobs/workdir'
2025-09-06 13:28:21,794 - FedAvg - INFO - Initializing BaseModelController workflow.
2025-09-06 13:28:21,795 - FedAvg - INFO - Beginning model controller run.
2025-09-06 13:28:21,795 - FedAvg - INFO - Start FedAvg.
2025-09-06 13:28:21,795 - FedAvg - INFO - loading initial model from persistor
2025-09-06 13:28:21,796 - PTFileModelPersistor - INFO - Both source_ckpt_file_full_name and ckpt_preload_path are not provided. Using the default model weights initialized on the persistor side.
2025-09-06 13:28:21,796 - FedAvg - INFO - Round 0 started.
2025-09-06 13:28:21,797 - FedAvg - INFO - Sampled clients: ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-09-06 13:28:21,797 - FedAvg - INFO - Sending task train to ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-09-06 13:28:26,523 -

### GNN Encoding Merge

In [9]:
! {PY} ./utils/merge_feat.py -i {output_folder}

Processing folder:  ZNZZAU3M_Bank_8
  Processing train dataset:
    GNN features shape: (3486, 32)
    Embedding shape: (3486, 34)
    Combined shape: (3518, 64)
    Columns: 64 features
    Saved to: /tmp/czt/dataset/ZNZZAU3M_Bank_8/train_combined.csv
  Processing test dataset:
    GNN features shape: (1008, 32)
    Embedding shape: (1008, 34)
    Combined shape: (1012, 64)
    Columns: 64 features
    Saved to: /tmp/czt/dataset/ZNZZAU3M_Bank_8/test_combined.csv
Processing folder:  YXRXGB22_Bank_3
  Processing train dataset:
    GNN features shape: (3558, 32)
    Embedding shape: (3558, 34)
    Combined shape: (3594, 64)
    Columns: 64 features
    Saved to: /tmp/czt/dataset/YXRXGB22_Bank_3/train_combined.csv
  Processing test dataset:
    GNN features shape: (952, 32)
    Embedding shape: (952, 34)
    Combined shape: (952, 64)
    Columns: 64 features
    Saved to: /tmp/czt/dataset/YXRXGB22_Bank_3/test_combined.csv
Processing folder:  YSYCESMM_Bank_7
  Processing train dataset:
   

### Run XGBoost Job
#### Without GNN embeddings

In [10]:
from nvflare.app_opt.xgboost.histogram_based_v2.fed_controller import XGBFedController
from nvflare.app_opt.xgboost.histogram_based_v2.fed_executor import (
    FedXGBHistogramExecutor,
)

from xgb_data_loader import CreditCardDataLoader


num_rounds = 8
early_stopping_rounds = 5
xgb_params = {
    "max_depth": 7,
    "eta": 0.1,
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "tree_method": "hist",
    "nthread": 4,
    "max_bin": 256,
    "grow_policy": "lossguide",
    "max_leaves": 64,
}

job = FedJob(name="xgb_job")

# Define the controller workflow and send to server
controller = XGBFedController(
    num_rounds=num_rounds,
    data_split_mode=0,
    secure_training=False,
    xgb_params=xgb_params,
    xgb_options={"early_stopping_rounds": early_stopping_rounds},
)
job.to(controller, "server")

# Add clients
for site_name in site_names:
    executor = FedXGBHistogramExecutor(data_loader_id="data_loader")
    job.to(executor, site_name)
    data_loader = CreditCardDataLoader(root_dir=output_folder, file_postfix="_normalized.csv")
    job.to(data_loader, site_name, id="data_loader")

if work_dir:
    print("work_dir=", work_dir)
    job.export_job(work_dir)

if not config_only:
    job.simulator_run(work_dir)


work_dir= /tmp/czt/jobs/workdir
2025-09-06 13:31:18,435 - XGBFedController - INFO - Waiting for clients to be ready: ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-09-06 13:31:18,436 - XGBFedController - INFO - Configuring clients ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-09-06 13:31:18,436 - XGBFedController - INFO - sending task config to clients ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-09-06 13:31:23,144 - FedXGBHistogramExecutor - INFO - got my rank: 2
2025-09-06 13:31:23,147 - FedXGBHistogramExecutor - INFO - got my rank: 1
2025-09-06 13:31:23,150 - FedXGBHistogramExecutor - INFO - got my rank: 3
2025-09-06 13:31:23,153 - FedXGBHistogramExecutor - INFO - got my rank: 0
2025-09-06 13:31:23,166 - XGBFedController - INFO - successfully configur

[13:31:23] Insecure federated server listening on 0.0.0.0:22698, world size 5


2025-09-06 13:31:25,187 - GrpcClientAdaptor - INFO - Start internal server at 127.0.0.1:29581
2025-09-06 13:31:25,190 - GrpcClientAdaptor - INFO - Start internal server at 127.0.0.1:12541
2025-09-06 13:31:25,196 - GrpcClientAdaptor - INFO - Start internal server at 127.0.0.1:6927
2025-09-06 13:31:25,199 - GrpcClientAdaptor - INFO - Start internal server at 127.0.0.1:8649
2025-09-06 13:31:25,199 - GrpcServer - INFO - XGBServer: added insecure port at 127.0.0.1:12541
2025-09-06 13:31:25,199 - GrpcServer - INFO - XGBServer: added insecure port at 127.0.0.1:29581
2025-09-06 13:31:25,200 - GrpcServer - INFO - starting gRPC Server
2025-09-06 13:31:25,200 - GrpcServer - INFO - starting gRPC Server
2025-09-06 13:31:25,201 - GrpcClientAdaptor - INFO - Started internal server at 127.0.0.1:29581
2025-09-06 13:31:25,201 - GrpcClientAdaptor - INFO - Started internal server at 127.0.0.1:12541
2025-09-06 13

[13:31:30] [0]	eval-auc:0.76751	train-auc:0.82901
[13:31:30] [0]	eval-auc:0.76751	train-auc:0.82901
[13:31:30] [0]	eval-auc:0.76751	train-auc:0.82901
[13:31:30] [0]	eval-auc:0.76751	train-auc:0.82901
[13:31:30] [0]	eval-auc:0.76751	train-auc:0.82901


2025-09-06 13:31:30,770 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:30,771 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:30,771 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:30,772 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:30,773 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:30,783 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=60 finished processing
2025-09-06 13:31:30,796 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=60 finished processing
2025-09-06 13:31:30,803 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=60 finished processing
2025-09-06 13:31:30,807 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=60 finished processing
2025-09-06 13:31:30,815 - GrpcClientAd

[13:31:34] [1]	eval-auc:0.77011	train-auc:0.83433
[13:31:34] [1]	eval-auc:0.77011	train-auc:0.83433
[13:31:34] [1]	eval-auc:0.77011	train-auc:0.83433
[13:31:34] [1]	eval-auc:0.77011	train-auc:0.83433
[13:31:34] [1]	eval-auc:0.77011	train-auc:0.83433


2025-09-06 13:31:34,572 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:34,573 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:34,573 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:34,573 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:34,573 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:34,611 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=106 finished processing
2025-09-06 13:31:34,614 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=106 finished processing
2025-09-06 13:31:34,615 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=106 finished processing
2025-09-06 13:31:34,615 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=1 seq=106 finished processing
2025-09-06 13:31:34,616 - GrpcClie

[13:31:38] [2]	eval-auc:0.77244	train-auc:0.83955
[13:31:38] [2]	eval-auc:0.77244	train-auc:0.83955
[13:31:38] [2]	eval-auc:0.77244	train-auc:0.83955
[13:31:38] [2]	eval-auc:0.77244	train-auc:0.83955
[13:31:38] [2]	eval-auc:0.77244	train-auc:0.83955


2025-09-06 13:31:38,607 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=154 finished processing
2025-09-06 13:31:38,614 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=154 finished processing
2025-09-06 13:31:38,662 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:38,663 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:38,663 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:38,663 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:38,663 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:38,671 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=1 seq=155 finished processing
2025-09-06 13:31:38,672 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=155 finished processing
2025-09-06 13:31:38,676 - GrpcClie

[13:31:42] [3]	eval-auc:0.77455	train-auc:0.84388
[13:31:42] [3]	eval-auc:0.77455	train-auc:0.84388
[13:31:42] [3]	eval-auc:0.77455	train-auc:0.84388
[13:31:42] [3]	eval-auc:0.77455	train-auc:0.84388
[13:31:42] [3]	eval-auc:0.77455	train-auc:0.84388


2025-09-06 13:31:42,864 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:42,865 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:42,865 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:42,866 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:42,866 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:42,877 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=201 finished processing
2025-09-06 13:31:42,879 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=201 finished processing
2025-09-06 13:31:42,900 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=1 seq=201 finished processing
2025-09-06 13:31:42,905 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=201 finished processing
2025-09-06 13:31:42,907 - GrpcClie

[13:31:46] [4]	eval-auc:0.77625	train-auc:0.84757
[13:31:46] [4]	eval-auc:0.77625	train-auc:0.84757
[13:31:46] [4]	eval-auc:0.77625	train-auc:0.84757
[13:31:46] [4]	eval-auc:0.77625	train-auc:0.84757
[13:31:46] [4]	eval-auc:0.77625	train-auc:0.84757


2025-09-06 13:31:46,408 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:46,408 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:46,408 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:46,409 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:46,409 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:46,420 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=244 finished processing
2025-09-06 13:31:46,422 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=244 finished processing
2025-09-06 13:31:46,447 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=244 finished processing
2025-09-06 13:31:46,447 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=244 finished processing
2025-09-06 13:31:46,463 - GrpcClie

[13:31:50] [5]	eval-auc:0.77665	train-auc:0.85229
[13:31:50] [5]	eval-auc:0.77665	train-auc:0.85229
[13:31:50] [5]	eval-auc:0.77665	train-auc:0.85229
[13:31:50] [5]	eval-auc:0.77665	train-auc:0.85229
[13:31:50] [5]	eval-auc:0.77665	train-auc:0.85229


2025-09-06 13:31:50,126 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:50,127 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:50,127 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:50,127 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:50,127 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:50,138 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=289 finished processing
2025-09-06 13:31:50,142 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=289 finished processing
2025-09-06 13:31:50,146 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=1 seq=289 finished processing
2025-09-06 13:31:50,147 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=289 finished processing
2025-09-06 13:31:50,168 - GrpcClie

[13:31:54] [6]	eval-auc:0.77753	train-auc:0.85540
[13:31:54] [6]	eval-auc:0.77753	train-auc:0.85540
[13:31:54] [6]	eval-auc:0.77753	train-auc:0.85540
[13:31:54] [6]	eval-auc:0.77753	train-auc:0.85540
[13:31:54] [6]	eval-auc:0.77753	train-auc:0.85540


2025-09-06 13:31:54,508 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=1 seq=339 finished processing
2025-09-06 13:31:54,511 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=339 finished processing
2025-09-06 13:31:54,515 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=339 finished processing
2025-09-06 13:31:54,564 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:54,564 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:54,565 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:54,565 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:54,565 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:31:54,584 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=340 finished processing
2025-09-06 13:31:54,587 - GrpcClie

[13:31:58] [7]	eval-auc:0.77958	train-auc:0.85790
[13:31:58] Finished training
[13:31:58] [7]	eval-auc:0.77958	train-auc:0.85790
[13:31:58] Finished training
[13:31:58] [7]	eval-auc:0.77958	train-auc:0.85790
[13:31:58] Finished training
[13:31:58] [7]	eval-auc:0.77958	train-auc:0.85790
[13:31:58] Finished training
[13:31:58] [7]	eval-auc:0.77958	train-auc:0.85790
[13:31:58] Finished training


2025-09-06 13:31:59,067 - FedXGBHistogramExecutor - INFO - XGB Client Stopped
2025-09-06 13:31:59,070 - FedXGBHistogramExecutor - INFO - XGB Client Stopped
2025-09-06 13:31:59,071 - XGBFedController - INFO - XGB client is done with exit code 0
2025-09-06 13:31:59,072 - FedXGBHistogramExecutor - INFO - XGB Client Stopped
2025-09-06 13:31:59,073 - XGBFedController - INFO - XGB client is done with exit code 0
2025-09-06 13:31:59,074 - FedXGBHistogramExecutor - INFO - XGB Client Stopped
2025-09-06 13:31:59,075 - XGBFedController - INFO - XGB client is done with exit code 0
2025-09-06 13:31:59,076 - XGBFedController - INFO - XGB client is done with exit code 0
2025-09-06 13:32:00,506 - XGBFedController - INFO - client HCBHSGSG_Bank_9 is Done
2025-09-06 13:32:00,507 - XGBFedController - INFO - client XITXUS33_Bank_10 is Done
2025-09-06 13:32:00,507 - XGBFedController - INFO - client YXRXGB22_Bank_3

In [11]:

# Save off the final model for Bank 3 for later analysis
import os
import shutil
# Create directory for saved models
save_dir = os.path.expanduser("./saved_models/credit_fraud")
os.makedirs(save_dir, exist_ok=True)

# Copy the model
source_model = "/tmp/czt/jobs/workdir/YXRXGB22_Bank_3/simulate_job/model.json"
dest_model = os.path.join(save_dir, "xgb_model_bank3.json")

shutil.copy2(source_model, dest_model)
print(f"Model saved to: {dest_model}")

Model saved to: ./saved_models/credit_fraud/xgb_model_bank3.json
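
The job above sets `num_rounds = 8` and `early_stopping_rounds = 5`. The sketch below is a simplified patience rule, not XGBoost's internal code, applied to the eight `eval-auc` values logged in the run above. The function name `early_stop_round` is made up for illustration.

```python
def early_stop_round(scores, patience):
    """Return the round at which training would stop, or None if it never does.

    Training stops once `patience` consecutive rounds pass without a new best
    score (higher is better, as with AUC).
    """
    best = float("-inf")
    best_round = -1
    for round_idx, score in enumerate(scores):
        if score > best:
            best, best_round = score, round_idx
        elif round_idx - best_round >= patience:
            return round_idx
    return None


# eval-auc per boosting round, copied from the training log above.
eval_auc = [0.76751, 0.77011, 0.77244, 0.77455, 0.77625, 0.77665, 0.77753, 0.77958]

stopped_at = early_stop_round(eval_auc, patience=5)
print(stopped_at)
```

Because the validation AUC improves in every round, the rule never triggers and all eight rounds run.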


#### With GNN embeddings

In [12]:
from xgb_data_loader import CreditCardDataLoaderWithGNN

from nvflare import FedJob
from nvflare.app_opt.xgboost.histogram_based_v2.fed_controller import XGBFedController
from nvflare.app_opt.xgboost.histogram_based_v2.fed_executor import (
    FedXGBHistogramExecutor,
)

num_rounds = 8
early_stopping_rounds = 5
xgb_params = {
    "max_depth": 7,
    "eta": 0.1,
    "objective": "binary:logistic",
    "eval_metric": "auc",
    "tree_method": "hist",
    "nthread": 4,
    "max_bin": 256,
    "grow_policy": "lossguide",
    "max_leaves": 64,
}

job = FedJob(name="xgb_job_embed")

# Define the controller workflow and send to server
controller = XGBFedController(
    num_rounds=num_rounds,
    data_split_mode=0,
    secure_training=False,
    xgb_params=xgb_params,
    xgb_options={"early_stopping_rounds": early_stopping_rounds},
)
job.to(controller, "server")

# Add clients
for site_name in site_names:
    executor = FedXGBHistogramExecutor(data_loader_id="data_loader")
    job.to(executor, site_name)
    data_loader = CreditCardDataLoaderWithGNN(
        root_dir=output_folder, file_postfix="_combined.csv"
    )
    job.to(data_loader, site_name, id="data_loader")

if work_dir:
    print("work_dir=", work_dir)
    job.export_job(work_dir)

if not config_only:
    job.simulator_run(work_dir)



work_dir= /tmp/czt/jobs/workdir
2025-09-06 13:32:59,721 - XGBFedController - INFO - Waiting for clients to be ready: ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-09-06 13:32:59,722 - XGBFedController - INFO - Configuring clients ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-09-06 13:32:59,722 - XGBFedController - INFO - sending task config to clients ['HCBHSGSG_Bank_9', 'XITXUS33_Bank_10', 'YSYCESMM_Bank_7', 'YXRXGB22_Bank_3', 'ZNZZAU3M_Bank_8']
2025-09-06 13:33:04,288 - FedXGBHistogramExecutor - INFO - got my rank: 1
2025-09-06 13:33:04,300 - XGBFedController - INFO - successfully configured client XITXUS33_Bank_10
2025-09-06 13:33:04,340 - FedXGBHistogramExecutor - INFO - got my rank: 2
2025-09-06 13:33:04,352 - XGBFedController - INFO - successfully configured client YSYCESMM_Bank_7
2025-09-06 13:33:04,36

[13:33:04] Insecure federated server listening on 0.0.0.0:25531, world size 5


2025-09-06 13:33:06,313 - GrpcClientAdaptor - INFO - Start internal server at 127.0.0.1:36990
2025-09-06 13:33:06,319 - GrpcServer - INFO - XGBServer: added insecure port at 127.0.0.1:36990
2025-09-06 13:33:06,319 - GrpcServer - INFO - starting gRPC Server
2025-09-06 13:33:06,321 - GrpcClientAdaptor - INFO - Started internal server at 127.0.0.1:36990
2025-09-06 13:33:06,321 - GrpcClientAdaptor - INFO - starting XGBoost Server in another thread
2025-09-06 13:33:06,322 - XGBClientRunner - INFO - XGB data_split_mode: 0 secure_training: False params: {'max_depth': 7, 'eta': 0.1, 'objective': 'binary:logistic', 'eval_metric': 'auc', 'tree_method': 'hist', 'nthread': 4, 'max_bin': 256, 'grow_policy': 'lossguide', 'max_leaves': 64} XGB options: {'early_stopping_rounds': 5}
2025-09-06 13:33:06,322 - GrpcClientAdaptor - INFO - Started external XGB Client
2025-09-06 13:33:06,323 - XGBClientRunner - INFO - server address is 127.0

[13:33:10] [0]	eval-auc:0.78870	train-auc:0.80790
[13:33:10] [0]	eval-auc:0.78870	train-auc:0.80790
[13:33:10] [0]	eval-auc:0.78870	train-auc:0.80790
[13:33:10] [0]	eval-auc:0.78870	train-auc:0.80790
[13:33:10] [0]	eval-auc:0.78870	train-auc:0.80790


2025-09-06 13:33:10,517 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:10,517 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:10,518 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:10,519 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:10,520 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:10,524 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=61 finished processing
2025-09-06 13:33:10,527 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=1 seq=61 finished processing
2025-09-06 13:33:10,532 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=61 finished processing
2025-09-06 13:33:10,539 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=61 finished processing
2025-09-06 13:33:10,539 - GrpcClientAd

[13:33:12] [1]	eval-auc:0.79704	train-auc:0.82234
[13:33:12] [1]	eval-auc:0.79704	train-auc:0.82234
[13:33:12] [1]	eval-auc:0.79704	train-auc:0.82234
[13:33:12] [1]	eval-auc:0.79704	train-auc:0.82234
[13:33:12] [1]	eval-auc:0.79704	train-auc:0.82234


2025-09-06 13:33:12,417 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:12,417 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:12,417 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:12,418 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:12,418 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:12,448 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=101 finished processing
2025-09-06 13:33:12,451 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=101 finished processing
2025-09-06 13:33:12,458 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=101 finished processing
2025-09-06 13:33:12,458 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=1 seq=101 finished processing
2025-09-06 13:33:12,464 - GrpcClie

[13:33:14] [2]	eval-auc:0.79584	train-auc:0.82792
[13:33:14] [2]	eval-auc:0.79584	train-auc:0.82792
[13:33:14] [2]	eval-auc:0.79584	train-auc:0.82792
[13:33:14] [2]	eval-auc:0.79584	train-auc:0.82792
[13:33:14] [2]	eval-auc:0.79584	train-auc:0.82792


2025-09-06 13:33:14,645 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:14,646 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:14,646 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:14,648 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:14,649 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:14,651 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=147 finished processing
2025-09-06 13:33:14,654 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=147 finished processing
2025-09-06 13:33:14,660 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=147 finished processing
2025-09-06 13:33:14,664 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=147 finished processing
2025-09-06 13:33:14,675 - GrpcClie

[13:33:16] [3]	eval-auc:0.79277	train-auc:0.83214
[13:33:16] [3]	eval-auc:0.79277	train-auc:0.83214
[13:33:16] [3]	eval-auc:0.79277	train-auc:0.83214
[13:33:16] [3]	eval-auc:0.79277	train-auc:0.83214
[13:33:16] [3]	eval-auc:0.79277	train-auc:0.83214


2025-09-06 13:33:16,909 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=193 finished processing
2025-09-06 13:33:16,924 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=193 finished processing
2025-09-06 13:33:16,927 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=193 finished processing
2025-09-06 13:33:16,928 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=193 finished processing
2025-09-06 13:33:16,928 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=1 seq=193 finished processing
2025-09-06 13:33:16,942 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:16,942 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:16,942 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:16,943 - XGBFedController - INFO - received reply for 'all_reduce'
202

[13:33:18] [4]	eval-auc:0.79547	train-auc:0.83660
[13:33:18] [4]	eval-auc:0.79547	train-auc:0.83660
[13:33:18] [4]	eval-auc:0.79547	train-auc:0.83660
[13:33:18] [4]	eval-auc:0.79547	train-auc:0.83660
[13:33:18] [4]	eval-auc:0.79547	train-auc:0.83660


2025-09-06 13:33:18,711 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:18,711 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:18,711 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:18,711 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:18,712 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:18,742 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=242 finished processing
2025-09-06 13:33:18,749 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=1 seq=242 finished processing
2025-09-06 13:33:18,749 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=242 finished processing
2025-09-06 13:33:18,753 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=242 finished processing
2025-09-06 13:33:18,753 - GrpcClie

[13:33:20] [5]	eval-auc:0.79700	train-auc:0.84373
[13:33:20] [5]	eval-auc:0.79700	train-auc:0.84373
[13:33:20] [5]	eval-auc:0.79700	train-auc:0.84373
[13:33:20] [5]	eval-auc:0.79700	train-auc:0.84373
[13:33:20] [5]	eval-auc:0.79700	train-auc:0.84373


2025-09-06 13:33:20,771 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:20,772 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:20,774 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:20,775 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:20,779 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=288 finished processing
2025-09-06 13:33:20,780 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=2 seq=288 finished processing
2025-09-06 13:33:20,782 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=288 finished processing
2025-09-06 13:33:20,785 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=288 finished processing
2025-09-06 13:33:20,786 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=1 seq=288 finished processing
202

[13:33:22] [6]	eval-auc:0.79738	train-auc:0.84528
[13:33:22] [6]	eval-auc:0.79738	train-auc:0.84528
[13:33:22] [6]	eval-auc:0.79738	train-auc:0.84528
[13:33:22] [6]	eval-auc:0.79738	train-auc:0.84528
[13:33:22] [6]	eval-auc:0.79738	train-auc:0.84528


2025-09-06 13:33:23,085 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=3 seq=338 finished processing
2025-09-06 13:33:23,086 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=4 seq=338 finished processing
2025-09-06 13:33:23,094 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=338 finished processing
2025-09-06 13:33:23,106 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:23,107 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:23,107 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:23,107 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:23,108 - XGBFedController - INFO - received reply for 'all_reduce'
2025-09-06 13:33:23,112 - GrpcClientAdaptor - INFO - Request seq op='allreduce' rank=0 seq=339 finished processing
2025-09-06 13:33:23,115 - GrpcClie

[13:33:24] [7]	eval-auc:0.79833	train-auc:0.84825
[13:33:24] Finished training
[13:33:24] [7]	eval-auc:0.79833	train-auc:0.84825
[13:33:24] [7]	eval-auc:0.79833	train-auc:0.84825
[13:33:24] Finished training
[13:33:24] Finished training
[13:33:24] [7]	eval-auc:0.79833	train-auc:0.84825
[13:33:24] [7]	eval-auc:0.79833	train-auc:0.84825
[13:33:24] Finished training
[13:33:24] Finished training


2025-09-06 13:33:24,972 - FedXGBHistogramExecutor - INFO - XGB Client Stopped
2025-09-06 13:33:24,975 - XGBFedController - INFO - XGB client is done with exit code 0
2025-09-06 13:33:24,981 - FedXGBHistogramExecutor - INFO - XGB Client Stopped
2025-09-06 13:33:24,984 - XGBFedController - INFO - XGB client is done with exit code 0
2025-09-06 13:33:24,987 - FedXGBHistogramExecutor - INFO - XGB Client Stopped
2025-09-06 13:33:24,990 - XGBFedController - INFO - XGB client is done with exit code 0
2025-09-06 13:33:25,037 - FedXGBHistogramExecutor - INFO - XGB Client Stopped
2025-09-06 13:33:25,040 - XGBFedController - INFO - XGB client is done with exit code 0
2025-09-06 13:33:25,041 - FedXGBHistogramExecutor - INFO - XGB Client Stopped
2025-09-06 13:33:25,043 - XGBFedController - INFO - XGB client is done with exit code 0
2025-09-06 13:33:26,579 - XGBFedController - INFO - client HCBHSGSG_Bank_9 
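
The training log prints one `eval-auc` / `train-auc` line per client per round, so each round appears several times and concurrent writes occasionally fuse two entries onto one line. The sketch below is one possible way to pull the per-round metrics out of captured log text. `parse_auc_log` and `best_round` are hypothetical helper names, not part of NVFlare or XGBoost, and the sample lines are copied from the output above.

```python
import re

# Matches XGBoost evaluation entries such as:
#   [13:33:24] [7]<TAB>eval-auc:0.79833<TAB>train-auc:0.84825
# The leading "[HH:MM:SS]" timestamp is skipped because it contains colons.
METRIC_ENTRY = re.compile(
    r"\[(?P<round>\d+)\]\s+eval-auc:(?P<eval>[\d.]+)\s+train-auc:(?P<train>[\d.]+)"
)


def parse_auc_log(log_text):
    """Return {round: (eval_auc, train_auc)}.

    Every client reports the same value for a round, so duplicates
    simply overwrite each other. finditer also copes with two entries
    fused onto a single line.
    """
    metrics = {}
    for match in METRIC_ENTRY.finditer(log_text):
        metrics[int(match["round"])] = (float(match["eval"]), float(match["train"]))
    return metrics


def best_round(metrics):
    """Return the round index with the highest eval AUC."""
    return max(metrics, key=lambda r: metrics[r][0])


# A few entries copied from the log above (the first line is a fused pair).
sample_log = (
    "[13:33:10] [0]\teval-auc:0.78870\ttrain-auc:0.80790"
    "[13:33:10] [0]\teval-auc:0.78870\ttrain-auc:0.80790\n"
    "[13:33:12] [1]\teval-auc:0.79704\ttrain-auc:0.82234\n"
    "[13:33:24] [7]\teval-auc:0.79833\ttrain-auc:0.84825\n"
)

metrics = parse_auc_log(sample_log)
for rnd in sorted(metrics):
    eval_auc, train_auc = metrics[rnd]
    print(f"round {rnd}: eval-auc={eval_auc:.5f} train-auc={train_auc:.5f}")
print("best round by eval-auc:", best_round(metrics))
```

Comparing the best round against the last round is a quick way to see whether `early_stopping_rounds` had any chance to trigger within `num_rounds`.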

In [13]:
# Save off the final model for Bank 3 for later analysis
import os
import shutil
# Create directory for saved models
save_dir = os.path.expanduser("./saved_models")
os.makedirs(save_dir, exist_ok=True)

# Copy the model
source_model = "/tmp/czt/jobs/workdir/YXRXGB22_Bank_3/simulate_job/model.json"
dest_model = os.path.join(save_dir, "xgb_gnn_model_bank3.json")

shutil.copy2(source_model, dest_model)
print(f"Model saved to: {dest_model}")

Model saved to: ./saved_models/xgb_gnn_model_bank3.json
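
The cell above hard-codes the path for one site. As a minimal sketch, assuming the simulator workspace layout `<work_dir>/<site>/simulate_job/model.json` seen in this notebook, the same copy can be applied to every site. `collect_site_models` and its `prefix` parameter are hypothetical names introduced here for illustration.

```python
import shutil
from pathlib import Path


def collect_site_models(work_dir, site_names, save_dir, prefix="xgb_gnn_model"):
    """Copy each site's model.json from the simulator workspace into save_dir.

    Assumes the layout <work_dir>/<site>/simulate_job/model.json.
    Returns a dict mapping site name -> destination Path for the sites
    whose model file was found; missing sites are reported and skipped.
    """
    save_path = Path(save_dir)
    save_path.mkdir(parents=True, exist_ok=True)

    copied = {}
    for site in site_names:
        source = Path(work_dir) / site / "simulate_job" / "model.json"
        if not source.is_file():
            print(f"Skipping {site}: {source} not found")
            continue
        dest = save_path / f"{prefix}_{site}.json"
        shutil.copy2(source, dest)  # copy2 also preserves file metadata
        copied[site] = dest
    return copied


# Example call, using the notebook's own variables:
# collect_site_models(work_dir, site_names, "./saved_models")
```

Returning the mapping rather than printing each path makes it easy to feed the saved files into later analysis cells.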


## Prepare Job for POC and Production

With the job running well in the simulator, we are ready to run it in POC mode, which simulates a deployment on localhost, or to deploy it to production.

All we need is the job definition. The `job.export_job()` method generates the job configuration and exports it to a given directory. For example, `xgb_job.py` contains the following:

```
    if work_dir:
        print("work_dir=", work_dir)
        job.export_job(work_dir)

    if not args.config_only:
        job.simulator_run(work_dir)
```

Let's try this out and then look at the exported directory. Use the `tree` command if you have it; otherwise, simply use `ls -al`. The cell below uses `find` to list the files under each `simulate_job` folder.

In [14]:
!find {work_dir} -type f -path "*/simulate_job/*"

/tmp/czt/jobs/workdir/ZNZZAU3M_Bank_8/simulate_job/shap_beeswarm.png
/tmp/czt/jobs/workdir/ZNZZAU3M_Bank_8/simulate_job/app_ZNZZAU3M_Bank_8/custom/xgb_data_loader.py
/tmp/czt/jobs/workdir/ZNZZAU3M_Bank_8/simulate_job/app_ZNZZAU3M_Bank_8/config/config_fed_client.json
/tmp/czt/jobs/workdir/ZNZZAU3M_Bank_8/simulate_job/model.json
/tmp/czt/jobs/workdir/ZNZZAU3M_Bank_8/simulate_job/meta.json
/tmp/czt/jobs/workdir/YXRXGB22_Bank_3/simulate_job/shap_beeswarm.png
/tmp/czt/jobs/workdir/YXRXGB22_Bank_3/simulate_job/model.json
/tmp/czt/jobs/workdir/YXRXGB22_Bank_3/simulate_job/app_YXRXGB22_Bank_3/custom/xgb_data_loader.py
/tmp/czt/jobs/workdir/YXRXGB22_Bank_3/simulate_job/app_YXRXGB22_Bank_3/config/config_fed_client.json
/tmp/czt/jobs/workdir/YXRXGB22_Bank_3/simulate_job/meta.json
/tmp/czt/jobs/workdir/YSYCESMM_Bank_7/simulate_job/shap_beeswarm.png
/tmp/czt/jobs/workdir/YSYCESMM_Bank_7/simulate_job/app_YSYCESMM_Bank_7/custom/xgb_data_loader.py
/tmp/czt/jobs/workdir/YSYCESMM_Bank_7/simulate_job/app
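
On systems without `find`, the same listing can be approximated in pure Python. This is a sketch only; `list_simulate_job_files` is a hypothetical helper, and it mirrors the `-type f -path "*/simulate_job/*"` filter by keeping regular files that have a `simulate_job` directory somewhere above them.

```python
from pathlib import Path


def list_simulate_job_files(work_dir):
    """Return sorted paths of regular files located under any simulate_job folder."""
    root = Path(work_dir)
    return sorted(
        path
        for path in root.rglob("*")
        # parts[:-1] are the directory components, excluding the file name itself
        if path.is_file() and "simulate_job" in path.relative_to(root).parts[:-1]
    )


# Example call, using the notebook's own variable:
# for path in list_simulate_job_files(work_dir):
#     print(path)
```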

In [15]:
!cat /tmp/czt/jobs/workdir/server/simulate_job/meta.json

{
    "name": "xgb_job_embed",
    "resource_spec": {},
    "min_clients": 1,
    "deploy_map": {
        "app_server": [
            "server"
        ],
        "app_HCBHSGSG_Bank_9": [
            "HCBHSGSG_Bank_9"
        ],
        "app_XITXUS33_Bank_10": [
            "XITXUS33_Bank_10"
        ],
        "app_YSYCESMM_Bank_7": [
            "YSYCESMM_Bank_7"
        ],
        "app_YXRXGB22_Bank_3": [
            "YXRXGB22_Bank_3"
        ],
        "app_ZNZZAU3M_Bank_8": [
            "ZNZZAU3M_Bank_8"
        ]
    },
    "job_folder_name": "xgb_job_embed",
    "byoc": true,
    "job_clients": [
        {
            "name": "HCBHSGSG_Bank_9"
        },
        {
            "name": "XITXUS33_Bank_10"
        },
        {
            "name": "YSYCESMM_Bank_7"
        },
        {
            "name": "YXRXGB22_Bank_3"
        },
        {
            "name": "ZNZZAU3M_Bank_8"
        }
    ]
}
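
In `meta.json`, `deploy_map` maps each app folder to the sites it is deployed to, and `job_clients` lists the participating clients. As a small sketch, the helper below reads those fields to recover the client site names. `sites_from_meta` is a hypothetical name, and the sample is a trimmed copy of the file shown above.

```python
import json


def sites_from_meta(meta):
    """Return the sorted client site names found in deploy_map (server excluded)."""
    sites = set()
    for targets in meta.get("deploy_map", {}).values():
        sites.update(target for target in targets if target != "server")
    return sorted(sites)


# Trimmed copy of the meta.json printed above (two of the five clients).
sample_meta = json.loads(
    """
    {
        "name": "xgb_job_embed",
        "min_clients": 1,
        "deploy_map": {
            "app_server": ["server"],
            "app_HCBHSGSG_Bank_9": ["HCBHSGSG_Bank_9"],
            "app_YXRXGB22_Bank_3": ["YXRXGB22_Bank_3"]
        },
        "job_clients": [
            {"name": "HCBHSGSG_Bank_9"},
            {"name": "YXRXGB22_Bank_3"}
        ]
    }
    """
)

deployed = sites_from_meta(sample_meta)
declared = sorted(client["name"] for client in sample_meta["job_clients"])
print("deployed sites:", deployed)
print("matches job_clients:", deployed == declared)

# To check a real export, load the file instead of the sample:
# with open("/tmp/czt/jobs/workdir/server/simulate_job/meta.json") as f:
#     print(sites_from_meta(json.load(f)))
```

The client names passed to `nvflare poc prepare -c` need to match these site names for the exported job to deploy as configured.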

Now that we have the job definition, you can run it either in POC mode or in a production setup.

* Set up POC:
```
    nvflare poc prepare -c <list of clients>
    nvflare poc start -ex admin@nvidia.com
```

* Submit the job using the NVFLARE console. From a different terminal, start the console:

   ```
   nvflare poc start -p admin@nvidia.com
   ```
   Then use the `submit_job` command.

* Use the `nvflare job submit` command to submit the job.

* Use the NVFLARE API to submit the job.