# 03 - Federated Learning

## Defines

Define the available types of federated learning.

 - 'STRATIFIED': Stratified sampling of the data. The data is split into a number of shards, and each shard is assigned to a client. The data is split in a stratified manner, meaning that the distribution of the labels is approximately the same in each shard.
 - 'MISSING_1_ATTACK' - Each client is assigned a shard of data, each shard is missing one of the attack labels. Other clients in the network are exposed to the attack label, but the specific client is not. This demonstrates the ability of federated learning to protect against unknown attacks.
 - '1_ATTACK_ONLY' - Each client is assigned a shard of data, each shard contains only one of the attack labels.
 - 'HALF_BENIGN_ONLY' - Half of the clients are exposed to Benign data only, the other half are exposed to all data.


In [49]:
### THIS SECTION NEEDS TO BE SET TO DETERMINE WHICH CONFIGURATION METHOD TO UTILISE

SPLIT_AVAILABLE_METHODS = ['STRATIFIED','MISSING_1_ATTACK', '1_ATTACK_ONLY', 'HALF_BENIGN_ONLY' ]
METHOD = 'HALF_BENIGN_ONLY'
NUM_OF_STRATIFIED_CLIENTS = 10  # only applies to stratified method
NUM_OF_ROUNDS = 10              # Number of FL rounds


The above test method in conjunction with the below classification selection will determine the number of clients.

EG: 
`STRATIFIED` with:
 - `ALL TYPES` - Results in `NUM_OF_STRATIFIED_CLIENTS` clients. Each client will have a stratified sample of the data.

`MISSING_1_ATTACK` with:
 - `individual_classifier` - Results in 33 clients. Each client will have benign traffic and 32 attack labels.
 - `group_classifier` - Results in 7 clients. Each client will have benign traffic and 6 attack groups.
 - `binary_classifier` - Results in 10 clients. Five clients will have benign traffic only and the other will have Benign and malicious attack labels.

`1_ATTACK_ONLY` with:
 - `individual_classifier` - Results in 33 clients. Each client will have benign traffic and 1 attack label.
 - `group_classifier` - Results in 7 clients. Each client will have benign traffic and 1 attack groups.
 - `binary_classifier` - Results in 10 clients. Five clients will have benign traffic only and the other will have Benign and malicious attack labels. - SAME AS MISSING_1_ATTACK for binary classifier

`HALF_BENIGN_ONLY` with:
 - `individual_classifier` - Results in 10 clients. Five clients will have benign traffic only and the other will have Benign and 33 malicious attack labels.
 - `group_classifier` - Results in 10 clients. Five clients will have benign traffic only and the other will have Benign and 7 malicious attack groups.
 - `binary_classifier` - Results in 10 clients. Five clients will have benign traffic only and the other will have Benign and malicious attack labels. - SAME AS MISSING_1_ATTACK for binary classifier


In [50]:
individual_classifier = False
group_classifier = False
binary_classifier = True


Include the defines for the dataframe columns and the attack labels and their mappings

In [51]:
from enum import Enum
from includes import *

##  Imports

In [52]:
%%capture
%pip install flwr[simulation] torch torchvision matplotlib sklearn openml

In [53]:
import os
import pandas as pd
import numpy as np
import flwr as fl
from tqdm import tqdm
import warnings
#warnings.filterwarnings('ignore')

import sklearn
from sklearn.linear_model import LogisticRegression
from sklearn import preprocessing
from sklearn.utils import shuffle
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.metrics import accuracy_score

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from flwr.common import Metrics
from torch.utils.data import DataLoader, random_split


In [54]:
print("flwr", fl.__version__)
print("numpy", np.__version__)
print("torch", torch.__version__)

DEVICE = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(f"Training on {DEVICE}")

flwr 1.4.0
numpy 1.23.5
torch 2.0.1+cpu
Training on cpu


## Load the Dataset

In [55]:
DATASET_DIRECTORY = '../datasets/CICIoT2023/'

## Training data

Either read the training pickle file if it exists, or process the dataset from scratch.

In [56]:
# Check to see if the file 'training_data.pkl' exists in the directory. If it does, load it. If not, print an error.
if os.path.isfile('training_data.pkl'):
    print("File exists, loading data...")
    train_df = pd.read_pickle('training_data.pkl')
    print("Training data loaded from pickle file.")

else:
    df_sets = [k for k in os.listdir(DATASET_DIRECTORY) if k.endswith('.csv')]
    df_sets.sort()
    training_sets = df_sets[:int(len(df_sets)*.8)]
    test_sets = df_sets[int(len(df_sets)*.8):]

    # Print the number of files in each set
    print('Training sets: {}'.format(len(training_sets)))
    print('Test sets: {}'.format(len(test_sets)))

    # ######################
    # # HACK TEMP CODE
    # ######################
    # # Set training_sets to the last entry of training_sets
    # training_sets = training_sets[-33:]
    # print(f"HACK TO REPLICATE ORIGINAL AUTHORS CODE WITH ONE FILE TRAIN - {training_sets}")
    # #####################
    # # HACK END TEMP CODE
    # ######################

    # Concatenate all training sets into one dataframe
    dfs = []
    print("Reading training data...")
    for train_set in tqdm(training_sets):
        df_new = pd.read_csv(DATASET_DIRECTORY + train_set)
        dfs.append(df_new)
    train_df = pd.concat(dfs, ignore_index=True)

    # Complete training data set size
    print("Complete training data size: {}".format(train_df.shape))

    # Map y column to the dict_34_classes values - The pickle file already has this done.
    train_df['label'] = train_df['label'].map(dict_34_classes)

    # The training data is the 80% of the CSV files in the dataset. The test data is the remaining 20%.
    # The Ray Federated learning mechanism cannot cope with all of the 80% training data, so we will split
    # the training data using test_train_split. The test data will be ignored as we will use all the data 
    # from the train_sets files as our training data to keep parity with the original authors code.
    # 
    # By using a subset of the training data split this way, we can have a randomised selection of data
    # from all the training CSV files, stratified by the attack types.
    
    # Percentage of original training data to use.
    TRAIN_SIZE = 0.0125
    
    print(f"Splitting the data into {TRAIN_SIZE*100}%")
    
    X_train, X_test, y_train, y_test = train_test_split(train_df[X_columns], train_df[y_column], test_size= (1 - TRAIN_SIZE), random_state=42, stratify=train_df[y_column])

    # Recombine X_train, and y_train into a dataframe
    train_df = pd.concat([X_train, y_train], axis=1)

    # Clean up unused variables

    del X_train, y_train, X_test, y_test
    
    # Save the output to a pickle file
    print("Writing training data to pickle file...")
    train_df.to_pickle('training_data.pkl')

print("Training data size: {}".format(train_df.shape))


File exists, loading data...
Training data loaded from pickle file.
Training data size: (454330, 47)


In [57]:
# show the unique values counts in the label column for train_df
print("Counts of attacks in train_df:")
print(train_df['label'].value_counts())

Counts of attacks in train_df:
6     70072
4     52698
5     43768
2     39835
3     39480
1     39364
7     35032
13    32300
15    25994
14    19737
0     10686
17     9647
19     8665
18     7319
10     4397
26     2995
9      2798
8      2780
25     1739
24     1308
21      957
22      798
16      701
23      364
12      279
11      227
33      126
27       58
32       53
31       51
29       37
28       31
20       22
30       12
Name: label, dtype: int64


In [58]:
train_df

Unnamed: 0,flow_duration,Header_Length,Protocol Type,Duration,Rate,Srate,Drate,fin_flag_number,syn_flag_number,rst_flag_number,...,Std,Tot size,IAT,Number,Magnitue,Radius,Covariance,Variance,Weight,label
29467977,0.000000,172.92,6.00,65.69,4.552125,4.552125,0.0,0.0,0.0,0.0,...,6.433546,172.92,8.306406e+07,9.5,18.498183,9.106583,233.739318,0.19,141.55,5
8309328,0.000000,0.00,1.00,64.00,0.400035,0.400035,0.0,0.0,0.0,0.0,...,0.000000,42.00,8.314956e+07,9.5,9.165151,0.000000,0.000000,0.00,141.55,6
12088899,5.799076,108.00,6.00,64.00,0.344883,0.344883,0.0,0.0,0.0,0.0,...,0.000000,54.00,8.294671e+07,9.5,9.165151,0.000000,0.000000,0.00,141.55,15
2403491,0.146236,522.40,9.80,81.00,32.454440,32.454440,0.0,0.0,0.0,0.0,...,129.331845,163.50,3.758001e-03,5.5,19.087867,182.902849,21525.530784,0.90,38.50,25
33786435,0.000000,54.00,6.00,64.00,13.801025,13.801025,0.0,0.0,1.0,0.0,...,0.000000,54.00,8.308929e+07,9.5,10.392305,0.000000,0.000000,0.00,141.55,7
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
32944878,0.000000,0.00,1.00,64.00,108.962772,108.962772,0.0,0.0,0.0,0.0,...,0.000000,42.00,8.315005e+07,9.5,9.165151,0.000000,0.000000,0.00,141.55,6
23809291,0.000000,54.00,6.00,64.00,17.262038,17.262038,0.0,0.0,0.0,0.0,...,0.000000,54.00,8.307206e+07,9.5,10.425904,1.411650,6.126863,0.17,141.55,5
12491659,1.159555,79.38,5.94,63.36,0.620923,0.620923,0.0,0.0,1.0,0.0,...,0.273154,54.06,8.336239e+07,9.5,10.401036,0.386753,0.483918,0.16,141.55,7
34788276,0.000000,0.00,141.55,13,,,,,,,,,,,,,,,,,


---
## Test Data
Concat the test data into a single dataframe

In [59]:
# Check to see if the file 'test_data.pkl' exists in the directory. If it does, load it. If not, print an error.
testing_data_pickle_file = 'testing_data.pkl'

if os.path.isfile(testing_data_pickle_file):
    print(f"File {testing_data_pickle_file} exists, loading data...")
    test_df = pd.read_pickle(testing_data_pickle_file)
    print("Test data loaded from pickle file.")

else:
    print(f"File {testing_data_pickle_file} does not exist, constructing data...")

    df_sets = [k for k in os.listdir(DATASET_DIRECTORY) if k.endswith('.csv')]
    df_sets.sort()
    training_sets = df_sets[:int(len(df_sets)*.8)]
    test_sets = df_sets[int(len(df_sets)*.8):]

    ############################################
    ############################################
    # HACK - Make things quicker for now
    ############################################
    ############################################

    # test_sets = df_sets[int(len(df_sets)*.95):]
    
    # # Set training_sets to the last entry of training_sets
    # test_sets = test_sets[-2:]
    
    ############################################
    ############################################
    # END HACK 
    ############################################
    ############################################

    # Print the number of files in each set
    print('Test sets: {}'.format(len(test_sets)))
    
    # Concatenate all testing sets into one dataframe
    dfs = []
    print("Reading test data...")
    for test_set in tqdm(test_sets):
        df_new = pd.read_csv(DATASET_DIRECTORY + test_set)
        dfs.append(df_new)
    test_df = pd.concat(dfs, ignore_index=True)

    # Map y column to the dict_34_classes values - The pickle file already has this done.
    test_df['label'] = test_df['label'].map(dict_34_classes)

    # Save the output to a pickle file
    print(f"Writing test data to pickle file {testing_data_pickle_file}...")
    test_df.to_pickle(testing_data_pickle_file)

print("Testing data size: {}".format(test_df.shape))

File testing_data.pkl exists, loading data...
Test data loaded from pickle file.
Testing data size: (10340161, 47)


In [60]:
print("Number of rows in train_df: {}".format(len(train_df)))
print("Number of rows in test_df: {}".format(len(test_df)))

train_size = len(train_df)
test_size = len(test_df)

Number of rows in train_df: 454330
Number of rows in test_df: 10340161


---
# Scale the test and train data

### Scale the training data input features

In [61]:
scaler = StandardScaler()
train_df[X_columns] = scaler.fit_transform(train_df[X_columns])

### Scale the testing data input features

In [62]:
test_df[X_columns] = scaler.fit_transform(test_df[X_columns])

---
# Define the classification problem - (2 classes, 8 classes or 34 classes)
Change the following cell to select the classification type

If the METHOD == STRATIFIED, then we can use any classifier
If the METHOD == ATTACK_GROUP then we must use Group Classifier.

In [63]:

class_size_map = {2: "Binary", 8: "Group", 34: "Individual"}

if group_classifier:
    print("Group 8 Class Classifier... - Adjusting labels in test and train dataframes")
    # Map y column to the dict_7_classes values
    test_df['label'] = test_df['label'].map(dict_8_classes)
    train_df['label'] = train_df['label'].map(dict_8_classes)
    class_size = "8"      
    
elif binary_classifier:
    print("Binary 2 Class Classifier... - Adjusting labels in test and train dataframes")
    # Map y column to the dict_2_classes values
    test_df['label'] = test_df['label'].map(dict_2_classes)
    train_df['label'] = train_df['label'].map(dict_2_classes)
    class_size = "2"

else:
    print ("Individual 34 Class classifier... - No adjustments to labels in test and train dataframes")
    class_size = "34"


Binary 2 Class Classifier... - Adjusting labels in test and train dataframes


---
# Split the Training Data into partitions for the Federated Learning clients depending on the test required
As a reminder:

`STRATIFIED` with:
 - `ALL TYPES` - Results in `NUM_OF_STRATIFIED_CLIENTS` clients. Each client will have a stratified sample of the data.

`MISSING_1_ATTACK` with:
 - `individual_classifier` - Results in 33 clients. Each client will have benign traffic and 32 attack labels.
 - `group_classifier` - Results in 7 clients. Each client will have benign traffic and 6 attack groups.
 - `binary_classifier` - Results in 10 clients. Five clients will have benign traffic only and the other will have Benign and malicious attack labels.

`1_ATTACK_ONLY` with:
 - `individual_classifier` - Results in 33 clients. Each client will have benign traffic and 1 attack label.
 - `group_classifier` - Results in 7 clients. Each client will have benign traffic and 1 attack groups.
 - `binary_classifier` - Results in 10 clients. Five clients will have benign traffic only and the other will have Benign and malicious attack labels. - SAME AS MISSING_1_ATTACK for binary classifier

`HALF_BENIGN_ONLY` with:
 - `individual_classifier` - Results in 10 clients. Five clients will have benign traffic only and the other will have Benign and 33 malicious attack labels.
 - `group_classifier` - Results in 10 clients. Five clients will have benign traffic only and the other will have Benign and 7 malicious attack groups.
 - `binary_classifier` - Results in 10 clients. Five clients will have benign traffic only and the other will have Benign and malicious attack labels. - SAME AS MISSING_1_ATTACK for binary classifier


In [64]:
from sklearn.model_selection import StratifiedKFold

# Define fl_X_train and fl_y_train
fl_X_train = []
fl_y_train = []

client_df = pd.DataFrame()

if METHOD == 'STRATIFIED':
    print(f"{Colours.YELLOW.value}STRATIFIED METHOD{Colours.NORMAL.value} with {class_size} class classifier")
    # We are going to split the training data into 'NUM_OF_STRATIFIED_CLIENTS' smaller groups using StratifiedKFold
    skf = StratifiedKFold(n_splits=NUM_OF_STRATIFIED_CLIENTS, shuffle=True, random_state=42)
    for train_index, test_index in skf.split(train_df[X_columns], train_df[y_column]):
        fl_X_train.append(train_df[X_columns].iloc[test_index])
        fl_y_train.append(train_df[y_column].iloc[test_index])

elif METHOD == 'MISSING_1_ATTACK':
    print(f"{Colours.YELLOW.value}MISSING_1_ATTACK METHOD{Colours.NORMAL.value} with {class_size} class classifier")

    if individual_classifier or group_classifier:
        # Set the number of splits required to the number of classes - 1
        num_splits = int(class_size) - 1
    else:
        # For binary classifier, set the number of splits to 10
        num_splits = 10

    skf = StratifiedKFold(n_splits=num_splits, shuffle=True, random_state=42)

    # When creating the clients, we will remove one attack class from the training data
    # For the binary classifier, evey other client will have the benign class removed
    for i, (train_index, test_index) in enumerate(skf.split(train_df[X_columns], train_df[y_column])):
        if binary_classifier:
            print(f"i: {i} = i % 2 = {i % 2}")
            if i % 2 == 0:
                print("Benign only")
                # Create a new dataframe for the client data with only benign traffic
                client_df = pd.concat([train_df.iloc[test_index][train_df[y_column] != 1]], ignore_index=True)
                fl_X_train.append(client_df[X_columns])
                fl_y_train.append(client_df[y_column])
            else:
                print("Both")
                # Create a new dataframe for the client data
                fl_X_train.append(train_df[X_columns].iloc[test_index])
                fl_y_train.append(train_df[y_column].iloc[test_index])
        else:
            # Create a new dataframe for the client data
            client_df = pd.concat([train_df.iloc[test_index][train_df[y_column] != i+1]], ignore_index=True)
            fl_X_train.append(client_df[X_columns])
            fl_y_train.append(client_df[y_column])

elif METHOD == '1_ATTACK_ONLY':
    print(f"{Colours.YELLOW.value}1_ATTACK_ONLY METHOD{Colours.NORMAL.value} with {class_size} class classifier")
    # Each client only has one attack class in their training data along with the Benign data
    
    if individual_classifier or group_classifier:
        # Set the number of splits required to the number of classes - 1
        num_splits = int(class_size) - 1
    else:
        # For binary classifier, set the number of splits to 10
        num_splits = 10

    skf = StratifiedKFold(n_splits=num_splits, shuffle=True, random_state=42)

    # When creating the clients, we will only add the benign data and the attack class for that client
    for i, (train_index, test_index) in enumerate(skf.split(train_df[X_columns], train_df[y_column])):
        if binary_classifier:
            print(f"i: {i} = i % 2 = {i % 2}")
            if i % 2 == 0:
                print("Benign only")
                # Create a new dataframe for the client data with only benign traffic
                client_df = pd.concat([train_df.iloc[test_index][train_df[y_column] != 1]], ignore_index=True)
                fl_X_train.append(client_df[X_columns])
                fl_y_train.append(client_df[y_column])
            else:
                print("Both")
                # Create a new dataframe for the client data
                fl_X_train.append(train_df[X_columns].iloc[test_index])
                fl_y_train.append(train_df[y_column].iloc[test_index])
        else:
            # Create a new dataframe for the client data
            client_df = pd.concat([train_df.iloc[test_index][(train_df[y_column] == 0) | (train_df[y_column] == i+1)]], ignore_index=True)
            fl_X_train.append(client_df[X_columns])
            fl_y_train.append(client_df[y_column])

elif METHOD == 'HALF_BENIGN_ONLY':
    print(f"{Colours.YELLOW.value}HALF_BENIGN_ONLY METHOD{Colours.NORMAL.value} with {class_size} class classifier")

    num_splits = 10

    # Split into 10 client data
    skf = StratifiedKFold(n_splits=NUM_OF_STRATIFIED_CLIENTS, shuffle=True, random_state=42)

    # For i % 2 == 0, add only benign data
    # For i % 2 == 1, add all data
    for i, (train_index, test_index) in enumerate(skf.split(train_df[X_columns], train_df[y_column])):
        if i % 2 == 0:
            print("Benign only")
            # Create a new dataframe for the client data with only benign traffic
            client_df = pd.concat([train_df.iloc[test_index][train_df[y_column] == 0]], ignore_index=True)
            fl_X_train.append(client_df[X_columns])
            fl_y_train.append(client_df[y_column])
        else:
            print("All Classes")
            fl_X_train.append(train_df[X_columns].iloc[test_index])
            fl_y_train.append(train_df[y_column].iloc[test_index])
else:
    print(f"{Colours.RED.value}ERROR: Method {METHOD} not recognised{Colours.NORMAL.value}")
    exit()



[33mSTRATIFIED METHOD[0m with 2 class classifier


In [65]:
NUM_OF_CLIENTS = len(fl_X_train)

for i in range(len(fl_X_train)):
    # Show the unique values in the y column
    (f"Client ID: {i}")
    print(f"fl_X_train[{i}].shape: {fl_X_train[i].shape}")  
    print(f"fl_y_train[{i}].value_counts():\n{fl_y_train[i].value_counts()}")
    print(f"fl_y_train[{i}].unique(): {fl_y_train[i].unique()}\n")

# Check that fl_X_train[0] and fl_X_train[1] contain different data
print(f"fl_X_train[0].equals(fl_X_train[1]): {fl_X_train[0].equals(fl_X_train[1])}")

fl_X_train[0].shape: (45433, 46)
fl_y_train[0].value_counts():
1    44365
0     1068
Name: label, dtype: int64
fl_y_train[0].unique(): [1 0]

fl_X_train[1].shape: (45433, 46)
fl_y_train[1].value_counts():
1    44365
0     1068
Name: label, dtype: int64
fl_y_train[1].unique(): [1 0]

fl_X_train[2].shape: (45433, 46)
fl_y_train[2].value_counts():
1    44365
0     1068
Name: label, dtype: int64
fl_y_train[2].unique(): [1 0]

fl_X_train[3].shape: (45433, 46)
fl_y_train[3].value_counts():
1    44365
0     1068
Name: label, dtype: int64
fl_y_train[3].unique(): [1 0]

fl_X_train[4].shape: (45433, 46)
fl_y_train[4].value_counts():
1    44364
0     1069
Name: label, dtype: int64
fl_y_train[4].unique(): [1 0]

fl_X_train[5].shape: (45433, 46)
fl_y_train[5].value_counts():
1    44364
0     1069
Name: label, dtype: int64
fl_y_train[5].unique(): [1 0]

fl_X_train[6].shape: (45433, 46)
fl_y_train[6].value_counts():
1    44364
0     1069
Name: label, dtype: int64
fl_y_train[6].unique(): [1 0]

fl_X_t

Prepare an output directory where we can store the results of the federated learning

In [66]:
# Create an "Output" directory if it doesnt exist already
if not os.path.exists("Output"):
    os.makedirs("Output")

sub_dir_name = f"train_size-{train_size}_test_size-{test_size}"

# if sub_dir_name does not exist, create it
if not os.path.exists(f"Output/{sub_dir_name}"):
    os.makedirs(f"Output/{sub_dir_name}")

test_directory_name = f"{METHOD}_Classifier-{class_size}_Clients-{NUM_OF_CLIENTS}"

# Create an "Output/{METHOD}-{NUM_OF_CLIENTS}-{NUM_OF_ROUNDS}" directory if it doesnt exist already
if not os.path.exists(f"Output/{sub_dir_name}/{test_directory_name}"):
    os.makedirs(f"Output/{sub_dir_name}/{test_directory_name}")

# Ensure the directory is empty
for file in os.listdir(f"Output/{sub_dir_name}/{test_directory_name}"):
    file_path = os.path.join(f"Output/{sub_dir_name}/{test_directory_name}", file)
    if os.path.isfile(file_path):
        os.unlink(file_path)

# Original training size is the sum of all the fl_X_train sizes
original_training_size = 0
for i in range(len(fl_X_train)):
    original_training_size += fl_X_train[i].shape[0]

# Write this same info to the output directory/Class Split Info.txt
with open(f"Output/{sub_dir_name}/{test_directory_name}/Class Split Info.txt", "w") as f:
    for i in range(len(fl_X_train)):
        f.write(f"Client ID: {i}\n")
        f.write(f"fl_X_train.shape: {fl_X_train[i].shape}\n")
        f.write(f"Training data used {original_training_size}")
        f.write(f"fl_y_train.value_counts():\n{fl_y_train[i].value_counts()}\n")
        f.write(f"fl_y_train.unique(): {fl_y_train[i].unique()}\n\n")

### Convert the training dataset

In [67]:
# Convert the testing daya to X_test and y_test ndarrays
X_test = test_df[X_columns].to_numpy()
y_test = test_df[y_column].to_numpy()

In [68]:
num_unique_classes = len(train_df[y_column].unique())

train_df_shape = train_df.shape
test_df_shape = test_df.shape

# We are now done with the train_df and test_df dataframes, so we can delete them to free up memory
del train_df
del test_df
del client_df

---
### Data check

In [69]:
print("NUM_CLIENTS:", NUM_OF_CLIENTS)

print("NUM_ROUNDS:", NUM_OF_ROUNDS)
print()


print("Original training size: {}".format(original_training_size))


print("Checking training data split groups")
for i in range(len(fl_X_train)):
    print(i, ":", "X Shape", fl_X_train[i].shape, "Y Shape", fl_y_train[i].shape)


# Print the sizes of X_test and y_test
print("\nChecking testing data")
print("X_test size: {}".format(X_test.shape))
print("y_test size: {}".format(y_test.shape))

print("\nDeploy Simulation")

NUM_CLIENTS: 10
NUM_ROUNDS: 10

Original training size: 454330
Checking training data split groups
0 : X Shape (45433, 46) Y Shape (45433,)
1 : X Shape (45433, 46) Y Shape (45433,)
2 : X Shape (45433, 46) Y Shape (45433,)
3 : X Shape (45433, 46) Y Shape (45433,)
4 : X Shape (45433, 46) Y Shape (45433,)
5 : X Shape (45433, 46) Y Shape (45433,)
6 : X Shape (45433, 46) Y Shape (45433,)
7 : X Shape (45433, 46) Y Shape (45433,)
8 : X Shape (45433, 46) Y Shape (45433,)
9 : X Shape (45433, 46) Y Shape (45433,)

Checking testing data
X_test size: (10340161, 46)
y_test size: (10340161,)

Deploy Simulation


----
# Federated Learning
## Import the libraries and print the versions

In [70]:
import os
import flwr as fl
import numpy as np
import tensorflow as tf

# Make TensorFlow log less verbose
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Dropout



Define the Client and Server code

In [71]:
import os
import flwr as fl
import numpy as np
import tensorflow as tf

print('scikit-learn {}.'.format(sklearn.__version__))
print("flwr", fl.__version__)
print("numpy", np.__version__)
print("tf", tf.__version__)
# Make TensorFlow log less verbose
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Activation
from tensorflow.keras.layers import BatchNormalization
from tensorflow.keras.layers import Dropout

import datetime

client_evaluations = []

class NumpyFlowerClient(fl.client.NumPyClient):
    def __init__(self, cid, model, train_data, train_labels):
        self.model = model
        self.cid = cid
        self.train_data = train_data
        self.train_labels = train_labels

    def get_parameters(self, config):
        return self.model.get_weights()

    def fit(self, parameters, config):
        self.model.set_weights(parameters)
        print ("Client ", self.cid, "Training...")
        self.model.fit(self.train_data, self.train_labels, epochs=5, batch_size=32)
        print ("Client ", self.cid, "Training complete...")
        return self.model.get_weights(), len(self.train_data), {}

    def evaluate(self, parameters, config):
        self.model.set_weights(parameters)
        print ("Client ", self.cid, "Evaluating...")
        loss, accuracy = self.model.evaluate(self.train_data, self.train_labels, batch_size=32)
        print(f"{Colours.YELLOW.value}Client {self.cid} evaluation complete - Accuracy: {accuracy:.6f}, Loss: {loss:.6f}{Colours.NORMAL.value}")

        # Write the same message to the "Output/{cid}_Evaluation.txt" file
        with open(f"Output/{sub_dir_name}/{test_directory_name}/{self.cid}_Evaluation.txt", "a") as f:
            f.write(f"{datetime.datetime.now()} - Client {self.cid} evaluation complete - Accuracy: {accuracy:.6f}, Loss: {loss:.6f}\n")

            # Close the file
            f.close()

        return loss, len(self.train_data), {"accuracy": accuracy}
    
    def predict(self, incoming):
        prediction = np.argmax( self.model.predict(incoming) ,axis=1)
        return prediction

def client_fn(cid: str) -> NumpyFlowerClient:
    """Create a Flower client representing a single organization."""

    # Load model
    #model = tf.keras.applications.MobileNetV2((32, 32, 3), classes=10, weights=None)
    #model.compile("adam", "sparse_categorical_crossentropy", metrics=["accuracy"])

    print ("Client ID:", cid)

    model = Sequential([
      #Flatten(input_shape=(79,1)),
      Flatten(input_shape=(fl_X_train[0].shape[1] , 1)),
      Dense(50, activation='relu'),  
      Dense(25, activation='relu'),  
      Dense(num_unique_classes, activation='softmax')
    ])
    
    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

   
    partition_id = int(cid)
    X_train_c = fl_X_train[partition_id]
    y_train_c = fl_y_train[partition_id]

    # Create a  single Flower client representing a single organization
    return NumpyFlowerClient(cid, model, X_train_c, y_train_c)


from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
eval_count = 0

def get_evaluate_fn(server_model):
    global eval_count
    """Return an evaluation function for server-side evaluation."""
    # The `evaluate` function will be called after every round
    
    
    def evaluate(server_round, parameters, config):
        global eval_count
        
        # Update model with the latest parameters
        server_model.set_weights(parameters)
        print (f"Server Evaluating... Evaluation Count:{eval_count}")
        loss, accuracy = server_model.evaluate(X_test, y_test)
        
        y_pred = server_model.predict(X_test)
        print ("Prediction: ", y_pred, y_pred.shape)
        #cmatrix = confusion_matrix(y_test, np.rint(y_pred))
        #print ("confusion_matrix:", cmatrix, cmatrix.shape)
                        
        print(f"{Colours.YELLOW.value}Server evaluation complete - Accuracy: {accuracy:.4f}, Loss: {loss:.4f}{Colours.NORMAL.value}")

        # Write the same message to the "Output/Server_Evaluation.txt" file
        with open(f"Output/{sub_dir_name}/{test_directory_name}/Server_Evaluation.txt", "a") as f:
            f.write(f"{datetime.datetime.now()} - {server_round} : Server evaluation complete - Accuracy: {accuracy:.4f}, Loss: {loss:.4f}\n")

            # Close the file
            f.close()
        
        np.save("y_pred-" + str(eval_count) + ".npy", y_pred)
        #np.save("cmatrix-" + str(eval_count) + ".npy", cmatrix)
        eval_count = eval_count + 1
        
        return loss, {"accuracy": accuracy}
    return evaluate



server_model = Sequential([
    #Flatten(input_shape=(79,1)),
    Flatten(input_shape=(fl_X_train[0].shape[1] , 1)),
    Dense(50, activation='relu'),  
    Dense(25, activation='relu'),  
    Dense(num_unique_classes, activation='softmax')
])


server_model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Create FedAvg strategy
strategy = fl.server.strategy.FedAvg(
        fraction_fit=1.0,
        fraction_evaluate=0.5,
        min_fit_clients=2, #10,
        min_evaluate_clients=2, #5,
        min_available_clients=2, #10,
        evaluate_fn=get_evaluate_fn(server_model),
        #evaluate_metrics_aggregation_fn=weighted_average,
)

scikit-learn 1.2.2.
flwr 1.4.0
numpy 1.23.5
tf 2.12.0


In [72]:
%%time
print (f"{Colours.YELLOW.value}\nDeploy simulation... Method = {METHOD} - {class_size_map[num_unique_classes]} ({class_size}) Classifier")
print (f"Number of Clients = {NUM_OF_CLIENTS}\n")
print (f"Writing output to: {sub_dir_name}/{test_directory_name}\n{Colours.NORMAL.value}")

# Output the same information to the Output/Run_details.txt file
with open(f"Output/{sub_dir_name}/{test_directory_name}/Run_details.txt", "a") as f:
    f.write(f"{datetime.datetime.now()} - Deploy simulation... Method = {METHOD} - {class_size_map[num_unique_classes]} ({class_size}) Classifier\n")
    f.write(f"{datetime.datetime.now()} - Number of Clients = {NUM_OF_CLIENTS}\n")

    # Write Original train_df size
    f.write(f"{datetime.datetime.now()} - Original train_df size: {train_df_shape}\n")

    # Write the training data split groups
    for i in range(len(fl_X_train)):
        f.write(f"{datetime.datetime.now()} - {i}: X Shape {fl_X_train[i].shape}, Y Shape {fl_y_train[i].shape}\n")

    # Write the testing data
    f.write(f"{datetime.datetime.now()} - X_test size: {X_test.shape}\n")
    f.write(f"{datetime.datetime.now()} - y_test size: {y_test.shape}\n")
    
# close the file
f.close()

start_time = datetime.datetime.now()

# Start simulation
fl.simulation.start_simulation(
    client_fn=client_fn,
    num_clients=NUM_OF_CLIENTS,
    config=fl.server.ServerConfig(num_rounds=NUM_OF_ROUNDS),
    strategy=strategy,
)

end_time = datetime.datetime.now()
print("Total time taken: ", end_time - start_time)

print (f"{Colours.YELLOW.value} SIMULATION COMPLETE. Method = {METHOD} - {class_size_map[num_unique_classes]} ({class_size}) Classifier")
print (f"Number of Clients = {NUM_OF_CLIENTS}{Colours.NORMAL.value}\n")

# Output the same information to the Output/Run_details.txt file
with open(f"Output/{sub_dir_name}/{test_directory_name}/Run_details.txt", "a") as f:
    f.write(f"{datetime.datetime.now()} - SIMULATION COMPLETE. Method = {METHOD} - {class_size_map[num_unique_classes]} ({class_size}) Classifier\n")
    f.write(f"{datetime.datetime.now()} - Total time taken: {end_time - start_time}\n")

INFO flwr 2023-07-13 07:35:21,920 | app.py:146 | Starting Flower simulation, config: ServerConfig(num_rounds=10, round_timeout=None)


[33m
Deploy simulation... Method = STRATIFIED - Binary (2) Classifier
Number of Clients = 10

Writing output to: train_size-454330_test_size-10340161/STRATIFIED_Classifier-2_Clients-10
[0m


[2m[36m(launch_and_evaluate pid=13520)[0m Client ID: 9[32m [repeated 4x across cluster][0m
[2m[36m(launch_and_evaluate pid=13520)[0m [32m [repeated 118x across cluster][0m
[2m[36m(launch_and_evaluate pid=13520)[0m Client  9 Evaluating...[32m [repeated 4x across cluster][0m
[2m[36m(launch_and_evaluate pid=13520)[0m   28/1420 [..............................] - ETA: 2s - loss: 0.0303 - accuracy: 0.9877  [32m [repeated 8x across cluster][0m
[2m[36m(launch_and_evaluate pid=13520)[0m   86/1420 [>.............................] - ETA: 2s - loss: 0.0343 - accuracy: 0.9866[32m [repeated 9x across cluster][0m
[2m[36m(launch_and_evaluate pid=13520)[0m  122/1420 [=>............................] - ETA: 2s - loss: 0.0410 - accuracy: 0.9841[32m [repeated 4x across cluster][0m
[2m[36m(launch_and_evaluate pid=13520)[0m  154/1420 [==>...........................] - ETA: 2s - loss: 0.0374 - accuracy: 0.9856[32m [repeated 6x across cluster][0m
[2m[36m(launch_and_evaluate

2023-07-13 07:35:32,434	INFO worker.py:1636 -- Started a local Ray instance.
INFO flwr 2023-07-13 07:35:36,467 | app.py:180 | Flower VCE: Ray initialized with resources: {'CPU': 12.0, 'node:127.0.0.1': 1.0, 'memory': 11621872436.0, 'object_store_memory': 5810936217.0, 'GPU': 1.0}
INFO flwr 2023-07-13 07:35:36,468 | server.py:86 | Initializing global parameters
INFO flwr 2023-07-13 07:35:36,469 | server.py:273 | Requesting initial parameters from one random client
INFO flwr 2023-07-13 07:35:43,922 | server.py:277 | Received initial parameters from one random client
INFO flwr 2023-07-13 07:35:43,923 | server.py:88 | Evaluating initial parameters


[2m[36m(launch_and_get_parameters pid=14840)[0m Client ID: 8
Server Evaluating... Evaluation Count:0


INFO flwr 2023-07-13 07:46:16,168 | server.py:91 | initial parameters (loss, other metrics): 0.7927687168121338, {'accuracy': 0.46213555335998535}
INFO flwr 2023-07-13 07:46:16,169 | server.py:101 | FL starting
DEBUG flwr 2023-07-13 07:46:16,169 | server.py:218 | fit_round 1: strategy sampled 10 clients (out of 10)


Prediction:  [[0.43842757 0.56157243]
 [0.5899156  0.4100844 ]
 [0.4943871  0.50561285]
 ...
 [0.46340495 0.536595  ]
 [0.6299001  0.37009987]
 [0.49989554 0.5001045 ]] (10340161, 2)
[33mServer evaluation complete - Accuracy: 0.4621, Loss: 0.7928[0m
[2m[36m(launch_and_fit pid=14840)[0m Client ID: 8
[2m[36m(launch_and_fit pid=14840)[0m Client  8 Training...
[2m[36m(launch_and_fit pid=14840)[0m Epoch 1/5
[2m[36m(launch_and_fit pid=14840)[0m 
[2m[36m(launch_and_fit pid=14840)[0m    1/1420 [..............................] - ETA: 29:26 - loss: 0.7447 - accuracy: 0.5000
[2m[36m(launch_and_fit pid=14840)[0m   27/1420 [..............................] - ETA: 2s - loss: 0.4544 - accuracy: 0.8681   
[2m[36m(launch_and_fit pid=14840)[0m   48/1420 [>.............................] - ETA: 2s - loss: 0.3390 - accuracy: 0.9134
[2m[36m(launch_and_fit pid=14840)[0m 
[2m[36m(launch_and_fit pid=14840)[0m   70/1420 [>.............................] - ETA: 2s - loss: 0.2616 - accu

DEBUG flwr 2023-07-13 07:46:49,687 | server.py:232 | fit_round 1 received 10 results and 0 failures






Server Evaluating... Evaluation Count:1


INFO flwr 2023-07-13 07:57:11,987 | server.py:119 | fit progress: (1, 0.019425245001912117, {'accuracy': 0.9912475347518921}, 655.8169033000013)
DEBUG flwr 2023-07-13 07:57:11,989 | server.py:168 | evaluate_round 1: strategy sampled 5 clients (out of 10)


Prediction:  [[7.7612358e-06 9.9999225e-01]
 [1.2866749e-06 9.9999869e-01]
 [4.4505470e-02 9.5549458e-01]
 ...
 [5.5937830e-06 9.9999440e-01]
 [1.7545206e-06 9.9999821e-01]
 [7.4688510e-06 9.9999249e-01]] (10340161, 2)
[33mServer evaluation complete - Accuracy: 0.9912, Loss: 0.0194[0m
[2m[36m(launch_and_evaluate pid=17840)[0m Client ID: 3
[2m[36m(launch_and_fit pid=31708)[0m Client ID: 3
[2m[36m(launch_and_fit pid=27464)[0m [32m [repeated 60x across cluster][0m
[2m[36m(launch_and_fit pid=17840)[0m Client  0 Training complete...[32m [repeated 8x across cluster][0m
[2m[36m(launch_and_evaluate pid=17840)[0m Client  3 Evaluating...
[2m[36m(launch_and_evaluate pid=17840)[0m    1/1420 [..............................] - ETA: 7:19 - loss: 1.5284e-05 - accuracy: 1.0000
[2m[36m(launch_and_evaluate pid=17840)[0m   37/1420 [..............................] - ETA: 1s - loss: 0.0220 - accuracy: 0.9899      
[2m[36m(launch_and_evaluate pid=17840)[0m   69/1420 [>..........

DEBUG flwr 2023-07-13 07:57:14,557 | server.py:182 | evaluate_round 1 received 5 results and 0 failures
DEBUG flwr 2023-07-13 07:57:14,558 | server.py:218 | fit_round 2: strategy sampled 10 clients (out of 10)


[2m[36m(launch_and_evaluate pid=17840)[0m [33mClient 3 evaluation complete - Accuracy: 0.992164, Loss: 0.018096[0m
[2m[36m(launch_and_fit pid=15660)[0m Client  3 Training...
[2m[36m(launch_and_fit pid=12088)[0m Epoch 1/5
[2m[36m(launch_and_fit pid=23004)[0m Client ID: 4[32m [repeated 14x across cluster][0m
[2m[36m(launch_and_fit pid=31708)[0m [32m [repeated 205x across cluster][0m
[2m[36m(launch_and_evaluate pid=27464)[0m Client  0 Evaluating...[32m [repeated 4x across cluster][0m
[2m[36m(launch_and_fit pid=17300)[0m   38/1420 [..............................] - ETA: 3s - loss: 0.0172 - accuracy: 0.9910[32m [repeated 37x across cluster][0m
[2m[36m(launch_and_fit pid=16512)[0m   90/1420 [>.............................] - ETA: 3s - loss: 0.0179 - accuracy: 0.9906[32m [repeated 30x across cluster][0m
[2m[36m(launch_and_fit pid=17300)[0m  135/1420 [=>............................] - ETA: 3s - loss: 0.0204 - accuracy: 0.9894[32m [repeated 34x across clu

DEBUG flwr 2023-07-13 07:57:39,272 | server.py:232 | fit_round 2 received 10 results and 0 failures


Server Evaluating... Evaluation Count:2


INFO flwr 2023-07-13 08:08:56,973 | server.py:119 | fit progress: (2, 0.01820513978600502, {'accuracy': 0.9919068217277527}, 1360.8032244999922)
DEBUG flwr 2023-07-13 08:08:56,974 | server.py:168 | evaluate_round 2: strategy sampled 5 clients (out of 10)


Prediction:  [[2.4059941e-07 9.9999976e-01]
 [5.7170109e-08 1.0000000e+00]
 [6.8106577e-02 9.3189341e-01]
 ...
 [6.1230020e-07 9.9999940e-01]
 [1.1390665e-07 9.9999988e-01]
 [3.5537059e-07 9.9999964e-01]] (10340161, 2)
[33mServer evaluation complete - Accuracy: 0.9919, Loss: 0.0182[0m
[2m[36m(launch_and_evaluate pid=16512)[0m Client ID: 0
[2m[36m(launch_and_fit pid=17300)[0m Epoch 5/5[32m [repeated 3x across cluster][0m
[2m[36m(launch_and_fit pid=17300)[0m [32m [repeated 168x across cluster][0m
[2m[36m(launch_and_fit pid=17300)[0m Client  1 Training complete...[32m [repeated 9x across cluster][0m
[2m[36m(launch_and_evaluate pid=17300)[0m Client  5 Evaluating...
[2m[36m(launch_and_evaluate pid=16512)[0m    1/1420 [..............................] - ETA: 5:55 - loss: 2.7805e-05 - accuracy: 1.0000
[2m[36m(launch_and_evaluate pid=16512)[0m   82/1420 [>.............................] - ETA: 1s - loss: 0.0170 - accuracy: 0.9905
[2m[36m(launch_and_evaluate pid=165

DEBUG flwr 2023-07-13 08:08:59,168 | server.py:182 | evaluate_round 2 received 5 results and 0 failures
DEBUG flwr 2023-07-13 08:08:59,170 | server.py:218 | fit_round 3: strategy sampled 10 clients (out of 10)


[2m[36m(launch_and_evaluate pid=16512)[0m [33mClient 0 evaluation complete - Accuracy: 0.992010, Loss: 0.016592[0m
[2m[36m(launch_and_fit pid=27464)[0m Client  8 Training...
[2m[36m(launch_and_fit pid=12088)[0m Client ID: 4[32m [repeated 14x across cluster][0m
[2m[36m(launch_and_fit pid=12088)[0m Epoch 1/5[32m [repeated 10x across cluster][0m
[2m[36m(launch_and_fit pid=31708)[0m [32m [repeated 313x across cluster][0m
[2m[36m(launch_and_evaluate pid=6692)[0m Client  9 Evaluating...[32m [repeated 4x across cluster][0m
[2m[36m(launch_and_fit pid=17840)[0m   37/1420 [..............................] - ETA: 3s - loss: 0.0160 - accuracy: 0.9924[32m [repeated 34x across cluster][0m
[2m[36m(launch_and_fit pid=17840)[0m   77/1420 [>.............................] - ETA: 3s - loss: 0.0187 - accuracy: 0.9919[32m [repeated 27x across cluster][0m
[2m[36m(launch_and_fit pid=17840)[0m  120/1420 [=>............................] - ETA: 3s - loss: 0.0172 - accuracy

DEBUG flwr 2023-07-13 08:09:20,230 | server.py:232 | fit_round 3 received 10 results and 0 failures


Server Evaluating... Evaluation Count:3


INFO flwr 2023-07-13 08:19:39,846 | server.py:119 | fit progress: (3, 0.017775995656847954, {'accuracy': 0.992473840713501}, 2003.675927999997)
DEBUG flwr 2023-07-13 08:19:39,847 | server.py:168 | evaluate_round 3: strategy sampled 5 clients (out of 10)


Prediction:  [[2.2979812e-08 1.0000000e+00]
 [1.3413161e-08 1.0000000e+00]
 [9.1083191e-02 9.0891683e-01]
 ...
 [4.4866013e-07 9.9999952e-01]
 [2.6984651e-08 1.0000000e+00]
 [6.4388630e-08 9.9999988e-01]] (10340161, 2)
[33mServer evaluation complete - Accuracy: 0.9925, Loss: 0.0178[0m
[2m[36m(launch_and_evaluate pid=17840)[0m Client ID: 0
[2m[36m(launch_and_evaluate pid=17840)[0m Client  0 Evaluating...
[2m[36m(launch_and_fit pid=27464)[0m [32m [repeated 497x across cluster][0m
[2m[36m(launch_and_fit pid=27464)[0m Client  8 Training complete...[32m [repeated 9x across cluster][0m
[2m[36m(launch_and_evaluate pid=17840)[0m    1/1420 [..............................] - ETA: 5:16 - loss: 1.1462e-05 - accuracy: 1.0000
[2m[36m(launch_and_evaluate pid=17840)[0m   69/1420 [>.............................] - ETA: 2s - loss: 0.0148 - accuracy: 0.9928
[2m[36m(launch_and_evaluate pid=17840)[0m  108/1420 [=>............................] - ETA: 1s - loss: 0.0159 - accuracy: 

DEBUG flwr 2023-07-13 08:19:42,252 | server.py:182 | evaluate_round 3 received 5 results and 0 failures
DEBUG flwr 2023-07-13 08:19:42,254 | server.py:218 | fit_round 4: strategy sampled 10 clients (out of 10)


[2m[36m(launch_and_evaluate pid=17840)[0m [33mClient 0 evaluation complete - Accuracy: 0.992582, Loss: 0.015843[0m
[2m[36m(launch_and_fit pid=14840)[0m Client  7 Training...
[2m[36m(launch_and_fit pid=15660)[0m Epoch 1/5
[2m[36m(launch_and_fit pid=17300)[0m Client ID: 1[32m [repeated 14x across cluster][0m
[2m[36m(launch_and_evaluate pid=15660)[0m Client  8 Evaluating...[32m [repeated 4x across cluster][0m
[2m[36m(launch_and_fit pid=6692)[0m [32m [repeated 246x across cluster][0m
[2m[36m(launch_and_fit pid=23004)[0m   38/1420 [..............................] - ETA: 3s - loss: 0.0122 - accuracy: 0.9951[32m [repeated 39x across cluster][0m
[2m[36m(launch_and_fit pid=23004)[0m   76/1420 [>.............................] - ETA: 3s - loss: 0.0134 - accuracy: 0.9947[32m [repeated 26x across cluster][0m
[2m[36m(launch_and_fit pid=23004)[0m  135/1420 [=>............................] - ETA: 3s - loss: 0.0128 - accuracy: 0.9947[32m [repeated 30x across clus

DEBUG flwr 2023-07-13 08:20:09,047 | server.py:232 | fit_round 4 received 10 results and 0 failures


Server Evaluating... Evaluation Count:4


INFO flwr 2023-07-13 08:30:44,809 | server.py:119 | fit progress: (4, 0.017621871083974838, {'accuracy': 0.9926624894142151}, 2668.6324658999947)
DEBUG flwr 2023-07-13 08:30:44,809 | server.py:168 | evaluate_round 4: strategy sampled 5 clients (out of 10)


Prediction:  [[6.1153652e-11 1.0000000e+00]
 [2.2351005e-09 1.0000000e+00]
 [1.4548306e-01 8.5451692e-01]
 ...
 [2.6233869e-07 9.9999976e-01]
 [4.1911017e-09 1.0000000e+00]
 [9.1797698e-09 1.0000000e+00]] (10340161, 2)
[33mServer evaluation complete - Accuracy: 0.9927, Loss: 0.0176[0m
[2m[36m(launch_and_evaluate pid=14840)[0m Client ID: 4
[2m[36m(launch_and_evaluate pid=14840)[0m Client  4 Evaluating...
[2m[36m(launch_and_fit pid=27464)[0m [32m [repeated 382x across cluster][0m
[2m[36m(launch_and_fit pid=27464)[0m   32/1420 [..............................] - ETA: 4s - loss: 0.0106 - accuracy: 0.9961[32m [repeated 2x across cluster][0m
[2m[36m(launch_and_fit pid=27464)[0m Client  4 Evaluating...
[2m[36m(launch_and_fit pid=27464)[0m  131/1420 [=>............................] - ETA: 4s - loss: 0.0126 - accuracy: 0.9940[32m [repeated 3x across cluster][0m
[2m[36m(launch_and_fit pid=27464)[0m  189/1420 [==>...........................] - ETA: 4s - loss: 0.0138 - 

DEBUG flwr 2023-07-13 08:30:47,050 | server.py:182 | evaluate_round 4 received 5 results and 0 failures
DEBUG flwr 2023-07-13 08:30:47,050 | server.py:218 | fit_round 5: strategy sampled 10 clients (out of 10)


[2m[36m(launch_and_evaluate pid=16512)[0m [33mClient 9 evaluation complete - Accuracy: 0.993353, Loss: 0.014800[0m
[2m[36m(launch_and_fit pid=14840)[0m Client  1 Training...
[2m[36m(launch_and_fit pid=14840)[0m Epoch 1/5
[2m[36m(launch_and_fit pid=6692)[0m Client ID: 3[32m [repeated 14x across cluster][0m
[2m[36m(launch_and_evaluate pid=17300)[0m Client  3 Evaluating...[32m [repeated 4x across cluster][0m
[2m[36m(launch_and_fit pid=31708)[0m [32m [repeated 153x across cluster][0m
[2m[36m(launch_and_fit pid=6692)[0m   27/1420 [..............................] - ETA: 5s - loss: 0.0127 - accuracy: 0.9954[32m [repeated 37x across cluster][0m
[2m[36m(launch_and_fit pid=6692)[0m  129/1420 [=>............................] - ETA: 4s - loss: 0.0156 - accuracy: 0.9930[32m [repeated 32x across cluster][0m
[2m[36m(launch_and_fit pid=6692)[0m  186/1420 [==>...........................] - ETA: 3s - loss: 0.0159 - accuracy: 0.9923[32m [repeated 32x across cluster

DEBUG flwr 2023-07-13 08:31:11,464 | server.py:232 | fit_round 5 received 10 results and 0 failures


Server Evaluating... Evaluation Count:5


INFO flwr 2023-07-13 08:42:44,027 | server.py:119 | fit progress: (5, 0.017770841717720032, {'accuracy': 0.9929139614105225}, 3387.8455051999917)
DEBUG flwr 2023-07-13 08:42:44,028 | server.py:168 | evaluate_round 5: strategy sampled 5 clients (out of 10)


Prediction:  [[1.8813392e-13 1.0000000e+00]
 [4.3368736e-11 1.0000000e+00]
 [1.8301694e-01 8.1698304e-01]
 ...
 [5.7268114e-08 1.0000000e+00]
 [1.9791348e-10 1.0000000e+00]
 [2.2229103e-10 1.0000000e+00]] (10340161, 2)
[33mServer evaluation complete - Accuracy: 0.9929, Loss: 0.0178[0m
[2m[36m(launch_and_evaluate pid=14840)[0m Client ID: 7
[2m[36m(launch_and_fit pid=31708)[0m Client ID: 7
[2m[36m(launch_and_fit pid=31708)[0m Client ID: 7
[2m[36m(launch_and_fit pid=31708)[0m Client ID: 7
[2m[36m(launch_and_fit pid=31708)[0m Client ID: 7
[2m[36m(launch_and_fit pid=31708)[0m [32m [repeated 143x across cluster][0m
[2m[36m(launch_and_fit pid=31708)[0m Client ID: 7
[2m[36m(launch_and_fit pid=31708)[0m Client  7 Training complete...[32m [repeated 9x across cluster][0m
[2m[36m(launch_and_evaluate pid=14840)[0m Client  7 Evaluating...
[2m[36m(launch_and_evaluate pid=31708)[0m    1/1420 [..............................] - ETA: 5:33 - loss: 0.0108 - accuracy: 1.0

DEBUG flwr 2023-07-13 08:42:46,494 | server.py:182 | evaluate_round 5 received 5 results and 0 failures
DEBUG flwr 2023-07-13 08:42:46,495 | server.py:218 | fit_round 6: strategy sampled 10 clients (out of 10)


[2m[36m(launch_and_evaluate pid=14840)[0m [33mClient 7 evaluation complete - Accuracy: 0.993089, Loss: 0.015489[0m
[2m[36m(launch_and_fit pid=14840)[0m Client  8 Training...
[2m[36m(launch_and_fit pid=14840)[0m Epoch 1/5
[2m[36m(launch_and_fit pid=17300)[0m Client ID: 6[32m [repeated 14x across cluster][0m
[2m[36m(launch_and_fit pid=14840)[0m [32m [repeated 210x across cluster][0m
[2m[36m(launch_and_evaluate pid=27464)[0m Client  6 Evaluating...[32m [repeated 4x across cluster][0m
[2m[36m(launch_and_fit pid=15660)[0m   38/1420 [..............................] - ETA: 4s - loss: 0.0087 - accuracy: 0.9984[32m [repeated 37x across cluster][0m
[2m[36m(launch_and_fit pid=15660)[0m   78/1420 [>.............................] - ETA: 3s - loss: 0.0107 - accuracy: 0.9956[32m [repeated 26x across cluster][0m
[2m[36m(launch_and_fit pid=15660)[0m  139/1420 [=>............................] - ETA: 3s - loss: 0.0099 - accuracy: 0.9962[32m [repeated 33x across clu

DEBUG flwr 2023-07-13 08:43:12,011 | server.py:232 | fit_round 6 received 10 results and 0 failures


Server Evaluating... Evaluation Count:6


INFO flwr 2023-07-13 08:54:57,502 | server.py:119 | fit progress: (6, 0.01838020607829094, {'accuracy': 0.9929259419441223}, 4121.315887599994)
DEBUG flwr 2023-07-13 08:54:57,503 | server.py:168 | evaluate_round 6: strategy sampled 5 clients (out of 10)


Prediction:  [[9.8444011e-16 1.0000000e+00]
 [2.7745230e-12 1.0000000e+00]
 [1.9150369e-01 8.0849636e-01]
 ...
 [2.0505908e-08 1.0000000e+00]
 [4.0846587e-11 1.0000000e+00]
 [5.1562361e-12 1.0000000e+00]] (10340161, 2)
[33mServer evaluation complete - Accuracy: 0.9929, Loss: 0.0184[0m
[2m[36m(launch_and_evaluate pid=14840)[0m Client ID: 5
[2m[36m(launch_and_fit pid=14840)[0m Epoch 5/5[32m [repeated 3x across cluster][0m
[2m[36m(launch_and_fit pid=14840)[0m [32m [repeated 295x across cluster][0m
[2m[36m(launch_and_fit pid=14840)[0m Client  8 Training complete...[32m [repeated 9x across cluster][0m
[2m[36m(launch_and_evaluate pid=14840)[0m Client  5 Evaluating...
[2m[36m(launch_and_evaluate pid=14840)[0m    1/1420 [..............................] - ETA: 6:19 - loss: 0.0000e+00 - accuracy: 1.0000
[2m[36m(launch_and_evaluate pid=14840)[0m   43/1420 [..............................] - ETA: 1s - loss: 0.0114 - accuracy: 0.9935      
[2m[36m(launch_and_evaluate p

DEBUG flwr 2023-07-13 08:55:00,100 | server.py:182 | evaluate_round 6 received 5 results and 0 failures
DEBUG flwr 2023-07-13 08:55:00,103 | server.py:218 | fit_round 7: strategy sampled 10 clients (out of 10)


[2m[36m(launch_and_evaluate pid=14840)[0m [33mClient 5 evaluation complete - Accuracy: 0.993441, Loss: 0.014494[0m
[2m[36m(launch_and_fit pid=14840)[0m Client  0 Training...
[2m[36m(launch_and_fit pid=17300)[0m Client ID: 6[32m [repeated 14x across cluster][0m
[2m[36m(launch_and_fit pid=23004)[0m Epoch 1/5[32m [repeated 10x across cluster][0m
[2m[36m(launch_and_fit pid=6692)[0m [32m [repeated 159x across cluster][0m
[2m[36m(launch_and_evaluate pid=27464)[0m Client  0 Evaluating...[32m [repeated 4x across cluster][0m
[2m[36m(launch_and_fit pid=16512)[0m   40/1420 [..............................] - ETA: 3s - loss: 0.0082 - accuracy: 0.9953[32m [repeated 35x across cluster][0m
[2m[36m(launch_and_fit pid=16512)[0m   80/1420 [>.............................] - ETA: 3s - loss: 0.0174 - accuracy: 0.9922[32m [repeated 28x across cluster][0m
[2m[36m(launch_and_fit pid=16512)[0m  140/1420 [=>............................] - ETA: 3s - loss: 0.0353 - accuracy

DEBUG flwr 2023-07-13 08:55:43,330 | server.py:232 | fit_round 7 received 10 results and 0 failures


[2m[36m(launch_and_fit pid=27464)[0m [32m [repeated 219x across cluster][0m
[2m[36m(launch_and_fit pid=23004)[0m Client  2 Training complete...[32m [repeated 9x across cluster][0m
Server Evaluating... Evaluation Count:7


INFO flwr 2023-07-13 09:07:46,834 | server.py:119 | fit progress: (7, 0.018535353243350983, {'accuracy': 0.9933767914772034}, 4890.645572900001)
DEBUG flwr 2023-07-13 09:07:46,836 | server.py:168 | evaluate_round 7: strategy sampled 5 clients (out of 10)


Prediction:  [[3.6635240e-19 1.0000000e+00]
 [3.6239058e-14 1.0000000e+00]
 [1.8262631e-01 8.1737369e-01]
 ...
 [2.5221147e-09 1.0000000e+00]
 [9.7031118e-13 1.0000000e+00]
 [3.5437362e-13 1.0000000e+00]] (10340161, 2)
[33mServer evaluation complete - Accuracy: 0.9934, Loss: 0.0185[0m
[2m[36m(launch_and_evaluate pid=17840)[0m Client ID: 1
[2m[36m(launch_and_evaluate pid=23004)[0m Client 
[2m[36m(launch_and_evaluate pid=23004)[0m  
[2m[36m(launch_and_evaluate pid=23004)[0m 2 Evaluating...
[2m[36m(launch_and_evaluate pid=23004)[0m 
[2m[36m(launch_and_evaluate pid=17840)[0m Client  1 Evaluating...
[2m[36m(launch_and_evaluate pid=17840)[0m    1/1420 [..............................] - ETA: 6:11 - loss: 0.0960 - accuracy: 0.9688
[2m[36m(launch_and_evaluate pid=17840)[0m   39/1420 [..............................] - ETA: 1s - loss: 0.0173 - accuracy: 0.9936  
[2m[36m(launch_and_evaluate pid=15660)[0m   72/1420 [>.............................] - ETA: 1s - loss: 0.01

DEBUG flwr 2023-07-13 09:07:49,397 | server.py:182 | evaluate_round 7 received 5 results and 0 failures
DEBUG flwr 2023-07-13 09:07:49,398 | server.py:218 | fit_round 8: strategy sampled 10 clients (out of 10)


[2m[36m(launch_and_evaluate pid=17840)[0m [33mClient 1 evaluation complete - Accuracy: 0.993837, Loss: 0.014251[0m
[2m[36m(launch_and_fit pid=17300)[0m Client  6 Training...
[2m[36m(launch_and_fit pid=23004)[0m Epoch 1/5
[2m[36m(launch_and_fit pid=6692)[0m Client ID: 4[32m [repeated 14x across cluster][0m
[2m[36m(launch_and_fit pid=14840)[0m [32m [repeated 187x across cluster][0m
[2m[36m(launch_and_evaluate pid=15660)[0m Client  0 Evaluating...[32m [repeated 3x across cluster][0m
[2m[36m(launch_and_fit pid=14840)[0m   38/1420 [..............................] - ETA: 3s - loss: 0.0231 - accuracy: 0.9893[32m [repeated 38x across cluster][0m
[2m[36m(launch_and_fit pid=14840)[0m   89/1420 [>.............................] - ETA: 3s - loss: 0.0180 - accuracy: 0.9905[32m [repeated 31x across cluster][0m
[2m[36m(launch_and_fit pid=14840)[0m  125/1420 [=>............................] - ETA: 3s - loss: 0.0154 - accuracy: 0.9920[32m [repeated 30x across clus

DEBUG flwr 2023-07-13 09:08:17,765 | server.py:232 | fit_round 8 received 10 results and 0 failures


[2m[36m(launch_and_fit pid=14840)[0m   33/1420 [..............................] - ETA: 4s - loss: 0.0123 - accuracy: 0.9934[32m [repeated 13x across cluster][0m
[2m[36m(launch_and_fit pid=14840)[0m   91/1420 [>.............................] - ETA: 4s - loss: 0.0119 - accuracy: 0.9942[32m [repeated 12x across cluster][0m
Server Evaluating... Evaluation Count:8


INFO flwr 2023-07-13 09:20:13,559 | server.py:119 | fit progress: (8, 0.019133921712636948, {'accuracy': 0.9933241009712219}, 5637.367675599991)
DEBUG flwr 2023-07-13 09:20:13,559 | server.py:168 | evaluate_round 8: strategy sampled 5 clients (out of 10)


Prediction:  [[1.6806195e-20 1.0000000e+00]
 [3.6717862e-15 1.0000000e+00]
 [2.0678297e-01 7.9321700e-01]
 ...
 [1.9905120e-09 1.0000000e+00]
 [8.7998879e-13 1.0000000e+00]
 [1.2644477e-13 1.0000000e+00]] (10340161, 2)
[33mServer evaluation complete - Accuracy: 0.9933, Loss: 0.0191[0m
[2m[36m(launch_and_evaluate pid=14840)[0m Client ID: 9
[2m[36m(launch_and_fit pid=14840)[0m  137/1420 [=>............................] - ETA: 4s - loss: 0.0124 - accuracy: 0.9943[32m [repeated 9x across cluster][0m
[2m[36m(launch_and_fit pid=14840)[0m  187/1420 [==>...........................] - ETA: 4s - loss: 0.0138 - accuracy: 0.9935[32m [repeated 14x across cluster][0m
[2m[36m(launch_and_fit pid=14840)[0m  234/1420 [===>..........................] - ETA: 4s - loss: 0.0146 - accuracy: 0.9935[32m [repeated 24x across cluster][0m
[2m[36m(launch_and_fit pid=14840)[0m  280/1420 [====>.........................] - ETA: 3s - loss: 0.0158 - accuracy: 0.9934[32m [repeated 28x across clus

DEBUG flwr 2023-07-13 09:20:16,056 | server.py:182 | evaluate_round 8 received 5 results and 0 failures
DEBUG flwr 2023-07-13 09:20:16,057 | server.py:218 | fit_round 9: strategy sampled 10 clients (out of 10)


[2m[36m(launch_and_evaluate pid=14840)[0m [33mClient 9 evaluation complete - Accuracy: 0.994123, Loss: 0.013892[0m
[2m[36m(launch_and_fit pid=17300)[0m Client  7 Training...
[2m[36m(launch_and_fit pid=23004)[0m Epoch 1/5
[2m[36m(launch_and_fit pid=17840)[0m Client ID: 8[32m [repeated 14x across cluster][0m
[2m[36m(launch_and_fit pid=12088)[0m  125/1420 [=>............................] - ETA: 3s - loss: 0.0258 - accuracy: 0.9925[32m [repeated 29x across cluster][0m
[2m[36m(launch_and_fit pid=12088)[0m  185/1420 [==>...........................] - ETA: 3s - loss: 0.0215 - accuracy: 0.9931[32m [repeated 38x across cluster][0m
[2m[36m(launch_and_fit pid=12088)[0m  207/1420 [===>..........................] - ETA: 3s - loss: 0.0199 - accuracy: 0.9935[32m [repeated 30x across cluster][0m
[2m[36m(launch_and_fit pid=31708)[0m  252/1420 [====>.........................] - ETA: 3s - loss: 0.0143 - accuracy: 0.9931[32m [repeated 20x across cluster][0m
[2m[36m(la

DEBUG flwr 2023-07-13 09:20:40,452 | server.py:232 | fit_round 9 received 10 results and 0 failures


Server Evaluating... Evaluation Count:9


INFO flwr 2023-07-13 09:32:18,437 | server.py:119 | fit progress: (9, 0.02009219117462635, {'accuracy': 0.9934859871864319}, 6362.245104699992)
DEBUG flwr 2023-07-13 09:32:18,438 | server.py:168 | evaluate_round 9: strategy sampled 5 clients (out of 10)


Prediction:  [[6.8080801e-22 1.0000000e+00]
 [1.2812358e-16 1.0000000e+00]
 [2.1302700e-01 7.8697300e-01]
 ...
 [6.9130918e-10 1.0000000e+00]
 [7.4113366e-13 1.0000000e+00]
 [3.5007323e-14 1.0000000e+00]] (10340161, 2)
[33mServer evaluation complete - Accuracy: 0.9935, Loss: 0.0201[0m
[2m[36m(launch_and_evaluate pid=14840)[0m Client ID: 2
[2m[36m(launch_and_evaluate pid=14840)[0m Client  2 Evaluating...
[2m[36m(launch_and_fit pid=17840)[0m Client  2 Evaluating...
[2m[36m(launch_and_fit pid=17840)[0m [32m [repeated 181x across cluster][0m
[2m[36m(launch_and_fit pid=17840)[0m Client  8 Training complete...[32m [repeated 9x across cluster][0m
[2m[36m(launch_and_evaluate pid=14840)[0m    1/1420 [..............................] - ETA: 6:11 - loss: 0.0170 - accuracy: 1.0000
[2m[36m(launch_and_evaluate pid=17840)[0m   79/1420 [>.............................] - ETA: 1s - loss: 0.0146 - accuracy: 0.9941
[2m[36m(launch_and_evaluate pid=14840)[0m  113/1420 [=>.......

DEBUG flwr 2023-07-13 09:32:20,915 | server.py:182 | evaluate_round 9 received 5 results and 0 failures
DEBUG flwr 2023-07-13 09:32:20,917 | server.py:218 | fit_round 10: strategy sampled 10 clients (out of 10)


[2m[36m(launch_and_evaluate pid=14840)[0m [33mClient 2 evaluation complete - Accuracy: 0.993815, Loss: 0.018376[0m
[2m[36m(launch_and_fit pid=14840)[0m Client  6 Training...
[2m[36m(launch_and_fit pid=15660)[0m Epoch 1/5
[2m[36m(launch_and_fit pid=17300)[0m Client ID: 8[32m [repeated 14x across cluster][0m
[2m[36m(launch_and_evaluate pid=27464)[0m Client  6 Evaluating...[32m [repeated 4x across cluster][0m
[2m[36m(launch_and_fit pid=15660)[0m [32m [repeated 242x across cluster][0m
[2m[36m(launch_and_fit pid=6692)[0m   37/1420 [..............................] - ETA: 4s - loss: 0.0105 - accuracy: 0.9941[32m [repeated 39x across cluster][0m
[2m[36m(launch_and_fit pid=6692)[0m   76/1420 [>.............................] - ETA: 3s - loss: 0.0123 - accuracy: 0.9938[32m [repeated 29x across cluster][0m
[2m[36m(launch_and_fit pid=6692)[0m  125/1420 [=>............................] - ETA: 3s - loss: 0.0116 - accuracy: 0.9948[32m [repeated 31x across cluste

DEBUG flwr 2023-07-13 09:32:45,408 | server.py:232 | fit_round 10 received 10 results and 0 failures


Server Evaluating... Evaluation Count:10


INFO flwr 2023-07-13 09:44:02,875 | server.py:119 | fit progress: (10, 0.02126791700720787, {'accuracy': 0.9933499097824097}, 7066.682228199992)
DEBUG flwr 2023-07-13 09:44:02,876 | server.py:168 | evaluate_round 10: strategy sampled 5 clients (out of 10)


Prediction:  [[3.4326790e-23 1.0000000e+00]
 [9.3653885e-18 1.0000000e+00]
 [2.5980499e-01 7.4019504e-01]
 ...
 [7.4470707e-10 1.0000000e+00]
 [2.4229750e-12 1.0000000e+00]
 [5.1353033e-15 1.0000000e+00]] (10340161, 2)
[33mServer evaluation complete - Accuracy: 0.9933, Loss: 0.0213[0m
[2m[36m(launch_and_evaluate pid=27464)[0m Client ID: 6
[2m[36m(launch_and_fit pid=27464)[0m Client ID: 6
[2m[36m(launch_and_fit pid=27464)[0m [32m [repeated 127x across cluster][0m
[2m[36m(launch_and_fit pid=27464)[0m Client  7 Training complete...[32m [repeated 9x across cluster][0m
[2m[36m(launch_and_evaluate pid=14840)[0m Client  0 Evaluating...
[2m[36m(launch_and_evaluate pid=14840)[0m    1/1420 [..............................] - ETA: 5:30 - loss: 1.6428e-06 - accuracy: 1.0000
[2m[36m(launch_and_evaluate pid=14840)[0m   45/1420 [..............................] - ETA: 1s - loss: 0.0154 - accuracy: 0.9910      
[2m[36m(launch_and_evaluate pid=14840)[0m   87/1420 [>.........

DEBUG flwr 2023-07-13 09:44:05,398 | server.py:182 | evaluate_round 10 received 5 results and 0 failures
INFO flwr 2023-07-13 09:44:05,399 | server.py:147 | FL finished in 7069.206771500001
INFO flwr 2023-07-13 09:44:05,400 | app.py:218 | app_fit: losses_distributed [(1, 0.01831797994673252), (2, 0.01668887920677662), (3, 0.016572396084666253), (4, 0.015234833396971226), (5, 0.015562286227941513), (6, 0.0153903191909194), (7, 0.014799277298152447), (8, 0.014341126382350921), (9, 0.015447058714926243), (10, 0.015311380103230476)]
INFO flwr 2023-07-13 09:44:05,401 | app.py:219 | app_fit: metrics_distributed_fit {}
INFO flwr 2023-07-13 09:44:05,401 | app.py:220 | app_fit: metrics_distributed {}
INFO flwr 2023-07-13 09:44:05,402 | app.py:221 | app_fit: losses_centralized [(0, 0.7927687168121338), (1, 0.019425245001912117), (2, 0.01820513978600502), (3, 0.017775995656847954), (4, 0.017621871083974838), (5, 0.017770841717720032), (6, 0.01838020607829094), (7, 0.018535353243350983), (8, 0.019

[2m[36m(launch_and_evaluate pid=14840)[0m [33mClient 0 evaluation complete - Accuracy: 0.993705, Loss: 0.013951[0m
Total time taken:  2:08:43.487250
[33m SIMULATION COMPLETE. Method = STRATIFIED - Binary (2) Classifier
Number of Clients = 10[0m

CPU times: total: 2h 11min 29s
Wall time: 2h 8min 43s
