Multi-Initial-Center Federated Learning with Data Distribution Similarity-Aware Constraint

Research code accompanying the paper "Multi-Initial-Center Federated Learning with Data Distribution Similarity-Aware Constraint".

Federated Learning (FL) has recently attracted considerable attention because it allows clients to collaboratively train a model while the training data remains local. However, due to the inherent heterogeneity of local data distributions, the trained model often fails to perform well on every client. Clustered FL tackles this issue by grouping clients with similar data distributions, but existing model-dependent clustering methods tend to be costly and perform poorly. In this work, we propose FedDSMIC, a distribution similarity-based clustered federated learning framework that clusters clients by detecting the client-level underlying data distribution from the model's memory of its training data. Furthermore, we extend the assumption about data distribution to a more realistic (and more complicated) cluster structure. The center models are learned as good initial points that capture the data properties common to a cluster. Each client in a cluster then obtains a more personalized model by performing one step of gradient descent from its initial point. Empirical evaluation on real-world datasets shows that FedDSMIC outperforms popular state-of-the-art federated learning algorithms while keeping the lowest communication overhead.
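The personalization step described above (one gradient descent step from a cluster's center model) can be illustrated on a toy problem. The sketch below is not the repository's implementation: it assumes a one-dimensional linear model with a squared-error loss, and the names `personalize`, `center_w`, and `lr` are hypothetical.

```python
import random

def mse(w, data):
    """Mean squared error of the linear model y = w[0] * x + w[1]."""
    return sum((w[0] * x + w[1] - y) ** 2 for x, y in data) / len(data)

def mse_grad(w, data):
    """Gradient of the mean squared error with respect to (slope, bias)."""
    n = len(data)
    g_slope = sum(2 * (w[0] * x + w[1] - y) * x for x, y in data) / n
    g_bias = sum(2 * (w[0] * x + w[1] - y) for x, y in data) / n
    return [g_slope, g_bias]

def personalize(center_w, data, lr=0.1):
    """Take one gradient descent step from the cluster center on local data."""
    grad = mse_grad(center_w, data)
    return [w_i - lr * g_i for w_i, g_i in zip(center_w, grad)]

rng = random.Random(0)
xs = [rng.uniform(-1, 1) for _ in range(50)]
local_data = [(x, 3.0 * x + 1.0) for x in xs]  # toy data held by one client
center_w = [2.0, 0.0]                          # stand-in for a learned center model (assumption)

local_w = personalize(center_w, local_data)
print(mse(center_w, local_data), mse(local_w, local_data))
```

In this toy setting the personalized weights have a lower local loss than the center weights, which is the intended effect of starting from a good initial point.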

This repository contains implementations of the following algorithms:

FedDSMIC (the proposed algorithm) (code).
FedAvg (paper and code).
FedProx (paper and code).
IFCA (paper and code).
FedSEM (paper and code).
Per-FedAvg (paper and code).

Install Requirements:

pip3 install -r requirements.txt

Datasets

We provide five federated benchmark datasets spanning a wide range of machine learning tasks: image classification (CIFAR10 and CIFAR100), handwritten character recognition (EMNIST and FEMNIST), and language modelling (Shakespeare).

Three non-IID splits are provided (label_swapped_non_iid_split, dirichlet_non_iid_split, and pathological_non_iid_split), in addition to an IID split (iid_split).
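To give an intuition for two of these splits, the sketch below shows a common way to build a Dirichlet label-skew partition and a pathological shard-based partition. It is an assumption about what such splits usually do, not the code in generate_data.py; the function names and the toy labels are hypothetical, and `n_shards` is assumed to mean the number of shards per client.

```python
import random
from collections import defaultdict

def dirichlet_split(labels, n_clients, alpha, seed=0):
    """Assign sample indices to clients; per-class client shares follow Dir(alpha)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    clients = [[] for _ in range(n_clients)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # A Dirichlet draw is a vector of Gamma(alpha, 1) draws normalized to sum to 1.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(n_clients)]
        total = sum(gammas)
        start, cumulative = 0, 0.0
        for k, g in enumerate(gammas):
            cumulative += g / total
            # The last client takes whatever remains, so no index is lost to rounding.
            end = len(indices) if k == n_clients - 1 else round(cumulative * len(indices))
            clients[k].extend(indices[start:end])
            start = end
    return clients

def pathological_split(labels, n_clients, n_shards, seed=0):
    """Sort indices by label, cut them into shards, and give each client n_shards shards."""
    rng = random.Random(seed)
    order = sorted(range(len(labels)), key=lambda i: labels[i])
    total_shards = n_clients * n_shards
    shard_size = len(order) // total_shards  # leftover samples are dropped
    shards = [order[s * shard_size:(s + 1) * shard_size] for s in range(total_shards)]
    rng.shuffle(shards)
    return [
        [i for shard in shards[c * n_shards:(c + 1) * n_shards] for i in shard]
        for c in range(n_clients)
    ]

labels = [i % 10 for i in range(1000)]  # toy labels: 10 balanced classes
dir_clients = dirichlet_split(labels, n_clients=5, alpha=0.5, seed=12345)
path_clients = pathological_split(labels, n_clients=5, n_shards=2, seed=12345)
print([len(c) for c in dir_clients])
print([sorted({labels[i] for i in c}) for c in path_clients])
```

A smaller `alpha` concentrates each class on fewer clients, while the pathological split limits each client to the few classes covered by its shards.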

The Shakespeare dataset (resp. FEMNIST) is naturally partitioned by assigning all lines from the same character (resp. all images from the same writer) to the same client.
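A natural partition of this kind amounts to grouping samples by their owner. The following minimal sketch uses placeholder data and a hypothetical helper `natural_split`; it does not reflect the repository's actual data format.

```python
from collections import defaultdict

def natural_split(samples, key):
    """Group samples by a natural owner, e.g. a writer or a speaking character."""
    clients = defaultdict(list)
    for sample in samples:
        clients[key(sample)].append(sample)
    return dict(clients)

# Placeholder (owner, content) pairs standing in for play lines or images.
samples = [
    ("character_a", "line 1"),
    ("character_b", "line 2"),
    ("character_a", "line 3"),
]
clients = natural_split(samples, key=lambda s: s[0])
print({owner: len(items) for owner, items in clients.items()})
# {'character_a': 2, 'character_b': 1}
```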

The following table summarizes the datasets and models:

Dataset	Task	Model
FEMNIST	Handwritten character recognition	2-layer CNN + 2-layer FFN
EMNIST	Handwritten character recognition	2-layer CNN + 2-layer FFN
CIFAR10	Image classification	MobileNet-v2
CIFAR100	Image classification	MobileNet-v2
Shakespeare	Next character prediction	Stacked LSTM

See the README.md file of the respective dataset, i.e., data/$DATASET, for instructions on generating data. For example, to generate the Mnist dataset for 100 clients using 50% of the total available training samples (s_frac), with 80% of the samples used for training (tr_frac), run one of the following commands, depending on the desired split:

python generate_data.py \
    --n_users 100 \
    --split dirichlet_non_iid_split \
    --n_components 3 \
    --alpha 0.5 \
    --s_frac 0.5 \
    --tr_frac 0.8 \
    --unseen_tasks_frac 0.2 \
    --seed 12345

python generate_data.py \
    --n_users 100 \
    --split split_iid \
    --s_frac 0.5 \
    --tr_frac 0.8 \
    --seed 12345

python generate_data.py \
    --n_users 100 \
    --split pathological_non_iid_split \
    --s_frac 0.5 \
    --tr_frac 0.8 \
    --n_shards 2 \
    --seed 12345

python generate_data.py \
    --n_users 100 \
    --split label_swapped_non_iid_split \
    --n_components 4 \
    --s_frac 0.5 \
    --tr_frac 0.8 \
    --seed 12345
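As a rough illustration of the arithmetic behind the --s_frac and --tr_frac options used in the commands above, the sketch below subsamples a pool of indices and then splits the kept samples into train and test parts. The helper name and the toy pool size are hypothetical, and the exact order of operations in generate_data.py may differ.

```python
import random

def subsample_and_split(indices, s_frac, tr_frac, seed=0):
    """Keep a fraction s_frac of the samples, then split the kept ones into train and test."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    kept = shuffled[:int(s_frac * len(shuffled))]
    n_train = int(tr_frac * len(kept))
    return kept[:n_train], kept[n_train:]

train, test = subsample_and_split(range(1000), s_frac=0.5, tr_frac=0.8, seed=12345)
print(len(train), len(test))  # 400 100
```

With s_frac 0.5 and tr_frac 0.8, half of the pool is kept and four fifths of that half go to training.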

Run Experiments:

The main entry point, main.py, runs all experiments.

Run experiments on the Mnist Dataset:

nohup python -u main.py --dataset Mnist-alpha0.5-ratio1.0-u100 --algorithm FedAvg \
  --batch_size 32 --num_users 8 --learning_rate 0.01 --num_glob_iters 500 --E 1 --times 1 --gpu 1 > ./acc_loss_record/mnist/E=1/r=0.1/FedAvg.out 2>&1 &

We provide example scripts for running the paper's experiments under the experiments/ directory.
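When the same configuration needs to be run for several algorithms, the command lines can be generated programmatically. The sketch below only reuses the flags from the FedAvg command above and prints the commands rather than executing them. It assumes that main.py accepts the other algorithm names in the same spelling as the plotting command in this README; the helper `build_command` is hypothetical.

```python
import shlex

# Algorithm names as spelled elsewhere in this README (assumed to be valid for main.py).
ALGORITHMS = ["FedDSMIC", "FedAvg", "FedProx", "PerFedavg", "FedSEM", "IFCA"]

def build_command(algorithm, dataset="Mnist-alpha0.5-ratio1.0-u100", gpu=1):
    """Build the main.py argument list for one algorithm, reusing the flags shown above."""
    return [
        "python", "-u", "main.py",
        "--dataset", dataset,
        "--algorithm", algorithm,
        "--batch_size", "32",
        "--num_users", "8",
        "--learning_rate", "0.01",
        "--num_glob_iters", "500",
        "--E", "1",
        "--times", "1",
        "--gpu", str(gpu),
    ]

for name in ALGORITHMS:
    # Printed only; pass the list to subprocess.run to actually launch a run.
    print(shlex.join(build_command(name)))
```

If the output is redirected to a log file as in the command above, the target directory has to exist before the run starts.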

Plot

For the --algorithms argument, list the algorithm names separated by commas, e.g., --algorithms FedAvg,FedGen,FedProx

python main_plot.py --dataset Mnist-alpha0.5-ratio1.0-u100 --algorithms FedDSMIC,FedAvg,FedProx,PerFedavg,FedSEM,IFCA \
  --batch_size 32 --E 1 --num_users 80 --num_glob_iters 200 --plot_legend 1 --test_acc True
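A comma-separated argument such as --algorithms is typically turned into a list of names as sketched below. This is an illustration with argparse and a hypothetical helper `parse_algorithms`; how main_plot.py parses the option internally is not shown in this README.

```python
import argparse

def parse_algorithms(value):
    """Split a comma-separated list of algorithm names, dropping empty entries."""
    return [name.strip() for name in value.split(",") if name.strip()]

parser = argparse.ArgumentParser()
parser.add_argument("--algorithms", type=parse_algorithms, default=[])
args = parser.parse_args(["--algorithms", "FedDSMIC,FedAvg,FedProx"])
print(args.algorithms)  # ['FedDSMIC', 'FedAvg', 'FedProx']
```

Because the shell splits arguments on spaces, the list should be written without spaces after the commas (or quoted as a whole).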

