
# An Implementation of Federated Learning in NLP using NVFlare

## Quick Start

### Installation

#### Pull the git repo

```bash
git clone git@github.com:PL97/federated-multi-modality-learning.git
cd federated-multi-modality-learning
```

#### Pull the Docker image

```bash
docker pull pytorch/pytorch:1.12.0-cuda11.3-cudnn8-runtime
```

#### Run the Docker image inside a container

```bash
export dockerImage=pytorch/pytorch:1.12.0-cuda11.3-cudnn8-runtime

# note: --mount requires a target path; adjust the data target to your setup
docker run -it --rm --shm-size=1G --gpus '"device=0"' \
    --ulimit memlock=-1 --ulimit stack=67108864 --ipc=host --net=host \
    --mount type=bind,source=[PATH_TO_YOUR_DATA],target=/workspace/data \
    --mount type=bind,source=[PATH_TO_CODE],target=/workspace/src \
    $dockerImage /bin/bash
```

### Set up the NVFlare Environment

Install NVFlare in a virtual environment:

```bash
sudo apt update
sudo apt-get install python3-venv

python3 -m venv nvflare-env
source nvflare-env/bin/activate

python3 -m pip install -U pip
python3 -m pip install -U setuptools

python3 -m pip install nvflare==2.2.1
python3 -m pip install tensorboard
python3 -m pip install torch torchvision transformers
python3 -m pip install pandas
python3 -m pip install seqeval
```

Now you are ready to run the scripts! Simply run

```bash
sudo chmod +x simulate.sh
./simulate.sh
```

## NER using FedAvg with NVFlare

### 2018 Track 2 ADE and Medication Extraction Challenge

Download the data here 👉 🔗. The processed data can be found here 👉 🔗.
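The processed NER data is typically stored in a CoNLL-style format, with one token/tag pair per line and blank lines separating sentences. Below is a minimal parsing sketch under that assumption; the entity types (`Drug`, `Strength`, `Frequency`) are illustrative, not the exact label set of the processed files.

```python
def read_conll(text):
    """Parse CoNLL-style NER data: one "token tag" pair per line,
    with blank lines separating sentences. Format is an assumption
    about the processed data; adjust to the actual files."""
    sentences, tokens, tags = [], [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # blank line ends the current sentence
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        token, tag = line.rsplit(" ", 1)
        tokens.append(token)
        tags.append(tag)
    if tokens:  # flush the last sentence
        sentences.append((tokens, tags))
    return sentences

sample = "aspirin B-Drug\n81mg B-Strength\ndaily B-Frequency\n\nibuprofen B-Drug\nstopped O\n"
parsed = read_conll(sample)  # two sentences with aligned token/tag lists
```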


### Model

We use bert-base-uncased in this example; download the model 👉 🔗.

In BERT uncased, the text is lowercased before the WordPiece tokenization step, while in BERT cased the text is left exactly as the input (no changes).



### Evaluation Metric

Adapted from seqeval.

⚠️ seqeval supports two evaluation modes, default and strict, and the mode can be specified per metric. ⚠️
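To make the strict mode concrete: strict entity-level scoring credits a prediction only when both the entity type and the exact span boundaries match the gold annotation. The sketch below is a simplified pure-Python illustration of that idea under IOB2 tagging, not seqeval's actual implementation.

```python
def extract_entities(tags):
    """Collect (type, start, end) spans from an IOB2 tag sequence.
    Simplification: a type-mismatched I- tag closes the current span
    without opening a new one."""
    entities, etype, start = [], None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes a trailing span
        inside = tag.startswith("I-") and tag[2:] == etype
        if etype is not None and not inside:
            entities.append((etype, start, i))
            etype = None
        if tag.startswith("B-"):
            etype, start = tag[2:], i
    return entities

def strict_entity_f1(true_tags, pred_tags):
    """Entity-level F1: a predicted span counts as a true positive only
    on an exact (type, start, end) match."""
    gold = set(extract_entities(true_tags))
    pred = set(extract_entities(pred_tags))
    tp = len(gold & pred)
    p = tp / len(pred) if pred else 0.0
    r = tp / len(gold) if gold else 0.0
    return 2 * p * r / (p + r) if p + r else 0.0
```

For example, predicting `["B-Drug", "O", "O"]` against gold `["B-Drug", "I-Drug", "O"]` scores zero in strict mode, because the predicted span ends one token early.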

precision

$$precision = \frac{TP}{TP + FP}$$

recall

$$recall=\frac{TP}{TP + FN}$$

f1-score

$$F_1 = \frac{2 \times precision \times recall}{precision + recall}$$

micro average

Compute the metric globally by pooling true positives, false positives, and false negatives across all classes, so the score reflects the total number of correct predictions the classifier makes.

macro average

Compute the metric independently for each class and take the unweighted mean, so every class contributes equally (in the spirit of balanced accuracy).

weighted average

Each class's contribution to the average is weighted by its support (size), so the result lies between the micro and macro averages.
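The three averaging schemes can be sketched from per-class counts. This is a minimal illustration of the definitions above (with class support taken as TP + FN, i.e. gold occurrences), not seqeval's code; the class names and counts are made up.

```python
def prf(tp, fp, fn):
    """Precision, recall, and F1 from raw counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f

def averaged_f1(counts):
    """counts: {class: (tp, fp, fn)} -> (micro, macro, weighted) F1."""
    # micro: pool counts over all classes, then compute F1 once
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    micro = prf(tp, fp, fn)[2]
    # macro: unweighted mean of per-class F1
    per_class = {k: prf(*c)[2] for k, c in counts.items()}
    macro = sum(per_class.values()) / len(per_class)
    # weighted: per-class F1 weighted by class support (tp + fn)
    support = {k: c[0] + c[2] for k, c in counts.items()}
    total = sum(support.values())
    weighted = sum(per_class[k] * support[k] / total for k in counts)
    return micro, macro, weighted

counts = {"Drug": (8, 2, 2), "ADE": (1, 1, 3)}  # hypothetical counts
micro, macro, weighted = averaged_f1(counts)
```

Note how the rare, poorly predicted `ADE` class drags the macro average down the most, while micro and weighted stay closer to the dominant class's score.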


### Experiment Setup

- algorithm: FedAvg
- data: randomly split into two clients
  - client 1: train size 27408, val size 3046
  - client 2: train size 27409, val size 3046
- learning rate: $5\times10^{-5}$
- batch size: 32
- epochs: 50 (set larger because FedAvg converges more slowly than pooled training)
- aggregation weights: uniform (1:1)
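The aggregation step at the heart of FedAvg is a weighted average of the clients' model parameters; with the uniform 1:1 weights used here, it reduces to a plain mean. Below is a minimal sketch using plain Python lists in place of model tensors; in the actual runs, NVFlare's server-side workflow performs this aggregation.

```python
def fedavg(client_params, agg_weights=None):
    """Weighted average of client model parameters.
    client_params: list of dicts mapping layer name -> list of floats.
    agg_weights defaults to uniform (1:1, as in this experiment);
    a common alternative weights each client by its training-set size."""
    n = len(client_params)
    if agg_weights is None:
        agg_weights = [1.0] * n
    total = float(sum(agg_weights))
    norm = [w / total for w in agg_weights]  # normalize so weights sum to 1
    return {
        name: [
            sum(w * params[name][i] for w, params in zip(norm, client_params))
            for i in range(len(client_params[0][name]))
        ]
        for name in client_params[0]
    }

# two clients with uniform weights: the result is the element-wise mean
global_model = fedavg([{"w": [0.0, 2.0]}, {"w": [2.0, 4.0]}])
```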

### FedAvg

## Acknowledgment

We would like to express our gratitude to Cisco Research for supporting this work. We also acknowledge the Minnesota Supercomputing Institute (MSI) at the University of Minnesota for providing the computational resources that aided in producing the results presented in this work. Finally, we extend our appreciation to Dr. Rui Zhang and his student Sicheng Zhou for their insightful discussions!

## How to Cite This Work

If you find this repo useful, please consider citing the associated paper using the snippet below:

```bibtex
@article{peng2023systematic,
  title={A Systematic Evaluation of Federated Learning on Biomedical Natural Language Processing},
  author={Peng, Le and Zhang, Rui and Xu, Ziyue and Sun, Ju and others},
  journal={arXiv preprint arXiv:2307.11254},
  year={2023}
}
```
