
An Implementation of Federated Learning in NLP using NVFlare

Quick Start

Installation


Pull Git Repo

```bash
git clone git@github.com:PL97/federated-multi-modality-learning.git
cd federated-multi-modality-learning
```

Pull Docker Image

```bash
docker pull pytorch/pytorch:1.12.0-cuda11.3-cudnn8-runtime
```

Run the image inside a container

```bash
export dockerImage=pytorch/pytorch:1.12.0-cuda11.3-cudnn8-runtime

# bind mounts require a target path inside the container; adjust both paths to your setup
docker run -it --rm --shm-size=1G --gpus '"device=0"' \
    --ulimit memlock=-1 --ulimit stack=67108864 --ipc=host --net=host \
    --mount type=bind,source=[PATH_TO_YOUR_DATA],target=[CONTAINER_DATA_PATH] \
    --mount type=bind,source=[PATH_TO_CODE],target=/workspace/src \
    $dockerImage /bin/bash
```

Setup NVFlare Environment

```bash
## install NVFlare in a virtual environment
sudo apt update
sudo apt-get install python3-venv

# create the virtual environment (if it does not already exist) and activate it
python3 -m venv nvflare-env
source nvflare-env/bin/activate

python3 -m pip install -U pip
python3 -m pip install -U setuptools

python3 -m pip install nvflare==2.2.1
python3 -m pip install tensorboard
python3 -m pip install torch torchvision transformers
python3 -m pip install pandas
python3 -m pip install seqeval
```

Now you are ready to run the scripts! Simply run

```bash
sudo chmod +x simulate.sh
./simulate.sh
```

NER using FedAvg with NVFlare


2018 Track 2 ADE and medication extraction challenge

Download the data here 👉 🔗. The processed data can be found here 👉 🔗.



Model

We use bert-base-uncased in this example; download the model 👉 🔗

In BERT uncased, the text is lowercased before the WordPiece tokenization step, while in BERT cased the text is kept exactly as the input (no changes).
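To see the difference in practice, here is a minimal sketch using the `transformers` library (installed in the setup above); the example sentence is illustrative only:

```python
from transformers import AutoTokenizer

sentence = "Patient was given Aspirin 81 MG daily."

# bert-base-uncased lowercases the text before WordPiece tokenization
uncased = AutoTokenizer.from_pretrained("bert-base-uncased")
print(uncased.tokenize(sentence))  # all tokens are lowercased, e.g. 'aspirin'

# bert-base-cased keeps the original casing of the input text
cased = AutoTokenizer.from_pretrained("bert-base-cased")
print(cased.tokenize(sentence))    # original casing preserved, e.g. 'Patient'
```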



Evaluation Metric

Adapted from seqeval.

⚠️ seqeval supports two evaluation modes, default and strict; you can specify the mode for each metric. ⚠️
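A short sketch of how the two modes are selected with seqeval, assuming IOB2-tagged sequences (the entity labels below are illustrative, not the exact tag set of the challenge data):

```python
from seqeval.metrics import classification_report, f1_score
from seqeval.scheme import IOB2

y_true = [["O", "B-Drug", "I-Drug", "O", "B-ADE"]]
y_pred = [["O", "B-Drug", "I-Drug", "O", "O"]]

# default mode: lenient matching, compatible with conlleval
print(f1_score(y_true, y_pred))

# strict mode: entity spans must match the IOB2 scheme exactly
print(f1_score(y_true, y_pred, mode="strict", scheme=IOB2))

# per-class precision/recall/F1 plus micro, macro, and weighted averages
print(classification_report(y_true, y_pred, mode="strict", scheme=IOB2))
```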

precision

$$precision = \frac{TP}{TP + FP}$$

recall

$$recall=\frac{TP}{TP + FN}$$

f1-score

$$F_1 = \frac{2 \times precision \times recall}{precision + recall}$$

micro average

compute the metric globally by pooling true positives, false positives, and false negatives across all classes, so every prediction contributes equally

macro average

compute the metric independently for each class and take the unweighted mean, so every class contributes equally regardless of its size

weighted average

compute the metric for each class and average the results weighted by class size (support); it lies in between the micro and macro averages
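For intuition, here is a toy two-class computation (the counts are made up for illustration) showing how the three averages differ:

```python
# toy counts for two entity classes (illustrative numbers only)
classes = {
    "Drug": {"tp": 90, "fp": 10, "fn": 10},   # support = 100
    "ADE":  {"tp": 5,  "fp": 5,  "fn": 15},   # support = 20
}

def f1(tp, fp, fn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

per_class = {c: f1(**v) for c, v in classes.items()}          # Drug: 0.90, ADE: 0.33
support = {c: v["tp"] + v["fn"] for c, v in classes.items()}  # Drug: 100, ADE: 20

# micro: pool all counts first, then compute a single F1
totals = {k: sum(v[k] for v in classes.values()) for k in ("tp", "fp", "fn")}
micro = f1(**totals)

# macro: unweighted mean of the per-class F1 scores
macro = sum(per_class.values()) / len(per_class)

# weighted: per-class F1 weighted by support
weighted = sum(per_class[c] * support[c] for c in classes) / sum(support.values())

print(micro, macro, weighted)  # micro is dominated by the large class; macro treats both equally
```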


Experiment Setup

  • algorithm: FedAvg
  • data randomly split into two clients

    client 1: train size 27408, val size 3046

    client 2: train size 27409, val size 3046

  • learning rate: $5\times10^{-5}$
  • batch size: 32
  • epochs: 50 (set larger than for pooled training, since FedAvg converges more slowly)
  • aggregation weights: uniform (1:1); see the sketch after this list
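A minimal sketch of the uniform (1:1) FedAvg aggregation step, assuming each client returns a PyTorch `state_dict` after local training (the function and variable names are illustrative, not the NVFlare API):

```python
from typing import Dict, List, Optional
import torch

def fedavg_aggregate(client_states: List[Dict[str, torch.Tensor]],
                     weights: Optional[List[float]] = None) -> Dict[str, torch.Tensor]:
    """Weighted average of client parameters; uniform weights give the 1:1 setting above."""
    if weights is None:
        weights = [1.0 / len(client_states)] * len(client_states)  # uniform (1:1)
    aggregated = {}
    for key in client_states[0]:
        aggregated[key] = sum(w * state[key].float() for w, state in zip(weights, client_states))
    return aggregated

# one federated round: each client trains locally, then the server averages the updates
# global_state = fedavg_aggregate([client1_state, client2_state])
```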

FedAvg

Acknowledgment

We would like to express our gratitude to Cisco Research for their support of this work. We also acknowledge the Minnesota Supercomputing Institute (MSI) at the University of Minnesota for providing the computational resources used to produce the results presented here. We extend our appreciation to Dr. Rui Zhang and his student Sicheng Zhou for their insightful discussions!

How to cite this work


If you find this repo useful, please consider citing the associated paper using the snippet below:

```bibtex
@article{peng2023systematic,
  title={A Systematic Evaluation of Federated Learning on Biomedical Natural Language Processing},
  author={Peng, Le and Zhang, Rui and Xu, Ziyue and Sun, Ju and others},
  journal={arXiv preprint arXiv:2307.11254},
  year={2023}
}
```