This repository accompanies our publication in the Data and Policy Journal titled "Overcoming Intergovernmental Data Sharing Challenges with Federated Learning". It contains a series of federated learning experiments, primarily using BERT for multi-class text classification. The main script, run.sh, runs four experiments: centralized learning, per-country learning, federated learning, and privacy-preserving federated learning. We do not share the dataset for privacy and ethical reasons; please contact the repository owner to obtain it.
- Clone the repository
git clone https://github.com/sprenkamp/federated_learning_data_and_policy.git
cd federated_learning_data_and_policy
- Create a Python virtual environment and activate it (optional but recommended)
python3 -m venv venv
source venv/bin/activate # On Windows, use `.\venv\Scripts\activate`
- Install the required Python packages
pip install -r requirements.txt
To run the experiments, use the run.sh script. Ensure that it has the right permissions.
bash run.sh
The run.sh script runs the following experiments in order:
- Centralized Learning: This experiment uses BERT for multi-class text classification in a centralized manner.
python src/multi_class_text_classifier/train_bert_based_centralized.py
- Per-country Learning: This experiment trains a separate BERT-based multi-class text classifier for each country.
python src/multi_class_text_classifier/train_bert_based_per_country.py
- Federated Learning: In this experiment, we use BERT for multi-class text classification in a federated learning setup.
python src/multi_class_text_classifier/train_bert_based_federated.py
- Privacy-Preserving Federated Learning: Lastly, this experiment adds a privacy-preserving layer to the federated learning setup.
python src/multi_class_text_classifier/train_bert_based_federated_private.py
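
As a rough illustration of the idea behind the last two experiments, the sketch below shows one common way to combine per-country models without pooling the raw data. It is not the code in this repository. It assumes a FedAvg-style aggregation (a sample-weighted average of client parameters) and, for the privacy-preserving variant, a clip-then-add-Gaussian-noise step; the README does not name either technique, so both are assumptions. The function names `federated_average` and `clip_and_add_noise`, the parameter name `classifier.weight`, and all numbers are hypothetical.

```python
import random
from typing import Dict, List

# Parameter name -> flat list of values (a stand-in for real model tensors).
Weights = Dict[str, List[float]]


def federated_average(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighting each client by its number of samples."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


def clip_and_add_noise(
    update: Weights, clip_norm: float, noise_std: float, rng: random.Random
) -> Weights:
    """Scale an update so its L2 norm is at most clip_norm, then add Gaussian noise.

    Assumption: this is one common privacy-preserving step, not necessarily
    the one used by train_bert_based_federated_private.py.
    """
    norm = sum(v * v for values in update.values() for v in values) ** 0.5
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return {
        name: [v * scale + rng.gauss(0.0, noise_std) for v in values]
        for name, values in update.items()
    }


# Hypothetical per-country parameters (toy values, not taken from the dataset).
country_a = {"classifier.weight": [1.0, 2.0]}
country_b = {"classifier.weight": [3.0, 4.0]}

# Country B holds three times as many samples, so it dominates the average.
global_weights = federated_average([country_a, country_b], client_sizes=[100, 300])
```

In a full training loop, each country would train locally for a few steps, send its parameters (optionally passed through something like `clip_and_add_noise`) to a coordinator, and receive the averaged model back for the next round.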