microsoft/Multi-Adapter-Fused-Inclusive-Language-Models

Introduction

Modular Transfer Learning Approaches for Debiasing LLMs :: Paper

Getting Started

We recommend creating a virtual environment before installing the package (optional):

$ {sudo} pip install virtualenv
$ virtualenv -p python3 modraienv
$ source modraienv/bin/activate

Install the Inclusivity toolkit provided in the repository. Please refer to InclusivityToolkit/README.md for further instructions.

Note: If you are using an A100 GPU, we recommend installing PyTorch separately first (optional).

Install the package requirements:

pip install -r requirements.txt

Running the Code

Preparing Data

Data preparation involves generating counterfactuals and tokenizing the generated data.

Counterfactually augmented data (CDA) for the gender dimension can be generated by running the following command:

$ python -m src.cda.cda_generate --output_file {path_to_store_cda_data}/{output_filename} --wikipedia_data_dir {wiki_data_dir} 

For example, to generate data for the religion dimension with --bias_type religion:

$ python -m src.cda.cda_generate --output_file ~/project/data/wikipedia/wiki_cda/religion/raw.txt --wikipedia_data_dir ~/project/data/wikipedia/20200501.en.hf/ --bias_type religion
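
As a rough illustration of what counterfactual data augmentation does, the sketch below swaps each listed term in a sentence with its counterpart and keeps both versions. The word pairs and the names `make_counterfactual` and `augment` are hypothetical and chosen only for illustration; they are not taken from `src.cda.cda_generate`, whose implementation may differ.

```python
import re

# Hypothetical word pairs, for illustration only.
PAIRS = [("he", "she"), ("father", "mother"), ("church", "mosque")]

# Build a two-way lookup so each term maps to its counterpart.
SWAP = {}
for a, b in PAIRS:
    SWAP[a] = b
    SWAP[b] = a


def make_counterfactual(sentence):
    """Return the sentence with every listed term replaced by its counterpart."""
    def replace(match):
        word = match.group(0)
        swapped = SWAP.get(word.lower())
        if swapped is None:
            return word
        # Preserve capitalization of the first letter.
        return swapped.capitalize() if word[0].isupper() else swapped

    return re.sub(r"[A-Za-z]+", replace, sentence)


def augment(sentences):
    """Keep each original sentence and add its counterfactual when it differs."""
    out = []
    for s in sentences:
        out.append(s)
        cf = make_counterfactual(s)
        if cf != s:
            out.append(cf)
    return out
```

Sentences that contain none of the listed terms pass through unchanged, so the augmented corpus only grows where a swap actually applies.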

Generated data can then be tokenized to avoid repeated tokenization during training by running:

$ python -m src.data_utils.tokenize_lm_corpus --txt_file {path_to_store_cda_data}/{output_filename} --tokenizer_variant {tokenizer_to_use} --block_size {max_length_of_text}

For example:

$ python -m src.data_utils.tokenize_lm_corpus --txt_file ~/project/data/wikipedia/wiki_cda/religion/raw.txt --tokenizer_variant bert-base-uncased --block_size 128
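
The `--block_size` argument sets the maximum length of each tokenized example. One common way to prepare a language-modeling corpus is to cut a flat sequence of token IDs into fixed-size blocks, and the sketch below shows that idea in plain Python. Whether `src.data_utils.tokenize_lm_corpus` does exactly this, and how it treats the final partial block, is an assumption; `chunk_into_blocks` is a hypothetical helper.

```python
def chunk_into_blocks(token_ids, block_size):
    """Split a flat list of token IDs into consecutive blocks of block_size.

    The trailing partial block is dropped (an assumption made for this sketch).
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    n_full = len(token_ids) // block_size
    return [token_ids[i * block_size:(i + 1) * block_size] for i in range(n_full)]


# Stand-in token IDs; a real run would obtain these from a tokenizer.
ids = list(range(10))
blocks = chunk_into_blocks(ids, block_size=4)
```

Doing this once up front means training can read fixed-length examples directly instead of re-tokenizing the raw text every epoch.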

Train and evaluate the model

Use src/pretrain_dba.py to train a debiasing adapter (DBA) for an individual bias dimension. See scripts/run_dba.sh for more info.

Each of these debiasing adapters can be used for task-specific fine-tuning with src/finetune_on_downstream_task.py. See scripts/run_ft.sh for more info.

Finally, to fuse multiple adapters on a downstream task, use src/minimal_task_ft.py. See scripts/mafia_all_biases.sh for more info.
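
Conceptually, fusing adapters means combining the outputs of several adapters with attention weights computed per input. The sketch below is a simplified, dependency-free illustration of that mixing step only: it omits the learned query, key, and value projections that AdapterFusion uses, and it is not the code in src/minimal_task_ft.py. The function names and example vectors are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_adapter_outputs(hidden, adapter_outputs):
    """Mix adapter outputs with attention weights derived from the hidden state.

    hidden: list of floats, the layer output used as the query.
    adapter_outputs: equally sized vectors, one per adapter, used here as
    both keys and values (learned projections are omitted in this sketch).
    """
    # Dot-product score between the query and each adapter's output.
    scores = [sum(h * a for h, a in zip(hidden, out)) for out in adapter_outputs]
    weights = softmax(scores)
    dim = len(hidden)
    # Weighted sum of adapter outputs, one dimension at a time.
    fused = [
        sum(w * out[d] for w, out in zip(weights, adapter_outputs))
        for d in range(dim)
    ]
    return fused, weights


hidden = [1.0, 0.0]
# One toy output vector per adapter, e.g. gender, race, religion.
outputs = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
fused, weights = fuse_adapter_outputs(hidden, outputs)
```

Because the weights depend on the input, different examples can lean on different adapters, which is what lets adapters be added or removed without retraining the shared base model.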

Transparency Note

Overview

Multi-Adapter Fused Inclusive Language Models (MAFIA) is a method for combining debiasing adapters to reduce unintended bias in the outputs of pre-trained language models (PLMs). The full details of our research are available in our paper. The project includes code for inserting adapters into transformer blocks and training them, and for generating a dataset of counterfactual (CF) pairs and counterfactually augmented data (CDA). It also includes the dataset of counterfactual pairs we generated and an adaptation of the STS-B dataset for evaluating LLMs for race and religion bias (see the related dataset documentation in Section 3.2 of the paper).

Objective

In this study, we explored several methods for debiasing PLMs and evaluated them on various end tasks and languages. We are sharing our datasets and methods to facilitate further research and development in the debiasing of PLMs.

Intended Uses

With further testing and development, we expect MAFIA to be especially useful in an enterprise setting, where product-specific teams can easily add (or remove) DBAs for newly identified (or obsolete) bias dimensions to (or from) the base model, which is often shared across different products. At this time, the code and datasets are being released for experimental purposes to facilitate further research before any deployment in a real-world setting.

Out of Scope Uses

While this project has significant potential benefits in regulated domains such as medical, legal, and financial sectors, it should not be used in real-world settings without additional testing and development to validate the accuracy of the model. Additionally, it's important to ensure that any usage scenarios align with the regulations governing these domains.

Evaluation

We evaluate our models on various intrinsic and extrinsic (downstream) bias evaluation benchmarks and demonstrate their superior debiasing ability over related baselines. We use StereoSet and Crowdsourced Stereotype Pairs (CrowS-Pairs) to evaluate intrinsic bias in models. We use STS-B, i.e., the Semantic Textual Similarity Benchmark from GLUE (Wang et al., 2018), as our downstream task for extrinsic evaluation. We use Bias-STS-B and its adaptation for gender, race, and religion bias evaluation.

Limitations

  • These methods are primarily designed for the English language and have only been tested on a limited set of high- and low-resource languages for zero-shot evaluation. The model may not perform equally well in non-English languages.

  • We only explored the interplay between a limited set of biases, i.e., gender, race, religion, and profession, recognizing that numerous other biases such as cultural and psychological biases are also important but have not been addressed.

  • Our CF pairs are limited by the knowledge of text-davinci-003 and by presence in WikiData. For computational efficiency, the number of CF pairs is further reduced based on how frequently the entities in each pair occur in the Google Book Corpus.

  • We acknowledge that our AdapterFusion is tuned on the downstream task, which makes it task-specific and not generic.

  • We only investigated the effect of fusion on a few downstream tasks, and replicating these findings on other tasks like Bias-NLI would be an interesting study.

  • Lastly, we were also constrained by our limited computational resources, as “pretraining” the debiasing adapters consumed significant time for larger models like RoBERTa and XLM-R.

Safe and Responsible Use

This project is primarily designed for research and experimental purposes. We strongly recommend conducting further testing and validation before considering its application in industrial or real-world scenarios.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

About

Code for EACL 2024 paper - MAFIA: Multi-Adapter Fused Inclusive Language Models (https://aclanthology.org/2024.eacl-long.37/)
