BalancingGroups

Code to replicate the experimental results from the paper "Simple data balancing baselines achieve competitive worst-group-accuracy".

Replicating the main results

Installing dependencies

The easiest way to get a working environment for this repo is to create a conda environment with the following commands:

conda env create -f environment.yaml
conda activate balancinggroups

If conda is not available, please install the dependencies listed in the requirements.txt file.
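For example, a pip-based setup without conda might look like the sketch below (the virtualenv name `.venv` is an arbitrary choice; the authoritative package list lives in requirements.txt):

```shell
# Fallback without conda: create an isolated virtual environment
python -m venv .venv
. .venv/bin/activate
# Install the pinned dependencies from the repo's requirements file
pip install -r requirements.txt
```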

Download, extract, and generate metadata for datasets

This script downloads, extracts, and formats the dataset metadata so that it works with the rest of the code out of the box.

python setup_datasets.py --download --data_path data

Launch jobs

To reproduce the experiments in the paper on a SLURM cluster:

# Launching 1400 combo seeds = 50 hparams x 4 datasets x 7 algorithms
# Each combo seed is run 5 times to compute error bars, totaling 7000 jobs
python train.py --data_path data --output_dir main_sweep --num_hparams_seeds 1400 --num_init_seeds 5 --partition <slurm_partition>

If you want to run the jobs locally, omit the --partition argument.
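For instance, a small local sanity-check run (the seed counts and output directory here are illustrative, not the paper's settings) might look like:

```shell
# Local run: without --partition, jobs execute on the current machine
python train.py --data_path data --output_dir local_sweep \
  --num_hparams_seeds 2 --num_init_seeds 1
```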

Parse results

The parse.py script can generate all of the plots and tables from the paper. By default, it generates the best test worst-group-accuracy table for each dataset/method. This script can be called while the experiments are still running.

python parse.py main_sweep

License

This source code is released under the CC-BY-NC license, included here.
