This is the official code release for "From Biased Selective Labels to Pseudo-Labels: An Expectation-Maximization Framework for Learning from Biased Decisions" (ICML '24).
Sometimes, when training a machine learning model, labels are missing or noisy. Sometimes, they're even missing or noisy in systematically biased ways. We proposed Disparate Censorship Expectation-Maximization (DCEM; ICML '24) to mitigate the resulting harm to model performance and fairness.
This algorithm was inspired by a real-world problem in machine learning for healthcare: we often assume that individuals who did not receive a diagnostic test are negative. This pattern has been documented in studies of COVID-19 testing, sepsis diagnostic definitions, and more.
If you're interested in reproducing the results from our paper, please check out the README in the `legacy` folder, which contains all of our experimental code. If you're interested in applying DCEM to your own problems or learning how to implement DCEM yourself, read on.
Run `pip install -r requirements.txt` to install the required dependencies.
Here's how you can apply DCEM to your own data:
```python
from dcem import DCEM

# Setup (what you should initialize)
train_data, test_data = ...  # load this however you like
X_tr, A_tr, T_tr, Y_obs_tr = train_data  # features, protected attribute, testing decision, observed label
propensity_model = ...  # must be an nn.Module or implement `.fit()`
outcome_model = ...  # must be an nn.Module
model = DCEM(propensity_model, outcome_model)

# Training
model.fit(X_tr, A_tr, T_tr, Y_obs_tr)  # on synthetic data, you can optionally pass ground-truth labels via Y=...

# Inference
X_te, *_ = test_data
preds = model.predict_proba(X_te)[:, 1]
```
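If you don't already have models on hand, here is a minimal sketch of a propensity/outcome model pair satisfying the constraints above. The MLP architecture, hidden size, and `n_features` are illustrative assumptions, not part of the DCEM API:

```python
import torch.nn as nn

# Illustrative only: a small MLP that outputs a single logit. We assume DCEM
# accepts any nn.Module mapping a feature batch to logits; the exact inputs
# each model receives (e.g., whether A is concatenated to X for the
# propensity model) are determined by DCEM, so see demo.py for a working setup.
class MLP(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, x):
        return self.net(x)

n_features = 2  # hypothetical feature dimensionality
propensity_model = MLP(in_dim=n_features)  # estimates who gets tested/labeled
outcome_model = MLP(in_dim=n_features)     # estimates the outcome of interest
```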
We also provide a full demo of DCEM with example synthetic data in `demo.py`.
DCEM is designed for situations where labeling decisions are noisy and potentially biased (in a fairness/equity sense). In such situations, a model fit to naively predict the observed outcome will likely learn to replicate those labeling biases, which is often undesirable.
Enter DCEM: our method leverages variables that we assume do not affect the outcome of interest (such as "protected attributes") to learn a model that "compensates" for labeling biases. For a comprehensive and formal treatment of DCEM, please see our paper.
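To make the setting concrete, here is a minimal synthetic example of biased selective labeling: the protected attribute `A` affects who gets tested but not the true outcome, and untested individuals are recorded as negative. The variable names, distributions, and censoring mechanism below are illustrative assumptions; see `demo.py` and the paper for the actual data-generating process.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

A = rng.integers(0, 2, size=n)  # protected attribute (does not affect Y)
X = rng.normal(size=(n, 2))     # features
Y = (X.sum(axis=1) + rng.normal(size=n) > 0).astype(int)  # true outcome, independent of A

# Testing (labeling) decision is biased: group A=1 is tested less often.
p_test = 1 / (1 + np.exp(-(X[:, 0] - A)))  # testing propensity depends on A
T = rng.binomial(1, p_test)

# Untested individuals are recorded as negative: the selective-labels problem.
Y_obs = Y * T
```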
Contributions. We absolutely welcome contributions. This is a fairly bare-bones implementation of DCEM, but we hope to grow the functionality. Please raise an issue to discuss potential extensions or features you'd like to see before submitting a pull request.
Issues/bugs. All models are wrong; some are useful. Sadly, the same is not true of code. Please open an issue to discuss any potential bugs!
Special thanks to Gregory Kondas for help with testing the code!
Please reach out to ctrenton at umich dot edu, or file a GitHub issue, if you have any questions about our work. Thank you!