Welcome to our GitHub repository! Here you will find more information about our method for estimating upper and lower bounds on the performance of weak supervision models, introduced in our NeurIPS 2024 paper [Weak Supervision Performance Evaluation via Partial Identification](https://openreview.net/forum?id=VOVyeOzZx0).
Programmatic Weak Supervision (PWS) enables supervised model training without direct access to ground truth labels by leveraging weak labels from heuristics, crowdsourcing, or pre-trained models. However, evaluating these models is challenging because traditional metrics such as accuracy, precision, and recall require labeled data. This repository introduces our implementation of a novel method for evaluating weakly supervised models by framing the task as a partial identification problem. Using Fréchet bounds, we estimate reliable performance bounds for key metrics—such as accuracy, precision, recall, and F1-score—without requiring labeled data. Our approach leverages scalable convex optimization to compute these bounds efficiently, even in high-dimensional settings. This framework provides a robust and practical solution for assessing model quality in weak supervision scenarios, overcoming core limitations in existing evaluation techniques.
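To build intuition for the partial identification idea, the classical Fréchet inequalities bound a joint probability using only its marginals: max(0, P(A) + P(B) − 1) ≤ P(A ∩ B) ≤ min(P(A), P(B)). The toy sketch below (not part of this repository's API) illustrates that idea for a single event pair; the actual method solves a scalable convex program over the weak labels, which this example does not implement.

```python
def frechet_bounds(p_a: float, p_b: float) -> tuple[float, float]:
    """Classical Fréchet bounds on P(A and B) given marginals P(A) and P(B)."""
    lower = max(0.0, p_a + p_b - 1.0)
    upper = min(p_a, p_b)
    return lower, upper

# Toy illustration: a model predicts class 1 at rate 0.6, and suppose the
# true positive-class rate is 0.5. Then P(pred = 1 and label = 1) is only
# partially identified, bounded between 0.1 and 0.5.
lo, hi = frechet_bounds(0.6, 0.5)
```

Metrics such as accuracy are functionals of exactly this kind of unidentified joint distribution, which is why they admit informative bounds rather than point estimates when labels are unavailable.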
To use the code in this repository, clone the repo and create a conda environment using:

```bash
conda env create --file=wsbounds.yaml
conda activate wsbounds
```
Please check our demo on how to use our method to evaluate a classifier trained with PWS for hate speech detection.
- Please download the Wrench data from `wrench_class.zip`, unzip the folder, and place it inside the `data` folder. The data in `wrench_class` is processed using `wsbounds/process_data.py`, in case you need to re-process it.
- Please clone https://github.com/Vicomtech/hate-speech-dataset.git into the `data` folder.
- Run `python experiments.py --exp1 --exp2 --exp3 --exp4` to re-run all experiments; the plots are generated using the notebooks inside the `notebooks` folder. The file `experiments.py` can be found inside the `wsbounds` folder. In case you need to re-generate the weak labels for the `spam` experiment, please take a look at `wsbounds/generate_weak_labels.py`.
If you use our method, please cite our paper:

```bibtex
@inproceedings{
polo2024weak,
title={Weak Supervision Performance Evaluation via Partial Identification},
author={Felipe Maia Polo and Subha Maity and Mikhail Yurochkin and Moulinath Banerjee and Yuekai Sun},
booktitle={The Thirty-eighth Annual Conference on Neural Information Processing Systems},
year={2024},
url={https://openreview.net/forum?id=VOVyeOzZx0}
}
```