Skip to content

AL for rare class strategies compared in the paper "Transfer and Active Learning for Dissonance Detection: Addressing the Rare Class Challenge" adapted, to be used out-of-the-box.

License

humanlab/rare-class-AL

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Active Learning for Rare Class

Details

This code accompanies the AL strategies used in our paper Transfer and Active Learning for Dissonance Detection: Addressing the Rare Class Challenge. The five AL methods:

These are implemented separately from the dataset introduced in our paper so that they can be used out-of-the-box on any dataset.

Active learning with needle in haystack

The PRC approach is a simple approach that picks examples most likely to be classified as the rare class by the model in each round of the active learning loop. This helps alleviate the needle-in-haystack problem which is prevalent in cases of absolute rarity (data scarcity combined with rare classes -- leading to almost no examples being found when data is collected from scratch.)

Setup

Ideally, use Python>=3.8.

pip install -r requirements.txt

The demo for each AL strategy can be found in demo.ipynb.

Citation

If you use our code, please cite our paper using the following bibtex:


@inproceedings{varadarajan2023transfer,
    title={Transfer and Active Learning for Dissonance Detection: Addressing the Rare-Class Challenge},
    author={Varadarajan, Vasudha and Juhng, Swanie and Mahwish, Syeda and Liu, Xiaoran and Luby, Jonah and Luhmann, Christian and Schwartz, H Andrew},
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Long Papers)",
    month = july,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    abstract = "While transformer-based systems have enabled greater accuracies with fewer training examples, data acquisition obstacles still persist for rare-class tasks -- when the class label is very infrequent (e.g. < 5% of samples). Active learning has in general been proposed to alleviate such challenges, but choice of selection strategy, the criteria by which rare-class examples are chosen, has not been systematically evaluated. Further, transformers enable iterative transfer-learning approaches. We propose and investigate transfer- and active learning solutions to the rare class problem of dissonance detection through utilizing models trained on closely related tasks and the evaluation of acquisition strategies, including a proposed probability-of-rare-class (PRC) approach. We perform these experiments for a specific rare class problem: collecting language samples of cognitive dissonance from social media. We find that PRC is a simple and effective strategy to guide annotations and ultimately improve model accuracy while transfer-learning in a specific order can improve the cold-start performance of the learner but does not benefit iterations of active learning.",
}

About

AL for rare class strategies compared in the paper "Transfer and Active Learning for Dissonance Detection: Addressing the Rare Class Challenge" adapted, to be used out-of-the-box.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published