Cartography Active Learning

This repository contains the code and data for the paper:

Mike Zhang and Barbara Plank. 2021. Cartography Active Learning. In Findings of the Association for Computational Linguistics: EMNLP 2021.

Repository

In this repository you will find:

project/src/*: all the code for the experiments.
project/resources/data/*: the data used in our paper.
run.sh: all commands necessary to rerun the experiments in the paper.
requirements.txt: all packages necessary for reproducibility.
.env: all environment variables.

Important note: if you do not run all the scripts sequentially, first create the output directories with

mkdir -p project/{resources/{cartography_plots,embeddings,indices,mapping},results/{agnews,trec},plots/{agnews,trec}}

so that the scripts have folders to write their files into.
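
The command above relies on brace expansion, which bash and zsh support but a plain POSIX sh does not. For such environments, here is a minimal Python sketch that creates the same folders. The helper name make_project_dirs is hypothetical and not part of this repository; the directory names are copied from the mkdir command.

```python
from pathlib import Path

# Directory layout copied from the mkdir command in this README.
SUBDIRS = [
    "resources/cartography_plots",
    "resources/embeddings",
    "resources/indices",
    "resources/mapping",
    "results/agnews",
    "results/trec",
    "plots/agnews",
    "plots/trec",
]


def make_project_dirs(root="project"):
    """Create the output folders under `root` (hypothetical helper).

    Existing folders are left untouched, mirroring `mkdir -p`.
    """
    created = []
    for sub in SUBDIRS:
        path = Path(root) / sub
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


if __name__ == "__main__":
    for path in make_project_dirs():
        print(path)
```

Run it from the repository root so that the folders end up under project/, next to src/ and resources/data/.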

Citation

@inproceedings{zhang-plank-2021-cartography-active,
    title = "Cartography Active Learning",
    author = "Zhang, Mike  and
      Plank, Barbara",
    booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
    month = nov,
    year = "2021",
    address = "Punta Cana, Dominican Republic",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.findings-emnlp.36",
    pages = "395--406",
    abstract = "We propose Cartography Active Learning (CAL), a novel Active Learning (AL) algorithm that exploits the behavior of the model on individual instances during training as a proxy to find the most informative instances for labeling. CAL is inspired by data maps, which were recently proposed to derive insights into dataset quality (Swayamdipta et al., 2020). We compare our method on popular text classification tasks to commonly used AL strategies, which instead rely on post-training behavior. We demonstrate that CAL is competitive to other common AL methods, showing that training dynamics derived from small seed data can be successfully used for AL. We provide insights into our new AL method by analyzing batch-level statistics utilizing the data maps. Our results further show that CAL results in a more data-efficient learning strategy, achieving comparable or better results with considerably less training data.",
}

Contact

If there is any issue, please reach out to Mike Zhang (mikz@itu.dk) or create an issue in this repository.
