STAD: Self-Training with Ambiguous Data for Low-Resource Relation Extraction

This is the implementation of our work for COLING-2022.

Overview

Self-training has proven effective for improving NLP tasks. The common practice is to construct confident synthetic data via a probability threshold. In this work, we propose a novel self-training framework for low-resource relation extraction that models ambiguous data. Specifically, we first propose a method to identify ambiguous but useful instances among the uncertain instances, and then divide the relations into a candidate-label set and a negative-label set for each ambiguous instance. Next, we propose a set-negative training method on the negative-label sets of the ambiguous instances and a positive training method for the confident instances. Finally, a joint-training method is proposed to build the final relation extraction system on all data. Experimental results on SemEval-2010 Task8 and Re-TACRED demonstrate the effectiveness of the proposed method. Extensive analyses on top-N evaluation provide a deeper understanding of how the proposed methods learn from the ambiguous data.
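
As a rough illustration of the label-set split and the two training signals described above, here is a minimal sketch in plain Python. It is not the code in src/. The function names, the top-k rule for building the candidate-label set, and the exact form of the set-negative loss (penalizing the total probability mass on the negative-label set) are assumptions made for this example.

```python
import math
from typing import List, Set, Tuple


def split_label_sets(probs: List[float], max_candidates: int = 2) -> Tuple[Set[int], Set[int]]:
    """Split relation indices into a candidate-label set and a negative-label set.

    Assumption: the candidate set is the top-`max_candidates` relations by
    predicted probability; all remaining relations form the negative set.
    """
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    candidates = set(ranked[:max_candidates])
    negatives = set(ranked[max_candidates:])
    return candidates, negatives


def set_negative_loss(probs: List[float], negatives: Set[int], eps: float = 1e-12) -> float:
    """Assumed loss form: -log(1 - total probability on the negative-label set)."""
    negative_mass = sum(probs[i] for i in negatives)
    return -math.log(max(1.0 - negative_mass, eps))


def positive_loss(probs: List[float], label: int, eps: float = 1e-12) -> float:
    """Standard negative log-likelihood for a confident instance."""
    return -math.log(max(probs[label], eps))


# Hypothetical ambiguous instance: two relations are almost equally likely.
ambiguous_probs = [0.45, 0.40, 0.10, 0.05]
candidates, negatives = split_label_sets(ambiguous_probs, max_candidates=2)
ambiguous_loss = set_negative_loss(ambiguous_probs, negatives)

# Hypothetical confident instance: trained with its predicted label.
confident_probs = [0.95, 0.03, 0.01, 0.01]
confident_loss = positive_loss(confident_probs, label=0)
```

In this sketch the ambiguous instance never commits to one of its two candidate relations. The model is only pushed away from the relations in the negative-label set, which is the intuition behind training on ambiguous data.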

Preparation

Download the SemEval-2010 Task8 and Re-TACRED datasets.

Split the data into our low-resource setting using the script src/preprocess/building_low_resource.py.

Dependencies

python >= 3.6
pytorch >= 1.0.1

Running

Set exp_data_dir for the output and model_name_or_path for loading BERT in running_for_re_tacred_exclude_NA_cross_data.py.

The train_small_base part trains the SUPERVISED model on the small labeled dataset.
The train_self_training part trains the SELF-TRAINING model: it tags the unlabeled data with the SUPERVISED model above and then merges the confident data with the small labeled data.
The train_self_training_partial_negative_and_ablation part trains our STAD model and its ablation models.
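
The data-construction step of the SELF-TRAINING stage (tag the unlabeled data, keep the confident predictions, merge them with the labeled data) can be sketched as follows. This is a simplified illustration, not the logic in running_for_re_tacred_exclude_NA_cross_data.py. The helper build_self_training_data, the predict_proba callable, and the 0.9 threshold are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Example = Tuple[str, int]  # (sentence, relation index)


def build_self_training_data(
    labeled: Sequence[Example],
    unlabeled: Sequence[str],
    predict_proba: Callable[[str], List[float]],
    threshold: float = 0.9,  # assumed value, chosen only for illustration
) -> List[Example]:
    """Merge the labeled data with confidently pseudo-labeled instances."""
    merged = list(labeled)
    for sentence in unlabeled:
        probs = predict_proba(sentence)  # teacher (SUPERVISED) model output
        best = max(range(len(probs)), key=lambda i: probs[i])
        if probs[best] >= threshold:
            # Confident prediction: keep it as a pseudo-labeled instance.
            merged.append((sentence, best))
    return merged


# Toy teacher model: a lookup table standing in for the SUPERVISED model.
toy_predictions = {
    "sentence A": [0.97, 0.02, 0.01],
    "sentence B": [0.50, 0.45, 0.05],
}
labeled_data = [("sentence L", 2)]
merged_data = build_self_training_data(
    labeled_data,
    ["sentence A", "sentence B"],
    lambda s: toy_predictions[s],
)
```

Here "sentence B" falls below the threshold and is left out of the merged data. Instances of this kind are the uncertain ones that STAD tries to exploit rather than drop.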

Research Citation

If the code is useful for your research project, we would appreciate it if you cite the following paper:

@article{yu2022stad,
  title={STAD: Self-Training with Ambiguous Data for Low-Resource Relation Extraction},
  author={Yu, Junjie and Wang, Xing and Zhao, Jiangjiang and Yang, Chunjie and Chen, Wenliang},
  journal={arXiv preprint arXiv:2209.01431},
  year={2022}
}
