NoiseBench is a benchmark for measuring the impact of real label noise on named entity recognition. It is based on a subset of the English CoNLL-03 dataset and provides 7 label sets: 1 clean (ground-truth) set and 6 noisy variants:
- Clean
- Expert noise
- Crowd noise
- Crowd noise (best-case)
- Distant supervision noise
- Weak supervision noise
- LLM noise
We provide the annotation-only files in `data/annotations`. The annotations follow the IOB2 scheme. Due to the license of the Reuters Corpus that CoNLL-03 is based on, the tokens in the included sentences are masked with `[TOK]`. We use the CleanCoNLL annotations as the ground truth.
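For illustration, the sketch below shows one way the masked annotations could be recombined with a local CleanCoNLL copy. The column layout (whitespace-separated, one token per line, label in the last column, blank lines between sentences) and the file paths are assumptions on our part; `scripts/generate_data_files.py` remains the authoritative implementation.

```python
# Illustrative sketch only: re-attach real tokens to the [TOK]-masked
# annotation files. Paths and column layout are assumptions; see
# scripts/generate_data_files.py for the actual implementation.
def unmask(annotations_path: str, cleanconll_path: str, out_path: str) -> None:
    with open(annotations_path) as ann, open(cleanconll_path) as src, \
         open(out_path, "w") as out:
        for ann_line, src_line in zip(ann, src):
            if not ann_line.strip():       # blank line = sentence boundary
                out.write("\n")
                continue
            token = src_line.split()[0]    # real token from CleanCoNLL
            label = ann_line.split()[-1]   # IOB2 label from the annotation file
            out.write(f"{token}\t{label}\n")
```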
The script `create_noisebench.sh` generates the NoiseBench dataset variants in `data/noisebench`.

- Run the script:

  ```bash
  bash create_noisebench.sh
  ```

Alternatively (if the `git clone` command from Option 1 is not available):

- Download the full CleanCoNLL dataset into the `data/cleanconll` folder, following the instructions at https://github.com/flairNLP/CleanCoNLL.git.
- Create the noisy datasets:

  ```bash
  python scripts/generate_data_files.py
  ```
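As a quick sanity check (not part of the repository), you can count the sentences in each generated file; the `data/noisebench` layout below is an assumption based on the output folder named above.

```python
# Quick sanity check (illustrative): count sentences per generated file,
# assuming CoNLL-style files with blank lines separating sentences.
from pathlib import Path

for path in sorted(Path("data/noisebench").iterdir()):
    if path.is_file():
        blocks = path.read_text(encoding="utf-8").split("\n\n")
        print(f"{path.name}: {sum(1 for b in blocks if b.strip())} sentences")
```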
- Install the requirements:

  ```bash
  conda create -n noisebench python=3.10
  conda activate noisebench
  pip install -r requirements.txt
  ```
- Run the main experiment script:

  ```bash
  python main.py --config configs/exp1_real_noise.json
  ```
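If you want to inspect or adjust the experiment settings first, the config is a plain JSON file and can be pretty-printed before launching a run (no specific field names are assumed here):

```python
# Pretty-print the experiment configuration before running it.
import json

with open("configs/exp1_real_noise.json") as f:
    config = json.load(f)
print(json.dumps(config, indent=2, sort_keys=True))
```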
- Run the simulated noise generation (a generic sketch of label flipping follows below):

  ```bash
  python scripts/calculate_data_overviews.py
  python scripts/create_simulated_noisy_sets.py
  ```
- Run the main experiment script:

  ```bash
  python main.py --config configs/exp1_simulated_noise.json
  ```
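For intuition, simulated label noise is commonly created by randomly flipping tags at a target noise rate. The sketch below shows plain uniform flipping over the CoNLL-03 tag set; it is only a hedged illustration of the general technique, not a description of what `scripts/create_simulated_noisy_sets.py` actually does (which may, for example, flip whole entity spans rather than single tags).

```python
# Uniform label flipping (illustrative): each tag is replaced by a
# different, uniformly chosen tag with probability `noise_rate`.
import random

LABELS = ["O", "B-PER", "I-PER", "B-ORG", "I-ORG",
          "B-LOC", "I-LOC", "B-MISC", "I-MISC"]

def flip_labels(tags: list[str], noise_rate: float = 0.1, seed: int = 42) -> list[str]:
    rng = random.Random(seed)
    return [
        rng.choice([l for l in LABELS if l != tag]) if rng.random() < noise_rate
        else tag
        for tag in tags
    ]

# Example: flip each tag with 50% probability (deterministic given the seed).
print(flip_labels(["O", "B-PER", "I-PER", "O", "B-LOC"], noise_rate=0.5))
```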