This is the code and data for the paper Entity-Switched Datasets: An Approach to Auditing the In-Domain Robustness of Named Entity Recognition Models.
Currently, the code is limited to replacing all entity types in the CoNLL'03 data, which produces the exact dataset used in the paper. We will soon release the code for creating the other datasets described in the paper, as well as code to substitute any entities of your choice.
Follow these steps to create the data:
- Follow the instructions here to obtain the CoNLL'03 data and generate eng.testb
- Run the script as follows: python3 switch_entities.py <path-to-eng.testb> <path-to-entity-list> <path-to-output-file>. The entity lists are in the entities folder.
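
To give a rough idea of what entity switching means at the token level, here is a minimal sketch. It is not the code in switch_entities.py. The function names (find_spans, switch_entities), the in-memory sentence format of (word, tag) pairs, and the choice to write IOB2 tags on the replaced spans are all assumptions made for illustration. The example names in the usage lines are invented and are not taken from the entities folder.

```python
import random
from typing import Dict, List, Optional, Tuple

Token = Tuple[str, str]  # (word, NER tag), e.g. ("Paris", "I-LOC")


def find_spans(tags: List[str]) -> List[Tuple[int, int, str]]:
    """Return (start, end, type) for each entity span, end exclusive.

    Accepts both IOB1 and IOB2 tagging: a span ends at an "O" tag,
    at a change of entity type, or at a "B-" tag.
    """
    spans = []
    start: Optional[int] = None
    cur_type: Optional[str] = None
    for i, tag in enumerate(tags):
        if tag == "O":
            prefix, etype = "O", None
        else:
            prefix, etype = tag.split("-", 1)
        # Close the open span if this token does not continue it.
        if start is not None and (etype != cur_type or prefix == "B"):
            spans.append((start, i, cur_type))
            start, cur_type = None, None
        # Open a new span if this token is part of an entity.
        if etype is not None and start is None:
            start, cur_type = i, etype
    if start is not None:
        spans.append((start, len(tags), cur_type))
    return spans


def switch_entities(
    sentence: List[Token],
    replacements: Dict[str, List[str]],
    rng: random.Random,
) -> List[Token]:
    """Replace each entity span with a random name of the same type.

    Entity types without a replacement list are copied unchanged.
    Replaced spans are tagged in IOB2 (an assumption of this sketch).
    """
    tags = [tag for _, tag in sentence]
    out: List[Token] = []
    pos = 0
    for start, end, etype in find_spans(tags):
        out.extend(sentence[pos:start])  # tokens before the span
        candidates = replacements.get(etype)
        if not candidates:
            out.extend(sentence[start:end])
        else:
            new_words = rng.choice(candidates).split()
            for j, word in enumerate(new_words):
                out.append((word, ("B-" if j == 0 else "I-") + etype))
        pos = end
    out.extend(sentence[pos:])  # tokens after the last span
    return out


# Usage with invented example names (hypothetical, not from the entity lists).
example = [("John", "I-PER"), ("Smith", "I-PER"), ("visited", "O"),
           ("Paris", "I-LOC"), (".", "O")]
names = {"PER": ["Amara Okafor"], "LOC": ["Addis Ababa"]}
print(switch_entities(example, names, random.Random(0)))
```

Because the replacement can have a different number of tokens than the original span, the output sentence is rebuilt token by token rather than edited in place.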