Besides the DISAPERE dataset, this repository contains code for:
- Running the DISAPERE annotation server
- Processing the output of the DISAPERE annotation server
- Training the classification model on the DISAPERE dataset (Section ?? in the paper)
- Training the alignment models on the DISAPERE dataset (Section ?? in the paper)
- Producing the analysis plots from Section ?? in the paper
- Running the DISAPERE browser

We also include the anonymized raw annotations from which DISAPERE was produced.

    conda create --name disapere_env python=3.8
    conda activate disapere_env
    python -m pip install -r data_requirements.txt

    python handle_database.py -a <annotation_file> -t <text_file>
    python clean_examples.py

These commands produce the DISAPERE dataset from the original server output. For new server output, data/prep/subset_map.json may need some changes.
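
Before changing data/prep/subset_map.json for new server output, it can help to look at its current top-level structure. The sketch below is only an illustration: it assumes the file is valid JSON and assumes nothing about its schema. The helper name `describe_top_level` is hypothetical and is not part of this repository.

```python
import json
from pathlib import Path


def describe_top_level(data):
    """Summarize the top level of an already-parsed JSON value.

    Hypothetical helper for inspection only; it does not interpret
    the contents of subset_map.json in any way.
    """
    if isinstance(data, dict):
        # One pair per top-level key: the key and the type name of its value.
        return [(key, type(value).__name__) for key, value in data.items()]
    if isinstance(data, list):
        # A top-level array: report its type name and length.
        return [("list", len(data))]
    # Any other JSON value (string, number, boolean, null).
    return [(type(data).__name__, None)]


# Path taken from the text above; the summary is printed only if the file exists.
MAP_PATH = Path("data/prep/subset_map.json")

if MAP_PATH.exists():
    with MAP_PATH.open(encoding="utf-8") as handle:
        for entry in describe_top_level(json.load(handle)):
            print(entry)
```

Run from the repository root, this prints one line per top-level key, which gives a quick view of what might need to change without modifying the file.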