Dataset and baselines for the paper "MUSIED: A Benchmark for Event Detection from Multi-Source Heterogeneous Informal Texts".
The dataset can be obtained from the “data” folder. The data format is introduced in this document.
Run preprocessing.py to obtain the sentence-level input for the models. The results are saved in the data directory:
├── data
│   ├── train_sentence.json
│   ├── dev_sentence.json
│   └── test_sentence.json
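
After preprocessing, it can help to confirm that the three files in the listing above exist before training a baseline. The snippet below is a minimal sketch, not part of the released code: the helper names `missing_outputs` and `load_split` are hypothetical, and because the JSON schema is not assumed here, the loader simply tries a single JSON document first and falls back to one JSON object per line.

```python
import json
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = ("train_sentence.json", "dev_sentence.json", "test_sentence.json")


def missing_outputs(data_dir="data"):
    """Return the expected sentence-level files that are absent from data_dir."""
    root = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


def load_split(path):
    """Load one split without assuming its schema.

    Assumption: the file is either a single JSON document or
    one JSON object per line.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Fall back to line-delimited JSON, skipping blank lines.
        return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    missing = missing_outputs()
    if missing:
        print("Missing files, rerun preprocessing.py:", ", ".join(missing))
    else:
        print("All sentence-level files are present.")
```

If any file is reported missing, rerunning preprocessing.py should regenerate it.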
We release the source code for the following baselines.
Sentence-level models:
--DMCNN
--BiLSTM
--BERT
--C-BiLSTM
--DMBERT
Document-level models:
--HBTNGMA
--MLBiNet