Dataset Processing

Dataset Preparation

We follow the preprocessing pipelines listed below and sincerely thank the authors for their previous work.

| Dataset | Preprocessing |
| --- | --- |
| ACE05 | HMEAE |
| MAVEN | MAVEN |
| ERE | OmniEvent |

Please store the preprocessed data in the ./data folder with the structure below:

data
  ├── ACE05_processed
  │   ├── train.json
  │   ├── dev.json
  │   └── test.json
  ├── MAVEN
  │   ├── train.jsonl
  │   ├── valid.jsonl
  │   └── test.jsonl
  └── ERE
      ├── processed
      │   ├── LDC2015E29.unified.jsonl
      │   ├── LDC2015E68.unified.jsonl 
      │   └── LDC2015E78.unified.jsonl 
      └── splits
          ├── train.doc.txt
          ├── dev.doc.txt
          └── test.doc.txt 
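
Before moving on, it can help to confirm that the layout matches the tree above. The following is a minimal sketch, not part of the repository: the names `EXPECTED_FILES` and `missing_files` are hypothetical, and the file list is copied directly from the tree.

```python
from pathlib import Path

# Expected files, copied from the directory tree above.
EXPECTED_FILES = {
    "ACE05_processed": ["train.json", "dev.json", "test.json"],
    "MAVEN": ["train.jsonl", "valid.jsonl", "test.jsonl"],
    "ERE/processed": [
        "LDC2015E29.unified.jsonl",
        "LDC2015E68.unified.jsonl",
        "LDC2015E78.unified.jsonl",
    ],
    "ERE/splits": ["train.doc.txt", "dev.doc.txt", "test.doc.txt"],
}


def missing_files(data_root="./data"):
    """Return the expected paths that do not exist under data_root."""
    root = Path(data_root)
    missing = []
    for subdir, names in EXPECTED_FILES.items():
        for name in names:
            path = root / subdir / name
            if not path.is_file():
                missing.append(str(path))
    return missing


if __name__ == "__main__":
    problems = missing_files()
    if problems:
        print("Missing files:")
        for p in problems:
            print("  " + p)
    else:
        print("All expected dataset files are present.")
```

Running the script from the repository root lists any file that still needs to be placed under ./data.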

The ERE dataset then requires one further preprocessing step. Run:

cd ./ERE
python data_split.py

The resulting splits are then stored in ./ERE/[train|dev|test].jsonl.
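
The contents of data_split.py are not reproduced here. As a rough illustration of what a document-level split of this kind involves, the sketch below assigns JSONL records to train, dev, and test by looking up a document identifier in the three split lists. The field name `doc_id`, the helper names `load_split_ids` and `split_records`, and the one-identifier-per-line layout of the split files are all assumptions made for illustration, not the script's actual behavior.

```python
import json
from pathlib import Path


def load_split_ids(splits_dir):
    """Map each document id to its split name (assumes one id per line)."""
    doc_to_split = {}
    for split in ("train", "dev", "test"):
        path = Path(splits_dir) / f"{split}.doc.txt"
        for line in path.read_text(encoding="utf-8").splitlines():
            doc_id = line.strip()
            if doc_id:
                doc_to_split[doc_id] = split
    return doc_to_split


def split_records(jsonl_paths, doc_to_split, id_field="doc_id"):
    """Group JSONL records by split; records with unlisted ids are skipped."""
    buckets = {"train": [], "dev": [], "test": []}
    for path in jsonl_paths:
        with open(path, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                record = json.loads(line)
                # `id_field` is a hypothetical key; adjust to the real schema.
                split = doc_to_split.get(record.get(id_field))
                if split is not None:
                    buckets[split].append(record)
    return buckets
```

Splitting by document rather than by sentence keeps all sentences from one document in the same split, which avoids leakage between train and test.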

Few-shot Dataset Construction

We conduct our empirical study on two task settings: (1) the low-resource setting and (2) the class-transfer setting. Detailed definitions of both are given in our paper.

Low-resource Setting

cd ./k_shot
bash run.sh [ACE|MAVEN|ERE]

You can find the constructed few-shot dataset in ./k_shot/fewshot_set.

Class-transfer Setting

cd ./class_transfer
bash run.sh [ACE|MAVEN|ERE]

You can find the constructed few-shot dataset in ./class_transfer/fewshot_set.