Dataset Processing

Dataset preparation

We follow the preprocessing pipelines listed below and sincerely thank the authors for their work.

Dataset    Preprocessing
ACE05      HMEAE
MAVEN      MAVEN
ERE        OmniEvent

Please store the preprocessed data in the ./data folder with the following structure:

data
  ├── ACE05_processed
  │   ├── train.json
  │   ├── dev.json
  │   └── test.json
  ├── MAVEN
  │   ├── train.jsonl
  │   ├── valid.jsonl
  │   └── test.jsonl
  └── ERE
      ├── processed
      │   ├── LDC2015E29.unified.jsonl
      │   ├── LDC2015E68.unified.jsonl 
      │   └── LDC2015E78.unified.jsonl 
      └── splits
          ├── train.doc.txt
          ├── dev.doc.txt
          └── test.doc.txt 
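
Before moving on, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of this repository: the helper name `missing_files` is hypothetical, and the file list is copied from the tree above.

```python
from pathlib import Path

# Expected files, copied from the directory tree above.
EXPECTED_FILES = [
    "ACE05_processed/train.json",
    "ACE05_processed/dev.json",
    "ACE05_processed/test.json",
    "MAVEN/train.jsonl",
    "MAVEN/valid.jsonl",
    "MAVEN/test.jsonl",
    "ERE/processed/LDC2015E29.unified.jsonl",
    "ERE/processed/LDC2015E68.unified.jsonl",
    "ERE/processed/LDC2015E78.unified.jsonl",
    "ERE/splits/train.doc.txt",
    "ERE/splits/dev.doc.txt",
    "ERE/splits/test.doc.txt",
]


def missing_files(data_root="./data"):
    """Return the expected files (relative paths) that are absent under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_files()` from the directory that contains ./data returns an empty list when every file is present.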

The ERE dataset then requires one further preprocessing step. Run:

cd ./ERE
python data_split.py

The split data is then stored in ./ERE/[train|dev|test].jsonl.
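
For orientation, the sketch below shows one way a document-level split of this kind could work. It is not the code in data_split.py. It assumes that each line of the `*.doc.txt` files holds one document ID and that each JSONL record stores its document ID under a `doc_id` key; both the key name and the function names are hypothetical.

```python
import json
from pathlib import Path


def read_doc_ids(path):
    """Read one document ID per line, ignoring blank lines."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def split_by_doc_id(ere_root=".", id_key="doc_id"):
    """Route each processed record to train/dev/test by its document ID.

    ASSUMPTION: every record carries its document ID under `id_key`;
    the real field name in the unified files may differ.
    """
    root = Path(ere_root)
    split_ids = {
        name: read_doc_ids(root / "splits" / f"{name}.doc.txt")
        for name in ("train", "dev", "test")
    }
    buckets = {name: [] for name in split_ids}

    # Scan every unified file and place each record in the first matching split.
    for src in sorted((root / "processed").glob("*.unified.jsonl")):
        with open(src, encoding="utf-8") as f:
            for line in f:
                if not line.strip():
                    continue
                record = json.loads(line)
                for name, ids in split_ids.items():
                    if record.get(id_key) in ids:
                        buckets[name].append(record)
                        break

    # Write one JSONL file per split and report how many records each received.
    for name, records in buckets.items():
        with open(root / f"{name}.jsonl", "w", encoding="utf-8") as out:
            for record in records:
                out.write(json.dumps(record, ensure_ascii=False) + "\n")
    return {name: len(records) for name, records in buckets.items()}
```

Records whose document ID appears in none of the split lists are dropped in this sketch.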

Few-shot Dataset Construction

We conduct our empirical study under two task settings: (1) the low-resource setting and (2) the class-transfer setting. Detailed definitions of both are given in our paper.

Low-resource Setting

cd ./k_shot
bash run.sh [ACE|MAVEN|ERE]

The constructed few-shot datasets are written to ./k_shot/fewshot_set.
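
As a rough illustration of what a K-shot subset is, the sketch below draws up to K examples per label from a labeled pool. It is a generic sampler, not the procedure implemented in ./k_shot. The function name, the `event_type` key, and the one-label-per-example format are all assumptions. Real event detection data can contain several event mentions per sentence, which makes exact K-shot sampling more involved.

```python
import random
from collections import defaultdict


def sample_k_shot(examples, k, label_key="event_type", seed=0):
    """Pick up to k examples per label, reproducibly for a given seed.

    ASSUMPTION: each example is a dict with exactly one label under `label_key`.
    """
    rng = random.Random(seed)

    # Group the pool by label.
    by_label = defaultdict(list)
    for example in examples:
        by_label[example[label_key]].append(example)

    # Sample without replacement; labels with fewer than k examples keep all of them.
    subset = []
    for label in sorted(by_label):
        pool = by_label[label]
        subset.extend(rng.sample(pool, min(k, len(pool))))
    return subset
```

Fixing the seed keeps the sampled subset identical across runs, which matters when comparing models on the same few-shot split.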

Class-transfer Setting

cd ./class_transfer
bash run.sh [ACE|MAVEN|ERE]

The constructed few-shot datasets are written to ./class_transfer/fewshot_set.
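
Both settings use the same command pattern, so the runs for all three datasets can be scripted. The following is a minimal sketch of such a driver, assuming it is started from this directory. The function names `build_commands` and `run_all` are hypothetical; the folder names and `run.sh` arguments are the ones given above.

```python
import subprocess

SETTINGS = ("k_shot", "class_transfer")
DATASETS = ("ACE", "MAVEN", "ERE")


def build_commands(settings=SETTINGS, datasets=DATASETS):
    """Return (working_dir, argv) pairs, one per setting and dataset."""
    return [
        (f"./{setting}", ["bash", "run.sh", dataset])
        for setting in settings
        for dataset in datasets
    ]


def run_all(dry_run=True):
    """Print each command; execute it as well when dry_run is False."""
    for cwd, argv in build_commands():
        print(f"(cd {cwd} && {' '.join(argv)})")
        if not dry_run:
            # check=True stops at the first failing run.
            subprocess.run(argv, cwd=cwd, check=True)
```

`run_all()` only prints the six commands; `run_all(dry_run=False)` executes them in order.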