any details to setup the dataset? #2

SeekPoint · 2018-11-23T07:58:14Z

No description provided.

VictorSanh · 2018-11-25T19:06:43Z

Hello, thanks for raising this question.

We used pre-trained word embeddings (GloVe and ELMo). You can use the script scripts/data_setup.sh to download them and place them in a data folder.

Other datasets are also expected to be in the data folder (see the paths in the configuration files configs/*.json).
For instance, we compile the CoNLL2012 coreference data using this script from AllenNLP: https://github.com/allenai/allennlp/blob/master/scripts/compile_coref_data.sh
It compiles the CoNLL2012 data and dumps the coreference annotations into a single file.
For CoNLL NER, it is basically the same data as for coreference, except that it has not been dumped into a single file (we can probably do something quick to avoid this data duplication).
Concerning the ACE data, we pre-process it so that the Mention Detection data matches a CoNLL-NER format and the Relation Extraction data matches a CoNLL-SRL format. Both are saved in a data/ace2005 folder.
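
To illustrate what "matching a CoNLL-NER format" can look like for mention detection, here is a minimal sketch that turns token-level mention spans into BIO tags, one token per line. The two-column layout, the inclusive span convention, and the names spans_to_bio and to_conll_lines are assumptions made for illustration; the exact column layout used in this repository is not stated in this thread. The sketch also assumes mentions do not overlap.

```python
from typing import List, Tuple


def spans_to_bio(tokens: List[str], spans: List[Tuple[int, int, str]]) -> List[str]:
    """Convert (start, end, label) spans into BIO tags.

    Assumption: `end` is inclusive and spans do not overlap.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end + 1):
            tags[i] = f"I-{label}"
    return tags


def to_conll_lines(tokens: List[str], tags: List[str]) -> List[str]:
    """Render one 'token tag' pair per line (hypothetical two-column layout)."""
    return [f"{token} {tag}" for token, tag in zip(tokens, tags)]


if __name__ == "__main__":
    tokens = ["John", "Smith", "visited", "Paris", "."]
    spans = [(0, 1, "PER"), (3, 3, "GPE")]
    for line in to_conll_lines(tokens, spans_to_bio(tokens, spans)):
        print(line)
```

ACE mentions can be nested, so a real converter has to decide which mention wins when spans overlap; this sketch leaves that decision out.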

If you want to use other datasets, it makes sense to place them in the data folder and use (or, if needed, modify) the dataset_readers classes.
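
Since the configuration files reference paths inside the data folder, a quick check before training can show which files are still missing. The sketch below walks every string value in configs/*.json and reports those that start with data/ but do not exist on disk. It assumes the configs are plain JSON, that data paths are written relative to the repository root, and that they begin with the prefix data/; the helper names iter_strings and missing_data_paths are hypothetical.

```python
import glob
import json
import os


def iter_strings(node):
    """Yield every string value found anywhere in a parsed JSON structure."""
    if isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)
    elif isinstance(node, str):
        yield node


def missing_data_paths(config_dir="configs", data_prefix="data/"):
    """Map each config file to the data paths it mentions that do not exist.

    Assumption: any string value starting with `data_prefix` is a path.
    """
    missing = {}
    for config_path in sorted(glob.glob(os.path.join(config_dir, "*.json"))):
        try:
            with open(config_path, encoding="utf-8") as handle:
                config = json.load(handle)
        except json.JSONDecodeError:
            # Skip files that are not plain JSON (for example, with comments).
            continue
        absent = sorted(
            {
                value
                for value in iter_strings(config)
                if value.startswith(data_prefix) and not os.path.exists(value)
            }
        )
        if absent:
            missing[config_path] = absent
    return missing


if __name__ == "__main__":
    # Run from the repository root so that relative paths resolve.
    for config_path, paths in missing_data_paths().items():
        print(config_path)
        for path in paths:
            print("  missing:", path)
```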

Victor

Evpok · 2018-11-28T16:49:56Z

Any hope of releasing the ACE → CoNLL preprocessor?

VictorSanh · 2018-11-30T16:23:07Z

Hey,
I am just attaching a really basic script I used for pre-processing: https://gist.github.com/VictorSanh/6cfce8bad8a80d3ba1cd1c95aba2216d
It is a simple adaptation of this data processor from Miwa and Bansal: https://github.com/tticoin/LSTM-ER/tree/master/data/ace2005

Evpok · 2018-11-30T17:03:43Z

Thanks!

djshowtime · 2019-04-10T03:28:39Z

Hi,
I want to reproduce your NER results. However, I ran into a problem when setting up the CoNLL-2012 data.

I used this script: https://github.com/allenai/allennlp/blob/master/scripts/compile_coref_data.sh
It reported that there is no .parse file in the folder:

could not find the gold parse [.//data/files/data/english/annotations/bc/cctv/00/cctv_0001.parse] in the ontonotes distribution ... exiting ...

cat: 'conll-2012/v4/data/development/data/english/annotations/*/*/*/*.v4_gold_conll': No such file or directory
cat: 'conll-2012/v4/data/train/data/english/annotations/*/*/*/*.v4_gold_conll': No such file or directory
cat: 'conll-2012/v4/data/test/data/english/annotations/*/*/*/*.v4_gold_conll': No such file or directory
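
The log above points at two things that may be worth checking: whether any .parse files exist under the OntoNotes annotations directory the script was pointed at, and whether any *.v4_gold_conll files were produced for each split. The sketch below counts both. The glob pattern is copied from the cat messages in the log; the function names and the idea of passing the annotations directory as an argument are assumptions for illustration, not part of the AllenNLP script.

```python
import glob
import os

SPLITS = ("development", "train", "test")


def gold_conll_pattern(root, split):
    """Build the same glob pattern that appears in the `cat` errors above."""
    return os.path.join(
        root, "conll-2012", "v4", "data", split, "data", "english",
        "annotations", "*", "*", "*", "*.v4_gold_conll",
    )


def count_gold_conll(root="."):
    """Count generated *.v4_gold_conll files per split under `root`."""
    return {
        split: len(glob.glob(gold_conll_pattern(root, split)))
        for split in SPLITS
    }


def count_parse_files(annotations_dir):
    """Count *.parse files anywhere below an OntoNotes annotations directory."""
    pattern = os.path.join(annotations_dir, "**", "*.parse")
    return len(glob.glob(pattern, recursive=True))


if __name__ == "__main__":
    # Run from the directory that contains the conll-2012 folder.
    for split, count in count_gold_conll(".").items():
        print(f"{split}: {count} gold CoNLL files")
```

If count_parse_files returns 0 for the directory given to the script, the gold parses were not found at that location, which would be consistent with the first message in the log; a count of 0 from count_gold_conll would then explain the three cat errors that follow.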

VictorSanh closed this as completed Nov 30, 2018

VictorSanh mentioned this issue Dec 17, 2018

Chinese is not supported？ #5

Closed

parap1uie-s mentioned this issue Jan 22, 2019

Any details to setup the ACE2005 dataset? #8

Closed

djshowtime mentioned this issue Apr 10, 2019

conll2012 setup issue #13

Closed
