Creating custom dataset from scratch #27

appledora · 2021-09-13T08:52:48Z

I was wondering whether you download the training data in the original SciERC format (as shown here) and then reformat it internally before training the model. I'm asking because I'm a little confused about whether to format my custom dataset like SciERC or in your input data format, as shown in the README.md. Also, SciERC annotates sentence-wise, as per the aforementioned link. How does PURE handle multi-sentence passages? The pretrained models downloaded via the link in the README.md are also labeled as single-sentence models.


a3616001 · 2021-09-13T14:10:46Z

Hi, we downloaded the processed SciERC data (link), which is already in the same format as shown in the README.
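For anyone building a custom dataset in that format, here is a minimal sketch (not part of the PURE code) that turns sentence-level annotations with sentence-local token offsets into one document record. It assumes the layout described in the README: parallel "sentences", "ner" and "relations" lists with one entry per sentence, span boundaries indexed over the whole document with inclusive ends, and one JSON object per line in the output file. The function name `to_pure_record` and the example labels are illustrative only.

```python
import json
from typing import List, Tuple

# Hypothetical input types: spans use sentence-local, inclusive token offsets.
Entity = Tuple[int, int, str]              # (start, end, label)
Relation = Tuple[int, int, int, int, str]  # (start1, end1, start2, end2, label)


def to_pure_record(doc_key: str,
                   sentences: List[List[str]],
                   ner: List[List[Entity]],
                   relations: List[List[Relation]]) -> dict:
    """Build one document dict whose span offsets are document-level.

    Assumes the README layout: one inner list per sentence in "sentences",
    "ner" and "relations", with boundaries indexed over the whole document.
    """
    record = {"doc_key": doc_key, "sentences": [], "ner": [], "relations": []}
    offset = 0  # number of tokens in all preceding sentences
    for tokens, ents, rels in zip(sentences, ner, relations):
        record["sentences"].append(list(tokens))
        record["ner"].append([[s + offset, e + offset, lab] for s, e, lab in ents])
        record["relations"].append(
            [[s1 + offset, e1 + offset, s2 + offset, e2 + offset, lab]
             for s1, e1, s2, e2, lab in rels])
        offset += len(tokens)
    return record


if __name__ == "__main__":
    doc = to_pure_record(
        "demo-1",
        [["We", "use", "BERT", "."], ["BERT", "is", "a", "model", "."]],
        [[(2, 2, "Method")], [(0, 0, "Method"), (3, 3, "Generic")]],
        [[], [(0, 0, 3, 3, "HYPONYM-OF")]],
    )
    # Write one JSON object per line (assumed to match the processed files).
    print(json.dumps(doc))
```

In the example, the second sentence starts at document token 4, so its "BERT" entity becomes `[4, 4, "Method"]`.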

In our experiments, we do not consider cross-sentence relations. All the datasets (ACE04, ACE05, SciERC) are annotated at the sentence level, so if a passage contains multiple sentences, we treat each sentence individually. For cross-sentence relations, you can concatenate sentences and apply our approach to the concatenated inputs to predict cross-sentence relations.
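A minimal sketch of the concatenation idea, assuming the README format with document-level token offsets: regrouping consecutive sentences into one input does not change any token index, so the per-sentence "ner" and "relations" lists can simply be concatenated. The helper name `merge_sentences` and its `window` parameter are hypothetical, and the sketch ignores the encoder's maximum input length, which longer windows can exceed.

```python
from typing import Any, Dict


def merge_sentences(doc: Dict[str, Any], window: int = 2) -> Dict[str, Any]:
    """Regroup every `window` consecutive sentences into one input.

    Tokens keep their order, so document-level span offsets stay valid;
    the annotation lists of the grouped sentences are concatenated.
    """
    if window < 1:
        raise ValueError("window must be >= 1")
    merged = {"doc_key": doc["doc_key"], "sentences": [], "ner": [], "relations": []}
    n = len(doc["sentences"])
    for i in range(0, n, window):
        j = min(i + window, n)  # last group may be shorter
        merged["sentences"].append([t for sent in doc["sentences"][i:j] for t in sent])
        merged["ner"].append([ent for ents in doc["ner"][i:j] for ent in ents])
        merged["relations"].append([rel for rels in doc["relations"][i:j] for rel in rels])
    return merged
```

A relation whose arguments fall in different windows still cannot be predicted, so the window size bounds how far apart related entities may be.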

appledora · 2021-09-14T04:06:45Z

A sort of complementary question: the processed dataset has an additional "clusters" field for each document. What does it refer to, and is it important?

serenalotreck · 2021-09-14T13:17:54Z

I think those are coreference clusters, as the model that SciERC was created for also incorporates coreference resolution. I believe this pipeline doesn't do that, so they probably aren't used?

a3616001 · 2021-09-14T21:40:49Z

The clusters field contains the coreference annotations. Our approach doesn't use those annotations, so you may ignore this field when using our code.
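If you start from the processed SciERC files, a sketch like the following (a hypothetical helper, not from the repository) drops the "clusters" field and sanity-checks the remaining fields. It assumes document-level, inclusive token offsets and that every entity and relation argument lies inside the sentence whose list it appears in.

```python
from typing import Any, Dict


def clean_record(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `doc` without "clusters", after basic sanity checks."""
    out = {k: v for k, v in doc.items() if k != "clusters"}  # coreference data is not needed
    n = len(out["sentences"])
    if not (len(out["ner"]) == len(out["relations"]) == n):
        raise ValueError(f"{out['doc_key']}: sentences/ner/relations lengths differ")
    start = 0  # document-level index of the first token in sentence i
    for i, tokens in enumerate(out["sentences"]):
        end = start + len(tokens) - 1  # last document-level index in sentence i
        spans = [(s, e) for s, e, _ in out["ner"][i]]
        spans += [(r[0], r[1]) for r in out["relations"][i]]  # first argument
        spans += [(r[2], r[3]) for r in out["relations"][i]]  # second argument
        for s, e in spans:
            if not (start <= s <= e <= end):
                raise ValueError(f"{out['doc_key']}: span {(s, e)} outside sentence {i}")
        start += len(tokens)
    return out
```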

a3616001 closed this as completed Sep 19, 2021
