Creating custom dataset from scratch #27
Comments
Hi, we downloaded the processed SciERC (link), which is in the same format as shown in the README. In our experiments, we do not consider cross-sentence relations. All the datasets (ACE04, ACE05, SciERC) are annotated at the sentence level, so we consider each sentence individually if a passage contains several. For cross-sentence relations, you can concatenate the sentences and apply our approach to the concatenated inputs.
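As a rough illustration of the concatenation idea above, here is a minimal sketch, not the repository's own code. It assumes a SciERC-style document with the keys doc_key, sentences, ner, and relations, and it assumes span indices are document-level token offsets, so they stay valid after merging. The helper name concatenate_sentences and the example document are hypothetical.

```python
import json


def concatenate_sentences(doc):
    """Merge all sentences of one document into a single long 'sentence'.

    Assumption: entity and relation spans use document-level token indices,
    so flattening the per-sentence lists needs no index shifting.
    """
    merged_tokens = [tok for sent in doc["sentences"] for tok in sent]
    merged_ner = [span for sent_ner in doc.get("ner", []) for span in sent_ner]
    merged_rel = [rel for sent_rel in doc.get("relations", []) for rel in sent_rel]
    return {
        "doc_key": doc["doc_key"],
        "sentences": [merged_tokens],
        "ner": [merged_ner],
        "relations": [merged_rel],
    }


# Hypothetical two-sentence document, for illustration only.
example = {
    "doc_key": "demo-doc",
    "sentences": [
        ["BERT", "is", "a", "model", "."],
        ["It", "uses", "attention", "."],
    ],
    "ner": [[[0, 0, "Method"]], [[7, 7, "Method"]]],
    "relations": [[], []],
}

merged = concatenate_sentences(example)
print(json.dumps(merged))
```

After merging, cross-sentence relations could be annotated in the single relations list, since both arguments now sit in the same input. Whether the model copes well with the longer inputs is a separate question that this sketch does not address.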
Sort of a complementary question, the processed dataset has an additional |
I think those are coreference clusters, as the model that SciERC was created for also incorporates coreference resolution. I believe this pipeline doesn't do that, so they probably aren't used?
The |
I was wondering whether you download the training data in the original sciERC format (as shown here) and then automatically reformat it internally before training the model? I ask because I am a little confused about whether to format my custom dataset like sciERC or in your input data format, as shown in the Readme.md. Also, sciERC is annotated sentence-wise, as per the aforementioned link. How does PURE handle multi-sentence passages? The pretrained models downloaded via the link in the repo's Readme.md are also labeled as single sentence model.