
Where to get Support Documents for Cross-Domain Test? #1

Closed
davidleejy opened this issue Jun 30, 2022 · 2 comments


davidleejy commented Jun 30, 2022

Hello Nicholas & Michael,

Your paper presents a thoughtful benchmark for document-level RE, and I look forward to trying it out. It would be great if you could clarify a few simple questions:

  1. Are the support documents for the cross-domain test set (comprising solely SciERC samples) sampled from your Train+Dev set (62+16 relation types from DocRED)?

  2. If the answer to question (1) is yes, is this not a zero-shot setting, given that "shot" is defined in terms of relation types encountered during training?

  3. Just to be certain, where are the support documents for the in-domain test set (16 relation types from DocRED) sampled from: the Train+Dev set or the in-domain test set?

Figure illustrating my understanding of the train-test setup in this work:

[Screenshot: diagram of my understanding of the train-test setup]

@davidleejy davidleejy changed the title Support Documents for Cross-Domain Test? and other questions Where to get Support Documents for Cross-Domain Test? Jun 30, 2022
nicpopovic (Owner) commented

Hi David,

Thank you for your interest and your questions!

  1. Are the support documents for the cross-domain test set (comprising solely SciERC samples) sampled from your Train+Dev set (62+16 relation types from DocRED)?

The support documents for the cross-domain test set are sampled from the cross-domain test set itself. Sampling support documents from the in-domain set would not be possible (at least for the task format we chose) because the support documents need to contain instances of the relations which are to be extracted from the query documents. Since there is no overlap¹ between relation types in the in-domain sets and the cross-domain set, a task in which the support and query documents are sampled from separate sets is not possible for the two datasets.

  2. If the answer to question (1) is yes, is this not a zero-shot setting, given that "shot" is defined in terms of relation types encountered during training?

The answer to (1) is no, but if it were yes, it would indeed sound to me more like a zero-shot setting than a few-shot one. However, I am not sure the support documents would be useful or usable input for such a task, as they would contain information about neither the query domain nor the relation types to be extracted.

  3. Just to be certain, where are the support documents for the in-domain test set (16 relation types from DocRED) sampled from: the Train+Dev set or the in-domain test set?

The support documents for the in-domain test set are sampled from the in-domain test set. In general, support and query documents will always be sampled from the same set. This is because the support and query documents need to be annotated with the same relation types. Note that during testing, the model does not perform any (persistent) learning on the test episodes.
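The same-set constraint described above can be sketched in a few lines. This is a minimal toy illustration, not the benchmark's actual sampling code; the corpus, function, and parameter names here are all hypothetical. The key invariant is that every relation type to be extracted from a query document must also appear in at least one support document, which is guaranteed by drawing both from the same annotated set:

```python
import random

# Hypothetical toy corpus: each document lists the relation types it contains.
corpus = [
    {"id": "doc0", "relations": {"founded_by", "located_in"}},
    {"id": "doc1", "relations": {"located_in"}},
    {"id": "doc2", "relations": {"founded_by"}},
    {"id": "doc3", "relations": {"located_in", "founded_by"}},
    {"id": "doc4", "relations": {"founded_by"}},
]

def sample_episode(docs, n_support=1, seed=0):
    """Sample support and query documents from the SAME set, so that
    every relation type in a query document is exemplified by at
    least one support document."""
    rng = random.Random(seed)
    support = rng.sample(docs, n_support)
    # Relation types the support documents exemplify.
    support_relations = set().union(*(d["relations"] for d in support))
    # Queries: remaining documents whose relation types are all covered
    # by the support set (an uncovered type would make the episode
    # zero-shot for that type).
    queries = [
        d for d in docs
        if d not in support and d["relations"] <= support_relations
    ]
    return support, support_relations, queries

support, rels, queries = sample_episode(corpus, n_support=2, seed=1)
print("support:", [d["id"] for d in support])
print("relation types:", sorted(rels))
print("queries:", [d["id"] for d in queries])
```

Sampling support documents from a different set (e.g. Train+Dev for a SciERC query set) would leave `support_relations` disjoint from the query documents' relation types, so no valid queries could be formed, which is why cross-set sampling is ruled out above.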

I hope that I was able to answer your questions!

Nicholas

Footnotes

  1. As mentioned in the paper, there are technically two relation types contained in both DocRED and SciERC, but we remove these from the in-domain set.

davidleejy (Author) commented

These are very clear answers. Thank you for the elaboration, and congrats on the acceptance to NAACL.
