Thank you for sharing such a great codebase :)
I have a question about truncated articles.
As mentioned in #14, articles are truncated at 512 tokens. In some cases, if the oracle sentences are located at the end of the article, truncation produces samples with no gold labels at all.
So for these "empty" samples, the network is trained to classify every sentence of the article as not salient.
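To make the failure mode concrete, here is a minimal sketch of how truncation can erase all gold labels. The whitespace tokenization and the `truncate_labels` helper are illustrative assumptions, not the repository's actual preprocessing code:

```python
# Hypothetical illustration: truncating an article to a token budget
# can drop every oracle (gold) sentence, leaving an "empty" sample.

MAX_TOKENS = 512

def truncate_labels(sentences, oracle_ids, max_tokens=MAX_TOKENS):
    """Keep sentences until the token budget is exhausted; return the
    binary salience labels that survive truncation (whitespace splitting
    stands in for the real subword tokenizer)."""
    kept, used = [], 0
    for i, sent in enumerate(sentences):
        n = len(sent.split())
        if used + n > max_tokens:
            break
        kept.append(1 if i in oracle_ids else 0)
        used += n
    return kept

# An article whose oracle sentences sit at the very end:
article = ["filler sentence number %d" % i for i in range(200)]
labels = truncate_labels(article, oracle_ids={198, 199})
print(sum(labels))  # 0 -> every surviving sentence is labeled "not salient"
```

With 4-token sentences, only the first 128 sentences fit in the 512-token budget, so oracle sentences 198 and 199 are cut off and the sample ends up with no positive labels.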
This raises several questions:
Is it useful for performance to keep such "empty" samples?
Did you compare the performance of the current network with one trained without "empty" samples (even informally)?
It seems similar to SQuAD 2.0: teaching the network that there are not always 3 salient sentences in the input (sometimes there are 2, sometimes 1, sometimes 0).
Yet at test time, you invariably pick the 3 best-scoring sentences, no matter their scores (i.e. no matter whether the network decided that only 2/1/0 sentences were really salient).
This seems like an important difference between training and inference. Is my intuition wrong?
If I'm wrong, could you (briefly) explain what I misunderstood?
If I'm right, isn't it going to hurt performance (or maybe there are too few of these empty samples to really matter)?
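For clarity, the train/inference mismatch I mean can be sketched like this. The scores and the 0.5 threshold are purely illustrative assumptions, not values from the repository:

```python
# Illustrative contrast: always extracting the top 3 sentences vs.
# letting the network's own scores decide how many are salient.

def top_k(scores, k=3):
    """Inference as currently done: return the indices of the k
    highest-scoring sentences, regardless of absolute confidence."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:k])

def above_threshold(scores, tau=0.5):
    """Alternative: return only the sentences the network actually
    scores as salient (hypothetical threshold tau)."""
    return [i for i, s in enumerate(scores) if s >= tau]

scores = [0.91, 0.12, 0.08, 0.64, 0.05]  # network is confident about only 2 sentences
print(top_k(scores))           # [0, 1, 3] -- always 3 indices, even the weak 0.12
print(above_threshold(scores)) # [0, 3]    -- respects the network's decision
```

Training on "empty" samples pushes the network toward the second behaviour, but the fixed top-3 selection at inference ignores it.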
Thank you for your answer!