How to slice <eos> token with different sentence length #23

As I want the model to predict the <eos> token by excluding it from the input to the model, I simply slice the <eos> token off the end of the sequence. Thus:

trg = [sos, x_1, x_2, x_3, eos]
trg[:-1] = [sos, x_1, x_2, x_3]

This is also the same as your implementation. But many datasets collect sentences of different lengths, and thus the last elements of a sentence are <pad> tokens, such as:

trg = [sos, x_1, x_2, x_3, eos, pad, pad, pad]
trg[:-1] = [sos, x_1, x_2, x_3, eos, pad, pad]

In such a case, slicing off the last element no longer removes the <eos> token. May I ask how I can solve this issue?
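To make the two cases concrete, here is a small runnable PyTorch sketch; the integer ids chosen for <sos>, <eos>, <pad> and the words are arbitrary placeholders, not the repository's actual vocabulary:

import torch

SOS, EOS, PAD = 1, 2, 0     # hypothetical special-token ids
x1, x2, x3 = 5, 6, 7        # hypothetical ordinary word ids

# Unpadded: slicing off the last element removes <eos>, as intended.
trg = torch.tensor([SOS, x1, x2, x3, EOS])
print(trg[:-1])             # tensor([1, 5, 6, 7])

# Padded: the last element is <pad>, so <eos> survives the slice.
trg = torch.tensor([SOS, x1, x2, x3, EOS, PAD, PAD, PAD])
print(trg[:-1])             # tensor([1, 5, 6, 7, 2, 0, 0])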
Rui Shao:
Thanks for your quick reply! But if I remove <eos> from the array, how can the model learn to stop generating a sentence without ever encountering the <eos> token?
Reply:
The model itself will predict the <eos> token. If the model doesn't predict the <eos> token and the entire sentence is gibberish, then the model hasn't generalized well or the data is insufficient.
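To illustrate what "the model predicts <eos>" means at inference time, here is a minimal greedy-decoding sketch; the model(src, trg) signature, the batch size of 1, and the max_len cap are assumptions for illustration rather than this repository's actual API:

import torch

@torch.no_grad()
def greedy_decode(model, src, sos_idx, eos_idx, max_len=50):
    """Generate one sequence token by token until <eos> or max_len."""
    trg = torch.tensor([[sos_idx]])                # start with just <sos>
    for _ in range(max_len):
        logits = model(src, trg)                   # (1, cur_len, vocab_size)
        next_token = logits[:, -1].argmax(dim=-1)  # greedy pick at last position
        trg = torch.cat([trg, next_token.unsqueeze(0)], dim=1)
        if next_token.item() == eos_idx:           # model predicted <eos>: stop
            break
    return trg.squeeze(0)

If the model never emits <eos>, generation only stops at the max_len cap, which is the failure mode described above.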
Rui Shao:
But we should let trg[:-1] have the <eos> token when we calculate the loss, right? Like this:

trg[:-1] = [sos, x_1, x_2, x_3, eos, pad, pad]

or

trg[:-1] = [sos, x_1, x_2, x_3, eos]
Reply:
It depends on your training dataset. If your dataset has special tokens like <bos> and <eos>, then yes, these should be considered in the loss, while <pad> tokens do not contribute to the loss.
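In PyTorch this is typically done with the ignore_index argument of nn.CrossEntropyLoss: positions whose target is <pad> are skipped, while the <eos> position still contributes. A self-contained sketch with made-up token ids and vocabulary size:

import torch
import torch.nn as nn

PAD, EOS = 0, 2                                    # hypothetical special-token ids
criterion = nn.CrossEntropyLoss(ignore_index=PAD)  # <pad> targets are skipped

logits = torch.randn(1, 10, 7)                     # (batch, vocab_size, seq_len)
targets = torch.tensor([[5, 6, 7, EOS, PAD, PAD, PAD]])  # [x1, x2, x3, eos, pad, pad, pad]

# Averaged over the four non-pad positions only; the <eos> position still counts.
loss = criterion(logits, targets)

This uses the same shape convention as the criterion(outputs.permute(0, 2, 1), caps[:, 1:]) call in the comment below, where the logits are permuted to (batch, vocab_size, seq_len).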
Rui Shao:
Thanks. In all, I just want to create a dataset with sequences of different lengths. In such a dataset, I insert bos and eos at the beginning and end of each sequence as the ground truth, like this:

caps = [sos, x_1, x_2, x_3, eos]

In such a case,

caps[:, :-1] = [sos, x_1, x_2, x_3]
caps[:, 1:] = [x_1, x_2, x_3, eos]

This is what we want for the loss calculation:

outputs = model(samples, caps[:, :-1], cap_masks[:, :-1])
loss = criterion(outputs.permute(0, 2, 1), caps[:, 1:])

However, given the different lengths, I have to further insert pad tokens to make the sequences consistent, such as:

caps = [sos, x_1, x_2, x_3, eos, pad, pad, pad]

In such a case,

caps[:, :-1] = [sos, x_1, x_2, x_3, eos, pad, pad]
caps[:, 1:] = [x_1, x_2, x_3, eos, pad, pad, pad]

The input to the model (caps[:, :-1]) will contain the <eos> token, which we want to remove. Considering this, I just replace the <eos> token with a <pad> token, since <pad> tokens are not counted in the loss, like this:

caps[:, :-1] = [sos, x_1, x_2, x_3, pad, pad, pad]

and I keep caps[:, 1:] as

caps[:, 1:] = [x_1, x_2, x_3, eos, pad, pad, pad]

May I ask, does this make sense?
Reply:
You should consider the <eos> token in the loss, because you want your model to learn when to stop generating a sentence.
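Putting the two replies together, a minimal sketch of the proposed scheme, under the same hypothetical token ids as above: the <eos> in the decoder input is overwritten with <pad>, while the shifted targets keep <eos>, so a loss built with ignore_index=PAD still teaches the model when to stop:

import torch

SOS, EOS, PAD = 1, 2, 0                # hypothetical special-token ids
x1, x2, x3 = 5, 6, 7

caps = torch.tensor([[SOS, x1, x2, x3, EOS, PAD, PAD, PAD]])

dec_input = caps[:, :-1].clone()       # clone so caps itself is untouched
dec_input[dec_input == EOS] = PAD      # input:  [sos, x1, x2, x3, pad, pad, pad]
targets = caps[:, 1:]                  # target: [x1, x2, x3, eos, pad, pad, pad]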