First, thanks for publishing your implementation of this technique. It's been very helpful!
While stepping through the code, I think I may have found a small issue. The goal of this code seems to be: re-weight the loss contributed by each batch by the fraction of unmasked tokens it contains.
If that's the case, shouldn't `curr_num_elements` count all elements `!= constants.PAD_TOKEN_ID` (-100), instead of -1?
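To make the concern concrete, here is a minimal sketch in plain Python. It is not the repository's code: `count_unmasked`, `batch_weights`, and the toy labels are hypothetical, and I'm assuming masked label positions are filled with -100 as described above. It shows that comparing against -1 counts every position, so the weights no longer reflect the unmasked fraction.

```python
# Hypothetical stand-in for constants.PAD_TOKEN_ID as quoted in this issue.
PAD_TOKEN_ID = -100


def count_unmasked(labels, ignore_id=PAD_TOKEN_ID):
    """Count label positions that differ from ignore_id."""
    return sum(1 for row in labels for tok in row if tok != ignore_id)


def batch_weights(batches, ignore_id=PAD_TOKEN_ID):
    """Weight each batch by its share of the counted label positions."""
    counts = [count_unmasked(labels, ignore_id) for labels in batches]
    total = sum(counts)
    if total == 0:
        # Nothing to count anywhere: avoid dividing by zero.
        return [0.0 for _ in counts]
    return [c / total for c in counts]


# Toy label tensors as nested lists; -100 marks masked positions.
batches = [
    [[5, 7, -100, -100], [3, -100, -100, -100]],  # 3 unmasked of 8
    [[4, 9, 2, 8], [6, 1, -100, -100]],           # 6 unmasked of 8
]

weights_pad = batch_weights(batches)                     # compares against -100
weights_minus_one = batch_weights(batches, ignore_id=-1)  # compares against -1

print(weights_pad)        # first batch gets 3/9, second gets 6/9
print(weights_minus_one)  # [0.5, 0.5]: every position is counted
```

With -1 as the comparison value, both batches get equal weight even though the second one contains twice as many unmasked tokens.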
Thanks!