Hi, I was wondering: what's the rationale for having [PAD] tokens in masked spans of length greater than one, instead of just removing the remaining tokens? See splinter/pretraining/masking.py, lines 316 to 323 in 1df4c13. Is the reason just computational efficiency?
Hi @bminixhofer, thanks for expressing your interest in Splinter :)
The main reason is to keep the span start and end indices unchanged. If the remaining tokens were removed instead, masking a span would shift the indices of every span that follows it.
Note that these pad tokens aren't attended to, and that position embeddings are computed only w.r.t. "valid" (non-pad) tokens, so overall this implementation is equivalent to removing them.
Hope my answer is clear :)
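To make the equivalence concrete, here is a toy sketch (not the actual Splinter code; the token IDs and helper names are made up for illustration) of masking a span in place with pads, excluding the pads from attention, and assigning positions only to non-pad tokens:

```python
# Toy IDs, chosen only for this sketch.
MASK_ID, PAD_ID = 1, 0

def mask_span(token_ids, start, end):
    """Replace tokens[start:end] with one mask token followed by pads,
    keeping the sequence length (and all later indices) unchanged."""
    out = list(token_ids)
    out[start] = MASK_ID
    for i in range(start + 1, end):
        out[i] = PAD_ID
    return out

def attention_mask(token_ids):
    """Pads get mask value 0, i.e. they are never attended to."""
    return [0 if t == PAD_ID else 1 for t in token_ids]

def position_ids(token_ids):
    """Assign positions only over non-pad tokens; pads get a
    placeholder position, which is irrelevant since they are masked."""
    pos, ids = 0, []
    for t in token_ids:
        if t == PAD_ID:
            ids.append(0)  # placeholder; never attended to
        else:
            ids.append(pos)
            pos += 1
    return ids

tokens = [10, 11, 12, 13, 14, 15]
masked = mask_span(tokens, 1, 4)   # mask the length-3 span at indices 1..3
# masked == [10, 1, 0, 0, 14, 15]: token 14 keeps its original index 4
```

With the pads removed, the sequence would be [10, 1, 14, 15] with positions [0, 1, 2, 3]; with the pads retained, token 14 stays at index 4 but still receives position 2, so the model sees the same inputs either way, while the precomputed span indices remain valid.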