
Why have [PAD] tokens in the masked spans? #8

Closed · bminixhofer opened this issue Oct 21, 2021 · 3 comments

@bminixhofer

Hi, I was wondering what the rationale is for having [PAD] tokens in masked spans longer than one token, instead of just removing the remaining tokens? Here:

    # The first token of a masked span is replaced by [QUESTION], and the
    # boundaries of the span's unmasked occurrence are recorded as labels.
    new_tokens[j] = "[QUESTION]"
    masked_spans.append(MaskedSpanInstance(index=j,
                                           begin_label=unmasked_span_beginning,
                                           end_label=unmasked_span_ending))
    num_predictions += 1
else:
    # The remaining tokens of the span are replaced by [PAD] and
    # excluded from attention via the input mask.
    new_tokens[j] = "[PAD]"
    input_mask[j] = 0
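
For concreteness, masking a three-token span in place then looks something like:

tokens:     ["the", "cat",        "sat",   "on",    "the", "mat"]
masked:     ["the", "[QUESTION]", "[PAD]", "[PAD]", "the", "mat"]
input_mask: [ 1,     1,            0,       0,       1,     1  ]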

Is the reason just computational efficiency?

@oriram (Owner) commented Oct 21, 2021

Hi @bminixhofer, thanks for expressing your interest in Splinter :)
The main reason is to keep span start and end indices unchanged: if the masked tokens were removed instead, the indices of all subsequent spans would shift.
Note that these pad tokens aren't attended to, and that position embeddings are computed only w.r.t. "valid" (non-pad) tokens, so overall this implementation is equivalent to removing them.
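
For illustration, here is a minimal sketch of such mask-aware position ids; the function name and the use of PyTorch are assumptions for the example, not the repository's actual code:

import torch

def position_ids_from_mask(input_mask: torch.Tensor) -> torch.Tensor:
    # input_mask: (batch, seq_len), 1 for valid tokens, 0 for [PAD].
    # Each valid token's position is the number of valid tokens before it,
    # so in-sequence pads don't shift the positions of later tokens.
    positions = input_mask.cumsum(dim=-1) - 1
    return positions.clamp(min=0) * input_mask

mask = torch.tensor([[1, 1, 0, 0, 1, 1]])  # tokens 2-3 padded out, as above
print(position_ids_from_mask(mask))        # tensor([[0, 1, 0, 0, 2, 3]])

The two [PAD] positions get position id 0, but they are masked out of attention anyway; the tokens after them get positions 2 and 3, exactly as if the pads had been removed.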
Hope my answer is clear :)

@bminixhofer (Author)

Thanks for the quick answer! I didn't know that tokens which aren't attended to are also skipped by the position embeddings; it makes sense then.

@oriram (Owner) commented Oct 22, 2021

It's not the default behavior; I implemented it this way for exactly this reason :) @bminixhofer
