Here, the corrupted tokens are produced by the generator as fake data. I can understand why we should deal with duplicated positions and only apply the replacement once. However, I am confused about the implementation in electra/pretrain/pretrain_helpers.py (lines 101 to 103 in 19175a1), which takes the average value of the corrupted token ids at duplicated positions. What is the intuition behind it?
scatter_nd sums up values that share the same index, but we want to pick just a single value per index. If the values for an index are the same (e.g., we are just scattering a mask tensor of 1s), then dividing the summed values by the number of occurrences at that index fixes the issue. That's what is implemented in the code you linked to.
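Here is a minimal standalone sketch of that divide-by-count trick (illustrative shapes and values only, not the actual code at the linked lines):

```python
import tensorflow as tf

# Scatter a value of 1 into each sampled position; tf.scatter_nd SUMS
# updates that land on the same index, so a duplicated position overflows.
seq_len = 8
positions = tf.constant([[2], [5], [5]])            # position 5 sampled twice
updates = tf.ones([3], dtype=tf.int32)

summed = tf.scatter_nd(positions, updates, [seq_len])
# summed = [0, 0, 1, 0, 0, 2, 0, 0]  -> position 5 became 2 instead of 1

# Count how many updates hit each position, then divide to undo the summing.
counts = tf.scatter_nd(positions, tf.ones([3], dtype=tf.int32), [seq_len])
fixed = summed // tf.maximum(counts, 1)
# fixed = [0, 0, 1, 0, 0, 1, 0, 0]  -> a clean 0/1 mask again
```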
However, this does NOT fully fix having duplicate masked positions. It does prevent errors from the summed ids overflowing past the vocab size, but if (1) the same position is sampled twice and (2) the generator samples different tokens for that position, then the replaced token will be a "random" token obtained by averaging the two sampled token ids. I didn't bother fixing this issue when developing ELECTRA because it occurs for a pretty small fraction of masked positions. But there is actually an easy fix: replace the sampling step (masked_lm_positions = tf.random.categorical... in pretrain_helpers.mask) with
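As an aside, here is a hypothetical numeric illustration of the averaging artifact described above (made-up token ids, not the replacement sampling code referred to in the previous sentence):

```python
import tensorflow as tf

# The generator samples two DIFFERENT tokens (2003 and 2024) for the SAME position 3.
seq_len = 6
positions = tf.constant([[3], [3]])
sampled_ids = tf.constant([2003, 2024], dtype=tf.int32)

summed = tf.scatter_nd(positions, sampled_ids, [seq_len])        # position 3 -> 4027
counts = tf.scatter_nd(positions, tf.ones([2], dtype=tf.int32), [seq_len])
averaged = summed // tf.maximum(counts, 1)
# averaged[3] == 2013, which is neither sampled token: effectively a "random" replacement.
```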