I believe that, although learnable positional embeddings are the trend nowadays, fixed embeddings (sinusoidal, as in the original implementation) would help in relatively small dataset scenarios, where it is hard to learn a meaningful embedding. At the very least, it would be interesting to compare both methods.
I see you included fixed embeddings in the reformer implementation, but wouldn't it be more efficient to calculate them once during initialization? (like here)
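To illustrate what I mean, here is a minimal sketch (module name and `max_seq_len` are just placeholders) roughly following the PyTorch tutorial: the sinusoidal table is built once in `__init__` and stored as a non-trainable buffer, instead of being recomputed on every forward pass.

```python
import math
import torch
import torch.nn as nn

class SinusoidalPositionalEmbedding(nn.Module):
    """Fixed sinusoidal embeddings, precomputed once at init (sketch, not the repo's API)."""

    def __init__(self, dim, max_seq_len=2048):
        super().__init__()
        position = torch.arange(max_seq_len).unsqueeze(1)                     # (max_seq_len, 1)
        div_term = torch.exp(torch.arange(0, dim, 2) * (-math.log(10000.0) / dim))
        pe = torch.zeros(max_seq_len, dim)
        pe[:, 0::2] = torch.sin(position * div_term)
        pe[:, 1::2] = torch.cos(position * div_term)
        # register as a buffer: moves with .to(device) / state_dict, but is never trained
        self.register_buffer('pe', pe)

    def forward(self, x):
        # x: (batch, seq_len, dim); add the first seq_len rows of the precomputed table
        return x + self.pe[:x.shape[1]]
```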
Btw, I read a cool paper that compares fixed positional embeddings with the ones learned by BERT, GPT-2 and RoBERTa.
If you prefer, I could do a PR on this, adding the implementation from the PyTorch tutorial above, but it's no big deal.
@gulnazaki yea sure, I would welcome a PR on that :D I'll check out the paper you recommended tonight, thank you! Another good one I read recently is https://arxiv.org/abs/2006.15595