Is it possible to apply the same logic to RoBERTa for the MaskedLM model? I need it for pretraining on a custom dataset with long texts. Thanks!

Hi, the method we use here is applied to an already pre-trained model during the fine-tuning stage; I am not sure whether it can be applied during pre-training. If you want to pre-train from scratch, it is probably better to use a model whose architecture is designed for longer texts, such as BigBird or Longformer, as in the sketch below.
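As a rough illustration (not the method from this repo), here is a minimal sketch of pre-training a Longformer from scratch with a masked-LM objective using Hugging Face `transformers` and `datasets`. The file name `my_long_texts.txt`, the sequence length, and the training hyperparameters are placeholders you would adapt to your own corpus and hardware.

```python
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    LongformerConfig,
    LongformerForMaskedLM,
    LongformerTokenizerFast,
    Trainer,
    TrainingArguments,
)

# Reuse the pretrained Longformer tokenizer for convenience;
# training your own tokenizer on the custom corpus is also an option.
tokenizer = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")

# Randomly initialized Longformer sized for 4096-token inputs
# (4096 + 2 positions for the RoBERTa-style position offset).
config = LongformerConfig(
    vocab_size=len(tokenizer),
    max_position_embeddings=4098,
    attention_window=512,
)
model = LongformerForMaskedLM(config)

# "my_long_texts.txt" is a placeholder for the custom long-text corpus.
dataset = load_dataset("text", data_files={"train": "my_long_texts.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard masked-LM objective with 15% token masking.
collator = DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="longformer-mlm-pretrain",
        per_device_train_batch_size=1,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

The same recipe would work with `BigBirdForMaskedLM` and `BigBirdConfig` if you prefer BigBird's sparse attention; only the config and model classes change.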