What issues, if any, would occur if bert-large were used? For example, what would the GPU requirements and training time be? Would it be too costly? Is there a reason bert-base was used instead of bert-large?
I'm also guessing that Yang Liu used bert-base instead of bert-large because bert-large would require more GPU resources, memory, and training time. It's also possible that bert-large wouldn't give a large enough improvement in performance to justify that cost, but I don't think the original paper discusses this. There are no ablation studies on this point in particular, so this is just my guess.
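A rough way to see the cost gap is to count parameters from the standard BERT configurations (bert-base: 12 layers, hidden size 768; bert-large: 24 layers, hidden size 1024; both with the 30,522-token uncased vocabulary, 512 positions, and a 4x feed-forward width). The sketch below is a back-of-the-envelope estimate, not anything from the original code. It assumes fp32 training with Adam (weights, gradients, and two moment buffers, so 16 bytes per parameter) and ignores activations entirely.

```python
from dataclasses import dataclass


@dataclass
class BertConfig:
    num_layers: int
    hidden: int
    vocab_size: int = 30522   # uncased WordPiece vocabulary
    max_positions: int = 512  # learned position embeddings
    type_vocab: int = 2       # segment A / segment B


def bert_param_count(cfg: BertConfig) -> int:
    h = cfg.hidden
    # Token + position + segment embeddings, plus one LayerNorm (gamma, beta).
    embeddings = (cfg.vocab_size + cfg.max_positions + cfg.type_vocab) * h + 2 * h
    # Per layer: Q, K, V and attention-output projections (weights + biases),
    # feed-forward h -> 4h -> h, and two LayerNorms.
    attention = 4 * (h * h + h)
    feed_forward = (h * 4 * h + 4 * h) + (4 * h * h + h)
    layer_norms = 2 * (2 * h)
    per_layer = attention + feed_forward + layer_norms
    # Pooler: one dense layer applied to the [CLS] vector.
    pooler = h * h + h
    return embeddings + cfg.num_layers * per_layer + pooler


def adam_fp32_bytes(params: int) -> int:
    # Assumption: weights + gradients + two Adam moments, 4 bytes each.
    # Activations are NOT included.
    return params * 4 * 4


if __name__ == "__main__":
    base = bert_param_count(BertConfig(num_layers=12, hidden=768))
    large = bert_param_count(BertConfig(num_layers=24, hidden=1024))
    print(base, large, round(large / base, 2))  # 109482240 335141888 3.06
    for name, n in (("bert-base", base), ("bert-large", large)):
        print(f"{name}: {adam_fp32_bytes(n) / 1e9:.2f} GB for weights/grads/Adam state")
    # bert-base: 1.75 GB ..., bert-large: 5.36 GB ...
```

So bert-large has roughly 3x the parameters and optimizer state of bert-base, and each forward/backward pass does correspondingly more work. Activation memory, which grows with batch size and sequence length, comes on top of these figures for both models.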