Thanks for releasing bart.base!
When I download and open it, I see that the embedding shape implies a vocab size of 51,201, whereas bart.large had an embedding size of 50,264.
The first 50,260 lines of the dict.txt files are identical, but then they diverge. bart.large/dict.txt ends in madeupword0002 0
whereas bart.base/dict.txt continues to madeupword0938 0.
I'm assuming that these extra entries are just placeholders, that the same tokenizer can be used for both models, and that the bart.base embeddings can be truncated to 50,264. Is that correct? Thanks!
Hey, yes, those extra symbols are just dummy tokens added to make the embedding matrix size more efficient on GPU. You can delete those tokens and adjust the embed_tokens matrix manually.
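For anyone who wants to do that, here is a minimal sketch of the truncation. It assumes the usual fairseq checkpoint layout (the state dict lives under `ckpt["model"]`, with embeddings at `encoder.embed_tokens.weight` / `decoder.embed_tokens.weight` and possibly a tied `decoder.output_projection.weight`); key names may differ across fairseq versions, hence the existence check:

```python
import torch

VOCAB_SIZE = 50264  # target vocab size, matching bart.large

# Load the bart.base checkpoint on CPU.
ckpt = torch.load("bart.base/model.pt", map_location="cpu")
state = ckpt["model"]  # assumed fairseq layout: state dict under "model"

for key in [
    "encoder.embed_tokens.weight",
    "decoder.embed_tokens.weight",
    "decoder.output_projection.weight",  # may not exist in every version
]:
    if key in state:
        # Keep only the first 50,264 rows; the remaining rows correspond
        # to the extra madeupword* dummy tokens.
        state[key] = state[key][:VOCAB_SIZE, :].clone()

torch.save(ckpt, "bart.base/model_truncated.pt")
```

If you also trim dict.txt, keep its first 50,260 lines (or just reuse bart.large/dict.txt); fairseq prepends four special symbols when building the dictionary, which is presumably where 50,260 + 4 = 50,264 comes from.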