
Are the weights of the lm head of the model tied with the word embeddings? #138

Closed
lvcc2018 opened this issue Mar 6, 2023 · 4 comments


lvcc2018 commented Mar 6, 2023

Thanks for the amazing work.
I wonder whether the weights of the model's lm head are tied with its word embeddings. From the code, it seems that they are not.

lvcc2018 commented Mar 6, 2023

And I am curious about the difference between tied and untied word embeddings. Does it have an effect on training stability?


benob commented Mar 8, 2023

Long ago, in the LSTM era, I used to train a lot of LMs, and tying embedding weights was a game changer: much faster convergence and lower memory usage. I don't know about nowadays, though.

To answer your first question, I looked at the weights of the tok_embeddings and output tensors in the checkpoint files and they differ. So tying was not used for training.
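
A sketch of that check (the shard path is illustrative, and it assumes the tensors are stored under the keys tok_embeddings.weight and output.weight):

    import torch

    # Load one checkpoint shard on CPU; adjust the path to your local download.
    ckpt = torch.load("llama-7B/consolidated.00.pth", map_location="cpu")

    emb = ckpt["tok_embeddings.weight"]  # input word embedding matrix
    out = ckpt["output.weight"]          # output projection ("lm head") matrix

    # If the two were tied, they would hold identical values.
    print(torch.equal(emb, out))  # False, per the check described above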

glample commented Mar 10, 2023

If by "lm head" you are referring to the output layer on top of the transformer (the Linear(hidden_dim, vocab_size)), then no, its weights are not shared with the input word embeddings.
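
For readers unfamiliar with the distinction, a minimal PyTorch sketch of the two setups (plain nn modules for illustration, not the repo's actual parallel layers; the sizes are made up):

    import torch.nn as nn

    vocab_size, hidden_dim = 32000, 4096  # illustrative sizes

    tok_embeddings = nn.Embedding(vocab_size, hidden_dim)

    # Untied (as described above): the output layer has its own weight matrix.
    output = nn.Linear(hidden_dim, vocab_size, bias=False)

    # Tied: the output layer reuses the embedding matrix; both point to the
    # same nn.Parameter, so they stay identical and share gradient updates.
    output_tied = nn.Linear(hidden_dim, vocab_size, bias=False)
    output_tied.weight = tok_embeddings.weight

With tying, a single (vocab_size, hidden_dim) matrix serves as both the input lookup table and the output projection, roughly halving the vocabulary-related parameter count.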

@wuwuwuxxx

I want to know if there are any particular reasons not to use weight tying. @glample
