
weight init in embedding_config should depend on embedding_dim #1824

Open
di-wnd opened this issue Mar 24, 2024 · 8 comments

Comments

di-wnd commented Mar 24, 2024

What

Currently, by default, the min and max weight init for `EmbeddingConfig` is `sqrt(1 / self.num_embeddings)` instead of `sqrt(1 / self.embedding_dim)`:

https://github.com/zainhuda/torchrec/blob/4ea42e2f44f4485fd35865323236a0b277873216/torchrec/modules/embedding_configs.py#L175

I believe it should be the latter, from an embedding-vector normalization perspective.
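To illustrate what's at stake, here is a minimal sketch (not the actual TorchRec implementation) comparing the two bounds. With a uniform draw in `[-b, b]`, the expected row L2 norm is roughly `b * sqrt(dim / 3)`, so the current default produces near-zero vectors for large tables:

```python
import math
import torch

# illustrative sketch only, not the actual torchrec code
num_embeddings, embedding_dim = 1_000_000, 128

# current default: +/- sqrt(1 / num_embeddings) -> +/- 0.001 here
cur_bound = math.sqrt(1 / num_embeddings)
# proposed: +/- sqrt(1 / embedding_dim) -> +/- ~0.0884 here
new_bound = math.sqrt(1 / embedding_dim)

row_cur = torch.empty(embedding_dim).uniform_(-cur_bound, cur_bound)
row_new = torch.empty(embedding_dim).uniform_(-new_bound, new_bound)

print(row_cur.norm())  # ~0.0065, shrinks as the table grows
print(row_new.norm())  # ~0.58, independent of table size
```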

PaulZhang12 (Contributor) commented

Hmm, fair point. Put out a PR?

di-wnd (Author) commented Apr 19, 2024

> Hmm, fair point. Put out a PR?

Sounds good. Are there any unit test scripts I should be changing at the same time? @PaulZhang12

di-wnd (Author) commented Apr 19, 2024

PR here: #1902

henrylhtsang (Contributor) commented

For context: I think DLRM initializes it with 1 / num_embeddings:

https://github.com/facebookresearch/dlrm/blob/main/dlrm_s_pytorch.py#L281
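For reference, the init at the linked line looks roughly like this (a sketch from memory of that file; the names `n` and `m` follow DLRM's convention of rows and embedding dim):

```python
import numpy as np

# sketch of the DLRM-style embedding init referenced above:
# the uniform bound scales with the row count n, not the dim m
n, m = 1_000_000, 128
W = np.random.uniform(
    low=-np.sqrt(1 / n), high=np.sqrt(1 / n), size=(n, m)
).astype(np.float32)
```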

di-wnd (Author) commented Apr 22, 2024

> For context: I think DLRM initializes it with 1 / num_embeddings:
>
> https://github.com/facebookresearch/dlrm/blob/main/dlrm_s_pytorch.py#L281

Thanks for the cross-reference. Does the switch to `embedding_dim` make sense to you in general? @henrylhtsang

henrylhtsang (Contributor) commented

@di-wnd I think @PaulZhang12 brought it up, and it seems the result of that discussion is that it's better to change the user input than to change the default.
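If the default stays, overriding it per table would look something like this (a sketch assuming the `weight_init_min`/`weight_init_max` fields on the dataclass linked above; the table and feature names here are made up for illustration):

```python
import math
from torchrec.modules.embedding_configs import EmbeddingConfig

dim = 128
cfg = EmbeddingConfig(
    num_embeddings=1_000_000,
    embedding_dim=dim,
    name="my_table",               # hypothetical table name
    feature_names=["my_feature"],  # hypothetical feature name
    # override the defaults to scale with embedding_dim instead
    weight_init_min=-math.sqrt(1 / dim),
    weight_init_max=math.sqrt(1 / dim),
)
```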

di-wnd (Author) commented Apr 22, 2024

> @di-wnd I think @PaulZhang12 brought it up, and it seems the result of that discussion is that it's better to change the user input than to change the default.

@henrylhtsang Sorry, but I actually disagree. The default behavior should be at least reasonable, so that any training program leveraging the default implementation won't suffer from a "bad init" issue that could degrade model performance. In practice, most users won't bother manually setting `weight_init_max` and `weight_init_min` for this dataclass, and when `num_embeddings` becomes large (say, beyond 1M), all the corresponding vector values will be initialized to a tiny number, which might hurt training efficiency.
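To make the "tiny number" concern concrete, here is how the current default bound shrinks with table size while the proposed one stays fixed (plain arithmetic, no TorchRec code involved):

```python
import math

dim = 128
for n in (10_000, 1_000_000, 100_000_000):
    current = math.sqrt(1 / n)     # shrinks with the table
    proposed = math.sqrt(1 / dim)  # constant for a given dim
    print(f"n={n:>11,}: current={current:.6f}, proposed={proposed:.4f}")
# n=     10,000: current=0.010000, proposed=0.0884
# n=  1,000,000: current=0.001000, proposed=0.0884
# n=100,000,000: current=0.000100, proposed=0.0884
```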

henrylhtsang (Contributor) commented Apr 22, 2024

@di-wnd I actually don't know whether it would affect training efficiency. It may be worth checking whether it affects accuracy and metrics. The DLRM precedent leads me to believe this needs more evidence.

