DOC Improve documentation for LayerNorm #63144
Conversation
In this [commit](pytorch@7026995), [Line 134](https://github.com/deniskokarev/pytorch/blob/47e286d024c183cb26a464447b34fde88b80d17d/torch/nn/modules/normalization.py#L134) overwrites the "embedding" variable, which would cause an error when instantiating `nn.LayerNorm`. I suggest renaming "embedding" in [Line 133](https://github.com/deniskokarev/pytorch/blob/47e286d024c183cb26a464447b34fde88b80d17d/torch/nn/modules/normalization.py#L133) to "embedding_dim". The final example is:

```
batch, sentence_length, embedding_dim = 20, 5, 10
embedding = torch.randn(batch, sentence_length, embedding_dim)
layer_norm = nn.LayerNorm(embedding_dim)
```
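For reference, here is a minimal runnable version of the corrected example; the imports and the final `print` are additions for illustration and are not part of the proposed doc text:

```python
import torch
import torch.nn as nn

# NLP example from the proposed docs: normalize over the embedding dimension only
batch, sentence_length, embedding_dim = 20, 5, 10
embedding = torch.randn(batch, sentence_length, embedding_dim)
layer_norm = nn.LayerNorm(embedding_dim)

# `embedding` is no longer shadowed by the LayerNorm argument, so this runs cleanly
output = layer_norm(embedding)
print(output.shape)  # torch.Size([20, 5, 10])
```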
Hi @aspenstarss! Thank you for your pull request and welcome to our community.

Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with `CLA Signed`. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!
💊 CI failures summary and remediations (as of commit db20e44; more details on the Dr. CI page):
🕵️ 4 new failures recognized by patterns. The following CI failures do not appear to be due to upstream breakages:
| Job | Step | Action |
| --- | --- | --- |
|  | Run mypy | 🔁 rerun |
|  | Fail if there were any warnings | 🔁 rerun |
|  | Unzip test reports | 🔁 rerun |
ci.pytorch.org: 1 failed
This comment was automatically generated by Dr. CI.
I have signed the corporate CLA.
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!
I suggest the docs should also specify the shape of the mean/variance used and of the affine weights.
Thanks for the contribution! I like @vadimkantorov's suggestion above to add shape information for the learnable affine weights. Ideally there should be an `Attributes:` section in the docs mentioning that `weight` and `bias` are defined when `elementwise_affine=True` with shape `normalized_shape` (see Linear's docs as an example).
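As a small sketch of what such an `Attributes:` section would describe (illustrative only, not the proposed doc text; the shapes follow how `nn.LayerNorm` creates its parameters):

```python
import torch.nn as nn

normalized_shape = (5, 10)

# With elementwise_affine=True (the default), learnable weight and bias
# are created with shape `normalized_shape`
layer_norm = nn.LayerNorm(normalized_shape, elementwise_affine=True)
print(layer_norm.weight.shape)  # torch.Size([5, 10])
print(layer_norm.bias.shape)    # torch.Size([5, 10])

# With elementwise_affine=False, no learnable affine parameters are defined
plain = nn.LayerNorm(normalized_shape, elementwise_affine=False)
print(plain.weight is None)     # True
```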
add an `Attributes` section in the docs to specify the shape of `weight` and `bias`
Great! I have added an `Attributes` section. This is my first time committing code to an open-source repository. Although I only improved the documentation, it was a great process. Thanks for your reply! @vadimkantorov @jbschlosser
delete a blank line that contains whitespace, which seems to fail the automatic testing
LGTM! Thanks for the fix :)
@jbschlosser has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
```
>>> # Activate module
>>> layer_norm(embedding)
>>>
>>> # Image Example
>>> N, C, H, W = 20, 5, 10, 10
```
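For context, a sketch of how that image example continues, normalizing over the last three dimensions; this is my reconstruction of the intent, not necessarily the exact doc text:

```python
import torch
import torch.nn as nn

N, C, H, W = 20, 5, 10, 10
input = torch.randn(N, C, H, W)
# Normalize over the last three dimensions (the channel and spatial dimensions)
layer_norm = nn.LayerNorm([C, H, W])
output = layer_norm(input)
print(output.shape)  # torch.Size([20, 5, 10, 10])
```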
Just curious, is this image example realistic? I think almost all the time, LayerNorm is applied over only the embedding dim (and aggregates away the temporal / spatial dims)
Yes, it is. As shown in the figure referenced here, LayerNorm normalizes over the last three (C, H, W) dimensions in computer vision. In my opinion, LayerNorm should normalize over the last two (seq_len, emb_dim) dimensions in NLP, following the original paper. However, the implementation of LayerNorm in the most popular NLP projects (such as AllenNLP) is applied over only the embedding dim.
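To illustrate the two conventions being contrasted here (a sketch; the tensor sizes are arbitrary):

```python
import torch
import torch.nn as nn

batch, seq_len, emb_dim = 20, 5, 10
x = torch.randn(batch, seq_len, emb_dim)

# Common practice in NLP libraries: normalize over the embedding dim only
ln_embedding_only = nn.LayerNorm(emb_dim)

# Normalizing over the last two dims (seq_len, emb_dim), closer to the original paper
ln_last_two = nn.LayerNorm([seq_len, emb_dim])

print(ln_embedding_only(x).shape)  # torch.Size([20, 5, 10])
print(ln_last_two(x).shape)        # torch.Size([20, 5, 10])
```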
I think the GroupNorm paper isn't a good reference, because LayerNorm seems to be used differently these days (as you mention, usually in NLP only normalising over embedding_dim).
I think for the image example, a realistic example is needed from some established / representative / recent model / paper. Otherwise, it is confusing.
The problem with normalising over height and width as well is that the affine parameters are then tied to a fixed height and width and won't support other input sizes, as in the sketch below. I don't think this is the most common scenario. Or at least this should be discussed.
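A sketch of the tying problem described above, assuming `nn.LayerNorm` over [C, H, W] (the sizes are arbitrary):

```python
import torch
import torch.nn as nn

C, H, W = 5, 10, 10
layer_norm = nn.LayerNorm([C, H, W])  # affine weight/bias have shape (C, H, W)

# Works: the trailing dimensions of the input match normalized_shape
out = layer_norm(torch.randn(20, C, H, W))

# Fails: a different spatial size no longer matches the affine parameters
try:
    layer_norm(torch.randn(20, C, 8, 8))
except RuntimeError as err:
    print("LayerNorm over [C, H, W] does not support other H/W:", err)
```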
@jbschlosser merged this pull request in 72bc6dc.
Summary: In this [commit](7026995) and [issue](#59178 (comment)), [Line 134](https://github.com/deniskokarev/pytorch/blob/47e286d024c183cb26a464447b34fde88b80d17d/torch/nn/modules/normalization.py#L134) overwrites the "embedding" variable, which would cause an error when instantiating `nn.LayerNorm`. I suggest renaming "embedding" in [Line 133](https://github.com/deniskokarev/pytorch/blob/47e286d024c183cb26a464447b34fde88b80d17d/torch/nn/modules/normalization.py#L133) to "embedding_dim". The final example is:

```
batch, sentence_length, embedding_dim = 20, 5, 10
embedding = torch.randn(batch, sentence_length, embedding_dim)
layer_norm = nn.LayerNorm(embedding_dim)
```

Fixes #59178

Pull Request resolved: #63144
Reviewed By: bdhirsh
Differential Revision: D30288778
Pulled By: jbschlosser
fbshipit-source-id: e74b11430e302dae5661bf6e830ee5ac6c1838c4