
LayerNorm #9

Closed
jingweiz opened this issue Nov 26, 2020 · 1 comment

Comments

@jingweiz
Dear Juho,
Thanks for making the code public!
One quick question: if I read the code correctly, LayerNorm is never used in any of the three examples you open-sourced in this repo. Is that correct?
If so, is it because it gives slightly inferior performance? And have you tried moving the LayerNorm layers inside the skip connections, instead of before/after them, as done in several more recent papers, so that there is a direct connection from output to input?
Thanks in advance, and looking forward to your reply!

@juho-lee
Owner

We empirically found that multiple stacks of ISABs won't train with LayerNorm on some data, and the results didn't degrade much without LayerNorm even on the data where it did work, so we decided not to include it.
A more recent result (https://arxiv.org/abs/2002.04745, actually not that recent anymore) shows that moving the LayerNorm to before the attention improves performance, so if you are going to apply it you might consider this.
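
For concreteness, here is a minimal sketch of what that pre-LN placement could look like for a multihead attention block, using PyTorch's `nn.MultiheadAttention`. The class name, layer names, and the feed-forward design are illustrative assumptions, not the code in this repo:

```python
import torch.nn as nn

class PreLNMAB(nn.Module):
    # Hypothetical pre-LN variant of a multihead attention block (sketch);
    # names and the rFF design are assumptions, not this repo's actual code.
    def __init__(self, dim, num_heads):
        super().__init__()
        self.ln_x = nn.LayerNorm(dim)
        self.ln_y = nn.LayerNorm(dim)
        self.ln_h = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.rff = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, X, Y):
        # Post-LN (Set Transformer paper): H = LN(X + Attn(X, Y, Y)); out = LN(H + rFF(H))
        # Pre-LN (below): normalize the *inputs* of each sublayer instead, so the
        # residual path from input to output stays a plain identity connection.
        H = X + self.attn(self.ln_x(X), self.ln_y(Y), self.ln_y(Y))[0]
        return H + self.rff(self.ln_h(H))
```

With this placement the residual stream itself is never normalized, which is the property the paper above credits for more stable training.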
