Does the Transformer implementation default to Pre-Norm? #51

Open
KKIverson opened this issue Mar 5, 2024 · 1 comment

Comments

@KKIverson

In the EncoderLayer of Section 2.1.5, the add & norm operations seem to first apply norm to the original input, feed the normalized data into the MHA/FFN, and then add the result back to the original input. Isn't that pre-norm? As far as I know, the original Transformer uses post-norm.
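For reference, a minimal sketch (PyTorch, with hypothetical names; not the repository's actual 2.1.5 code) of the pre-norm residual pattern described above, where the input is normalized first, passed through the sub-layer, and only then added back to the raw input:

import torch.nn as nn

class PreNormSublayer(nn.Module):
    # Pre-norm residual wrapper: x + dropout(sublayer(norm(x))).
    # Hypothetical sketch of the pattern described in the comment above.
    def __init__(self, size, dropout=0.1):
        super().__init__()
        self.norm = nn.LayerNorm(size)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, sublayer):
        # Normalize the raw input, run the sub-layer (MHA or FFN),
        # then add the result back onto the un-normalized input.
        return x + self.dropout(sublayer(self.norm(x)))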

@YingxuanW

YingxuanW commented Mar 21, 2024

I had the same question when reading this part. Personally, I would implement it like this:
x = norm(x + attention(x, x, x, mask))
x = norm(x + ffn(x))
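To make that concrete, here is a minimal runnable sketch (PyTorch; the class and parameter names are assumptions, not the repository's code) of a post-norm encoder layer, where LayerNorm is applied after the residual addition, i.e. LayerNorm(x + Sublayer(x)) as in the original Transformer paper:

import torch.nn as nn

class PostNormEncoderLayer(nn.Module):
    # Post-norm encoder layer: LayerNorm(x + Sublayer(x)).
    def __init__(self, d_model, nhead, d_ff, dropout=0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, dropout=dropout, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x, mask=None):
        # Attention sub-layer: residual add first, then LayerNorm (post-norm).
        attn_out, _ = self.self_attn(x, x, x, attn_mask=mask)
        x = self.norm1(x + self.dropout(attn_out))
        # Feed-forward sub-layer, same post-norm ordering.
        x = self.norm2(x + self.dropout(self.ffn(x)))
        return x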
