
Is the Encoder implementation different from the original "Attention Is All You Need" paper? #5

Closed
chaoyanghe opened this issue Nov 19, 2020 · 4 comments

Comments

@chaoyanghe

chaoyanghe commented Nov 19, 2020

Hi, I checked your code at

class Block(nn.Module):

Your implementation is Attention(LayerNorm(x)) + x, but the original Transformer uses LayerNorm(x + Attention(x)). Is this an error, or was it implemented this way deliberately?

@jeonsworld
Owner

jeonsworld commented Nov 20, 2020

As described in the paper On Layer Normalization in the Transformer Architecture, the position of Layer Normalization in Transformer implementations is referred to as either pre-LN or post-LN. For example, the Transformer-encoder-based BERT uses post-LN, while Vision Transformer uses pre-LN.
In conclusion, this implementation is correct.
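
For illustration, here is a minimal PyTorch sketch of the two placements. The module and parameter names (PreLNBlock, PostLNBlock, dim, num_heads) are my own, and nn.MultiheadAttention stands in for the repository's Attention module; the sketch only shows where LayerNorm sits relative to the residual connection, not the exact implementation in this repo.

```python
import torch
import torch.nn as nn


class PreLNBlock(nn.Module):
    """Pre-LN residual block (Vision Transformer style): x + Attention(LayerNorm(x))."""

    def __init__(self, dim, num_heads):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)            # normalize first (pre-LN)
        h, _ = self.attn(h, h, h)   # self-attention on the normalized input
        return x + h                # residual connection added afterwards


class PostLNBlock(nn.Module):
    """Post-LN residual block (original Transformer / BERT style): LayerNorm(x + Attention(x))."""

    def __init__(self, dim, num_heads):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x):
        h, _ = self.attn(x, x, x)   # self-attention on the raw input
        return self.norm(x + h)     # normalize after the residual sum


# Quick shape check: (batch, tokens, dim)
block = PreLNBlock(dim=64, num_heads=4)
out = block(torch.randn(2, 16, 64))
```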

@jeonsworld
Owner

An additional comment:
Attention Is All You Need uses post-LN.

@jeonsworld
Owner

I believe the question has been answered, so I am closing the issue.

@chaoyanghe
Author

Thank you!
