```python
return self.embedding(tokens.long()) * math.sqrt(self.emb_size)
```

and

```python
src = self.embedding(src) * math.sqrt(self.d_model)
```
Shouldn't this be

```python
src = self.embedding(src) / math.sqrt(self.d_model)
```

instead? At least, that is the impression I got when reading the "Attention Is All You Need" paper. Or is there some newer research finding that multiplying is better?
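For context, here is a minimal, self-contained sketch of the embedding module I am asking about, with both scaling variants shown. The class name, attribute names, and usage values below are my own paraphrase for illustration, not verbatim from the tutorial:

```python
import math

import torch
import torch.nn as nn


class TokenEmbedding(nn.Module):
    """Sketch of a token embedding with the scaling in question."""

    def __init__(self, vocab_size: int, emb_size: int):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_size)
        self.emb_size = emb_size

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # Tutorial version: scale the embeddings UP by sqrt(emb_size).
        return self.embedding(tokens.long()) * math.sqrt(self.emb_size)
        # What I expected from my reading of the paper (scale DOWN):
        # return self.embedding(tokens.long()) / math.sqrt(self.emb_size)


# Hypothetical usage, just to make the sketch runnable:
emb = TokenEmbedding(vocab_size=1000, emb_size=512)
out = emb(torch.tensor([[1, 2, 3]]))  # shape: (1, 3, 512)
```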
cc @sekyondaMeta @svekars @kit1980 @subramen @albanD