
When using multihead_attention, why are the queries normalized while the keys and values are not? #33


Description

@YTZ01

for i in range(len(self.attention_layers)):
    seqs = torch.transpose(seqs, 0, 1)
    Q = self.attention_layernorms[i](seqs)
    mha_outputs, _ = self.attention_layers[i](Q, seqs, seqs,
                                              attn_mask=attention_mask)
                                              # key_padding_mask=timeline_mask
                                              # need_weights=False) this arg do not work?
    seqs = Q + mha_outputs
    seqs = torch.transpose(seqs, 0, 1)
In the SASRec paper, Section III (Methodology), part B (Self-Attention Block), the formula feeds the same embedding as queries, keys, and values, and converts it through separate linear projections inside the attention operation. Why are the queries layer-normalized in the code, while the keys and values are not?
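For reference, the paper's formulation, where the same layer-normalized embedding goes in as queries, keys, and values (with the W^Q, W^K, W^V projections applied inside the attention layer), would look roughly like the sketch below. This is only an illustration of the question, not code from this repository; the names `paper_style_block` and the example sizes are hypothetical.

```python
import torch
import torch.nn as nn

def paper_style_block(seqs, layernorm, attention_layer, attention_mask):
    # Paper-style block: the same layer-normalized tensor is passed as
    # query, key, and value; the W^Q / W^K / W^V projections are applied
    # inside nn.MultiheadAttention.
    x = layernorm(seqs)
    mha_outputs, _ = attention_layer(x, x, x, attn_mask=attention_mask)
    # residual connection around the sub-layer
    return seqs + mha_outputs

# Example wiring (hidden size, head count, and lengths are arbitrary here).
layernorm = nn.LayerNorm(50)
attention_layer = nn.MultiheadAttention(embed_dim=50, num_heads=1)
seqs = torch.rand(200, 128, 50)  # (seq_len, batch, hidden), batch_first=False
mask = torch.triu(torch.ones(200, 200, dtype=torch.bool), diagonal=1)
out = paper_style_block(seqs, layernorm, attention_layer, mask)
```

In contrast, the repository code above normalizes only `Q` and passes the un-normalized `seqs` as keys and values, which is what this issue is asking about.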
