
Bug Report for multi-headed-self-attention #5130

@JanaksinhVen

Description


Bug Report for https://neetcode.io/problems/multi-headed-self-attention

Please describe the bug below and include any steps to reproduce the bug or screenshots if possible.

Your multi-head attention implementation is incorrect: the standard approach is to first compute the full q, k, and v projections from the embeddings using w_q, w_k, and w_v, and only then split those projections into heads and compute attention per head.
In your solution, the embeddings are split into heads first and the weights are applied afterwards. That restricts each head's projection to a slice of the embedding (effectively a block-diagonal weight matrix), which is not the standard transformer implementation.
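
For reference, here is a minimal sketch of the standard order of operations (project first, then split into heads). It assumes a PyTorch input of shape (batch, seq_len, d_model) and full d_model x d_model weight matrices; the function and variable names are illustrative, not taken from the site's solution:

```python
import torch
import torch.nn.functional as F

def multi_head_attention(x, w_q, w_k, w_v, num_heads):
    # x: (batch, seq_len, d_model); w_q, w_k, w_v: (d_model, d_model)
    batch, seq_len, d_model = x.shape
    d_head = d_model // num_heads

    # 1. Project the COMPLETE embeddings first, using the full weight matrices.
    q = x @ w_q  # (batch, seq_len, d_model)
    k = x @ w_k
    v = x @ w_v

    # 2. Only then split the projections into heads:
    #    (batch, seq_len, d_model) -> (batch, num_heads, seq_len, d_head)
    def split(t):
        return t.view(batch, seq_len, num_heads, d_head).transpose(1, 2)
    q, k, v = split(q), split(k), split(v)

    # 3. Scaled dot-product attention, computed independently per head.
    scores = q @ k.transpose(-2, -1) / d_head ** 0.5  # (batch, heads, seq, seq)
    weights = F.softmax(scores, dim=-1)
    out = weights @ v  # (batch, num_heads, seq_len, d_head)

    # 4. Merge the heads back together: (batch, seq_len, d_model)
    return out.transpose(1, 2).contiguous().view(batch, seq_len, d_model)
```

With this ordering, every head's queries, keys, and values are linear functions of the entire embedding, as in "Attention Is All You Need"; splitting the embedding before applying the weights does not give the same result.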
