Bug Report for https://neetcode.io/problems/multi-headed-self-attention
Please describe the bug below and include any steps to reproduce the bug or screenshots if possible.
Your multi-head attention implementation is incorrect. In the standard transformer, you first compute the complete q, k, and v projections from the embeddings using w_q, w_k, and w_v, and only then split those projections into heads and compute attention per head.
In your solution, the embeddings are split into heads first and the weights are applied afterwards, which is not the standard transformer implementation: it effectively restricts each head to a slice of the embedding instead of letting every head project from the full embedding.
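For reference, here is a minimal sketch of the standard ordering (project the whole embedding first, then split into heads). It assumes PyTorch and illustrative names (MultiHeadSelfAttention, d_model, num_heads), not your exact starter-code signature:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.d_head = d_model // num_heads
        # Full-width projections over the whole embedding (w_q, w_k, w_v).
        self.w_q = nn.Linear(d_model, d_model, bias=False)
        self.w_k = nn.Linear(d_model, d_model, bias=False)
        self.w_v = nn.Linear(d_model, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        B, T, _ = x.shape
        # 1) Compute the complete q, k, v from the embeddings first.
        q, k, v = self.w_q(x), self.w_k(x), self.w_v(x)

        # 2) Only now split the projected tensors into heads.
        def split_heads(t: torch.Tensor) -> torch.Tensor:
            return t.view(B, T, self.num_heads, self.d_head).transpose(1, 2)  # (B, heads, T, d_head)
        q, k, v = split_heads(q), split_heads(k), split_heads(v)

        # 3) Scaled dot-product attention per head.
        scores = q @ k.transpose(-2, -1) / (self.d_head ** 0.5)
        attn = F.softmax(scores, dim=-1)
        out = attn @ v  # (B, heads, T, d_head)

        # 4) Merge the heads back into d_model.
        return out.transpose(1, 2).contiguous().view(B, T, self.num_heads * self.d_head)

The key point is that the view/transpose into heads happens after the q/k/v projections, so each head sees a slice of a projection of the full embedding rather than a projection of a slice of the embedding.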