fix GQA #5

Merged
XZman merged 1 commit into main from gqa_fix on Mar 23, 2026

Conversation

XZman (Collaborator) commented Mar 23, 2026

No description provided.

XZman requested a review from Copilot on March 23, 2026 02:35
XZman merged commit 4cf9661 into main on Mar 23, 2026
8 checks passed
XZman deleted the gqa_fix branch on March 23, 2026 02:39

Copilot AI left a comment

Pull request overview

This PR aims to fix/enable GQA (grouped-query attention) modeling by threading a num_kv_heads parameter through the LLM attention op generators, updating attention block tensor shapes to reflect Q-heads vs KV-heads, and relaxing FlashAttention shape assumptions to permit Q_heads != KV_heads.

Changes:

  • Add num_kv_heads plumbing across attention entrypoints (create_multi_head_attention*) and propagate it from LLMOpsGenerator.
  • Update normal/self-attention (fwd/bwd) and flash-attention blocks to use KV-head shapes where applicable.
  • Relax FlashAttention shape assertions to only require K_heads == V_heads (allowing GQA/MQA).
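The shape bookkeeping behind these changes can be sketched in plain Python. This is only an illustration, not the code in `llm_ops_lib.py`: the helper `gqa_shapes`, the `AttentionShapes` container, and the axis layout are all hypothetical, and the sketch assumes the usual GQA convention that the number of query heads is a multiple of `num_kv_heads`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionShapes:
    """Tensor shapes for one attention block (hypothetical container)."""
    q: tuple
    k: tuple
    v: tuple
    scores: tuple
    out: tuple


def gqa_shapes(batch, q_len, kv_len, num_heads, num_kv_heads, head_dim):
    """Return illustrative shapes for grouped-query attention.

    Assumption of this sketch: num_heads is divisible by num_kv_heads,
    so each KV head serves a group of num_heads // num_kv_heads Q heads.
    """
    if num_heads % num_kv_heads != 0:
        raise ValueError("num_heads must be a multiple of num_kv_heads")
    group = num_heads // num_kv_heads
    return AttentionShapes(
        # Q heads are viewed as (kv_head, group) so they line up with K/V.
        q=(batch, q_len, num_kv_heads, group, head_dim),
        # K and V carry only num_kv_heads heads.
        k=(batch, kv_len, num_kv_heads, head_dim),
        v=(batch, kv_len, num_kv_heads, head_dim),
        # One score matrix per Q head, grouped by the KV head it reads.
        scores=(batch, num_kv_heads, group, q_len, kv_len),
        # The output has one slice per Q head again.
        out=(batch, q_len, num_heads, head_dim),
    )
```

With `num_kv_heads == num_heads` the group size is 1 and the shapes reduce to ordinary multi-head attention; with `num_kv_heads == 1` they describe MQA.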

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| neusim/npusim/frontend/llm_ops_lib.py | Adds num_kv_heads throughout attention construction and introduces GQA-shaped einsums/softmax for attention blocks. |
| neusim/npusim/frontend/llm_ops_generator.py | Passes self.num_kv_heads into attention creation for prefill/decode and fwd/bwd generation paths. |
| neusim/npusim/backend/npusim_lib.py | Relaxes FlashAttention head-count assertion to allow Q-heads to differ from KV-heads. |
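As a rough picture of what a relaxed head-count check might look like, the sketch below only requires the K and V head counts to match and lets the Q head count differ. The function name `classify_heads`, the `head_axis` parameter, and the `(batch, seq, heads, head_dim)` layout are assumptions made for illustration; the actual assertion in `npusim_lib.py` is not shown in this PR summary.

```python
def classify_heads(q_shape, k_shape, v_shape, head_axis=2):
    """Check head counts and name the attention variant (hypothetical helper).

    Assumed layout: (batch, seq, heads, head_dim), heads on `head_axis`.
    """
    q_heads = q_shape[head_axis]
    k_heads = k_shape[head_axis]
    v_heads = v_shape[head_axis]
    # Relaxed check: only K and V must agree; Q heads may differ.
    assert k_heads == v_heads, "K and V must have the same number of heads"
    if q_heads == k_heads:
        return "MHA"  # every Q head has its own KV head
    if k_heads == 1:
        return "MQA"  # all Q heads share a single KV head
    return "GQA"      # Q heads share KV heads in groups
```

For example, `classify_heads((2, 16, 8, 64), (2, 16, 2, 64), (2, 16, 2, 64))` passes the check with 8 Q heads and 2 KV heads, a combination that a strict `Q_heads == K_heads == V_heads` assertion would reject.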

Comment thread neusim/npusim/frontend/llm_ops_lib.py
Comment thread neusim/npusim/frontend/llm_ops_lib.py
Comment thread neusim/npusim/frontend/llm_ops_lib.py
Comment thread neusim/npusim/frontend/llm_ops_lib.py
Comment thread neusim/npusim/backend/npusim_lib.py
2 participants