Skip to content

Conversation

@andportnoy
Copy link
Contributor

Proposed changes

Thread block cluster dimensions are not correctly updated by cudaGraphExecUpdate. Therefore, when clusters are used, we reinstantiate a cudaGraphExec rather than updating it.

Checklist

  • I have read the CONTRIBUTING document
  • I have run pre-commit run --all-files to format my code / installed pre-commit prior to committing changes
  • I have added tests that prove my fix is effective or that my feature works
    The change fixes existing tests.
  • I have updated the necessary documentation (if needed)

andportnoy and others added 2 commits November 21, 2025 15:22
Thread block cluster dimensions are not correctly updated by
cudaGraphExecUpdate. Therefore, when clusters are used, we
reinstantiate a cudaGraphExec rather than updating it.
@awni awni force-pushed the fix-cuda-graphs-update-clusters branch from f54794d to 9886dbf Compare November 22, 2025 14:15
@awni
Copy link
Member

awni commented Nov 22, 2025

Some benchmarks on B200

Inference with meta-llama/Meta-Llama-3.1-8B

Pre: prompt_tps=59185.146
Post: prompt_tps=59359.110

Pre: generation_tps=282.392
Post: generation_tps=282.671

Pre: toks_per_sec: 97961.0518
Post: toks_per_sec: 97452.2300

@awni
Copy link
Member

awni commented Nov 22, 2025

@andportnoy I changed this a bit more than I expected. Basically three changes:

  1. we don't bother cache the graph exec if the graph is not updatable since there is no point
  2. The condition for allowing the update are less conservative (if there is a non-singleton cluster in the x dimension of a sub graph with a single kernel node we can still update the graph).
  3. In order to achieve the above I encoded the cluster x dimension in the graph key which took a bit of rearranging to make it work nicely.

Copy link
Member

@awni awni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This resolves outstanding test failures! Thanks for the fix!!

@awni awni merged commit 3e05cea into ml-explore:main Nov 22, 2025
10 checks passed
Jckwind pushed a commit to TheProxyCompany/mlx that referenced this pull request Dec 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants