Collaborator

@am17an am17an commented Dec 1, 2025

It seems these forward expands were not required for fusion, since the operations already arrive in order. Adding these statements changes the splits in multi-GPU setups, which causes performance drops (see #16912). Removing them should fix these issues.
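
To illustrate why node order can matter here, below is a toy sketch, not ggml's actual scheduler logic. It assumes a simplified scheduler that starts a new split whenever two consecutive graph nodes are assigned to different devices. The node names, the device map, and `count_splits` are all hypothetical and exist only for this example.

```python
def count_splits(order, device_of):
    """Count runs of consecutive nodes that share a device.

    Toy model (assumption): a new split starts whenever the device of the
    current node differs from the device of the previous node.
    """
    splits = 0
    prev = None
    for node in order:
        dev = device_of[node]
        if dev != prev:
            splits += 1
            prev = dev
    return splits


# Hypothetical node names and device assignment, for illustration only.
device_of = {
    "gate_0": "gpu0",
    "up_0": "gpu0",
    "gate_1": "gpu1",
    "up_1": "gpu1",
}

# Nodes arriving in their natural order: each device's work is contiguous.
natural_order = ["gate_0", "up_0", "gate_1", "up_1"]

# Same nodes, but pinned into a different order (as forcing some nodes into
# the graph early could do): the devices now interleave.
reordered = ["gate_0", "gate_1", "up_0", "up_1"]

print(count_splits(natural_order, device_of))
print(count_splits(reordered, device_of))
```

Under this toy model the interleaved order yields more splits than the natural order, and each extra split would mean an extra hand-off between devices. That is the kind of effect the description above attributes to the extra forward expands.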

@am17an am17an requested a review from CISC as a code owner December 1, 2025 04:22
@am17an am17an requested review from ggerganov and removed request for CISC December 1, 2025 04:22
@github-actions github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) Dec 1, 2025
@am17an am17an requested a review from CISC December 1, 2025 04:22
@ggerganov ggerganov merged commit 6eea666 into ggml-org:master Dec 1, 2025
72 of 74 checks passed
@am17an am17an deleted the cuda-reorder-gemv branch December 1, 2025 10:17