Conversation

@jan-service-account
Updates dev branch with latest release (b5145) from ggml-org/llama.cpp

noemotiovon and others added 3 commits April 16, 2025 16:21
…gml-org#12931)

The grouped query attention optimization doesn't require a power of two ratio;
the only thing relying on it was the modulo operation written as a bitwise &.

split_k need not depend on gqa_ratio - enable it any time there's only one
workgroup in the X dimension. The shader gets the split index from the x coord,
and multiple workgroups in the X dimension (pre-split) indicate a larger
FA operation that wouldn't need splitting.
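The new enabling condition described above can be sketched as a single predicate. This is a hypothetical illustration of the dispatch-side logic, not the actual llama.cpp code; the names `workgroups_x` and `use_split_k` are invented for clarity:

```cpp
// Hypothetical sketch of the split_k gating described in the commit message.
// A single workgroup in X means a small flash-attention op where splitting
// the K dimension adds parallelism; multiple X workgroups (pre-split) already
// indicate a larger op with enough parallelism, so no split is needed.
bool use_split_k(int workgroups_x) {
    return workgroups_x == 1;
}
```

Note the condition no longer consults the GQA ratio at all, which is the decoupling the commit message describes.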
@jan-service-account jan-service-account merged commit e9c6088 into dev Apr 17, 2025
9 checks passed
@jan-service-account jan-service-account deleted the update-dev-from-master-2025-04-17-00-08 branch April 17, 2025 00:18