CUDA: handle OW > 65535 in im2col (2D and 3D) by CrispStrobe · Pull Request #22944 · ggml-org/llama.cpp

CrispStrobe · 2026-05-11T13:14:11Z

im2col_cuda and im2col_3d_cuda both launch with
block_nums.y = OW, but CUDA caps the grid's Y dimension at 65535. Conv1d encoders on
raw 16 kHz audio with T > 65535 samples (~4.1 s) exceed that limit (e.g. SEANet
at 11 s reaches OW = 176000), and the launch fails with
invalid configuration argument.
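
The arithmetic behind those numbers can be checked with a small host-side sketch. The output-width formula is the standard convolution one; the kernel width 7 and padding 3 used in the usage note are hypothetical values chosen only so that OW equals the input length, and are not taken from SEANet's actual configuration. All function names are illustrative.

```cpp
#include <cstdint>

// CUDA's limit for gridDim.y, as stated in the PR description.
constexpr int64_t kCudaMaxGridDimY = 65535;

// Standard 1D convolution output width (illustrative helper, not a ggml function).
constexpr int64_t conv1d_out_width(int64_t iw, int64_t kw, int64_t stride,
                                   int64_t pad, int64_t dilation) {
    return (iw + 2 * pad - dilation * (kw - 1) - 1) / stride + 1;
}

// True when launching one block row per output column would exceed the limit.
constexpr bool exceeds_grid_y(int64_t ow) {
    return ow > kCudaMaxGridDimY;
}

// Audio duration, in seconds, at which the sample count reaches the grid limit.
constexpr double seconds_until_limit(int64_t sample_rate) {
    return static_cast<double>(kCudaMaxGridDimY) / static_cast<double>(sample_rate);
}
```

For example, `conv1d_out_width(11 * 16000, 7, 1, 3, 1)` gives 176000, for which `exceeds_grid_y` returns true, and `seconds_until_limit(16000)` is just under 4.1 s.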

Fix: clamp block_nums.y to MIN(OW, MAX_GRIDDIM_Y) and loop inside
the kernel with stride MAX_GRIDDIM_Y. This is the same in-kernel stride pattern
already used for the z axis (MAX_GRIDDIM_Z). Both the 2D im2col_kernel
and the 3D im2col_3d_kernel need the fix. Output is bit-identical for
OW ≤ 65535 (the new outer loop runs a single iteration).
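
The clamp-and-stride idea can be illustrated without a GPU. The sketch below is a host-side emulation, not the PR's kernel code: `block_y` stands in for `blockIdx.y`, and the helper names are hypothetical. It counts how often each output column is visited when the grid is clamped and each block strides by MAX_GRIDDIM_Y.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

constexpr int64_t MAX_GRIDDIM_Y = 65535;

// Number of blocks launched along y after the clamp.
inline int64_t clamped_grid_y(int64_t ow) {
    return std::min<int64_t>(ow, MAX_GRIDDIM_Y);
}

// How many output columns a single block handles (assumes 0 <= block_y < ow).
inline int64_t iterations_for_block(int64_t block_y, int64_t ow) {
    return (ow - 1 - block_y) / MAX_GRIDDIM_Y + 1;
}

// Emulates the launch on the host and records how often each column is visited.
inline std::vector<int> visit_counts(int64_t ow) {
    std::vector<int> visits(static_cast<size_t>(ow), 0);
    const int64_t grid_y = clamped_grid_y(ow);
    for (int64_t block_y = 0; block_y < grid_y; ++block_y) {           // stands in for blockIdx.y
        for (int64_t iow = block_y; iow < ow; iow += MAX_GRIDDIM_Y) {  // in-kernel stride loop
            ++visits[static_cast<size_t>(iow)];
        }
    }
    return visits;
}
```

With OW = 176000 every column is visited exactly once by 65535 blocks, and for any OW ≤ 65535 `iterations_for_block` is 1, which is why the small-OW path does the same work as before.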

Verification

Tested on T4 / Jetson Orin with a SEANet encoder running on 11 s of
16 kHz audio (im2col reaching OW ≈ 176000): pre-fix, the launch returns
invalid configuration argument; post-fix, it runs to completion.
Existing test-backend-ops im2col cases are unchanged.
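
A CPU-only way to sanity-check the ordering argument is to compare a plain 1D im2col against one that walks the output columns in the clamped, strided order. This is a sketch under assumptions: the destination layout (one row of IC*KW values per output column) and all names here are illustrative, and it is not one of the test-backend-ops cases.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

constexpr int64_t kStrideY = 65535;  // plays the role of MAX_GRIDDIM_Y

struct Conv1dShape { int64_t ic, iw, kw, stride, pad, dilation; };

inline int64_t out_width(const Conv1dShape & s) {
    return (s.iw + 2 * s.pad - s.dilation * (s.kw - 1) - 1) / s.stride + 1;
}

// Fills the IC*KW entries that belong to output column iow (zero padding outside the input).
inline void im2col_column(const std::vector<float> & src, const Conv1dShape & s,
                          int64_t iow, std::vector<float> & dst) {
    for (int64_t iic = 0; iic < s.ic; ++iic) {
        for (int64_t ikw = 0; ikw < s.kw; ++ikw) {
            const int64_t iiw = iow * s.stride + ikw * s.dilation - s.pad;
            const bool inside = iiw >= 0 && iiw < s.iw;
            dst[iow * s.ic * s.kw + iic * s.kw + ikw] = inside ? src[iic * s.iw + iiw] : 0.0f;
        }
    }
}

// Reference: visit the output columns in order.
inline std::vector<float> im2col_direct(const std::vector<float> & src, const Conv1dShape & s) {
    const int64_t ow = out_width(s);
    std::vector<float> dst(static_cast<size_t>(ow * s.ic * s.kw));
    for (int64_t iow = 0; iow < ow; ++iow) {
        im2col_column(src, s, iow, dst);
    }
    return dst;
}

// Same work, visited in the order a clamped grid with an in-kernel stride loop would use.
inline std::vector<float> im2col_strided(const std::vector<float> & src, const Conv1dShape & s) {
    const int64_t ow = out_width(s);
    std::vector<float> dst(static_cast<size_t>(ow * s.ic * s.kw));
    const int64_t grid_y = std::min<int64_t>(ow, kStrideY);
    for (int64_t block_y = 0; block_y < grid_y; ++block_y) {
        for (int64_t iow = block_y; iow < ow; iow += kStrideY) {
            im2col_column(src, s, iow, dst);
        }
    }
    return dst;
}
```

Comparing the two results with `==` on an input longer than 65535 samples checks that the strided traversal neither skips nor corrupts columns; it says nothing about the GPU launch itself, which still needs a run on real hardware as described above.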

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: AI-assisted (mechanical port, formatting, description draft) in re-applying an existing fork patch to llama.cpp's tree.

@CISC closed our ggml-org/ggml#1485 with the steer that CUDA changes belong in ggml-org/llama.cpp instead. The src/ggml-cuda/ commit log on ggml master confirms it: 100% (llama/NNNNN) sync commits, no direct CUDA PRs in months. ggml's own README points to llama.cpp as the development source-of-truth. Refiled the im2col OW > 65535 fix as ggml-org/llama.cpp#22944.

Tracker updates:
- tools/upstream-prs/README.md: new "Which repo to file against" routing table (CUDA / Vulkan → llama.cpp; CPU + ggml.c → ggml; Metal works either way); per-repo title conventions; flag for llama.cpp's stricter AI-content policy.
- UPSTREAM.md: log entry for the redirect.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Green-Sky · 2026-05-11T15:21:00Z

leejet/ggml@7f4ab36

This commit is suspiciously close, almost identical, and was committed ~14 hours earlier.

Might be a coincidence, also might not.

JohannesGaessler · 2026-05-11T15:32:02Z

@CrispStrobe can you comment on this?

cc: @leejet

CrispStrobe · 2026-05-11T17:10:46Z

well, I had this patch in commit 1552434d (2026-04-23 13:24 +0200) in the CrispASR repo, which is MIT licensed, so it is perfectly OK with me if others post such patches upstream earlier than me, or just find it independently (it is the simplest, most straightforward application of the MAX_GRIDDIM_Z pattern already in im2col.cu anyway). I don't know which it was, and am OK with either, of course.

Green-Sky · 2026-05-11T17:35:35Z

@CrispStrobe Thanks for clarifying, looks good and significantly older (with dated gh action too).

JohannesGaessler · 2026-05-11T17:49:13Z

Thank you for clarifying. For context, it's usually fine to take commits from other repositories but this should be disclosed.

CrispStrobe · 2026-05-11T17:59:09Z

@JohannesGaessler I am not sure if you mean I should have said that I already have this patch in another of my repos. If so, that is fine by me. I would have sent this earlier, but you have that rule of only one PR at a time from newbies, and another patch was only recently accepted... (and I am sitting on a few more...)

JohannesGaessler · 2026-05-11T18:05:53Z

If you did not write the code yourself, just add a short blurb like "I took this code from ...".

CrispStrobe · 2026-05-11T18:07:48Z

ah well but i did write it of course (2 weeks ago).

JohannesGaessler · 2026-05-11T18:14:02Z

Ah sorry, I misread your earlier, clarifying post.

CrispStrobe requested a review from a team as a code owner May 11, 2026 13:14

CrispStrobe mentioned this pull request May 11, 2026

ggml-cuda : handle OW > 65535 in im2col (2D and 3D) ggml-org/ggml#1485

Closed

JohannesGaessler approved these changes May 11, 2026

github-actions bot added the labels "Nvidia GPU" (Issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) May 11, 2026

am17an approved these changes May 11, 2026

JohannesGaessler merged commit 8e1f9d0 into ggml-org:master May 11, 2026
47 checks passed

JohannesGaessler merged 1 commit into ggml-org:master from CrispStrobe:cuda-im2col-ow
