Faster copy for col contig to row contig #2917

Merged
awni merged 2 commits into ml-explore:main from awni:copy_col_row
Dec 18, 2025
awni (Member) commented Dec 16, 2025

Adds a specialized copy from col-contiguous to row-contiguous arrays.
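
For readers unfamiliar with the terms, here is a minimal pure-Python sketch of what a col-contiguous to row-contiguous copy does to the memory layout. It is not the kernel added in this PR: the function name `col_to_row_contiguous`, the `tile` parameter, and the tiled loop order are assumptions chosen for illustration, and the PR description does not say how the specialized copy is implemented.

```python
# Illustrative model of the two layouts only; not the MLX kernel from this PR.

def col_to_row_contiguous(src, rows, cols, tile=4):
    """Copy a column-contiguous flat buffer into a row-contiguous one.

    Element (i, j) is read from src[i + j * rows] (column-contiguous)
    and written to dst[i * cols + j] (row-contiguous). The matrix is
    walked in tile x tile blocks, a common way to keep both the reads
    and the writes of a transposing copy reasonably local.
    """
    dst = [None] * (rows * cols)
    for i0 in range(0, rows, tile):
        for j0 in range(0, cols, tile):
            for i in range(i0, min(i0 + tile, rows)):
                for j in range(j0, min(j0 + tile, cols)):
                    dst[i * cols + j] = src[i + j * rows]
    return dst


rows, cols = 3, 5
# Logical matrix m[i][j] = 10 * i + j, stored column by column.
col_major = [10 * i + j for j in range(cols) for i in range(rows)]
row_major = col_to_row_contiguous(col_major, rows, cols, tile=2)
```

After the copy, `row_major` holds the same logical matrix stored row by row, which is the layout `mx.contiguous(x.T)` produces in the benchmark.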

Benchmark on an H100:

Pre: 818 GB/s
Post: 2191.62 GB/s

The benchmark is below:

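import time
import mlx.core as mx

t = mx.bfloat16
D = 4096
x = mx.random.normal(shape=(D, D)).astype(t)

def fun(x):
    # x.T is a col-contiguous view; mx.contiguous copies it to row-contiguous.
    # Chain 20 such copies, then evaluate them together.
    for _ in range(20):
        x = mx.contiguous(x.T)
    mx.eval(x)

# Warm up
for _ in range(20):
    fun(x)

# Timed runs
tic = time.time()
for _ in range(20):
    fun(x)
toc = time.time()

# 20 calls x 20 copies per call; each copy reads and writes x.nbytes.
gb = 20 * 20 * x.nbytes * 2 / 1e9
s = toc - tic
gbps = gb / s
print(f"{gbps=:.3f}")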
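
As a sanity check on the numbers, the arithmetic behind the reported bandwidth can be reproduced without a GPU. This is a sketch that assumes, as the benchmark does, that each copy reads the array once and writes it once. The helper name `effective_bandwidth_gbps` is hypothetical and not part of MLX.

```python
def effective_bandwidth_gbps(nbytes, copies, seconds):
    """Effective bandwidth in GB/s, counting one read and one write per copy."""
    return copies * nbytes * 2 / 1e9 / seconds


D = 4096
nbytes = D * D * 2      # bfloat16 uses 2 bytes per element
copies = 20 * 20        # 20 timed calls, 20 copies per call

# Total traffic over the timed region, in GB (reads plus writes).
total_gb = copies * nbytes * 2 / 1e9

# Ratio of the two bandwidths reported in the PR description.
speedup = 2191.62 / 818

print(f"{total_gb=:.3f} {speedup=:.2f}")
```

The timed region moves roughly 26.8 GB in total, and the reported figures correspond to about a 2.7x improvement in effective copy bandwidth.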

awni requested review from angeloskath and zcbenz on December 16, 2025 22:54

angeloskath (Member) left a comment

Looks great! Left a couple of comments.

awni force-pushed the copy_col_row branch 3 times, most recently from b9ab897 to 6cacfa8, on December 18, 2025 00:14
awni merged commit 116fda6 into ml-explore:main on Dec 18, 2025
12 checks passed