Fix Vec2/Vec4 UVM performance regression with vectorized at::Half copy by q10 · Pull Request #5491 · pytorch/FBGEMM

q10 · 2026-03-18T18:17:53Z

Summary:
Apply the vectorized copy optimization pattern to at::Half types in the Vec2 and Vec4 classes for ROCm. This ensures that at::Half copy operations use efficient 32-bit (two halves) or 64-bit (four halves) memory operations instead of scalar, element-by-element access.

With UVM (managed memory), each separate copy operation can trigger a page fault, causing a significant slowdown. Using vectorized operations reduces the number of separate accesses, and therefore this overhead.
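A minimal host-side C++ sketch of the pattern described above, assuming a hypothetical `HalfBits` stand-in for `at::Half` and hypothetical simplified `Vec2Half`/`Vec4Half` holders. These are not FBGEMM's actual Vec2/Vec4 classes or the code changed in this PR. The sketch contrasts a scalar copy (one 16-bit access per element) with copies that move two halves as one 32-bit word and four halves as one 64-bit word. `std::memcpy` is used here to avoid strict-aliasing issues on the host; in GPU device code the same idea is usually expressed by reinterpreting a suitably aligned pointer as a wider type.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for at::Half: a 16-bit half stored as raw bits.
struct HalfBits {
  std::uint16_t x;
};
static_assert(sizeof(HalfBits) == 2, "half must be 16 bits");

// Hypothetical simplified holders (not FBGEMM's real Vec2/Vec4 classes).
struct Vec2Half {
  HalfBits acc[2];
};
struct Vec4Half {
  HalfBits acc[4];
};

// Scalar pattern: one 16-bit access per element (four separate copies).
inline void copy_scalar(const HalfBits* src, Vec4Half& dst) {
  for (int i = 0; i < 4; ++i) {
    dst.acc[i] = src[i];
  }
}

// Vectorized pattern: two halves travel together as one 32-bit word.
inline void copy_vec2(const HalfBits* src, Vec2Half& dst) {
  std::uint32_t word;
  std::memcpy(&word, src, sizeof(word));      // single 32-bit load
  std::memcpy(dst.acc, &word, sizeof(word));  // single 32-bit store
}

// Vectorized pattern: four halves travel together as one 64-bit word.
inline void copy_vec4(const HalfBits* src, Vec4Half& dst) {
  std::uint64_t word;
  std::memcpy(&word, src, sizeof(word));      // single 64-bit load
  std::memcpy(dst.acc, &word, sizeof(word));  // single 64-bit store
}
```

Both variants produce the same element values; the difference is only in how many memory accesses are issued. A fixed-size `memcpy` like this is typically lowered by compilers to a single load and store of that width, which is the property the vectorized pattern relies on.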

Reviewed By: spcyppt

Differential Revision: D96381299


meta-codesync · 2026-03-18T18:18:10Z

@q10 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D96381299.

meta-codesync · 2026-03-19T18:22:49Z

This pull request has been merged in fc7c8f2.

meta-cla Bot added the cla signed label Mar 18, 2026

facebook-github-tools Bot added the module: rocm label Mar 18, 2026

meta-codesync Bot added fb-exported meta-exported labels Mar 18, 2026

q10 added fb-exported cla signed module: rocm meta-exported labels Mar 18, 2026

meta-codesync Bot closed this in fc7c8f2 Mar 19, 2026

facebook-github-tools Bot added the Merged label Mar 19, 2026

gchalump added category:fix contributor:Meta feature:better-engineering labels May 14, 2026

q10 wants to merge 1 commit into pytorch:main from q10:export-D96381299

2 participants
