Problem
native/ops/matmul/matmul.cu is 664 lines with dispatch logic for all dtypes.

Current State
native/ops/matmul/
└── matmul.cu (664 lines - all dispatch)

Proposed Structure
native/ops/matmul/
├── dispatch.cu (main dispatcher, ~100 lines)
├── dispatch_f32.cu (FP32/TF32 dispatch)
├── dispatch_bf16.cu (BF16 dispatch)
├── dispatch_fp8.cu (FP8 dispatch)
└── dispatch_nvf4.cu (NVF4 dispatch)

Benefits
Related