
Add native CUDA kernels, quantized matmul, and kernel error handling #3

Merged: erfanzar merged 4 commits into main from offdays on Feb 4, 2026

@erfanzar commented on Feb 4, 2026

Summary

  • Add native CUDA backends for flash attention (paged-KV), quantized matmul, blocksparse attention, unified attention, and ragged page attention v3 with build/FFI wiring.
  • Add quantization utilities and codegen dequant kernels across group sizes/bits; switch Triton NF4 to exact table lookup.
  • Introduce structured errors for kernel validation and unsupported features, and update the benchmark harness.
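For readers new to paged-KV caches, the lookup that paged-KV attention relies on can be sketched in NumPy. This is a minimal illustration under assumed shapes (a `[num_pages, page_size, head_dim]` page pool and a per-sequence block table). `gather_paged_kv` is a hypothetical helper, not an ejkernel function, and a real kernel fuses this indexing into the attention loop instead of materializing the gathered keys.

```python
import numpy as np


def gather_paged_kv(kv_pages, block_table, seq_len):
    """Reassemble one sequence's contiguous keys from a paged pool (hypothetical helper).

    kv_pages:    [num_pages, page_size, head_dim] physical page pool (assumed layout)
    block_table: [max_pages_per_seq] maps logical page i -> physical page id
    seq_len:     number of valid tokens in this sequence
    """
    page_size, head_dim = kv_pages.shape[1], kv_pages.shape[2]
    num_pages = -(-seq_len // page_size)              # ceil(seq_len / page_size)
    pages = kv_pages[block_table[:num_pages]]         # [num_pages, page_size, head_dim]
    flat = pages.reshape(-1, head_dim)                # [num_pages * page_size, head_dim]
    return flat[:seq_len]                             # drop padding in the last page


# Six physical pages of four tokens each; this sequence uses pages 5, 1, 3 in order.
pool = np.arange(6 * 4 * 2, dtype=np.float32).reshape(6, 4, 2)
block_table = np.array([5, 1, 3])
keys = gather_paged_kv(pool, block_table, seq_len=10)
```

Because pages are fixed-size and addressed through the table, sequences can grow without reallocating or copying the cache; only the block table changes.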

Testing

  • Not run (not requested).

… NF4 table lookup

Introduce EjkernelRuntimeError for consistent unsupported-feature
reporting across all kernel backends. The kernel registry now wraps
implementations with a validation layer that detects unsupported
parameters via AST inspection and raises structured errors before
dispatch.
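A minimal sketch of what AST-based validation could look like, assuming an implementation marks a parameter as unsupported by guarding it with `if <param> is not None: raise NotImplementedError`. `EjkernelRuntimeError` here is a stand-in with invented fields, and `unsupported_params` and `validated` are hypothetical helpers. The registry's actual detection rules may differ. The source is passed as a string so the example runs without `inspect.getsource`.

```python
import ast
import functools
import inspect


class EjkernelRuntimeError(RuntimeError):
    """Stand-in for the PR's structured error; the real fields may differ."""

    def __init__(self, kernel: str, platform: str, feature: str):
        self.kernel = kernel
        self.platform = platform
        self.feature = feature
        super().__init__(f"{kernel} ({platform}) does not support '{feature}'")


def _raises_not_implemented(stmt: ast.stmt) -> bool:
    # Matches both `raise NotImplementedError` and `raise NotImplementedError(...)`.
    if not isinstance(stmt, ast.Raise) or stmt.exc is None:
        return False
    target = stmt.exc.func if isinstance(stmt.exc, ast.Call) else stmt.exc
    return isinstance(target, ast.Name) and target.id == "NotImplementedError"


def unsupported_params(func_source: str) -> set:
    """Parameters referenced in an `if` test whose body raises NotImplementedError."""
    fn = next(n for n in ast.walk(ast.parse(func_source))
              if isinstance(n, ast.FunctionDef))
    params = {a.arg for a in fn.args.args + fn.args.kwonlyargs}
    found = set()
    for node in ast.walk(fn):
        if isinstance(node, ast.If) and any(map(_raises_not_implemented, node.body)):
            names = {n.id for n in ast.walk(node.test) if isinstance(n, ast.Name)}
            found |= names & params
    return found


def validated(kernel: str, platform: str, func_source: str):
    """Wrap an implementation so unsupported arguments fail before dispatch."""
    blocked = unsupported_params(func_source)

    def decorator(fn):
        sig = inspect.signature(fn)

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)  # holds only explicitly passed args
            for name in sorted(blocked):
                if bound.arguments.get(name) is not None:
                    raise EjkernelRuntimeError(kernel, platform, name)
            return fn(*args, **kwargs)

        return wrapper

    return decorator


SRC = '''
def attention(q, k, v, sliding_window=None):
    if sliding_window is not None:
        raise NotImplementedError("sliding_window")
    return q + k + v
'''
namespace = {}
exec(SRC, namespace)  # build the demo function from the same source string
attention = validated("flash_attention", "cuda", SRC)(namespace["attention"])
```

Because the wrapper raises before calling the implementation, callers see the structured error instead of the implementation's own `NotImplementedError`.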

CUDA quantized matmul gains cuBLASLt and CUTLASS GEMM backends
(selectable via the EJKERNEL_QMM_CUDA_GEMM env var), dequantization
kernels specialized for group sizes 8-1024, and a throughput optimization
that processes four elements per thread. The Triton NF4 kernel replaces
its polynomial approximation with an exact codebook table lookup.
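The difference between a polynomial fit and a table lookup is easiest to see in a reference dequantizer. This NumPy sketch assumes the standard NF4 codebook published with QLoRA/bitsandbytes, high-nibble-first packing, and one `absmax` scale per block. `dequantize_nf4` is hypothetical and is not the Triton kernel itself. Indexing the 16-entry table reproduces each codebook value exactly, which a polynomial approximation can only approach.

```python
import numpy as np

# Standard NF4 codebook (QLoRA / bitsandbytes). Assumed to match the table the
# Triton kernel uses.
NF4_TABLE = np.array([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
], dtype=np.float32)


def dequantize_nf4(packed, absmax, block_size):
    """Exact NF4 dequantization by codebook lookup (hypothetical helper).

    packed:     uint8 array, two 4-bit codes per byte (high nibble first, assumed)
    absmax:     float32 scale per block of `block_size` weights
    """
    high = packed >> 4
    low = packed & 0x0F
    codes = np.stack([high, low], axis=-1).reshape(-1)  # interleave nibbles in order
    values = NF4_TABLE[codes]                           # exact lookup, no approximation
    scales = np.repeat(absmax, block_size)[: values.size]
    return values * scales


packed = np.array([0x0F, 0x7A], dtype=np.uint8)  # codes 0, 15, 7, 10
weights = dequantize_nf4(packed, absmax=np.array([2.0], dtype=np.float32), block_size=4)
```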

CUDA flash attention is simplified by removing the CUTLASS autotuning
infrastructure in favor of direct dispatch with paged-KV cache support.
The executor now automatically prefers native CUDA implementations on
NVIDIA GPUs when they are available.
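The preference rule can be summarized as an ordered fallback. In this sketch only "prefer native CUDA on NVIDIA GPUs" comes from the description above; the backend names, the fallback order, and `select_backend` itself are assumptions for illustration, not the executor's API.

```python
from typing import Iterable

# Hypothetical preference orders; only the leading "cuda" on NVIDIA GPUs
# reflects the PR description.
NVIDIA_ORDER = ("cuda", "triton", "xla")
DEFAULT_ORDER = ("triton", "xla")


def select_backend(is_nvidia_gpu: bool, available: Iterable[str]) -> str:
    """Return the first registered backend in preference order."""
    registered = set(available)
    order = NVIDIA_ORDER if is_nvidia_gpu else DEFAULT_ORDER
    for backend in order:
        if backend in registered:
            return backend
    raise LookupError(f"no usable backend among {sorted(registered)}")
```

If the native CUDA build is missing, the same call falls through to the next registered backend instead of failing.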

Additional changes:
- Expand affine quantization group_size support to {8..512}
- Add benchmark platform ignore-list (EJKERNEL_BENCH_IGNORE_PLATFORMS)
- Update tests to expect EjkernelRuntimeError for unsupported features
- Improve docstrings across callib, kernels, ops, types, and utilities
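For context on the affine `group_size` change, a reference per-group affine quantizer is sketched here. It assumes `{8..512}` means the powers of two from 8 through 512 and uses min/max asymmetric scaling with 4-bit codes. `affine_quantize` and `affine_dequantize` are hypothetical helpers, not ejkernel's API.

```python
import numpy as np

# Assumption: "{8..512}" read as the powers of two 8, 16, ..., 512.
SUPPORTED_GROUP_SIZES = tuple(2 ** i for i in range(3, 10))


def affine_quantize(w, group_size, bits=4):
    """Asymmetric (affine) per-group quantization along the last axis."""
    if group_size not in SUPPORTED_GROUP_SIZES:
        raise ValueError(f"group_size={group_size} not in {SUPPORTED_GROUP_SIZES}")
    if w.shape[-1] % group_size:
        raise ValueError("last dimension must be divisible by group_size")
    qmax = 2 ** bits - 1
    g = w.reshape(*w.shape[:-1], -1, group_size)        # [..., n_groups, group_size]
    lo = g.min(axis=-1, keepdims=True)                  # per-group offset
    hi = g.max(axis=-1, keepdims=True)
    scale = np.maximum(hi - lo, 1e-8) / qmax            # avoid division by zero
    q = np.clip(np.round((g - lo) / scale), 0, qmax).astype(np.uint8)
    return q, scale, lo


def affine_dequantize(q, scale, offset, shape):
    """Invert affine_quantize: w ~= q * scale + offset, reshaped back."""
    return (q * scale + offset).reshape(shape)
```

Smaller groups track local weight ranges more closely at the cost of storing more scales and offsets.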

@erfanzar merged commit 3273b1a into main on Feb 4, 2026
