Add native CUDA kernels, quantized matmul, and kernel error handling by erfanzar · Pull Request #3 · erfanzar/ejkernel

erfanzar · 2026-02-04T18:35:57Z

Summary

Add native CUDA backends for flash attention (paged-KV), quantized matmul, blocksparse attention, unified attention, and ragged page attention v3 with build/FFI wiring.
Add quantization utilities and codegen dequant kernels across group sizes/bits; switch Triton NF4 to exact table lookup.
Introduce structured kernel validation/unsupported-feature errors and benchmark harness updates.
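As a rough illustration of what "exact table lookup" means for NF4 dequantization, the sketch below maps each 4-bit code to one of the 16 NF4 levels and multiplies it by a per-group scale. It is plain Python, not the Triton kernel from this PR. The function name `dequantize_nf4`, the low-nibble-first packing, and the per-group absmax scaling are assumptions made for the example; the codebook values are the NF4 levels as published with QLoRA/bitsandbytes.

```python
from __future__ import annotations

# Illustrative NF4 dequantization by exact codebook lookup (pure Python).
# Packing order and function name are assumptions, not ejkernel's API.
NF4_CODEBOOK = (
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
)


def dequantize_nf4(packed: bytes, scales: list[float], group_size: int) -> list[float]:
    """Unpack two 4-bit codes per byte (low nibble first, an assumed layout)
    and map each code to its codebook level times the group's scale."""
    out: list[float] = []
    for byte in packed:
        for code in (byte & 0x0F, byte >> 4):
            group = len(out) // group_size  # which scale applies to this element
            out.append(NF4_CODEBOOK[code] * scales[group])
    return out
```

For example, `dequantize_nf4(bytes([0xF0]), [2.0], 2)` yields `[-2.0, 2.0]`, since code 0 maps to -1.0 and code 15 maps to 1.0. A lookup like this reproduces the codebook levels exactly, which a polynomial fit can only approximate.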

Testing

Not run (not requested).


… NF4 table lookup

Introduce EjkernelRuntimeError for consistent unsupported-feature reporting across all kernel backends. The kernel registry now wraps implementations with a validation layer that detects unsupported parameters via AST inspection and raises structured errors before dispatch.

CUDA quantized matmul gains cuBLASLt and CUTLASS GEMM backends (selectable via EJKERNEL_QMM_CUDA_GEMM env var), group-size-specialized dequantization kernels for sizes 8-1024, and 4-elements-per-thread throughput optimization.

The Triton NF4 kernel replaces its polynomial approximation with an exact codebook table lookup.

CUDA flash attention is simplified by removing the CUTLASS autotuning infrastructure in favor of direct dispatch with paged-KV cache support.

The executor now auto-prefers native CUDA implementations on NVIDIA GPUs when available.

Additional changes:
- Expand affine quantization group_size support to {8..512}
- Add benchmark platform ignore-list (EJKERNEL_BENCH_IGNORE_PLATFORMS)
- Update tests to expect EjkernelRuntimeError for unsupported features
- Improve docstrings across callib, kernels, ops, types, and utilities
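The validation layer described in this commit message could look roughly like the sketch below. It is a hypothetical reconstruction, not the registry code from this PR. It assumes kernels mark an unsupported parameter with an `if <name> is not None: raise NotImplementedError(...)` guard. The `EjkernelRuntimeError` class here is a local stand-in with assumed fields, and `find_unsupported_params`, `with_validation`, and `toy_attention` are names invented for the example.

```python
from __future__ import annotations

import ast
import functools
import inspect
import textwrap


class EjkernelRuntimeError(RuntimeError):
    """Stand-in for the PR's error type; the fields here are assumptions."""

    def __init__(self, kernel: str, parameter: str):
        super().__init__(f"{kernel}: parameter '{parameter}' is not supported")
        self.kernel = kernel
        self.parameter = parameter


def find_unsupported_params(source: str) -> set[str]:
    """Collect names guarded by `if <name> is not None: raise NotImplementedError`."""
    unsupported = set()
    for node in ast.walk(ast.parse(textwrap.dedent(source))):
        if not (isinstance(node, ast.If) and isinstance(node.test, ast.Compare)):
            continue
        test = node.test
        is_not_none = (
            isinstance(test.left, ast.Name)
            and len(test.ops) == 1
            and isinstance(test.ops[0], ast.IsNot)
            and isinstance(test.comparators[0], ast.Constant)
            and test.comparators[0].value is None
        )
        raises_not_implemented = any(
            isinstance(stmt, ast.Raise) and "NotImplementedError" in ast.dump(stmt)
            for stmt in node.body
        )
        if is_not_none and raises_not_implemented:
            unsupported.add(test.left.id)
    return unsupported


def with_validation(fn, source: str):
    """Wrap `fn` so unsupported arguments fail before the kernel is dispatched."""
    unsupported = find_unsupported_params(source)
    signature = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name in sorted(unsupported):
            if bound.arguments.get(name) is not None:
                raise EjkernelRuntimeError(fn.__name__, name)
        return fn(*args, **kwargs)

    return wrapper


# Toy kernel kept as source text so the example does not depend on
# inspect.getsource being able to locate a file on disk.
KERNEL_SOURCE = '''
def toy_attention(q, k, v, sliding_window=None):
    if sliding_window is not None:
        raise NotImplementedError("sliding_window")
    return q + k + v
'''
namespace = {}
exec(KERNEL_SOURCE, namespace)
toy_attention = with_validation(namespace["toy_attention"], KERNEL_SOURCE)
```

With this wrapper, `toy_attention(1, 2, 3)` runs normally, while `toy_attention(1, 2, 3, sliding_window=4)` raises the structured error before the kernel body executes, carrying the kernel name and the offending parameter.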


erfanzar added 4 commits January 30, 2026 23:44

feat: add native CUDA backend, quantized matmul, and paged-KV flash attention

75b2d92

fix: update FlashBias type to improve type safety in flash attention functions

d9babf6

fix: update FlashBias type to improve type safety in flash attention functions

51923db

erfanzar merged commit 3273b1a into main Feb 4, 2026


erfanzar merged 4 commits into main from offdays
