
[v0.2.5+] Support Operator #59

@m96-chan

Description


PyGPUkit – Core Operator Coverage

1. Elementwise Operations

| Operation | FP32 | TF32 | FP16 | BF16 | Notes |
|-----------|------|------|------|------|-------|
| add  | ✅ | ✅ | 🔜 | 🔜 | Implemented |
| sub  | 🔜 | 🔜 | 🔜 | 🔜 | Planned |
| mul  | ✅ | ✅ | 🔜 | 🔜 | Implemented |
| div  | 🔜 | 🔜 | 🔜 | 🔜 | Planned |
| exp  | 🔜 | 🔜 | 🔜 | 🔜 | SFU-bound |
| log  | 🔜 | 🔜 | 🔜 | 🔜 | SFU-bound |
| relu | 🔜 | 🔜 | 🔜 | 🔜 | Can be fused later |
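To pin down what each of these ops must compute, a host-side NumPy reference is useful as ground truth when validating the GPU kernels. This is a sketch for testing purposes only; `ref_elementwise` is a hypothetical helper, not part of the PyGPUkit API:

```python
import numpy as np

# Host-side reference semantics for the elementwise ops above.
# GPU results can be compared against these with np.allclose.
def ref_elementwise(op, a, b=None):
    ops = {
        "add":  lambda: a + b,
        "sub":  lambda: a - b,
        "mul":  lambda: a * b,
        "div":  lambda: a / b,
        "exp":  lambda: np.exp(a),
        "log":  lambda: np.log(a),
        "relu": lambda: np.maximum(a, 0),  # candidate for fusion with add/mul
    }
    return ops[op]()

a = np.array([1.0, -2.0, 3.0], dtype=np.float32)
b = np.array([4.0, 5.0, 6.0], dtype=np.float32)
print(ref_elementwise("add", a, b))   # [5. 3. 9.]
print(ref_elementwise("relu", a))     # [1. 0. 3.]
```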

2. GEMM Operations

| Operation | FP32 | TF32 | FP16 | BF16 | Notes |
|-----------|------|------|------|------|-------|
| matmul | ✅ | ✅ | 🔜 | 🔜 | TensorCore on Ampere+ |

Current Performance (v0.2.3):

  • FP32: 18 TFLOPS (RTX 3090 Ti)
  • TF32: 27.38 TFLOPS (RTX 3090 Ti)
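The TFLOPS figures above follow from the standard 2·M·N·K flop count for GEMM (one multiply and one add per inner-product term). A quick sanity check; the matrix size and timing below are illustrative assumptions, not measurements from this issue:

```python
def gemm_tflops(m, n, k, seconds):
    """Effective TFLOPS for an (m x k) @ (k x n) matmul: 2 flops per MAC."""
    return 2 * m * n * k / seconds / 1e12

# Hypothetical example: an 8192^3 FP32 matmul finishing in 61 ms
# sustains roughly the 18 TFLOPS reported for the RTX 3090 Ti.
print(round(gemm_tflops(8192, 8192, 8192, 0.061), 1))  # 18.0
```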

3. Reduction Operations

| Operation | FP32 | TF32 | FP16 | BF16 | Notes |
|-----------|------|------|------|------|-------|
| sum  | 🔜 | 🔜 | 🔜 | 🔜 | Tree-based reduction |
| mean | 🔜 | 🔜 | 🔜 | 🔜 | sum + scale |
| max  | 🔜 | 🔜 | 🔜 | 🔜 | Warp + block reduction |
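A tree-based reduction halves the number of active lanes each step, finishing in log2(n) steps rather than n, which is the shape the warp/block kernels will take. A host-side sketch of the access pattern (pure Python, not the actual kernel):

```python
def tree_sum(values):
    """Pairwise tree reduction: log2(n) steps, mirroring a warp/block
    reduction where lane i accumulates lane i + stride each step."""
    vals = list(values)
    # Pad to a power of two with the identity element (0 for sum).
    while len(vals) & (len(vals) - 1):
        vals.append(0.0)
    stride = len(vals) // 2
    while stride:
        for i in range(stride):      # on the GPU these run in parallel
            vals[i] += vals[i + stride]
        stride //= 2
    return vals[0]

print(tree_sum([1.0, 2.0, 3.0, 4.0, 5.0]))  # 15.0
```

The same skeleton gives `max` by swapping `+=` for a max update, and `mean` is just `tree_sum` followed by a scale, exactly as the table notes.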

4. Memory Operations

| Operation | Status | Notes |
|-----------|--------|-------|
| copy | ✅ | Device↔Device, Host↔Device |
| reshape / view | ✅ | Zero-copy, metadata only |
| contiguous | 🔜 | Layout-aware kernel required |
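NumPy illustrates the same distinction the table draws: `reshape`/`view` only rewrite shape/stride metadata over the same buffer, while `contiguous` must materialize data once the layout no longer matches that metadata. A host-side analogy, not the PyGPUkit API:

```python
import numpy as np

a = np.arange(6, dtype=np.float32).reshape(2, 3)

v = a.reshape(3, 2)             # zero-copy: same buffer, new shape metadata
print(np.shares_memory(a, v))   # True

t = a.T                         # transpose is a strided view, not C-contiguous
print(t.flags["C_CONTIGUOUS"])  # False

c = np.ascontiguousarray(t)     # layout-aware copy materializes the data
print(c.flags["C_CONTIGUOUS"], np.shares_memory(a, c))  # True False
```

This is why `contiguous` needs a real kernel on the GPU side: unlike `reshape`, it has to read through arbitrary strides and write a dense output.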

Recommended Milestones

| Version | Focus |
|---------|-------|
| v0.2.4 | Driver-only runtime (no CUDA Toolkit) ✅ |
| v0.2.5 | JIT stabilization, cache persistence |
| v0.2.6 | Elementwise ops (sub, div, exp, log, relu) |
| v0.2.7 | Reductions + contiguous |
| v0.3.0 | FP16 / BF16 mixed precision |

Design Principle

GEMM proves performance.
Elementwise + memory prove usability.
Scheduler proves value.
