# PyGPUkit – Core Operator Coverage

## 1. Elementwise Operations
| Operation | FP32 | TF32 | FP16 | BF16 | Notes |
|-----------|------|------|------|------|-------|
| add | ✅ | — | 🔜 | 🔜 | Implemented |
| sub | 🔜 | — | 🔜 | 🔜 | Planned |
| mul | ✅ | — | 🔜 | 🔜 | Implemented |
| div | 🔜 | — | 🔜 | 🔜 | Planned |
| exp | 🔜 | — | 🔜 | 🔜 | SFU-bound |
| log | 🔜 | — | 🔜 | 🔜 | SFU-bound |
| relu | 🔜 | — | 🔜 | 🔜 | Can be fused later |
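Since several of these ops are still planned, a CPU reference for their semantics may help when validating kernels. A minimal sketch using NumPy — not the PyGPUkit API, which is not shown here; only the math each kernel is expected to reproduce:

```python
# CPU reference semantics for the elementwise ops in the table above.
# GPU kernel outputs can be checked against these NumPy results.
import numpy as np

x = np.array([-1.5, 0.0, 2.0], dtype=np.float32)
y = np.array([2.0, 4.0, 8.0], dtype=np.float32)

add = x + y             # add  (implemented)
sub = x - y             # sub  (planned)
mul = x * y             # mul  (implemented)
div = x / y             # div  (planned)
ex = np.exp(x)          # exp  (SFU-bound: hits the GPU's special function unit)
lg = np.log(y)          # log  (SFU-bound)
r = np.maximum(x, 0)    # relu (max(x, 0); a natural fusion target)
```

A fused `relu(add(x, y))` would compute `np.maximum(x + y, 0)` in a single kernel, saving one round trip through global memory.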
## 2. GEMM Operations
| Operation | FP32 | TF32 | FP16 | BF16 | Notes |
|-----------|------|------|------|------|-------|
| matmul | ✅ | ✅ | 🔜 | 🔜 | TensorCore on Ampere+ |
**Current Performance (v0.2.3):**

- FP32: 18 TFLOPS (RTX 3090 Ti)
- TF32: 27.38 TFLOPS (RTX 3090 Ti)
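For context on how figures like these are derived: GEMM throughput is conventionally computed as 2·M·N·K floating-point operations divided by elapsed time. A sketch (the 7.6 ms timing below is an illustrative number, not a PyGPUkit measurement):

```python
# GEMM FLOP accounting: an (M x K) @ (K x N) matmul performs 2*M*N*K
# floating-point operations (one multiply + one add per product term).
def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved TFLOPS for an m*k @ k*n matmul that took `seconds`."""
    return (2.0 * m * n * k) / seconds / 1e12

# Illustrative: a 4096^3 matmul finishing in ~7.6 ms lands near 18 TFLOPS.
tflops = gemm_tflops(4096, 4096, 4096, 7.6e-3)
```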
## 3. Reduction Operations
| Operation | FP32 | TF32 | FP16 | BF16 | Notes |
|-----------|------|------|------|------|-------|
| sum | 🔜 | — | 🔜 | 🔜 | Tree-based reduction |
| mean | 🔜 | — | 🔜 | 🔜 | sum + scale |
| max | 🔜 | — | 🔜 | 🔜 | Warp + block reduction |
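The "tree-based" strategy in the table can be sketched on the CPU: halve the active range each step and add pairs, the same stride pattern a warp/block reduction follows on the GPU; `mean` then falls out as a sum followed by a scale. A sketch of the idea, not the PyGPUkit kernel itself:

```python
# Tree-based reduction sketch: pair elements and halve the active range
# each step, mirroring the stride pattern of a warp/block reduction.
def tree_sum(values):
    vals = [float(v) for v in values]
    while len(vals) > 1:
        if len(vals) % 2:       # odd length: pad so every element has a partner
            vals.append(0.0)
        half = len(vals) // 2
        # "Thread" i adds its partner at i + half, then the range halves.
        vals = [vals[i] + vals[i + half] for i in range(half)]
    return vals[0]

def tree_mean(values):
    # mean = sum + scale, as noted in the table
    return tree_sum(values) / len(values)
```

`max` follows the same tree shape with `max(a, b)` in place of `a + b` (and padding with `-inf` instead of `0.0`).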
## 4. Memory Operations
| Operation | Status | Notes |
|-----------|--------|-------|
| copy | ✅ | Device↔Device, Host↔Device |
| reshape / view | ✅ | Zero-copy metadata only |
| contiguous | 🔜 | Layout-aware kernel required |
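The reshape/view vs. contiguous split can be illustrated with NumPy, which draws the same line: a view only rewrites shape/stride metadata over the existing buffer, while materializing a non-contiguous layout (e.g. a transpose) forces a real copy — which is what the planned layout-aware kernel would do on the GPU:

```python
# Views rewrite shape/stride metadata only; `contiguous` must copy.
import numpy as np

a = np.arange(6, dtype=np.float32).reshape(2, 3)

v = a.reshape(3, 2)        # zero-copy view: same buffer, new metadata
t = a.T                    # transposed view: strides change, data unmoved
                           # -> t is no longer C-contiguous

c = np.ascontiguousarray(t)  # the "contiguous" op: a layout-aware copy
```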
## Recommended Milestones
| Version | Focus |
|---------|-------|
| v0.2.4 | Driver-only runtime (no CUDA Toolkit) ✅ |
| v0.2.5 | JIT stabilization, cache persistence |
| v0.2.6 | Elementwise ops (sub, div, exp, log, relu) |
| v0.2.7 | Reductions + contiguous |
| v0.3.0 | FP16 / BF16 mixed precision |
## Design Principle
GEMM proves performance.
Elementwise + memory prove usability.
Scheduler proves value.