## Problem

`native/ops/nn/nn.cu` is 2673 lines with mixed concerns:
- Activations (GELU, SiLU, Sigmoid, Tanh)
- Normalization (LayerNorm, RMSNorm)
- Attention (SDPA)
- Position encoding (RoPE)
- Tensor ops (Transpose, Bias Add)
- Linear layer
This should be split to match the binding structure in #131.
## Current Structure

```
native/ops/nn/
├── nn.cu (2673 lines - everything)
├── attention_kernels.cuh
├── elementwise_kernels.cuh
├── flash_attention.cuh
├── memory_kernels.cuh
└── norm_kernels.cuh
```
## Proposed Structure

```
native/ops/nn/
├── activation/
│   ├── gelu.cu
│   ├── silu.cu
│   ├── sigmoid.cu
│   ├── tanh.cu
│   └── relu.cu
├── norm/
│   ├── layernorm.cu
│   └── rmsnorm.cu
├── attention/
│   ├── sdpa_causal.cu
│   ├── sdpa_fixed_cache.cu
│   └── flash_attention.cuh
├── rope/
│   └── rope_inplace.cu
├── linear/
│   └── linear_bias.cu
└── common/
    ├── attention_kernels.cuh
    ├── elementwise_kernels.cuh
    ├── memory_kernels.cuh
    └── norm_kernels.cuh
```
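To make the target shape concrete, here is a hypothetical sketch of what one split file such as `activation/gelu.cu` could look like after the refactor. The kernel body, launcher name, and include path are illustrative assumptions, not the actual code being moved:

```cuda
// Hypothetical sketch of activation/gelu.cu after the split.
// Function names and the include path are assumptions for illustration.
#include <cuda_runtime.h>
#include "../common/elementwise_kernels.cuh"  // shared elementwise helpers

// tanh-approximation GELU, applied elementwise
__global__ void gelu_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = in[i];
        out[i] = 0.5f * x *
                 (1.0f + tanhf(0.7978845608f * (x + 0.044715f * x * x * x)));
    }
}

// host-side launcher exposed to the bindings layer (#131)
void gelu_forward(const float* in, float* out, int n, cudaStream_t stream) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    gelu_kernel<<<blocks, threads, 0, stream>>>(in, out, n);
}
```

Each split file would follow the same pattern: one kernel (or a small family), one launcher, and includes from `common/` for anything shared, which keeps every file comfortably under the 300-line target.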
## Contents Mapping

| Current Section | Lines (est.) | New Location |
|---|---|---|
| GELU | ~50 | `activation/gelu.cu` |
| SiLU | ~60 | `activation/silu.cu` |
| Sigmoid | ~60 | `activation/sigmoid.cu` |
| Tanh | ~60 | `activation/tanh.cu` |
| LayerNorm | ~150 | `norm/layernorm.cu` |
| RMSNorm | ~100 | `norm/rmsnorm.cu` |
| SDPA Causal | ~400 | `attention/sdpa_causal.cu` |
| SDPA Fixed Cache | ~300 | `attention/sdpa_fixed_cache.cu` |
| RoPE | ~200 | `rope/rope_inplace.cu` |
| Linear + Bias | ~100 | `linear/linear_bias.cu` |
| Transpose (3D/4D) | ~200 | Move to `tensor/` |
| Split QKV | ~100 | Move to `tensor/` |
## Notes
- Transpose operations should move to `native/ops/tensor/`
- Split QKV should move to `native/ops/tensor/`
- Each file should be <300 lines
- Shared kernels stay in `common/`
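On the last note: one way the `common/` headers can keep the per-op files small is a generic elementwise launcher that each activation file instantiates with its own functor. This is a sketch of an assumed helper, not the existing contents of `elementwise_kernels.cuh`:

```cuda
// Hypothetical sketch of a shared helper in common/elementwise_kernels.cuh.
// Template and function names are illustrative assumptions.
#pragma once
#include <cuda_runtime.h>

// Generic elementwise kernel: Op must be a __device__-callable functor.
// gelu.cu, silu.cu, etc. supply only the math; the launch logic lives here.
template <typename Op>
__global__ void elementwise_kernel(const float* in, float* out, int n, Op op) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = op(in[i]);
}

template <typename Op>
void launch_elementwise(const float* in, float* out, int n, Op op,
                        cudaStream_t stream) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    elementwise_kernel<<<blocks, threads, 0, stream>>>(in, out, n, op);
}
```

With a helper like this, a file such as `activation/silu.cu` would reduce to a one-line functor plus a call to `launch_elementwise`, which supports the <300-line budget per file.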
## Benefits
## Related