Overview
Implement PoPE (Positional Encoding) as an alternative to the existing RoPE (Rotary Position Embedding).
Current State
native/ops/nn/
└── nn.cu (contains rope_inplace, rope_inplace_f32table)
src/pygpukit/ops/nn.py
└── rope_inplace(), rope_inplace_f32table()
Proposed Addition
Native Kernels
native/ops/nn/
├── rope/
│   └── rope_inplace.cu (existing, moved)
└── pope/
    ├── pope_inplace.cu (NEW)
    └── pope_kernels.cuh (NEW)
Python API
# Existing
rope_inplace(q, k, cos_table, sin_table, position)
rope_inplace_f32table(q, k, cos_table, sin_table, position)
# New
pope_inplace(q, k, position_encoding, position)
pope_init_encoding(max_seq_len, head_dim, encoding_type="sinusoidal") -> GPUArray
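As a starting point for testing the new kernels, a CPU reference could look like the sketch below. It is not the proposed implementation. It assumes the sinusoidal variant uses the classic Transformer formula, that q and k are laid out as [seq_len, n_heads, head_dim], and that position is the index of the first token in the call. The names pope_init_encoding_ref and pope_inplace_ref are hypothetical and use NumPy arrays in place of GPUArray.

```python
import numpy as np


def pope_init_encoding_ref(max_seq_len: int, head_dim: int) -> np.ndarray:
    """Hypothetical CPU reference: sinusoidal table of shape [max_seq_len, head_dim]."""
    if head_dim % 2 != 0:
        raise ValueError("head_dim must be even for the sin/cos layout used here")
    positions = np.arange(max_seq_len, dtype=np.float32)[:, None]
    # Inverse frequencies 10000^(-2i/head_dim) for i = 0 .. head_dim/2 - 1.
    inv_freq = np.exp(
        -np.log(10000.0) * np.arange(0, head_dim, 2, dtype=np.float32) / head_dim
    )
    table = np.empty((max_seq_len, head_dim), dtype=np.float32)
    table[:, 0::2] = np.sin(positions * inv_freq)  # even channels
    table[:, 1::2] = np.cos(positions * inv_freq)  # odd channels
    return table


def pope_inplace_ref(q, k, position_encoding, position):
    """Hypothetical CPU reference: add rows [position, position + seq_len) to q and k."""
    seq_len = q.shape[0]
    if position < 0 or position + seq_len > position_encoding.shape[0]:
        raise ValueError("position range falls outside the encoding table")
    rows = position_encoding[position:position + seq_len]  # [seq_len, head_dim]
    # Broadcast over the heads axis; q and k may have different head counts.
    q += rows[:, None, :]
    k += rows[:, None, :]
```

A kernel test could then compare the GPU result against pope_inplace_ref within a dtype-appropriate tolerance.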
PoPE vs RoPE
| Aspect | RoPE | PoPE |
| --- | --- | --- |
| Encoding | Rotary (sin/cos rotation) | Additive positional |
| Memory | cos/sin tables | Position encoding matrix |
| Compute | Complex multiply | Simple add |
| Extrapolation | Good | Limited |
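The difference between rotating and adding can be checked numerically. The sketch below is illustrative only: rope_ref and additive_ref are hypothetical helpers operating on a single head vector, the pairing of channels (consecutive pairs) is an assumption and may not match the layout used by rope_inplace, and the additive variant assumes a sinusoidal table. It shows that a rotary score depends only on the offset between the two positions, while an additive score also changes with the absolute positions.

```python
import numpy as np


def rope_ref(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Rotate consecutive pairs (x[2i], x[2i+1]) by angle pos * base^(-2i/d)."""
    d = x.shape[-1]
    theta = pos * base ** (-np.arange(0, d, 2) / d)
    cos, sin = np.cos(theta), np.sin(theta)
    out = np.empty_like(x)
    out[0::2] = x[0::2] * cos - x[1::2] * sin
    out[1::2] = x[0::2] * sin + x[1::2] * cos
    return out


def additive_ref(x: np.ndarray, pos: int, base: float = 10000.0) -> np.ndarray:
    """Add a sinusoidal position vector for `pos` to x."""
    d = x.shape[-1]
    theta = pos * base ** (-np.arange(0, d, 2) / d)
    pe = np.empty_like(x)
    pe[0::2] = np.sin(theta)
    pe[1::2] = np.cos(theta)
    return x + pe


rng = np.random.default_rng(0)
q, k = rng.standard_normal(8), rng.standard_normal(8)

# RoPE: shifting both positions by 100 leaves the attention score unchanged.
rope_near = rope_ref(q, 5) @ rope_ref(k, 2)
rope_far = rope_ref(q, 105) @ rope_ref(k, 102)

# Additive: the same shift changes the score.
add_near = additive_ref(q, 5) @ additive_ref(k, 2)
add_far = additive_ref(q, 105) @ additive_ref(k, 102)
```

The rotation also preserves the vector norm, whereas the additive form does not, which is one reason the two ops are not drop-in numerical replacements for each other.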
Implementation Tasks
API Design
# Initialize position encoding
pope_encoding = ops.pope_init_encoding(
    max_seq_len=2048,
    head_dim=128,
    encoding_type="sinusoidal",  # or "learned"
)
# Apply in-place
ops.pope_inplace(q, k, pope_encoding, position)
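One property worth pinning down in the API is what position means when a call covers several tokens. The sketch below assumes it is the index of the first token, so that a prefill call followed by single-token decode calls gives the same result as one call over the whole sequence. add_positions is a hypothetical NumPy stand-in for pope_inplace applied to one tensor, and the [seq_len, n_heads, head_dim] layout is an assumption.

```python
import numpy as np


def add_positions(x: np.ndarray, table: np.ndarray, position: int) -> None:
    """Stand-in for pope_inplace on one tensor: x[t] += table[position + t]."""
    seq_len = x.shape[0]
    x += table[position:position + seq_len][:, None, :]


rng = np.random.default_rng(0)
table = rng.standard_normal((16, 8)).astype(np.float32)     # [max_seq_len, head_dim]
q_full = rng.standard_normal((5, 2, 8)).astype(np.float32)  # [seq_len, n_heads, head_dim]

# One-shot: encode all 5 tokens starting at position 0.
one_shot = q_full.copy()
add_positions(one_shot, table, position=0)

# Incremental: prefill 4 tokens, then decode the 5th token at position 4.
prefill = q_full[:4].copy()
add_positions(prefill, table, position=0)
step = q_full[4:].copy()
add_positions(step, table, position=4)
incremental = np.concatenate([prefill, step], axis=0)
```

If the two paths agree, the op can be used unchanged for both prefill and KV-cache decoding.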
Related
rope_inplace, rope_inplace_f32table, nn/pope.cpp, nn/pope/