🚀 PyGPUkit v0.2.6 Milestone — “LLM MVP Support”
🎯 Milestone Summary
PyGPUkit v0.2.6 introduces the first minimal LLM inference capability:
- Load safetensors model weights
- Load tokenizer.json
- Execute a single forward pass (1 token) using PyGPUkit GPU kernels
- Integrate with Scheduler + MemoryPool
- Provide a minimal generate() API
This version does not aim to support full transformer execution yet—
only the core pieces required to demonstrate that
“PyGPUkit can run an LLM block end-to-end on GPU”.
🔧 Scope of v0.2.6 (Exactly What Will Be Implemented)
1) safetensors Loader (Rust Core)
Features
- Support reading *.safetensors files via the Rust safetensors crate
- Mmap or buffered load
- Return tensor slices without copying
- Expose metadata (dtype, shape)
Deliverable
pygpukit_core::llm::tensor_loader::load_safetensors(path) -> TensorMap
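To make the loader's job concrete: a safetensors file is an 8-byte little-endian header length, a JSON header mapping tensor names to dtype/shape/data_offsets, then the raw payload. The following is a minimal Python sketch of the header parsing the Rust loader will perform (illustrative only; the real implementation uses the safetensors crate and mmap, and `parse_safetensors_header` is a hypothetical name):

```python
import json
import struct

def parse_safetensors_header(buf: bytes) -> dict:
    """Parse the safetensors container: an 8-byte little-endian header
    length, then a JSON header mapping tensor names to
    {dtype, shape, data_offsets} into the payload that follows."""
    (header_len,) = struct.unpack("<Q", buf[:8])
    header = json.loads(buf[8:8 + header_len])
    # "__metadata__" is an optional free-form entry, not a tensor.
    return {k: v for k, v in header.items() if k != "__metadata__"}

# Build a tiny in-memory safetensors blob: one FP32 tensor of shape [2, 2].
payload = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)
header = json.dumps({"w": {"dtype": "F32", "shape": [2, 2],
                           "data_offsets": [0, len(payload)]}}).encode()
blob = struct.pack("<Q", len(header)) + header + payload

tensors = parse_safetensors_header(blob)
print(tensors["w"]["shape"])  # [2, 2]
```

Because data_offsets index directly into the mmap'd payload, the Rust loader can return zero-copy tensor slices, as listed above.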
2) GPU Tensor Allocation Path
Features
- Allocate GPU buffers for model weights via MemoryPool
- Async H2D transfers
- Persistent device pointers
Deliverable
TensorDevice::from_safetensor(cpu_tensor)
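The MemoryPool's role here is to hand each weight tensor a persistent, aligned region of device memory. A toy CPU-side bump allocator sketches that allocation pattern (all names hypothetical; the real pool wraps device memory and handles async H2D copies):

```python
class BumpPool:
    """Toy bump allocator mimicking how a MemoryPool might hand out
    persistent, aligned offsets for model weights. CPU-side sketch only;
    a real pool would carve up a cudaMalloc'd device arena."""
    def __init__(self, capacity: int, align: int = 256):
        self.capacity = capacity
        self.align = align
        self.offset = 0

    def alloc(self, nbytes: int) -> int:
        # Round the cursor up to the next alignment boundary.
        start = (self.offset + self.align - 1) // self.align * self.align
        if start + nbytes > self.capacity:
            raise MemoryError("pool exhausted")
        self.offset = start + nbytes
        return start

pool = BumpPool(capacity=1 << 20)
off_w = pool.alloc(4 * 64 * 64)   # FP32 weight [64, 64]
off_b = pool.alloc(4 * 64)        # FP32 bias [64]
print(off_w, off_b)  # 0 16384
```

Weights are allocated once at load time and never freed during inference, which is why a simple bump scheme is sufficient for this path.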
3) tokenizer.json Reader (Python MVP → optional Rust port)
Features
- Read HuggingFace tokenizer.json
- BPE/SentencePiece vocabulary support
- Expose:
tokenizer.encode(text) -> ids
tokenizer.decode(ids) -> text
Deliverable
pygpukit.llm.Tokenizer
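The encode/decode contract can be illustrated with a greedy longest-match tokenizer over a toy vocabulary. This is not real BPE (the actual Tokenizer will read a HuggingFace tokenizer.json and apply merge rules); it only shows the round-trip interface the MVP must expose:

```python
class ToyTokenizer:
    """Greedy longest-match tokenizer sketch. Real BPE applies learned
    merge rules; this only demonstrates the encode/decode contract."""
    def __init__(self, vocab):
        self.vocab = vocab
        self.inv = {i: t for t, i in vocab.items()}

    def encode(self, text):
        ids, i = [], 0
        while i < len(text):
            # Take the longest vocabulary entry starting at position i.
            for j in range(len(text), i, -1):
                if text[i:j] in self.vocab:
                    ids.append(self.vocab[text[i:j]])
                    i = j
                    break
            else:
                raise KeyError(f"no token covers {text[i]!r}")
        return ids

    def decode(self, ids):
        return "".join(self.inv[i] for i in ids)

tok = ToyTokenizer({"He": 0, "llo": 1, "Hello": 2, " ": 3, "world": 4})
ids = tok.encode("Hello world")
print(ids)              # [2, 3, 4]
print(tok.decode(ids))  # Hello world
```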
4) Linear Layer (Rust + CUDA kernel)
Features
- GPU matmul (64×64 tiled kernel)
- Add bias
- Optional activation (GELU)
Deliverable
Linear.forward(x, weight, bias)
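A CPU reference for what the tiled CUDA kernel must compute, assuming the common y = x · Wᵀ + b convention and the tanh approximation of GELU (both assumptions; the actual layout convention is up to the kernel):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, commonly used in transformer MLPs
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi)
                                    * (x + 0.044715 * x**3)))

def linear_forward(x, weight, bias, activation=None):
    """CPU reference for Linear.forward: y = x @ W^T + b, with optional
    fused GELU. The CUDA kernel computes the same result with a
    64x64 tiled matmul."""
    y = x @ weight.T + bias
    return gelu(y) if activation == "gelu" else y

x = np.random.default_rng(0).standard_normal((2, 8)).astype(np.float32)
w = np.random.default_rng(1).standard_normal((4, 8)).astype(np.float32)
b = np.zeros(4, dtype=np.float32)
y = linear_forward(x, w, b, activation="gelu")
print(y.shape)  # (2, 4)
```

A reference like this is also useful as the ground truth in kernel correctness tests.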
5) LayerNorm Kernel
Features
- Mean/variance reduce
- Warp-level reduction
- Single-pass normalization
- Fused add+norm optional
Deliverable
LayerNorm.forward(x, gamma, beta)
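The corresponding CPU reference: normalize over the last axis, then scale and shift. The GPU kernel performs the same math, but fuses the mean/variance reduction (warp-level) with the normalization in a single pass:

```python
import numpy as np

def layernorm_forward(x, gamma, beta, eps=1e-5):
    """CPU reference for LayerNorm.forward: per-row mean/variance over
    the last axis, normalize, then apply gamma (scale) and beta (shift)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

x = np.arange(12, dtype=np.float32).reshape(3, 4)
y = layernorm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
print(np.round(y.mean(axis=-1), 6))  # each row normalizes to mean ~0
```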
6) Forward Pass for One Transformer Block
Supported operations:
x → LayerNorm → Linear → GELU → Linear → residual
Deliverable
TransformerBlock.forward(x, weights)
Note: deferring attention to v0.2.7 is the right call
(reason: rotary embeddings, KV-cache, and softmax are heavy pieces of work).
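Putting the pieces together, the attention-free v0.2.6 block reduces to an MLP sub-block with a pre-norm and a residual connection. A compact NumPy reference (weight shapes and the x · Wᵀ convention are assumptions for illustration):

```python
import numpy as np

def gelu(x):
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi)
                                    * (x + 0.044715 * x**3)))

def block_forward(x, w1, b1, w2, b2, gamma, beta, eps=1e-5):
    """Reference for the attention-free v0.2.6 block:
    x -> LayerNorm -> Linear -> GELU -> Linear -> residual."""
    h = x - x.mean(-1, keepdims=True)
    h = gamma * h / np.sqrt(x.var(-1, keepdims=True) + eps) + beta
    h = gelu(h @ w1.T + b1)     # first Linear + GELU (up-projection)
    h = h @ w2.T + b2           # second Linear (down-projection)
    return x + h                # residual connection

d, d_ff = 8, 32
rng = np.random.default_rng(0)
x = rng.standard_normal((2, d)).astype(np.float32)
out = block_forward(x,
                    rng.standard_normal((d_ff, d)) * 0.02, np.zeros(d_ff),
                    rng.standard_normal((d, d_ff)) * 0.02, np.zeros(d),
                    np.ones(d), np.zeros(d))
print(out.shape)  # (2, 8)
```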
7) Minimal LLM Runtime API (Python)
Goal
After loading a model, generate exactly one token.
Deliverable
from pygpukit.llm import LLM
llm = LLM.from_pretrained("model/")
ids = llm.generate("Hello", max_new_tokens=1)
Internal pipeline:
- tokenizer.encode
- embedding → block.forward → lm_head
- top-k or argmax sampler
- tokenizer.decode
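The four internal steps above can be sketched end to end with stub components. Everything here (the toy vocab, embedding table, lm_head, the no-op block) stands in for the real modules; only the pipeline shape is meaningful:

```python
import numpy as np

# Stubs standing in for the real tokenizer, embedding table,
# transformer block, and lm_head.
vocab = {"Hello": 0, "!": 1, "<x>": 2}
inv = {i: t for t, i in vocab.items()}
d, V = 4, len(vocab)
rng = np.random.default_rng(0)
emb = rng.standard_normal((V, d))
lm_head = rng.standard_normal((V, d))

def generate_one(text):
    ids = [vocab[text]]                 # tokenizer.encode (stub)
    h = emb[ids]                        # embedding lookup
    h = h + 0.0                         # block.forward placeholder
    logits = h[-1] @ lm_head.T          # lm_head projection
    next_id = int(np.argmax(logits))    # argmax sampler
    return inv[next_id]                 # tokenizer.decode (stub)

print(generate_one("Hello"))
```

Swapping the placeholder for a real block.forward and looping over max_new_tokens turns this into the full autoregressive loop deferred to v0.2.7.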
8) Scheduler Integration
- Each kernel dispatch uses Rust KernelDispatchController
- MemoryPool controls all tensors
- Async H2D/D2H transfers via AsyncEngine
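The integration contract is that no kernel launches or transfers happen outside the controller. A toy FIFO sketch of that single controlled path (the class and method names are illustrative, not the Rust KernelDispatchController API):

```python
from collections import deque

class DispatchController:
    """Toy FIFO stand-in for a kernel dispatch controller: every launch
    is enqueued and drained in submission order, so all GPU work flows
    through one controlled path."""
    def __init__(self):
        self.queue = deque()
        self.launched = []

    def submit(self, name, fn, *args):
        self.queue.append((name, fn, args))

    def drain(self):
        while self.queue:
            name, fn, args = self.queue.popleft()
            self.launched.append(name)
            fn(*args)   # on GPU this would be an async stream launch

ctl = DispatchController()
out = []
ctl.submit("layernorm", out.append, "ln")
ctl.submit("matmul", out.append, "mm")
ctl.drain()
print(ctl.launched)  # ['layernorm', 'matmul']
```

Routing every dispatch through one queue is what lets the scheduler later reorder, batch, or overlap work without touching kernel code.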
📦 Out of Scope (Next Milestone = v0.2.7)
- Multi-head Attention
- Rotary Embedding
- KV-Cache
- Softmax kernel
- Full autoregressive generation loop
- Batch inference
- TensorCore (TF32 / FP16) optimization
- Quantization (4bit/8bit)
📅 Recommended Sub-Milestones Inside v0.2.6
v0.2.6-a: safetensors loader + GPU tensor storage
v0.2.6-b: tokenizer.json MVP
v0.2.6-c: Linear + LayerNorm kernels
v0.2.6-d: TransformerBlock (no attention)
v0.2.6-e: LLM.generate() MVP
v0.2.6-f: Scheduler integration + benchmarks
v0.2.6-final: Documentation + examples