Home
Welcome to the spine-triton support status page for SpacemiT SoCs.
spine-triton is forked from microsoft/triton-shared, which provides a shared middle layer for Triton compilation. It implements a CPU backend for Triton targeting SpacemiT's RISC-V based AI processors (X60, X100, A60, A100), enabling Triton kernels to be compiled and executed on SpacemiT hardware without requiring a GPU.
The project bridges Triton's high-level kernel language to efficient RISC-V vector (RVV), SpacemiT IME, and SpacemiT AME code through a multi-stage MLIR-based compilation pipeline. Custom dialects cover SpacemiT-specific hardware features, including packed tensor cores, descriptor-based memory access, and thread synchronization primitives.
- Current version: `3.6.0+spacemit.a5`
┌─────────────────────────────────────────────────┐
│             Triton Kernel (Python)              │
├─────────────────────────────────────────────────┤
│ language/smt    │ language/tle  │ language/cpu  │
│ (XSMT builtins) │ (Tile Ops)    │ (CPU utils)   │
├─────────────────────────────────────────────────┤
│                Triton IR (TTIR)                 │
├─────────────────────────────────────────────────┤
│         spine-triton-opt (MLIR Passes)          │
│    TTIR → Structured → Unstructured → Memref    │
│    → Linalg (with XSMT/TLE dialect lowering)    │
├─────────────────────────────────────────────────┤
│             spine-opt (spine-mlir)              │
│        Linalg MLIR → LLVM MLIR → LLVM IR        │
├─────────────────────────────────────────────────┤
│         LLVM opt/llc → .so (RISC-V RVV)         │
├─────────────────────────────────────────────────┤
│             CPUDriver / CPULauncher             │
│          (Dynamic loading & execution)          │
└─────────────────────────────────────────────────┘
The compilation flows through four stages defined in backend/compiler.py:
| Stage | Input | Output | Tool |
|---|---|---|---|
| `ttir` | Triton IR | Optimized TTIR | Triton pass manager |
| `linalgdir` | TTIR | Linalg MLIR | `spine-triton-opt --triton-to-linalg-experimental` |
| `llir` | Linalg MLIR | LLVM IR | `spine-opt --spine-triton-e2e-pipeline` + `mlir-translate` |
| `so` | LLVM IR | Shared object | `opt` + `llc` + `g++` |
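The stage table can be read as an ordered pipeline in which each stage consumes the previous stage's output. The snippet below is only an illustrative sketch of that ordering: the `Stage` type, `STAGES`, and `describe_pipeline` are hypothetical names invented here, not the contents of `backend/compiler.py`, and the tool strings are copied from the table rather than executed.

```python
from typing import NamedTuple, Tuple


class Stage(NamedTuple):
    """One row of the stage table (hypothetical helper type)."""
    name: str                # stage key, e.g. "ttir"
    source: str              # input representation
    target: str              # output representation
    tools: Tuple[str, ...]   # tools named in the table, not invoked here


# Order matters: each stage feeds the next one.
STAGES = (
    Stage("ttir", "Triton IR", "Optimized TTIR", ("Triton pass manager",)),
    Stage("linalgdir", "TTIR", "Linalg MLIR",
          ("spine-triton-opt --triton-to-linalg-experimental",)),
    Stage("llir", "Linalg MLIR", "LLVM IR",
          ("spine-opt --spine-triton-e2e-pipeline", "mlir-translate")),
    Stage("so", "LLVM IR", "Shared object", ("opt", "llc", "g++")),
)


def describe_pipeline(stages=STAGES):
    """Return one human-readable line per stage, in execution order."""
    return [
        f"{s.name}: {s.source} -> {s.target} via {' + '.join(s.tools)}"
        for s in stages
    ]


if __name__ == "__main__":
    for line in describe_pipeline():
        print(line)
```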
spine-triton defines five custom MLIR dialects for SpacemiT-specific operations:
| Dialect | Namespace | Purpose |
|---|---|---|
| XSMT | `xsmt` | Core SpacemiT ops: pack/unpack/repack, mmt4d, alloc, barriers |
| XSMTAsync | `xsmt_async` | Async memory barrier lifecycle (alloc/arrive/wait/release) |
| TLE | `tle` | Triton Language Extension: extract_tile, insert_tile |
| TritonTilingExt | `ttx` | Extended tiling interface (cumsum) with TilingInterface |
| TritonStructured | `tts` | Structured operations dialect |
| Operation | Description |
|---|---|
| `xsmt.pack` | Pack 2D tensor → 4D packed layout (tile decomposition) |
| `xsmt.unpack` | Unpack 4D packed tensor → 2D |
| `xsmt.repack` | Change 4D packed tile size (unpack + repack) |
| `xsmt.subview` | Create pointer subview preserving packing |
| `xsmt.subview_pack` | Create subview with new packed tile layout |
| `xsmt.mmt4d` | 4D matrix multiplication with packed tensors |
| `xsmt.alloc` | Allocate tensor in specified memory (l2/shared) |
| `xsmt.alloc_copies` | Allocate multi-copy buffer tensor |
| `xsmt.mbarrier_copies` | Allocate multiple memory barrier instances |
| `xsmt.descriptor_load_view` | Fused descriptor load + view operation |
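To make the pack/unpack/repack descriptions concrete, here is a rough pure-Python model of the index arithmetic behind a `[M/m, N/n, m, n]` tile decomposition. It is a sketch under assumptions, not the dialect's implementation: the function names are hypothetical, tensors are plain nested lists, and it assumes `M` and `N` are exact multiples of the tile sizes (how the real ops handle ragged edges is not described here).

```python
def pack(matrix, m, n):
    """Pack a 2D [M][N] nested list into a 4D [M/m][N/n][m][n] layout."""
    M, N = len(matrix), len(matrix[0])
    if M % m or N % n:
        # Assumption of this sketch only: no padding of partial tiles.
        raise ValueError("M and N must be multiples of the tile sizes")
    return [
        [
            [[matrix[bi * m + i][bj * n + j] for j in range(n)]
             for i in range(m)]
            for bj in range(N // n)
        ]
        for bi in range(M // m)
    ]


def unpack(packed):
    """Invert pack(): rebuild the 2D matrix from the 4D tile layout."""
    m, n = len(packed[0][0]), len(packed[0][0][0])
    rows, cols = len(packed) * m, len(packed[0]) * n
    return [
        [packed[r // m][c // n][r % m][c % n] for c in range(cols)]
        for r in range(rows)
    ]


def repack(packed, m, n):
    """Change the tile size by unpacking and packing again."""
    return pack(unpack(packed), m, n)


if __name__ == "__main__":
    a = [[r * 4 + c for c in range(4)] for r in range(4)]
    tiles = pack(a, 2, 2)
    print(tiles[0][1])          # the top-right 2x2 tile
    print(unpack(tiles) == a)   # round trip restores the matrix
```

Element `(r, c)` of the 2D matrix lands at `[r // m][c // n][r % m][c % n]`, so each `m × n` tile is stored contiguously in the two innermost dimensions.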
The language/ directory provides Python-level APIs for kernel authors:
| Module | Key Functions |
|---|---|
| `smt` (SpacemiT Triton) | descriptor_load, view (pack/unpack/repack/subview), alloc, alloc_copies, dot (mmt4d), mbarrier, barrier_arrive/barrier_wait, parallel, compile_hint, get_num_of_thread |
| `tle` (Triton Lang Extension) | extract_tile, insert_tile |
| `cpu` | utils, libdevice (CPU-specific math functions) |
- 4D Packed Tensor Layout: 2D matrices are packed into a 4D `[M/m, N/n, m, n]` layout for efficient tensor core operations. The pack/unpack/repack operations handle layout transformations.
- Destination-Passing Style (DPS): Operations support optional destination tensors to avoid intermediate allocations, enabling memory-efficient operation chaining.
- Descriptor-Based Load: The `descriptor_load` operation provides efficient block memory access with boundary checking.
- Memory Barriers: Hardware memory barriers (`mbarrier`) provide thread synchronization in multi-core execution and support double/triple buffering patterns.
- Multi-Copy Buffers: `alloc_copies` and `mbarrier_copies` support software pipelining with multiple buffer copies.
- Tile Operations: `extract_tile`/`insert_tile` support fine-grained tile manipulation with both static (compile-time) and dynamic (runtime) indexing.
- Proton Profiling: RISC-V `rdtime` instruction-based kernel profiling with Chrome Trace and Hatchet format output.
- RISC-V Vector Extension: Targets RVV 1.0 (the `v` extension) along with `zfh` (half-precision float), `zvfh` (vector half-precision), `zicbop` (cache-block prefetch operations), and `xsmtvdotii` (SpacemiT IME2).
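As an illustration of what "block memory access with boundary checking" can mean, the sketch below models a block load over a nested-list matrix in plain Python. It is not the `descriptor_load` API: `load_block` and its parameters are hypothetical, and filling out-of-range elements with a `pad` value of `0` is an assumption made for this example only.

```python
def load_block(matrix, row0, col0, block_rows, block_cols, pad=0):
    """Read a block_rows x block_cols block starting at (row0, col0).

    Elements that fall outside the matrix are replaced by `pad`
    (an assumption of this sketch) instead of raising an error.
    """
    rows, cols = len(matrix), len(matrix[0])
    return [
        [
            matrix[r][c] if 0 <= r < rows and 0 <= c < cols else pad
            for c in range(col0, col0 + block_cols)
        ]
        for r in range(row0, row0 + block_rows)
    ]


if __name__ == "__main__":
    a = [[1, 2, 3],
         [4, 5, 6]]
    print(load_block(a, 0, 0, 2, 2))  # fully in bounds
    print(load_block(a, 1, 2, 2, 2))  # straddles the bottom-right corner
```

A kernel written against a fixed block shape can then process edge blocks with the same code path as interior blocks, which is the usual motivation for boundary-checked loads.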
| Arch ID | CPU Name | Target |
|---|---|---|
| `0x503C` | spacemit-x60 | K1 |
| `0x5064` | spacemit-x100 | K3 |
| `0xA03C` | spacemit-a60 | K1 |
| `0xA064` | spacemit-a100 | K3 |
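The arch ID table maps naturally onto a lookup from ID to CPU name and target. The following is a minimal sketch of such a lookup; `ARCH_TABLE` and `lookup_arch` are hypothetical names, and how spine-triton actually obtains the arch ID at runtime is not covered here.

```python
# Arch ID -> (CPU name, target), copied from the table above.
ARCH_TABLE = {
    0x503C: ("spacemit-x60", "K1"),
    0x5064: ("spacemit-x100", "K3"),
    0xA03C: ("spacemit-a60", "K1"),
    0xA064: ("spacemit-a100", "K3"),
}


def lookup_arch(arch_id):
    """Return (cpu_name, target) for a known arch ID, else raise ValueError."""
    try:
        return ARCH_TABLE[arch_id]
    except KeyError:
        raise ValueError(f"unsupported arch ID: {arch_id:#06x}") from None


if __name__ == "__main__":
    cpu_name, target = lookup_arch(0x5064)
    print(cpu_name, target)
```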
| Component | Submitted time | Status | Link | Owner | Comments |
|---|---|---|---|---|---|
| TLE dialect (extract/insert tile) | - | WIP | - | zuoweixia497 | Triton Language Extension dialect |
| RISC-V target support (AME) | - | WIP | - | alex-spacemit | |
| TLE language module | - | WIP | - | zuoweixia497 | extract_tile, insert_tile |
| Proton CPU profiling | - | WIP | - | zuoweixia497 | rdtime-based timing, Chrome Trace output |
| Month | Summary | Updated by |
|---|---|---|
| 2026-05 | Initial wiki created from spine-triton internal source documentation | alex-spacemit |