spine-FlagGems Support Status

Welcome to the spine-FlagGems support status page for SpacemiT SoCs.

Overview

FlagGems is a high-performance, general-purpose operator library written in the Triton language and part of FlagOS. It registers with PyTorch's ATen backend to provide seamless operator replacement, enabling "develop once, run anywhere" across diverse AI accelerators.

The SpacemiT backend (`_spacemit`) is a CPU backend powered by Spine Triton (`triton.backends.spine_triton.driver.CPUDriver`). It uses SpacemiT's SMT (SpacemiT Tensor) instruction set extensions, including `smt.descriptor_load`, `smt.view`, `smt.dot`, `smt.alloc`, and `smt.mbarrier`, to deliver high-performance GEMM and pointwise operations on K1/K3 SoCs.

Repo: flagos-ai/FlagGems
Triton dependency: triton==3.6.0+spacemit.a5 (SpacemiT Triton package)
Supported SoCs:
- SpacemiT-K1
- SpacemiT-K3

Architecture

Key Design Decisions

Aspect	Detail
Vendor	`SPACEMIT = 13` in `vendors` enum (`runtime/common.py`)
Device	CPU via Spine Triton; BF16 and INT64 marked unsupported in `device.py`
Driver	`CPUDriver` from `triton.backends.spine_triton.driver` — set once at backend init
GEMM strategy	4 execution paths: `EVEN_K` (fast SMT dot), `SPLIT_M`, `SPLIT_N` (pipelined with `smt.mbarrier`), `SPLIT_K` (tiled reduction)
Autotuning	`TunedConfigLoader` (migration from `ConfigLoader` in progress in PR #3793, still open) with `pre_hook` validation
Arch configs	`LEGAL_CONFIGS` keyed by `arch_id` (`0x503C`, `0xA03C`, `0xA064`, `0xF000`) with per-op, per-dtype `MICRO_M/K/N` constraints
Pointwise codegen	Custom `KernelGenerator` / `ModuleGenerator` overriding block pointer stride `order` for SpacemiT memory layout
Heuristics	`HEURISTICS_CONFIGS` for 20+ operators — block sizes adapted to tensor dimensions and SM count
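To illustrate the `SPLIT_K` idea and the `EVEN_K` condition, here is a minimal pure-Python sketch. It is not the Triton kernel: `block_k`, `is_even_k`, and `matmul_split_k` are hypothetical names, and reading `EVEN_K` as "K is a multiple of the K tile size, so no tail masking is needed" is an assumption. The reduction over K is cut into tiles whose partial products are summed, which gives the same result as a plain matmul.

```python
from typing import List

Matrix = List[List[float]]


def is_even_k(k: int, block_k: int) -> bool:
    # Assumed meaning of EVEN_K: every K tile is full, so no tail mask is needed.
    return k % block_k == 0


def matmul_split_k(a: Matrix, b: Matrix, block_k: int) -> Matrix:
    """Reference split-K GEMM: C = A @ B computed as a sum of per-K-tile products."""
    m, k = len(a), len(a[0])
    n = len(b[0])
    assert len(b) == k, "inner dimensions must match"
    c = [[0.0] * n for _ in range(m)]
    for k0 in range(0, k, block_k):
        k1 = min(k0 + block_k, k)  # the last tile is partial when K is not "even"
        for i in range(m):
            for j in range(n):
                # Partial product for this K tile (one "split" of the reduction).
                partial = sum(a[i][kk] * b[kk][j] for kk in range(k0, k1))
                c[i][j] += partial  # tiled reduction across the K splits
    return c
```

Any tile size yields the same `C`; the tile size only changes how much work each split does and whether the final tile needs masking.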

Upstream Component Status

Component	Submitted time	Status	Link	Owner	Comments
Add SpacemiT runtime backend	Apr 19, 2026	Merged (May 14, 2026)	#2527	alex-spacemit / zuoweixia497	20 operators, vendor detection, codegen config, tuning infra
Fix mm/bmm/gelu/argmax & bump Triton to a5	Jun 5, 2026	Open	#3793	alex-spacemit	out-variant ops, in-place GELU, empty-tensor argmax, TunedConfigLoader migration

Operator Support Status

BLAS / GEMM Operators

Operator	Status	SMT dot	EVEN_K fast path	Comments
`mm`	✔️	✔️	✔️	4-path GEMM (EVEN_K / SPLIT_M / SPLIT_N / SPLIT_K)
`bmm`	✔️	✔️	✔️	K==1 fast path via `outer`/`mul`
`addmm`	✔️	✔️	✔️	Implemented but commented out in `__init__.py`
`mv`	✔️	—	—	Matrix-vector multiply
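The `bmm` K==1 fast path works because an (M, 1) @ (1, N) product has no reduction at all: it is the outer product of a column and a row. A minimal pure-Python sketch of that equivalence follows. Both function names are hypothetical, and nested lists stand in for tensors.

```python
from typing import List

Batch = List[List[List[float]]]  # (B, rows, cols)


def bmm_reference(a: Batch, b: Batch) -> Batch:
    """Generic batched matmul: (B, M, K) @ (B, K, N) -> (B, M, N)."""
    out = []
    for ab, bb in zip(a, b):
        k, n = len(bb), len(bb[0])
        out.append([[sum(row[t] * bb[t][j] for t in range(k)) for j in range(n)] for row in ab])
    return out


def bmm_k1_outer(a: Batch, b: Batch) -> Batch:
    """K == 1 shortcut: each batch is an outer product, computed with plain multiplies."""
    out = []
    for ab, bb in zip(a, b):
        col = [r[0] for r in ab]  # (M, 1) -> length-M vector
        row = bb[0]               # (1, N) -> length-N vector
        out.append([[x * y for y in row] for x in col])  # no reduction loop needed
    return out
```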

Activation Operators

Operator	Forward	Backward	In-place	Comments
`gelu`	✔️	✔️	✔️	Supports `approximate='tanh'` and `approximate='none'`; uses `geluTanh`/`geluNone` SMT intrinsics
`silu`	✔️	✔️	—	Uses `_silu` SMT intrinsic
`sigmoid`	✔️	✔️	—	Pointwise dynamic codegen
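For reference, the standard GELU, SiLU, and sigmoid definitions used by PyTorch are sketched below in pure Python. We assume, without having inspected the intrinsics, that `geluNone` and `geluTanh` compute the exact and tanh-approximated forms respectively. The GELU derivative is included because the table lists backward support; all function names here are illustrative.

```python
import math


def gelu_none(x: float) -> float:
    # Exact GELU (PyTorch approximate='none'): x * Phi(x), Phi = standard normal CDF.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    # Tanh approximation (PyTorch approximate='tanh').
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


def sigmoid(x: float) -> float:
    # Branch on the sign so math.exp never overflows for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def silu(x: float) -> float:
    # SiLU (swish): x * sigmoid(x).
    return x * sigmoid(x)


def gelu_none_grad(x: float) -> float:
    # d/dx [x * Phi(x)] = Phi(x) + x * phi(x), phi = standard normal PDF.
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf
```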

Normalization Operators

Operator	Status	Comments
`softmax`	✔️	SpacemiT-specific 2-pass kernel (max, then exp); backward delegates to the common implementation
`layer_norm`	✔️	Implemented but commented out in `__init__.py`
`batch_norm`	✔️	Supports 2D–4D input via 3D view conversion
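A 2-pass max+exp softmax follows the usual numerically stable recipe: find the row maximum first, then exponentiate the shifted values and normalize. The pure-Python sketch below shows the math only; how the SpacemiT kernel tiles and fuses these passes is not described here, and `softmax_two_pass` is a hypothetical name.

```python
import math
from typing import List


def softmax_two_pass(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row."""
    if not row:
        return []
    # Pass 1: row maximum, so exp() never sees a large positive argument.
    m = max(row)
    # Pass 2: shifted exponentials and their sum.
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    # Normalize.
    return [e / total for e in exps]
```

Subtracting the maximum does not change the result mathematically, but it keeps inputs like `[1000.0, 1000.0]` from overflowing.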

Reduction Operators

Operator	Status	Comments
`argmax`	✔️	2-stage kernel + multi-dim kernel; empty tensor early-return (PR #3793)
`argmin`	✔️	Mirrors argmax structure
`mean_dim`	✔️	2-stage reduction
`global_avg_pool`	✔️	Fixed block size config
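A 2-stage reduction such as `argmax` first computes a partial result per block and then reduces the partials. The sketch below is a pure-Python model of that structure over a flat list. It assumes PyTorch's tie-breaking rule (the first maximal index wins). `block_size` and the function name are hypothetical, and NaN handling is not modeled. Raising on empty input is a choice of this sketch, not the kernel's early-return behavior.

```python
from typing import List, Tuple


def argmax_two_stage(x: List[float], block_size: int) -> int:
    """Argmax over a flat list: per-block partial results, then a final reduction."""
    if not x:
        # Sketch choice only; the kernel's empty-tensor early return is not modeled.
        raise ValueError("argmax of an empty sequence")
    # Stage 1: each block yields (local_max, global_index_of_local_max).
    partials: List[Tuple[float, int]] = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        local = max(range(len(block)), key=lambda i: block[i])  # first max within the block
        partials.append((block[local], start + local))
    # Stage 2: reduce the partials; strict '>' keeps the smallest index on ties.
    best_val, best_idx = partials[0]
    for val, idx in partials[1:]:
        if val > best_val:
            best_val, best_idx = val, idx
    return best_idx
```

An `argmin` in the same style only flips the comparisons.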

Pointwise Operators

Operator	Status	Variants
`where`	✔️	`where_self`, `where_self_out`, `where_scalar_self`, `where_scalar_other`
`pow`	✔️	`pow_tensor_tensor`, `pow_tensor_scalar`, `pow_scalar`, + in-place variants
`rsqrt`	✔️	Forward + in-place (`rsqrt_`)

Convolution Operators (code complete, not exported)

Operator	Status	Comments
`conv2d`	✔️ (code)	Fused `im2col` + `bmm` kernel; not yet exported
`conv1d`	✔️ (code)	Reshapes to 2D → delegates to `conv2d`
`conv_depthwise2d`	✔️ (code)	Delegates to `conv2d` with `groups=C`
`thnn_conv2d`	✔️ (code)	THNN-compatible wrapper
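The `im2col` + matmul approach turns a convolution into a matrix multiply: each input patch becomes a column, each filter becomes a row, and one GEMM produces every output pixel. The sketch below materializes the column matrix for clarity (a fused kernel presumably avoids this). It assumes a single sample, stride 1, no padding, no dilation, and `groups=1`, and all names are hypothetical. A `conv1d` maps onto it as a 2D convolution with height 1.

```python
from typing import List

Tensor3 = List[List[List[float]]]  # (C, H, W)


def im2col(x: Tensor3, kh: int, kw: int) -> List[List[float]]:
    """Unfold (C, H, W) into a (C*kh*kw, OH*OW) matrix; stride 1, no padding."""
    c, h, w = len(x), len(x[0]), len(x[0][0])
    oh, ow = h - kh + 1, w - kw + 1
    cols = []
    for ci in range(c):
        for i in range(kh):
            for j in range(kw):
                # One row per (channel, kernel offset): the input value it sees at each output pixel.
                cols.append([x[ci][oy + i][ox + j] for oy in range(oh) for ox in range(ow)])
    return cols


def conv2d_im2col(x: Tensor3, weight: List[Tensor3]) -> Tensor3:
    """weight: (O, C, kh, kw). Returns (O, OH, OW) via im2col followed by a matmul."""
    kh, kw = len(weight[0][0]), len(weight[0][0][0])
    oh, ow = len(x[0]) - kh + 1, len(x[0][0]) - kw + 1
    cols = im2col(x, kh, kw)
    out = []
    for w_o in weight:
        # Flatten the filter in the same (channel, i, j) order as the im2col rows.
        flat = [v for ch in w_o for r in ch for v in r]
        row = [sum(flat[r] * cols[r][p] for r in range(len(flat))) for p in range(oh * ow)]
        out.append([row[oy * ow:(oy + 1) * ow] for oy in range(oh)])
    return out
```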

Attention Operators (code complete, not exported)

Operator	Status	Comments
`flash_attention`	✔️ (code)	SMT-accelerated; supports causal masking
`scaled_dot_product_attention`	✔️ (code)	Wraps flash_attention
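Flash attention computes `softmax(Q K^T / sqrt(d)) V` without materializing the full score matrix. It visits K/V in blocks and keeps a running max and a running sum per query (online softmax). The pure-Python sketch below compares that blocked form with a naive reference, including causal masking. It assumes self-attention with aligned positions, so query `i` may attend to keys `0..i`. `block_n` and both function names are hypothetical, and the sketch says nothing about how the SMT-accelerated kernel is tiled.

```python
import math
from typing import List

Mat = List[List[float]]


def naive_attention(q: Mat, k: Mat, v: Mat, causal: bool) -> Mat:
    """softmax(Q K^T / sqrt(d)) V, materializing each full score row."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        n_keys = i + 1 if causal else len(k)  # causal: key j visible only if j <= i
        scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in range(n_keys)]
        m = max(scores)
        w = [math.exp(s - m) for s in scores]
        total = sum(w)
        out.append([sum(w[j] * v[j][c] for j in range(n_keys)) / total for c in range(len(v[0]))])
    return out


def flash_attention(q: Mat, k: Mat, v: Mat, causal: bool, block_n: int) -> Mat:
    """Same result, but K/V are visited in blocks with an online (running) softmax."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, qi in enumerate(q):
        n_keys = i + 1 if causal else len(k)
        m = -math.inf           # running max of the scores seen so far
        denom = 0.0             # running sum of exp(score - m)
        acc = [0.0] * len(v[0])  # running unnormalized output row
        for start in range(0, n_keys, block_n):
            end = min(start + block_n, n_keys)
            scores = [scale * sum(a * b for a, b in zip(qi, k[j])) for j in range(start, end)]
            m_new = max(m, max(scores))
            corr = math.exp(m - m_new)  # rescale earlier statistics to the new max
            p = [math.exp(s - m_new) for s in scores]
            denom = denom * corr + sum(p)
            acc = [
                a * corr + sum(p[t] * v[start + t][c] for t in range(len(p)))
                for c, a in enumerate(acc)
            ]
            m = m_new
        out.append([a / denom for a in acc])
    return out
```

The rescaling factor `corr` is what lets each block be processed once: earlier contributions are corrected whenever a larger score appears.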

Tuning Infrastructure

Architecture-Specific GEMM Configs (`config_pre_hook.py`)

The `LEGAL_CONFIGS` table maps CPU `arch_id` → operator → dtype → legal `(MICRO_M, MICRO_K, MICRO_N)` tuples. At kernel launch, a `pre_hook` validates the micro-tile configuration chosen by the autotuner and replaces any illegal combination with the arch-legal default.

arch_id	SoC	float32 MICRO (M, K, N)	float16 MICRO (M, K, N)
`0x503C`	K1 / X60	(8, 32, 32)	(16, 8, 32)
`0xA03C`	K1 / A60	(8, 32, 32)	(8, 16, 16)
`0xA064`	K3 / A100	(8, 32, 32)	(16, 8, 32)
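The validate-and-fix step can be modeled in a few lines of Python. This is a sketch, not the contents of `config_pre_hook.py`. The `"mm"` operator key, the list-of-tuples layout, the convention that the first tuple is the default, the `BLOCK_M` key in the example, and `fix_micro_config` are all assumptions. Only the default tuples from the table above are filled in.

```python
from typing import Dict, List, Tuple

Micro = Tuple[int, int, int]  # (MICRO_M, MICRO_K, MICRO_N)

# Hypothetical excerpt: only the defaults from the table above, under an assumed "mm" key.
# The first tuple in each list is treated as the arch-legal default.
LEGAL_CONFIGS: Dict[int, Dict[str, Dict[str, List[Micro]]]] = {
    0x503C: {"mm": {"float32": [(8, 32, 32)], "float16": [(16, 8, 32)]}},  # K1 / X60
    0xA03C: {"mm": {"float32": [(8, 32, 32)], "float16": [(8, 16, 16)]}},  # K1 / A60
    0xA064: {"mm": {"float32": [(8, 32, 32)], "float16": [(16, 8, 32)]}},  # K3 / A100
}


def fix_micro_config(config: Dict[str, int], arch_id: int, op: str, dtype: str) -> Dict[str, int]:
    """Return a copy of `config` whose MICRO_M/K/N tuple is legal for this arch/op/dtype."""
    legal = LEGAL_CONFIGS[arch_id][op][dtype]
    chosen = (config["MICRO_M"], config["MICRO_K"], config["MICRO_N"])
    fixed = dict(config)  # other autotuned keys are passed through unchanged
    if chosen not in legal:
        default_m, default_k, default_n = legal[0]
        fixed.update(MICRO_M=default_m, MICRO_K=default_k, MICRO_N=default_n)
    return fixed
```

Returning a fixed copy rather than mutating the autotuner's config keeps the sketch easy to test. Whether the real hook mutates in place is not stated here.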

Known Limitations

- BF16 / INT64 unsupported: marked as unsupported dtypes in `device.py`.
- Backward passes incomplete: most backward tests are skipped; only GELU, SiLU, sigmoid, and softmax have backward kernels.
- Convolution & attention not exported: the code exists but is commented out in `ops/__init__.py`; `addmm` and `layer_norm` are likewise commented out.
- CPU-only: no GPU or other accelerator support; targets SpacemiT RISC-V CPUs.
- Operator count gap: ~20 exported operators vs. 200+ in upstream FlagGems.
- `TunedConfigLoader` migration in progress (PR #3793): the code still falls back to importing `ConfigLoader`.

Monthly Update Log

Month	Summary	Updated by
2026-06	Initial wiki created	alex-spacemit, yutingnie
