Context
Flecs ECS stores entities that share the same component set together in columnar (Structure of Arrays) tables, which yields large cache-friendly iteration speedups. CERN ROOT uses the same columnar layout for petabyte-scale data analysis. Rapier Physics achieved 5-8x speedups over its predecessor primarily through data-layout changes, not algorithmic changes.
Problem
During optimization passes (constant folding, common subexpression elimination (CSE), dead code elimination (DCE), etc.), Loom iterates over IR nodes. If nodes are stored in AoS (Array of Structures) layout, every cache line a pass loads also carries fields that pass never reads, polluting the cache and wasting memory bandwidth.
Proposal
Restructure Loom's IR storage to use SoA (Structure of Arrays) layout, grouped by opcode class:
Before (AoS)
[Instr{opcode, operands, type, result, ...}, Instr{...}, Instr{...}, ...]
After (SoA, grouped by opcode class)
arithmetic_opcodes: [Add, Mul, Sub, ...]
arithmetic_operands: [(r1,r2), (r3,r4), ...]
arithmetic_types: [I32, I64, I32, ...]
arithmetic_results: [r5, r6, r7, ...]
control_flow_opcodes: [Br, BrIf, Block, ...]
control_flow_targets: [label0, label1, ...]
...
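The layout above can be sketched as parallel arrays per opcode class. This is a minimal Rust sketch; the type names (`ArithTable`, `ArithOp`, `Ty`) are illustrative, not Loom's actual IR types:

```rust
// Hypothetical SoA storage for the arithmetic opcode class:
// one dense array per field, all indexed by instruction position.
#[derive(Clone, Copy, PartialEq, Debug)]
enum ArithOp { Add, Mul, Sub }

#[derive(Clone, Copy, PartialEq, Debug)]
enum Ty { I32, I64 }

#[derive(Default)]
struct ArithTable {
    opcodes:  Vec<ArithOp>,
    operands: Vec<(u32, u32)>, // (lhs register, rhs register)
    types:    Vec<Ty>,
    results:  Vec<u32>,        // destination register
}

impl ArithTable {
    fn push(&mut self, op: ArithOp, ops: (u32, u32), ty: Ty, res: u32) {
        self.opcodes.push(op);
        self.operands.push(ops);
        self.types.push(ty);
        self.results.push(res);
    }
}

fn main() {
    let mut t = ArithTable::default();
    t.push(ArithOp::Add, (1, 2), Ty::I32, 5);
    t.push(ArithOp::Mul, (3, 4), Ty::I64, 6);
    // A pass that only needs opcodes walks a single dense array;
    // operands, types, and results stay out of the cache entirely.
    assert_eq!(t.opcodes[1], ArithOp::Mul);
}
```

The trade-off is that inserting one instruction touches four arrays, but optimization passes read far more often than they insert.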
Benefits
- Constant folding iterates only arithmetic instructions — no cache pollution
- CSE can compare operand arrays with SIMD
- DCE can scan result-use arrays without touching instruction details
- Pattern matching (ISLE) can use pre-grouped instruction categories
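As an example of the DCE benefit, a dead-instruction sweep only needs the `results` column. A hedged sketch, assuming a hypothetical `find_dead` helper over a plain results slice:

```rust
use std::collections::HashSet;

// Hypothetical DCE sweep over SoA columns: only the `results` array
// is read, so instruction bodies are never pulled into cache.
// Returns the indices of instructions whose result is never used.
fn find_dead(results: &[u32], used: &HashSet<u32>) -> Vec<usize> {
    results
        .iter()
        .enumerate()
        .filter(|(_, r)| !used.contains(r))
        .map(|(i, _)| i)
        .collect()
}

fn main() {
    let results = [5u32, 6, 7];
    let used: HashSet<u32> = [5, 7].into_iter().collect();
    assert_eq!(find_dead(&results, &used), vec![1]); // r6 is dead
}
```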
Cached query pattern (from Flecs)
Cache which ISLE optimization rules match which instruction categories. On subsequent passes, walk the pre-matched list instead of re-evaluating all patterns.
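One way to realize this, loosely modeled on Flecs' cached queries: store the matched instruction indices per rule, tagged with a generation counter that the IR bumps on every mutation. The names (`MatchCache`, `get_or_rebuild`) are hypothetical, not an existing Loom or ISLE API:

```rust
use std::collections::HashMap;

// Hypothetical per-rule match cache. Each rule id maps to the
// instruction indices it matched, plus the IR generation at match time.
struct MatchCache {
    generation: u64,                           // bumped on any IR mutation
    matches: HashMap<u32, (u64, Vec<usize>)>,  // rule id -> (gen, hits)
}

impl MatchCache {
    // Return the cached hit list for `rule`, re-running the (expensive)
    // pattern match only when the cached generation is stale.
    fn get_or_rebuild<F>(&mut self, rule: u32, rematch: F) -> &[usize]
    where
        F: Fn() -> Vec<usize>,
    {
        let gen = self.generation;
        let entry = self
            .matches
            .entry(rule)
            .or_insert_with(|| (gen, rematch()));
        if entry.0 != gen {
            *entry = (gen, rematch());
        }
        &entry.1
    }
}

fn main() {
    let mut cache = MatchCache { generation: 0, matches: HashMap::new() };
    assert_eq!(cache.get_or_rebuild(1, || vec![0, 2]), &[0, 2]);
    cache.generation += 1; // IR mutated: entry for rule 1 is now stale
    assert_eq!(cache.get_or_rebuild(1, || vec![2]), &[2]);
}
```

A finer-grained variant would keep one counter per opcode-class table, so mutating control flow does not invalidate cached matches over arithmetic instructions.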
Estimated Impact
An estimated 2-5x speedup on optimization passes from better cache utilization. Loom already processes large modules, and SoA layout scales better as module size grows.
Effort
High, since it requires an IR redesign. Consider doing this incrementally: start with the most frequently run passes (constant folding, DCE).
References