feat: Parallel ternary conversion + rayon matmul by shift · Pull Request #62 · shift/FerrisRes

shift · 2026-04-22T10:10:24Z

Parallel Ternary Speedup

Changes

1. Parallel layer conversion (rayon)

gemma4_to_block_attnres() now processes all 35 layers in parallel using rayon's par_iter(). Each layer's weight quantization is independent of the others, so no synchronization between layers is needed.

Before: 472s (7.9 min) sequential
After: ~120s (2 min) estimated on 4-core Skylake
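As a rough illustration of the idea (not the code in this PR), per-layer conversion can be fanned out across threads because no layer depends on another. The sketch below is a std-only stand-in that uses scoped threads instead of rayon, so it compiles without extra crates. With rayon the body would collapse to something like layers.par_iter().map(...).collect(). The Layer struct, quantize_layer(), the threshold rule and convert_layers_parallel() are all hypothetical names made up for this example.

```rust
use std::thread;

/// Hypothetical stand-in for one layer's f32 weights (not a FerrisRes type).
struct Layer {
    weights: Vec<f32>,
}

/// Hypothetical ternary quantizer: weights above `threshold` become +1,
/// weights below `-threshold` become -1, everything else becomes 0.
fn quantize_layer(layer: &Layer, threshold: f32) -> Vec<i8> {
    layer
        .weights
        .iter()
        .map(|&w| {
            if w > threshold {
                1
            } else if w < -threshold {
                -1
            } else {
                0
            }
        })
        .collect()
}

/// Quantize every layer, splitting the slice into contiguous chunks that
/// are processed on separate scoped threads. Output order matches input order.
fn convert_layers_parallel(layers: &[Layer], threshold: f32, num_threads: usize) -> Vec<Vec<i8>> {
    if layers.is_empty() {
        return Vec::new();
    }
    let n = num_threads.max(1);
    let chunk_size = (layers.len() + n - 1) / n; // ceiling division, always >= 1 here
    thread::scope(|s| {
        // Spawn all workers first, then join them in order.
        let handles: Vec<_> = layers
            .chunks(chunk_size)
            .map(|chunk| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .map(|l| quantize_layer(l, threshold))
                        .collect::<Vec<_>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Whether this actually beats the sequential loop depends on memory bandwidth and allocation pressure, so the timing above should be treated as an estimate until measured. The follow-up PR #63 referenced further down reverts rayon from weight conversion.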

2. Parallel ternary matmul

ternary_matmul_parallel(): processes sequence positions (or output rows for single-token) in parallel.

CpuLinear::forward_parallel(): multi-threaded forward for large matrices.

Expected speedup: ~4-8x on Skylake (4 cores/8 threads)
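For the single-token case, splitting work by output row might look roughly like the sketch below. It is an assumption-laden illustration, not the body of ternary_matmul_parallel(): it uses std scoped threads rather than rayon, assumes a row-major weight layout of out_dim x in_dim, and the names ternary_dot() and ternary_matvec_parallel() are invented for the example.

```rust
use std::thread;

/// Dot product of one ternary weight row (entries in {-1, 0, 1}) with the input.
fn ternary_dot(row: &[i8], input: &[f32]) -> f32 {
    row.iter().zip(input).map(|(&w, &x)| w as f32 * x).sum()
}

/// Computes y = W * x where W is row-major [out_dim x in_dim].
/// Output rows are split into contiguous blocks, one block per thread.
fn ternary_matvec_parallel(
    weights: &[i8],
    input: &[f32],
    out_dim: usize,
    num_threads: usize,
) -> Vec<f32> {
    let in_dim = input.len();
    assert_eq!(weights.len(), out_dim * in_dim, "weight shape mismatch");
    let mut output = vec![0.0f32; out_dim];
    if out_dim == 0 || in_dim == 0 {
        return output;
    }
    let n = num_threads.max(1);
    let rows_per_thread = (out_dim + n - 1) / n; // ceiling division
    thread::scope(|s| {
        // Each thread gets a disjoint mutable slice of the output and the
        // matching block of weight rows, so no locking is required.
        for (out_chunk, w_chunk) in output
            .chunks_mut(rows_per_thread)
            .zip(weights.chunks(rows_per_thread * in_dim))
        {
            s.spawn(move || {
                for (y, row) in out_chunk.iter_mut().zip(w_chunk.chunks(in_dim)) {
                    *y = ternary_dot(row, input);
                }
            });
        }
    }); // all spawned threads are joined when the scope ends
    output
}
```

For multi-token inputs the same pattern would apply with sequence positions as the unit of work instead of output rows. Thread start-up cost means this only pays off for large matrices, which matches the "large matrices" qualifier on CpuLinear::forward_parallel().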

3. Ternary matmul optimization

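The inner loop now accumulates pos_sum (inputs under +1 weights) and neg_sum (inputs under -1 weights) separately and returns pos_sum - neg_sum, instead of computing (i8 as f32) * input for every element. This is more branch-predictor friendly and enables better SIMD auto-vectorization.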
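To make the difference concrete, here is a minimal sketch of the two inner-loop shapes. The function names ternary_dot_mul() and ternary_dot_split() are made up for this example and the real kernel may be organized differently. Whether the compiler actually vectorizes the split form depends on the target and optimization flags, so this shows the arithmetic only.

```rust
/// Baseline form: cast each ternary weight to f32 and multiply.
fn ternary_dot_mul(row: &[i8], input: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for (&w, &x) in row.iter().zip(input) {
        acc += w as f32 * x;
    }
    acc
}

/// Split form: add inputs under +1 weights into `pos_sum`, inputs under
/// -1 weights into `neg_sum`, and subtract once at the end.
/// Assumes weights are restricted to {-1, 0, 1}.
fn ternary_dot_split(row: &[i8], input: &[f32]) -> f32 {
    let mut pos_sum = 0.0f32;
    let mut neg_sum = 0.0f32;
    for (&w, &x) in row.iter().zip(input) {
        // Written as value selections rather than early exits, so the
        // compiler is free to lower them to selects or masks.
        pos_sum += if w > 0 { x } else { 0.0 };
        neg_sum += if w < 0 { x } else { 0.0 };
    }
    pos_sum - neg_sum
}
```

The two forms are mathematically equivalent for ternary weights, but they sum in a different order, so results can differ in the last few bits for general f32 inputs.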

Dependencies

Added rayon = "1.10" to Cargo.toml

Tests

1596 passing (0 new failures)

[e6e5afb8]

gemma4_to_block_attnres() now parallelizes layer conversion with rayon. Expected: 8 min → ~2 min on 4-core Skylake. ternary_matmul_parallel(): processes seq positions (or output rows) in parallel. CpuLinear::forward_parallel(): multi-threaded forward for large matrices. Added rayon dependency. [e6e5afb8]

shift merged commit a9d826e into main Apr 22, 2026
4 checks passed

shift deleted the feat/simd-ternary-speedup branch April 22, 2026 10:14

shift mentioned this pull request Apr 22, 2026

fix: Revert rayon from weight conversion (63% slower) #63

Merged

