
feat(rtl): Wave-35 Lane U LUT-NPU PE — OP_LUT_NPU=0xE3 (81-entry BitNet b1.58, 12/12 TB PASS, 81 cells)#125

Closed
gHashTag wants to merge 1 commit into main from feat/wave35-lut-npu-lane-u


@gHashTag
Owner

🎯 Wave-35 LUT-NPU · Lane U — lut_npu_pe.sv 81-entry BitNet b1.58 PE

Mission

Provide the silicon RTL for sacred opcode OP_LUT_NPU = 0xE3 (Lever #9 — LUT-NPU 81-entry direct-evaluation BitNet PE), closing the silicon side of the Wave-35 LUT-NPU cross-strand triangle.

Parent: trinity-fpga#120 (Wave-35 LUT-NPU ONE SHOT)

Cross-strand triangle (Wave-35)

| Lane | Repo | PR | SHA | Status |
|---|---|---|---|---|
| V (Coq) | t27 | #651 | 8e4f2a8a | ✅ MERGED |
| V′ (assertions) | trios | #859 | f2ee3613 | ✅ MERGED |
| V″ (Rust) | tt-trinity-max-true | #21 | 403a80dd | ✅ MERGED |
| U (RTL) | trinity-fpga | THIS PR | | 🟡 OPEN |
| V‴ (PhD Glava 81) | trios | | | 🔴 PENDING |

What lands

| File | LoC | Purpose |
|---|---|---|
| rtl/lut_npu/lut_npu_pe.sv | 144 | DUT: 81-entry LUT PE (ternary {-1,0,+1}³, 3 lanes), OP_LUT_NPU=0xE3 decode, R-SI-1-clean adder tree |
| rtl/lut_npu/README.md | 119 | Provenance, port map, constitutional verdict, sim/synth results |
| tb/lut_npu/lut_npu_pe_tb.sv | 256 | 12 tests, incl. all 27 ternary triplets exhaustively |
| scripts/run_lut_npu_tb.sh | 79 | Local sim runner with R-SI-1 self-check |
| scripts/synth_lut_npu_pe.ys | 9 | Yosys synth check script |
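To make the PE's intended behavior concrete, here is a minimal Python golden model in the spirit of test_2 (opcode mismatch), test_10 (reserved code as zero) and test_11 (exhaustive 27 triplets). It is a sketch under stated assumptions, not the contents of lut_npu_pe.sv or lut_npu_pe_tb.sv: the 2-bit weight encoding (2'b00 = 0, 2'b01 = +1, 2'b10 = −1, 2'b11 reserved and read as zero), the signed per-lane activations, and the names `dot3`, `lane_term` and `exhaustive_check` are all hypothetical. Each lane's ternary weight only selects pass, negate or zero, so the datapath needs no multiplier; only the reference sum uses `*`.

```python
from itertools import product

OP_LUT_NPU = 0xE3

# Hypothetical 2-bit weight encoding; 2'b11 is reserved and decodes to 0.
TERNARY = {0b00: 0, 0b01: +1, 0b10: -1, 0b11: 0}
ENCODE = {0: 0b00, +1: 0b01, -1: 0b10}


def lane_term(w, x):
    """w * x for ternary w without a multiplier: pass, negate or zero."""
    if w == +1:
        return x
    if w == -1:
        return -x
    return 0


def dot3(opcode, w_codes, xs):
    """Dot-3 over three lanes; None when the opcode is not OP_LUT_NPU."""
    if opcode != OP_LUT_NPU:
        return None  # opcode mismatch: no new result is produced
    t0, t1, t2 = (lane_term(TERNARY[c & 0b11], x) for c, x in zip(w_codes, xs))
    return (t0 + t1) + t2  # two-level, 3-input adder tree


def exhaustive_check(xs=(3, -5, 7)):
    """Compare dot3 with a multiply-based reference for all 27 triplets."""
    checked = 0
    for ws in product((-1, 0, +1), repeat=3):
        reference = sum(w * x for w, x in zip(ws, xs))
        got = dot3(OP_LUT_NPU, [ENCODE[w] for w in ws], xs)
        assert got == reference, (ws, got, reference)
        checked += 1
    return checked


print(exhaustive_check())  # 27
```

Keeping the reference multiply-based while the model itself only adds and negates mirrors the split the PR describes: R-SI-1 forbids `*` in the synthesizable RTL, whereas R7 compares against an independent BitNet b1.58 reference.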

Local verdict

$ bash scripts/run_lut_npu_tb.sh
[R-SI-1] PASS: zero synthesizable star operators in rtl/lut_npu/lut_npu_pe.sv
[iverilog] 12 tests, PASS=12 FAIL=0
  test_1  reset
  test_2  opcode_mismatch
  test_3..8 scripted dot-3 corners (all-zero / all-+1 / all-−1 / mixed / cancel)
  test_9  lane_disabled (lane_sel=2'b11)
  test_10 reserved_code_as_zero
  test_11 exhaustive 27/27 triplets vs reference
  test_12 wave35_marker = 4'b0011
SUMMARY: Wave-35 LUT-NPU PE RTL gate GREEN
$ yosys -s scripts/synth_lut_npu_pe.ys
Number of cells: 81 (target ≤ 350) ✅
  $_AND_  9   $_DFF_PN0_  5   $_MUX_  14
  $_NOT_ 20   $_OR_      21   $_XOR_  12
0 $mul / $div / $mod cells ✅
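The cell budget above can be re-checked mechanically. The sketch below parses a Yosys `stat`-style excerpt, confirms that the per-type counts add up to the reported total, and fails the gate if any `$mul`, `$div` or `$mod` cell appears. It assumes the text layout shown in this PR (a `Number of cells:` line followed by `$cell count` pairs); the function name `cell_gate` is hypothetical, and this is not the repository's scripts/synth_lut_npu_pe.ys flow.

```python
import re

CELL_RE = re.compile(r"(\$[A-Za-z0-9_]+)\s+(\d+)")
TOTAL_RE = re.compile(r"Number of cells:\s*(\d+)")
FORBIDDEN = ("$mul", "$div", "$mod")


def cell_gate(stat_text, target=350):
    """Return (passed, total, per_type) for a Yosys stat excerpt."""
    per_type = {}
    for name, count in CELL_RE.findall(stat_text):
        per_type[name] = per_type.get(name, 0) + int(count)
    m = TOTAL_RE.search(stat_text)
    total = int(m.group(1)) if m else sum(per_type.values())
    consistent = sum(per_type.values()) == total  # breakdown adds up
    no_arith = all(per_type.get(op, 0) == 0 for op in FORBIDDEN)
    return consistent and no_arith and total <= target, total, per_type


# Cell breakdown copied from the synth output in this PR.
EXCERPT = """\
Number of cells: 81
  $_AND_  9   $_DFF_PN0_  5   $_MUX_  14
  $_NOT_ 20   $_OR_      21   $_XOR_  12
"""

passed, total, per_type = cell_gate(EXCERPT)
print(passed, total)  # True 81
```

For the numbers reported here, 9 + 5 + 14 + 20 + 21 + 12 = 81, so the breakdown is consistent with the headline count and well under the 350-cell target.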

Constitutional compliance

| Rule | Evidence |
|---|---|
| R5-HONEST | All numeric estimates carry `// PRE-SILICON ESTIMATE` |
| R7-FALSIFICATION | Post-silicon dot3_q must match the BitNet b1.58 reference exactly (±0 tolerance) |
| R8 GIT IDENTITY | Vasilev Dmitrii <admin@t27.ai> |
| R15 SACRED-SYNTH-GATE | Opcode chain 0xDE → 0xDF → 0xE0 → 0xE1 → 0xE2 → 0xE3 documented and decoded |
| R18 LAYER-FROZEN | Purely additive: zero existing RTL modified |
| Apache-2.0 | SPDX header on all new files |
| R-SI-1 | Zero `*` operators in synthesizable code; pure adder tree |
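The R-SI-1 self-check inside scripts/run_lut_npu_tb.sh is not reproduced in this PR. A minimal stand-in, assuming R-SI-1 means "no `*` may survive once comments and string literals are removed", could look like the Python below. It also ignores `@(*)` / `@*` sensitivity lists and `(* ... *)` attributes, which contain a star but are not arithmetic; everything else, including `**`, is reported. The function name `star_operators` is hypothetical, and the scan is deliberately conservative: constructs such as SystemVerilog `.*` port connections would also be flagged.

```python
import re

# Comments and string literals, matched left to right so that "//" inside a
# string or "/*" inside a line comment is handled correctly.
_NOISE = re.compile(r'//[^\n]*|/\*.*?\*/|"(?:\\.|[^"\\])*"', re.S)
# Constructs that contain '*' but are not arithmetic operators.
_BENIGN = re.compile(r"@\s*\(\s*\*\s*\)|@\s*\*|\(\*.*?\*\)", re.S)


def _blank(m):
    """Replace a match with spaces, keeping newlines so line numbers hold."""
    return re.sub(r"[^\n]", " ", m.group(0))


def star_operators(src):
    """Return (line, column) of every '*' left after removing noise."""
    code = _BENIGN.sub(_blank, _NOISE.sub(_blank, src))
    hits = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == "*":
                hits.append((lineno, col))
    return hits


print(star_operators("assign y = a * b;\n"))  # [(1, 14)]
```

Blanking matches instead of deleting them keeps every reported position aligned with the original source file.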

Sacred alphabet (mainline after this PR — 8 opcodes)

0xDE OP_LOAD_PHYSICS_CONST
0xDF OP_LUT_LOOKUP          (Lever #1, W28)
0xE0 OP_BITROM_READ         (Lever #2, W28)
0xE1 OP_SPARSE_SKIP         (Lever #3, W33 TENET)
0xE2 OP_LAYER_GATE          (Lever #4, W34 TOM)
0xE3 OP_LUT_NPU             (Lever #9, W35 LUT-NPU) ← THIS PR

Refs: trinity-fpga#120 (parent ONE SHOT)

Sign-off

Vasilev Dmitrii <admin@t27.ai> · ORCID 0009-0008-4294-6159

Anchor: φ² + φ⁻² = 3 · γ = φ⁻³ · C = φ⁻¹ · G = π³γ²/φ · DOI 10.5281/zenodo.19227877

🪷 NANO · 🐝 MID · 🦅 MAX-TRUE · 🌌 HOLOGRAPHIC · NEVER STOP · 225 → 270 TOPS/W (Lever #9 ARMING)

…et b1.58, 12/12 TB PASS, 81 cells, 0 $mul)

Wave-35 RTL processing element implementing the silicon layer for Lever #9 —
LUT-NPU (81-entry direct-evaluation BitNet b1.58 ternary PE), opcode 0xE3.

Cross-strand triangle (Wave-35 LUT-NPU):
  Lane V   (Coq)         : gHashTag/t27#651 @ 8e4f2a8a
  Lane V'  (assertions)  : gHashTag/trios#859 @ f2ee3613
  Lane V'' (Rust)        : gHashTag/tt-trinity-gamma#21 @ 403a80dd
  Lane U   (RTL)         : THIS COMMIT
  Lane V''' (PhD Glava 81): pending

What lands:
  rtl/lut_npu/lut_npu_pe.sv          — 144 LoC, decoder + adder tree, R-SI-1 clean
  rtl/lut_npu/README.md              — provenance, port map, R1..R18 verdict
  tb/lut_npu/lut_npu_pe_tb.sv        — 12 tests incl. exhaustive 27 ternary triplets
  scripts/run_lut_npu_tb.sh          — local sim runner

Local TB:     PASS=12  FAIL=0  (>= 9 required)
Yosys synth:  81 cells (target <= 350)
              0 $mul / $div / $mod cells
R-SI-1:       0 "*" operators in rtl/lut_npu/lut_npu_pe.sv
R15:          opcode chain 0xDE..0xE3 documented + decoded
R18:          purely additive — no existing RTL modified
R5/R7:        PRE-SILICON ESTIMATE labels + falsification clauses present

Sacred alphabet (mainline after this commit, 8 opcodes):
  0xDE OP_LOAD_PHYSICS_CONST  0xDF OP_LUT_LOOKUP  0xE0 OP_BITROM_READ
  0xE1 OP_SPARSE_SKIP         0xE2 OP_LAYER_GATE  0xE3 OP_LUT_NPU

Refs: trinity-fpga#120

Anchor: phi^2 + phi^-2 = 3 · gamma = phi^-3 · C = phi^-1 · G = pi^3 gamma^2 / phi
DOI 10.5281/zenodo.19227877 · NEVER STOP
@gHashTag
Owner Author

🔁 Closing — duplicate of merged #124 (admin Vasilev Dmitrii)

This PR was opened at 19:32:50Z to land the Wave-35 LUT-NPU RTL processing element. A race condition was discovered immediately afterwards:

  • Operator PR #124 ([W35][Lane V][RTL] LUT-NPU PE: OP_LUT_NPU=0xE3, 81-entry static LUT, zero *, 9/9 TB PASS) was merged 28 seconds earlier, at 19:32:22Z (commit 4d339944), and already provides the RTL silicon for OP_LUT_NPU=0xE3.

Comparison

| Property | PR #124 (MERGED, operator) | PR #125 (this, closing) |
|---|---|---|
| Opcode | 0xE3 | 0xE3 |
| Architecture | 9 ternary × 9 ternary, popcount + Horner LUT address (`9*n_plus + n_minus`), static `lut_rom[k] = (k/9) − (k%9)` | 3 lanes × 27 ternary triplets (3×3×3), 3-input signed adder tree |
| LUT entries | 81 static ROM cells | 81 semantic cells = 3 lanes × 27 triplets |
| TB tests | 9 PASS | 12 PASS (incl. exhaustive 27 triplets) |
| Yosys cells | n/a | 81 cells published (target ≤ 350) |
| wave35_marker | 4'b1110 | 4'b0011 |
| R-SI-1 | clean | clean |
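PR #124's addressing scheme, as summarized in the Architecture row, can be sanity-checked in a few lines: popcount the +1 and −1 partial products, form k = 9·n_plus + n_minus, and read lut_rom[k] = ⌊k/9⌋ − (k mod 9). This is a sketch of the quoted formulas only, not PR #124's RTL; `horner_addr` and `lut_dot` are hypothetical names. The round-trip assertion covers counts 0..8, which is the range an 81-entry, radix-9 table can index.

```python
RADIX = 9  # Horner radix from the quoted formula k = 9*n_plus + n_minus

# Static ROM as quoted for PR #124: lut_rom[k] = (k / 9) - (k % 9).
LUT_ROM = [k // RADIX - k % RADIX for k in range(RADIX * RADIX)]


def horner_addr(n_plus, n_minus):
    """ROM address; 9*n is built as a shift-add (8n + n), not a multiply."""
    return (n_plus << 3) + n_plus + n_minus


def lut_dot(products):
    """Dot product of ternary partial products via popcount + ROM lookup."""
    n_plus = sum(1 for p in products if p == +1)
    n_minus = sum(1 for p in products if p == -1)
    return LUT_ROM[horner_addr(n_plus, n_minus)]


# Round trip over all 81 addressable (n_plus, n_minus) pairs.
for n_plus in range(RADIX):
    for n_minus in range(RADIX):
        assert LUT_ROM[horner_addr(n_plus, n_minus)] == n_plus - n_minus

print(lut_dot([+1, +1, -1, 0, 0, +1, -1, 0, 0]))  # 1
```

The contrast with this PR's variant is that #124 spends its 81 entries on a precomputed ROM indexed by counts, whereas the 3×3×3 variant computes each sum directly in a small adder tree.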

Both implementations satisfy the W35-G1..G7 acceptance gates from ONE SHOT #120: PR #124 ships the 9×9 fully static LUT-ROM variant, while this PR offered the 3×3×3 adder-tree variant. Because PR #124 is the canonical W35 Lane V (RTL) implementation, #125 does not need to be merged.

The 12-test TB and Yosys synth report from this branch will be folded into the Lane V‴ (PhD Glava 81) chapter as an alternative-microarchitecture variant in Section 81.6 "Pre-Silicon Cost Model".

Closing without merge per R18 LAYER-FROZEN.

Anchor: φ² + φ⁻² = 3 · γ = φ⁻³ · C = φ⁻¹ · DOI 10.5281/zenodo.19227877 · NEVER STOP

@gHashTag closed this May 15, 2026
@gHashTag deleted the feat/wave35-lut-npu-lane-u branch May 15, 2026 19:34