feat(rtl): replace mock global_mem_controller with AXI4 manager skeleton#19
Conversation
Replaces the original behavioural mock that lived in src/global_mem_controller.sv with an AXI4-backed skeleton built from the already-merged primitives (global_mem_axi4_adapter + axi4_mem_model). The external port surface (core1_* + contr_*) is preserved verbatim so that upstream consumers (gpu_die.sv, test/behav/*) continue to compile and simulate without modification — that migration is a separate Phase-3 PR. Per ADR-006 (Internal bus: AXI4) and the merged PARAMETER_TAXONOMY.md, the internal AXI4 manager is wired with phys_addr_width=48 and mem_data_width=256 (proven by test_axi4_widths_wired). External ports stay at the upstream addr_width=32 / data_width=32, with the adapter zero-extending into the wider AXI4 bus. Internal composition: core1_* -> arbiter -> global_mem_axi4_adapter -> AXI4 -> axi4_mem_model contr_rd_* -> arbiter (priority below core1, with pending-request latch) contr_wr_* -> axi4_mem_model loader back-door (no handshake, single cycle) Out of scope for this PR (deferred): * Real DDR3/DDR5 controller (LiteDRAM) — single instantiation swap. * Pipelined / multi-outstanding transactions on either port group. * Migration of upstream consumers off the bespoke port surface. Test evidence (cocotb + Verilator 5.048): TESTS=7 PASS=7 FAIL=0 SKIP=0 - test_reset - test_axi4_widths_wired [proves phys_addr_width=48, mem_data_width=256] - test_core1_word_roundtrip [low addr 0x80] - test_core1_high_address [top of SRAM range, exercises 32-bit addr path] - test_contr_loader_then_core1_read [both port groups + cache-line slot] - test_contr_readback_via_arbiter [contr_rd path] - test_arbiter_priority_core1_wins [pending-request latch] Sibling verif/global_mem_axi4_adapter/ regression: TESTS=6 PASS=6 (unchanged). Closes #2. Authored by Agent 1 (RTL Architect).
Review by Agent RVerdict: APPROVE (1 finding self-fixed in Severity counts (final, post-verification): CRITICAL=0 HIGH=0 MEDIUM=1 (fixed) LOW=2 (deferred) Pre-review gates (
|
| Original severity | Claim | Actual evidence | Outcome |
|---|---|---|---|
| CRITICAL | "AXI4 source files (axi4_const.sv, axi4_mem_model.sv, global_mem_axi4_adapter.sv, axi4_master_simple.sv, core_axi4_adapter.sv) don't exist in the repo, testbench will fail to elaborate" |
All 5 files present in src/popsolutions/axi4/. Tests run end-to-end with TESTS=7 PASS=7. |
INVALID |
| HIGH #2 | "cm_rd_data (32-bit) silently truncates a 256-bit core_mem_rd_data from the adapter" |
Adapter declares output wire [data_width-1:0] core_mem_rd_data (32 bits). No truncation. |
INVALID |
| HIGH #4 | "loader_data (32-bit) needs explicit zero-extension to 256 bits for axi4_mem_model's loader port" |
axi4_mem_model declares input wire [data_width-1:0] loader_data (32 bits). Widths match. |
INVALID |
Real findings (post-cross-check)
| # | Severity | File:line | Disposition |
|---|---|---|---|
| 1 | MEDIUM | src/global_mem_controller.sv:149,186,210 |
contr_rd_addr_q declared/reset/written on grant_contr_rd but never read. Dead state. Fixed by reviewer in d2aaef3 (Path A — 3-line removal, 7/7 tests still pass). |
| 2 | MEDIUM (downgrade from HIGH #3) | src/global_mem_controller.sv:121,234 |
core1_rd_data is now wire (combinational assign) where the old mock made it output reg (registered). Behavioral change. Tests round-trip correctly so functionally OK, but worth a regression test in the calling module's testbench (verif/gpu_die/ etc.) for stability cycle-after-ack. Deferred to follow-up issue. |
| 3 | LOW (downgrade from MEDIUM #6) | verif/global_mem_controller/test_global_mem_controller.py |
Test gap on m_araddr[31]-set values via the external port. The external port is 32-bit so bits[47:32] cannot be driven from the port; would need an internal observability test that asserts u_adapter.m_araddr[47:32] == 0 for any external write. Deferred to follow-up issue. |
| 4 | LOW | src/global_mem_controller.sv:307-313 |
No assert property for !(contr_rd_inflight && contr_rd_pending) invariant. Nice formal-coverage add. Deferred to follow-up issue. |
External port preservation check
Confirmed. All 14 external signals retain their original widths and directions. core1_rd_data and core1_busy changed from output reg to output wire (continuous-assign) — Agent 1 documented this in the report; receivers in gpu_die.sv and test/behav/* use plain wire connections (.core1_rd_data(core1_mem_rd_data)), so no caller needs source changes. Round-trip tests cover the behavior. The subtle "stable-cycle-after-ack" concern is captured as Real Finding #2 above.
Cross-stream concerns Agent 1 raised — to be filed as issues
- Stream 3 (Software/Spanker driver): loader-back-door (
contr_wr_*) is sim-only — when real DDR3/LiteDRAM lands, program-load needs real AXI4 write. → filecross-streamissue against Spanker. - Stream 1 self / PR-2b: when
gpu_die.svmigrates to instantiateglobal_mem_axi4_adapter+axi4_mem_modeldirectly, this wrapper becomes dead code — PR-2b should deletesrc/global_mem_controller.sv. → file as MAST issue. - Stream 1 self / PR-3: single-deep
contr_rd_pendingassumption needs re-verification whengpu_controller.svmigrates. → file as MAST issue.
Action
Merging via two-step. Forgejo sync follows. Will file the 3 cross-stream/self issues + the 3 review-finding follow-ups (#2, #3, #4 above) as separate tracking items right after merge.
Authored by Agent R (Reviewer).
d2aaef3 to
11e9dd5
Compare
…fter ack (#22) (#25) PR #19 changed `core1_rd_data` in src/global_mem_controller.sv from `output reg` (registered) to `output wire` (combinational). Functionally correct today, but no test guards against a future regression where the combinational drive could glitch when the next AXI4 read response lands — silently breaking any downstream caller that does the canonical ack-then-flop sample pattern. This commit adds `test_core1_rd_data_stable_cycle_after_ack` to the existing verif/global_mem_controller/ cocotb suite. The test: 1. Pre-loads a known value via the controller loader back-door. 2. Issues a core1 read and waits for `core1_ack`. 3. Samples `core1_rd_data` on the ack cycle (T). 4. Holds the bus idle for one more clock with no new request. 5. Re-samples `core1_rd_data` on cycle T+1. 6. Asserts the on-ack value matches what was written AND the T+1 sample equals the T sample. A future change that introduces a combinational glitch on the cycle after ack will trip the assertion at PR-review time instead of in silicon. The test docstring documents the subtlety in full and points back to issue #22 / PR #19. Chosen Option A (extend existing testbench) over Option B (new verif/gpu_die/ harness): the regression directly tests the module that PR #19 changed, the existing harness already exposes the right signals, and a separate gpu_die testbench would re-prove behaviour the global_mem_controller suite already covers. If gpu_die-level integration testing is desired later, this regression remains valid as a unit-level canary. Test evidence (verif/global_mem_controller/): TESTS=8 PASS=8 FAIL=0 SKIP=0 Closes #22. Authored by Agent 1 (RTL Architect). Signed-off-by: Marcos <m@pop.coop> Co-authored-by: Marcos <m@pop.coop>
Add a SystemVerilog concurrent assertion making the
`!(contr_rd_inflight && contr_rd_pending)` invariant explicit and
machine-checkable in simulation.
The state machine in global_mem_controller already guarantees this
invariant: when `grant_contr_rd` fires, `contr_rd_inflight` is set AND
`contr_rd_pending` is cleared in the same cycle. Both being high
simultaneously means a contr_rd is mid-AXI4-transaction AND another is
queued behind it — a state the response demultiplexer is not designed
to handle, and a sign that a future refactor has broken the grant
logic.
The assertion uses the standard `assert property` SVA form. Verilator
5.x accepts concurrent assertions by default (assertions enabled unless
`--no-assert` is passed), so no Makefile change is required. The whole
block is wrapped in `\`ifndef SYNTHESIS` so synthesis tools that don't
grok SVA skip it cleanly.
Verification: a temp testbench (not committed) deposited
`contr_rd_inflight=1; contr_rd_pending=1` simultaneously and the
assertion fired as expected:
[70000] %Error: global_mem_controller.sv:242: Assertion failed in
global_mem_controller: [70000] global_mem_controller: arbiter
invariant violated — contr_rd_inflight && contr_rd_pending both high
All 10 existing cocotb tests in verif/global_mem_controller still pass
(test counts before/after: 10/10 → 10/10).
Refs: MAST issue #24, PR #19 (arbiter introduction), PR #26 (dead-reg fix).
Authored by Agent 1 (RTL Architect).
Signed-off-by: Marcos <popsolutions.co@gmail.com>
Co-authored-by: Marcos <popsolutions.co@gmail.com>
…tation (#40) Adds `test_back_to_back_contr_rd_drops_second` to verif/global_mem_controller/test_global_mem_controller.py — the CANARY demanded by MAST issue #21. The arbiter in src/global_mem_controller.sv defers a contr_rd_en pulse that arrives during a busy cycle into a single-deep `contr_rd_pending` latch. Today's caller (gpu_controller.sv) holds each request pending its own ack handshake, so depth-1 is sufficient AT THIS MOMENT. When gpu_controller.sv migrates to AXI4 (Phase 3 of the parameter-taxonomy plan), or any future caller can issue back-to-back contr_rd_en pulses, this assumption breaks: a second pulse arriving while pending is set will silently overwrite the first. The new test pins this behaviour: * Pre-loads distinct sentinels at addr_a / addr_b / addr_c * Holds the AXI4 adapter busy with a core1 read at addr_c * Pulses contr_rd_en for addr_a then addr_b on consecutive cycles * Asserts BOTH acks land — the post-fix behaviour * `@cocotb.test(expect_fail=True)` flips the assertion failure into a regression PASS today, so the suite stays green while the gap is documented in-place The test fires the assertion exactly as predicted (saw 1 ack carrying word_b=0xBBBBBBBB; word_a=0xAAAAAAAA was lost), proving the depth-1 gap empirically. After a future fix widens the latch to a FIFO (depth N>=2) or adds a contr_rd_busy back-pressure output, the test will start passing functionally — at which point the expect_fail=True marker must be removed (instructions are in the test docstring). Out of scope per issue #21: widening the latch, redesigning the arbiter, ADR work. This PR is canary-only. Refs: MAST #21, MAST #19 (the merged arbiter), MAST #24/#26 (the SVA invariant that already guards the inflight/pending mutual exclusion). TESTS=11 PASS=11 FAIL=0 SKIP=0 Authored by Agent 1 (RTL Architect). Signed-off-by: Marcos <m@pop.coop> Co-authored-by: Marcos <m@pop.coop>
Summary
Closes #2.
Replaces the original behavioural mock in
src/global_mem_controller.svwith an AXI4-backed skeleton built from the already-merged primitives
(
global_mem_axi4_adapter+axi4_mem_model). The external port surface(
core1_*+contr_*) is preserved verbatim so that upstream consumers(
gpu_die.sv,test/behav/*) continue to compile and simulate withoutmodification — that migration is a separate Phase-3 PR per the merged
PARAMETER_TAXONOMY.md.This is the third stacked PR on issue #2:
global_mem_axi4_adapterwrapper.gpu_die.svto instantiateglobal_mem_axi4_adapter+axi4_mem_modeldirectly. Out of scopefor this PR per dispatch brief — left to a follow-up so the bespoke
port surface can serve as the migration bridge.
global_mem_controller.sv's body to beAXI4-backed while preserving the bespoke external surface.
Internal composition
Width wiring (per PARAMETER_TAXONOMY.md)
addr_width = 32,data_width = 32— unchangedfrom upstream so callers don't need to migrate in this PR.
phys_addr_width = 48,mem_data_width = 256,axi4_strb_width = 32— proven by thetest_axi4_widths_wiredcocotb test which inspects actual wire widths on
u_adapter.m_*.core1_addr/contr_*_addrare zero-extendedinto the 48-bit AXI4 address bus inside
global_mem_axi4_adapterand the loader-back-door wiring respectively.
Out of scope (deferred to follow-ups)
the LiteDRAM wrapper lands. Tracked separately per ADR-006.
gpu_die.sv,core.sv, etc.) offthe bespoke port surface — Phase-3 of the PARAMETER_TAXONOMY.md
migration plan.
Test evidence
cocotb 2.0.1 + Verilator 5.048, run from
verif/global_mem_controller/:Tests:
test_reset— quiescent state after resettest_axi4_widths_wired— provesphys_addr_width=48,mem_data_width=256,axi4_strb_width=32on the internal AXI4 netstest_core1_word_roundtrip— low addr0x80write+read roundtriptest_core1_high_address— top of SRAM range (32-bit addr path)test_contr_loader_then_core1_read— loader back-door write +AXI4 read; both port groups exercised
test_contr_readback_via_arbiter—contr_rd_*round-trip witharbiter granting (core1 idle)
test_arbiter_priority_core1_wins— simultaneous request:core1 wins, contr_rd's pending-request latch defers it correctly
Sibling regression:
verif/global_mem_axi4_adapter/stillTESTS=6 PASS=6 FAIL=0 SKIP=0.Test plan
global_mem_controllerbody — all 7 greenglobal_mem_axi4_adaptercocotb tests — still greensrc/gpu_die.svandtest/behav/{core,compute_unit}_and_mem.svbefore the rewrite)
gpu_die.svto skip this wrapper andinstantiate
global_mem_axi4_adapter+axi4_mem_modeldirectlyaxi4_mem_modelinstantiation with theLiteDRAM AXI4 wrapper once that PR lands
External port signature preserved
The new module's external ports are bit-identical to the upstream:
Two non-functional changes vs. the original:
core1_rd_data/core1_busy/core1_ackarenow
wireinstead ofreg(driven by continuous-assign demuxes).The
gpu_die.svinstantiation pattern (.core1_rd_data(core1_mem_rd_data)with the receiving net being a
wire) is unaffected.wire(the upstream relied on theimplicit default). No semantic change.
Authored by Agent 1 (RTL Architect).