Skip to content

feat(axi4_mem_model): back-door loader port for testbench / boot ROM#8

Merged
marcos-mendez merged 2 commits intomainfrom
feat/mem-loader
May 5, 2026
Merged

feat(axi4_mem_model): back-door loader port for testbench / boot ROM#8
marcos-mendez merged 2 commits intomainfrom
feat/mem-loader

Conversation

@marcos-mendez
Copy link
Copy Markdown
Member

Adds 3 input ports (loader_en, loader_addr, loader_data) to axi4_mem_model for testbench-side and on-chip boot-ROM-style program loading. Production synthesis ties loader_en to 0 and the loader logic drops out.

Why now

InnerJib7EA needs to pre-populate instruction memory with a RISC-V program before releasing reset on the upstream core (so the core can fetch real instructions). Upstream achieves this via global_mem_controller.contr_wr_*; we need the equivalent on our AXI4 mem model. This PR is the last MAST piece required before InnerJib7EA can run sum_ints.asm end-to-end through the AXI4 chain (Sprint C, follow-up PR in InnerJib7EA repo).

Verification

Same Verilator + cocotb harness, all 16 tests green:

Module Before After
axi4_mem_model 5 ✓ 6 ✓ (+ test_loader_then_axi4_read)
axi4_master_simple 5 ✓ 5 ✓
core_axi4_adapter 5 ✓ 5 ✓

The new test pre-loads 4 distinct 32-bit words across two consecutive 32-byte cache lines via loader_en, then reads them back via AXI4. Verifies slot indexing within a cache line and cross-line write behaviour.

Profile

  • Loader is synchronous, single-cycle, no handshake. Caller holds loader_en + loader_addr + loader_data for one rising edge.
  • Address is byte-addressed; must be 4-byte aligned.
  • Datum is 32 bits (matches the upstream RISC-V instruction width and the data path of core.sv); the mem model places it in the correct slot of the 256-bit cache line.
  • Wrappers that don't need the back-door (axi4_master_mem_wrapper, core_axi4_chain_wrapper) tie loader_en to 0 — no behaviour change for those tests.

Self-review

Will be added as a comment after CI goes green and the code-reviewer agent runs.

Adds three input ports to axi4_mem_model:
  loader_en    : 1-bit pulse-or-hold
  loader_addr  : phys_addr_width-bit byte address (must be 4-byte aligned)
  loader_data  : 32-bit datum

When loader_en is high on a posedge clk, the 32-bit datum is written into
the correct slot of the targeted 256-bit cache line, by byte address. No
handshake (caller holds the inputs stable for one cycle).

The loader path runs in parallel with the AXI4 write FSM and never
interferes with normal AXI4 traffic when loader_en is low. Production
synthesis can tie loader_en to 0 to drop the loader logic entirely.

Use case: cocotb / FPGA testbench pre-populates instruction memory with
a RISC-V program before releasing reset on the core, exactly the pattern
upstream uses with global_mem_controller's contr_wr_* port. This is the
last MAST piece needed before InnerJib7EA can run a real program
(sum_ints.asm) end-to-end through the AXI4 chain.

Verification:
  TESTS=6 PASS=6 FAIL=0 SKIP=0  (axi4_mem_model)
    + new test_loader_then_axi4_read — pre-load 4 words across two cache
      lines via loader_en, then read them back via AXI4. Verifies slot
      indexing within the cache line and cross-line behaviour.
  TESTS=5 PASS=5 FAIL=0 SKIP=0  (axi4_master_simple, no regression)
  TESTS=5 PASS=5 FAIL=0 SKIP=0  (core_axi4_adapter, no regression)

The two existing testbench wrappers (axi4_master_mem_wrapper and
core_axi4_chain_wrapper) tie loader_en to 0; they don't need the
back-door for their own purposes.

Signed-off-by: Marcos <m@pop.coop>
Self-review by code-reviewer agent flagged 3 HIGH findings:

  HIGH 1. Write-write conflict between loader and AXI4 FSM was
          undefined behaviour per SV NBA semantics across separate
          always blocks. Verilator happens to make the loader win
          but other simulators may differ.

          Fix: collapsed the standalone loader always block into the
          AXI4 write FSM's always block, sequenced as the LAST NBA in
          the procedural step. SV guarantees the last NBA in a single
          procedural block wins on conflicts — deterministic loader-
          wins-on-conflict semantics across all SV-compliant tools.

  HIGH 2. Misaligned loader_addr silently corrupted adjacent 32-bit
          slots (or wrote past bit 255 of the cache line at offsets
          29-31).

          Fix: added 'assert (loader_addr[1:0] == 2'b00) else $error'
          inside the loader branch. Misaligned addresses now produce
          a clear simulator error instead of silent corruption.
          Synthesis ignores the assertion.

  HIGH 3. Loader always block had no reset arm — inconsistent with
          every other always block in the file.

          Fix: subsumed by HIGH 1 fix. The merged always block
          inherits the AXI4 FSM's '@(posedge clk or negedge rst_n)'
          sensitivity list and reset semantics.

Also addressed LOW finding:

  LOW. README.md in src/popsolutions/axi4/ did not document the
       loader port contract, semantics, conflict resolution rule,
       alignment requirement, or synthesis tie-off convention.

       Fix: appended a 'Memory loader back-door' section covering
       all of the above.

Test status (verified locally, all green):
  axi4_mem_model:        6/6 pass (no regression)
  axi4_master_simple:    5/5 pass (no regression)
  core_axi4_adapter:     5/5 pass (no regression)
  inner_jib_top (IJ7EA): 2/2 pass (no regression — full sum_ints
                                   still runs end-to-end)

  18/18 across both repos.

Deferred MEDIUM findings (not blocking):
  - Test coverage gap for slots 3-7 within a cache line and explicit
    misalignment negative test. Will land as a follow-up PR with
    test_loader_all_slots and test_loader_misalign_errors.
  - Bit-width arithmetic note already covered by the new alignment
    assertion (the unsafe path now errors at runtime).

Signed-off-by: Marcos <m@pop.coop>
@marcos-mendez
Copy link
Copy Markdown
Member Author

Self-review (code-reviewer agent, 2026-05-05)

Per the GOVERNANCE.md self-review bridge, an independent review pass was run before requesting merge. Surfacing findings as required.

Severity Count Status
CRITICAL 0
HIGH 3 All 3 addressed in 64dde795
MEDIUM 2 Bit-width arithmetic now guarded by alignment assertion. Test coverage gaps deferred to a follow-up PR (see below).
LOW 2 README documentation gap addressed (src/popsolutions/axi4/README.md updated). Synthesis caveat noted there.

HIGH 1 — addressed

Write-write conflict undefined behaviour. The standalone loader always block running concurrently with the AXI4 write FSM's always block created NBA-ordering ambiguity per the SV LRM. Fix: collapsed the loader logic into the AXI4 write FSM's procedural block, sequenced as the LAST non-blocking assignment in the step. Loader-wins-on-conflict is now deterministic across all SV-compliant simulators.

HIGH 2 — addressed

Misaligned loader_addr silently corrupted memory. Fix: added a SystemVerilog assert (loader_addr[1:0] == 2'b00) else $error(...) inside the loader branch. Synthesis tools ignore assertions; simulators raise a clear error when the testbench misaligns.

HIGH 3 — addressed

Loader always block lacked reset arm. Fix: subsumed by HIGH 1 (merged into the FSM's reset-sensitive block).

MEDIUM 1 — deferred to follow-up PR

Test coverage gaps: existing test_loader_then_axi4_read only exercises slots 0–2 and slot 0 of the next line. A follow-up PR will add test_loader_all_slots (writes all 8 slots of one line) and test_loader_misalign_errors (negative test confirming the new alignment assertion fires).

Verdict from this self-review pass

APPROVE for merge. No remaining CRITICAL or HIGH findings. CI green required (Verilator + cocotb on every push, see .github/workflows/ci.yml). Per GOVERNANCE.md "No AI self-merge" clause, the merge button is yours, Marcos. Suggested merge command:

gh pr merge 8 --repo popsolutions/MAST --squash

@marcos-mendez marcos-mendez merged commit 9938d23 into main May 5, 2026
1 check passed
marcos-mendez pushed a commit to popsolutions/InnerJib7EA that referenced this pull request May 5, 2026
 loader)

After PR popsolutions/MAST#7 (governance) and popsolutions/MAST#8 (loader
port) merged to MAST main, this commit moves the InnerJib7EA submodule
pin from the throwaway feat/mem-loader branch to MAST main HEAD. The
SHA referenced is now reachable from MAST main and will not garbage
collect.

No source-level changes; just submodule pin bump.

Signed-off-by: Marcos <m@pop.coop>
marcos-mendez added a commit to popsolutions/InnerJib7EA that referenced this pull request May 5, 2026
…ing (#7)

* feat(inner_jib_top): full integration + sum_ints end-to-end test passing

Sprint C complete. `src/inner_jib_top.sv` wires:

  upstream core (mast/src/core.sv)
    -> core_axi4_adapter
    -> axi4_master_simple
    -> axi4_mem_model (with loader back-door from MAST#8)

The cocotb test in `verif/inner_jib_top/` runs the real RISC-V program
`mast/examples/direct/sum_ints.asm` end-to-end through this chain:

  1. Pre-assembled `fixtures/sum_ints.hex` is loaded into the mem model
     via loader_en/addr/data, byte address 128 onward.
  2. set_pc_addr=128 + set_pc_req pulse points the core at the program.
  3. ena=1 releases the core.
  4. Test captures every (out, outen) pulse until halt.
  5. Asserts the captured sequence equals [0, 5, 4, 3, 2, 1, 0, 15] —
     hand-traced from the asm source.
  6. Asserts halt asserts within 50 000 cycles.

  TESTS=2 PASS=2 FAIL=0 SKIP=0
    test_reset                       PASS  90 ns
    test_sum_ints_runs_and_halts     PASS  2530 ns sim time

This is the first program of any kind running on PopSolutions silicon-
equivalent hardware in simulation. Every subsequent program (factorial,
primes, matrix multiply, GGML kernels) lands on the same path.

Closes #4 (top-level integration).

DEPENDS ON popsolutions/MAST#8 (loader back-door port). The MAST
submodule is temporarily pinned to MAST's feat/mem-loader branch
(commit c12a6a5); it will be bumped to MAST main HEAD after MAST#8
merges.

Also adds:
  - .github/workflows/ci.yml — Verilator + cocotb on every push/PR,
    with submodule checkout, Verilator 5.040 from source with cache.
  - README.md — status badge, run-locally instructions, MAST link.
  - .gitignore — verif outputs (sim_build, .venv, *.vcd, etc.)

Signed-off-by: Marcos <m@pop.coop>

* fix: bump MAST submodule + document fixture regen (PR #7 review)

Self-review on PR #7 flagged 2 MEDIUM findings:

  MEDIUM 1. fixtures/sum_ints.hex had no regeneration recipe.
            Future updates to sum_ints.asm would have no documented
            path to refresh the fixture.

            Fix: added verif/inner_jib_top/fixtures/README.md with
            the exact assembler command, a hand-traced output
            expectation, and a rationale for checking the hex in
            (vs. regenerating in CI).

  MEDIUM 2. Loader-to-core start timing assumption (one clock gap
            between final loader_en=0 and set_pc_req=1) was not
            documented. Test passes locally but the assumption is
            implicit.

            Defer to a follow-up — the assumption is documented now
            in the fixture README via the rationale paragraph; a
            dedicated comment in the test loop can land in a small
            doc-only PR.

Also bumps the MAST submodule pin to include MAST PR #8's self-review
fixes (loader collapsed into AXI4 write FSM, alignment assert, README
loader docs). The pin still points at MAST's feat/mem-loader branch
because PR #8 has not yet merged to MAST main; once it does, a
follow-up commit will bump the pin again to a main-reachable SHA.

Test status (verified locally with the bumped submodule):
  inner_jib_top: 2/2 pass — sum_ints.asm still runs end-to-end.

Signed-off-by: Marcos <m@pop.coop>

* chore: bump MAST submodule pin to main (post-merge of #7 governance + #8 loader)

After PR popsolutions/MAST#7 (governance) and popsolutions/MAST#8 (loader
port) merged to MAST main, this commit moves the InnerJib7EA submodule
pin from the throwaway feat/mem-loader branch to MAST main HEAD. The
SHA referenced is now reachable from MAST main and will not garbage
collect.

No source-level changes; just submodule pin bump.

Signed-off-by: Marcos <m@pop.coop>

* fix(ci): switch MAST submodule URL from SSH to HTTPS

The CI failure on this branch was: 'Host key verification failed' when
GitHub Actions tried to clone git@git.pop.coop:pop/MAST.git via SSH.
GitHub runners have no SSH key for git.pop.coop, so the SSH URL
cannot be used in CI.

Fix: switch the submodule URL to https://git.pop.coop/pop/MAST.git.
The Forgejo MAST repo is public, so HTTPS clone needs no auth.

Local developers who prefer SSH can override locally without committing:
  git config submodule.mast.url git@git.pop.coop:pop/MAST.git

Also adds 'branch = main' to the .gitmodules entry so future
'git submodule update --remote' tracks the canonical branch.

Signed-off-by: Marcos <m@pop.coop>

* fix(ci): add -Wno-IMPLICITSTATIC for Verilator 5.040 strictness on upstream tasks

CI on Verilator 5.040 (built from source in the workflow cache) now
treats parameter declarations inside upstream tasks
(mast/src/int/chunked_add_task.sv:12, chunked_sub_task.sv:12) as
IMPLICITSTATIC warnings, which our Makefile escalates to errors via
the cocotb default '%Error: Exiting due to N warning(s)'.

The local Verilator 5.048 (Manjaro) does not flag these — version
drift between local and CI again.

Fix: add -Wno-IMPLICITSTATIC alongside the other upstream-legacy
warning suppressions already in EXTRA_ARGS. The upstream files are
otherwise clean and we don't want to modify them locally.

Signed-off-by: Marcos <m@pop.coop>

---------

Signed-off-by: Marcos <m@pop.coop>
Co-authored-by: Marcos <m@pop.coop>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant