# Multi-Core Toy CPU – Microarchitecture Specification

Dual-core, unified L1 per core, shared L2, LL/SC atomics, debug instrumentation

## 1. Top-Level Architecture



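Each core has a unified L1 cache that arbitrates between instruction-fetch (IF) and data requests. A round-robin (RR) arbiter multiplexes the requests from both cores' L1 caches onto a single shared L2.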
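The RR arbitration can be sketched as a small behavioral model. This is a minimal sketch, assuming that priority rotates to the requester just after the most recent winner; `RoundRobinArbiter` and `grant` are hypothetical names, and the IF/DATA arbitration inside each L1 is not modeled because the spec does not state its policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RoundRobinArbiter:
    """Grants at most one requester per cycle, rotating priority after each grant."""

    n: int          # number of requesters (2: one L1 per core)
    last: int = -1  # index granted most recently; -1 means no grant yet

    def grant(self, requests: List[bool]) -> Optional[int]:
        # Start searching just after the previous winner so no core starves.
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx
                return idx
        return None  # no L1 is requesting the L2 this cycle


arb = RoundRobinArbiter(n=2)
print([arb.grant([True, True]) for _ in range(4)])  # [0, 1, 0, 1]
```

Under continuous contention the grants alternate, so each core reaches the L2 at most every other cycle; a core that requests alone is granted immediately.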

Memory map (region base addresses):

| Address     | Region            | Notes               |
|-------------|-------------------|---------------------|
| 0x0000_0000 | Core0 code        | Reset PC Core0      |
| 0x0000_1000 | Core1 code        | Reset PC Core1      |
| 0x0000_2000 | Shared counter(s) | Used by LL/SC tests |
| 0x0000_3000 | Constants         | Loop bounds, etc.   |
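A test bench typically needs to map an address back to its region, for example to check that LL/SC traffic stays inside the shared-counter page. A minimal sketch, assuming every region spans 4 KiB up to the next base (the table gives only base addresses, so `REGION_SIZE` is an assumption); `region_of` is a hypothetical helper.

```python
# Region bases come from the memory map; REGION_SIZE is an assumption,
# since the table lists only base addresses.
REGIONS = [
    (0x0000_0000, "Core0 code"),
    (0x0000_1000, "Core1 code"),
    (0x0000_2000, "Shared counter(s)"),
    (0x0000_3000, "Constants"),
]
REGION_SIZE = 0x1000  # hypothetical: each region assumed to span 4 KiB
RESET_PC = {0: 0x0000_0000, 1: 0x0000_1000}  # per-core reset PCs from the table


def region_of(addr: int) -> str:
    """Return the name of the region containing addr, or raise if unmapped."""
    for base, name in REGIONS:
        if base <= addr < base + REGION_SIZE:
            return name
    raise ValueError(f"unmapped address {addr:#010x}")


print(region_of(RESET_PC[1]))  # Core1 code
print(region_of(0x2004))       # Shared counter(s)
```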

## 2. Pipeline Microarchitecture



A taken branch flushes IF/ID, discarding the wrong-path instructions fetched after the branch.
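The flush can be sketched as an operation on pipeline slots. This is a minimal sketch, assuming a classic five-stage layout (IF, ID, EX, MEM, WB) with branches resolved in EX, so the two younger slots hold wrong-path instructions when the branch is taken; the stage list, `resolve_branch`, and `BUBBLE` are illustrative assumptions, since the spec does not enumerate the stages.

```python
from typing import List, Optional, Tuple

# Hypothetical 5-stage layout; index 0 is the youngest instruction.
IF, ID, EX, MEM, WB = range(5)
BUBBLE = None  # an empty slot (no instruction)


def resolve_branch(pipe: List[Optional[str]], taken: bool,
                   target: int, next_pc: int) -> Tuple[List[Optional[str]], int]:
    """Resolve the branch assumed to sit in EX; return new slots and the next fetch PC."""
    if not taken:
        return pipe, next_pc       # fall through: the fetched instructions are correct
    flushed = list(pipe)           # copy so the caller's list is not mutated
    flushed[IF] = BUBBLE           # squash the wrong-path fetch
    flushed[ID] = BUBBLE           # squash the wrong-path decode
    return flushed, target         # redirect fetch to the branch target


pipe = ["ADD", "SUB", "BRZ", "ADD", "NOP"]  # slots IF..WB
print(resolve_branch(pipe, taken=True, target=0x40, next_pc=0x10))
# ([None, None, 'BRZ', 'ADD', 'NOP'], 64)
```

Under this assumption every taken branch costs two bubble cycles; a not-taken branch costs nothing because fetch was already sequential.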

## 3. Cache Timing Examples

### 3.1 L1 Hit (1-cycle turnaround)



### 3.2 L1 Miss → L2 Fill
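The spec fixes only the 1-cycle L1 hit turnaround; the L1 geometry and the L2 fill latency are not given. The sketch below models a direct-mapped L1 with hypothetical parameters (`LINE_BYTES = 16`, `NUM_SETS = 64`, `L2_FILL_CYCLES = 8`) to show how a hit and a miss-then-fill differ in latency. Associativity, replacement, and the latency values are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict

# All parameters are hypothetical; the spec fixes only the 1-cycle L1 hit.
LINE_BYTES = 16
NUM_SETS = 64
L1_HIT_CYCLES = 1
L2_FILL_CYCLES = 8  # assumed L2 access plus line transfer


@dataclass
class DirectMappedL1:
    tags: Dict[int, int] = field(default_factory=dict)  # set index -> resident tag
    misses: int = 0

    def access(self, addr: int) -> int:
        """Return the latency in cycles for one access, filling the line on a miss."""
        line = addr // LINE_BYTES
        index, tag = line % NUM_SETS, line // NUM_SETS
        if self.tags.get(index) == tag:
            return L1_HIT_CYCLES
        self.misses += 1          # would feed the L1 miss-count CSR
        self.tags[index] = tag    # fill the line from L2, evicting any previous tag
        return L2_FILL_CYCLES + L1_HIT_CYCLES


l1 = DirectMappedL1()
print([l1.access(a) for a in (0x2000, 0x2004, 0x2010)])  # [9, 1, 9]
```

The second access hits because `0x2004` lies in the line just filled for `0x2000`; `0x2010` starts a new line and misses.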



## 4. LL/SC Retry Flow
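The usual LL/SC retry flow is: LL the counter, compute the new value, SC it back, and loop if the SC fails. A minimal Python model follows, assuming one reservation per core tracked at a hypothetical 16-byte line granularity; `Memory`, `atomic_increment`, and the `between` test hook are illustrative names, not part of the spec.

```python
from typing import Callable, Dict, Optional, Tuple

LINE_BYTES = 16  # hypothetical reservation granule (one L1 line)


class Memory:
    """Shared memory with one LL reservation (a line number) per core."""

    def __init__(self, num_cores: int = 2) -> None:
        self.words: Dict[int, int] = {}
        self.resv: Dict[int, Optional[int]] = {c: None for c in range(num_cores)}

    def ll(self, core: int, addr: int) -> int:
        self.resv[core] = addr // LINE_BYTES   # LL sets the reservation
        return self.words.get(addr, 0)

    def store(self, core: int, addr: int, value: int) -> None:
        self.words[addr] = value
        for other, line in self.resv.items():
            if other != core and line == addr // LINE_BYTES:
                self.resv[other] = None        # a remote write clears the reservation

    def sc(self, core: int, addr: int, value: int) -> bool:
        ok = self.resv[core] == addr // LINE_BYTES
        self.resv[core] = None                 # the reservation is consumed either way
        if ok:
            self.store(core, addr, value)
        return ok


def atomic_increment(mem: Memory, core: int, addr: int,
                     between: Optional[Callable[[], None]] = None) -> Tuple[int, int]:
    """LL / ADD / SC, retrying until SC succeeds. Returns (old value, SC failures)."""
    fails = 0
    while True:
        old = mem.ll(core, addr)
        if between is not None:
            between()                          # test hook: lets the other core act here
        if mem.sc(core, addr, old + 1):
            return old, fails
        fails += 1                             # would bump this core's sc_fail counter


mem = Memory()
injected = []


def core1_store_once() -> None:
    if not injected:                           # interfere on the first attempt only
        injected.append(True)
        mem.store(1, 0x2000, 41)


print(atomic_increment(mem, 0, 0x2000, core1_store_once))  # (41, 1)
print(mem.words[0x2000])                                    # 42
```

The first attempt fails because core 1 writes the line between core 0's LL and SC; the retry re-reads the new value, so the increment is not lost.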



## 5. False Sharing (Same Cache Line)



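A write by the other core anywhere in the reserved cache line clears the reservation, so the subsequent SC fails even though the two cores access different words.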
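False sharing can be reproduced by replaying an event trace with line-granular reservations. This is a minimal sketch, assuming a 16-byte line as the reservation granule; `replay`, the `Event` tuple, and the `"ST"` op for plain stores are hypothetical. Rerunning the same trace with a one-word granule shows that the failure comes from the shared line, not from a true conflict.

```python
from typing import Dict, List, Tuple

Event = Tuple[int, str, int]  # (core, op, byte address); op is "LL", "SC" or "ST"


def replay(trace: List[Event], granule: int) -> List[Tuple[int, int, bool]]:
    """Replay memory events with reservations tracked per `granule` bytes.

    Returns (core, addr, success) for every SC, in trace order.
    """
    resv: Dict[int, int] = {}  # core -> reserved granule number
    results = []
    for core, op, addr in trace:
        unit = addr // granule
        if op == "LL":
            resv[core] = unit
            continue
        ok = op == "ST" or resv.get(core) == unit
        if op == "SC":
            resv.pop(core, None)           # SC consumes the reservation
            results.append((core, addr, ok))
        if ok:                             # a performed write kills matching remote reservations
            for other in [c for c, u in resv.items() if c != core and u == unit]:
                del resv[other]
    return results


# Core0 and core1 touch different words (0x2000, 0x2004) of the same line.
trace = [(0, "LL", 0x2000), (1, "LL", 0x2004), (1, "SC", 0x2004), (0, "SC", 0x2000)]
print([ok for *_, ok in replay(trace, granule=16)])  # [True, False]: false sharing
print([ok for *_, ok in replay(trace, granule=4)])   # [True, True]: word granule
```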

## 6. Instruction Encoding

Each instruction is a 32-bit word with fixed fields: `[31:28]` opcode, `[27:24]` rd, `[23:20]` rs1, `[19:16]` rs2, `[15:0]` imm.

| Opcode | Mnemonic | Description                                                      |
|--------|----------|------------------------------------------------------------------|
| 0000   | NOP      | No operation                                                     |
| 0001   | LL       | Load-Linked (loads and sets a reservation)                       |
| 0010   | SC       | Store-Conditional (writes only if the reservation is still held) |
| 0011   | ADD      | Addition (rd = rs1 + imm/rs2)                                    |
| 0100   | SUB      | Subtraction (rd = rs1 - imm/rs2)                                 |
| 0101   | BRZ      | Branch if zero (taken when rs1 == 0)                             |
| 0110   | JMP      | Unconditional jump                                               |
| 1111   | HALT     | Halt the core when the instruction retires                       |
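The field layout and opcode table translate directly into an encoder/decoder pair, which is handy for hand-assembling test programs. A minimal sketch; `encode`, `decode`, and the `"ILLEGAL"` label for unlisted opcodes are hypothetical. The immediate is masked to its raw 16 bits, so out-of-range values are silently truncated, and sign extension is left out because the spec does not define it.

```python
OPCODES = {"NOP": 0b0000, "LL": 0b0001, "SC": 0b0010, "ADD": 0b0011,
           "SUB": 0b0100, "BRZ": 0b0101, "JMP": 0b0110, "HALT": 0b1111}
MNEMONICS = {code: name for name, code in OPCODES.items()}


def encode(op: str, rd: int = 0, rs1: int = 0, rs2: int = 0, imm: int = 0) -> int:
    """Pack fields as [31:28] opcode, [27:24] rd, [23:20] rs1, [19:16] rs2, [15:0] imm."""
    for name, reg in (("rd", rd), ("rs1", rs1), ("rs2", rs2)):
        if not 0 <= reg <= 0xF:
            raise ValueError(f"{name}={reg} does not fit in 4 bits")
    return (OPCODES[op] << 28) | (rd << 24) | (rs1 << 20) | (rs2 << 16) | (imm & 0xFFFF)


def decode(word: int) -> dict:
    """Split a 32-bit word into its fields; unlisted opcodes decode as ILLEGAL."""
    return {
        "op": MNEMONICS.get((word >> 28) & 0xF, "ILLEGAL"),
        "rd": (word >> 24) & 0xF,
        "rs1": (word >> 20) & 0xF,
        "rs2": (word >> 16) & 0xF,
        "imm": word & 0xFFFF,  # raw 16 bits; sign extension is not specified
    }


word = encode("ADD", rd=1, rs1=1, imm=1)
print(f"{word:#010x}", decode(word)["op"])  # 0x31100001 ADD
```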

## 7. CSR Map

| Addr  | Name        | Bits   | Description             | Reset      |
|-------|-------------|--------|-------------------------|------------|
| 0x000 | mstatus     | [31:0] | Global status / interrupt enable | 0x00000000 |
| 0x004 | mtvec       | [31:0] | Trap vector base        | 0x00000000 |
| 0x008 | mepc        | [31:0] | Exception PC            | 0x00000000 |
| 0x00C | mcause      | [31:0] | Trap cause              | 0x00000000 |
| 0x010 | mtval       | [31:0] | Faulting address or value | 0x00000000 |
| 0x100 | cycle       | [63:0] | Cycle counter           | 0x00000000 |
| 0x104 | instret     | [63:0] | Instructions retired    | 0x00000000 |
| 0x110 | sc_succ     | [31:0] | SC success count        | 0x00000000 |
| 0x114 | sc_fail     | [31:0] | SC failure count        | 0x00000000 |
| 0x118 | l1_miss     | [31:0] | L1 miss count           | 0x00000000 |
| 0x11C | bpu_mispred | [31:0] | Branch mispredict count | 0x00000000 |
| 0x120 | tlb_miss    | [31:0] | TLB miss count          | 0x00000000 |
| 0x200 | last_pc     | [31:0] | Last retired PC         | 0x00000000 |
| 0x204 | last_inst   | [31:0] | Last retired instruction | 0x00000000 |
| 0x210 | watchdog    | [31:0] | Watchdog counter        | 0x00000000 |
| 0x214 | halt_status | [1:0]  | Per-core HALT bits      | 0x00000000 |
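A test harness usually polls `halt_status` to decide when to stop and then reads the SC counters. A minimal sketch, assuming bit *i* of `halt_status` is core *i*'s HALT bit and that a CSR snapshot is available as an address-to-value dict; `halted_cores`, `sc_fail_rate`, and the snapshot values are hypothetical.

```python
from typing import Dict, List

NUM_CORES = 2
# CSR addresses taken from the CSR map.
SC_SUCC, SC_FAIL, HALT_STATUS = 0x110, 0x114, 0x214


def halted_cores(csrs: Dict[int, int]) -> List[int]:
    """Cores whose HALT bit is set; assumes bit i of halt_status belongs to core i."""
    status = csrs.get(HALT_STATUS, 0)
    return [core for core in range(NUM_CORES) if (status >> core) & 1]


def sc_fail_rate(csrs: Dict[int, int]) -> float:
    """Fraction of SC attempts that failed, from one core's sc_succ/sc_fail counters."""
    succ, fail = csrs.get(SC_SUCC, 0), csrs.get(SC_FAIL, 0)
    total = succ + fail
    return fail / total if total else 0.0


snapshot = {HALT_STATUS: 0b10, SC_SUCC: 30, SC_FAIL: 10}  # made-up example values
print(halted_cores(snapshot), sc_fail_rate(snapshot))  # [1] 0.25
```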

## 8. Debugging & Verification

- Per-core SC success/failure counters and per-cycle SC event pulses.
- CSV logging to `sc_trace.csv` with columns `cycle`, `core`, `event`.
- Watchdog with a diagnostic dump (PCs, retire trace, counters, memory window).
- Assertions: instruction fetch never issues writes or atomics; counters stay within their bounds.
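The CSV trace and the per-core SC counters record the same events, so a post-run check can cross-validate them. A minimal sketch, assuming `sc_trace.csv` has a header row `cycle,core,event` and uses the hypothetical event names `sc_succ` and `sc_fail`; `tally_sc_events` and `matches_counters` are illustrative helpers.

```python
import csv
import io
from collections import Counter
from typing import Iterable


def tally_sc_events(lines: Iterable[str]) -> Counter:
    """Count (core, event) pairs in sc_trace.csv, checking cycles never go backwards.

    Assumes a header row "cycle,core,event"; the event names are hypothetical.
    """
    counts: Counter = Counter()
    last_cycle = -1
    for row in csv.DictReader(lines):
        cycle = int(row["cycle"])
        if cycle < last_cycle:
            raise ValueError(f"cycle went backwards at {cycle}")
        last_cycle = cycle
        counts[(int(row["core"]), row["event"].strip())] += 1
    return counts


def matches_counters(counts: Counter, core: int, sc_succ: int, sc_fail: int) -> bool:
    """Cross-check the trace against one core's sc_succ / sc_fail CSR values."""
    return counts[(core, "sc_succ")] == sc_succ and counts[(core, "sc_fail")] == sc_fail


trace = io.StringIO("cycle,core,event\n12,0,sc_fail\n15,1,sc_succ\n19,0,sc_succ\n")
counts = tally_sc_events(trace)
print(matches_counters(counts, core=0, sc_succ=1, sc_fail=1))  # True
```

In a real run the `StringIO` would be replaced by the opened log file; any mismatch points at a dropped event pulse or a miscounting CSR.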

## 9. Roadmap

Exceptions & interrupts; Multiply/Divide unit; 2-bit BHT; unified TLB; N-core scaling; FPGA prototyping.