add fence signal to CPU bus #800

Merged
merged 10 commits into from
Feb 9, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -30,6 +30,7 @@ mimpid = 0x01040312 -> Version 01.04.03.12 -> v1.4.3.12

| Date | Version | Comment | Link |
|:----:|:-------:|:--------|:----:|
| 09.02.2024 | 1.9.4.7 | :warning: integrate fence signal into CPU bus, remove top entity's fence signals | [#800](https://github.com/stnolting/neorv32/pull/800) |
| 09.02.2024 | 1.9.4.6 | :sparkles: add configurable XIP cache | [#799](https://github.com/stnolting/neorv32/pull/799) |
| 09.02.2024 | 1.9.4.5 | :bug: close further illegal compressed instruction encoding loopholes | [#797](https://github.com/stnolting/neorv32/pull/797) |
| 04.02.2024 | 1.9.4.4 | :bug: fix minor bug: CPU instruction bus privilege signal did not remain stable during the entire request | [#792](https://github.com/stnolting/neorv32/pull/792) |
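
The `mimpid` mapping quoted in the hunk header (`0x01040312` -> `01.04.03.12` -> `v1.4.3.12`) reads each byte of the register as two decimal digits. The following is a minimal C sketch of that decoding. It assumes this two-digits-per-byte layout holds for all four fields; the function names are hypothetical and are not part of the NEORV32 software framework.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Convert one byte holding two decimal digits, e.g. 0x12 -> 12.
 * Assumption: both nibbles are in the range 0..9. */
static unsigned mimpid_byte_to_uint(uint8_t b) {
  unsigned hi = (unsigned)(b >> 4);
  unsigned lo = (unsigned)(b & 0x0Fu);
  assert(hi < 10u && lo < 10u);
  return hi * 10u + lo;
}

/* Hypothetical helper: format an mimpid value as "a.b.c.d",
 * most significant byte first. Returns snprintf's result
 * (the length of the formatted string, excluding the NUL). */
static int mimpid_to_string(uint32_t mimpid, char *buf, size_t len) {
  return snprintf(buf, len, "%u.%u.%u.%u",
                  mimpid_byte_to_uint((uint8_t)(mimpid >> 24)),
                  mimpid_byte_to_uint((uint8_t)(mimpid >> 16)),
                  mimpid_byte_to_uint((uint8_t)(mimpid >> 8)),
                  mimpid_byte_to_uint((uint8_t)mimpid));
}
```

For example, `mimpid_to_string(0x01040312, buf, sizeof buf)` yields `"1.4.3.12"`. Under the same assumption, version 1.9.4.7 from this PR would correspond to `mimpid = 0x01090407`.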
182 changes: 88 additions & 94 deletions docs/datasheet/cpu.adoc
@@ -11,8 +11,9 @@ RISC-V _User_ and _Privileged Architecture_ specifications.

**Section Structure**

* <<_architecture>>, <<_full_virtualization>> and <<_risc_v_compatibility>>
* <<_risc_v_compatibility>>
* <<_cpu_top_entity_signals>> and <<_cpu_top_entity_generics>>
* <<_architecture>> and <<_full_virtualization>>
* <<_instruction_sets_and_extensions>> and <<_custom_functions_unit_cfu>>
* <<_control_and_status_registers_csrs>>
* <<_traps_exceptions_and_interrupts>>
@@ -56,6 +57,74 @@ be emulated. The NEORV32 <<_core_libraries>> provide an emulation wrapper for th
instructions that is based on LR/SC pairs. A demo/program can be found in `sw/example/atomic_test`.


<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Signals

The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. Except for the
bus interface records (`bus_req_t` and `bus_rsp_t`), all signals are of type _std_ulogic_ (width 1) or _std_ulogic_vector_.
The "Dir" column shows the signal direction as seen from the CPU.

.NEORV32 CPU Signal List
[cols="<3,^3,^1,<5"]
[options="header", grid="rows"]
|=======================
| Signal | Width/Type | Dir | Description
4+^| **Global Signals**
| `clk_i` | 1 | in | Global clock line; all registers trigger on the rising edge. This clock can be switched off during <<_sleep_mode>>
| `clk_aux_i` | 1 | in | Always-on clock, used to keep the sleep control active when `clk_i` is switched off
| `rstn_i` | 1 | in | Global reset, low-active
| `sleep_o` | 1 | out | CPU is in <<_sleep_mode>> when set
| `debug_o` | 1 | out | CPU is in <<_cpu_debug_mode,debug mode>> when set
4+^| **Interrupts (<<_traps_exceptions_and_interrupts>>)**
| `msi_i` | 1 | in | RISC-V machine software interrupt
| `mei_i` | 1 | in | RISC-V machine external interrupt
| `mti_i` | 1 | in | RISC-V machine timer interrupt
| `firq_i` | 16 | in | Custom fast interrupt request signals
| `dbi_i` | 1 | in | Request CPU to halt and enter debug mode (RISC-V <<_on_chip_debugger_ocd>>)
4+^| **Instruction <<_bus_interface>>**
| `ibus_req_o` | `bus_req_t` | out | Instruction fetch bus request
| `ibus_rsp_i` | `bus_rsp_t` | in | Instruction fetch bus response
4+^| **Data <<_bus_interface>>**
| `dbus_req_o` | `bus_req_t` | out | Data access (load/store) bus request
| `dbus_rsp_i` | `bus_rsp_t` | in | Data access (load/store) bus response
|=======================

.Bus Interface Protocol
[TIP]
See section <<_bus_interface>> for the instruction fetch and data access interface protocol and the
according interface types (`bus_req_t` and `bus_rsp_t`).


<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Generics

Most of the CPU configuration generics are a subset of the actual Processor configuration generics
(see section <<_processor_top_entity_generics>>) and are not listed here. However, the CPU provides
some _specific_ generics that are used to configure the CPU for the NEORV32 processor setup. These generics
are assigned by the processor setup only and are not available for user-defined configuration.
The specific generics are listed below.

.Table Abbreviations
[NOTE]
The generic type "suv(x:y)" defines a `std_ulogic_vector(x downto y)`.

.NEORV32 CPU-Exclusive Generic List
[cols="<4,^2,<8"]
[options="header",grid="rows"]
|=======================
| Name | Type | Description
| `CPU_BOOT_ADDR` | suv(31:0) | CPU reset address. See section <<_address_space>>.
| `CPU_DEBUG_PARK_ADDR` | suv(31:0) | "Park loop" entry address for the <<_on_chip_debugger_ocd>>, has to be 4-byte aligned.
| `CPU_DEBUG_EXC_ADDR` | suv(31:0) | "Exception" entry address for the <<_on_chip_debugger_ocd>>, has to be 4-byte aligned.
| `CPU_EXTENSION_RISCV_Sdext` | boolean | Implement RISC-V-compatible "debug" CPU operation mode required for the <<_on_chip_debugger_ocd>>.
| `CPU_EXTENSION_RISCV_Sdtrig` | boolean | Implement RISC-V-compatible trigger module. See section <<_on_chip_debugger_ocd>>.
|=======================


<<<
// ####################################################################################################################
:sectnums:
@@ -252,15 +321,16 @@ is driven by the _accessed_ device or bus system (i.e. a processor-internal memo
[cols="^1,^1,<6"]
[options="header",grid="rows"]
|=======================
| Signal | Width | Description
| `addr` | 32 | Access address (byte addressing)
| `data` | 32 | Write data
| `ben` | 4 | Byte-enable for each byte in `data`
| `stb` | 1 | Request trigger ("strobe", single-shot)
| `rw` | 1 | Access direction (`0` = read, `1` = write)
| `src` | 1 | Access source (`0` = load/store, `1` = instruction fetch)
| `priv` | 1 | Set if privileged (M-mode) access
| `rvso` | 1 | Set if current access is a reservation-set operation (atomic `lr` or `sc` instruction)
| `fence` | 1 | Data/instruction fence operation; valid without `stb` being set
|=======================
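
To illustrate how a downstream module can react to the new `fence` bit, here is a minimal C model. It mirrors the `bus_req_t` fields from the table as a plain struct and drives a hypothetical direct-mapped cache with one 32-bit word per block. The point it shows is that `fence` is evaluated on its own, independent of `stb`. The struct, the cache model and its size are illustrative assumptions, not the NEORV32 RTL, and write-back of dirty data is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative C mirror of the bus_req_t fields listed in the table. */
typedef struct {
  uint32_t addr;  /* byte address */
  uint32_t data;  /* write data */
  uint8_t  ben;   /* byte enables, lower 4 bits used */
  bool     stb;   /* single-shot request strobe */
  bool     rw;    /* false = read, true = write */
  bool     src;   /* access source; the RTL in this PR drives 1 for fetch, 0 for load/store */
  bool     priv;  /* privileged (M-mode) access */
  bool     rvso;  /* reservation-set operation (lr/sc) */
  bool     fence; /* fence(.i) request, valid without stb */
} bus_req_t;

#define MODEL_NUM_BLOCKS 4u /* hypothetical cache size, one word per block */

typedef struct {
  bool     valid[MODEL_NUM_BLOCKS];
  uint32_t tag[MODEL_NUM_BLOCKS];
  unsigned fence_count;
} cache_model_t;

/* Evaluate the request presented in one clock cycle. */
static void cache_model_cycle(cache_model_t *c, const bus_req_t *req) {
  assert(c != 0 && req != 0);
  if (req->fence) { /* checked regardless of stb */
    for (unsigned i = 0; i < MODEL_NUM_BLOCKS; i++) {
      c->valid[i] = false; /* invalidate: later accesses reload from main memory */
    }
    c->fence_count++;
  }
  if (req->stb && !req->rw) { /* simplified read: allocate the block */
    uint32_t word = req->addr >> 2;
    uint32_t idx  = word % MODEL_NUM_BLOCKS;
    c->valid[idx] = true;
    c->tag[idx]   = word / MODEL_NUM_BLOCKS;
  }
}
```

A read of address `0x8` fills block 2; a following cycle with only `fence` set (and `stb` low) clears it again.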

.Bus Interface - Response Bus (`bus_rsp_t`)
@@ -336,76 +406,6 @@ image::bus_interface_atomic.png[700]
The "normal" load data mechanism is used to return success/failure of the `sc.w` instruction to the CPU (via the LSB of `rsp.data`).


<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Signals

The following table shows all interface signals of the CPU top entity `rtl/core/neorv32_cpu.vhd`. Except for the
bus interface records (`bus_req_t` and `bus_rsp_t`), all signals are of type _std_ulogic_ (width 1) or _std_ulogic_vector_.
The "Dir" column shows the signal direction as seen from the CPU.

.NEORV32 CPU Signal List
[cols="<3,^3,^1,<5"]
[options="header", grid="rows"]
|=======================
| Signal | Width/Type | Dir | Description
4+^| **Global Signals**
| `clk_i` | 1 | in | Global clock line; all registers trigger on the rising edge. This clock can be switched off during <<_sleep_mode>>
| `clk_aux_i` | 1 | in | Always-on clock, used to keep the sleep control active when `clk_i` is switched off
| `rstn_i` | 1 | in | Global reset, low-active
| `sleep_o` | 1 | out | CPU is in <<_sleep_mode>> when set
| `debug_o` | 1 | out | CPU is in <<_cpu_debug_mode,debug mode>> when set
| `ifence_o` | 1 | out | Instruction fence (`fence.i` instruction)
| `dfence_o` | 1 | out | Data fence (`fence` instruction)
4+^| **Interrupts (<<_traps_exceptions_and_interrupts>>)**
| `msi_i` | 1 | in | RISC-V machine software interrupt
| `mei_i` | 1 | in | RISC-V machine external interrupt
| `mti_i` | 1 | in | RISC-V machine timer interrupt
| `firq_i` | 16 | in | Custom fast interrupt request signals
| `dbi_i` | 1 | in | Request CPU to halt and enter debug mode (RISC-V <<_on_chip_debugger_ocd>>)
4+^| **Instruction <<_bus_interface>>**
| `ibus_req_o` | `bus_req_t` | out | Instruction fetch bus request
| `ibus_rsp_i` | `bus_rsp_t` | in | Instruction fetch bus response
4+^| **Data <<_bus_interface>>**
| `dbus_req_o` | `bus_req_t` | out | Data access (load/store) bus request
| `dbus_rsp_i` | `bus_rsp_t` | in | Data access (load/store) bus response
|=======================

.Bus Interface Protocol
[TIP]
See section <<_bus_interface>> for the instruction fetch and data access interface protocol and the
according interface types (`bus_req_t` and `bus_rsp_t`).


<<<
// ####################################################################################################################
:sectnums:
=== CPU Top Entity - Generics

Most of the CPU configuration generics are a subset of the actual Processor configuration generics
(see section <<_processor_top_entity_generics>>) and are not listed here. However, the CPU provides
some _specific_ generics that are used to configure the CPU for the NEORV32 processor setup. These generics
are assigned by the processor setup only and are not available for user-defined configuration.
The specific generics are listed below.

.Table Abbreviations
[NOTE]
The generic type "suv(x:y)" defines a `std_ulogic_vector(x downto y)`.

.NEORV32 CPU-Exclusive Generic List
[cols="<4,^2,<8"]
[options="header",grid="rows"]
|=======================
| Name | Type | Description
| `CPU_BOOT_ADDR` | suv(31:0) | CPU reset address. See section <<_address_space>>.
| `CPU_DEBUG_PARK_ADDR` | suv(31:0) | "Park loop" entry address for the <<_on_chip_debugger_ocd>>, has to be 4-byte aligned.
| `CPU_DEBUG_EXC_ADDR` | suv(31:0) | "Exception" entry address for the <<_on_chip_debugger_ocd>>, has to be 4-byte aligned.
| `CPU_EXTENSION_RISCV_Sdext` | boolean | Implement RISC-V-compatible "debug" CPU operation mode required for the <<_on_chip_debugger_ocd>>.
| `CPU_EXTENSION_RISCV_Sdtrig` | boolean | Implement RISC-V-compatible trigger module. See section <<_on_chip_debugger_ocd>>.
|=======================


<<<
// ####################################################################################################################
:sectnums:
@@ -586,18 +586,11 @@ The `I` ISA extension is the base RISC-V integer ISA that is always enabled.
| Illegal inst. | - | 3
|=======================

.`fence` Instruction - Predecessor and Successor Bits
.`fence` Instruction
[NOTE]
The `fence` instruction word's _predecessor_ and _successor_ bits (used for memory ordering) are not evaluated
by the hardware at all.

.`fence` Instruction - How it works
[NOTE]
CPU-internally, the `fence` instruction does not perform any operation. It only sets the
top's `d_bus_fence_o` signal high for one cycle to inform the memory system that a `fence` instruction has been
executed. Any flags within the `fence` instruction word are ignored by the hardware. However, the `d_bus_fence_o`
signal is connected to the <<_processor_internal_data_cache_dcache>>. Hence, executing the `fence` instruction
will clear/flush the data cache and resynchronize it with main memory.
by the hardware at all. For the NEORV32 the `fence` instruction behaves exactly like the `fence.i` instruction
(see <<_zifencei_isa_extension>>).

.`wfi` Instruction
[NOTE]
@@ -653,10 +646,11 @@ RISC-V specs. Also, custom trap codes for <<_mcause>> are implemented.

The `Zifencei` CPU extension allows manual synchronization of the instruction stream. This extension is always enabled.

The `fence.i` instruction resets the CPU's front-end (instruction fetch) and flushes the prefetch buffer.
This allows a clean re-fetch of modified instructions from memory. Also, the top's `i_bus_fencei_o` signal is set
high for one cycle to signal the memory system (like the <<_processor_internal_instruction_cache_icache>>) to perform a flush/reload.
Any additional flags within the `fence.i` instruction word are ignored by the hardware.
.NEORV32 Fence Instructions
[NOTE]
The NEORV32 treats both fence instructions (`fence` = data fence, `fence.i` = instruction fence) in exactly the same way.
Both instructions cause a flush of the CPU's instruction prefetch buffer and also send a fence request via the system
bus (see <<_bus_interface>>). This system bus fence operation will, for example, clear/flush all downstream caches.
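
The unified fence handling described in this note can be sketched in C. The sketch assumes the standard RISC-V encodings: both instructions use the MISC-MEM major opcode (`0001111`), with `funct3 = 000` for `fence` and `funct3 = 001` for `fence.i`. Like the new `ctrl_nxt.lsu_fence <= '1'` and `RESTART` logic in `neorv32_cpu_control.vhd`, both encodings produce the same action and the predecessor/successor bits are never inspected. `decode_fence` and `fence_action_t` are hypothetical names, and the core's illegal-instruction checks are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define OPCODE_MISC_MEM 0x0Fu /* RISC-V major opcode 0001111 (fence, fence.i) */

typedef struct {
  bool restart_frontend; /* restart instruction fetch, flush the prefetch buffer */
  bool bus_fence;        /* raise the fence bit of the bus request */
} fence_action_t;

/* Hypothetical decoder: fence (funct3 = 000) and fence.i (funct3 = 001)
 * map to exactly the same action. */
static fence_action_t decode_fence(uint32_t instr) {
  fence_action_t act = { false, false };
  uint32_t opcode = instr & 0x7Fu;        /* bits [6:0] */
  uint32_t funct3 = (instr >> 12) & 0x7u; /* bits [14:12] */
  if (opcode == OPCODE_MISC_MEM && funct3 <= 1u) {
    /* pred/succ (bits [27:20]) and all other fields are deliberately ignored */
    act.restart_frontend = true;
    act.bus_fence = true;
  }
  return act;
}
```

With this mapping, `fence iorw, iorw` (`0x0FF0000F`), `fence rw, rw` (`0x0330000F`) and `fence.i` (`0x0000100F`) all yield the same action, while unrelated instructions such as `nop` (`0x00000013`) yield none.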

.Instructions and Timing
[cols="<2,<4,<3"]
3 changes: 0 additions & 3 deletions docs/datasheet/soc.adoc
@@ -105,9 +105,6 @@ Some interfaces (like the TWI and the 1-Wire bus) require tri-state drivers in t
| `slink_tx_dat_o` | 32 | out | - | TX data
| `slink_tx_val_o` | 1 | out | - | TX data valid
| `slink_tx_rdy_i` | 1 | in | `'L'` | TX allowed to send
5+^| **Advanced Memory Control Signals**
| `fence_o` | 1 | out | - | set if `fence` instruction is being executed
| `fencei_o` | 1 | out | - | set if `fence.i` instruction is being executed
5+^| **<<_execute_in_place_module_xip>>**
| `xip_csn_o` | 1 | out | - | chip select, low-active
| `xip_clk_o` | 1 | out | - | serial clock
6 changes: 2 additions & 4 deletions docs/datasheet/soc_dcache.adoc
@@ -24,13 +24,12 @@ equal to 4 bytes) and `DCACHE_NUM_BLOCKS` (the total amount of cache blocks; has
equal to 1) generics. The data cache provides only a single set, hence it is direct-mapped.


**Cached/Unached Accesses**
**Cached/Uncached Accesses**

The data cache provides direct accesses (= uncached) to memory in order to access memory-mapped IO (like the
processor-internal IO/peripheral modules). All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF`
will not be cached at all (see section <<_address_space>>).
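
A minimal C sketch of how an address could be classified and split for a direct-mapped cache like this one. It assumes that `DCACHE_BLOCK_SIZE` (in bytes) and `DCACHE_NUM_BLOCKS` are both powers of two and uses the `0xF0000000` IO boundary given in the text; the function and field names are hypothetical and do not describe the actual RTL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IO_REGION_BASE 0xF0000000u /* accesses at or above this address bypass the cache */

typedef struct {
  bool     cacheable; /* false for the uncached IO region */
  uint32_t offset;    /* byte offset inside the block */
  uint32_t index;     /* which block frame (direct-mapped: exactly one candidate) */
  uint32_t tag;       /* remaining upper address bits */
} dcache_addr_t;

static bool is_pow2(uint32_t x) { return x != 0u && (x & (x - 1u)) == 0u; }

/* Hypothetical helper: split a byte address for a direct-mapped cache. */
static dcache_addr_t dcache_split(uint32_t addr, uint32_t block_size, uint32_t num_blocks) {
  assert(is_pow2(block_size) && block_size >= 4u); /* sketch assumption */
  assert(is_pow2(num_blocks));                     /* sketch assumption */
  dcache_addr_t r;
  r.cacheable = addr < IO_REGION_BASE;
  r.offset    = addr & (block_size - 1u);
  r.index     = (addr / block_size) & (num_blocks - 1u);
  r.tag       = addr / (block_size * num_blocks);
  return r;
}
```

For instance, with 64-byte blocks and 4 blocks, address `0x00001234` maps to offset `0x34`, index 0 and tag `0x12`, while any address from `0xF0000000` upwards is reported as uncached.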


.Caching Internal Memories
[NOTE]
The data cache is intended to accelerate data access to **processor-external** memories
Expand All @@ -39,8 +38,7 @@ when using only processor-internal data and instruction memories.

.Manual Cache Clear/Reload
[NOTE]
By executing the `fence` instruction (<<_i_isa_extension>>) the cache is cleared and a reload from
main memory is triggered.
By executing the `fence(.i)` instruction the cache is cleared and a reload from main memory is triggered.

.Retrieve Cache Configuration from Software
[TIP]
6 changes: 2 additions & 4 deletions docs/datasheet/soc_icache.adoc
@@ -27,13 +27,12 @@ set-associative) generics. If the cache associativity is greater than one the LR
used) is used.


**Cached/Unached Accesses**
**Cached/Uncached Accesses**

The instruction cache provides direct accesses (= uncached) to memory in order to access memory-mapped IO (like the
processor-internal IO/peripheral modules). All accesses that target the address range from `0xF0000000` to `0xFFFFFFFF`
will not be cached at all (see section <<_address_space>>).


.Caching Internal Memories
[NOTE]
The instruction cache is intended to accelerate instruction fetches from **processor-external** memories
Expand All @@ -42,8 +41,7 @@ when using only processor-internal data and instruction memories.

.Manual Cache Clear/Reload
[NOTE]
By executing the `fence.i` instruction (<<_zifencei_isa_extension>>) the cache is cleared and a reload from
main memory is triggered. This also allows self-modifying code to be implemented.
By executing the `fence(.i)` instruction the cache is cleared and a reload from main memory is triggered.

.Retrieve Cache Configuration from Software
[TIP]
3 changes: 3 additions & 0 deletions docs/datasheet/soc_xip.adoc
@@ -175,6 +175,9 @@ cache layout:
When the cache is implemented, the XIP module operates in **burst mode** utilizing the flash's _incremental read_ capabilities.
Thus, several bytes (= `XIP_CACHE_BLOCK_SIZE`) are read consecutively from the flash using a single read command.

The XIP cache is cleared when the XIP module is disabled (`XIP_CTRL_EN = 0`), when XIP mode is disabled
(`XIP_CTRL_XIP_EN = 0`) or when the CPU issues a `fence(.i)` instruction.


**Register Map**

6 changes: 0 additions & 6 deletions rtl/core/neorv32_cpu.vhd
@@ -79,8 +79,6 @@ entity neorv32_cpu is
rstn_i : in std_ulogic; -- global reset, low-active, async
sleep_o : out std_ulogic; -- cpu is in sleep mode when set
debug_o : out std_ulogic; -- cpu is in debug mode when set
ifence_o : out std_ulogic; -- instruction fence
dfence_o : out std_ulogic; -- data fence
-- interrupts --
msi_i : in std_ulogic; -- risc-v machine software interrupt
mei_i : in std_ulogic; -- risc-v machine external interrupt
@@ -257,10 +255,6 @@ begin
sleep_o <= ctrl.cpu_sleep; -- set when CPU is sleeping (after WFI)
debug_o <= ctrl.cpu_debug; -- set when CPU is in debug mode

-- instruction/data fence --
ifence_o <= ctrl.lsu_fencei;
dfence_o <= ctrl.lsu_fence;


-- Register File --------------------------------------------------------------------------
-- -------------------------------------------------------------------------------------------
21 changes: 10 additions & 11 deletions rtl/core/neorv32_cpu_control.vhd
@@ -432,12 +432,13 @@ begin
ipb.we(1) <= '1' when (fetch_engine.state = IF_PENDING) and (fetch_engine.resp = '1') else '0';

-- bus access type --
bus_req_o.priv <= fetch_engine.priv; -- current effective privilege level
bus_req_o.data <= (others => '0'); -- read-only
bus_req_o.ben <= (others => '0'); -- read-only
bus_req_o.rw <= '0'; -- read-only
bus_req_o.src <= '1'; -- source = instruction fetch
bus_req_o.rvso <= '0'; -- cannot be a reservation set operation
bus_req_o.fence <= ctrl.lsu_fence; -- fence(.i) operation


-- Instruction Prefetch Buffer (FIFO) -----------------------------------------------------
@@ -1009,9 +1010,8 @@ begin
if (trap_ctrl.exc_buf(exc_illegal_c) = '1') then -- abort if illegal instruction
execute_engine.state_nxt <= DISPATCH;
else
ctrl_nxt.lsu_fence <= not execute_engine.ir(instr_funct3_lsb_c); -- data fence
ctrl_nxt.lsu_fencei <= execute_engine.ir(instr_funct3_lsb_c); -- instruction fence
execute_engine.state_nxt <= RESTART; -- reset instruction fetch + IPB (only required for fence.i)
ctrl_nxt.lsu_fence <= '1'; -- NOTE: fence == fence.i
execute_engine.state_nxt <= RESTART; -- reset instruction fetch + IPB (actually only required for fence.i)
end if;

when BRANCH => -- update next_PC on taken branches and jumps
@@ -1134,8 +1134,7 @@ begin
ctrl_o.lsu_req <= ctrl.lsu_req;
ctrl_o.lsu_rw <= ctrl.lsu_rw;
ctrl_o.lsu_mo_we <= '1' when (execute_engine.state = MEM_REQ) else '0'; -- write memory output registers (data & address)
ctrl_o.lsu_fence <= ctrl.lsu_fence;
ctrl_o.lsu_fencei <= ctrl.lsu_fencei;
ctrl_o.lsu_fence <= ctrl.lsu_fence; -- fence(.i)
ctrl_o.lsu_priv <= csr.mstatus_mpp when (csr.mstatus_mprv = '1') else csr.privilege_eff; -- effective privilege level for loads/stores in M-mode

-- instruction word bit fields --
3 changes: 3 additions & 0 deletions rtl/core/neorv32_cpu_lsu.vhd
@@ -124,6 +124,9 @@ begin
-- source identifier --
bus_req_o.src <= '0'; -- 0 = data access

-- data/instruction fence(.i)
bus_req_o.fence <= ctrl_i.lsu_fence;


-- Data Output - Alignment and Byte Enable ------------------------------------------------
-- -------------------------------------------------------------------------------------------