[DRAMsys 5.0] DRAMsys co-simulation with MemPool/TeraPool system (#106)
* [TeraPool] Configurations Changes for TeraPool merge into MemPool

* [DRAMsys] add dram rtl model

* [DRAMsys] fix simulation bug

* [DRAMsys] setting to add dramsys support

* [DRAMsys] fix bugs: stack overflow when reading from dramsys

* [Software] Temp change for easy debug

* [DMA] DMA bug fix and mempool trace bug fix

* [DRAM] Update DRAM lib with AXI reordering

* [DRAM] Format codes

* [DRAM] Merge SRAM and DRAM simulation in one RTL file

* [Software] Update memcpy kernel

* [Makefile] Update Makefile control simulation with dram var

* [DRAM] Delete old file

* [Bender] Remove the deleted RTL file

* [RTL] Change the AXI MUX to AXI Xbar to connect the DRAM

* [DRAM] DRAM update to support interleaved address mapping

* [DRAM] Update DRAM model to support interleaved mode and fix write bugs

* [Hardware] Support the different interleave mode for DRAM access

* [Config] L2 address and size update

* [Kernel] memcpy kernel update

* [DRAM] Non-Ideal PHY latency support

* [DRAM] Python Script for DRAM Bandwidth Analysis

* [Format] Format the files for CI check

* [Format] Format and add licenses to files for CI checking

* [Format] Format DRAM python script for CI check

* [AXI] Add auto splitter, update interleave SystemVerilog writing style

* [HBM2E] Update DRAM HBM model to MICRON HBM2E-3600

* [Env] Update some configurations, include the fifo size and DRAM configuration

* [Rebase] Rebase the DRAM work on top of main branch

* [Config] Complete MinPool config for CI checking

* [memcpy] Reduce transfer size for MinPool CI check

* [memcpy] Reduce transfer size for MinPool CI check

* [memcpy] Remove unused dump from kernel

* [FIFO depth] Tune the FIFO depth to support 8 outstanding transactions to hide DRAM latency

* [DRAMsys] Remove the local version of DRAMsys hardware folder, add the open-sourced DRAMsys as a submodule

* [Config] Move the dram related configurations to the config.mk

* [Makefile] Modify Makefile for updating submodule, patching dram configurations, and compiling dramsys dynamic library.

* [hardware] hardware change for the new version dramsys support

* [DRAM] Add the configuration files for HBM2 DRAM simulation; these files are patched into the dram_rtl_sim submodule by a Makefile target

* [tb] Change the simulation clock period back to 2 ns; 1 ns would give better DRAM bandwidth, as the HBM2E supports up to 3600 Mbps per pin (DDR)

* [Bender] Update Bender to the correct RTL file names, as DRAMsys renamed them

* [software] Update memcpy kernel with reasonable transfer size and turn on the verification

* [config] Change simulation to SRAM as L2 for CI checking

* [CI test] Fix trailing whitespace in tb and change Bender to compile the dramsys RTL only for vsim

* [CHANGELOG and README] Add changelog and readme for DRAM co-simulation

* [Compiler version] Remove the cmake and gcc version from Makefile and update the ci.yml

* [rtl] As we solved the bug in DRAM reset, set the reset edge back to the original version

* Change the memcpy result dump CSR and remove the repeat in Makefile

---------

Co-authored-by: Zhang Chi <chizhang@iis.ee.ethz.ch>
yichao-zh and Zhang Chi committed May 6, 2024
1 parent 1499545 commit 9dd79af
Showing 27 changed files with 691 additions and 83 deletions.
6 changes: 3 additions & 3 deletions .gitlab/.gitlab-ci.yml
@@ -14,9 +14,9 @@ variables:
PATH: '/home/gitlabci/miniconda3/condabin:/home/gitlabci/.cargo/bin:/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/usr/local/condor/bin:/usr/sepp/bin:$VERILATOR_ROOT/bin'
OBJCACHE: ''
RISCV_WARNINGS: '-Werror'
CC: 'gcc-8.2.0'
CXX: 'g++-8.2.0'
CMAKE: 'cmake-3.18.1'
CC: 'gcc-11.2.0'
CXX: 'g++-11.2.0'
CMAKE: 'cmake-3.28.3'

workflow:
rules:
3 changes: 3 additions & 0 deletions .gitmodules
@@ -40,3 +40,6 @@
[submodule "hardware/deps/fpu_div_sqrt_mvp"]
path = hardware/deps/fpu_div_sqrt_mvp
url = https://github.com/pulp-platform/fpu_div_sqrt_mvp.git
[submodule "hardware/deps/dram_rtl_sim"]
path = hardware/deps/dram_rtl_sim
url = https://github.com/pulp-platform/dram_rtl_sim.git
4 changes: 4 additions & 0 deletions Bender.yml
@@ -57,6 +57,10 @@ sources:
- hardware/tb/traffic_generator.sv
# Level 2
- hardware/tb/mempool_tb.sv
# DRAMsys
- hardware/deps/dram_rtl_sim/src/sim_dram.sv
- hardware/deps/dram_rtl_sim/src/axi_dram_sim.sv
- hardware/deps/dram_rtl_sim/src/dram_sim_engine.sv

- target: mempool_verilator
files:
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
### Added
- Add `apb` dependency of version 0.2.4
- Add support for the `FENCE` instruction
- Add support for DRAMsys5.0 co-simulation

### Changes
- Add physical feasible TeraPool configuration with SubGroup hierarchy.
38 changes: 36 additions & 2 deletions Makefile
@@ -30,10 +30,10 @@ CMAKE ?= cmake
# CC and CXX are Makefile default variables that are always defined in a Makefile. Hence, overwrite
# the variable if it is only defined by the Makefile (its origin in the Makefile's default).
ifeq ($(origin CC),default)
CC = gcc
CC ?= gcc
endif
ifeq ($(origin CXX),default)
CXX = g++
CXX ?= g++
endif
BENDER_VERSION = 0.28.1

@@ -160,6 +160,40 @@ update-deps:
done
git apply hardware/deps/patches/*

# Build, update and patch the DRAMsys submodule
$(eval DRAM_PATH=$(realpath $(shell git config --file .gitmodules --get-regexp dram_rtl_sim.path | awk '/hardware/{ print $$2 }')))
$(eval DRAM_LIB_PATH=$(DRAM_PATH)/dramsys_lib)
$(eval DRAMSYS_PATH=$(DRAM_LIB_PATH)/DRAMSys)
$(eval DRAMSYS_PATCH_PATH=$(DRAM_LIB_PATH)/dramsys_lib_patch)
$(eval DRAMSYS_SO_PATH=$(DRAMSYS_PATH)/build)

clean-dram:
if [ -d "$(DRAMSYS_PATH)" ]; then \
rm -rf $(DRAMSYS_PATH); \
fi

build-dram: clean-dram
if [ ! -d "$(DRAMSYS_PATH)" ]; then \
git clone https://github.com/tukl-msd/DRAMSys.git $(DRAMSYS_PATH); \
fi
cd $(DRAMSYS_PATH) && git reset --hard 8e021ea && git apply $(DRAMSYS_PATCH_PATH)

config-dram: build-dram
@cp hardware/include/dram_config/am_hbm2e_16Gb_pc_brc.json $(DRAMSYS_PATH)/configs/addressmapping/.
@cp hardware/include/dram_config/mc_hbm2e_fr_fcfs_grp.json $(DRAMSYS_PATH)/configs/mcconfig/.
@cp hardware/include/dram_config/ms_hbm2e_16Gb_3600.json $(DRAMSYS_PATH)/configs/memspec/.
@cp hardware/include/dram_config/simconfig_hbm2e.json $(DRAMSYS_PATH)/configs/simconfig/.
@mv $(DRAMSYS_PATH)/configs/hbm2-example.json $(DRAMSYS_PATH)/configs/hbm2-example.json.ori
@cp hardware/include/dram_config/HBM2E-3600.json $(DRAMSYS_PATH)/configs/hbm2-example.json

setup-dram: config-dram
cd $(DRAMSYS_PATH) && \
if [ ! -d "build" ]; then \
mkdir build && cd build; \
cmake -DCMAKE_CXX_FLAGS=-fPIC -DCMAKE_C_FLAGS=-fPIC -D DRAMSYS_WITH_DRAMPOWER=ON .. ; \
make -j; \
fi

# Helper targets
.PHONY: clean format apps

38 changes: 38 additions & 0 deletions README.md
@@ -187,6 +187,39 @@ To get a visualization of the traces, check out the `scripts/tracevis.py` script

We also provide Synopsys Spyglass linting scripts in the `hardware/spyglass`. Run `make lint` in the `hardware` folder, with a specific MemPool configuration, to run the tests associated with the `lint_rtl` target.

## DRAMsys Co-Simulation

The MemPool system supports either on-chip SRAM or off-chip DRAM as the higher-level memory in simulation. For off-chip DRAM co-simulation, it includes the `dram_rtl_sim` tool as a submodule at `hardware/deps/dram_rtl_sim`. Built on DRAMSys5.0, this tool couples the RTL models with DRAMSys5.0 to simulate the DRAM and its controller for contemporary off-chip DRAM technologies (e.g., LPDDR, DDR, HBM).

The `dram_rtl_sim` tool is open source and can be found here:
[https://github.com/pulp-platform/dram_rtl_sim](https://github.com/pulp-platform/dram_rtl_sim)

### Building DRAMsys Co-Simulation

To prepare for DRAMsys co-simulation, set `l2_sim_type` to `dram` in `config/config.mk`. Then run the following command in the project's root directory to set up the DRAMsys environment:

```bash
make setup-dram
```

This Makefile target automates several tasks:
1. Removes the existing DRAMSys5.0 checkout, if one was previously built.
2. Clones the DRAMSys5.0 repository into `hardware/deps/dram_rtl_sim/dramsys_lib/` and applies the necessary patches.
3. Copies the HBM2E DRAM configuration files tailored for the MemPool system into the DRAMSys `configs` directory.
4. Compiles the DRAMSys dynamically linkable libraries in `hardware/deps/dram_rtl_sim/dramsys_lib/DRAMSys/build`.

**Important:** This environment requires `cmake` version 3.28.1 or higher and GCC version 11.2.0 or above.

### DRAM Chip Configuration

DRAMsys supports a range of contemporary off-chip DRAM technologies, including LPDDR, DDR, and HBM. Configuration files in `.json` format are available in `hardware/deps/dram_rtl_sim/dramsys_lib/DRAMSys/configs`. We also provide a recommended HBM2E configuration for the MemPool system in `hardware/include/dram_config`; `make setup-dram` copies it into the DRAMSys `configs` directory and installs it as the default. Review and modify these configurations as needed for your simulation.

### Testing MemPool-DRAMSys Co-Simulation

To test DMA data transfers between the MemPool system and the higher-level memory, use the example kernel in `software/tests/baremetal/memcpy`. For details on building applications and setting up RTL simulation, refer to the sections above.

**Note:** Off-chip DRAM co-simulation is currently supported only with the `Questasim` simulator, which is not open source.

## Publications
If you use MemPool in your work or research, you can cite us:

@@ -602,5 +635,10 @@ The open-source simulator [Verilator](https://www.veripool.org/verilator) can be

- `toolchain/verilator` is licensed under GPL. See [Verilator's license](https://github.com/verilator/verilator/blob/master/LICENSE) for more details.

### DRAMsys5.0

- The `dram_rtl_sim` submodule, located at `hardware/deps/dram_rtl_sim`, is licensed under the Solderpad Hardware License 0.51. You can review the license [here](https://github.com/pulp-platform/dram_rtl_sim/blob/main/LICENSE).
- [DRAMSys5.0](https://github.com/tukl-msd/DRAMSys) is utilized for DRAM simulations. For details on its usage and licensing, please refer to the DRAMSys5.0 [license information](https://github.com/tukl-msd/DRAMSys).

</p>
</details>
9 changes: 4 additions & 5 deletions config/config.mk
@@ -30,8 +30,6 @@ boot_addr ?= 2684354560 # A0000000

# L2 memory configuration (in dec)
l2_base ?= 2147483648 # 80000000
l2_size ?= 4194304 # 400000
l2_banks ?= 4

# L1 size per bank (in dec)
l1_bank_size ?= 1024
@@ -52,9 +50,6 @@ axi_data_width ?= 512
# Read-only cache line width in AXI interconnect (in bits)
ro_line_width ?= 512

# Number of DMA backends in each group
dmas_per_group ?= 4

#############################
## Xqueues configuration ##
#############################
@@ -81,3 +76,7 @@ xDivSqrt ?= 0
# This parameter is only used for TeraPool configurations
num_sub_groups_per_group ?= 1
remote_group_latency_cycles ?= 7

# DRAMsys co-simulation: dram/sram
l2_sim_type ?= sram
dram_axi_width_interleaved ?= 16
9 changes: 8 additions & 1 deletion config/mempool.mk
@@ -25,7 +25,14 @@ num_divsqrt_per_tile ?= 1
banking_factor ?= 4

# Radix for hierarchical AXI interconnect
axi_hier_radix ?= 20
axi_hier_radix ?= 17

# Number of AXI masters per group
axi_masters_per_group ?= 1

# Number of DMA backends in each group
dmas_per_group ?= 1 # Burst Length = 16

# L2 Banks/Channels
l2_size ?= 4194304 # 400000
l2_banks ?= 4
10 changes: 7 additions & 3 deletions config/minpool.mk
@@ -33,11 +33,15 @@ axi_data_width ?= 256
# Read-only cache line width in AXI interconnect (in bits)
ro_line_width ?= 256

# Number of DMA backends in each group
dmas_per_group ?= 1

# Radix for hierarchical AXI interconnect
axi_hier_radix ?= 2

# Number of AXI masters per group
axi_masters_per_group ?= 1

# Number of DMA backends in each group
dmas_per_group ?= 1 # Burst Length = 16

# L2 Banks/Channels
l2_size ?= 4194304 # 400000
l2_banks ?= 4
3 changes: 2 additions & 1 deletion config/terapool.mk
@@ -41,7 +41,8 @@ axi_hier_radix ?= 9
axi_masters_per_group ?= 4

# Number of DMA backends in each group
dmas_per_group ?= 4
dmas_per_group ?= 4 # Burst Length = 16

# L2 Banks/Channels
l2_banks = 16
l2_size ?= 16777216 # 1000000
12 changes: 9 additions & 3 deletions hardware/Makefile
@@ -48,6 +48,12 @@ python ?= python3
# Enable tracing
snitch_trace ?= 1

# Path to DRAMsys
dramsys_resouces_path ?= $(MEMPOOL_DIR)/hardware/deps/dram_rtl_sim/dramsys_lib/DRAMSys/configs
dramsys_lib_path ?= $(MEMPOOL_DIR)/hardware/deps/dram_rtl_sim/dramsys_lib/DRAMSys/build/lib
questa_args += +DRAMSYS_RES=$(dramsys_resouces_path)
questa_args += -sv_lib $(dramsys_lib_path)/libsystemc -sv_lib $(dramsys_lib_path)/libDRAMSys_Simulator

# Check if the specified QuestaSim version exists
ifeq (, $(shell which $(questa_cmd)))
# Spaces are needed for indentation here!
@@ -131,6 +137,9 @@ vlog_defs += -DXQUEUE_SIZE=$(xqueue_size)
# TeraPool configurations
vlog_defs += -DNUM_SUB_GROUPS_PER_GROUP=$(num_sub_groups_per_group)
vlog_defs += -DREMOTE_GROUP_LATENCY_CYCLES=$(remote_group_latency_cycles)
# DRAMsys co-simulation
vlog_defs += -D${l2_sim_type}
vlog_defs += -DDRAM_AXI_WIDTH_INTERLEAVED=${dram_axi_width_interleaved}

# Traffic generation enabled
ifdef tg
@@ -154,9 +163,6 @@ cpp_defs += -DL2_BASE=$(l2_base)
cpp_defs += -DL2_SIZE=$(l2_size)
cpp_defs += -DL2_BANKS=$(l2_banks)
cpp_defs += -DAXI_DATA_WIDTH=$(axi_data_width)
cpp_defs += -DL2_BASE=$(l2_base)
cpp_defs += -DL2_SIZE=$(l2_size)
cpp_defs += -DL2_BANKS=$(l2_banks)

.DEFAULT_GOAL := compile

1 change: 1 addition & 0 deletions hardware/deps/dram_rtl_sim
Submodule dram_rtl_sim added at 15caf3
15 changes: 15 additions & 0 deletions hardware/include/dram_config/HBM2E-3600.json
@@ -0,0 +1,15 @@
{
"simulation": {
"addressmapping": "am_hbm2e_16Gb_pc_brc.json",
"mcconfig": "mc_hbm2e_fr_fcfs_grp.json",
"memspec": "ms_hbm2e_16Gb_3600.json",
"simconfig": "simconfig_hbm2e.json",
"simulationid": "hbm2e",
"tracesetup": [
{
"clkMhz": 1800,
"name": "HBM2E.stl"
}
]
}
}
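The top-level file above only names the four per-category configuration files, and the `cp` commands of the `config-dram` target place each one in a subdirectory of the DRAMSys `configs` folder. A minimal Python sketch of that lookup, usable as a sanity check before simulation, could look like this. The helpers `referenced_files` and `missing_files` are hypothetical (they are not part of MemPool or DRAMSys), and the directory layout is an assumption taken from the Makefile.

```python
from pathlib import Path

# Copied from HBM2E-3600.json (the "tracesetup" entry is omitted here).
TOP_CONFIG = {
    "simulation": {
        "addressmapping": "am_hbm2e_16Gb_pc_brc.json",
        "mcconfig": "mc_hbm2e_fr_fcfs_grp.json",
        "memspec": "ms_hbm2e_16Gb_3600.json",
        "simconfig": "simconfig_hbm2e.json",
    }
}

# Assumption: each key maps to a same-named subdirectory of the configs folder,
# as suggested by the `cp` commands of the `config-dram` target.
CATEGORIES = ("addressmapping", "mcconfig", "memspec", "simconfig")


def referenced_files(top_config, configs_root):
    """Return the expected path of every file named by the top-level config."""
    sim = top_config["simulation"]
    return {key: Path(configs_root) / key / sim[key] for key in CATEGORIES}


def missing_files(top_config, configs_root):
    """List the referenced files that do not exist under configs_root."""
    paths = referenced_files(top_config, configs_root).values()
    return [str(path) for path in paths if not path.is_file()]
```

Calling `missing_files(TOP_CONFIG, ".../DRAMSys/configs")` after `make setup-dram` should return an empty list if the copy step succeeded.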
47 changes: 47 additions & 0 deletions hardware/include/dram_config/am_hbm2e_16Gb_pc_brc.json
@@ -0,0 +1,47 @@
{
"addressmapping": {
"BYTE_BIT": [
0,
1,
2
],
"COLUMN_BIT": [
3,
4,
8,
9,
10,
11,
12
],
"PSEUDOCHANNEL_BIT":[
5
],
"BANK_BIT": [
16,
17
],
"BANKGROUP_BIT":[
6,
7,
13
],
"ROW_BIT": [
14,
15,
18,
19,
20,
21,
22,
23,
24,
25,
26,
27,
28,
29,
30
]
}
}
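To make the interleaving in this address mapping easier to review, the following Python sketch splits a byte address into its DRAM fields. It is an illustration only: `decode_address` is a hypothetical helper, and it assumes that the first bit listed for each field is that field's least significant bit.

```python
# Bit positions copied from am_hbm2e_16Gb_pc_brc.json.
ADDRESS_MAPPING = {
    "BYTE_BIT": [0, 1, 2],
    "COLUMN_BIT": [3, 4, 8, 9, 10, 11, 12],
    "PSEUDOCHANNEL_BIT": [5],
    "BANK_BIT": [16, 17],
    "BANKGROUP_BIT": [6, 7, 13],
    "ROW_BIT": [14, 15] + list(range(18, 31)),
}


def decode_address(addr, mapping=ADDRESS_MAPPING):
    """Gather the address bits listed for each field into one integer per field.

    Assumption: the first listed bit is the field's least significant bit.
    """
    fields = {}
    for name, bits in mapping.items():
        value = 0
        for position, bit in enumerate(bits):
            value |= ((addr >> bit) & 1) << position
        fields[name] = value
    return fields


if __name__ == "__main__":
    for address in (0x00, 0x20, 0x40, 0x100):
        print(hex(address), decode_address(address))
```

Under this reading, address bit 5 selects the pseudo-channel and bits 6 and 7 select the bank group, so consecutive 32-byte blocks alternate between the two pseudo-channels before moving to the next bank group. The 31 mapped bits cover a 2 GiB address range.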
20 changes: 20 additions & 0 deletions hardware/include/dram_config/mc_hbm2e_fr_fcfs_grp.json
@@ -0,0 +1,20 @@
{
"mcconfig": {
"PagePolicy": "Open",
"Scheduler": "FrFcfsGrp",
"SchedulerBuffer": "Bankwise",
"RequestBufferSize": 128,
"CmdMux": "Oldest",
"RespQueue": "Fifo",
"RefreshPolicy": "AllBank",
"RefreshMaxPostponed": 8,
"RefreshMaxPulledin": 8,
"PowerDownPolicy": "NoPowerDown",
"Arbiter": "Simple",
"PhyDelayFw": 8,
"PhyDelayBw": 9,
"ThinkDelayFw": 12,
"ThinkDelayBW": 12,
"MaxActiveTransactions": 128
}
}
48 changes: 48 additions & 0 deletions hardware/include/dram_config/ms_hbm2e_16Gb_3600.json
@@ -0,0 +1,48 @@
{
"memspec": {
"memarchitecturespec": {
"burstLength": 4,
"dataRate": 2,
"nbrOfBankGroups": 8,
"nbrOfBanks": 32,
"nbrOfColumns": 128,
"nbrOfPseudoChannels": 2,
"nbrOfRows": 32768,
"width": 64,
"nbrOfDevices": 1,
"nbrOfChannels": 1
},
"memoryId": "Test MemPool-TeraPool with HBM2 upto 3600bps (16Gb, Single Channel)",
"memoryType": "HBM2",
"memtimingspec": {
"CCDL": 4,
"CCDS": 2,
"CKE": 8,
"DQSCK": 2,
"FAW": 9,
"PL": 2,
"RAS": 30,
"RC": 45,
"RCDRD": 16,
"RCDWR": 12,
"REFI": 3900,
"REFISB": 122,
"RFC": 260,
"RFCSB": 200,
"RL": 41,
"RP": 15,
"RRDL": 2.22,
"RRDS": 2.22,
"RREFD": 8,
"RTP": 4,
"RTW": 18,
"WL": 8,
"WR": 41,
"WTRL": 6,
"WTRS": 4,
"XP": 10,
"XS": 270,
"clkMhz": 1800
}
}
}
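As a quick plausibility check of the architecture values above, this Python sketch derives the peak data rate and the capacity they imply. It assumes that `width` and `nbrOfBanks` are given per pseudo-channel, which is an interpretation and not something the file itself states. The function names are hypothetical.

```python
# Architecture and clock values copied from ms_hbm2e_16Gb_3600.json.
MEMSPEC = {
    "burstLength": 4,
    "dataRate": 2,
    "nbrOfBanks": 32,
    "nbrOfColumns": 128,
    "nbrOfPseudoChannels": 2,
    "nbrOfRows": 32768,
    "width": 64,
    "clkMhz": 1800,
}


def peak_bytes_per_second(spec):
    """Peak data rate of one pseudo-channel, assuming `width` is per pseudo-channel."""
    transfers_per_second = spec["clkMhz"] * 1_000_000 * spec["dataRate"]
    return transfers_per_second * spec["width"] // 8


def capacity_bits(spec):
    """Total capacity, assuming `nbrOfBanks` counts the banks of one pseudo-channel."""
    words = (spec["nbrOfRows"] * spec["nbrOfColumns"]
             * spec["nbrOfBanks"] * spec["nbrOfPseudoChannels"])
    return words * spec["width"]


if __name__ == "__main__":
    per_pc = peak_bytes_per_second(MEMSPEC)
    print("peak per pseudo-channel [GB/s]:", per_pc / 1e9)
    print("peak per channel [GB/s]:", per_pc * MEMSPEC["nbrOfPseudoChannels"] / 1e9)
    print("capacity [Gib]:", capacity_bits(MEMSPEC) / 2**30)
```

With these assumptions the values give 3600 MT/s per pin, 28.8 GB/s per pseudo-channel, and a capacity of 16 Gib, which agrees with the `16Gb` in the file name and with the 31 address bits used by the address mapping.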