Author: Shashank Obla (https://www.andrew.cmu.edu/user/sobla)
ReCONNECT is a highly parametrizable, high-performance soft network-on-chip (NoC) that can be customized to the needs of the application while remaining resource-minimal. Written directly in SystemVerilog (RTL), the NoC is optimized for high-frequency operation on modern FPGA architectures and runs at frequencies exceeding 500 MHz. Find more about it here: https://www.andrew.cmu.edu/user/sobla/projects/noc/
Simulations are configured and executed using the Makefile located in the `test` directory.

To run the default simulation, install Verilator and run the following commands:

```shell
cd test
make run
```

> [!NOTE]
> Requires Verilator 5.0 or later (tested with version 5.048).

> [!TIP]
> To use a local installation of Verilator instead of the global one, override the `VERILATOR` path on the command line. You may also need to set the `VERILATOR_ROOT` environment variable to the root of your local installation so that the compiler can locate Verilator's runtime headers and libraries:
>
> ```shell
> export VERILATOR_ROOT=/path/to/local/verilator
> make run VERILATOR=$VERILATOR_ROOT/bin/verilator
> ```

A regression suite that tests various NoC topologies and configurations is also provided:

```shell
make regress
```

To see the full list of available options (including descriptions of the NoC parameters), run:

```shell
make help
```

The following simulation modes are available:

- **Behavioral Simulation (Default):** Uses fast, lightweight behavioral FIFO models (neither `QUARTUS_FIFO=1` nor `VIVADO_FIFO=1` is set).
- **(Altera) Quartus FPGA IP Simulation (`QUARTUS_FIFO=1`):** Simulates the design using Quartus IP blocks.
- **(AMD) Vivado FPGA IP Simulation (`VIVADO_FIFO=1`):** Simulates the design using Vivado IP blocks.
ModelSim supports behavioral simulation as well as simulation with the Quartus and Vivado native FIFOs:

```shell
make modelsim [OPTIONS...]
```

> [!IMPORTANT]
> - **(Altera) Quartus FPGA IPs:** To run simulations with Quartus IPs, update your Quartus installation path on line 116 of `msim_setup.tcl` (verified with Quartus version 23.2).
> - **(AMD) Vivado IPs:** To run simulations with Vivado FIFOs (using XPM FIFO components), source your Vivado settings script (e.g., `source /path/to/Xilinx/Vivado/<version>/settings64.sh`) before launching the simulation.
Run a 4 x 4 mesh topology simulation in ModelSim (behavioral):

```shell
make modelsim TOPOLOGY=mesh NUM_ROWS=4 NUM_COLS=4
```

Run a clock-crossing simulation using Quartus/Intel FPGA IPs in ModelSim:

```shell
make modelsim QUARTUS_FIFO=1 CLKCROSS_FACTOR=2
```

Run a clock-crossing simulation using Vivado/AMD FPGA IPs in ModelSim:

```shell
source /path/to/Xilinx/Vivado/2024.1/settings64.sh # Load Vivado settings first!
make modelsim VIVADO_FIFO=1 CLKCROSS_FACTOR=2
```

> [!TIP]
> For synthesis, use the native FPGA IP blocks rather than the behavioral models to improve timing closure and area utilization. To do so, set `QUARTUS_FIFO` or `VIVADO_FIFO` as a global define in your project settings.
Most of the common NoC parameters are a subset of the Router Parameters; the mapping is given below.
| Parameter | Description |
|---|---|
| DEST_WIDTH | See DEST_WIDTH option in Router Parameters |
| FLIT_WIDTH | See FLIT_WIDTH option in Router Parameters |
| FLIT_BUFFER_DEPTH | See FLIT_BUFFER_DEPTH option in Router Parameters |
| ROUTING_TABLE_PREFIX | Prefix of the location of hex files containing the routing tables |
| DISABLE_SELFLOOP | Disables the path from input 0 to output 0 in the routers (nodes connected to a router cannot send data to themselves) |
| ROUTER_PIPELINE_ROUTE_COMPUTE | See PIPELINE_ROUTE_COMPUTE option in Router Parameters |
| ROUTER_PIPELINE_ARBITER | See PIPELINE_ARBITER option in Router Parameters |
| ROUTER_PIPELINE_OUTPUT | See PIPELINE_OUTPUT option in Router Parameters |
| ROUTER_FORCE_MLAB | See FORCE_MLAB option in Router Parameters |
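`routing_tables/` contains scripts that generate routing tables: X-Y dimension-ordered routing for mesh, torus, and directional torus networks, and shortest-path routing for ring and double-ring networks. Scripts for single routers, fully connected networks, and fat trees are also provided (see Usage).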
> [!NOTE]
> In a torus, ties occur when a destination can be reached equally well in either direction. Ties are broken by alternating the chosen direction from node to node, which evens out the load on each link.
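To illustrate the tie-breaking rule, the sketch below picks a direction along one torus dimension, treated as a ring of `n` routers. It is a hypothetical model, not the code in `gen_torus_table.py`: the function name is illustrative, and keying the alternation on the parity of the source node is an assumption about how "alternating for each node" is implemented.

```python
def ring_direction(src: int, dst: int, n: int) -> str:
    """Return '+', '-' or 'local' for the shortest path from src to dst on a ring of n nodes.

    Hypothetical tie-break: when both directions are equally long, even source
    nodes go '+' and odd source nodes go '-', so neighbours pick opposite sides.
    """
    if src == dst:
        return "local"
    forward = (dst - src) % n   # hops in the + direction
    backward = (src - dst) % n  # hops in the - direction
    if forward < backward:
        return "+"
    if backward < forward:
        return "-"
    return "+" if src % 2 == 0 else "-"  # tie: alternate by node


print(ring_direction(0, 2, 4), ring_direction(1, 3, 4))  # + -
```

In a 4-router ring, nodes 0 and 1 both face a two-hop tie but send in opposite directions, so the two halves of the ring share the load.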
Usage:

- Router: `./gen_router_table.py <num_inputs> <num_outputs> <file_prefix>`
- Mesh: `./gen_mesh_table.py <num_rows> <num_cols> <file_prefix>`
- Double Ring: `./gen_double_ring_table.py <num_routers> <file_prefix>`
- Ring: `./gen_ring_table.py <num_routers> <file_prefix>`
- Directional Torus: `./gen_dtorus_table.py <num_rows> <num_cols> <file_prefix>`
- Torus: `./gen_torus_table.py <num_rows> <num_cols> <file_prefix>`
- Fully Connected: `./gen_fully_connected_table.py <num_routers> <concentration_factor> <file_prefix>`
- Fat Tree: `./gen_fattree_table.py <K> <N> <file_prefix>`
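The X-Y dimension-ordered routing targeted by `gen_mesh_table.py` can be sketched as follows: a packet first moves along its row until it reaches the destination column, then along the column to the destination row. The port numbering, the endpoint numbering (`row * num_cols + col`), and the one-hex-value-per-line output are assumptions for illustration, not the script's actual format.

```python
# Hypothetical port numbering; the real router may order its ports differently.
LOCAL, NORTH, EAST, SOUTH, WEST = range(5)


def xy_route(row: int, col: int, dst_row: int, dst_col: int) -> int:
    """Output port under X-Y routing: fix the column (X) first, then the row (Y)."""
    if dst_col > col:
        return EAST
    if dst_col < col:
        return WEST
    if dst_row > row:
        return SOUTH  # assumes row indices grow towards the south
    if dst_row < row:
        return NORTH
    return LOCAL


def mesh_table(row: int, col: int, num_rows: int, num_cols: int) -> list[int]:
    """Routing table for router (row, col): one entry per endpoint r * num_cols + c."""
    return [xy_route(row, col, r, c) for r in range(num_rows) for c in range(num_cols)]


def to_hex(table: list[int]) -> str:
    """One hex word per line, a layout that $readmemh can read."""
    return "\n".join(f"{port:x}" for port in table) + "\n"


print(to_hex(mesh_table(0, 0, 2, 2)), end="")
```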
Describes a Mesh NoC using the router interface for IO pairs.
| Parameter | Description |
|---|---|
| NUM_ROWS | Number of rows in the mesh |
| NUM_COLS | Number of columns in the mesh |
| PIPELINE_LINKS | Number of pipeline registers to add to the links between routers. More registers delay the return of credits, so a larger flit buffer may be required to avoid dead cycles |
| ROUTING_TABLE_PREFIX | Prefix of the location of hex files containing the routing tables. Tables follow the format prefix/i_j.hex for router at row i and column j |
| OPTIMIZE_FOR_ROUTING | The only supported value is "XY", which disables the router crossbar turns that XY routing never uses |
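To see why deeper link pipelining can call for a larger flit buffer, the sketch below models credit-based flow control on a single link. It is a simplified, hypothetical model (the function name and `link_delay` are illustrative): a flit takes `link_delay` cycles to reach the receiver, the receiver frees the buffer slot immediately, and the credit takes another `link_delay` cycles to come back, so the credit round trip is `2 * link_delay` cycles. Router pipeline stages are ignored.

```python
from collections import Counter


def credit_throughput(buffer_depth: int, link_delay: int, cycles: int = 1000) -> float:
    """Fraction of cycles in which the sender can push a flit.

    Simplified model: the receiver drains every flit on arrival, and both the
    flit and its returning credit spend `link_delay` cycles on the link.
    """
    if link_delay < 1:
        raise ValueError("link_delay must be at least 1 cycle")
    credits = buffer_depth       # one credit per free slot in the receiver's buffer
    credit_arrivals = Counter()  # cycle -> number of credits arriving back then
    sent = 0
    for t in range(cycles):
        credits += credit_arrivals.pop(t, 0)  # returned credits are usable at once
        if credits > 0:                       # only send when a slot is guaranteed
            credits -= 1
            sent += 1
            credit_arrivals[t + 2 * link_delay] += 1
    return sent / cycles


print(credit_throughput(buffer_depth=4, link_delay=2))  # 1.0
print(credit_throughput(buffer_depth=2, link_delay=2))  # 0.5
```

In this model the steady-state throughput is `min(1, buffer_depth / (2 * link_delay))`: once the buffer no longer covers the round trip, the sender idles while credits are in flight, and each extra cycle of link delay raises the depth needed for full throughput by two.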
Describes a Torus NoC using the router interface for IO pairs.
| Parameter | Description |
|---|---|
| NUM_ROWS | Number of rows in the torus |
| NUM_COLS | Number of columns in the torus |
| PIPELINE_LINKS | Number of pipeline registers to add to the links between routers. More registers delay the return of credits, so a larger flit buffer may be required to avoid dead cycles |
| EXTRA_PIPELINE_LONG_LINKS | Number of extra pipeline registers to add to the links that wrap around (adds to PIPELINE_LINKS) |
| ROUTING_TABLE_PREFIX | Prefix of the location of hex files containing the routing tables. Tables follow the format prefix/i_j.hex for router at row i and column j |
| OPTIMIZE_FOR_ROUTING | The only supported value is "XY", which disables the router crossbar turns that XY routing never uses |
Describes a Directional Torus NoC using the router interface for IO pairs. Without loss of generality, links go W -> E and N -> S and wrap around at the edges.
| Parameter | Description |
|---|---|
| NUM_ROWS | Number of rows in the directional torus |
| NUM_COLS | Number of columns in the directional torus |
| PIPELINE_LINKS | Number of pipeline registers to add to the links between routers. More registers delay the return of credits, so a larger flit buffer may be required to avoid dead cycles |
| EXTRA_PIPELINE_LONG_LINKS | Number of extra pipeline registers to add to the links that wrap around (adds to PIPELINE_LINKS) |
| ROUTING_TABLE_PREFIX | Prefix of the location of hex files containing the routing tables. Tables follow the format prefix/i_j.hex for router at row i and column j |
| OPTIMIZE_FOR_ROUTING | The only supported value is "XY", which disables the router crossbar turns that XY routing never uses |
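Because links only run W -> E and N -> S, a destination to the west or north of the source is reached by wrapping around. Under XY routing, the hop counts can be computed as in this sketch, which assumes that column indices grow towards the east and row indices towards the south; the function name is hypothetical.

```python
def dtorus_hops(src: tuple[int, int], dst: tuple[int, int],
                num_rows: int, num_cols: int) -> tuple[int, int]:
    """Return (east_hops, south_hops) from src=(row, col) to dst=(row, col).

    With unidirectional, wrapping links, the distance in each dimension is the
    forward difference modulo the dimension size.
    """
    east = (dst[1] - src[1]) % num_cols   # X first: travel east, wrapping at the edge
    south = (dst[0] - src[0]) % num_rows  # then Y: travel south, wrapping at the edge
    return east, south


print(dtorus_hops((0, 3), (0, 0), 4, 4))  # (1, 0): one hop over the wrap-around link
print(dtorus_hops((0, 0), (0, 3), 4, 4))  # (3, 0): no westward link, so go east 3 hops
```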
Describes a (double-)ring NoC using the router interface for IO pairs.
| Parameter | Description |
|---|---|
| NUM_ROUTERS | Number of routers in the ring |
| ROUTING_TABLE_PREFIX | Prefix of the location of hex files containing the routing tables. Tables follow the format prefix/i.hex for router i |
Describes a fully connected NoC where every router has a direct point-to-point link to every other router. Each router supports a configurable number of local endpoints (concentration factor). Uses the router interface for IO pairs.
| Parameter | Description |
|---|---|
| NUM_ROUTERS | Number of routers in the network. Total endpoints = NUM_ROUTERS × CONCENTRATION_FACTOR |
| CONCENTRATION_FACTOR | Number of local endpoints per router |
| PIPELINE_LINKS | Number of pipeline registers to add to the inter-router links. More registers delay the return of credits, so a larger flit buffer may be required to avoid dead cycles |
| ROUTING_TABLE_PREFIX | Prefix of the location of hex files containing the routing tables. Tables follow the format prefix/i.hex for router i |
| OPTIMIZE_FOR_ROUTING | Set to "MINIMAL" to disable network-to-network turns in the router crossbars, enforcing minimal (single-hop) routing; in a fully connected topology, packets never need to transit an intermediate router |
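The endpoint count and the single-hop property of MINIMAL routing can be sketched as follows. The numbering scheme (endpoints assigned to routers in consecutive blocks of `CONCENTRATION_FACTOR`) and the function names are assumptions for illustration; the RTL may number endpoints differently.

```python
def endpoint_location(endpoint: int, num_routers: int,
                      concentration_factor: int) -> tuple[int, int]:
    """Map a global endpoint id to (router, local_port), assuming block numbering."""
    total = num_routers * concentration_factor  # NUM_ROUTERS x CONCENTRATION_FACTOR
    if not 0 <= endpoint < total:
        raise ValueError(f"endpoint must be in [0, {total})")
    return divmod(endpoint, concentration_factor)


def router_hops(src: int, dst: int, num_routers: int, concentration_factor: int) -> int:
    """Inter-router links crossed under minimal routing: 0 on the same router, else 1."""
    src_router, _ = endpoint_location(src, num_routers, concentration_factor)
    dst_router, _ = endpoint_location(dst, num_routers, concentration_factor)
    return 0 if src_router == dst_router else 1


print(endpoint_location(5, num_routers=4, concentration_factor=2))  # (2, 1)
```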
Describes a fat-tree NoC using the router interface for IO pairs.
| Parameter | Description |
|---|---|
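| K | The K parameter of the fat tree (the `<K>` argument of `gen_fattree_table.py`) |
| N | The N parameter of the fat tree (the `<N>` argument of `gen_fattree_table.py`) |
| ROUTING_TABLE_PREFIX | Prefix of the location of hex files containing the routing tables. Tables follow the format prefix/level_router.hex for the router at stage level and coordinate router |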
| OPTIMIZE_FOR_ROUTING | Set to "MINIMAL" to disable upward-to-upward turns in the intermediate switches |
Describes a parametrizable router featuring an input-independent, output-based routing table for deterministic routing, virtual links (which ensure that packets are not interrupted), and full crossbar support. The router uses wormhole routing and credit-based flow control.
| Signal | I/O | Description |
|---|---|---|
| clk | I | Clock |
| rst_n | I | Synchronous active-low reset |
| data_in | I | Flit Data |
| dest_in | I | Destination |
| is_tail_in | I | Asserted when the flit is the tail flit of a packet |
| send_in | I | Push (Router must accept - credit-based flow control) |
| credit_out | O | Send credits |
| data_out | O | Flit Data |
| dest_out | O | Destination |
| is_tail_out | O | Asserted when the flit is the tail flit of a packet |
| send_out | O | Push (Output must accept - credit-based flow control) |
| credit_in | I | Receive credits |
| DISABLE_TURNS | I | Default set to 0, used to disable turns in the router crossbar |
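The virtual-link behaviour, where an output stays allocated to one input from a packet's head flit until its tail flit (`is_tail`), can be modelled per output port as below. This is a hypothetical behavioural sketch, not the RTL: the class name is illustrative, and round-robin arbitration between waiting inputs is an assumption, since the arbitration policy is not specified here.

```python
from typing import Optional


class OutputPort:
    """Behavioural model of one router output with virtual links (wormhole switching)."""

    def __init__(self, num_inputs: int):
        self.num_inputs = num_inputs
        self.owner: Optional[int] = None  # input currently holding this output
        self.last_grant = num_inputs - 1  # round-robin pointer (assumed policy)

    def cycle(self, requests: dict[int, bool]) -> Optional[int]:
        """Advance one cycle.

        `requests` maps an input index to the is_tail flag of the flit it offers
        to this output. Returns the input whose flit is forwarded, or None.
        """
        if self.owner is None:
            # Output is free: grant the next requesting input after the last winner.
            for offset in range(1, self.num_inputs + 1):
                candidate = (self.last_grant + offset) % self.num_inputs
                if candidate in requests:
                    self.owner = candidate
                    self.last_grant = candidate
                    break
        if self.owner is None or self.owner not in requests:
            return None  # idle, or the owning packet has no flit ready this cycle
        winner = self.owner
        if requests[winner]:
            self.owner = None  # the tail flit releases the output
        return winner


port = OutputPort(num_inputs=2)
print(port.cycle({0: False, 1: False}))  # 0: input 0 wins and locks the output
print(port.cycle({0: True, 1: False}))   # 0: tail flit of input 0, lock released
print(port.cycle({1: True}))             # 1: input 1 can now send
```

Because the lock is only released by a tail flit, flits of two packets are never interleaved on the same output.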
| Parameter | Description |
|---|---|
| NOC_NUM_ENDPOINTS | Number of endpoints in the NoC the router is a part of |
| ROUTING_TABLE_HEX | Path to the hex file containing the routing table (loaded with `$readmemh`) |
| NUM_INPUTS | Number of inputs of the router |
| NUM_OUTPUTS | Number of outputs of the router |
| DEST_WIDTH | Width of the destination input (can be larger than actual destination, only the lowest bits are used for destination decoding) |
| FLIT_WIDTH | Data width of the router |
| FLIT_BUFFER_DEPTH | Input flit buffer depth |
| PIPELINE_ROUTE_COMPUTE | Splits route computation into a separate pipeline stage |
| PIPELINE_ARBITER | Splits arbitration and switch traversal into separate pipeline stages |
| PIPELINE_OUTPUT | Adds an extra pipeline stage at the output of the router |
| FORCE_MLAB | Forces the flit buffers to use MLABs (LUTRAM) instead of M20Ks (BRAM) (Advanced) |
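The sketch below illustrates an input-independent routing table lookup: one table, shared by every input, indexed by the low bits of the `DEST_WIDTH`-wide destination. The function names are hypothetical, and two details are assumptions: that the number of bits used is `$clog2(NOC_NUM_ENDPOINTS)`, and that each `$readmemh` word is the output-port index for the destination at that address. The file actually produced by `gen_router_table.py` may use a different encoding.

```python
def clog2(n: int) -> int:
    """Ceiling log2, matching SystemVerilog's $clog2 for n >= 1."""
    return (n - 1).bit_length()


def decode_dest(dest: int, noc_num_endpoints: int) -> int:
    """Keep only the low clog2(NOC_NUM_ENDPOINTS) bits of the destination."""
    return dest & ((1 << clog2(noc_num_endpoints)) - 1)


def load_readmemh(text: str) -> list[int]:
    """Parse whitespace-separated hex words, ignoring // comments (no @address support)."""
    words = []
    for line in text.splitlines():
        code = line.split("//", 1)[0]
        words.extend(int(tok, 16) for tok in code.split())
    return words


def route(table: list[int], dest: int, noc_num_endpoints: int) -> int:
    """Output port for a destination; the same table serves every input."""
    return table[decode_dest(dest, noc_num_endpoints)]


table = load_readmemh("3\n2 // endpoint 1\n1\n0\n")
print(route(table, 0xE, noc_num_endpoints=4))  # 1: only the low 2 bits (0b10) are used
```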
This repo also provides an AXI-Stream wrapper that gives the NoC a simple AXI-Stream interface and performs data-width conversion and clock-domain crossing. It contains shims that convert the credit-based NoC ports to AXI-Stream through a dual-clock asynchronous FIFO.
Topology-specific NoC parameters are the same as in the NoC section and are not repeated here.
| Signal | I/O | Description |
|---|---|---|
| clk_noc | I | NoC clock domain |
| clk_usr | I | User clock domain |
| rst_n | I | Asynchronous active-low reset |
| axis_in_tvalid | I | Signals that the input is driving a valid transfer. A transfer takes place when both tvalid and tready are asserted |
| axis_in_tready | O | Signals that the NoC is ready to accept data |
| axis_in_tdata | I | Primary payload containing a flit |
| axis_in_tlast | I | Indicates the boundary of a packet |
| axis_in_tid | I | Data stream identifier |
| axis_in_tdest | I | Routing information for the data stream |
| axis_out_tvalid | O | Signals that the NoC is driving a valid transfer. A transfer takes place when both tvalid and tready are asserted |
| axis_out_tready | I | Signals that the user is ready to accept data |
| axis_out_tdata | O | Primary payload containing a flit |
| axis_out_tlast | O | Indicates the boundary of a packet |
| axis_out_tid | O | Data stream identifier |
| axis_out_tdest | O | Routing information for the data stream |
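The handshake rule in the table (a beat is transferred only in a cycle where both `tvalid` and `tready` are high) can be illustrated with a small cycle-level model of a source sending one packet. The function is hypothetical; it follows the AXI-Stream rule that a source holding `tvalid` high keeps `tdata` stable until the beat is accepted, and it raises `tlast` on the final beat.

```python
def stream_packet(beats: list[int], tready: list[bool]) -> list[tuple[int, int, bool]]:
    """Send one packet over AXI-Stream; tready[i] is the sink's tready in cycle i.

    Returns (cycle, tdata, tlast) for every beat that was actually transferred.
    """
    transfers = []
    idx = 0
    for cycle, ready in enumerate(tready):
        if idx == len(beats):
            break                          # packet finished
        tvalid = True                      # the source always has a beat pending
        tdata = beats[idx]                 # held stable until the handshake
        tlast = idx == len(beats) - 1      # marks the final beat of the packet
        if tvalid and ready:               # transfer only when both are high
            transfers.append((cycle, tdata, tlast))
            idx += 1
    return transfers


print(stream_packet([10, 20, 30], [True, False, True, True]))
# [(0, 10, False), (2, 20, False), (3, 30, True)]
```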
| Parameter | Description |
|---|---|
| RESET_SYNC_EXTEND_CYCLES | Number of cycles by which to extend the synchronized reset (may help with debouncing, depending on how the reset is generated) |
| RESET_NUM_OUTPUT_REGISTERS | Number of output registers on the reset path to help timing (as a result, the NoC is not ready immediately after reset is released) |
| TID_WIDTH | Width of AXI-Stream tid signal |
| TDEST_WIDTH | Width of AXI-Stream tdest signal |
| TDATA_WIDTH | Width of AXI-Stream tdata signal |
| SERIALIZATION_FACTOR | Factor to serialize in the user clock domain (doesn't use memory bits) |
| CLKCROSS_FACTOR | Factor to serialize while crossing from the user to the NoC clock domain (uses mixed-width DC FIFO) |
| SINGLE_CLOCK | (0 / 1) Specifies whether the NoC and user clocks are the same (uses a single-clock FIFO instead of a dual-clock FIFO) |
| SERDES_IN_BUFFER_DEPTH | Serializer buffer depth (in units of TDATA_WIDTH words) |
| SERDES_OUT_BUFFER_DEPTH | Deserializer buffer depth (in units of TDATA_WIDTH words) |
| SERDES_EXTRA_SYNC_STAGES | Asynchronous FIFO extra metastability synchronization stages (-2 disables synchronization and may be used for synchronized clocks) |
| SERDES_FORCE_MLAB | Forces the buffers in the serdes modules to use MLABs (LUTRAM) instead of M20Ks (BRAM) if possible (mixed-width dual-clock FIFO does not support this) |
| FLIT_BUFFER_DEPTH | See NoC parameters |
| ROUTING_TABLE_PREFIX | See NoC parameters |
| DISABLE_SELFLOOP | See NoC parameters |
| ROUTER_PIPELINE_OUTPUT | See NoC parameters |
| ROUTER_FORCE_MLAB | See NoC parameters |
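Both SERIALIZATION_FACTOR and CLKCROSS_FACTOR split a wide word into several narrower words. The sketch below shows that width conversion in isolation. The helper names are hypothetical, and the least-significant-piece-first ordering and the requirement that the width divide evenly are assumptions; the shims may order the pieces differently.

```python
def serialize(word: int, word_width: int, factor: int) -> list[int]:
    """Split a word_width-bit word into `factor` pieces, least-significant piece first."""
    if word_width % factor:
        raise ValueError("word_width must be divisible by factor")
    piece_width = word_width // factor
    mask = (1 << piece_width) - 1
    return [(word >> (i * piece_width)) & mask for i in range(factor)]


def deserialize(pieces: list[int], word_width: int) -> int:
    """Reassemble pieces produced by serialize() into one word."""
    piece_width = word_width // len(pieces)
    word = 0
    for i, piece in enumerate(pieces):
        word |= piece << (i * piece_width)
    return word


flits = serialize(0xAABBCCDD, word_width=32, factor=4)
print([hex(f) for f in flits])     # ['0xdd', '0xcc', '0xbb', '0xaa']
print(hex(deserialize(flits, 32)))  # 0xaabbccdd
```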
Contains the modules `axis_serializer_shim_in` and `axis_deserializer_shim_out`, which form the input and output shims, respectively. Their parameters are forwarded from the top level and are described with the AXI-S NoC parameters above.