dual port ram changes migration #1
Open
mjao1 wants to merge 1 commit into
Dual Port RAM
Using dual port RAM for certain FUs is beneficial because many operations in ternip_rms.sv and ternip_rowwise_operations.sv issue back-to-back reads and writes.
The single memory request path was replaced with a dual port memory backend, and ternip_core plus certain FUs were rewired so that reads and writes can happen in parallel:
New memory
Vector register file uses dual port backend
- port A: request_* + read_*
- port B: request2_* + read2_*
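
To illustrate why a second port helps with back-to-back reads and writes, here is a minimal behavioral sketch in Python. It is not the RTL from this PR. The class `DualPortRam`, the zero-latency read, the read-before-write ordering, and the "scale a vector" workload are all assumptions made for illustration. Port A carries the reads and port B carries the writes, mirroring the description above.

```python
class DualPortRam:
    """Hypothetical behavioral model: two ports, one request per port per cycle."""

    def __init__(self, depth):
        self.mem = [0] * depth

    def cycle(self, port_a=None, port_b=None):
        """Run one cycle. A request is None, ("read", addr) or ("write", addr, data).

        Assumption: reads are sampled before writes are applied (read-first),
        and read data is available in the same cycle (no read latency).
        """
        reads = tuple(
            self.mem[req[1]] if req is not None and req[0] == "read" else None
            for req in (port_a, port_b)
        )
        for req in (port_a, port_b):
            if req is not None and req[0] == "write":
                self.mem[req[1]] = req[2]
        return reads


def scale_single_port(ram, src, dst, n, k):
    """Read src[i], then write dst[i] = k * src[i], using only port A."""
    cycles = 0
    for i in range(n):
        val, _ = ram.cycle(port_a=("read", src + i))
        cycles += 1
        ram.cycle(port_a=("write", dst + i, val * k))
        cycles += 1
    return cycles


def scale_dual_port(ram, src, dst, n, k):
    """Same workload, but the write of element i-1 overlaps the read of element i."""
    cycles = 0
    pending = None  # (addr, data) produced last cycle, waiting to be written
    for i in range(n + 1):
        rd = ("read", src + i) if i < n else None
        wr = ("write", pending[0], pending[1]) if pending else None
        val, _ = ram.cycle(port_a=rd, port_b=wr)
        cycles += 1
        pending = (dst + i, val * k) if i < n else None
    return cycles


if __name__ == "__main__":
    a, b = DualPortRam(32), DualPortRam(32)
    for i in range(8):
        a.mem[i] = b.mem[i] = i + 1
    print(scale_single_port(a, 0, 16, 8, 3))  # 16 cycles
    print(scale_dual_port(b, 0, 16, 8, 3))    # 9 cycles
```

In this toy model the single port version needs 2n cycles and the dual port version needs n + 1. The real FUs have pipeline latency and other phases, so the measured speedups are smaller than this idealized bound.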
Core wiring/arbitration
- existing FU request network still drives port A (vector_request_*)
- new port B network vector_request2_* is arbitrated between RMS and rowwise_operation
- read2_* is currently tied off (port B is used for extra request bandwidth, primarily writes)
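
The port B arbitration between the two FUs can be pictured with a small sketch. The policy used in ternip_core is not stated here, so the fixed priority below (RMS first) is purely an assumption, as are the function name `arbitrate_port_b` and the request encoding.

```python
def arbitrate_port_b(rms_req, rowwise_req):
    """Hypothetical fixed-priority arbiter for the port B request network.

    Each request is None (idle) or any request object.
    Returns (granted_request, rms_grant, rowwise_grant).
    Assumption: RMS wins when both FUs request in the same cycle.
    """
    if rms_req is not None:
        return rms_req, True, False
    if rowwise_req is not None:
        return rowwise_req, False, True
    return None, False, False


if __name__ == "__main__":
    # Only rowwise is requesting, so it is granted port B this cycle.
    print(arbitrate_port_b(None, ("write", 4, 7)))
```

An FU that is not granted would have to hold its request until a later cycle. Whether both FUs can actually request in the same cycle is not specified in this description.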
FU logic updated to use port B where it matters most
Cycle count
Measured speedups (generic, Verilator)
Measured speedups (xc7a200t_D=1024_OneCore, Verilator)
Measured speedups (xc7a200t_D=1024_MaxCores, Verilator)
Measured speedups (xcu250_D=1024_OneCore, Verilator)
Measured speedups (xcu250_D=1024_MaxCores, Verilator)
Measured speedups (xcu250_D=2048_OneCore, Verilator)
Measured speedups (xcu250_D=2048_MaxCores, Verilator)
Measured speedups (xcu250_D=2560_OneCore, Verilator)
Measured speedups (xcu250_D=2560_MaxCores, Verilator)
(Excluding generic)
Average rms speedup: 1.165x
Average rowwise speedup: 1.259x
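
For reference, a speedup figure of this kind is the single port cycle count divided by the dual port cycle count, averaged over configurations. The sketch below assumes an arithmetic mean (the averaging method is not stated above), and the cycle counts in it are made-up placeholders, not measurements from this PR.

```python
from statistics import mean


def speedup(baseline_cycles, new_cycles):
    """Speedup of the new design relative to the baseline (higher is better)."""
    return baseline_cycles / new_cycles


def average_speedup(pairs):
    """Arithmetic mean of per-configuration speedups (assumed averaging method)."""
    return mean(speedup(base, new) for base, new in pairs)


if __name__ == "__main__":
    # Placeholder (baseline, dual port) cycle counts for two configurations.
    placeholder = [(1200, 1000), (1500, 1200)]
    print(f"{average_speedup(placeholder):.3f}x")
```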
Per-FU phase speedups (xcu250_D=1024_OneCore, Verilator)
Timing
xcu250_D=1024_OneCore
xc7a200t_D=1024_OneCore
Utilization
xcu250_D=1024_OneCore
xc7a200t_D=1024_OneCore
Power
xcu250_D=1024_OneCore
xc7a200t_D=1024_OneCore
Routing
xcu250_D=1024_OneCore
xc7a200t_D=1024_OneCore
Note: Single port baselines were measured from ternary_matmul commit 8ddfcc6378f4aeb1daa2b684609b62b91bf13c8d