-
Notifications
You must be signed in to change notification settings - Fork 754
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[sram_ctrl/rtl] Timing fixes #22497
[sram_ctrl/rtl] Timing fixes #22497
Conversation
79a4896
to
e365526
Compare
e365526
to
57b1767
Compare
Got the block level synth running I've identified another feedthrough via the opentitan/hw/ip/tlul/rtl/tlul_adapter_sram.sv Line 311 in 2feef77
I think it'll be shallower feedthrough than the ones I've eliminated but best we get rid of them all. I can get rid of it but it will be a little more work, in particular may involve some extra DV work. |
As this second path is expected to be shallower and involves more DV work, should we first try to finalize the fix for the first and see how this impacts the actual chip timing? |
In parallel I'm fixing up the CI issues for these current fixes so they should be good to go once they've been reviewed. Happy to trial how that impacts the chip timing alone but I would like to continue with this other fix. Long feedthroughs starting at Ibex going through the crossbar and then through SRAM and all the way back in Ibex generally aren't good. With them gone the implementation flow may find it has more placement freedom and wants to put in fewer buffers giving better QoR. In particular could give us some modest area savings that we'll appreciate having. |
This adds a new output to tlul_adapter_sram to signal when a read-modify-write operation is pending. This is used in sram_ctrl to fix a timing path that factored a tilelink write enable signal to a tilelink ready signal. Signed-off-by: Greg Chadwick <gac@lowrisc.org>
Previously the grant signal from the SRAM primitive was used directly to calculate a tilelink ready signal. As the grant factors in this request this produces a valid -> ready timing path. This causes timing issues and is not required. This commit refactors the logic to avoid using the SRAM grant to calculate the tilelink ready. Signed-off-by: Greg Chadwick <gac@lowrisc.org>
57b1767
to
ebd50e7
Compare
After a bit more investigation this new path may not be that much shallower than the previous ones anyway, so I think worth fixing it before doing a new implementation trial. |
(reg2hw.status.escalated.q) ? (tl_gate_resp_pending & tlul_we) : 1'b1; | ||
assign key_valid = | ||
(key_req_pending_q) ? 1'b0 : | ||
(reg2hw.status.escalated.q) ? (tl_gate_resp_pending & sram_rmw_in_progress) : 1'b1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need to take tlul_we
or sram_rmw_in_progress
into account for key_vaiid
in the escalated case? As far as I can see, in the escalated case key_valid
is only used to make prim_ram_1p_scr
grant requests -- can't we make it grant all requests, including reads?
The reason I'm asking is because the change would be much simpler, and it would also mean less code to maintain
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess it's a defence in depth? With this setup we've both got a TLUL gate and the clearing of key_valid
from stopping a transaction acting on the SRAM after escalation. If key_valid
is always true other than when key_req_pending_q
is set then after an escalation any transaction that reaches the tlul_sram_adapter
will get acted upon.
I note there's an explicit clone of the escalation signal, with one clone setting the escalated bit (which causes key_valid
is go low) and the other clone closing the TLUL gate:
opentitan/hw/ip/sram_ctrl/rtl/sram_ctrl.sv
Lines 164 to 172 in 9f4903e
lc_tx_t [1:0] escalate_en; | |
prim_lc_sync #( | |
.NumCopies (2) | |
) u_prim_lc_sync ( | |
.clk_i, | |
.rst_ni, | |
.lc_en_i (lc_escalate_en_i), | |
.lc_en_o (escalate_en) | |
); |
There's some discussion on these PRs: #11863 and #17072
There's also the argument we want to change behaviour as little as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the explanation. I think defense in depth and changing behavior as little as possible are sufficient arguments for making the change this way
// a request from tlul_req will be granted regardless of whether a request exists. This is done | ||
// for timing reasons so that the output TL ready isn't combinatorially connected to the incoming | ||
// TL valid. In particular we must not use `sram_gnt` in the expression to avoid this. | ||
assign tlul_gnt = key_valid & ~init_req; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM based on the discussions in this PR. Thanks @GregAC!
Please don't merge this yet, AFAIK Greg found another path he is currently working on improving. |
Yes I've got two more fixes inbound, currently investigating some regression failures and then will update this PR with them. |
Since we are agreed on this, we could just merge this one for now and create a follow-up PR for the other improvements (tracked against #22462)? |
Regression issues I'm looking at may be related to these changes so I'd prefer to at least understand those before we merge this. |
Failures look to be related to my new work so merging this now |
I plan to put together a hacky block synthesis for sram_ctrl so I can check I've eliminated all of the feedthroughs that were causing issues. Will work on that tomorrow.
Contributes to resolving #22462.