[otbn] Prefetch stage proposal #8898
Comments
@felixmiller @imphil I'd be particularly interested to hear if there's any easy way to reduce the |
@GregAC: Do you have any data on which |
The stats output from the ISS gives frequency of various function calls. I've taken a look at that to see which most heavily used functions use
There is a limited set of limb counts we'd probably use: 2, 4, 8, 12, corresponding to 512, 1024, 2048, 3072 bits, so we could just do unrolled versions of It looks like
I think I can hack up |
I was wrong about the After a little hacking on I tried replacing the inner loop in Without touching I would say it's a shame the concept of a |
Following discussion in the security meeting we will be going ahead with implementing this. We'll implement the initial proposal without any extras to improve |
To implement blanking for the register file in OTBN it is necessary to supply the register file inputs directly from flops. Otherwise glitches from the decode logic may produce the power signatures the blanking is aiming to prevent.
To achieve this in OTBN's single-stage design, an extra prefetch stage needs to be added. This will fetch the next instruction a cycle earlier, allowing some pre-decode logic to determine the register file inputs for that instruction so they can be flopped ready for its execution.
To minimize disruption to the existing RTL this can be implemented within the otbn_instruction_fetch module with only minor interface modifications:
A new set of flops will be added to otbn_instruction_fetch holding the address of a prefetched instruction. The address of the incoming fetch request, insn_fetch_req_addr, will be compared against the flopped prefetch address. If they match, the incoming prefetched instruction data from imem will be provided as the insn_fetch_resp_data output the following cycle (having been flopped within otbn_instruction_fetch). This effectively matches the behaviour of the existing instruction memory interface, except that a request can now get an invalid response (in which case the instruction data must be ignored and no instruction executed).
The next address to prefetch will be decided as follows:
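As a rough illustration of the address compare described above, here is a minimal behavioural sketch in Python. It is not the RTL and not part of the proposal: the class and field names (PrefetchModel, FetchResponse) are hypothetical, imem is modelled as a plain dictionary, and the next prefetch address is passed in as an explicit input so that the sketch assumes nothing about how that address is chosen.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class FetchResponse:
    valid: bool           # False: ignore the data and execute no instruction
    data: Optional[int]   # instruction word when valid, otherwise None


class PrefetchModel:
    """Toy cycle-level model of the prefetch address compare (hypothetical)."""

    def __init__(self, imem: Dict[int, int]) -> None:
        self.imem = imem
        # Flopped address of the instruction currently being prefetched.
        self.prefetch_addr: Optional[int] = None

    def step(self, insn_fetch_req_addr: int, next_prefetch_addr: int) -> FetchResponse:
        """Advance one clock; return the response seen in the following cycle."""
        # Compare the incoming request against the flopped prefetch address.
        hit = (self.prefetch_addr is not None
               and insn_fetch_req_addr == self.prefetch_addr)
        # On a hit the data arriving from imem is flopped and becomes the response.
        data = self.imem[self.prefetch_addr] if hit else None
        # Flop the address of the next prefetch (chosen by the caller).
        self.prefetch_addr = next_prefetch_addr
        return FetchResponse(valid=hit, data=data)
```

In this model a straight-line sequence always hits, while a request for an address other than the prefetched one (for example a branch target) returns an invalid response for one cycle.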
Design Impact
Minimal change will be required outside of otbn_instruction_fetch. Looking at the RTL, OTBN should already behave appropriately when a fetched instruction returns an invalid response; this is just something that cannot occur in the current design. Area impact is insignificant and it should improve timing.
Verification Impact
New coverpoints will be required for the cases where invalid instructions are seen (only after a branch). It is likely the existing RIG will hit these. Minor ISS changes will be required to deal with the new instruction latencies.
Performance Impact
BN.SID and BN.MOVR will have to become two-cycle instructions, as the output of the base register file cannot be fed directly into the bignum register file. BN.LID doesn't need an extra cycle as it already takes two cycles, with the bignum register file write occurring in the second cycle.
BN.MOVR is heavily used in our current RSA encode and decode (26% of instructions in rsa_1024_enc_test and 9% of instructions in rsa_1024_dec_test are BN.MOVR). BN.SID and jump instructions are also present but far less frequent. With the proposal above we are in danger of missing our performance targets for RSA without either reduction of BN.MOVR usage in the software or improving BN.MOVR performance with the prefetch stage.
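To get a feel for the scale of the BN.MOVR cost, a back-of-envelope estimate can be sketched as below. The function name and the baseline_cpi parameter are hypothetical. The default assumes one cycle per instruction in the current design, which makes the result an upper bound on the relative increase from BN.MOVR alone, since any existing multi-cycle instruction raises the baseline. The sketch ignores BN.SID and any cycles lost to invalid fetch responses after branches.

```python
def relative_cycle_increase(extra_cycle_fraction: float,
                            baseline_cpi: float = 1.0) -> float:
    """Relative growth in cycle count when a fraction of instructions
    each gain exactly one extra cycle.

    extra_cycle_fraction: share of executed instructions affected (0..1).
    baseline_cpi: assumed average cycles per instruction today (>= 1).
    """
    if not 0.0 <= extra_cycle_fraction <= 1.0:
        raise ValueError("extra_cycle_fraction must be between 0 and 1")
    if baseline_cpi < 1.0:
        raise ValueError("baseline_cpi must be at least 1")
    # Extra cycles per instruction divided by baseline cycles per instruction.
    return extra_cycle_fraction / baseline_cpi


# BN.MOVR fractions quoted above for the two RSA tests.
for name, frac in [("rsa_1024_enc_test", 0.26), ("rsa_1024_dec_test", 0.09)]:
    print(f"{name}: up to {relative_cycle_increase(frac):.0%} more cycles")
```

Under these assumptions the encode test could take up to roughly a quarter more cycles, which is why the proposal flags the RSA performance target as at risk.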
One possibility is to only blank on the bignum register file and perform the base register file read in the prefetch stage. This would mean no extra cycle for BN.MOVR or BN.SID at the cost of more complexity (in particular a forwarding path or extra stalling will be required to deal with the new data hazards). For verification, extra coverpoints would be needed to check the various hazard cases. RIG changes may be needed to hit the coverpoints but the ISS shouldn't be affected.
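The new data hazard arises when the instruction being executed writes a base register that the prefetched instruction reads in the same cycle. A minimal sketch of the check that would select between forwarding/stalling and a plain read is shown below. The function name and arguments are hypothetical, and it assumes that x0 is the hardwired zero register, so writes to it never need forwarding. It deliberately ignores any special-purpose base registers and any address post-increment behaviour.

```python
from typing import Iterable, Optional


def needs_forward_or_stall(exec_rd: Optional[int],
                           prefetch_rs: Iterable[int]) -> bool:
    """Return True if the prefetched instruction would read a stale value.

    exec_rd: base register written by the executing instruction,
             or None if it writes no base register.
    prefetch_rs: base registers read by the prefetched instruction.
    """
    # No base register write, or a write to x0 (assumed hardwired zero).
    if exec_rd is None or exec_rd == 0:
        return False
    # Read-after-write hazard: the prefetch-stage read overlaps the write.
    return exec_rd in set(prefetch_rs)


# Executing instruction writes x5, prefetched instruction reads x5 and x6.
print(needs_forward_or_stall(5, [5, 6]))
```

Each distinct outcome of a check like this (no write, write to x0, matching source, non-matching source) corresponds to one of the hazard cases the extra coverpoints would need to observe.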
p256 & p384 sign & verify make minimal use of the impacted instructions so performance reduction will be insignificant.