[otbn] Prefetch stage proposal #8898
To implement blanking for the register file in OTBN it is necessary to supply the register file inputs directly from flops. Otherwise glitches from the decode logic may produce the power signatures the blanking is aiming to prevent.
To achieve this in OTBN's single-stage design, an extra prefetch stage needs to be added. This will fetch the next instruction a cycle earlier so that some pre-decode logic can determine the register file inputs for that instruction, allowing them to be flopped ready for the instruction's execution.
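As a rough illustration of the idea (not the RTL), the toy Python model below shows register file read addresses being pre-decoded one cycle ahead and captured in flops, so the execute stage only ever sees flopped values. The names `Insn`, `predecode` and `run_with_prefetch` are hypothetical, and the model assumes straight-line code with no branches or stalls.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Insn:
    """Hypothetical instruction: a name plus two source register indices."""
    name: str
    rs1: int
    rs2: int


def predecode(insn: Insn) -> Tuple[int, int]:
    """Pre-decode: extract only the register file read addresses."""
    return insn.rs1, insn.rs2


def run_with_prefetch(program: List[Insn]) -> List[Tuple[int, str, Tuple[int, int]]]:
    """Return (cycle, insn name, flopped read addresses) for each executed insn."""
    trace = []
    insn_q: Optional[Insn] = None                  # flopped instruction
    rf_addr_q: Optional[Tuple[int, int]] = None    # flopped read addresses
    pc = 0
    cycle = 0
    while pc < len(program) or insn_q is not None:
        # Execute stage: register file inputs come straight from flops.
        if insn_q is not None:
            trace.append((cycle, insn_q.name, rf_addr_q))
        # Prefetch stage: fetch the next instruction and pre-decode it.
        if pc < len(program):
            insn_d, rf_addr_d = program[pc], predecode(program[pc])
            pc += 1
        else:
            insn_d, rf_addr_d = None, None
        # Clock edge: capture the prefetch outputs.
        insn_q, rf_addr_q = insn_d, rf_addr_d
        cycle += 1
    return trace
```

In this model the first instruction executes in cycle 1 rather than cycle 0, which is the one-off startup cost of fetching a cycle early.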
To minimize disruption/change with the existing RTL this can be implemented within the
A new set of flops will be added to
The next address to prefetch will be decided as follows:
Minimal change will be required outside of
New coverpoints will be required for the cases where invalid instructions are seen (only after a branch). It is likely the existing RIG will hit these. Minor ISS changes will be required to deal with the new instruction latencies.
BN.SID and BN.MOVR will have to become two-cycle instructions as the output of the base register file cannot be fed directly into the bignum register file. BN.LID doesn't need an extra cycle as it already takes two cycles, with the bignum register file write occurring in the second cycle.
BN.MOVR is heavily used in our current RSA encode and decode (26% of instructions in rsa_1024_enc_test and 9% of instructions in rsa_1024_dec_test are BN.MOVR). BN.SID and jump instructions are also present but far less frequent. With the proposal above we are in danger of missing our performance targets for RSA without either reduction of BN.MOVR usage in the software or improving BN.MOVR performance with the prefetch stage.
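A back-of-the-envelope estimate of the cost can be sketched as follows. The helper `cycle_overhead` is hypothetical, and the default average of one cycle per instruction is an assumption, not a measured figure. Since the true average is at least one, the result is an upper bound on the slowdown from BN.MOVR alone; it ignores the smaller contributions from BN.SID and jumps.

```python
def cycle_overhead(fraction: float, extra_cycles: int = 1, base_cpi: float = 1.0) -> float:
    """Relative increase in cycle count when a class of instructions slows down.

    fraction:     share of executed instructions in the affected class
    extra_cycles: additional cycles each affected instruction now takes
    base_cpi:     average cycles per instruction before the change (assumed)
    """
    return fraction * extra_cycles / base_cpi


# BN.MOVR shares quoted above for the two RSA tests.
for test_name, movr_share in (("rsa_1024_enc_test", 0.26), ("rsa_1024_dec_test", 0.09)):
    print(f"{test_name}: up to {cycle_overhead(movr_share):.0%} more cycles")
```

Under these assumptions the encode test could take up to roughly a quarter more cycles, which is why the BN.MOVR penalty dominates the concern.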
One possibility is to blank only the bignum register file and perform the base register file read in the prefetch stage. This would mean no extra cycle for BN.MOVR or BN.SID, at the cost of more complexity (in particular, a forwarding path or extra stalling will be required to deal with the new data hazards). For verification, extra coverpoints would be needed to check the various hazard cases. RIG changes may be needed to hit the coverpoints but the ISS shouldn't be affected.
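The new data hazard arises when the instruction in the execute stage writes a base register that the prefetched instruction is reading in the same cycle. A minimal sketch of the check, assuming a single write port and treating register 0 as a hardwired zero that never needs forwarding (both assumptions for illustration; the function and parameter names are hypothetical):

```python
from typing import Iterable


def needs_forward_or_stall(exec_wr_en: bool, exec_wr_addr: int,
                           prefetch_rd_addrs: Iterable[int]) -> bool:
    """True if the prefetch-stage read would see a stale base register value."""
    if not exec_wr_en:
        return False   # executing instruction writes nothing
    if exec_wr_addr == 0:
        return False   # assumed hardwired-zero register
    return exec_wr_addr in set(prefetch_rd_addrs)
```

Each distinct way this can return true (match on the first source, match on the second, match on both) is a candidate for one of the extra coverpoints mentioned above.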
p256 & p384 sign & verify make minimal use of the impacted instructions so performance reduction will be insignificant.
The stats output from the ISS gives frequency of various function calls. I've taken a look at that to see which most heavily used functions use
There is a limited set of limb counts we'd probably use: 2, 4, 8 and 12, corresponding to 512, 1024, 2048 and 3072, so we could just do unrolled versions of
It looks like
I think I can hack up
I was wrong about the
After a little hacking on
I tried replacing the inner loop in
I would say it's a shame the concept of a
Following discussion in the security meeting we will be going ahead with implementing this.
We'll implement the initial proposal without any extras to improve