[otbn,simd] Add RTL of SIMD instructions implemented in BN ALU by etterli · Pull Request #29344 · lowRISC/opentitan

etterli · 2026-02-20T09:43:24Z

This PR adds the first part of the SIMD instructions' RTL implementation. It adds the RTL for all instructions implemented in the Bignum ALU. See #29231 for the instruction definition / description.

Note that many regression tests still fail because not all of the new instructions are implemented in RTL yet.

andrea-caforio

This is cool @etterli. I focused on the first commit because that's where
the math is. ;-)

andrea-caforio · 2026-02-20T12:08:45Z

+ * X0 = X[31:0], X1 = X[63:32], ..., X7 = X[255:224], same for Y
+ * Di = Decision by carry bits CXi and CYi
+ *
+ * D7                           D7   D6                             D7   D0


Correct me if I'm wrong but is the first stage (decision) of this diagram
part of this module because the carry bits are generated externally?

Also is D7 D0 correct as the inputs to the first decider stage?

The diagram shows only the selection stage; the decision stage is not depicted. And yes, this module is closely tied to the actual adders in the bignum ALU. I factored it out to hide some complexity (not much, to be honest). The decision bits are derived from the actual carry bits of the two adders, so yes, they are generated externally.

D7 and D0 are the correct inputs to the selection MUX for the lowest 32 bits. Which one applies depends on ELEN (either 32 bit or 256 bit): with ELEN = 32, this chunk uses its own decision D0; with ELEN = 256, the selection is based on D7, the decision for the full 256-bit operation. In the 256-bit case, the MSB carry decides for all chunks which result to take.
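A minimal Python sketch of the selection stage described in this reply may help readers follow it. It assumes eight 32-bit chunks and that a decision bit of 1 selects the Y result; the function names `select_decisions` and `select_result` are hypothetical and do not come from the RTL.

```python
N_CHUNKS = 8  # assumption: 256-bit vector split into 32-bit chunks X0..X7


def select_decisions(decisions, elen_is_256):
    """Return the decision bit that drives each chunk's selection mux.

    decisions[i] models Di. With ELEN = 32 every chunk uses its own Di.
    With ELEN = 256 every chunk uses D7, the decision of the top chunk.
    """
    if len(decisions) != N_CHUNKS:
        raise ValueError("expected one decision bit per chunk")
    if elen_is_256:
        return [decisions[N_CHUNKS - 1]] * N_CHUNKS
    return list(decisions)


def select_result(x_chunks, y_chunks, decisions, elen_is_256):
    """Pick chunk i from adder Y if its decision is 1, else from adder X."""
    sel = select_decisions(decisions, elen_is_256)
    return [y if s else x for x, y, s in zip(x_chunks, y_chunks, sel)]
```

For chunk 0 this reduces to a two-input choice between D0 and D7, which matches the "D7 D0" pair in the diagram.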

andrea-caforio · 2026-02-20T12:12:19Z

+ * The otbn_alu_bignum calculates pseudo modulo addition and subtraction by using two adders and
+ * evaluating their carry bits. Depending on the carry bits adder X or Y is selected as result.
+ *
+ * For addition, subtract mod if a + b >= mod:


Isn't this module in a way independent of the modulus? Because it simply
multiplexes some vector elements. So I'm not sure why the modulus
is mentioned here?

I see your point. However, the selection logic only makes sense in the context of the two adders and what they compute. I don't think this module makes sense in any standalone use.

Would it help if the header introduced this context?

andrea-caforio · 2026-02-20T12:14:44Z

+ * - Adder X calculates X = a + b
+ * - Adder Y calculates Y = X - mod
+ *
+ * - If X generates a carry:


I know what is meant here, but it is still slightly confusing to use the term "carry" here.
It is a decision bit that indicates whether a value is in the interval [0, mod-1] or
[mod, 2*mod-1].

This ties in with my comment on mentioning the modulus here even though
the module is independent of it.

In the current naming, the decision bit is the bit that carries the outcome of this evaluation (0 to take result X, 1 to take result Y). The signal referred to here is the actual carry bit of adder X (which is computed externally).
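To separate the two terms, here is a small Python sketch of pseudo-modular addition on one 32-bit element. The adders follow the header comment (X = a + b, Y = X - mod). Combining the two carries with an OR to form the decision is an assumption made for this illustration, and `addm_element` is a hypothetical name.

```python
W = 32
MASK = (1 << W) - 1


def addm_element(a, b, mod):
    """Pseudo-modular addition of one W-bit element.

    Returns (result, carry_x, carry_y, decision).
    """
    x_full = a + b                      # adder X: X = a + b
    carry_x = x_full >> W               # actual carry bit of adder X
    x = x_full & MASK
    y_full = x + (~mod & MASK) + 1      # adder Y: Y = X - mod, as X + ~mod + 1
    carry_y = y_full >> W               # 1 means no borrow, i.e. X >= mod
    y = y_full & MASK
    decision = carry_x | carry_y        # assumed rule: 1 takes Y, 0 takes X
    return (y if decision else x), carry_x, carry_y, decision
```

The carries are raw adder outputs, while the decision is the derived bit that steers the selection mux.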

andrea-caforio · 2026-02-20T12:17:24Z

+ *
+ * For subtraction, this stage generates an additional signal whether any vector element uses the
+ * result of adder Y. This signal is used for MOD integrity checks and blanking assertions. For
+ * addition this signal is always set as the carries of Y are used for the decisions.


I don't understand this. So this additional signal is only used in the subtraction case for
some security checks? Why not unconditionally set it to 1 like for addition?

I followed the behaviour of the current OTBN. I do not know the design rationale for making this check dependent on the result. Let's discuss this offline.

andrea-caforio · 2026-02-20T12:19:05Z

+
+  // Vector element length type for bignum vec ISA implemented in BN ALU for
+  // bn.addv(m), bn.subv(m) and bn.shv.
+  // The ISA forsees only 4 types (16 to 128 bits). However, only a subset is implemented.


With "4 types (16 to 128 bits)" you mean 16, 32, 64 and 128?

Yes. But this line was updated in the last force push.

andrea-caforio · 2026-02-20T12:25:47Z

+) (
+  input  logic [LVLEN-1:0]     operand_a_i,
+  input  logic [LVLEN-1:0]     operand_b_i,
+  input  logic                 operand_b_invert_i,


This is the indicator bit for performing a subtraction, why not call it like that?

Because executing a subtraction also requires setting the carry-in accordingly (to 1, so that the two's complement is computed correctly). This signal only controls whether operand B is inverted. It could be useful if we ever wanted a one's complement (but I don't think we do).
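The distinction can be checked with a short Python sketch, assuming a single 32-bit adder; `add_with_controls` is a hypothetical helper that mirrors the two separate controls (invert B, carry-in).

```python
W = 32
MASK = (1 << W) - 1


def add_with_controls(a, b, invert_b, carry_in):
    """Model a + (b or ~b) + carry_in, truncated to W bits."""
    op_b = (~b & MASK) if invert_b else b
    return (a + op_b + carry_in) & MASK
```

Inverting B alone yields a - b - 1 (one's complement arithmetic); only together with a carry-in of 1 does the adder compute a - b.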

andrea-caforio · 2026-02-20T12:26:28Z

+ *
+ * This carry chaining allows to compute additions over multiples of LVChunkLEN wide elements
+ * including the full vector width (i.e., a non vectorized addition). To perform subtraction the
+ * input B can be inverted and all carries must be set to 1 as: a - b = a + ~b + 1.


This sounds like the caller has to invert B in the subtraction case but it is handled in this
module?

Would something like this be clearer:

A subtraction can be performed by setting the operand_b_invert_i signal and the input carries to one because: a - b = a + ~b + 1.

Rephrased it
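As an illustration of the carry chaining discussed in this thread, the following Python sketch models a 256-bit adder built from 32-bit chunks. The per-chunk `carries_in` and `use_external_carry` arguments are hypothetical stand-ins for the real control signals, whose exact shape is not shown in this conversation.

```python
VLEN = 256
CHUNK = 32
N = VLEN // CHUNK
CMASK = (1 << CHUNK) - 1


def vec_add(a, b, invert_b, carries_in, use_external_carry):
    """Chunked adder: returns (result, list of per-chunk carry outs).

    Chunk i takes carries_in[i] if use_external_carry[i] is set (start of a
    new element), otherwise the carry out of chunk i-1 (carry chaining).
    """
    if VLEN % CHUNK != 0:
        raise ValueError("VLEN must be a multiple of CHUNK")
    result = 0
    carries_out = []
    prev = 0
    for i in range(N):
        op_a = (a >> (i * CHUNK)) & CMASK
        op_b = (b >> (i * CHUNK)) & CMASK
        if invert_b:
            op_b = ~op_b & CMASK
        cin = carries_in[i] if use_external_carry[i] else prev
        s = op_a + op_b + cin
        prev = s >> CHUNK
        carries_out.append(prev)
        result |= (s & CMASK) << (i * CHUNK)
    return result, carries_out
```

With every chunk using its external carry the model behaves as eight independent 32-bit adders; with only chunk 0 doing so it behaves as one 256-bit adder. Subtraction sets `invert_b` and drives the external carries to 1.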

andrea-caforio · 2026-02-20T12:29:39Z

+/**
+ * OTBN vectorized shifter
+ *
+ * This shifter is capable of shifting vectors elementwise as well as concatenate and shift 256


Maybe you can mention somewhere that these are logical shifts as opposed
to arithmetic ones, which are only supported for the GPR registers.

Mentioned it
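For reference, an element-wise logical shift can be sketched in Python as below. This is a behavioural illustration only; `shv` is a hypothetical function name, and the 32-bit default element length is an assumption.

```python
def shv(vec, amount, right, elen=32, vlen=256):
    """Shift every elen-bit element of vec logically by amount bits."""
    if not 0 <= amount < elen:
        raise ValueError("shift amount must be in [0, elen)")
    mask = (1 << elen) - 1
    out = 0
    for i in range(vlen // elen):
        e = (vec >> (i * elen)) & mask
        # Logical shift: vacated bit positions are filled with zeros.
        e = (e >> amount) if right else ((e << amount) & mask)
        out |= e << (i * elen)
    return out
```

An arithmetic right shift would replicate the top bit of each element instead, so 0x80000001 shifted right by one would become 0xC0000000 rather than 0x40000000.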

andrea-caforio · 2026-02-20T12:32:51Z

+ * This module transposes the elements of two input vectors in two different ways.
+ * It supports 32b, 64b and 128b element lengths.
+ *
+ * If there are two vectors with 4 elements the transpositions are as follows:


Here you should mention that trn1 interleaves even coordinates and trn2 odd ones otherwise
the word transposition is a bit misleading.

Thanks, this is indeed more clear. I updated it.
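A small Python sketch of the even/odd interleaving, operating on lists of elements. The operand order within each output pair (first input, then second input) follows the Arm TRN1/TRN2 convention and is an assumption here, as is the helper name `trn`.

```python
def trn(a_elems, b_elems, is_trn1):
    """Interleave the even (trn1) or odd (trn2) elements of two vectors."""
    if len(a_elems) != len(b_elems) or len(a_elems) % 2 != 0:
        raise ValueError("inputs must have the same, even number of elements")
    offset = 0 if is_trn1 else 1
    out = []
    for i in range(0, len(a_elems), 2):
        out.append(a_elems[i + offset])  # element from the first vector
        out.append(b_elems[i + offset])  # matching element from the second
    return out
```

Applying both operations to the rows of a 2x2 block of elements yields the rows of its transpose, which is where the name comes from.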

vogelpi

Thanks @etterli for your PR, I've reviewed the first commit and will continue later. It's great to see that you thoroughly implemented the feedback from our previous discussions :-)

vogelpi

@etterli , I've now also reviewed the rest. This is fantastic work, well done!

I have mostly nits, a few questions and maybe one or two comments requiring actual work. But this looks really good!

vogelpi · 2026-02-24T21:42:43Z

-              alu_bignum_adder_y_op_a_en       = 1'b1;
-              alu_bignum_adder_y_op_shifter_en = 1'b1;
-              flags_adder_update[flag_group]   = 1'b1;
+              rf_ren_a_bignum                           = 1'b1;


I haven't reviewed in depth whether for every instruction you enable only those parts of the ALU which are really needed. How confident are you that you only enable what is really needed?

I am pretty confident. I just checked this again.

Okay thank you! Also with the reworked output mux, we would now get errors if two data paths were active at the same time.

When reworking the mux I also added an assertion which checks that only one path is active at the same time.

Very nice, thanks!

vogelpi · 2026-02-24T21:47:51Z

+ *    \-----------------------------------------------------------------/
+ *     \---------------------------------------------------------------/
+ *                                     |
+ *                             operation_result_o


Right now, operation_result_o is implemented using a unique case. But if you look at this figure, you can see that, at least for the rightmost inputs, only one input can ever be non-zero. So at least this part of the mux could be implemented using an OR tree rather than a real MUX. This can lead to a notable area reduction, because the MUX is wide.

However, I am not sure if synthesis is smart enough to figure that out given all the pre-decoding and blanking. You may want to add a blanker again to the "Y result" and the "MOD result" inputs (the latter may not be needed) just for this purpose. And then take the operation result mux out of the unique case and manually implement it using a bitwise OR tree.

Yes, this is a smart optimization.

What do you think about ORing the 3 non-arithmetic results and then feeding this combined signal into a 3-to-1 MUX? This way we can save two blankers and their control overhead but still reduce the MUX width. But probably two blankers and one wide OR is more efficient.

The 3-to-1 MUX requires around 1.9kGE whereas with blankers we are at 2.1kGE (excluding the FF and other logic). Both are better than a full MUX which is around 2.4kGE. Estimated using nangate45 values and 2 input gates only. Maybe there are more efficient multi-input gates but it seems to be around the same.

Implemented the 3-to-1 option for now.

We discussed this offline. By slightly modifying the mod_result_selector module, we can replace one of the 256-bit inputs of the final multiplexer with an 8-bit multiplexer, which can even be predecoded. This saves one input in the final multiplexer (the one from Adder Y; the result of Adder Y will then always go through the mod_result_selector), and the final multiplexer can be implemented as an OR tree. That simplifies timing optimization during synthesis, since the critical path goes through the shifter followed by Adder Y.
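The OR-tree idea can be sketched in Python as follows, assuming each data path is blanked (forced to zero) unless its enable is set. The names `blank` and `or_mux` are hypothetical, and the at-most-one-active check stands in for the assertion mentioned earlier in this thread.

```python
WLEN = 256
FULL = (1 << WLEN) - 1


def blank(value, enable):
    """Model a blanker: pass the value when enabled, otherwise drive zeros."""
    return (value & FULL) if enable else 0


def or_mux(inputs, enables):
    """Select one input by ORing all blanked inputs together."""
    if sum(1 for e in enables if e) > 1:
        raise ValueError("more than one data path active")
    out = 0
    for value, enable in zip(inputs, enables):
        out |= blank(value, enable)
    return out
```

Because at most one blanked input is non-zero, the OR of all inputs equals the selected input, so no select decoding is needed in the final stage.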

etterli

Thanks @vogelpi for the review. I addressed the points.

andreaskurth

Nice work, @etterli! Please find a couple of suggestions and questions below. Overall, I don't see any blocking problems, though. 👍

andreaskurth · 2026-02-25T12:52:29Z

+  // The vector chunk length. This defines the width of the internal adders.
+  parameter int LVChunkLEN = VChunkLEN,
+  // The number of vector chunks, i.e., the number of adders.
+  localparam int LNVecProc = LVLEN / LVChunkLEN


Suggest adding an ASSERT_INIT to ensure that this divides without remainder

Good idea. Added it

andreaskurth · 2026-02-25T13:27:25Z

+    assign op_b = operand_b_invert_i ? ~operand_b_i[i_adder * LVChunkLEN+:LVChunkLEN]
+                                     :  operand_b_i[i_adder * LVChunkLEN+:LVChunkLEN];
+
+    // Do the addition and update carry flag


Suggest expanding this to:

// Compute op_a + op_b + carry_in using a two-operand addition.
// By appending 1'b1 and carry_in as the LSBs, the addition of the
// LSB position (1 + carry_in) generates a carry into the upper bits
// exactly when carry_in is set, so result[LVChunkLEN:1] = op_a + op_b + carry_in
// and result[LVChunkLEN+1] is the carry out. The LSB of result is unused.

Thanks, added this.
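The LSB trick in the suggested comment can be verified with a short Python sketch; `add_with_carry_trick` is a hypothetical name and the default width of 32 is an assumption.

```python
def add_with_carry_trick(op_a, op_b, carry_in, width=32):
    """Compute op_a + op_b + carry_in with a single two-operand addition."""
    ext_a = (op_a << 1) | 1            # {op_a, 1'b1}
    ext_b = (op_b << 1) | carry_in     # {op_b, carry_in}
    result = ext_a + ext_b             # width + 2 bits wide
    total = (result >> 1) & ((1 << width) - 1)   # result[width:1]
    carry_out = (result >> (width + 1)) & 1      # result[width+1]
    return total, carry_out
```

Arithmetically, the extended sum is 2(op_a + op_b) + 1 + carry_in, so dropping the LSB leaves op_a + op_b + carry_in for either carry_in value.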

andreaskurth · 2026-02-25T14:22:48Z

+      AluElen32: begin
+        alu_adder_carry_sel_bignum = 1'b1;
+        alu_shift_mask_bignum      = (32'd1 << (32 - alu_shift_amt_bignum[4:0])) - 32'd1;
+      end
+      AluElen256: begin
+        alu_adder_carry_sel_bignum = 1'b0;
+        alu_shift_mask_bignum      = {32{1'b1}};
+      end
+      default: begin // same as 256b
+        alu_adder_carry_sel_bignum = 1'b0;
+        alu_shift_mask_bignum      = {32{1'b1}};
+      end


This code feels a bit brittle as it will break when VChunkLEN != 32, right? I think using that parameter and $clog2(VChunkLEN)-1 instead of the hard-coded indices/sizes could make the code work with other values.

AluElen32 has the 32 even in the enum name - potentially worth renaming to AluElenVChunkLEN (and AluElen256 could become AluElenWLEN)?

Hmm, I tried to generalize as much as possible but there are still a few places where things will break if VChunkLEN is changed (generalizing everything is far from trivial and sometimes not even possible). So I don't think it is worth generalizing this. It would just make the code even harder to read. Also, the only reason to change VChunkLEN would be to implement 16-bit elements, and then this part must be touched anyway.

I would like to refrain from renaming it because in some places, e.g., in the transposer, it is assumed that this type represents the 32 bit case.

Also, generating the carry control signal based upon the parameter would require us to define all NVecChunk bits because these are also predecoded. Right now, we only need 1 bit instead of 8 bits.
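For readers puzzling over the mask expression in the hunk above, here is a Python sketch of what it evaluates to, including the 32-bit wrap-around for a shift amount of zero. How the RTL applies the mask is not shown in this thread; the right-shift usage in `shv_right_via_mask` is an assumption for illustration, and both function names are hypothetical.

```python
M32 = 0xFFFFFFFF


def shift_mask(shift_amt):
    """Mirror (32'd1 << (32 - amt[4:0])) - 32'd1 with 32-bit truncation."""
    amt = shift_amt & 0x1F
    shifted = (1 << (32 - amt)) & M32   # becomes 0 when amt == 0
    return (shifted - 1) & M32          # 0 - 1 wraps to all ones


def shv_right_via_mask(vec, shift_amt):
    """Element-wise right shift as one wide shift plus a per-chunk mask."""
    amt = shift_amt & 0x1F
    wide = vec >> amt                   # bits leak across chunk boundaries
    full_mask = 0
    for i in range(8):
        full_mask |= shift_mask(shift_amt) << (32 * i)
    return wide & full_mask             # mask removes the leaked bits
```

Under this assumption, the all-ones mask in the 256-bit case simply lets the wide shift result pass through unchanged.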

andreaskurth · 2026-02-25T14:41:30Z

+      AluOpBignumTrn1: begin
+        expected_trn_en      = 1'b1;
+        expected_trn_is_trn1 = 1'b1;
+      end
+      AluOpBignumTrn2: begin
+        expected_trn_en      = 1'b1;
+        expected_trn_is_trn1 = 1'b0;
+      end


This could be simplified:

      AluOpBignumTrn1, AluOpBignumTrn2: begin
        expected_trn_en      = 1'b1;
        expected_trn_is_trn1 = operation_i.op == AluOpBignumTrn1;
      end

andreaskurth · 2026-02-25T14:47:42Z

+        expected_shift_en                       = 1'b1;
+        expected_shift_right                    = operation_i.shift_right;
+      end
+      AluOpBignumSubv: begin


This could also be implemented in the same case as AluOpBignumSub. Differences are in:

expected_adder_y_carries_top

adder_update_flags_en_raw

expected_shift_right

Here the benefit I see is less in the reduced amount of code and more in bundling together what belongs together and making the differences between scalar and vector explicit.

andreaskurth · 2026-02-25T14:48:41Z

+        expected_x_res_operand_a_sel = 1'b1;
+        expected_shift_mod_sel       = 1'b0;
+      end
+      AluOpBignumAddvm: begin


Also this could be implemented in the same case as AluOpBignumAddm. Only expected_adder_y_carries_top needs to be differentiated.

Merged these cases together.

andreaskurth · 2026-02-25T14:49:31Z

+        expected_shift_en                       = 1'b1;
+        expected_shift_right                    = operation_i.shift_right;
+      end
+      AluOpBignumAddv: begin


Similar to the suggestion for Sub, and same differences.

andreaskurth · 2026-02-25T14:54:20Z

+    adder_update_flags_en_raw    = 1'b0;
+    logic_update_flags_en_raw    = 1'b0;


Is it worth adding an assertion checking that vector operations don't update adder or logic flags? That's an invariant in the current architecture, and if the code is ever changed to violate that invariant, that can result in bugs that are pretty hard to root-cause. Such an assertion would catch this explicitly.

The assertion may be as simple as

ASSERT(VecOpsNoFlagUpdate_A, is_vec_op |-> !adder_update_flags_en_raw && !logic_update_flags_en_raw)

(where the is_vec_op helper signal has to be defined - e.g., with an inside construct.)

Such an assertion is definitely meaningful, but I think the simulator will catch this. And it seems pretty unlikely that someone accidentally changes both the RTL and the simulator. What do you think?

I would add this SVA as it may save time when debugging. Because it's then obvious that this is not "just" a mismatch between model and RTL, but something which is intentionally not meant to happen.

Done. It also includes bn.rshi, bn.addm, and bn.subm which are the only other BN ALU operations which do not update flags.

andreaskurth · 2026-02-25T14:57:25Z

+waive -rules {CLOCK_USE RESET_USE} -location {otbn_alu_bignum.sv} \
+      -regexp {'(clk_i|rst_ni)' is connected to '(otbn_vec_transposer|otbn_vec_shifter)' port} \
+      -comment {The module is fully combinatorial, clk/rst are only used for assertions.}
+


This change should be fixed-up into the commit that makes the waiver necessary, I think.

etterli

Thanks @andreaskurth for the review. I replied to your comments and will push the updated design tomorrow. I first want to test all the changes.

@vogelpi, sorry, some comments were not shown on the GitHub page. I have now addressed them as well. I hope I haven't missed any.

etterli · 2026-02-25T17:03:28Z

        expected_shift_mod_sel       = 1'b0;
+        expected_mod_is_subtraction  = 1'b1;
+      end
+      AluOpBignumSubvm: begin


vogelpi

Thanks a lot for implementing the feedback, @etterli . There is one more point regarding the final multiplexer. We can then merge the PR.

vogelpi · 2026-02-26T10:39:13Z

CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_alu_bignum.sv
CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_controller.sv
CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_decoder.sv
CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_mod_result_selector.sv
CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_pkg.sv
CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_predecode.sv
CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_vec_adder.sv
CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_vec_shifter.sv
CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_vec_transposer.sv

This PR adds SIMD support as proposed in an approved RFC.

etterli · 2026-02-26T13:34:47Z

@vogelpi @andreaskurth I have now reworked the result mux and added the assertion. I also rebased it on master. If you want to review only the actual changes, see the force push from 2:28PM GMT+1, the next one is the rebase.

Please have a look again.

vogelpi

Thanks @etterli , I have one more question, but this is great work!

This adds a vectorized adder, a modulo result selector, a vectorized shifter and a vector transposer module. These modules are the building blocks to construct the vectorized BN ALU. Signed-off-by: Pascal Etterli <pascal.etterli@lowrisc.org>

Add the vectorized instructions implemented in the BN ALU to the OTBN. Signed-off-by: Pascal Etterli <pascal.etterli@lowrisc.org>

nasahlpa · 2026-02-26T14:34:51Z

CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_alu_bignum.sv
CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_controller.sv
CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_decoder.sv
CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_mod_result_selector.sv
CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_pkg.sv
CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_predecode.sv
CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_vec_adder.sv
CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_vec_shifter.sv
CHANGE AUTHORIZED: hw/ip/otbn/rtl/otbn_vec_transposer.sv

This PR adds SIMD support as proposed in an approved RFC.

etterli force-pushed the otbn-simd-rtl-bnalu branch from 93e5bdf to ca2aa31 Compare February 20, 2026 09:56

etterli requested review from andrea-caforio, andreaskurth, h-filali, nasahlpa, rswarbrick and vogelpi February 20, 2026 10:25

andrea-caforio reviewed Feb 20, 2026

etterli force-pushed the otbn-simd-rtl-bnalu branch from ca2aa31 to c64a89a Compare February 20, 2026 13:55

etterli added the CI:Rerun Rerun failed CI jobs label Feb 21, 2026

github-actions bot removed the CI:Rerun Rerun failed CI jobs label Feb 21, 2026

etterli force-pushed the otbn-simd-rtl-bnalu branch 3 times, most recently from 38f6dd0 to e26ec29 Compare February 21, 2026 13:17

nasahlpa reviewed Feb 23, 2026

Comment thread hw/ip/otbn/rtl/otbn_predecode.sv

Comment thread hw/ip/otbn/rtl/otbn_decoder.sv

Comment thread hw/ip/otbn/rtl/otbn_alu_bignum.sv

nasahlpa reviewed Feb 23, 2026

Comment thread hw/ip/otbn/rtl/otbn_vec_adder.sv

Comment thread hw/ip/otbn/rtl/otbn_vec_adder.sv Outdated

Comment thread hw/ip/otbn/rtl/otbn_vec_transposer.sv

etterli force-pushed the otbn-simd-rtl-bnalu branch 2 times, most recently from c3a5c09 to d0d5835 Compare February 24, 2026 10:58

vogelpi reviewed Feb 24, 2026

etterli force-pushed the otbn-simd-rtl-bnalu branch from d0d5835 to b40f17c Compare February 25, 2026 11:49

etterli commented Feb 25, 2026

etterli added the CI:Rerun Rerun failed CI jobs label Feb 25, 2026

github-actions bot removed the CI:Rerun Rerun failed CI jobs label Feb 25, 2026

andreaskurth reviewed Feb 25, 2026

etterli commented Feb 25, 2026

etterli force-pushed the otbn-simd-rtl-bnalu branch 2 times, most recently from 62f815f to 0cca6c1 Compare February 26, 2026 08:57

vogelpi reviewed Feb 26, 2026

etterli force-pushed the otbn-simd-rtl-bnalu branch 2 times, most recently from 5d4fc5a to 4286ba8 Compare February 26, 2026 13:34

vogelpi approved these changes Feb 26, 2026

Comment thread hw/ip/otbn/rtl/otbn_alu_bignum.sv Outdated

Comment thread hw/ip/otbn/rtl/otbn_mod_result_selector.sv Outdated

etterli force-pushed the otbn-simd-rtl-bnalu branch from 4286ba8 to cb521ed Compare February 26, 2026 14:02

[otbn,rtl] Integrate vectorized BN ALU instructions

03859b2

Add the vectorized instructions implemented in the BN ALU to the OTBN. Signed-off-by: Pascal Etterli <pascal.etterli@lowrisc.org>

etterli force-pushed the otbn-simd-rtl-bnalu branch from cb521ed to 03859b2 Compare February 26, 2026 14:07

etterli added the CI:Rerun Rerun failed CI jobs label Feb 26, 2026

github-actions bot removed the CI:Rerun Rerun failed CI jobs label Feb 26, 2026

vogelpi added this pull request to the merge queue Feb 26, 2026

Merged via the queue into lowRISC:master with commit b883337 Feb 26, 2026
77 of 81 checks passed

vogelpi mentioned this pull request Apr 14, 2026

[RFC] PQC Support on OTBN #26846

Open
