# Area-Throughput Efficient Implementations of CRAFT Cipher For Internet of Vehicles

Jiahao Xiang<sup>a,b</sup>, Lang Li<sup>a,b,\*</sup>

<sup>a</sup>College of Computer Science and Technology, Hengyang Normal University, Hengyang, 421002, , China <sup>b</sup>Hunan Provincial Key Laboratory of Intelligent Information Processing and Application, Hengyang Normal University, Hengyang, 421002, , China

#### **Abstract**

Purpose: With extraordinary growth in the Internet of Vehicles (IoV), the amount of data exchanged between IoV devices is growing at an unprecedented scale. Most of the IoV devices need instant response and real-time security to ensure the safety of users. The CRAFT cipher that is a lightweight block cipher for low-area can be used in IoV devices. In order to better adapt to these environment, the objective of this paper is to explore opportunities to optimize area and throughput for CRAFT cipher targeted for low-resource IoV devices. Methods: A novel compact CRAFT implementation is proposed in serialized fashion to achieve a small hardware footprint. We propose novel unrolled structure of CRAFT cipher for the high throughput feature. Results: The results on Artix-7 show that ... Conclusion: Hence, our works let CRAFT cipher more suitable for IoV devices.

Keywords: Lightweight block cipher, Internet of Vehicles, Field-programmable gate array(FPGA), Low-area, High-throughput

#### 1. Introduction

Internet of Vehicles (IoV) is an emerging concept in intelligent transportation systems (ITS) to enhance the existing capabilities of VANETs by integrating with the Internet of Things (IoT) [1]. As IoT technology continues to advance, IoV technology is also making great progress. But the same security issues that exist in IoT are also were introduced into IoV. At the some time, IoV involves a huge amount of dynamic real-time critical data so its security is a major concern.

Lightweight cryptography is a subfield of cryptography that aims to provide solutions tailored for resource-constrained devices [2]. It can provides security with low resource consumption and low delay in IoV environment.

In this work, we propose the three architectures of FPGA implementations for the CRAFT [3], respectively Round based, Serial, and Loop unrolled. This allows IoV practitioners to select the architectures that best suit their needs. The contributions of this article can be summarized as follows.

The rest of this article is organized as follows. Section 2 presents specification of CRAFT; the proposed the three architectures of FPGA implementations for the CRAFT are present in Section 3; Section 4 presents the implementation results, analysis, and comparison with other similar works; finally, the work is concluded in Section 5.

#### 2. Specification of CRAFT

All notations used in this paper are shown in Table 1. CRAFT is a lightweight tweakable block cipher made out of involutory building blocks. It consists of a 64-bit block, a 128-bit key,

\*Corresponding author

Email address: lilang911@126.com (Lang Li)

and a 64-bit tweak. In this cipher, a 64-bit input plaintext P is transformed into a 64-bit output ciphertext C using a 128-bit key K and a 64-bit tweak T. Figure 1 shows the structure of CRAFT.

## 3. Implementations

To achieve efficient area and throughput, we have, for the first time, optimized the components of CRAFT and proposed two implementation architectures: Serial and Unrolled.

## 3.1. Serial Architecture (A1)

Compared to round-based architecture, serial architecture are able to reuse components and significantly reduce area usage, e.g., the number of Sub-Box is reduced from 16 to 1. The clock gating technique is also used to enable each component and reduce the energy consumption of encryption. Our proposed architecture is presented in Figure 2. The design includes one Sub-Box, one 4-bit Mix-columns, two register banks for storing keys (called Key-Register) and plaintext (called State-Register), which also act as temporary registers for storing the intermediate results. In order to store intermediate results into State-Register bank, the design has one feedback paths. PermuteNibbles is included in State-Register bank. It is noticeable that since the execution of permute requires 64-bit, in order to reuse the State-Register block, we change the order of execution of Sub-Box and Permute. And the first round of encryption process through the control signal to avoid Permute operation, to ensure the correctness of the encryption algorithm.

## 3.1.1. Sub-Box Optimization

Sub-Box provides a confusing characteristic for the entire encryption algorithm, however it requires a large amount of

| Notation | Description                                 |  |
|----------|---------------------------------------------|--|
| TK       | 64-bit tweakeys                             |  |
| $RC_i$   | 64-bit round constant in the $i^{th}$ round |  |



Figure 1: Structure of CRAFT

area. There are different methods of implementation of Sub-Box. The most popular implementation is using a lookup table (LUT), such as [4]. However it uses a lot of flip-flop, which will bring a lot of area consumption. Using Sub-Box's equivalent logical expression for this will reduce area consumption, such as [5], [6].

Boolean satisfiability (SAT) solvers can be used to find Sub-Box that satisfy certain implement, such as being resistant to software or hardware implement. In more detail, the Sub-Box implement can be encoded as Boolean constraints by representing the Sub-Box as a truth table and then using Boolean variables to represent the input and output bits of the Sub-Box. The constraints can then be formulated based on the desired implement of the Sub-Box. Once the Sub-Box implement are encoded as Boolean constraints, a SAT solver can be used to find a satisfying assignment to these constraints, which corresponds to an Sub-Box that satisfies the desired implement.

The gate equivalent complexity(GEC) of a SAT instance is the number of logical gates required to implement the Boolean formula that represents the instance. GEC can be calculated by converting the Boolean formula into a circuit of logical gates, such as AND, OR, and NOT gates. The number of gates in the circuit corresponds to the GEC of the instance.

In our design, we optimize and use GEC encoding scheme of [6] to implement the Sub-Box. Our encoding scheme as follows in Equations 1:

$$\forall i \in \{0, 1, \dots, K-1\}:$$

$$T_{i} = F_{if}(BB_{i}[0], \sim (Q_{4i} \cdot Q_{4i+1}) \cdot \sim Q_{4i+2} \cdot Q_{4i+3})$$

$$+ F_{if}(BB_{i}[1], Q_{4i+2} \cdot (Q_{4i} + Q_{4i+1}))$$

$$+ F_{if}(BB_{i}[2], Q_{4i} \cdot Q_{4i+1} \cdot Q_{4i+2})$$

$$+ F_{if}(BB_{i}[3], Q_{4i+2}) + F_{if}(BB_{i}[4], Q_{4i})$$

$$+ F_{if}(BB_{i}[5], Q_{4i} \cdot Q_{4i+1})$$

$$+ F_{if}(BB_{i}[6], Q_{4i} + Q_{4i+1}) + F_{if}(BB_{i}[7], max)$$

$$(1)$$

where K is numbers of the logical gates,  $Q_{4i} - Q_{4i+3}$  is the input of the  $i^{th}$  logical gate,  $T_i$  is the output of the  $i^{th}$  logical

gate, and  $F_{if}$  is a function that returns the value of the second argument if the first argument is true and returns the value of zero otherwise. The value of max is all one's in the binary expression, which is represented logically as an inverse.  $BB_i$  represents the type of the  $i^{th}$  logical gate, which is a 8-bit binary number. The different types of logical gate used in this encoding scheme are listed in Table 2.

The optimized scheme of Sub-Box is shown in Equations 2, where  $X_0 - X_3$  is the input of the Sub-Box and  $Y_0 - Y_3$  is the output of the Sub-Box. The proposed scheme of Sub-Box is implemented by four MOAI1 gates, three MAOI1 gates, and one AND3 gate. This module of the proposed Sub-Box reduced the area by 28.9% with [5] (based on gate equivalent estimation on UMC 180nm library).

$$T_{0} = \text{MAOI1}(X_{0}, X_{1}, X_{0}, X_{1})$$

$$T_{1} = \text{AND3}(X_{3}, X_{2}, X_{3})$$

$$T_{2} = \text{MAOI1}(X_{1}, X_{2}, X_{0}, X_{3})$$

$$T_{3} = \text{MOAI1}(X_{1}, X_{0}, X_{2}, X_{2})$$

$$T_{4} = \text{MOAI1}(X_{3}, T_{0}, T_{3}, T_{3})$$

$$T_{5} = \text{MOAI1}(T_{3}, T_{0}, X_{0}, T_{1})$$

$$T_{6} = \text{MAOI1}(X_{0}, T_{0}, X_{3}, T_{0})$$

$$T_{7} = \text{MOAI1}(X_{0}, T_{1}, T_{2}, T_{2})$$

$$Y_{0} = T_{5} \quad Y_{1} = T_{7} \quad Y_{2} = T_{6} \quad Y_{3} = T_{4}$$

## 3.1.2. Mix-Columns Optimization

The Mix-Columns component is a linear transformation of the input column. The input column is multiplied by a constant matrix M to produce the output column. M is a involutory matrix, which means  $M^2 = E$ , where E is the identity matrix. It is easy to decrypt the ciphertext by multiplying the ciphertext with M again. The Mix-columns component is shown in Equation 3. where  $I_{0,j}$ ,  $I_{1,j}$ ,  $I_{2,j}$ , and  $I_{3,j}$  are the input column,  $I'_{0,j}$ ,  $I'_{1,j}$ ,  $I'_{2,j}$ , and  $I'_{3,j}$  are the output column, and j is the column index,  $j \in \{0, \ldots, 3\}$ .



Figure 2: Serial architecture of CRAFT

Table 2: Encoding of different types of logical gate

|                                                        | 71 0 0                   |           |
|--------------------------------------------------------|--------------------------|-----------|
| logical expression                                     | $BB_{i}[0:7]$            | gate type |
| $Q_0 \oplus Q_1$                                       | 00000010                 | XOR       |
| $\sim (Q_0 \oplus Q_1)$                                | 00000011                 | XNOR      |
| $Q_0 \wedge Q_1$                                       | 00000100                 | AND       |
| $\sim (Q_0 \wedge Q_1)$                                | 00000101                 | NAND      |
| $Q_0 \vee Q_1$                                         | 00000110                 | OR        |
| $\sim (Q_0 \vee Q_1)$                                  | 00000111                 | NOR       |
| $\sim Q_0$                                             | 00001001                 | NOT       |
| $\sim Q_1$                                             | 00001011                 | NOT       |
| $\sim Q_2$                                             | $0\ 0\ 0\ 1\ 0\ 0\ 0\ 1$ | NOT       |
| $Q_0 \oplus Q_1 \oplus Q_2$                            | 00010010                 | XOR3      |
| $\sim (Q_0 \oplus Q_1 \oplus Q_2)$                     | 00010011                 | XNOR3     |
| $Q_0 \wedge Q_1 \wedge Q_2$                            | $0\ 0\ 1\ 0\ 0\ 0\ 0$    | AND3      |
| $\sim (Q_0 \wedge Q_1 \wedge Q_2)$                     | $0\ 0\ 1\ 0\ 0\ 0\ 0\ 1$ | NAND3     |
| $Q_0 \lor Q_1 \lor Q_2$                                | $0\ 1\ 1\ 1\ 0\ 1\ 1\ 0$ | OR3       |
| $\sim (Q_0 \vee Q_1 \vee Q_2)$                         | $0\ 1\ 1\ 1\ 0\ 1\ 1\ 1$ | NOR3      |
| $\sim ((Q_0 \wedge Q_1) \vee (\sim (Q_2 \vee Q_3)))$   | $1\ 0\ 1\ 1\ 0\ 0\ 0\ 0$ | MAOI1     |
| $\sim (\sim (Q_0 \wedge Q_1) \wedge ((Q_2 \vee Q_3)))$ | $1\ 0\ 1\ 1\ 0\ 0\ 0\ 1$ | MOAI1     |
|                                                        |                          |           |

$$\begin{bmatrix}
I'_{0,j} \\
I'_{1,j} \\
I'_{2,j} \\
I'_{3,j}
\end{bmatrix} = \begin{bmatrix}
1 & 0 & 1 & 1 \\
0 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{bmatrix} \begin{bmatrix}
I_{0,j} \\
I_{1,j} \\
I_{2,j} \\
I_{3,j}
\end{bmatrix}$$
(3)

In order to reduce the area of this component, the serial architecture of Mix-Columns is utilized, as shown in Figure 3. The serial architecture of Mix-Columns requires four 4-bit registers, two multiplexers, and three XOR gates. The operation of Mix-Columns involves three distinct stages: freeze, shift, and add. During the freeze stage, the register values are kept unchanged by setting both  $CM_0$  and  $CM_1$  to 0. In the shift stage, a shift in the register values from  $RM_0$  to  $RM_4$  is induced by setting both  $CM_0$  and  $CM_1$  to 1. Finally, in the add stage, an addition operation on the column values is executed according to Equation 3. This is achieved by setting  $CM_0$  and  $CM_1$  to 0 and 1, respectively.

The timing diagram for the serial architecture of Mix-Columns is depicted in Figure 4. It requires five clock cycles to compute the next columns from the previous ones, and an additional four clock cycles to transfer data from the internal register of Mix-Columns to the State-Register. Therefore, a complete

state round requires a total of 36 clock cycles.



Figure 3: Serial Architecture of Mix-Columns with clock gating



Figure 4: Timing Diagram for the Serial Architecture of Mix-Columns

### 3.1.3. Control Units

The finite-state machine (FSM) of the serial architecture, as shown in Figure 5, begins its encryption process by storing the initial key and plaintext in the Key-Register and State-Register, respectively. The key undergoes expansion in the Key-Schedule phase, during which the clocks of the Mix-Columns and State-Register are disabled. The Mix-Columns phase follows, storing one column of the State-Register in the Mix-Columns registers and taking five clock cycles to execute Mix-Columns on one

column. Upon completion of this phase, the clocks of the State-Register and Key-Register are disabled. The Sub-Box phase then takes over, sending the data stored in the Mix-Columns registers back to the State-Register and XORing it with the keys, a process that takes another four clock cycles. This cycle between the Mix-Columns and Sub-Box phases is repeated four times for the four columns of the State-Register. The Permute operation is then executed in one clock cycle inside the State-Register. The encryption process concludes when the Round counter reaches 31, at which point the ciphertext is stored in the State-Register.



Figure 5: Finite-state machine for Serial Architecture

Clock gating, a technique discussed in [7], can help reduce the dynamic power consumption of the encryption. This technique is applied separately to the State-Register, Key-Register, and Mix-Columns. For example, during the Key-Schedule phase, the clock of the State-Register and Mix-Columns is turned off because these two blocks are not needed. This helps save a significant amount of power. Figure 6 shows the timing diagram of a design that uses the clock gating technique.



Figure 6: Timing diagram for Serial Architecture

#### 3.2. Unrolled Architecture (A2)

The architecture shown in Figure 7 includes two Sub-Boxes, two Mix-Columns, a Key, a State-Register, two PermuteNibbles, and one feedback path. It's designed to perform a 30-round encryption process in just 15 cycles. Only the Mix-Columns and Add-Key operations are carried out in the final cycle, finishing the encryption process in a total of 16 cycles.

The unrolled architecture, based on the iterative architecture from [3], completes the encryption process in only 16 cycles, compared to the 32 cycles needed by the iterative architecture. Here, a cycle includes two round functions of CRAFT. While this approach might use more area, it provides higher throughput at the same frequency.

#### 3.3. Iterative Architecture (A3)

The architecture from [4] operates on a round-based architecture. It employs a single round function to encrypt a block, incorporating a Sub-Box, a Permute, and an Add-Key operation. This round function is executed 32 times to encrypt a single block. Concurrently, the Key Schedule runs alongside the round function. This architecture is depicted in Figure 8. For comparison, different architectures are listed in Table 3.

| Table 3: Different architectures description | Table 5: | : Dillerent | arcmitectures | describuo |
|----------------------------------------------|----------|-------------|---------------|-----------|
|----------------------------------------------|----------|-------------|---------------|-----------|

| Architecture | Cipher  | Reference | Description |
|--------------|---------|-----------|-------------|
| A1           | CRAFT   | This work | Serial      |
| A2           | CRAFT   | This work | Unrolled    |
| A3           | PRESENT | [4]       | Iterative   |

## 4. Experimental Evaluation

In ASIC implementations, the gate equivalent (GE) is often used to evaluate the area consumption of a design. A single GE is equivalent to a two-input NAND gate. To calculate the area in GE, we divide the total area (in  $\mu m^2$ ) by the area of a two-input NAND gate (also in  $\mu m^2$ ). However, the number of GEs can vary based on the specific technology used, as discussed in [2]. For instance, the number of GEs for the same design will differ between UMC 180nm technology and TSMC 180nm technology. Therefore, GE is not suitable for comparing the area consumption of different designs on different technologies. In order to ensure a fair comparison, we evaluate the area consumption of the proposed designs using FPGA implementations, a method also utilized in [8].

## 4.1. Platform

The proposed architectures were implemented on a Xilinx FPGA board using Vivado v2023.2. To test in various environments, three different FPGA platforms were used for benchmarking: Artix-7(xc7a100tcsg324-1), Spartan-7(xc7s100fgga484-1), and Kintex-7(xc7k70tfbg484-1). Artix-7 offers high performance in resource-limited situations. Spartan-7 is designed for high-restriction environments. Kintex-7 is ideal for applications like 3G and 4G wireless, flat panel displays, and video over IP solutions.

#### 4.2. Area

The Area metric, which includes components like Flip-Flops, LUTs, and Slices, is used to measure the area used by the proposed designs. To make a fair comparison, the FPGA's embedded memory blocks were not used. This was done by turning off the related settings in the VHDL, as suggested in [9]. Also, all designs were synthesized and implemented using the same settings, specifically, the default settings of Vivado Synthesis and Implementation.



Figure 7: Unrolled architecture of CRAFT



Figure 8: Iterative architecture of PRESENT

#### 4.3. Throughput

The performance of the proposed designs is evaluated using the Throughput metric. This metric uses three parameters: the maximum throughput rate, the throughput rate at 100MHz, and the throughput rate per slice. The maximum throughput rate is the highest rate that our designs can achieve, calculated using Equation 4. The throughput rate at 100MHz shows the rate achievable when the clock frequency is set to 100MHz, calculated using Equation 5. The throughput rate per slice is a measure of efficiency, calculated by dividing the throughput rate by the number of Slices (Equation 6). In these calculations, the Plaintext Size is 64-bit, Latency refers to the number of clock cycles required to encrypt a single block, and Slices refers to the number of Slices used by the design.

$$MaxThroughput(Thr) = \frac{MaxFrequency \times PlaintextSize}{Latency} \tag{4}$$

$$Throughput_{@100MHz}(Thr^*) = \frac{100MHz \times PlaintextSize}{Latency}$$
 (5)

$$ThroughputPerSlice = \frac{Thr}{Slices}$$
 (6)

## 4.4. Power and Energy

The Power metric, which includes both dynamic and static power consumption, is used to evaluate the power consumption

of the proposed designs, as defined in Equation 7. On the other hand, the Energy metric measures the energy consumption of the designs. It's calculated by multiplying the power consumption by the time needed to encrypt a single block. This time is determined by dividing the latency by the frequency, as explained in Equation 8.

Total Power(
$$TP$$
) = Dynamic Power( $DP$ )  
+ Static Power( $SP$ ) (7)

$$Energy(E) = \frac{TP \times Latency}{Frequency}$$
 (8)

#### 5. Results

This section presents the results of the proposed designs, divided into three categories: area, throughput, and power and energy. The results are demonstrated across three different FPGA platforms: Artix-7, Spartan-7, and Kintex-7. The area consumption of the designs is displayed in Table 4, the throughput results are illustrated in Table 5, and the details of power and energy consumption are provided in Table 6.

The area consumption of the proposed designs is evaluated based on three factors: Flip-Flops (FF), Look-Up Tables (LUT), and Slices. These designs are compared with the iterative architecture (A3) of PRESENT, as described in [4]. The results indicate that the proposed designs consume 15.72% less area than the iterative architecture of PRESENT.

| Table 4: | Area | used | for | the | three | Architectures |
|----------|------|------|-----|-----|-------|---------------|
|----------|------|------|-----|-----|-------|---------------|

| Platform  | Design | State(bit) | Key(bit) | FF  | LUT | S lices |
|-----------|--------|------------|----------|-----|-----|---------|
|           | A1     | 64         | 128      | 144 | 177 | 59      |
|           | A2     | 64         | 128      | 157 | 378 | 111     |
|           | A3     | 64         | 128      | 201 | 243 | 70      |
|           | A1     | 64         | 128      | 144 | 177 | 57      |
| Spartan-7 | A2     | 64         | 128      | 157 | 381 | 118     |
|           | A3     | 64         | 128      | 201 | 244 | 70      |
|           | A1     | 64         | 128      | 144 | 178 | 58      |
| Kintex-7  | A2     | 64         | 128      | 157 | 377 | 115     |
|           | A3     | 64         | 128      | 201 | 244 | 68      |

Table 5: Throughput results for the three Architectures

| Platform  | Design | Latency | FMAX(MHz) | Thr(Mbps) | $Thr^*(Mbps)^a$ | $\frac{Thr}{Slices}(\frac{Kbps}{Slices})$ |
|-----------|--------|---------|-----------|-----------|-----------------|-------------------------------------------|
|           | A1     | 1215    | 557.41    | 29.36     | 5.27            | 497.65                                    |
| Artix-7   | A2     | 16      | 142.38    | 569.52    | 400.00          | 5130.81                                   |
| A3        | A3     | 32      | 274.04    | 548.08    | 200.00          | 7829.71                                   |
|           | A1     | 1215    | 525.76    | 27.69     | 5.27            | 485.87                                    |
| Spartan-7 | A2     | 16      | 138.86    | 555.44    | 400.00          | 4707.12                                   |
| 1         | A3     | 32      | 296.29    | 592.58    | 200.00          | 8465.43                                   |
|           | A1     | 1215    | 853.97    | 44.98     | 5.27            | 775.57                                    |
| Kintex-7  | A2     | 16      | 175.25    | 701.00    | 400.00          | 6095.65                                   |
|           | A3     | 32      | 357.78    | 715.56    | 200.00          | 10522.94                                  |

<sup>&</sup>lt;sup>a</sup> Throughput rate at 100MHz



Figure 9: Comparison of Flip-Flop in Architectures A1, A2, and A3

Regarding the Flip-Flops (FF), the key schedule of the craft cipher is implemented using multiplexers. This eliminates the need for FF to store the sub-key, resulting in a lower FF count compared to other ciphers. This is a significant factor contributing to the craft cipher's requirement of less than 1000 GE, which is the lowest known requirement on the IBM 130 nm ASIC library, as shown in [3]. A comparison of FF counts is provided in Figure 9.

When it comes to Look-Up Tables (LUT), the proposed designs (A1) require fewer LUTs than the iterative architecture (A3) of PRESENT. This is attributed to the fact that the proposed designs utilize a single Sub-Box, in contrast to the 16 Sub-Boxes used by the iterative architecture of PRESENT. Furthermore, the proposed designs also require fewer LUTs than the unrolled architecture of CRAFT, which uses 32 Sub-Boxes, compared to just one in the proposed designs. A detailed comparison of LUT counts is illustrated in Figure 10.

#### 6. Conclusion

#### References

- S. Sharma, B. Kaushik, A survey on internet of vehicles: Applications, security issues & solutions, Vehicular Communications 20 (2019) 100182.
- [2] K. McKay, L. Bassham, M. Sönmez Turan, N. Mouha, Report on lightweight cryptography, Tech. rep., National Institute of Standards and Technology (2016).
- [3] C. Beierle, G. Leander, A. Moradi, S. Rasoolzadeh, Craft: lightweight tweakable block cipher with efficient protection against dfa attacks, IACR Transactions on Symmetric Cryptology 2019 (1) (2019) 5–45.
- [4] C. A. Lara-Nino, A. Diaz-Perez, M. Morales-Sandoval, Lightweight hardware architectures for the present cipher in FPGA, IEEE Trans. Circuits Syst. I Regul. Pap. 64-I (9) (2017) 2544–2555. doi:10.1109/TCSI.2017.2686783. URL https://doi.org/10.1109/TCSI.2017.2686783
- [5] Z. Bao, J. Guo, S. Ling, Y. Sasaki, Peigen–a platform for evaluation, implementation, and generation of s-boxes, IACR Transactions on Symmetric Cryptology (2019) 330–394.

Table 6: Power and Energy consumption for the three Architectures

| Platform  | Design | DP(mW) | SP(mW) | TP(mW) | E(uJ) | $\frac{E}{bit}(\frac{nJ}{bit})$ |
|-----------|--------|--------|--------|--------|-------|---------------------------------|
|           | A1     | 2.00   | 139.00 | 141.00 | 1.71  | 26.77                           |
| Artix-7   | A2     | 7.00   | 139.00 | 146.00 | 0.02  | 0.37                            |
|           | A3     | 2.00   | 139.00 | 141.00 | 0.05  | 0.71                            |
|           | A1     | 2.00   | 140.00 | 142.00 | 1.73  | 26.96                           |
| Spartan-7 | A2     | 7.00   | 140.00 | 147.00 | 0.02  | 0.37                            |
| _         | A3     | 3.00   | 140.00 | 143.00 | 0.05  | 0.72                            |
|           | A1     | 2.00   | 145.00 | 147.00 | 1.79  | 27.91                           |
| Kintex-7  | A2     | 8.00   | 145.00 | 153.00 | 0.02  | 0.38                            |
|           | A3     | 3.00   | 145.00 | 148.00 | 0.05  | 0.74                            |

DP: Dynamic Power SP: Static Power TP: Total Power E: Energy



Figure 10: Comparison of Look-Up Tables in Architectures A1, A2, and A3  $\,$ 

- [6] J. Feng, Y. Wei, F. Zhang, E. Pasalic, Y. Zhou, Novel optimized implementations of lightweight cryptographic s-boxes via sat solvers, IEEE Transactions on Circuits and Systems I: Regular Papers (2023).
- [7] K. Shahbazi, S.-B. Ko, Area-efficient nano-aes implementation for internet-of-things devices, IEEE Transactions on Very Large Scale Integration (VLSI) Systems 29 (1) (2020) 136–148.
- [8] K. Mohajerani, R. Haeussler, R. Nagpal, F. Farahmand, A. Abdulgadir, J.-P. Kaps, K. Gaj, Fpga benchmarking of round 2 candidates in the nist lightweight cryptography standardization process: Methodology, metrics, tools, and results, Cryptology ePrint Archive (2020).
- [9] A. Xilinx, Ultrafast design methodology guide for xilinx fpgas and socs: Super logic region (2022).