#### Abstract

This project aims to develop a FPGA system that can encrypt and decrypt data that is being used and transmitted between 'Internet of Things' devices. It does this using the SIMON algorithm that was selected from many during the research stage because of its flexibility and its lightweight characteristics. The FPGA development was done in the System Verilog language and simulated using ModelSim before being synthesised in Quartus Prime. A software version was also developed in C for comparison with the hardware version. Both versions will be compared with similar projects found in literature in terms of data throughput, power consumption and circuit area. So far the software version is ready for testing and the hardware is working in simulation but not tested on a FPGA.

# Contents

| $\mathbf{A}$   | bstra                              | $\operatorname{ct}$    |                     | 4<br>4<br>5 |  |  |  |  |  |  |
|----------------|------------------------------------|------------------------|---------------------|-------------|--|--|--|--|--|--|
| Contents       |                                    |                        |                     |             |  |  |  |  |  |  |
| 1 Introduction |                                    |                        |                     |             |  |  |  |  |  |  |
| 2              | Background Research and Literature |                        |                     |             |  |  |  |  |  |  |
|                | 2.1                                | Intern                 | net of Things       | . 4         |  |  |  |  |  |  |
|                | 2.2                                | Crypt                  | tography            | . 4         |  |  |  |  |  |  |
|                |                                    | 2.2.1                  | Asymmetric Key      | . 4         |  |  |  |  |  |  |
|                |                                    | 2.2.2                  | Symmetric Key       | . 5         |  |  |  |  |  |  |
|                |                                    | 2.2.3                  | Decisions           | . 6         |  |  |  |  |  |  |
|                | 2.3                                | Conve                  | entional Algorithms | . 6         |  |  |  |  |  |  |
|                |                                    | 2.3.1                  | Standardization     | . 6         |  |  |  |  |  |  |
|                |                                    | 2.3.2                  | Other Algorithms    |             |  |  |  |  |  |  |
|                |                                    | 2.3.3                  | Decisions           | . 7         |  |  |  |  |  |  |
|                | 2.4 Lightweight Algorithms         |                        |                     |             |  |  |  |  |  |  |
|                |                                    | 2.4.1                  | SIMON & SPECK       |             |  |  |  |  |  |  |
|                |                                    | 2.4.2                  | Other Algorithms    |             |  |  |  |  |  |  |
|                |                                    | 2.4.3                  | Decisions           | . 9         |  |  |  |  |  |  |
| 3              | Pro                                | $\operatorname{gress}$ |                     | 10          |  |  |  |  |  |  |
|                | 3.1                                | Hoste                  | ed C                | . 10        |  |  |  |  |  |  |
|                | 3.2                                | Syster                 | m Verilog           | . 12        |  |  |  |  |  |  |
| 4              | Fut                                | ure Pl                 | lan                 | 13          |  |  |  |  |  |  |

## Introduction

Over the last few years there has been a shift in type of devices connected to the internet from just servers, PC's and later smartphones, to small embedded processors that can control many devices. The idea of connecting such devices to internet has been dubbed 'Internet of Things' or 'IoT' and has the aim to make our lives simpler. Due to the wide range of products that a connected IoT device can be applied to improve the efficiency and/or usefulness, it has been predicted that billions of devices will be in use by 2020[?]. This also means that the complexity of the devices varies greatly with simple light switches and even kettles being given the IoT treatment, but at the other end of the scale a network of connected self-driving cars is being considered[?].

To keep the potential adversaries from accessing the data, transmitted over an open internet channel, and possibly controlling numerous connected devices, maliciously or not, an encryption algorithm can be used. There are many encryption algorithms that perform this function and most can be implemented in both software and dedicated hardware such as a Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA). As a majority of IoT devices are implemented on small embedded processors which have limited resources, the hardware option might possibly be a better solution for IoT devices. However, due to the fact that most IoT devices are always on, power consumption is a very important factor when considering options for adding hardware accelerated encryption and for battery powered devices it is often more critical than the actual encryption.

The goals of this project are to explore various encryption algorithms and compare their performance based on data throughput, accuracy, security and power consumption when implemented in software and hardware. To evaluate these parameters the same algorithms can be coded in C or C++ for the software versions and a Hardware Development Language (HDL) such as System Verilog can be first simulated in ModelSim, before programming a FPGA for the hardware version. These comparisons can then be used to match the algorithms to the appropriate IoT device as they all have different requirements for relative security level and power consumption, as for example a light switch does not necessarily need to be protected from the same level of attack as a set of digital locks or private data storage.

# Background Research and Literature

## 2.1 Internet of Things

As mentioned in chapter 1 there are many IoT devices that require varying levels of security and have to be protected against different of attacks, like side channel attacks. Due to IoT devices having limited resources [?] suggests that 2000 Gate Equivalents in hardware is the maximum size for most embedded platforms but even that might be to big for devices like RFID tags. Power consumption should also be kept to as little as possible but [?] outlines a limit of tens of micro Watts ( $\mu W$ ) for RFID tags.

## 2.2 Cryptography

The primary objective of cryptography is to convert, or encrypt, a readable message known as plaintext into an unreadable form, ciphertext, so that adversaries cannot read the contents, but over the years the scope of cryptography has widened. Cryptography is therefore the study of encryption and other techniques, including identity authentication and integrity checks. Its counterpart: the study of breaking the encryption to find the original message, is known as cryptanalysis[?]. Eventually, for most encryption techniques a weakness is found, and subsequently exploited, so more complex techniques are conceived and with the invention of computers the complexity of the algorithms has increased greatly.

## 2.2.1 Asymmetric Key

Asymmetric key cryptography can be referred to as public key due to the fact that one of the related keys can be publicly available without compromising the security of the encrypted data. This is because the keys are usually generated based on mathematical problems that have no solution or the solution is impossible for a computer to solve efficiently, such that solving it takes longer than an exhaustive key search[?].

Public key cryptography can be used in two different modes as if data is encrypted with the intended recipients public key only they can decrypt it with their private key, thus encryption. However, if a private key is used for encryption then using the public key to decrypt it ensures the senders identity, authentication[?].

### 2.2.2 Symmetric Key

Similar to the symmetric/private comparison symmetric key cryptography is also known as private key, as in order to keep the encrypted data secure the key used must be kept secret. There are two main types of private key algorithms that operate on the plaintext differently: block ciphers which uses a fixed number of bits, block; or stream ciphers which encrypts data bit by bit[?].

Modern block ciphers work by iterating a basic cipher function with the ciphertext used as the input for the next round[?]. To increase the security of a block cipher subkeys, or round keys, are generated for each round of the iteration using similar operations to those used in the actual cipher. The round functions are mostly designed using either a Feistel network[?] (F network) or a Substitution Permutation network (SP network)[?].

The Fesitel network was named after physicist Horst Feistel who was a integral part of the team at IBM that developed the early block cipher Lucifer. It works by splitting the input plaintext into equal words and applying the round function to it right, or LSB, word before swapping the words for the next iteration. This functionality, for both encryption and decryption, is described in Figure 2.1a.



(a) Basic Feistel network.

On the other hand, Substitution Permutation networks operate on the whole plaintext block using S-boxes for substitution and P-boxes for permutation. There can be more steps that treat the block slightly differently but those are the basic steps and when combined they are enough to provide Shannon's confusion and diffusion[?]. The basic structure of an SP network is in Figure 2.1b.

Stream ciphers work by generating a pseudo-random keystream to combine with the plaintext[?]. Because the keystream is pseudo-random and not completely random a stream cipher is breakable. The keystream is created by a pseudo-random number generated with a cryptographic key used as a seed.

There are modes of operation for block ciphers, in [?], that provide better security by using feedback of the ciphertext to the next block. Some of these modes of operation also allow block ciphers to behave similar to stream ciphers as they encrypt an initialization vector with the key and the resulting ciphertext can be combined with the plaintext.

### 2.2.3 Decisions

Due to the fact that asymmetric key algorithms are hard to solve they require complex hardware or software to implement which is undesirable for this project. Also, with the exception of ECC the key sizes needed for the security can be very large so with the limited IO pins available on FPGAs they could prove difficult to program. On the other hand, many private key algorithms are designed to be efficient in hardware especially Feistel networks as an inverted round function isn't required. While a stream cipher can be useful to encrypt serial data that will most likely be the source, the modes of operation available for block ciphers provide more flexible functionality including stream cipher modes. Therefore, the algorithm chosen for this project will most likely be a block cipher with a Feistel network.

## 2.3 Conventional Algorithms

There are many block ciphers that are considered very secure and therefore popular, they include: DES[?], AES[?], Blowfish[?]. DES operates on a block of 64 bits for 16 rounds using a key length of 64 bits but it has an effective key length of 56 bits as 8 bits were used for parity. AES, an upgrade to DES, is far more secure as uses a 128 bit block and has the flexibility of using three different key lengths: 128, 192 and 256. The number of rounds that AES iterates depends on the key length with 10 rounds used for a 128 bit key, 12 for 192, and 14 for the largest key. Blowfish, like DES, operates on a 64 bit block and iterates for 16 rounds, but it can use a variable key length in the range 32 to 448 bits.

#### 2.3.1 Standardization

DES, which stands for Data Encryption Standard, is one the earliest block ciphers used in the computer age. It has the name Data Encryption Standard as it was

accepted as the standard encryption algorithm by the US National Bureau of Standards (NBS), now the National Institute of Standards and Technology (NIST), in 1977 after it was altered by the National Security Agency (NSA), which caused some controversy[?].

DES was used for about two decades but in the 1990s several successful attacks proved its weakness[?] so in 1997 NIST started a selection process to find its replacement. It took three years to decide on the algorithm to be set as the standard which was announced as Rijndael in 2000 and the standard was et in 2001, with the 128, 192, 256 bit keys being used in the standard[?]. Unlike DES, AES uses a SP network as it is efficient, in time, in both hardware and software.

Since its standardization in 2001 AES has been used almost exclusively because its security is trusted. Because of this there are many different software and hardware implementations produced with some concentrating on side-channel attack resistance[?] or efficient S-box implementations[?]. However, even area optimized designs like [?] use 2400 gate equivalents which is too much for lightweight applications.

### 2.3.2 Other Algorithms

The US standardized algorithms quickly became very popular and can be considered the unofficial global standard. Although, there are many other algorithms that are considered secure and are commonly used. These algorithms might be used because there is still some distrust of the NSAs involvement in the algorithms and they are more open source. Blowfish was designed by Bruce Schneier in the early 1990s as he, and many others, noticed the insecurity of DES particularly with the 56 bit key length making a brute force attack more plausible[?].

As with AES there are many FPGA and ASIC implementations of the Blowfish algorithm, [?] and [?], but they require too many FPGA resources to be considered for the lightweight nature of this project.

#### 2.3.3 Decisions

Due to the conventional algorithms not being explicitly designed for hardware and definitely not for lightweight applications they are not appropriate for this project. Although, they are useful for comparison with the lightweight algorithms in section 2.4 in terms of security and throughput.

## 2.4 Lightweight Algorithms

After deciding that the conventional algorithms might not be suitable for the low power devices targeted by this project, some more lightweight algorithms were found including: PRESENT[?], PRINCE[?] and the SIMON and SPECK algorithms[?]. However, as IoT is an emerging technology and is the main reason for lightweight cryptography there isn't a standard set by NIST, but the process has begun in [?].

These algorithms are considered lightweight because they make sacrifices in and security or throughput, or both, to achieve small area and low power designs.

### 2.4.1 SIMON & SPECK

The SIMON and SPECK family of algorithms are the lightweight techniques proposed by the NSA that were designed to perform well in both software and hardware while still being secure; and to be flexible in terms of block and key size, listed in Table 2.1. The algorithms are similar but SIMON was optimised for hardware implementations and SPECK for software. As with AES the number of rounds iterated depends on the key size but as the block size varies as well it also has an effect as shown in Table 2.1. The structure of both algorithms is a Feistel network and thus it works on words of n bits, where 2n is the block size, and with a key of mn bits.

| Block Size | Key Size | n  | m | SIMON Rounds | SPECK Rounds |
|------------|----------|----|---|--------------|--------------|
| 32         | 64       | 16 | 4 | 32           | 22           |
| 48         | 72       | 24 | 3 | 36           | 22           |
| 48         | 96       | 24 | 4 | 36           | 23           |
| 64         | 96       | 32 | 3 | 42           | 26           |
| 64         | 128      | 32 | 4 | 44           | 27           |
| 96         | 96       | 48 | 2 | 52           | 28           |
| 96         | 144      | 48 | 3 | 54           | 29           |
| 128        | 128      | 64 | 2 | 68           | 32           |
| 128        | 192      | 64 | 3 | 69           | 33           |
| 128        | 256      | 64 | 4 | 72           | 34           |

Table 2.1: A table of the modes of operation for the SIMON & SPECK Algorithms. Adapted from [?].

Even though they are relatively new algorithms there are still a few FPGA implementations available for review, including [?] and [?] as well as those provided in [?]. These all show that very small designs are possible with even the 128/256 versions fitting below the 2000 GE limit.

## 2.4.2 Other Algorithms

The PRESENT cipher is another option for lightweight cryptography as it achieves a 1570 GE FPGA design with a 80 bit working on a 64 bit block. there is also an version that uses 128 bit key. The design of the cipher is based around a SP network, Figure 2.1a, with 64 bit subkeys.

PRINCE is a lightweight cipher that can encrypt data in one clock cycle with an unrolled SP network, as SP networks require less rounds than a F network. The steps include S-boxes and a matrix layer as well as the XORing of the round key and a round constant. Due to the unrolled nature of the algorithm the FPGA implementations are mainly combinational so the register count is lower than other algorithms.

### 2.4.3 Decisions

Based on the research explored in chapter 2 I chose the SIMON and SPECK algorithms from the NSA, mainly because of their flexibility in security levels, with different key lengths, that could be applied to the different devices explored in section 2.1. This means that multiple algorithms don't need to be developed and I could concentrate on making my code as efficient as possible. When compared in terms of power consumption the SIMON64/96 version in [?] also shows lower power consumption than the other algorithms explored. Also, while both PRESENT and PRINCE meet the lightweight specification they are also SP networks and in subsection 2.2.3 it was decided that a Feistel network is preferable.

After deciding on the SIMON and SPECK family I explored how each version works in order to make a more informed decision on which to work with in this project. As SIMON was designed primarily for hardware it only makes use of XOR  $(\oplus)$ , AND (&) and circular rotate operations  $(R^{j}[x])$  on the n bit wide words. The encryption and decryption functions take the Feistel network form described in Figure 2.1a with the round function Equation 2.1.

$$F(x) = (R^{1}[x]\&R^{8}[x]) \oplus R^{2}[x]$$
(2.1)

SPECK on the other hand, being optimised for software implementations uses XOR  $(\oplus)$ , modulo  $2^n$  addition (+) and circular rotate operations  $(R^j[x])$ . The encryption and decryption functions take a slightly different form to the basic Feistel network, Figure 2.1a, and are shown in Equation 2.2 and 2.3 where  $\alpha = 7$  and  $\beta = 2$  if n = 16, but  $\alpha = 8$  and  $\beta = 3$  otherwise.

$$L_{i+1} = (R^{-\alpha}[L_i] + R_i) \oplus K_i \tag{2.2}$$

$$R_{i+1} = R^{\beta}[R_i] \oplus (R^{-\alpha}[L_i] + R_i) \oplus K_i = R^{\beta}[R_i] \oplus L_{i+1}$$
 (2.3)

As the aim of this project is to compare how an algorithm performs in hardware and software SIMON was chosen even though SPECK shows almost as good performance in hardware and much better in software.

# **Progress**

Due to the fact that there isn't a standard library or program for SIMON, a software version was required for benchmarking as well as the main System Verilog version. Starting in software also increased by familiarity of the algorithm and provide a good starting point for System Verilog development. The code for the Hosted C version is available in the Appendix.

For all versions of the SIMON algorithm (table 2.1) efficiency, in terms of power consumption and resource use, is very important but time efficiency is not an initial priority. All versions of this algorithm were developed not only with the description but also the pseudocode provided in [?]. That document also has a set of test vectors for all modes that define the ciphertext that the algorithm should produce from the given key and plaintext. Due to the Feistel network structure of this algorithm encryption can be done in parallel to the key expansion but for decryption requires the keys to be pre expanded. For this reason, and because some modes of operation only require encryption, two variants of the algorithm were developed: one for just encryption and one with full functionality.

### 3.1 Hosted C

As there are ten versions of this algorithm that all differ slightly with the block key sizes they could all be developed individually. However, as ten versions of the similar functions would be required I felt it was better to have one version that is flexible. However, checking which mode the program is before most operations would be very inefficient and as the program would rarely changing mode during runtime the decisions could be made at compile time using preprocessor macros as in Listing 3.1. The macros were used to define the variable type used for the words based on the values for n with the uintn-t type. Also the value of m was used with macro if statements in the key expansion functions where extra code is required if m = 4.

```
defined S32_64
1 #if
   #define N 16
    #define M 4
    #define T 32
    \#define j 0
6 #elif defined S48_72
9 typedef uint32_t
                     uint24_t;
10 typedef uint64_t
                     uint48_t;
                     uint ## x ## _t
11 \#define TYPE_(x)
12 \#define TYPE(x)
                     TYPE_{-}(x)
13 \#define UINT(x, n) typedef struct n { TYPE(x) v : x; } n;
14 UINT(N, word);
16 typedef word
                   block[2];
17 typedef word
                   key [M];
18 typedef word
                   keys [T];
```

Listing 3.1: Macro definition of word type

The round function and Feistel functions are shown in Listing 3.2 and they are iterated in the encryption and decryption functions. These functions were tested with the test vectors from [?] producing the results in

```
1 TYPE(N) F (word x )
2 {
3    return (ROTL(x,1) & ROTL(x,8)) ^ ROTL(x, 2);
4 }
5    6 void ROUND (block b, word k )
7 {
8    word tmp = b[0];
9    b[0].v = b[1].v ^ F(b[0]) ^ k.v;
10    b[1] = tmp;
11 }
```

Listing 3.2: Round and Feistel functions

```
TESTING PRE KEY EXPANSION
               1918111009080100
PLAIN->CIPHER: 65656877->00000000
                                                START
PLAIN->CIPHER: 65656877->C69BE9BB
                                                ENCRYPT
PLAIN->CIPHER: 65656877->C69BE9BB
                                                TEST
PLAIN<-CIPHER: 65656877<-C69BE9BB
                                                DECRYPT
TESTING IN LOOP KEY EXPANSION
               1918111009080100
KEY:
PLAIN->CIPHER: 65656877->00000000
                                                START
PLAIN->CIPHER: 65656877->C69BE9BB
                                                ENCRYPT
PLAIN->CIPHER: 65656877->C69BE9BB
                                                TEST
```

Figure 3.1: Results from SIMON32/64 Test Vectors encrypted and decrypted in Hosted C program.

## 3.2 System Verilog

After the Hosted C version was working with the test vectors the System Verilog development began. Similar to the function used in the software developed in section 3.1 modules were created and tested with individual testbenches before being combined in the top level control modules. Most of the modules, including the rotate operations, were done in combination logic blocks for initial simplicity to ensure correct simulations. While this might work in simulations, it could be unreliable and inefficient when implemented in an FPGA. Also parameters, demonstrated in Listing 3.3, were used so the different modes could use the same code but the only testing done so far is with the SIMON32/64 version.

```
module SIMON_control  2 \# (
3 parameter N=16,
4 parameter M=4,
5 parameter T=32
6 )
7 (
8 input logic clk,
9 output int count,
10 output logic done,
11 input logic [2*N-1:0] plain,
12 output logic [2*N-1:0] cipher,
13 input logic [M-1:0] key
14 );
```

Listing 3.3: Encryption control module

Figure 3.2 shows the ModelSim simulation results of the control module, Listing 3.3 set in the 32/64 mode, for encryption with the correct ciphertext appearing on the cipher port only after the done signal is high.



Figure 3.2: Simulations Results for encryption in System Verilog.

The synthesis results, using Quartus Prime, shown in ??, show a RTL netview of the module. The synthesis shows that this module requires 161 pins and 131 registers and when a GE multiplier of 4 is used this encryption module has a GE area of 524 which is well below the 2000 limit set in section 2.1 and similar to that in [?]. Although, the module doesn't provide complete functionality so those values will most likely increase.

## Future Plan

To continue this project the System Verilog code needs to be improved with more sequential operations to improve reliability and efficiency. Serializing the algorithm at the byte level could also be explored which could have some interesting results similar to the bit serial version presented in [?]. Work will also have to be done to implement the System Verilog top module onto an FPGA and interface it with communication module, that could be provided by the FPGA development board used or from an OpenCores.org project[?]. When the FPGA is functioning correctly tests will need to be developed to determine the throughput in hardware and software, preferably operating at similar clock frequencies to ensure the results will be as comparable as possible. The hosted C program, section 3.1, should also be adapted to be programmed onto an 8 or 16 bit microcontroller and then the same parameters will be tested and compared.