FACE-LIGHT

# FACE-LIGHT: Fast AES CTR Mode Encryption for Low-end Microcontrollers

https://www.youtube.com/watch?v=ICPLQmnn7As&t=1124s

#### **CONTENTS**

01

Introduction

- AES
- Side Channel Attack
- Masking

02

**FACE** 

- Outline
- Structure

03

**Our Work** 

- FACE-LIGHT
- Extended-FACE
- Evaluation

04

Conclusion

- Contribution
- Future Work

# Introduction


# AES (Advanced Encryption Standard)

- Worldwide block cipher standard
  - FIPS 197
  - ISO/IEC 18033-3
- AES Modes
  - ECB, CBC, CFB, OFB, CTR





Fig 1. AES Structure

#### **AES-CTR**

- AES Counter Mode
- Parallel Process

- 128-bit IV (Initialization Vector)
  - 96-bit nonce
  - 32-bit counter
- Counter value increases by 1 for each block



Fig 2. AES-CTR Structure
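The counter block layout above can be sketched as follows. This is a minimal illustration, assuming the 32-bit counter is stored big-endian in the last four bytes (consistent with the later slides, where the last byte carries the counter value); `ctr_block` and `changed_bytes` are hypothetical helper names, not part of any library.

```python
import struct

NONCE_LEN = 12  # 96-bit nonce, as on the slide


def ctr_block(nonce, counter):
    """Build the 16-byte CTR input block: nonce || 32-bit big-endian counter."""
    if len(nonce) != NONCE_LEN:
        raise ValueError("nonce must be 12 bytes")
    return bytes(nonce) + struct.pack(">I", counter & 0xFFFFFFFF)


def changed_bytes(a, b):
    """Return the byte positions at which two blocks differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


nonce = bytes(range(12))
# Consecutive blocks usually differ only in the last byte ...
print(changed_bytes(ctr_block(nonce, 0), ctr_block(nonce, 1)))      # [15]
# ... except when the low counter byte wraps, once every 256 blocks.
print(changed_bytes(ctr_block(nonce, 255), ctr_block(nonce, 256)))  # [14, 15]
```

The second case is the carry into S[14], which is why caches that depend on the upper counter bytes are only valid for runs of 256 blocks.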

## Side Channel Attack(SCA)

- Attack based on additional information leaked during cipher operation
- Power Analysis
  - SPA (Simple Power Analysis)
  - DPA (Differential Power Analysis)
  - CPA (Correlation Power Analysis)



Fig 3. Power Analysis Attack
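To make the CPA idea concrete, here is a small simulation. It is a sketch under stated assumptions, not a measurement setup: the "power traces" are simulated as the Hamming weight of the first-round S-box output plus Gaussian noise, and all function names (`simulate_traces`, `cpa_recover`, and so on) are hypothetical. The attacker correlates the traces against the Hamming-weight model for every key guess and keeps the guess with the highest correlation.

```python
import random


def xtime(a):
    """Multiply by x in GF(2^8) with the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


def build_sbox():
    """Generate the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1) if x else 0
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def hw(x):
    return bin(x).count("1")


def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0


def simulate_traces(key_byte, n, rng, noise=0.5):
    """Simulated leakage: HW(SBOX[p ^ k]) + Gaussian noise (an assumed model)."""
    pts = [rng.randrange(256) for _ in range(n)]
    leak = [hw(SBOX[p ^ key_byte]) + rng.gauss(0, noise) for p in pts]
    return pts, leak


def cpa_recover(pts, leak):
    """Return the key guess whose HW model correlates best with the traces."""
    def score(k):
        return abs(pearson([hw(SBOX[p ^ k]) for p in pts], leak))
    return max(range(256), key=score)


rng = random.Random(1)
pts, leak = simulate_traces(0x3C, 500, rng)
print(hex(cpa_recover(pts, leak)))
```

Masking, discussed next, aims to break exactly this correlation by randomizing the intermediate value that leaks.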

# Masking

- SCA Countermeasure
  - Preventing power analysis
- Implemented with reference to the published masking technique\*
  - Optimized implementation on an 8-bit microcontroller



Fig 4. Masking Process

\* C. Herbst, E. Oswald, and S. Mangard, "An AES smart card implementation resistant to power analysis attacks," in ACNS, pp. 239–252, Springer, 2006.
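A first-order Boolean masked S-box lookup can be sketched as follows. This illustrates the general technique only: it is not the authors' 8-bit implementation and it omits the MixColumns masks used in the cited scheme. The names `masked_sbox_table` and `masked_sub_byte` are hypothetical. The table is recomputed so that an input masked with `m_in` yields an output masked with `m_out`, and the unmasked S-box input never appears as an intermediate value.

```python
import secrets


def xtime(a):
    """Multiply by x in GF(2^8) with the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


def build_sbox():
    """Generate the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1) if x else 0
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def masked_sbox_table(m_in, m_out):
    """Recompute the table so that table[x ^ m_in] == SBOX[x] ^ m_out."""
    table = [0] * 256
    for x in range(256):
        table[x ^ m_in] = SBOX[x] ^ m_out
    return table


def masked_sub_byte(plain, key_byte, m_in, table):
    """Mask the data first, then mix in the key, then look up the masked table."""
    masked = plain ^ m_in
    masked ^= key_byte
    return table[masked]  # equals SBOX[plain ^ key_byte] ^ m_out


# Fresh random masks per encryption (assumption: an unpredictable source is available).
m_in, m_out = secrets.randbelow(256), secrets.randbelow(256)
table = masked_sbox_table(m_in, m_out)
out = masked_sub_byte(0x32, 0x2B, m_in, table)
print(out ^ m_out == SBOX[0x32 ^ 0x2B])  # True: unmasking recovers the real output
```

The table recomputation costs 256 writes per mask pair, which is part of why masked implementations are so much slower than unmasked ones.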

# FACE


#### **Outline**

- Optimized implementation of AES-CTR
  - The last byte holds the counter value
  - The only difference between one block and the next is the last byte
- Stores the repeated values, excluding the counter value
  - Stores the values in a Look-Up Table (LUT)
  - Refers to the LUT for specific rounds
  - Requires 5 LUTs (5KB)
- The LUT must be updated periodically as the counter value changes





# Structure (Round 0)

- Utilizes the change in the last bytes of the data
  - The difference between the first block and the next block is 1 byte
- 12 bytes can be reused after Round 0
  - Counter bytes: S[12], S[13], S[14], S[15]
- Table can be used (2<sup>32</sup> - 1) times
  - No need for update



# Structure (Round 1)

- Last byte spreading
  - Spreads across two stages
  - ShiftRows
  - MixColumns
- LUT generation available
  - Except first column
- Table can be used (2<sup>8</sup> - 1) times
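The spreading of the last byte can be checked with a small experiment. This is a sketch under assumptions: the round keys are arbitrary placeholder bytes rather than the output of a real key schedule, the state is indexed column-major as S[0] to S[15], and `aes_rounds` is a hypothetical helper. Two counter blocks that differ only in S[15] are pushed through Round 0 and Round 1, and the positions that differ afterwards are printed.

```python
def xtime(a):
    """Multiply by x in GF(2^8) with the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


def build_sbox():
    """Generate the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1) if x else 0
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def shift_rows(s):
    # Byte i sits in row i % 4, column i // 4; row r rotates left by r columns.
    return [s[(i % 4) + 4 * (((i // 4) + (i % 4)) % 4)] for i in range(16)]


def mix_columns(s):
    out = []
    for c in range(4):
        a = s[4 * c:4 * c + 4]
        for r in range(4):
            out.append(gf_mul(a[r], 2) ^ gf_mul(a[(r + 1) % 4], 3)
                       ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return out


def aes_rounds(block, round_keys):
    """AddRoundKey with round_keys[0], then one full round per remaining key."""
    s = [b ^ k for b, k in zip(block, round_keys[0])]
    for rk in round_keys[1:]:
        s = mix_columns(shift_rows([SBOX[b] for b in s]))
        s = [b ^ k for b, k in zip(s, rk)]
    return s


def diff(x, y):
    return [i for i in range(16) if x[i] != y[i]]


nonce = list(range(0xA0, 0xAC))
rks = [list(range(16 * i, 16 * i + 16)) for i in range(3)]  # placeholder round keys
blk_a, blk_b = nonce + [0, 0, 0, 5], nonce + [0, 0, 0, 6]

print(diff(aes_rounds(blk_a, rks[:2]), aes_rounds(blk_b, rks[:2])))  # [0, 1, 2, 3]
print(len(diff(aes_rounds(blk_a, rks), aes_rounds(blk_b, rks))))     # 16
```

After Round 1 only the first column differs, so the other 12 bytes can be cached until an upper counter byte changes. Passing a third round key shows that one round later all 16 bytes differ.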



# Structure (Round 1+)

- Last byte spreading
  - Spreads across two stages
  - ShiftRows
  - MixColumns
- LUT generation available
  - Utilize S[15] as index
  - Table Size: 1KB
- Table can be used (2<sup>8</sup> × 2<sup>32</sup>) times
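One reading of the 1KB table indexed by S[15] is sketched below; it is an interpretation of the slide, not the authors' code. After ShiftRows, the first column of Round 1 is fed by S[0], S[5], S[10] (nonce bytes) and S[15] (the low counter byte), so its four output bytes can be tabulated for all 256 values of S[15]: 256 entries of 4 bytes each. The round keys are placeholder bytes and `build_column0_lut` is a hypothetical name.

```python
def xtime(a):
    """Multiply by x in GF(2^8) with the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) & 0xFF if a & 0x100 else a


def gf_mul(a, b):
    r = 0
    while b:
        if b & 1:
            r ^= a
        a = xtime(a)
        b >>= 1
    return r


def build_sbox():
    """Generate the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next(y for y in range(1, 256) if gf_mul(x, y) == 1) if x else 0
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def mix_column(a):
    return [gf_mul(a[r], 2) ^ gf_mul(a[(r + 1) % 4], 3)
            ^ a[(r + 2) % 4] ^ a[(r + 3) % 4] for r in range(4)]


def column0_direct(block, rk0, rk1):
    """Reference: first column after Round 1, computed without any table."""
    s = [SBOX[b ^ k] for b, k in zip(block, rk0)]
    col0 = [s[0], s[5], s[10], s[15]]  # ShiftRows brings these into column 0
    return bytes(m ^ k for m, k in zip(mix_column(col0), rk1[:4]))


def build_column0_lut(nonce, rk0, rk1):
    """256 entries x 4 bytes = 1 KB, indexed by the last counter byte."""
    fixed = [SBOX[nonce[i] ^ rk0[i]] for i in (0, 5, 10)]
    lut = []
    for c in range(256):
        col0 = fixed + [SBOX[c ^ rk0[15]]]
        lut.append(bytes(m ^ k for m, k in zip(mix_column(col0), rk1[:4])))
    return lut


nonce = list(range(0xA0, 0xAC))
rk0, rk1 = list(range(16)), list(range(16, 32))  # placeholder round keys
lut = build_column0_lut(nonce, rk0, rk1)

print(sum(len(entry) for entry in lut))  # 1024 bytes
block = nonce + [0x12, 0x34, 0x56, 0x78]
print(lut[block[15]] == column0_direct(block, rk0, rk1))  # True
```

Because this column does not depend on S[12], S[13] or S[14], the table stays valid when the upper counter bytes change, which matches the much larger reuse count on this slide compared with the previous one.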



# Structure (Round 2)

- First column spreading
  - Spreads across two stages
  - ShiftRows
  - MixColumns
- All values are affected after Round 2
- LUT generation available
  - Intermediate value during MixColumns
- Table can be used (2<sup>8</sup> - 1) times




# Structure (Round 2+)

- Values that are not stored in the LUT in Round 2
  - S[0], S[1], S[2], S[3]
- LUT generation available
  - Unused intermediate value during MixColumns
  - Table Size: 4KB
- Table can be used (2<sup>8</sup> × 2<sup>32</sup>) times



#### Limitation of FACE

- Difficult to use on 8-bit microcontrollers
  - LUT capacity issues
  - Requires a minimum of 5KB of memory
- Requires LUT updates at regular intervals



Fig 6. Arduino Uno Memory Structure

# Our Work: FACE-LIGHT


## Target Board

- 8-bit Microcontroller
  - Arduino Uno ATmega328P
- Hardware Spec
  - Flash Memory: 32KB
  - SRAM: 2KB
  - EEPROM: 1KB
  - Clock Speed: 16MHz



Fig 7. Arduino Uno

#### Overview

- Optimized implementation based on FACE
  - Optimized for low-power processors
- Stores the repeated values that depend on the counter value
  - Stores the values in a Look-Up Table (LUT)
  - Multiple rounds are skipped with a single lookup
  - Requires 4 LUTs (4KB)
- No need to update the LUT periodically as the counter value changes
- Performance improved further by combining with FACE

Fig 8. Overview of FACE-LIGHT (state bytes across Round 0 to Round 2 for a 128-bit initialization vector of 96-bit nonce + 32-bit counter, before and after the 256th block; legend: Identical Part, Different Part, Completely Different Part)

















# Look Up Table Structure

- Size of LUT: 4KB
- Update not required



Fig 9. FACE-LIGHT Look Up Table



#### Extended FACE












## **Evaluation** (Calculating Speed)

- 22% performance improvement over standard AES
- No additional LUT update time required

**Unit: Clock Cycles** 

| Security Level | Dinu et al. * | Otte et al. ** | FACE-LIGHT<br>(Our Work) | Ex-FACE<br>(Our Work) |
|----------------|---------------|----------------|--------------------------|-----------------------|
| AES-128        | 2,835         | 2,507          | 2,218                    | 1,967                 |
| AES-192        | N/A           | 2,991          | 2,702                    | 2,449                 |
| AES-256        | N/A           | 3,473          | 3,184                    | 2,931                 |

Table 1. Comparison of calculating speed

<sup>\*</sup> D. Dinu, A. Biryukov, "FELICS - fair evaluation of lightweight cryptographic systems," in NIST, 2015.

<sup>\*\*</sup> D. Otte et al., "AVR-crypto-lib," Online: http://www.das-labor.org/wiki/AVR-Crypto-Lib/en, 2009.
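The relative improvements can be recomputed directly from Table 1. The snippet below uses only the cycle counts in the table; `improvement_pct` is a hypothetical helper, and the slide does not state which baseline its 22% figure refers to.

```python
# Clock cycles from Table 1 (Dinu et al. report AES-128 only).
CYCLES = {
    "AES-128": {"Dinu et al.": 2835, "Otte et al.": 2507, "FACE-LIGHT": 2218, "Ex-FACE": 1967},
    "AES-192": {"Otte et al.": 2991, "FACE-LIGHT": 2702, "Ex-FACE": 2449},
    "AES-256": {"Otte et al.": 3473, "FACE-LIGHT": 3184, "Ex-FACE": 2931},
}


def improvement_pct(baseline, new):
    """Percentage of clock cycles saved relative to the baseline."""
    return 100.0 * (baseline - new) / baseline


for variant, row in CYCLES.items():
    for ours in ("FACE-LIGHT", "Ex-FACE"):
        for base in ("Dinu et al.", "Otte et al."):
            if base in row:
                saved = improvement_pct(row[base], row[ours])
                print(f"{variant}: {ours} vs {base}: {saved:.1f}% fewer cycles")
```

For AES-128 this gives 21.8% for FACE-LIGHT against Dinu et al. and 21.5% for Ex-FACE against Otte et al., either of which is consistent with the quoted 22%. Against Otte et al., FACE-LIGHT saves the same 289 cycles at every key size, as expected when a fixed number of early rounds is replaced by table lookups.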

### Evaluation (vs FACE)

- FACE-LIGHT is optimized for 8-bit microcontrollers
- Supports constant timing (no LUT update needed)
- Usable on 8-bit low-power processors without restrictions

|                  | FACE             | FACE-LIGHT<br>(Our Work) |
|------------------|------------------|--------------------------|
| Table Update     | Required         | Not required             |
| Constant Timing  | Not supported    | Supported                |
| Target Processor | 32-bit or above  | 8-bit or above           |
| Expandable Round | Round 2          | Round 3                  |

Table 2. Comparison with original FACE

#### **Evaluation** (Side Channel Attack Resistance)

Resistant to power analysis attacks (CPA, DPA)



Fig 10. Graph of Power Analysis

## Evaluation (vs LEA)

- Better performance than masked LEA, which uses ARX operations
- Improved performance over the previous masked AES
  - FACE-LIGHT, software optimization

#### **Unit: Clock Cycles**

| LEA-128 * | Masked LEA-128 ** | Masked AES-128<br>(Previous Work)*** | Masked FACE-128<br>(Our Work) |
|-----------|-------------------|--------------------------------------|-------------------------------|
| 2,688     | 36,589            | 25,970                               | 6,219                         |

Table 3. Comparison with LEA and Previous Work

<sup>\*</sup> H. Seo, I. Jeong, J. Lee, and W. Kim, "Compact implementations of ARX-based block ciphers on IoT processors," ACM TECS, 2018.

<sup>\*\*</sup> E. Park, S. Oh, and J. Ha, "Masking-based block cipher LEA resistant to side channel attacks," KIISC, 2017.

<sup>\*\*\*</sup> K. H. Kim, H. J. Seo, "Implementation of Optimized 1st-Order Masking AES Algorithm Against Side-Channel-analysis," KIPS, 2019.
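The ratios implied by Table 3 can be worked out as follows. Only the cycle counts come from the table; the dictionary keys and the `speedup` helper are illustrative names.

```python
# Clock cycles from Table 3.
CYCLES = {
    "LEA-128": 2688,
    "Masked LEA-128": 36589,
    "Masked AES-128 (previous work)": 25970,
    "Masked FACE-128": 6219,
}


def speedup(baseline, new):
    """How many times fewer cycles `new` needs compared with `baseline`."""
    return baseline / new


ours = CYCLES["Masked FACE-128"]
for name in ("Masked LEA-128", "Masked AES-128 (previous work)"):
    print(f"{name}: {speedup(CYCLES[name], ours):.2f}x the cycles of Masked FACE-128")

# Cost of masking relative to the unmasked LEA-128 baseline.
print(f"Masked FACE-128 vs unmasked LEA-128: {ours / CYCLES['LEA-128']:.2f}x")
```

On these numbers Masked FACE-128 needs roughly 4.18 times fewer cycles than the previous masked AES and about 5.88 times fewer than masked LEA, while still costing about 2.31 times the cycles of unmasked LEA-128.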

# Conclusion


#### Contribution

- Effective optimization of AES-CTR on low-power processor
  - Clock Cycles optimization
- Expandable to more rounds than FACE
- Attack points are difficult to predict (timing is constant)
- Masking operation to counter a side channel attack
- Lightweight AES

#### **Future Work**

- Optimization on various platforms
  - 16-bit MSP ... etc.



- Pre-calculation of LUT
- Side channel attack resistant
- Software optimization
- Apply proposed methods to AES modes
  - AES GCM



Fig 11. MSP430FR2433 LaunchPad kit

# THANK YOU

pgm.kkh@gmail.com