# Masking Floating-Point Number Multiplication and Addition of Falcon

Keng-Yu Chen, Jiun-Peng Chen

Conference on Cryptographic Hardware and Embedded Systems

September 5th, 2024

## Table of Contents

- Introduction
- Preliminaries
- Masked Floating-Point Number Multiplication and Addition
- Evaluation and Implementation
- Conclusion

## Table of Contents

- Introduction
- 2 Preliminaries
- Masked Floating-Point Number Multiplication and Addition
- Evaluation and Implementation
- Conclusion

#### Introduction

 To defend the potential threat from large-scale quantum computers, the US National Institute of Standards and Technology (NIST) initiated standardization process for post-quantum cryptography in 2016.

#### Introduction

- To defend the potential threat from large-scale quantum computers, the US National Institute of Standards and Technology (NIST) initiated standardization process for post-quantum cryptography in 2016.
- In 2022, four selected algorithms CRYSTALS-Kyber, CRYSTALS-Dilithium, FALCON, and SPHINCS+ were expected to be part of NIST's post-quantum cryptographic standards.

# Theoretical Security - Hardness of Mathematical Problems

In theory, these algorithms can base their security on problems that are considered still hard given the advantage of quantum computing.

- O CRYSTALS-Kyber: Module Learning With Errors (MLWE)
- CRYSTALS-Dilithium: Module Short Integer Solution (MSIS)
- FALCON: NTRU Problem and SIS on NTRU lattices
- SPHINCS+: Security of the used hash function families

# Real-World Security - Side-Channel Attacks

In practice, the implementations of these algorithms can suffer side-channel attacks. Fortunately, there are countermeasures for them.

- CRYSTALS-Kyber: [Bos+21; Fri+22; Hei+22]
- ② CRYSTALS-Dilithium: [Mig+19]
- FALCON: [How+20; Gue+22; Zha+23]
- SPHINCS+: [Ber+10; Bel+13]

# Real-World Security - Side-Channel Attacks

In practice, the implementations of these algorithms can suffer side-channel attacks. Fortunately, there are countermeasures for them.

- CRYSTALS-Kyber: [Bos+21; Fri+22; Hei+22]
- CRYSTALS-Dilithium: [Mig+19]
- **Solution** FALCON: [How+20; Gue+22; Zha+23]
- SPHINCS+: [Ber+10; Bel+13]

Unfortunately, there are attacks on FALCON that have not been addressed.

### Attacks on FALCON



Figure: A graphical overview of FALCON.Sign.

|                                | Attack           | Countermeasure   |
|--------------------------------|------------------|------------------|
| Pre-image Vector Computation   |                  | None             |
| Gaussian Sampler over Lattices | [Gue+22; Zha+23] | [Gue+22; Zha+23] |

### Attacks on FALCON



Figure: A graphical overview of FALCON.Sign.

|                                | Attack           | Countermeasure   |
|--------------------------------|------------------|------------------|
| Pre-image Vector Computation   |                  | None             |
| Gaussian Sampler over Lattices | [Gue+22; Zha+23] | [Gue+22; Zha+23] |

In this paper, we present the following contributions:

• We propose the first masking scheme on the floating-point number multiplication and addition in the pre-image vector computation of FALCON as a countermeasure.

- We propose the first masking scheme on the floating-point number multiplication and addition in the pre-image vector computation of FALCON as a countermeasure.
- We verify the high-order security of our design in the probing model.

- We propose the first masking scheme on the floating-point number multiplication and addition in the pre-image vector computation of FALCON as a countermeasure.
- We verify the high-order security of our design in the probing model.
- To test the practical leakage of our work, we conduct the Test Vector Leakage Assessment (TVLA) [GJR+11] experiments.

- We propose the first masking scheme on the floating-point number multiplication and addition in the pre-image vector computation of FALCON as a countermeasure.
- We verify the high-order security of our design in the probing model.
- To test the practical leakage of our work, we conduct the Test Vector Leakage Assessment (TVLA) [GJR+11] experiments.
- We also test the performance by comparing with the reference implementation of FALCON [Pre+20].

Throughout the presentation, we assume

• For a variable x, the jth bit of x is written as  $x^{(j)}$ .

- For a variable x, the jth bit of x is written as  $x^{(j)}$ .
- The *i*th bit to *j*th bit  $(j \ge i)$  of x is represented by  $x^{[j:i]}$ .

- For a variable x, the jth bit of x is written as  $x^{(j)}$ .
- The *i*th bit to *j*th bit  $(j \ge i)$  of x is represented by  $x^{[j:i]}$ .
- A sequence of n variables  $(x_1, x_2, \dots, x_n)$  (e.g. shares of variable x) is written as  $(x_i)_{1 \le i \le n}$ , or simply  $(x_i)$ .

- For a variable x, the jth bit of x is written as  $x^{(j)}$ .
- The *i*th bit to *j*th bit  $(j \ge i)$  of x is represented by  $x^{[j:i]}$ .
- A sequence of n variables  $(x_1, x_2, \dots, x_n)$  (e.g. shares of variable x) is written as  $(x_i)_{1 \le i \le n}$ , or simply  $(x_i)$ .
- For a proposition P,  $\llbracket P \rrbracket = 1$  if and only if P is true and 0 if otherwise.

## Table of Contents

- Introduction
- Preliminaries
  - Falcon
  - Floating-Point Number Arithmetic
  - Power Analysis and Masking
- Masked Floating-Point Number Multiplication and Addition
- Evaluation and Implementation
- Conclusion

## Table of Contents

- Introduction
- Preliminaries
  - FALCON
  - Floating-Point Number Arithmetic
  - Power Analysis and Masking
- Masked Floating-Point Number Multiplication and Addition
- 4 Evaluation and Implementation
- Conclusion

- A NIST-standardized digital signature
- Use the Gentry-Peikert-Vaikuntanathan (GPV) framework [GPV08] with NTRU lattices

### KeyGen

Public Key:  $\mathbf{A} \in \mathbb{Z}_q^{N \times M}$ 

Secret Key: Short  $\mathbf{B} \in \mathbb{Z}_q^{M \times M}$ 

 $\mathbf{B}\mathbf{A}^T = \mathbf{0} \bmod q$ 

# Sign(m)

A short **s** s.t.

 $\mathbf{sA}^T = H(m) \bmod q$ 

 $H: \{0,1\}^* \to \{0,1\}^N$ 

Verify(m, s)

Check

**s** is short

To find such a short s, one can first

13 / 62

To find such a short s, one can first

• Compute H(m)

- Compute H(m)
- Find a solution **c** (not short) where  $\mathbf{cA}^T = H(\mathbf{m}) \mod q$

- Compute H(m)
- Find a solution **c** (not short) where  $cA^T = H(m) \mod q$
- Compute the pre-image vector  $\mathbf{t} \leftarrow \mathbf{c} \mathbf{B}^{-1}$

- Compute H(m)
- Find a solution **c** (not short) where  $\mathbf{cA}^T = H(\mathbf{m}) \mod q$
- Compute the pre-image vector  $\mathbf{t} \leftarrow \mathbf{c} \mathbf{B}^{-1}$
- ullet Apply the nearest plane algorithm to find an integer vector  ${\bf z}$  such that  $({\bf t}-{\bf z}){\bf B}$  is short.

- Compute H(m)
- Find a solution **c** (not short) where  $\mathbf{cA}^T = H(\mathbf{m}) \mod q$
- Compute the pre-image vector  $\mathbf{t} \leftarrow \mathbf{c} \mathbf{B}^{-1}$
- ullet Apply the nearest plane algorithm to find an integer vector  ${\bf z}$  such that  $({\bf t}-{\bf z}){\bf B}$  is short.
- $\mathbf{s} \leftarrow (\mathbf{t} \mathbf{z})\mathbf{B}$ . Note that  $\mathbf{s}\mathbf{A}^T = H(\mathbf{m}) \bmod q$

In Falcon,

Keng-Yu Chen, Jiun-Peng Chen Masking Falcon's FPU September 5th, 2024

14 / 62

In Falcon,

• Short secret polynomials  $f, g, F, G \in \mathbb{Z}[x]/(x^N+1)$  where

$$fG - gF = q$$
  $\mathbf{B} = \begin{bmatrix} g & -f \\ \hline G & -F \end{bmatrix}$ 

In Falcon,

• Short secret polynomials  $f, g, F, G \in \mathbb{Z}[x]/(x^N+1)$  where

$$fG - gF = q$$
  $\mathbf{B} = \begin{bmatrix} g & -f \\ \hline G & -F \end{bmatrix}$ 

• Public polynomial  $h = gf^{-1} \mod q$  and  $\mathbf{A}^T = \begin{bmatrix} 1 \\ h \end{bmatrix}$ 

In Falcon,

• Short secret polynomials  $f, g, F, G \in \mathbb{Z}[x]/(x^N+1)$  where

$$fG - gF = q$$
  $\mathbf{B} = \begin{bmatrix} g & -f \\ \hline G & -F \end{bmatrix}$ 

- ullet Public polynomial  $h=gf^{-1} mod q$  and  $m{A}^T=\left[rac{1}{h}
  ight]$
- $\mathbf{c} = [c \mid 0]$ , where  $c = H(r \mid m)$  for the message m and a random salt r.

In Falcon,

• Short secret polynomials  $f, g, F, G \in \mathbb{Z}[x]/(x^N+1)$  where

$$fG - gF = q$$
  $\mathbf{B} = \begin{bmatrix} g & -f \\ \hline G & -F \end{bmatrix}$ 

- ullet Public polynomial  $h=gf^{-1} mod q$  and  ${f A}^T=\left[rac{1}{h}
  ight]$
- $\mathbf{c} = [c \mid 0]$ , where  $c = H(\mathbf{r} \parallel \mathbf{m})$  for the message m and a random salt r.

Moreover, FALCON applies the fast Fourier nearest plane algorithm [DP16] to speed up the signing process.

# Sign (Simplified)

**Input:** Message m, secret key sk, bound  $\lfloor \beta^2 \rfloor$ 

Output: Signature sig

1: Sample salt  $r \leftarrow \{0,1\}^{320}$  uniformly

2:  $c \leftarrow H(r||m)$ 

3: Compute the pre-image vector  $\mathbf{t} \leftarrow [c \mid 0] \cdot \mathbf{B}^{-1}$ 

4: repeat

5:  $\mathbf{z} = \mathsf{ffSampling}(\mathbf{t}, \mathsf{sk})$ 

6:  $\mathbf{s} = [s_1 \mid s_2] = (\mathbf{t} - \mathbf{z})\mathbf{B}$ 

7: **until**  $\|\mathbf{s}\|^2 \le |\beta^2|$ 

8:  $sig \leftarrow (r, s_2)$ 

## Verify (Simplified)

**Input:** Message m, signature sig

**Input:** Bound  $\lfloor \beta^2 \rfloor$ 

Output: Accept or Reject

1:  $c \leftarrow H(r||m)$ 

2:  $s_1 \leftarrow c - s_2 h \mod q$ 

3: **if**  $||(s_1, s_2)||^2 \le \lfloor \beta^2 \rfloor$  **then** 

4: Accept

5: **else** 

6: Reject

## Sign (Simplified)

**Input:** Message m, secret key sk, bound  $\lfloor \beta^2 \rfloor$ 

Output: Signature sig

1: Sample salt  $r \leftarrow \{0,1\}^{320}$  uniformly

2:  $c \leftarrow H(r||m)$ 

3: Compute the pre-image vector  $\mathbf{t} \leftarrow [c \mid 0] \cdot \mathbf{B}^{-1}$ 

4: repeat

5:  $\mathbf{z} = \mathsf{ffSampling}(\mathbf{t}, \mathsf{sk})$ 

6:  $\mathbf{s} = [s_1 \mid s_2] = (\mathbf{t} - \mathbf{z})\mathbf{B}$ 

7: **until**  $\|\mathbf{s}\|^2 \le |\beta^2|$ 

8:  $sig \leftarrow (r, s_2)$ 

## Verify (Simplified)

**Input:** Message m, signature sig

**Input:** Bound  $\lfloor \beta^2 \rfloor$ 

Output: Accept or Reject

1:  $c \leftarrow H(r||m)$ 

2:  $s_1 \leftarrow c - s_2 h \mod q$ 

3: if  $||(s_1, s_2)||^2 \le |\beta^2|$  then

4: Accept

5: **else** 

6: Reject

#### Table of Contents

- Introduction
- Preliminaries
  - FALCON
  - Floating-Point Number Arithmetic
  - Power Analysis and Masking
- Masked Floating-Point Number Multiplication and Addition
- 4 Evaluation and Implementation
- Conclusion

#### Fast-Fourier Transform

The pre-image vector computation includes polynomial multiplications

$$\mathbf{t} = \left[ \begin{array}{c|c} c & 0 \end{array} \right] \cdot \mathbf{B}^{-1} = \frac{1}{q} \left[ \begin{array}{c|c} c \cdot -F & c \cdot f \end{array} \right]$$

#### Fast-Fourier Transform

The pre-image vector computation includes polynomial multiplications

$$\mathbf{t} = \left[ \begin{array}{c|c} c & 0 \end{array} \right] \cdot \mathbf{B}^{-1} = \frac{1}{q} \left[ \begin{array}{c|c} c \cdot -F & c \cdot f \end{array} \right]$$

To speed up and apply the fast Fourier nearest plane algorithm, the pre-image vector computation is performed in the Fourier domain:

$$\frac{1}{q} \left[ |\mathsf{FFT}(c) \odot \mathsf{FFT}(-F)| |\mathsf{FFT}(c) \odot \mathsf{FFT}(f)| \right]$$

#### Fast-Fourier Transform

The pre-image vector computation includes polynomial multiplications

$$\mathbf{t} = \left[ \begin{array}{c|c} c & 0 \end{array} \right] \cdot \mathbf{B}^{-1} = \frac{1}{q} \left[ \begin{array}{c|c} c \cdot -F & c \cdot f \end{array} \right]$$

To speed up and apply the fast Fourier nearest plane algorithm, the pre-image vector computation is performed in the Fourier domain:

$$\frac{1}{q} \left[ | \mathsf{FFT}(c) \odot \mathsf{FFT}(-F) | | \mathsf{FFT}(c) \odot \mathsf{FFT}(f) | \right]$$

Therefore, the pre-image vector computation is essentially coefficient-wise complex number multiplications.

### Floating-Point Number

A complex number is represented by two 64-bit floating-point numbers (FPNs). An FPN is composed of sign bit s, exponent e, and mantissa  $\tilde{m}$ 



Figure: A 64-bit Floating-Point Number

The value is 
$$(-1)^s \cdot 2^{e-1023} \cdot \underbrace{(1+\tilde{m}\cdot 2^{-52})}_{\times 2^{52}=m}$$

For convenience, we may use (s, e, m) to represent an FPN.

FPN multiplication (FprMuI) is proceeded by

FPN multiplication (FprMul) is proceeded by

FPN addition (FprAdd) is proceeded by

Sign bit XOR

FPN multiplication (FprMul) is proceeded by

- Sign bit XOR
- Exponent Addition

FPN multiplication (FprMul) is proceeded by

- Sign bit XOR
- Exponent Addition
- Mantissa Multiplication

FPN multiplication (FprMul) is proceeded by

- Sign bit XOR
- Exponent Addition
- Mantissa Multiplication
- Right-shifting the mantissa to  $[2^{54}, 2^{55})$

FPN multiplication (FprMul) is proceeded by

- Sign bit XOR
- Exponent Addition
- Mantissa Multiplication
- **3** Right-shifting the mantissa to  $[2^{54}, 2^{55})$
- Combining the results and rounding (FPR)

FPN multiplication (FprMul) is proceeded by

- Sign bit XOR
- Exponent Addition
- Mantissa Multiplication
- **1** Right-shifting the mantissa to  $[2^{54}, 2^{55}]$
- Combining the results and rounding (FPR)

FPN addition (FprAdd) is proceeded by

• Making the first operand  $\geq$  the second

FPN multiplication (FprMul) is proceeded by

- Sign bit XOR
- Exponent Addition
- Mantissa Multiplication
- $\bigcirc$  Right-shifting the mantissa to  $[2^{54}, 2^{55})$
- Combining the results and rounding (FPR)

- lacktriangle Making the first operand  $\geq$  the second
- Right-shifting the second operand

#### FPN multiplication (FprMul) is proceeded by

- Sign bit XOR
- Exponent Addition
- Mantissa Multiplication
- Right-shifting the mantissa to  $[2^{54}, 2^{55})$
- Combining the results and rounding (FPR)

- lacktriangle Making the first operand  $\geq$  the second
- Right-shifting the second operand
- Mantissa Addition / Subtraction

#### FPN multiplication (FprMul) is proceeded by

- Sign bit XOR
- Exponent Addition
- Mantissa Multiplication
- **1** Right-shifting the mantissa to  $[2^{54}, 2^{55})$
- Combining the results and rounding (FPR)

- lacktriangle Making the first operand  $\geq$  the second
- Right-shifting the second operand
- Mantissa Addition / Subtraction
- Normalizing the sum to  $[2^{54}, 2^{55})$

FPN multiplication (FprMul) is proceeded by

- Sign bit XOR
- Exponent Addition
- Mantissa Multiplication
- $\odot$  Right-shifting the mantissa to  $[2^{54}, 2^{55}]$
- Combining the results and rounding (FPR)

- lacktriangle Making the first operand  $\geq$  the second
- Right-shifting the second operand
- Mantissa Addition / Subtraction
- O Normalizing the sum to  $[2^{54}, 2^{55})$
- Ombining the results and rounding (FPR)

#### Sticky Bit

In floating-point arithmetic, when shifted right, the mantissa maintains a sticky bit

$$10010{\color{red}0100} \gg 4 \rightarrow 1001 \underbrace{1}_{\text{Sticky}}$$

It indicates whether there exists any 1 after the least significant bit. In the above example,

sticky bit = 
$$0 \lor [(0100) \neq 0] = [(00100) \neq 0]$$

#### Table of Contents

- Introduction
- Preliminaries
  - FALCON
  - Floating-Point Number Arithmetic
  - Power Analysis and Masking
- Masked Floating-Point Number Multiplication and Addition
- Evaluation and Implementation
- Conclusion

### Power Analysis Attacks

• Power consumption during the execution of programs depends on intermediate values.

## Power Analysis Attacks

- Power consumption during the execution of programs depends on intermediate values.
- Power analysis attacks, the side-channel attacks on the pre-image vector computation, leverage this fact to find the secret key.

# Power Analysis Attacks

- Power consumption during the execution of programs depends on intermediate values.
- Power analysis attacks, the side-channel attacks on the pre-image vector computation, leverage this fact to find the secret key.



Figure: An Example of a Power Trace

Masking defends such threats by secret-sharing the sensitive variables.

Keng-Yu Chen, Jiun-Peng Chen Masking Falcon's FPU September 5th, 2024

23 / 62

Masking defends such threats by secret-sharing the sensitive variables.

• Boolean Masking: A variable x is split into n shares  $(x_i)$  such that

$$x = \bigoplus_{i=1}^{n} x_i$$

Masking defends such threats by secret-sharing the sensitive variables.

• Boolean Masking: A variable x is split into n shares  $(x_i)$  such that

$$x = \bigoplus_{i=1}^{n} x_i$$

• Arithmetic Masking: A variable x is split into n shares  $(x_i)$  (when stored in a k-bit register) such that

$$x = \sum_{i=1}^{n} x_i \pmod{2^k}$$

• In each run, all  $x_i$ 's are randomized so that any n-1 shares of them are independently and uniformly distributed.

- In each run, all  $x_i$ 's are randomized so that any n-1 shares of them are independently and uniformly distributed.
- All operations need to be operated via shares.

#### Table of Contents

- Introduction
- 2 Preliminaries
- Masked Floating-Point Number Multiplication and Addition
  - Overview of Our Approach
  - SecNonzero
  - SecFprUrsh
  - SecFprNorm64
- 4 Evaluation and Implementation
- Conclusion

#### Table of Contents

- Introduction
- 2 Preliminaries
- Masked Floating-Point Number Multiplication and Addition
  - Overview of Our Approach
  - SecNonzero
  - SecFprUrsh
  - SecFprNorm64
- 4 Evaluation and Implementation
- Conclusion

We now show how we mask FPR, FprMul, and FprAdd.

27 / 62

We now show how we mask FPR, FprMul, and FprAdd. An intuitive approach to mask any algorithm:

We now show how we mask FPR, FprMul, and FprAdd. An intuitive approach to mask any algorithm:

• For operations like  $\land, \oplus$ : Boolean masking

We now show how we mask FPR, FprMul, and FprAdd. An intuitive approach to mask any algorithm:

- For operations like  $\land, \oplus$ : Boolean masking
- For operations like  $+, \times$ : arithmetic masking

We now show how we mask FPR, FprMul, and FprAdd.

An intuitive approach to mask any algorithm:

- For operations like  $\land, \oplus$ : Boolean masking
- For operations like  $+, \times$ : arithmetic masking

and use the following gadgets if necessary:

We now show how we mask FPR, FprMul, and FprAdd.

An intuitive approach to mask any algorithm:

- For operations like  $\land, \oplus$ : Boolean masking
- ullet For operations like  $+, \times$ : arithmetic masking

and use the following gadgets if necessary:

• A2B:  $(x_i)_{1 \le i \le n} \mapsto (y_i)_{1 \le i \le n}$  such that  $\sum_{i=1}^n x_i = \bigoplus_{i=1}^n y_i$ 

We now show how we mask FPR, FprMul, and FprAdd.

An intuitive approach to mask any algorithm:

- For operations like  $\land, \oplus$ : Boolean masking
- ullet For operations like  $+, \times$ : arithmetic masking

and use the following gadgets if necessary:

- A2B:  $(x_i)_{1 \le i \le n} \mapsto (y_i)_{1 \le i \le n}$  such that  $\sum_{i=1}^n x_i = \bigoplus_{i=1}^n y_i$
- B2A:  $(y_i)_{1 \le i \le n} \mapsto (x_i)_{1 \le i \le n}$  such that  $\bigoplus_{i=1}^n y_i = \sum_{i=1}^n x_i$

28 / 62

#### Overview of Our Approach

However, some operations in floating-point number arithmetic cannot be easily implemented in this way:

However, some operations in floating-point number arithmetic cannot be easily implemented in this way:

- Checking whether a secret value is nonzero
  - Given  $(x_i)$ , checking whether  $\bigoplus_{i=1}^n x_i \neq 0$  or  $\sum_{i=1}^n x_i \neq 0$

However, some operations in floating-point number arithmetic cannot be easily implemented in this way:

- Checking whether a secret value is nonzero
  - Given  $(x_i)$ , checking whether  $\bigoplus_{i=1}^n x_i \neq 0$  or  $\sum_{i=1}^n x_i \neq 0$
- Right-shifting a secret value by another secret value
  - Given  $(x_i)$  and  $(c_i)$ , right-shifting  $(x_i)$  by  $(c_i)$

However, some operations in floating-point number arithmetic cannot be easily implemented in this way:

- Checking whether a secret value is nonzero
  - Given  $(x_i)$ , checking whether  $\bigoplus_{i=1}^n x_i \neq 0$  or  $\sum_{i=1}^n x_i \neq 0$
- Right-shifting a secret value by another secret value
  - Given  $(x_i)$  and  $(c_i)$ , right-shifting  $(x_i)$  by  $(c_i)$
- Normalizing a secret value to [2<sup>63</sup>, 2<sup>64</sup>)
  - Given  $(x_i)$ , left-shifting  $(x_i)$  until its 64th bit is set

29 / 62

# Overview of Our Approach

We design novel gadgets for these three operations, including:

Keng-Yu Chen, Jiun-Peng Chen Masking Falcon's FPU September 5th, 2024

We design novel gadgets for these three operations, including:

• SecNonzero (Algorithm 6): securely checking whether a secret value is nonzero.

We design novel gadgets for these three operations, including:

- SecNonzero (Algorithm 6): securely checking whether a secret value is nonzero.
- SecFprUrsh (Algorithm 7): securely right-shifting a secret value by another secret value

We design novel gadgets for these three operations, including:

- SecNonzero (Algorithm 6): securely checking whether a secret value is nonzero.
- SecFprUrsh (Algorithm 7): securely right-shifting a secret value by another secret value
- SecFprNorm64 (Algorithm 8): securely normalizing a secret value to  $[2^{63}, 2^{64})$

We design novel gadgets for these three operations, including:

- SecNonzero (Algorithm 6): securely checking whether a secret value is nonzero.
- SecFprUrsh (Algorithm 7): securely right-shifting a secret value by another secret value
- SecFprNorm64 (Algorithm 8): securely normalizing a secret value to  $[2^{63}, 2^{64})$

In addition, we make several improvements to reduce the cost.

# Gadgets Used in Our Work

| Algorithm          | Description                         | Reference        |
|--------------------|-------------------------------------|------------------|
| SecAnd             | AND of Boolean shares               | [ISW03; Bar+16]  |
| SecMult            | Multiplication of arithmetic shares | [ISW03; Bar+16]  |
| SecAdd             | Addition of Boolean shares          | [Cor+15; Bar+18] |
| A2B                | Arithmetic to Boolean conversion    | $[Sch{+}19]$     |
| B2A                | Boolean to arithmetic conversion    | [BCZ18]          |
| B2A <sub>Bit</sub> | One-bit B2A conversion              | $[Sch{+}19]$     |
| RefreshMasks       | t-NI refresh of masks               | [Bar+16; BCZ18]  |
| Refresh            | t-SNI refresh of masks              | [Bar+16]         |

Table: List of used gadgets in our work

# Gadgets Used in Our Work

| Algorithm          | Description                         | Reference        |
|--------------------|-------------------------------------|------------------|
| SecAnd             | AND of Boolean shares               | [ISW03; Bar+16]  |
| SecMult            | Multiplication of arithmetic shares | [ISW03; Bar+16]  |
| SecAdd             | Addition of Boolean shares          | [Cor+15; Bar+18] |
| A2B                | Arithmetic to Boolean conversion    | [Sch+19]         |
| B2A                | Boolean to arithmetic conversion    | [BCZ18]          |
| B2A <sub>Bit</sub> | One-bit B2A conversion              | $[Sch{+}19]$     |
| RefreshMasks       | t-NI refresh of masks               | [Bar+16; BCZ18]  |
| Refresh            | t-SNI refresh of masks              | [Bar+16]         |

Table: List of used gadgets in our work

### Table of Contents

- Introduction
- 2 Preliminaries
- Masked Floating-Point Number Multiplication and Addition
  - Overview of Our Approach
  - SecNonzero
  - SecFprUrsh
  - SecFprNorm64
- Evaluation and Implementation
- Conclusion

We need a gadget that, given shares  $(x_i)$ , can derive one-bit shares  $(b_i)$  such that

$$\left[\left(\bigoplus_{i=1}^{n} x_{i} \neq 0\right)\right] = \bigoplus_{i=1}^{n} b_{i} \quad \text{or} \quad \left[\left(\sum_{i=1}^{n} x_{i} \neq 0\right)\right] = \bigoplus_{i=1}^{n} b_{i}$$

We need a gadget that, given shares  $(x_i)$ , can derive one-bit shares  $(b_i)$  such that

$$\left[\left(\bigoplus_{i=1}^{n} x_{i} \neq 0\right)\right] = \bigoplus_{i=1}^{n} b_{i} \quad \text{or} \quad \left[\left(\sum_{i=1}^{n} x_{i} \neq 0\right)\right] = \bigoplus_{i=1}^{n} b_{i}$$

For Boolean shares, our method is by considering OR-ing all the bits.

$$x = 0 \iff x^{(k)} \lor x^{(k-1)} \lor \cdots \lor x^{(1)} = 0$$

We need a gadget that, given shares  $(x_i)$ , can derive one-bit shares  $(b_i)$  such that

$$\left[\left(\bigoplus_{i=1}^{n} x_{i} \neq 0\right)\right] = \bigoplus_{i=1}^{n} b_{i} \quad \text{or} \quad \left[\left(\sum_{i=1}^{n} x_{i} \neq 0\right)\right] = \bigoplus_{i=1}^{n} b_{i}$$

For Boolean shares, our method is by considering OR-ing all the bits.

$$x = 0 \iff x^{(k)} \vee x^{(k-1)} \vee \cdots \vee x^{(1)} = 0$$

Now we turn to a gadget for secure OR operations.

### SecOr: OR of Boolean Shares

#### SecOr

**Input:** Boolean shares  $(x_i)_{1 \le i \le n}$  for value x

**Input:** Boolean shares  $(y_i)_{1 \le i \le n}$  for value y

**Output:** Boolean shares  $(z_i)_{1 \le i \le n}$  for value  $z = x \lor y$ 

1: 
$$(t_i)_{1 \le i \le n} \leftarrow (\neg x_1, x_2, \cdots, x_n)$$

2: 
$$(s_i)_{1 \leq i \leq n} \leftarrow (\neg y_1, y_2, \cdots, y_n)$$

3: 
$$(z_i) \leftarrow \mathsf{SecAnd}((s_i), (t_i))$$

4:  $z_1 \leftarrow \neg z_1$ 

5: **return**  $(z_i)$ 

It applies De Morgan's law and calls the AND algorithm SecAnd of shares as a subroutine.

$$x \vee y = \neg \left[ \left( \neg x \right) \wedge \left( \neg y \right) \right]$$

For arithmetic shares, instead of applying an *n*-shared A2B, we consider that

$$\sum_{i=1}^n x_i = 0 \Longleftrightarrow \sum_{i=1}^{\frac{n}{2}} x_i = \sum_{i=\frac{n}{2}+1}^n (-x_i) \Longleftrightarrow \sum_{i=1}^{\frac{n}{2}} x_i \oplus \sum_{i=\frac{n}{2}+1}^n (-x_i) = 0$$

For arithmetic shares, instead of applying an *n*-shared A2B, we consider that

$$\sum_{i=1}^{n} x_i = 0 \Longleftrightarrow \sum_{i=1}^{\frac{n}{2}} x_i = \sum_{i=\frac{n}{2}+1}^{n} (-x_i) \Longleftrightarrow \sum_{i=1}^{\frac{n}{2}} x_i \oplus \sum_{i=\frac{n}{2}+1}^{n} (-x_i) = 0$$

So we apply two n/2-shared A2Bs to the first n/2 shares and negative of the second n/2 shares and use the same idea.

For arithmetic shares, instead of applying an *n*-shared A2B, we consider that

$$\sum_{i=1}^n x_i = 0 \Longleftrightarrow \sum_{i=1}^{\frac{n}{2}} x_i = \sum_{i=\frac{n}{2}+1}^n (-x_i) \Longleftrightarrow \sum_{i=1}^{\frac{n}{2}} x_i \oplus \sum_{i=\frac{n}{2}+1}^n (-x_i) = 0$$

So we apply two n/2-shared A2Bs to the first n/2 shares and negative of the second n/2 shares and use the same idea.

In this way, we replace one n-shared A2B with two n/2-shared A2Bs, which is usually more efficient.

#### SecNonzero

11:  $\operatorname{len} \leftarrow \operatorname{len} \gg 1$ 12: **return**  $(t_i^{(1)})$ 

```
Input: Shares (x_i)_{1 \le i \le n} for value x, bitsize
Output: One-bit Boolean shares (b_i)_{1 \le i \le n} where \bigoplus_i b_i = 0 \Leftrightarrow x = 0
 1: if input (x_i) are arithmetic shares then
 2: (t_i)_{1 < i < \frac{n}{2}} \leftarrow A2B((x_i)_{1 < i < \frac{n}{2}})
 3: (t_i)_{\frac{n}{n}+1 < i < n} \leftarrow A2B((-x_i)_{\frac{n}{n}+1 < i < n})
 4: else
 5: (t_i)_{1 \le i \le n} \leftarrow (x_i)_{1 \le i \le n}
 6: len \leftarrow bitsize/2
 7: while len > 1 do
 8: (I_i) \leftarrow \text{Refresh}((t_i^{[2\text{len:len}]}), \text{len})
      (r_i) \leftarrow (t_i^{[\mathsf{len}:1]})
10: (t_i) \leftarrow SecOr((l_i), (r_i))
```

### Table of Contents

- Introduction
- 2 Preliminaries
- Masked Floating-Point Number Multiplication and Addition
  - Overview of Our Approach
  - SecNonzero
  - SecFprUrsh
  - SecFprNorm64
- 4 Evaluation and Implementation
- Conclusion

Given 64-bit shares  $(x_i)$  and 6-bit  $(c_i)$ , we need to derive shares  $(z_i)$  such that

$$\bigoplus_{i=1}^{n} z_{i} = \left( \left( \bigoplus_{i=1}^{n} x_{i} \right) \gg \left( \sum_{i=1}^{n} c_{i} \bmod 2^{6} \right) \right) \vee \left[ \bigoplus_{i=1}^{n} x_{i}^{[c:1]} \neq 0 \right]$$

Given 64-bit shares  $(x_i)$  and 6-bit  $(c_i)$ , we need to derive shares  $(z_i)$  such that

$$\bigoplus_{i=1}^{n} z_{i} = \left( \left( \bigoplus_{i=1}^{n} x_{i} \right) \gg \left( \sum_{i=1}^{n} c_{i} \bmod 2^{6} \right) \right) \vee \left[ \left( \bigoplus_{i=1}^{n} x_{i}^{[c:1]} \neq 0 \right) \right]$$

We observe that

Given 64-bit shares  $(x_i)$  and 6-bit  $(c_i)$ , we need to derive shares  $(z_i)$  such that

$$\bigoplus_{i=1}^{n} z_{i} = \left( \left( \bigoplus_{i=1}^{n} x_{i} \right) \gg \left( \sum_{i=1}^{n} c_{i} \bmod 2^{6} \right) \right) \vee \left[ \left( \bigoplus_{i=1}^{n} x_{i}^{[c:1]} \neq 0 \right) \right]$$

We observe that

• Right-shifting and right-rotating by a value c only differ by the most c significant bits.

Given 64-bit shares  $(x_i)$  and 6-bit  $(c_i)$ , we need to derive shares  $(z_i)$  such that

$$\bigoplus_{i=1}^{n} z_{i} = \left( \left( \bigoplus_{i=1}^{n} x_{i} \right) \gg \left( \sum_{i=1}^{n} c_{i} \bmod 2^{6} \right) \right) \vee \left[ \left( \bigoplus_{i=1}^{n} x_{i}^{[c:1]} \neq 0 \right) \right]$$

We observe that

- Right-shifting and right-rotating by a value c only differ by the most c significant bits.
- Both shifting and rotating can be operated share-wise.

Given 64-bit shares  $(x_i)$  and 6-bit  $(c_i)$ , we need to derive shares  $(z_i)$  such that

$$\bigoplus_{i=1}^{n} z_{i} = \left( \left( \bigoplus_{i=1}^{n} x_{i} \right) \gg \left( \sum_{i=1}^{n} c_{i} \bmod 2^{6} \right) \right) \vee \left[ \left( \bigoplus_{i=1}^{n} x_{i}^{[c:1]} \neq 0 \right) \right]$$

We observe that

- Right-shifting and right-rotating by a value c only differ by the most c significant bits.
- Both shifting and rotating can be operated share-wise.
- Right-rotating x by a value c is equal to right-rotating x by a value c mod 64.

Hence, our idea is to right-rotate all  $(x_i)$  by  $c_1, c_2, \dots, c_n$  sequentially.

38 / 62

Hence, our idea is to right-rotate all  $(x_i)$  by  $c_1, c_2, \cdots, c_n$  sequentially.

Some high bits are redundant, so we use an index  $m = (1 \ll 63)$  to indicate the first meaningful bit of the result.

Keng-Yu Chen, Jiun-Peng Chen

Hence, our idea is to right-rotate all  $(x_i)$  by  $c_1, c_2, \dots, c_n$  sequentially.

Some high bits are redundant, so we use an index  $m = (1 \ll 63)$  to indicate the first meaningful bit of the result. To clear the redundant high bits, consider

$$m':=m\gg c=(\underbrace{0,\cdots,0}_{c \text{ bits}},1,0,\cdots,0)$$

Hence, our idea is to right-rotate all  $(x_i)$  by  $c_1, c_2, \dots, c_n$  sequentially.

Some high bits are redundant, so we use an index  $m = (1 \ll 63)$  to indicate the first meaningful bit of the result. To clear the redundant high bits, consider

$$m':=m\gg c=\underbrace{(0,\cdots,0,}_{c\text{ bits}},1,0,\cdots,0)$$

$$m'' := m' \oplus (m' \gg 1) \oplus \cdots \oplus (m' \gg 63) = (\underbrace{0, \cdots, 0}_{c \text{ bits}}, 1, 1, \cdots, 1)$$

Hence, our idea is to right-rotate all  $(x_i)$  by  $c_1, c_2, \dots, c_n$  sequentially.

Some high bits are redundant, so we use an index  $m = (1 \ll 63)$  to indicate the first meaningful bit of the result. To clear the redundant high bits, consider

$$m':=m\gg c=(\underbrace{0,\cdots,0}_{c \text{ bits}},1,0,\cdots,0)$$

$$m'' := m' \oplus (m' \gg 1) \oplus \cdots \oplus (m' \gg 63) = (\underbrace{0, \cdots, 0}_{\text{c bits}}, 1, 1, \cdots, 1)$$

By an AND operation with m'', we can clear useless bits. Moreover, these redundant bits actually form the sticky bit.

#### SecFprUrsh

```
Input: 64-bit Boolean shares (x_i)_{1 \le i \le n}
                                                                    7: len \leftarrow 1
                                                                    8: while len \leq 32 do
Input: 6-bit arithmetic shares (c_i)_{1 \le i \le n}
Output: Boolean shares (z_i)_{1 \le i \le n} for value
                                                                    9: (m_i) \leftarrow (m_i \oplus (m_i \gg \text{len}))
     z = x \gg c with the sticky bit preserved
                                                                   10: len \leftarrow len \ll 1
 1: (m_i)_{1 \le i \le n} \leftarrow ((1 \ll 63), 0, \cdots, 0)
                                                                   11: (y_i) \leftarrow \mathsf{SecAnd}((x_i), (m_i))
 2: for i = 1 to n do
                                                                   12: (z_i) \leftarrow (v_i \oplus x_i \oplus v_i^{(1)})
         Right-rotate (x_i) by c_i
                                                                   13: (b_i) \leftarrow SecNonzero((z_i))
        (x_i) \leftarrow \mathsf{RefreshMasks}((x_i))
                                                                   14: (z_i) \leftarrow (v_i^{[64:2]} \vee b_i)
        Right-rotate (m_i) by c_i
                                                                   15: return (z_i)
        (m_i) \leftarrow \mathsf{RefreshMasks}((m_i))
```

### Table of Contents

- Introduction
- 2 Preliminaries
- Masked Floating-Point Number Multiplication and Addition
  - Overview of Our Approach
  - SecNonzero
  - SecFprUrsh
  - SecFprNorm64
- 4 Evaluation and Implementation
- Conclusion

September 5th, 2024

Given 64-bit shares  $(x_i)$  and 16-bit shares  $(e_i)$ , we need to derive new  $(x_i')$  and  $(e_i')$  such that if c is the smallest integer such that  $((\oplus_{i=1}^n x_i) \ll c) \in [2^{63}, 2^{64})$ 

then 
$$(\bigoplus_{i=1}^n x_i') = ((\bigoplus_{i=1}^n x_i) \ll c)$$
 and  $\sum_{i=1}^n e_i' = (\sum_{i=1}^n e_i) - c$ 

Keng-Yu Chen, Jiun-Peng Chen

Given 64-bit shares  $(x_i)$  and 16-bit shares  $(e_i)$ , we need to derive new  $(x_i')$  and  $(e_i')$  such that if c is the smallest integer such that  $((\oplus_{i=1}^n x_i) \ll c) \in [2^{63}, 2^{64})$ 

then 
$$(\bigoplus_{i=1}^n x_i') = ((\bigoplus_{i=1}^n x_i) \ll c)$$
 and  $\sum_{i=1}^n e_i' = (\sum_{i=1}^n e_i) - c$ 

We can repeatedly check whether  $(x_i^{(64)}) = 0$ , conditionally shift by 1 bit, and then decrease  $(e_i)$  by  $[(x_i^{(64)}) = 0]$ .

Given 64-bit shares  $(x_i)$  and 16-bit shares  $(e_i)$ , we need to derive new  $(x_i')$  and  $(e_i')$  such that if c is the smallest integer such that  $((\oplus_{i=1}^n x_i) \ll c) \in [2^{63}, 2^{64})$ 

then 
$$(\oplus_{i=1}^n x_i') = ((\oplus_{i=1}^n x_i) \ll c)$$
 and  $\sum_{i=1}^n e_i' = (\sum_{i=1}^n e_i) - c$ 

We can repeatedly check whether  $(x_i^{(64)}) = 0$ , conditionally shift by 1 bit, and then decrease  $(e_i)$  by  $[(x_i^{(64)}) = 0]$ .

To improve efficiency, we consider sequentially checking  $x^{[64:64-2^j]}=0$  for  $j=5,4,\cdots,0$ .

Given 64-bit shares  $(x_i)$  and 16-bit shares  $(e_i)$ , we need to derive new  $(x_i')$  and  $(e_i')$  such that if c is the smallest integer such that  $((\oplus_{i=1}^n x_i) \ll c) \in [2^{63}, 2^{64})$ 

then 
$$(\bigoplus_{i=1}^n x_i') = ((\bigoplus_{i=1}^n x_i) \ll c)$$
 and  $\sum_{i=1}^n e_i' = (\sum_{i=1}^n e_i) - c$ 

We can repeatedly check whether  $(x_i^{(64)}) = 0$ , conditionally shift by 1 bit, and then decrease  $(e_i)$  by  $[(x_i^{(64)}) = 0]$ .

To improve efficiency, we consider sequentially checking  $x^{[64:64-2^j]}=0$  for  $j=5,4,\cdots,0$ . In addition, we first decrease  $(e_i)$  by 63 and later add  $[(x_i^{[64:64-2^j]}) \neq 0] \cdot 2^j$  to it.

# SecFprNorm64

#### SecFprNorm64

```
Input: 64-bit Boolean shares (x_i)_{1 \le i \le n}
Input: 16-bit arithmetic shares (e_i)_{1 \le i \le n}
Output: Normalized (x_i)_{1 \le i \le n} in [2^{6\overline{3}}, \overline{2}^{64}) and (e_i)_{1 \le i \le n} with shift added
 1: e_1 \leftarrow e_1 - 63
 2: for i = 5 to 0 do
 3: (t_i) \leftarrow (x_i \oplus (x_i \ll 2^j))
 4: (n_i) \leftarrow (x_i \gg (64 - 2^j))
 5: (b_i) \leftarrow \text{SecNonzero}((n_i))
 6: (b_i') \leftarrow (-b_i)
 7: (t_i) \leftarrow \operatorname{SecAnd}((t_i), (\neg b'_1, b'_2, \dots, b'_n))
 8: (x_i) \leftarrow (x_i \oplus t_i)
 9: (b_i) \leftarrow B2A_{Bit}((b_i))
      (e_i) \leftarrow (e_i + (b_i \ll i))
10:
11: return (x_i), (e_i)
```

Utilizing these new gadgets SecNonzero, SecFprUrsh, and SecFprNorm64, we design the following algorithms:

Keng-Yu Chen, Jiun-Peng Chen Masking Falcon's FPU September 5th, 2024

43 / 62

Utilizing these new gadgets SecNonzero, SecFprUrsh, and SecFprNorm64, we design the following algorithms:

• SecFPR: Secure FPR by masking.

Utilizing these new gadgets SecNonzero, SecFprUrsh, and SecFprNorm64, we design the following algorithms:

- SecFPR: Secure FPR by masking.
- SecFprMul: Secure FprMul by masking.

Utilizing these new gadgets SecNonzero, SecFprUrsh, and SecFprNorm64, we design the following algorithms:

- SecFPR: Secure FPR by masking.
- SecFprMul: Secure FprMul by masking.
- SecFprAdd: Secure FprAdd by masking.

Utilizing these new gadgets SecNonzero, SecFprUrsh, and SecFprNorm64, we design the following algorithms:

- SecFPR: Secure FPR by masking.
- SecFprMul: Secure FprMul by masking.
- SecFprAdd: Secure FprAdd by masking.

We leave the details of the implementations and several tricks for improvements in Appendix.

### Table of Contents

- Introduction
- 2 Preliminaries
- Masked Floating-Point Number Multiplication and Addition
- Evaluation and Implementation
  - Security
  - Performance
- Conclusion

### Table of Contents

- Introduction
- 2 Preliminaries
- Masked Floating-Point Number Multiplication and Addition
- Evaluation and Implementation
  - Security
  - Performance
- Conclusion

To theoretically evaluate the security of our design, we consider the probing model [ISW03].

Keng-Yu Chen, Jiun-Peng Chen Masking Falcon's FPU September 5th, 2024

46 / 62

To theoretically evaluate the security of our design, we consider the probing model [ISW03].

• The *t*-probing model assumes that an adversary is able to peek any *t* intermediate values in the algorithm.

To theoretically evaluate the security of our design, we consider the probing model [ISW03].

- The *t*-probing model assumes that an adversary is able to peek any *t* intermediate values in the algorithm.
- To be secure in t-probing model,  $n \ge t + 1$ , and any share cannot be combined with each other.

To theoretically evaluate the security of our design, we consider the probing model [ISW03].

- The *t*-probing model assumes that an adversary is able to peek any *t* intermediate values in the algorithm.
- To be secure in *t*-probing model,  $n \ge t + 1$ , and any share cannot be combined with each other.
- It is complicated to prove *t*-probing security for a large composition of small gadgets. The concept of non-interference is convenient in this case.

### t-Non-Interference (t-NI) Security (from [Bar+16])

A gadget is t-Non-Interference (t-NI) secure if every set of t intermediate values can be simulated by no more than t shares of each of its inputs.

### t-Strong Non-Interference (t-SNI) Security (from [Bar+16])

A gadget is t-Strong-Non-Interference (t-SNI) secure if for every set of  $t_I$  internal intermediate values and  $t_O$  of its output shares with  $t_I + t_O \le t$ , they can be simulated by no more than  $t_I$ shares of each of its inputs.

For t = n - 1, if a gadget is t-NI or t-SNI secure, and if any n - 1 input shares are uniformly and independently distributed, then it is t-probing secure.

For t = n - 1, if a gadget is t-NI or t-SNI secure, and if any n - 1 input shares are uniformly and independently distributed, then it is t-probing secure.

Moreover,

For t = n - 1, if a gadget is t-NI or t-SNI secure, and if any n - 1 input shares are uniformly and independently distributed, then it is t-probing secure.

#### Moreover,

• *t*-SNI is stronger than *t*-NI by definition.

For t = n - 1, if a gadget is t-NI or t-SNI secure, and if any n - 1 input shares are uniformly and independently distributed, then it is t-probing secure.

#### Moreover,

- *t*-SNI is stronger than *t*-NI by definition.
- A composition of t-NI gadgets may not be t-NI, so we insert t-SNI gadgets to make it t-NI or t-SNI.

For t = n - 1, if a gadget is t-NI or t-SNI secure, and if any n - 1 input shares are uniformly and independently distributed, then it is t-probing secure.

#### Moreover,

- t-SNI is stronger than t-NI by definition.
- A composition of t-NI gadgets may not be t-NI, so we insert t-SNI gadgets to make it t-NI or t-SNI.

All the gadgets/algorithms in our paper are proven either t-NI or t-SNI secure.

# Gadgets/Algorithms in Our Work

| Algorithm          | Security | Algorithm    | Security |
|--------------------|----------|--------------|----------|
| SecAnd             | t-SNI    | SecOr        | t-SNI    |
| SecMult            | t-SNI    | SecNonzero   | t-SNI    |
| SecAdd             | t-NI     | SecFprUrsh   | t-SNI    |
| A2B                | t-SNI    | SecFprNorm64 | t-NI     |
| B2A                | t-SNI    | SecFPR       | t-SNI    |
| B2A <sub>Bit</sub> | t-SNI    | SecFprMul    | t-SNI    |
| RefreshMasks       | t-NI     | SecFprAdd    | t-SNI    |
| Refresh            | t-SNI    |              |          |

Table: List of gadgets/algorithms in our work with n = t + 1 shares

Probing model validates the security theoretically.

50 / 62

Probing model validates the security theoretically. In practice, the Test Vector Leakage Assessment (TVLA) methodology [GJR+11] can be applied.

Probing model validates the security theoretically.

In practice, the Test Vector Leakage Assessment (TVLA) methodology [GJR+11] can be applied.

A tester records two sets of traces where

Probing model validates the security theoretically.

In practice, the Test Vector Leakage Assessment (TVLA) methodology [GJR+11] can be applied.

A tester records two sets of traces where

• Set 1: fixed input

Probing model validates the security theoretically.

In practice, the Test Vector Leakage Assessment (TVLA) methodology [GJR+11] can be applied.

A tester records two sets of traces where

- Set 1: fixed input
- Set 2: random input

Probing model validates the security theoretically.

In practice, the Test Vector Leakage Assessment (TVLA) methodology [GJR+11] can be applied.

A tester records two sets of traces where

- Set 1: fixed input
- Set 2: random input

The Welch's *t*-test is then applied on the two sets.

Probing model validates the security theoretically.

In practice, the Test Vector Leakage Assessment (TVLA) methodology [GJR+11] can be applied.

A tester records two sets of traces where

- Set 1: fixed input
- Set 2: random input

The Welch's *t*-test is then applied on the two sets.

By convention, we consider the leakage is significant if the t-value exceeds  $\pm 4.5$ .

Probing model validates the security theoretically.

In practice, the Test Vector Leakage Assessment (TVLA) methodology [GJR+11] can be applied.

A tester records two sets of traces where

- Set 1: fixed input
- Set 2: random input

The Welch's *t*-test is then applied on the two sets.

By convention, we consider the leakage is significant if the t-value exceeds  $\pm 4.5$ .

For traces with a large number of points, we refer to [Din+17] alter this threshold to avoid false positives.

We implement our algorithms in the following setting:

Keng-Yu Chen, Jiun-Peng Chen Masking Falcon's FPU September 5th, 2024

51 / 62

We implement our algorithms in the following setting:

Plain-C code

We implement our algorithms in the following setting:

- Plain-C code
- Compiled by arm-none-eabi-gcc 10.3.1

We implement our algorithms in the following setting:

- Plain-C code
- Compiled by arm-none-eabi-gcc 10.3.1
- Using ChipWhisperer with target board STM32F303 with an ARM Cortex-M4 MCU

We implement our algorithms in the following setting:

- Plain-C code
- Compiled by arm-none-eabi-gcc 10.3.1
- Using ChipWhisperer with target board STM32F303 with an ARM Cortex-M4 MCU
- We compare the result with the reference implementation of the NIST Round-3 Submission of FALCON [Pre+20].

### **TVLA**

#### The TVLA results of floating-point number multiplication (FprMul, SecFprMul).



Figure: 1,000 traces, unmasked FprMul



Figure: 10,000 traces, 2-shared SecFprMul



Figure: 100,000 traces, 3-shared SecFprMul

### **TVLA**

### The TVLA results of floating-point number addition (FprAdd, SecFprAdd).



Figure: 1,000 traces, unmasked FprAdd



Figure: 10,000 traces, 2-shared SecFprAdd



Figure: 100,000 traces, 3-shared SecFprAdd

### Table of Contents

- Introduction
- 2 Preliminaries
- Masked Floating-Point Number Multiplication and Addition
- Evaluation and Implementation
  - Security
  - Performance
- Conclusion

### Performance Evaluation on ARM Cortex-M4

| Algorithm |                       | Cycles   |             |              |
|-----------|-----------------------|----------|-------------|--------------|
|           |                       | Unmasked | 2 Shares    | 3 Shares     |
|           | Total                 | 308      | 7134 (23×)  | 36388 (118×) |
| SecFprMul | 128-bit A2B           | -        | 1619        | 19253        |
|           | 64-bit SecNonzero     | -        | 389         | 1350         |
|           | Two 16-bit SecNonzero | -        | 662         | 2012         |
|           | SecFPR                | -        | 3362        | 10813        |
|           | #randombytes          | -        | 333         | 2005         |
|           | Total                 | 487      | 17154 (35×) | 48291 (99×)  |
| SecFprAdd | Three 64-bit SecAdd   | -        | 6990        | 16956        |
|           | Two 16-bit B2A        | -        | 88          | 332          |
|           | 16-bit A2B            | -        | 146         | 2267         |
|           | SecFprUrsh            | -        | 1112        | 3214         |
|           | SecFprNorm64          | -        | 2846        | 7270         |
|           | SecFPR                | -        | 3362        | 10813        |
|           | #randombytes          | -        | 849         | 2691         |

## Performance Evaluation on ARM Cortex-M4

| Algorithm |                       | Cycles   |             |              |
|-----------|-----------------------|----------|-------------|--------------|
|           |                       | Unmasked | 2 Shares    | 3 Shares     |
|           | Total                 | 308      | 7134 (23×)  | 36388 (118×) |
| SecFprMul | 128-bit A2B           | -        | 1619        | 19253        |
|           | 64-bit SecNonzero     | -        | 389         | 1350         |
|           | Two 16-bit SecNonzero | -        | 662         | 2012         |
|           | SecFPR                | -        | 3362        | 10813        |
|           | #randombytes          | -        | 333         | 2005         |
|           | Total                 | 487      | 17154 (35×) | 48291 (99×)  |
| SecFprAdd | Three 64-bit SecAdd   | -        | 6990        | 16956        |
|           | Two 16-bit B2A        | -        | 88          | 332          |
|           | 16-bit A2B            | -        | 146         | 2267         |
|           | SecFprUrsh            | -        | 1112        | 3214         |
|           | SecFprNorm64          | -        | 2846        | 7270         |
|           | SecFPR                | -        | 3362        | 10813        |
|           | #randombytes          | -        | 849         | 2691         |

# Performance Evaluation on General Purpose CPU

We also test the time for signing one message on Intel-Core i9-12900 KF.

| Security Level | Unmasked | 2 Shares       | 3 Shares         |
|----------------|----------|----------------|------------------|
| Falcon-512     | 246.56   | 1905.55 (7.7×) | 6137.25 (24.9×)  |
| Falcon-1024    | 501.62   | 3819.76 (7.6×) | 12287.29 (24.5×) |

Table: Time (in microseconds) for signing a message on Intel-Core i9-12900KF CPU.

## Table of Contents

- Introduction
- 2 Preliminaries
- Masked Floating-Point Number Multiplication and Addition
- Evaluation and Implementation
- Conclusion

In this paper,

Keng-Yu Chen, Jiun-Peng Chen

### In this paper,

• We present the first masking algorithm for floating-point number multiplication and addition to protect the pre-image vector computation.

- We present the first masking algorithm for floating-point number multiplication and addition to protect the pre-image vector computation.
- We design novel gadgets SecNonzero, SecFprUrsh, and SecFprNorm64 to mask the algorithms.

- We present the first masking algorithm for floating-point number multiplication and addition to protect the pre-image vector computation.
- We design novel gadgets SecNonzero, SecFprUrsh, and SecFprNorm64 to mask the algorithms.
- All our masked algorithms are proven t-NI or t-SNI secure they are t-probing secure.

- We present the first masking algorithm for floating-point number multiplication and addition to protect the pre-image vector computation.
- We design novel gadgets SecNonzero, SecFprUrsh, and SecFprNorm64 to mask the algorithms.
- All our masked algorithms are proven t-NI or t-SNI secure they are t-probing secure.
- The TVLA result shows no leakage in the 2-shared version in 10,000 traces, and no leakage in the 3-shared version in 100,000 traces.

- We present the first masking algorithm for floating-point number multiplication and addition to protect the pre-image vector computation.
- We design novel gadgets SecNonzero, SecFprUrsh, and SecFprNorm64 to mask the algorithms.
- All our masked algorithms are proven t-NI or t-SNI secure they are t-probing secure.
- The TVLA result shows no leakage in the 2-shared version in 10,000 traces, and no leakage in the 3-shared version in 100,000 traces.
- Our countermeasure when compared to the unmasked reference implementation is much slower. Improved SecAdd and A2B can reduce the cost.

# Reference I

- [ISW03] Yuval Ishai, Amit Sahai, and David Wagner. "Private Circuits: Securing Hardware against Probing Attacks".
  In: CRYPTO 2003. Ed. by Dan Boneh. Vol. 2729. LNCS. Springer, Heidelberg, Aug. 2003, pp. 463–481.
  DOI: 10.1007/978-3-540-45146-4\_27.
- [GPV08] Craig Gentry, Chris Peikert, and Vinod Vaikuntanathan. "Trapdoors for hard lattices and new cryptographic constructions". In: 40th ACM STOC. Ed. by Richard E. Ladner and Cynthia Dwork. ACM Press, May 2008, pp. 197–206. DOI: 10.1145/1374376.1374407.
- [Ber+10] Guido Bertoni et al. "Building power analysis resistant implementations of Keccak". In: Second SHA-3 candidate conference. Vol. 142. Citeseer. 2010.
- [GJR+11] Benjamin Jun Gilbert Goodwill, Josh Jaffe, Pankaj Rohatgi, et al. "A testing methodology for side-channel resistance validation". In: NIST non-invasive attack testing workshop. Vol. 7, 2011, pp. 115–136.
- [Bel+13] Sonia Belaïd et al. "Differential power analysis of HMAC SHA-2 in the Hamming weight model". In: 2013 International Conference on Security and Cryptography (SECRYPT). 2013, pp. 1–12.
- [Cor+15] Jean-Sébastien Coron et al. "Conversion from Arithmetic to Boolean Masking with Logarithmic Complexity". In: FSE 2015. Ed. by Gregor Leander. Vol. 9054. LNCS. Springer, Heidelberg, Mar. 2015, pp. 130–149. DOI: 10.1007/978-3-662-48116-5\_7.
- [Bar+16] Gilles Barthe et al. "Strong Non-Interference and Type-Directed Higher-Order Masking". In: ACM CCS 2016. Ed. by Edgar R. Weippl et al. ACM Press, Oct. 2016, pp. 116–129. DOI: 10.1145/2976749.2978427.

## Reference II

- [DP16] Léo Ducas and Thomas Prest. "Fast fourier orthogonalization". In: Proceedings of the ACM on International Symposium on Symbolic and Algebraic Computation. 2016, pp. 191–198.
- [Din+17] A. Adam Ding et al. "Towards Sound and Optimal Leakage Detection Procedure". In: Smart Card Research and Advanced Applications 16th International Conference, CARDIS 2017, Lugano, Switzerland, November 13-15, 2017, Revised Selected Papers. Ed. by Thomas Eisenbarth and Yannick Teglia. Vol. 10728. Lecture Notes in Computer Science. Springer, 2017, pp. 105-122. DOI: 10.1007/978-3-319-75208-2\\_7. URL: https://doi.org/10.1007/978-3-319-75208-2\\_5C\_7.
- [Bar+18] Gilles Barthe et al. "Masking the GLP Lattice-Based Signature Scheme at Any Order". In: EUROCRYPT 2018, Part II. Ed. by Jesper Buus Nielsen and Vincent Rijmen. Vol. 10821. LNCS. Springer, Heidelberg, Apr. 2018, pp. 354–384. DOI: 10.1007/978-3-319-78375-8\_12.
- [BCZ18] Luk Bettale, Jean-Sébastien Coron, and Rina Zeitoun. "Improved High-Order Conversion From Boolean to Arithmetic Masking". In: IACR TCHES 2018.2 (2018). https://tches.iacr.org/index.php/TCHES/article/view/873, pp. 22-45. ISSN: 2569-2925. DOI: 10.13154/tches.v2018.i2.22-45.
- [Mig+19] Vincent Migliore et al. "Masking Dilithium Efficient Implementation and Side-Channel Evaluation". In: ACNS 19. Ed. by Robert H. Deng et al. Vol. 11464. LNCS. Springer, Heidelberg, June 2019, pp. 344–362. DOI: 10.1007/978-3-030-21568-2\_17.

### Reference III

- [Sch+19] Tobias Schneider et al. "Efficiently Masking Binomial Sampling at Arbitrary Orders for Lattice-Based Crypto". In: PKC 2019, Part II. Ed. by Dongdai Lin and Kazue Sako. Vol. 11443. LNCS. Springer, Heidelberg, Apr. 2019, pp. 534–564. DOI: 10.1007/978-3-030-17259-6\_18.
- [How+20] James Howe et al. "Isochronous Gaussian Sampling: From Inception to Implementation". In: Post-Quantum Cryptography 11th International Conference, PQCrypto 2020. Ed. by Jintai Ding and Jean-Pierre Tillich. Springer, Heidelberg, 2020, pp. 53–71. DOI: 10.1007/978-3-030-44223-1\_5.
- [Pre+20] Thomas Prest et al. *FALCON*. Tech. rep. available at https://csrc.nist.gov/projects/post-quantum-cryptography/post-quantum-cryptography-standardization/round-3-submissions. National Institute of Standards and Technology, 2020.
- [Bos+21] Joppe W. Bos et al. "Masking Kyber: First- and Higher-Order Implementations". In: IACR TCHES 2021.4 (2021). https://tches.iacr.org/index.php/TCHES/article/view/9064, pp. 173-214. ISSN: 2569-2925. DOI: 10.46586/tches.v2021.i4.173-214.
- [KA21] Emre Karabulut and Aydin Aysu. "FALCON Down: Breaking FALCON Post-Quantum Signature Scheme through Side-Channel Attacks". In: 2021 58th ACM/IEEE Design Automation Conference (DAC). 2021, pp. 691–696. DOI: 10.1109/DAC18074.2021.9586131.
- [Fri+22] Tim Fritzmann et al. "Masked Accelerators and Instruction Set Extensions for Post-Quantum Cryptography". In: IACR TCHES 2022.1 (2022), pp. 414–460. DOI: 10.46586/tches.v2022.i1.414-460.

## Reference IV

- [Gue+22] Morgane Guerreau et al. "The Hidden Parallelepiped Is Back Again: Power Analysis Attacks on Falcon". In: IACR TCHES 2022.3 (2022), pp. 141–164. DOI: 10.46586/tches.v2022.i3.141–164.
- [Hei+22] Daniel Heinz et al. First-Order Masked Kyber on ARM Cortex-M4. Cryptology ePrint Archive, Report 2022/058. https://eprint.iacr.org/2022/058. 2022.
- [Zha+23] Shiduo Zhang et al. "Improved Power Analysis Attacks on Falcon". In: EUROCRYPT 2023, Part IV. Ed. by Carmit Hazay and Martijn Stam. Vol. 14007. LNCS. Springer, Heidelberg, Apr. 2023, pp. 565–595. DOI: 10.1007/978-3-031-30634-1.19.

### Table of Contents

- 6 Appendix Algorithms of Floating-Point Number Arithmetic
- Appendix Details of Our Design

# Floating-Point Number Packing and Rounding

#### **FPR**

**Input:** Sign bit s, exponent e, and 55-bit mantissa z **Output:** FPN x packed by s, e, z

- 1:  $e \leftarrow e + 1076$
- 2:  $b \leftarrow \llbracket e < 0 \rrbracket$
- 3:  $z \leftarrow z \land (b-1)$
- 4:  $b \leftarrow [z \neq 0]$
- 5:  $e \leftarrow e \land (-b)$
- 6:  $x \leftarrow ((s \ll 63) \lor (z \gg 2)) + e \ll 52$
- 7:  $f \leftarrow 0XC8 \gg z^{[3:1]}$
- 8:  $x \leftarrow x + f^{(1)}$  {increment if  $z^{[3:1]}$  is 011,110 or 111}
- 9: return x

# Floating-Point Number Multiplication

# FprMul

**Input:** FPN 
$$x = (sx, ex, mx)$$

**Input:** FPN 
$$y = (sy, ey, my)$$

**Output:** FPN product of x and y

1: 
$$s \leftarrow sx \oplus sy$$

2: 
$$e \leftarrow ex + ey - 2100$$

3: 
$$z \leftarrow mx \times my$$

4: 
$$b \leftarrow [z^{[50:1]} \neq 0]$$

5: 
$$z \leftarrow z^{[106:51]} \lor b$$

6: 
$$z' \leftarrow (z \gg 1) \lor z^{(1)}$$

7: 
$$w \leftarrow z^{(106)}$$

8: 
$$z \leftarrow z \oplus (z \oplus z') \wedge (-w)$$

9: 
$$e \leftarrow e + w$$

10: 
$$bx \leftarrow [ex \neq 0], by \leftarrow [ey \neq 0]$$

11: 
$$b \leftarrow bx \wedge by$$

12: 
$$z \leftarrow z \land (-b)$$

13: **return** 
$$FPR(s, e, z)$$

# Floating-Point Number Addition

## **FprAdd**

#### **Input:** FPNs x and y

**Output:** FPN sum of x and y

1: 
$$d \leftarrow x^{[63:1]} - v^{[63:1]}$$

2: 
$$cs \leftarrow d^{(64)} \lor ((1-(-d)^{(64)}) \land x^{(64)})$$

3: 
$$m \leftarrow (x \oplus y) \land (-cs)$$

4: 
$$x \leftarrow x \oplus m, y \leftarrow y \oplus m$$

5: Extract (sx, ex, mx) and (sy, ey, my) from x, y, respectively.

6: 
$$mx \leftarrow mx \ll 3$$
,  $my \leftarrow my \ll 3$ 

7: 
$$ex \leftarrow ex - 1078$$
,  $ey \leftarrow ey - 1078$ 

8: 
$$c \leftarrow ex - ev$$

9: 
$$b \leftarrow [c < 60]$$

10: 
$$my \leftarrow my \land (-b)$$

11: 
$$my \leftarrow (my \gg c) \vee \lceil my^{[c:1]} \neq 0 \rceil$$

12: 
$$s \leftarrow sx \oplus sy$$

13: 
$$z \leftarrow mx + (-1)^s my$$

14: Normalize z, ex to make the 64th bit of z set

15: 
$$z \leftarrow (z \gg 9) \vee [z^{[9:1]} \neq 0]$$

16: 
$$ex \leftarrow ex + 9$$

17: **return** 
$$FPR(sx, ex, z)$$

## Table of Contents

- 6 Appendix Algorithms of Floating-Point Number Arithmetic
- 🕜 Appendix Details of Our Design
  - Simple Tricks
  - SecFPR: Secure FPR
  - SecFprMul: Secure FprMul
  - SecFprAdd: Secure FprAdd

## Table of Contents

- 6 Appendix Algorithms of Floating-Point Number Arithmetic
- 🕜 Appendix Details of Our Design
  - Simple Tricks
  - SecFPR: Secure FPR
  - SecFprMul: Secure FprMul
  - SecFprAdd: Secure FprAdd

# Simple Tricks

If we want to operate the following operations:

1: **if** 
$$a = 0$$
 **then**

2: 
$$b \leftarrow 0$$

Suppose a is one-bit, we may write it as

1: 
$$b \leftarrow b \wedge (-a)$$

We also apply the same idea for Boolean-shared values in our design

1: 
$$(b_i) \leftarrow \mathsf{SecAnd}((b_i), (-a_i))$$

We utilize that  $\bigoplus_{i=1}^{n} -a_i = -a$ , which is not true for general k-bit a.

# Simple Tricks

Similarly, for operations

1: **if** 
$$a = 1$$
 **then**

2: 
$$b \leftarrow 0$$

Suppose a is one-bit, we may write it as

1: 
$$b \leftarrow b \wedge (\neg(-a))$$

For Boolean-shared values,

1: 
$$(c_i) \leftarrow (-a_i)$$

2: 
$$(b_i) \leftarrow (\neg a_i)$$
  
2:  $(b_i) \leftarrow \mathsf{SecAnd}((b_i), (\neg c_1, c_2, \cdots, c_n))$ 

# Simple Tricks

Moreover, for operations,

1: **if** 
$$a = 1$$
 **then**

2: 
$$b \leftarrow c$$

Suppose a is one-bit, we may write it as

1: 
$$d \leftarrow b \oplus c$$

2: 
$$b \leftarrow b \oplus (d \wedge (-a))$$

For Boolean-shared values,

1: 
$$(d_i) \leftarrow (b_i \oplus c_i)$$

2: 
$$(d_i) \leftarrow \mathsf{SecAnd}((d_i), (-a_i))$$

3: 
$$(b_i) \leftarrow (b_i \oplus d_i)$$

# Simple Tricks in Masking FPR

#### **FPR**

**Input:** Sign bit *s*, exponent *e*, and 55-bit mantissa *z* **Output:** FPN *x* packed by *s*, *e*, *z* 

1:  $e \leftarrow e + 1076$ 

2:  $b \leftarrow \llbracket e < 0 \rrbracket$ 

3:  $z \leftarrow z \land (b-1)$ 

4:  $b \leftarrow [z \neq 0]$ 

5:  $e \leftarrow e \land (-b)$ 

6:  $x \leftarrow ((s \ll 63) \lor (z \gg 2)) + e \ll 52$ 

7:  $f \leftarrow 0XC8 \gg z^{[3:1]}$ 

8:  $x \leftarrow x + f^{(1)}$  {increment if  $z^{[3:1]}$  is 011,110 or 111}

9: return x

# Simple Tricks in Masking FprMul

## **FprMul**

**Input:** FPN 
$$x = (sx, ex, mx)$$

**Input:** FPN 
$$y = (sy, ey, my)$$

**Output:** FPN product of x and y

1: 
$$s \leftarrow sx \oplus sy$$

2: 
$$e \leftarrow ex + ev - 2100$$

3: 
$$z \leftarrow mx \times my$$

4: 
$$b \leftarrow [z^{[50:1]} \neq 0]$$

5: 
$$z \leftarrow z^{[106:51]} \lor b$$

6: 
$$z' \leftarrow (z \gg 1) \lor z^{(1)}$$

7: 
$$w \leftarrow z^{(106)}$$

8: 
$$z \leftarrow z \oplus (z \oplus z') \wedge (-w)$$

9: 
$$e \leftarrow e + w$$

10: 
$$bx \leftarrow [ex \neq 0], by \leftarrow [ey \neq 0]$$

11: 
$$b \leftarrow bx \land by$$

12: 
$$z \leftarrow z \wedge (-b)$$

13: **return** 
$$FPR(s, e, z)$$

# Simple Tricks in Masking FprAdd

## **FprAdd**

**Input:** FPNs x and y

**Output:** FPN sum of x and y

1: 
$$d \leftarrow x^{[63:1]} - v^{[63:1]}$$

2: 
$$cs \leftarrow d^{(64)} \vee ((1-(-d)^{(64)}) \wedge x^{(64)})$$

3: 
$$m \leftarrow (x \oplus y) \land (-cs)$$

4: 
$$x \leftarrow x \oplus m, y \leftarrow y \oplus m$$

5: Extract (sx, ex, mx) and (sy, ey, my) from x, y, respectively.

6: 
$$mx \leftarrow mx \ll 3, my \leftarrow my \ll 3$$

7: 
$$ex \leftarrow ex - 1078$$
,  $ey \leftarrow ey - 1078$ 

8: 
$$c \leftarrow ex - ev$$

9: 
$$b \leftarrow [c < 60]$$

10: 
$$my \leftarrow my \land (-b)$$

11: 
$$my \leftarrow (my \gg c) \vee \llbracket my^{[c:1]} \neq 0 \rrbracket$$

12: 
$$s \leftarrow sx \oplus sy$$

13: 
$$z \leftarrow mx + (-1)^s my$$

14: Normalize z, ex to make the 64th bit of z set

15: 
$$z \leftarrow (z \gg 9) \vee [z^{[9:1]} \neq 0]$$

16: 
$$ex \leftarrow ex + 9$$

17: **return** 
$$FPR(sx, ex, z)$$

## Table of Contents

- 6 Appendix Algorithms of Floating-Point Number Arithmetic
- 🕜 Appendix Details of Our Design
  - Simple Tricks
  - SecFPR: Secure FPR
  - SecFprMul: Secure FprMul
  - SecFprAdd: Secure FprAdd

#### **FPR**

**Input:** Sign bit s, exponent e, and 55-bit mantissa z

**Output:** FPN x packed by s, e, z

1: 
$$e \leftarrow e + 1076$$

2: 
$$b \leftarrow \llbracket e < 0 \rrbracket$$

3: 
$$z \leftarrow z \land (b-1)$$

4: 
$$b \leftarrow [z \neq 0]$$

5: 
$$e \leftarrow e \land (-b)$$

6: 
$$x \leftarrow ((s \ll 63) \lor (z \gg 2)) + e \ll 52$$

7: 
$$f \leftarrow 0XC8 \gg z^{[3:1]}$$

8: 
$$x \leftarrow x + f^{(1)}$$

9: return

We now show how we mask the floating-point number rounding and packing algorithm FPR.

#### **FPR**

**Input:** Sign bit s, exponent e, and 55-bit mantissa z

**Output:** FPN x packed by s, e, z

1: 
$$e \leftarrow e + 1076$$

2: 
$$b \leftarrow \llbracket e < 0 \rrbracket$$

3: 
$$z \leftarrow z \land (b-1)$$

4: 
$$b \leftarrow [z \neq 0]$$

5: 
$$e \leftarrow e \wedge (-b)$$

6: 
$$x \leftarrow ((s \ll 63) \lor (z \gg 2)) + e \ll 52$$

7: 
$$f \leftarrow 0XC8 \gg z^{[3:1]}$$

8: 
$$x \leftarrow x + f^{(1)}$$

9: return

Recall that FPR is the last subroutine of FprMul and FprAdd.

By our masking design of FprMul and FprAdd,  $(s_i)$  is Boolean-masked,  $(e_i)$  is 16-bit arithmetic-masked, and  $(z_i)$  is Boolean-masked.

#### **FPR**

**Input:** Sign bit s, exponent e, and 55-bit mantissa z **Output:** FPN x packed by s, e, z

- 1:  $e \leftarrow e + 1076$
- 2:  $b \leftarrow \llbracket e < 0 \rrbracket$
- 3:  $z \leftarrow z \land (b-1)$
- 4:  $b \leftarrow [z \neq 0]$
- 5:  $e \leftarrow e \wedge (-b)$
- 6:  $x \leftarrow ((s \ll 63) \lor (z \gg 2)) + e \ll 52$
- 7:  $f \leftarrow 0XC8 \gg z^{[3:1]}$
- 8:  $x \leftarrow x + f^{(1)}$
- 9: return

This is by adding to any one share.

#### **FPR**

**Input:** Sign bit s, exponent e, and 55-bit mantissa z **Output:** FPN x packed by s, e, z

1: 
$$e \leftarrow e + 1076$$

2: 
$$b \leftarrow \llbracket e < 0 \rrbracket$$

3: 
$$z \leftarrow z \land (b-1)$$

4: 
$$b \leftarrow [z \neq 0]$$

5: 
$$e \leftarrow e \wedge (-b)$$

6: 
$$x \leftarrow ((s \ll 63) \lor (z \gg 2)) + e \ll 52$$

7: 
$$f \leftarrow 0XC8 \gg z^{[3:1]}$$

8: 
$$x \leftarrow x + f^{(1)}$$

9: return

This is equivalent to

1: **if** 
$$e < 0$$
 **then**

2: 
$$z \leftarrow 0$$

and is done by an A2B, taking the MSB, and the simple trick.

#### **FPR**

**Input:** Sign bit s, exponent e, and 55-bit mantissa z

**Output:** FPN x packed by s, e, z

1: 
$$e \leftarrow e + 1076$$

2: 
$$b \leftarrow \llbracket e < 0 \rrbracket$$

3: 
$$z \leftarrow z \land (b-1)$$

4: 
$$b \leftarrow [z \neq 0]$$

5: 
$$e \leftarrow e \land (-b)$$

6: 
$$x \leftarrow ((s \ll 63) \lor (z \gg 2)) + e \ll 52$$

7: 
$$f \leftarrow 0XC8 \gg z^{[3:1]}$$

8: 
$$x \leftarrow x + f^{(1)}$$

9: return

This is done by SecNonzero and the simple trick.

### **FPR**

**Input:** Sign bit s, exponent e, and 55-bit mantissa z **Output:** FPN x packed by s, e, z

1: 
$$e \leftarrow e + 1076$$

2: 
$$b \leftarrow \llbracket e < 0 \rrbracket$$

3: 
$$z \leftarrow z \land (b-1)$$

4: 
$$b \leftarrow [z \neq 0]$$

5: 
$$e \leftarrow e \wedge (-b)$$

6: 
$$x \leftarrow ((s \ll 63) \lor (z \gg 2)) + e \ll 52$$

7: 
$$f \leftarrow 0XC8 \gg z^{[3:1]}$$

8: 
$$x \leftarrow x + f^{(1)}$$

9: return

Shift, OR, and a SecAdd. We add  $(e_i)$  and the 55th bit of  $(z_i)$  in advance instead of adding  $(e_i)$  to a 64-bit value. That is, use a 16-bit SecAdd to save a 64-bit SecAdd

### **FPR**

**Input:** Sign bit s, exponent e, and 55-bit mantissa z **Output:** FPN x packed by s, e, z

1: 
$$e \leftarrow e + 1076$$

2: 
$$b \leftarrow \llbracket e < 0 \rrbracket$$

3: 
$$z \leftarrow z \wedge (b-1)$$

4: 
$$b \leftarrow [z \neq 0]$$

5: 
$$e \leftarrow e \wedge (-b)$$

6: 
$$x \leftarrow ((s \ll 63) \lor (z \gg 2)) + e \ll 52$$

7: 
$$f \leftarrow 0XC8 \gg z^{[3:1]}$$

8: 
$$x \leftarrow x + f^{(1)}$$

9: return

If the least 3 bits of  $(z_i)$  are 011, 110, and 111,  $f^{(1)} = 1$ . We OR  $(z_i^{(1)})$  and  $(z_i^{(3)})$  by SecOr, and then AND  $(z_i^{(2)})$  by SecAnd. The result is then added to  $(x_i)$  by SecAdd.

### **FPR**

**Input:** Sign bit s, exponent e, and 55-bit mantissa z

**Output:** FPN x packed by s, e, z

1: 
$$e \leftarrow e + 1076$$

2: 
$$b \leftarrow \llbracket e < 0 \rrbracket$$

3: 
$$z \leftarrow z \land (b-1)$$

4: 
$$b \leftarrow [z \neq 0]$$

5: 
$$e \leftarrow e \wedge (-b)$$

6: 
$$x \leftarrow ((s \ll 63) \lor (z \gg 2)) + e \ll 52$$

7: 
$$f \leftarrow 0XC8 \gg z^{[3:1]}$$

8: 
$$x \leftarrow x + f^{(1)}$$

9: return

Done.

The returned value is a 64-bit Boolean-masked  $(x_i)$ .

### SecFPR

```
Input: 1-bit Boolean shares (s_i)_{1 \le i \le n}
Input: 16-bit arithmetic shares (e_i)_{1 \le i \le n}
Input: 55-bit Boolean shares (z_i)_{1 \le i \le n}
Output: Boolean shares (x_i)_{1 \le i \le n}
1: e_1 \leftarrow e_1 + 1076
2: (e_i) \leftarrow A2B((e_i))
3: (b_i) \leftarrow (-e_i^{(16)})
4: (z_i) \leftarrow SecAnd((z_i), (\neg b_1, b_2, \cdots, b_n))
5: (e_i) \leftarrow SecAnd((e_i), (-z_i^{(55)}))
```

```
6: (e_i) \leftarrow \text{SecAdd}((e_i), (z_i^{(55)}))
 7: (e_i) \leftarrow \text{Refresh}((e_i))
 8: (s_i) \leftarrow \text{Refresh}((s_i))
 9: (x_i) \leftarrow ((s_i^{(1)} \ll 63) \lor (e_i^{[11:1]} \ll
      52) \vee (z_i^{[54:3]})
10: (f_i) \leftarrow SecOr(Refresh(z_i^{(1)}), (z_i^{(3)}))
11: (f_i) \leftarrow \text{SecAnd}((f_i), (z_i^{(2)}))
12: (x_i) \leftarrow \mathsf{SecAdd}((x_i), (f_i))
13: return (x_i)
```

## Table of Contents

- 6 Appendix Algorithms of Floating-Point Number Arithmetic
- 🕡 Appendix Details of Our Design
  - Simple Tricks
  - SecFPR: Secure FPR
  - SecFprMul: Secure FprMul
  - SecFprAdd: Secure FprAdd

#### **FprMul**

**Input:** FPN 
$$x = (sx, ex, mx)$$

**Input:** FPN 
$$y = (sy, ey, my)$$

Output: FPN product of x and y

1: 
$$s \leftarrow sx \oplus sy$$

2: 
$$e \leftarrow ex + ey - 2100$$

3: 
$$z \leftarrow mx \times my$$

4: 
$$b \leftarrow [z^{[50:1]} \neq 0]$$

5: 
$$z \leftarrow z^{[106:51]} \lor b$$

6: 
$$z' \leftarrow (z \gg 1) \lor z^{(1)}$$

7: 
$$w \leftarrow z^{(106)}$$

8: 
$$z \leftarrow z \oplus (z \oplus z') \wedge (-w)$$

9: 
$$e \leftarrow e + w$$

10: 
$$bx \leftarrow [ex \neq 0], by \leftarrow [ey \neq 0]$$

11: 
$$b \leftarrow bx \wedge by$$

12: 
$$z \leftarrow z \wedge (-b)$$

13: **return** FPR(s, e, z)

We show how we mask the floating-point number multiplication algorithm FprMul.

### **FprMul**

Input: FPN 
$$x = (sx, ex, mx)$$

Input: FPN y = (sy, ey, my)

**Output:** FPN product of x and y

1: 
$$s \leftarrow sx \oplus sy$$

2: 
$$e \leftarrow ex + ey - 2100$$

3: 
$$z \leftarrow mx \times my$$

4: 
$$b \leftarrow [z^{[50:1]} \neq 0]$$

5: 
$$z \leftarrow z^{[106:51]} \lor b$$

6: 
$$z' \leftarrow (z \gg 1) \lor z^{(1)}$$

7: 
$$w \leftarrow z^{(106)}$$

8: 
$$z \leftarrow z \oplus (z \oplus z') \wedge (-w)$$

9: 
$$e \leftarrow e + w$$

10: 
$$bx \leftarrow [ex \neq 0]$$
,  $by \leftarrow [ey \neq 0]$ 

11: 
$$b \leftarrow bx \land by$$

12: 
$$z \leftarrow z \wedge (-b)$$

13: **return** 
$$FPR(s, e, z)$$

We assume  $(sx_i)$  and  $(sy_i)$  are Boolean shares,  $(ex_i)$  and  $(ey_i)$  are 16-bit arithmetic shares, and  $(mx_i)$  and  $(my_i)$  are 128-bit arithmetic shares, which can load the product of two 53-bit values.

#### **FprMul**

**Input:** FPN 
$$x = (sx, ex, mx)$$

**Input:** FPN 
$$y = (sy, ey, my)$$

Output: FPN product of x and y

1: 
$$s \leftarrow sx \oplus sy$$

2: 
$$e \leftarrow ex + ev - 2100$$

3: 
$$z \leftarrow mx \times my$$

4: 
$$b \leftarrow [z^{[50:1]} \neq 0]$$

5. 
$$z \leftarrow z^{[106:51]} \lor b$$

6: 
$$z' \leftarrow (z \gg 1) \lor z^{(1)}$$

7. 
$$w \leftarrow z^{(106)}$$

8: 
$$z \leftarrow z \oplus (z \oplus z') \wedge (-w)$$

9: 
$$e \leftarrow e + w$$

10: 
$$bx \leftarrow [ex \neq 0], by \leftarrow [ey \neq 0]$$

11: 
$$b \leftarrow bx \land by$$

12: 
$$z \leftarrow z \wedge (-b)$$

13: **return** FPR(s, e, z)

These can be operated share-wise.

#### FprMul

**Input:** FPN 
$$x = (sx, ex, mx)$$

**Input:** FPN 
$$y = (sy, ey, my)$$

**Output:** FPN product of x and y

1: 
$$s \leftarrow sx \oplus sy$$

2: 
$$e \leftarrow ex + ey - 2100$$

3: 
$$z \leftarrow mx \times my$$

4: 
$$b \leftarrow [z^{[50:1]} \neq 0]$$

5. 
$$z \leftarrow z^{[106:51]} \lor b$$

6: 
$$z' \leftarrow (z \gg 1) \lor z^{(1)}$$

7: 
$$w \leftarrow z^{(106)}$$

8: 
$$z \leftarrow z \oplus (z \oplus z') \wedge (-w)$$

9: 
$$e \leftarrow e + w$$

10: 
$$bx \leftarrow [ex \neq 0]$$
,  $by \leftarrow [ey \neq 0]$ 

11: 
$$b \leftarrow bx \wedge by$$

12: 
$$z \leftarrow z \wedge (-b)$$

13: **return** 
$$FPR(s, e, z)$$

This is done by SecMult. For further operations, we then apply an A2B to turn them to Boolean shares.

13: return FPR(s, e, z)

# SecFprMul: Secure FprMul

### **FprMul**

Input: FPN 
$$x = (sx, ex, mx)$$
Input: FPN  $y = (sy, ey, my)$ 
Output: FPN product of  $x$  and  $y$ 
1:  $s \leftarrow sx \oplus sy$ 
2:  $e \leftarrow ex + ey - 2100$ 
3:  $z \leftarrow mx \times my$ 
4:  $b \leftarrow \lceil z^{\lceil 50:1 \rceil} \neq 0 \rceil$ 
6:  $z' \leftarrow (z \gg 1) \lor z^{(1)}$ 
7:  $w \leftarrow z^{(106)}$ 
8:  $z \leftarrow z \oplus (z \oplus z') \land (-w)$ 
9:  $e \leftarrow e + w$ 
10:  $bx \leftarrow [ex \neq 0], by \leftarrow [ey \neq 0]$ 
11:  $b \leftarrow bx \land by$ 
12:  $z \leftarrow z \land (-b)$ 

Conditional shift by 50 bits and 51 bits, depending on  $z^{(106)}$ , while preserving the sticky bit. These can be done by SecNonzero and SecOr.

#### FprMul

Input: FPN 
$$x = (sx, ex, mx)$$
Input: FPN  $y = (sy, ey, my)$ 
Output: FPN product of  $x$  and  $y$ 

1:  $s \leftarrow sx \oplus sy$ 
2:  $e \leftarrow ex + ey - 2100$ 
3:  $z \leftarrow mx \times my$ 
4:  $b \leftarrow [z^{[50:1]} \neq 0]$ 
5:  $z \leftarrow z^{[106:51]} \vee b$ 

6:  $z' \leftarrow (z \gg 1) \vee z^{(1)}$ 
7:  $w \leftarrow z^{(106)}$ 
8:  $z \leftarrow z \oplus (z \oplus z') \wedge (-w)$ 
9:  $e \leftarrow e + w$ 
10:  $bx \leftarrow [ex \neq 0]$ ,  $by \leftarrow [ey \neq 0]$ 
11:  $b \leftarrow bx \wedge by$ 
12:  $z \leftarrow z \wedge (-b)$ 
13: **return** FPR( $s, e, z$ )

We observe that we can save one SecOR.

- When shifted by 50 bits, we OR the last bit with  $z^{[50:1]}$ .
- When shifted by 51 bits, we OR the last bit with  $z^{[51:1]}$ .

We can simply OR the the last bit with  $z^{[51:1]}$ , regardless of the conditional shift result.

#### **FprMul**

**Input:** FPN 
$$x = (sx, ex, mx)$$

Input: FPN 
$$y = (sy, ey, my)$$

**Output:** FPN product of x and y

1: 
$$s \leftarrow sx \oplus sy$$

2: 
$$e \leftarrow ex + ey - 2100$$

3: 
$$z \leftarrow mx \times my$$

4: 
$$b \leftarrow [z^{[50:1]} \neq 0]$$

5: 
$$z \leftarrow z^{[106:51]} \lor b$$

6: 
$$z' \leftarrow (z \gg 1) \lor z^{(1)}$$

7. 
$$w \leftarrow z^{(106)}$$

8: 
$$z \leftarrow z \oplus (z \oplus z') \wedge (-w)$$

9: 
$$e \leftarrow e + w$$

10: 
$$bx \leftarrow [ex \neq 0], by \leftarrow [ey \neq 0]$$

11: 
$$b \leftarrow bx \land by$$

12: 
$$z \leftarrow z \land (-b)$$

13: **return** FPR(s, e, z)

This is by adding to any share.

### FprMul

Input: FPN 
$$x = (sx, ex, mx)$$
Input: FPN  $y = (sy, ey, my)$ 
Output: FPN product of  $x$  and  $y$ 
1:  $s \leftarrow sx \oplus sy$ 
2:  $e \leftarrow ex + ey - 2100$ 
3:  $z \leftarrow mx \times my$ 
4:  $b \leftarrow [z^{[50:1]} \neq 0]$ 
5:  $z \leftarrow z^{[106:51]} \lor b$ 
6:  $z' \leftarrow (z \gg 1) \lor z^{(1)}$ 
7:  $w \leftarrow z^{(106)}$ 
8:  $z \leftarrow z \oplus (z \oplus z') \land (-w)$ 
9:  $e \leftarrow e + w$ 
10:  $bx \leftarrow [ex \neq 0]$ ,  $by \leftarrow [ey \neq 0]$ 
11:  $b \leftarrow bx \land by$ 
12:  $z \leftarrow z \land (-b)$ 
13: **return** FPR $(s, e, z)$ 

This is by SecNonzero and SecAnd, and applying the simple trick.

#### **FprMul**

**Input:** FPN 
$$x = (sx, ex, mx)$$
  
**Input:** FPN  $y = (sy, ey, my)$   
**Output:** FPN product of  $x$  and  $y$ 

1: 
$$s \leftarrow sx \oplus sy$$

2: 
$$e \leftarrow ex + ey - 2100$$

3: 
$$z \leftarrow mx \times my$$
  
4:  $b \leftarrow [z^{[50:1]} \neq 0]$   
5:  $z \leftarrow z^{[106:51]} \lor b$ 

6: 
$$z' \leftarrow (z \gg 1) \lor z^{(1)}$$

7: 
$$w \leftarrow z^{(106)}$$

8: 
$$z \leftarrow z \oplus (z \oplus z') \wedge (-w)$$

9: 
$$e \leftarrow e + w$$

10: 
$$bx \leftarrow [ex \neq 0], by \leftarrow [ey \neq 0]$$

11: 
$$b \leftarrow bx \land by$$
  
12:  $z \leftarrow z \land (-b)$ 

13: return 
$$FPR(s, e, z)$$

Now it calls FPR to return a 64-bit Boolean-masked FPN.

#### SecFprMul

```
8: (w_i) \leftarrow (p_i^{(106)})
Input: Shares (sx_i)_{1 \le i \le n}, (ex_i)_{1 \le i \le n}, (mx_i)_{1 \le i \le n}
Input: Shares (sy_i)_{1 \le i \le n}, (ey_i)_{1 \le i \le n}, (my_i)_{1 \le i \le n}
                                                                                  9: (z_i) \leftarrow \text{SecAnd}((z_i), \text{Refresh}((-w_i)))
Output: Boolean shares for the FPN product.
                                                                                 10: (z_i) \leftarrow (z_i' \oplus z_i)
                                                                                 11: (z_i) \leftarrow SecOr((z_i), (b_i))
 1: (s_i) \leftarrow (sx_i \oplus sv_i)
 2: (e_i) \leftarrow (ex_1 + ev_1 - 2100, ex_2 + ev_2, \cdots)
                                                                                 12: (w_i) \leftarrow B2A_{Bit}((w_i))
 3: (p_i) \leftarrow \text{SecMult}((mx_i), (my_i))
                                                                                 13: (e_i) \leftarrow (e_i + w_i)
 4: (p_i) \leftarrow A2B((p_i))
                                                                                 14: (bx_i) \leftarrow SecNonzero((ex_i))
 5: (b_i) \leftarrow \text{SecNonzero}((p_i^{[51:1]}))
                                                                                 15: (bv_i) \leftarrow SecNonzero((ev_i))
                                                                                 16: (d_i) \leftarrow \mathsf{SecAnd}((bx_i), (by_i))
 6: (z_i) \leftarrow (p_i^{[105:51]})
                                                                                 17: (z_i) \leftarrow \text{SecAnd}((z_i), (-d_i^{(1)}))
 7: (z'_i) \leftarrow (p_i^{[105:51]} \oplus p_i^{[106:52]})
                                                                                 18: return SecFPR((s_i), (e_i), (z_i))
```

#### Table of Contents

- Appendix Algorithms of Floating-Point Number Arithmetic
- Appendix Details of Our Design
  - Simple Tricks
  - SecEPR: Secure EPR
  - SecFprMul: Secure FprMul
  - SecFprAdd: Secure FprAdd

### **FprAdd**

**Input:** FPNs x and y

**Output:** FPN sum of x and y

1: 
$$d \leftarrow x^{[63:1]} - v^{[63:1]}$$

2: 
$$cs \leftarrow d^{(64)} \vee ((1-(-d)^{(64)}) \wedge x^{(64)})$$

3: 
$$m \leftarrow (x \oplus y) \land (-cs)$$

4: 
$$x \leftarrow x \oplus m, y \leftarrow y \oplus m$$

5: Extract (sx, ex, mx) and (sy, ey, my) from x, y, respectively.

6: 
$$mx \leftarrow mx \ll 3$$
,  $my \leftarrow my \ll 3$ 

7: 
$$ex \leftarrow ex - 1078$$
,  $ey \leftarrow ey - 1078$ 

8: 
$$c \leftarrow ex - ev$$

9: 
$$b \leftarrow [c < 60]$$

10: 
$$my \leftarrow my \land (-b)$$

11: 
$$my \leftarrow (my \gg c) \vee \lceil my^{[c:1]} \neq 0 \rceil$$

12: 
$$s \leftarrow sx \oplus sy$$

13: 
$$z \leftarrow mx + (-1)^s my$$

14: Normalize z, ex to make the 64th bit of z set

15: 
$$z \leftarrow (z \gg 9) \vee [z^{[9:1]} \neq 0]$$

16: 
$$ex \leftarrow ex + 9$$

17: **return** 
$$FPR(sx, ex, z)$$

We show how we mask the floating-point number addition algorithm FprAdd.

#### **FprAdd**

#### **Input:** FPNs x and y

**Output:** FPN sum of x and y

1: 
$$d \leftarrow x^{[63:1]} - v^{[63:1]}$$

2: 
$$cs \leftarrow d^{(64)} \vee ((1-(-d)^{(64)}) \wedge x^{(64)})$$

3: 
$$m \leftarrow (x \oplus y) \land (-cs)$$

4: 
$$x \leftarrow x \oplus m, y \leftarrow y \oplus m$$

5: Extract (sx, ex, mx) and (sy, ey, my) from x, y, respectively.

my /- my // 3 my /- my //

6:  $mx \leftarrow mx \ll 3$ ,  $my \leftarrow my \ll 3$ 

7:  $ex \leftarrow ex - 1078$ ,  $ey \leftarrow ey - 1078$ 

8:  $c \leftarrow ex - ey$ 

9: 
$$b \leftarrow [c < 60]$$

10: 
$$my \leftarrow my \land (-b)$$

11: 
$$my \leftarrow (my \gg c) \vee \llbracket my^{[c:1]} \neq 0 \rrbracket$$

12: 
$$s \leftarrow sx \oplus sy$$

13: 
$$z \leftarrow mx + (-1)^s my$$

14: Normalize z, ex to make the 64th bit of z set

15: 
$$z \leftarrow (z \gg 9) \vee [z^{[9:1]} \neq 0]$$

16: 
$$ex \leftarrow ex + 9$$

17: **return** 
$$FPR(sx, ex, z)$$

By the output of SecFprMul, we assume the input shares  $(x_i)$  and  $(y_i)$  are 64-bit Boolean-masked FPNs.

#### **FprAdd**

**Input:** FPNs x and y

Output: FPN sum of x and y

1: 
$$d \leftarrow x^{[63:1]} - v^{[63:1]}$$

2: 
$$cs \leftarrow d^{(64)} \vee ((1-(-d)^{(64)}) \wedge x^{(64)})$$

3: 
$$m \leftarrow (x \oplus y) \land (-cs)$$

4: 
$$x \leftarrow x \oplus m, y \leftarrow y \oplus m$$

5: Extract 
$$(sx, ex, mx)$$
 and  $(sy, ey, my)$  from  $x, y$ , respectively.

6: 
$$mx \leftarrow mx \ll 3$$
,  $my \leftarrow my \ll 3$ 

7: 
$$ex \leftarrow ex - 1078$$
,  $ey \leftarrow ey - 1078$ 

8: 
$$c \leftarrow ex - ey$$

9: 
$$b \leftarrow [c < 60]$$

10: 
$$my \leftarrow my \land (-b)$$

11: 
$$my \leftarrow (my \gg c) \vee \llbracket my^{[c:1]} \neq 0 \rrbracket$$

12: 
$$s \leftarrow sx \oplus sy$$

13: 
$$z \leftarrow mx + (-1)^s my$$

14: Normalize z, ex to make the 64th bit of z set

15: 
$$z \leftarrow (z \gg 9) \vee [z^{[9:1]} \neq 0]$$

16: 
$$ex \leftarrow ex + 9$$

17: **return** 
$$FPR(sx, ex, z)$$

The subtraction of two Boolean-masked values can be operated by considering  $x^{[63:1]} - y^{[63:1]} = x^{[63:1]} + (\neg y^{[63:1]}) + 1$ , which takes two SecAdds.

#### **FprAdd**

**Input:** FPNs x and y **Output:** FPN sum of x and y

1: 
$$d \leftarrow x^{[63:1]} - v^{[63:1]}$$

2: 
$$cs \leftarrow d^{(64)} \vee ((1-(-d)^{(64)}) \wedge x^{(64)})$$

3: 
$$m \leftarrow (x \oplus y) \land (-cs)$$

4: 
$$x \leftarrow x \oplus m, y \leftarrow y \oplus m$$

6: 
$$mx \leftarrow mx \ll 3$$
,  $my \leftarrow my \ll 3$ 

7: 
$$ex \leftarrow ex - 1078$$
,  $ey \leftarrow ey - 1078$ 

8: 
$$c \leftarrow ex - ey$$

9: 
$$b \leftarrow [c < 60]$$

10: 
$$my \leftarrow my \land (-b)$$

11: 
$$my \leftarrow (my \gg c) \vee \lceil my^{[c:1]} \neq 0 \rceil$$

12: 
$$s \leftarrow sx \oplus sy$$

13: 
$$z \leftarrow mx + (-1)^s my$$

14: Normalize z, ex to make the 64th bit of

15: 
$$z \leftarrow (z \gg 9) \vee [z^{[9:1]} \neq 0]$$

16: 
$$ex \leftarrow ex + 9$$

17: **return** 
$$FPR(sx, ex, z)$$

But since we only need  $(x^{[63:1]} - y^{[63:1]})^{(64)}$ , we only compute  $x^{[63:1]} + (\neg y^{[63:1]})$  and then check the boundary conditions. This saves us one SecAdd.

#### **FprAdd**

**Input:** FPNs x and y

Output: FPN sum of x and y

1: 
$$d \leftarrow x^{[63:1]} - v^{[63:1]}$$

2: 
$$cs \leftarrow d^{(64)} \vee ((1-(-d)^{(64)}) \wedge x^{(64)})$$

3: 
$$m \leftarrow (x \oplus y) \land (-cs)$$

4: 
$$x \leftarrow x \oplus m, y \leftarrow y \oplus m$$

6: 
$$mx \leftarrow mx \ll 3$$
,  $my \leftarrow my \ll 3$ 

7: 
$$ex \leftarrow ex - 1078$$
,  $ey \leftarrow ey - 1078$ 

8: 
$$c \leftarrow ex - ey$$

9: 
$$b \leftarrow [c < 60]$$

10: 
$$my \leftarrow my \land (-b)$$

11: 
$$my \leftarrow (my \gg c) \vee [my^{[c:1]} \neq 0]$$

12: 
$$s \leftarrow sx \oplus sy$$

13: 
$$z \leftarrow mx + (-1)^s my$$

14: Normalize z, ex to make the 64th bit of z set

15: 
$$z \leftarrow (z \gg 9) \vee [z^{[9:1]} \neq 0]$$

16: 
$$ex \leftarrow ex + 9$$

17: **return** 
$$FPR(sx, ex, z)$$

That is, 
$$[\![u-v<0]\!] = [\![u+(\neg v)<0]\!] \oplus [\![u+(\neg v)=-1]\!] \oplus [\![u+(\neg v)=2^{63}-1]\!]$$

#### **FprAdd**

**Input:** FPNs x and y

**Output:** FPN sum of x and y

1: 
$$d \leftarrow x^{[63:1]} - v^{[63:1]}$$

2: 
$$cs \leftarrow d^{(64)} \vee ((1-(-d)^{(64)}) \wedge x^{(64)})$$

3: 
$$m \leftarrow (x \oplus y) \land (-cs)$$

4: 
$$x \leftarrow x \oplus m, v \leftarrow v \oplus m$$

 $my \leftarrow my // 3 my \leftarrow my // 3$ 

6: 
$$mx \leftarrow mx \ll 3$$
,  $my \leftarrow my \ll 3$ 

7: 
$$ex \leftarrow ex - 1078, ey \leftarrow ey - 1078$$

8: 
$$c \leftarrow ex - ey$$

9: 
$$b \leftarrow [c < 60]$$

10: 
$$my \leftarrow my \land (-b)$$

11: 
$$my \leftarrow (my \gg c) \vee [my^{[c:1]} \neq 0]$$

12: 
$$s \leftarrow sx \oplus sy$$

13: 
$$z \leftarrow mx + (-1)^s my$$

14: Normalize z, ex to make the 64th bit of

15: 
$$z \leftarrow (z \gg 9) \vee [z^{[9:1]} \neq 0]$$

16: 
$$ex \leftarrow ex + 9$$

17: **return** FPR(sx, ex, z)

Moreover, we apply 
$$u + (\neg v) \neq -1 \Leftrightarrow \neg(u + (\neg v)) \neq 0$$
 and  $u + (\neg v) \neq 2^{63} - 1 \Leftrightarrow (u + (\neg v)) \oplus (1 \ll 63) \neq -1 \Leftrightarrow \neg((u + (\neg v)) \oplus (1 \ll 63)) \neq 0$ 

### **FprAdd**

**Input:** FPNs x and y

**Output:** FPN sum of x and y

1: 
$$d \leftarrow x^{[63:1]} - v^{[63:1]}$$

2: 
$$cs \leftarrow d^{(64)} \vee ((1-(-d)^{(64)}) \wedge x^{(64)})$$

3: 
$$m \leftarrow (x \oplus y) \land (-cs)$$

4: 
$$x \leftarrow x \oplus m, y \leftarrow y \oplus m$$

6: 
$$mx \leftarrow mx \ll 3$$
,  $my \leftarrow my \ll 3$ 

7: 
$$ex \leftarrow ex - 1078$$
,  $ey \leftarrow ey - 1078$ 

8: 
$$c \leftarrow ex - ev$$

9: 
$$b \leftarrow [c < 60]$$

10: 
$$my \leftarrow my \land (-b)$$

11: 
$$my \leftarrow (my \gg c) \vee [my^{[c:1]} \neq 0]$$

12: 
$$s \leftarrow sx \oplus sy$$

13: 
$$z \leftarrow mx + (-1)^s my$$

14: Normalize z, ex to make the 64th bit of z set

15: 
$$z \leftarrow (z \gg 9) \vee [z^{[9:1]} \neq 0]$$

16: 
$$ex \leftarrow ex + 9$$

17: **return** 
$$FPR(sx, ex, z)$$

Therefore, these operations can be computed by SecNonzero, SecAnd, and SecOr.

#### **FprAdd**

**Input:** FPNs x and y **Output:** FPN sum of x and y

1: 
$$d \leftarrow x^{[63:1]} - v^{[63:1]}$$

2: 
$$cs \leftarrow d^{(64)} \lor ((1-(-d)^{(64)}) \land x^{(64)})$$

3: 
$$m \leftarrow (x \oplus y) \land (-cs)$$

4: 
$$x \leftarrow x \oplus m, y \leftarrow y \oplus m$$

6: 
$$mx \leftarrow mx \ll 3$$
,  $my \leftarrow my \ll 3$ 

7: 
$$ex \leftarrow ex - 1078, ey \leftarrow ey - 1078$$

8: 
$$c \leftarrow ex - ey$$

9: 
$$b \leftarrow [c < 60]$$

10: 
$$my \leftarrow my \land (-b)$$

11: 
$$my \leftarrow (my \gg c) \vee [my^{[c:1]} \neq 0]$$

12: 
$$s \leftarrow sx \oplus sy$$

13: 
$$z \leftarrow mx + (-1)^s my$$

14: Normalize z, ex to make the 64th bit of z set

15: 
$$z \leftarrow (z \gg 9) \vee [z^{[9:1]} \neq 0]$$

16: 
$$ex \leftarrow ex + 9$$

17: **return** 
$$FPR(sx, ex, z)$$

Share-wise operations, two B2As to convert  $(ex_i)$  and  $(ey_i)$  to arithmetic shares, and subtractions to any shares.

#### **FprAdd**

**Input:** FPNs x and y

**Output:** FPN sum of x and y

1: 
$$d \leftarrow x^{[63:1]} - v^{[63:1]}$$

2: 
$$cs \leftarrow d^{(64)} \lor ((1-(-d)^{(64)}) \land x^{(64)})$$

3: 
$$m \leftarrow (x \oplus y) \land (-cs)$$

4: 
$$x \leftarrow x \oplus m, y \leftarrow y \oplus m$$

from x, y, respectively.

6: 
$$mx \leftarrow mx \ll 3$$
,  $my \leftarrow my \ll 3$ 

7: 
$$ex \leftarrow ex - 1078$$
,  $ey \leftarrow ey - 1078$ 

8: 
$$c \leftarrow ex - ey$$

9: 
$$b \leftarrow [c < 60]$$

10: 
$$my \leftarrow my \land (-b)$$

11: 
$$my \leftarrow (my \gg c) \vee \llbracket my^{[c:1]} \neq 0 \rrbracket$$

12: 
$$s \leftarrow sx \oplus sy$$

13: 
$$z \leftarrow mx + (-1)^s my$$

14: Normalize z, ex to make the 64th bit of z set

15: 
$$z \leftarrow (z \gg 9) \vee [z^{[9:1]} \neq 0]$$

16: 
$$ex \leftarrow ex + 9$$

17: **return** 
$$FPR(sx, ex, z)$$

Subtraction to any share of c by 60 and an A2B to get the MSB of c. Then apply the simple trick.

### **FprAdd**

**Input:** FPNs x and y

Output: FPN sum of x and y

1: 
$$d \leftarrow x^{[63:1]} - v^{[63:1]}$$

2: 
$$cs \leftarrow d^{(64)} \vee ((1-(-d)^{(64)}) \wedge x^{(64)})$$

3: 
$$m \leftarrow (x \oplus y) \land (-cs)$$

4: 
$$x \leftarrow x \oplus m, y \leftarrow y \oplus m$$

5: Extract (sx, ex, mx) and (sy, ey, my) from x, y, respectively.

6: 
$$mx \leftarrow mx \ll 3$$
,  $my \leftarrow my \ll 3$ 

7: 
$$ex \leftarrow ex - 1078$$
,  $ey \leftarrow ey - 1078$ 

8: 
$$c \leftarrow ex - ev$$

9: 
$$b \leftarrow [c < 60]$$

10: 
$$my \leftarrow my \land (-b)$$

11: 
$$my \leftarrow (my \gg c) \vee \lceil my^{[c:1]} \neq 0 \rceil$$

12: 
$$s \leftarrow sx \oplus sy$$

13: 
$$z \leftarrow mx + (-1)^s my$$

14: Normalize z, ex to make the 64th bit of z set

15: 
$$z \leftarrow (z \gg 9) \vee [z^{[9:1]} \neq 0]$$

16: 
$$ex \leftarrow ex + 9$$

17: **return** 
$$FPR(sx, ex, z)$$

This is by our gadget SecFprUrsh.

### **FprAdd**

**Input:** FPNs x and y

**Output:** FPN sum of x and y

1: 
$$d \leftarrow x^{[63:1]} - v^{[63:1]}$$

2: 
$$cs \leftarrow d^{(64)} \lor ((1-(-d)^{(64)}) \land x^{(64)})$$

3: 
$$m \leftarrow (x \oplus y) \land (-cs)$$

4: 
$$x \leftarrow x \oplus m, y \leftarrow y \oplus m$$

5: Extract (sx, ex, mx) and (sy, ey, my) from x, y, respectively.

6: 
$$mx \leftarrow mx \ll 3$$
,  $my \leftarrow my \ll 3$ 

7: 
$$ex \leftarrow ex - 1078$$
,  $ey \leftarrow ey - 1078$ 

8: 
$$c \leftarrow ex - ev$$

9: 
$$b \leftarrow [c < 60]$$

10: 
$$my \leftarrow my \land (-b)$$

11: 
$$my \leftarrow (my \gg c) \vee [my^{[c:1]} \neq 0]$$

12: 
$$s \leftarrow sx \oplus sy$$

13: 
$$z \leftarrow mx + (-1)^s my$$

14: Normalize z, ex to make the 64th bit of z set

15: 
$$z \leftarrow (z \gg 9) \vee [z^{[9:1]} \neq 0]$$

16: 
$$ex \leftarrow ex + 9$$

17: **return** 
$$FPR(sx, ex, z)$$

A Share-wise operation.

#### **FprAdd**

**Input:** FPNs x and y

**Output:** FPN sum of x and y

1: 
$$d \leftarrow x^{[63:1]} - v^{[63:1]}$$

2: 
$$cs \leftarrow d^{(64)} \vee ((1-(-d)^{(64)}) \wedge x^{(64)})$$

3: 
$$m \leftarrow (x \oplus y) \land (-cs)$$

4: 
$$x \leftarrow x \oplus m, y \leftarrow y \oplus m$$

5: Extract (sx, ex, mx) and (sy, ey, my) from x, v, respectively.

6: 
$$mx \leftarrow mx \ll 3$$
,  $my \leftarrow my \ll 3$ 

7: 
$$ex \leftarrow ex - 1078$$
,  $ey \leftarrow ey - 1078$ 

8: 
$$c \leftarrow ex - ey$$

9: 
$$b \leftarrow [c < 60]$$

10: 
$$my \leftarrow my \land (-b)$$

11: 
$$my \leftarrow (my \gg c) \vee [my^{[c:1]} \neq 0]$$

12: 
$$s \leftarrow sx \oplus sy$$

13: 
$$z \leftarrow mx + (-1)^s my$$

14: Normalize z, ex to make the 64th bit of z set

15: 
$$z \leftarrow (z \gg 9) \vee [z^{[9:1]} \neq 0]$$

16: 
$$ex \leftarrow ex + 9$$

17: **return** 
$$FPR(sx, ex, z)$$

Considering the simple trick with  $my + (my \oplus (-my)) \wedge -s$ , where  $-my = (\neg my) + 1$  is derived by an SecAdd. Then add the result to mx.

### **FprAdd**

**Input:** FPNs x and y

**Output:** FPN sum of x and y

1: 
$$d \leftarrow x^{[63:1]} - v^{[63:1]}$$

2: 
$$cs \leftarrow d^{(64)} \vee ((1-(-d)^{(64)}) \wedge x^{(64)})$$

3: 
$$m \leftarrow (x \oplus y) \land (-cs)$$

4: 
$$x \leftarrow x \oplus m, y \leftarrow y \oplus m$$

6: 
$$mx \leftarrow mx \ll 3$$
,  $my \leftarrow my \ll 3$ 

7: 
$$ex \leftarrow ex - 1078$$
,  $ey \leftarrow ey - 1078$ 

8: 
$$c \leftarrow ex - ev$$

9: 
$$b \leftarrow [c < 60]$$

10: 
$$my \leftarrow my \land (-b)$$

11: 
$$my \leftarrow (my \gg c) \vee \lceil my^{[c:1]} \neq 0 \rceil$$

12: 
$$s \leftarrow sx \oplus sy$$

13: 
$$z \leftarrow mx + (-1)^s my$$

14: Normalize z, ex to make the 64th bit of z set

15: 
$$z \leftarrow (z \gg 9) \vee [z^{[9:1]} \neq 0]$$

16: 
$$ex \leftarrow ex + 9$$

17: **return** 
$$FPR(sx, ex, z)$$

This is by our gadget SecFprNorm64.

### **FprAdd**

**Input:** FPNs x and y

**Output:** FPN sum of x and y

1: 
$$d \leftarrow x^{[63:1]} - v^{[63:1]}$$

2: 
$$cs \leftarrow d^{(64)} \vee ((1-(-d)^{(64)}) \wedge x^{(64)})$$

3: 
$$m \leftarrow (x \oplus y) \land (-cs)$$

4: 
$$x \leftarrow x \oplus m, y \leftarrow y \oplus m$$

5: Extract (sx, ex, mx) and (sy, ey, my) from x, y, respectively.

6: 
$$mx \leftarrow mx \ll 3$$
,  $my \leftarrow my \ll 3$ 

7: 
$$ex \leftarrow ex - 1078$$
,  $ey \leftarrow ey - 1078$ 

8: 
$$c \leftarrow ex - ev$$

9: 
$$b \leftarrow [c < 60]$$

10: 
$$my \leftarrow my \land (-b)$$

11: 
$$my \leftarrow (my \gg c) \vee \lceil my^{[c:1]} \neq 0 \rceil$$

12: 
$$s \leftarrow sx \oplus sy$$

13: 
$$z \leftarrow mx + (-1)^s my$$

14: Normalize z, ex to make the 64th bit of z set

15: 
$$z \leftarrow (z \gg 9) \vee [z^{[9:1]} \neq 0]$$

16: 
$$ex \leftarrow ex + 9$$

17: **return** 
$$FPR(sx, ex, z)$$

A Share-wise operation and a SecNonzero. Add 9 to any share of ex.

### **FprAdd**

**Input:** FPNs x and y

**Output:** FPN sum of x and y

1: 
$$d \leftarrow x^{[63:1]} - v^{[63:1]}$$

2: 
$$cs \leftarrow d^{(64)} \vee ((1-(-d)^{(64)}) \wedge x^{(64)})$$

3: 
$$m \leftarrow (x \oplus y) \land (-cs)$$

4: 
$$x \leftarrow x \oplus m, y \leftarrow y \oplus m$$

5: Extract (sx, ex, mx) and (sy, ey, my) from x, y, respectively.

6: 
$$mx \leftarrow mx \ll 3$$
,  $my \leftarrow my \ll 3$ 

7: 
$$ex \leftarrow ex - 1078$$
,  $ey \leftarrow ey - 1078$ 

8: 
$$c \leftarrow ex - ev$$

9: 
$$b \leftarrow [c < 60]$$

10: 
$$my \leftarrow my \land (-b)$$

11: 
$$my \leftarrow (my \gg c) \vee [my^{[c:1]} \neq 0]$$

12: 
$$s \leftarrow sx \oplus sy$$

13: 
$$z \leftarrow mx + (-1)^s my$$

14: Normalize z, ex to make the 64th bit of z set

15: 
$$z \leftarrow (z \gg 9) \vee [z^{[9:1]} \neq 0]$$

16: 
$$ex \leftarrow ex + 9$$

17: **return** 
$$FPR(sx, ex, z)$$

Finally, it calls FPR to return a 64-bit Boolean-masked FPN.

#### SecFprAdd

```
Input: Boolean shares (x_i)_{1 \le i \le n}
                                                                                            14: (c_i) \leftarrow (ex_i - ev_i)
Input: Boolean shares (v_i)_{1 \le i \le n}
                                                                                            15: (c'_1) \leftarrow A2B((c_1 - 60, c_2, \dots, c_n))
Output: Boolean shares for the FPN sum
                                                                                            16: (my_i) \leftarrow \text{SecAnd}((my_i), (-(c_i'^{(16)})))
1: (xm_i) \leftarrow (x_i^{[63:1]})
                                                                                            17: (mv_i) \leftarrow SecFprUrsh((mv_i), (c_i^{[6:1]}))
2: (ym_i) \leftarrow (\neg y_1^{[63:1]}, y_2^{[63:1]}, \cdots, y_n^{[63:1]})
                                                                                            18: (mv_1') \leftarrow (\neg mv_1, mv_2, \cdots, mv_n)
 3: (d_i) \leftarrow \text{SecAdd}((xm_i), (vm_i))
                                                                                            19: (my') \leftarrow \text{SecAdd}((my'), (1, 0, \dots, 0))
 4: (b_i) \leftarrow \text{SecNonzero}(\neg d_1, d_2, \cdots, d_n)
                                                                                            20: (s_i) \leftarrow (-(sx_i \oplus sy_i))
 5: (b'_1) \leftarrow \text{SecNonzero}(\neg (d_1 \oplus (1 \ll 63)), d_2, \cdots, d_n)
                                                                                            21: (my_i) \leftarrow \text{Refresh}((my_i))
 6: (cs_i) \leftarrow \text{SecAnd}((\neg b_1, b_2, \cdots, b_n), (x_i^{(64)}))
                                                                                            22: (my_i') \leftarrow \text{SecAnd}((my_i \oplus my_i'), (s_i))
                                                                                            23: (mv_i) \leftarrow (mv_i \oplus mv_i')
 7: (cs_i) \leftarrow SecOr((cs_i), (d_i^{(64)} \oplus b_i \oplus b_i'))
                                                                                            24: (z_i) \leftarrow \text{SecAdd}((mx_i), (my_i))
 8: (m_i) \leftarrow \text{SecAnd}((x_i \oplus y_i), (-cs_i))
                                                                                            25: (z_i), (ex_i) \leftarrow SecFprNorm64((z_i), (ex_i))
 9: (x_i) \leftarrow (x_i \oplus m_i), (v_i) \leftarrow (v_i \oplus m_i)
                                                                                            26: (b_i) \leftarrow SecNonzero((z_i^{[10:1]}))
10: Extract (sx_i), (ex_i), (mx_i) and (sy_i), (ey_i), (my_i) from
                                                                                            27: (z_i) \leftarrow (z_i \gg 9)
      (x_i) and (y_i), respectively.
                                                                                            28: (z_i^{(1)}) \leftarrow (b_i)
11: (mx_i) \leftarrow (mx_i \ll 3), (my_i) \leftarrow (my_i \ll 3)
12: (ex_i) \leftarrow B2A((ex_i)), (ev_i) \leftarrow B2A((ev_i))
                                                                                            29: ex_1 \leftarrow ex_1 + 9
                                                                                            30: return SecFPR(Refresh((sx_i)), (ex_i), (z_i))
13: ex_1 \leftarrow ex_1 - 1078, ev_1 \leftarrow ev_1 - 1078.
```