

# 11 Floating Point Operations in ARM

**CPE 221** 

The University of Alabama in Huntsville

Rahul Bhadani, April 1, 2025

- Introduction to Floating-Point Support in ARMv7
- Floating-Point Register Set
- Basic Floating-Point Operations
- Data Conversion
- Working with FPSCR
- Loading Floating Data Memory to Multiple Registers
- Examples

### Introduction to Floating-Point Support in ARMv7



#### ARMv7 VFP Architecture Overview

- ARMv7 provides hardware floating-point support through:
  - Vector Floating Point (VFP) extension
  - Advanced SIMD (Single Instruction, Multiple Data) extension, also known as NEON
- CPUlator supports these extensions
- VFP supports IEEE 754 single-precision and double-precision operations
- Key components:
  - Dedicated register set
  - Special instructions for floating-point operations
  - Control and status register (FPSCR)



#### Two Separate Processors





### Floating-Point Register Set



#### Floating-Point Register Organization

- ARMv7 VFP provides 32 single-precision registers (S0-S31)
- These can also be viewed as:
  - 16 double-precision registers (D0-D15)
  - S0/S1 overlap with D0, S2/S3 with D1, and so on
- Advanced implementations support 32 double-precision registers (D0-D31)
- VFP/NEON registers are separate from the general-purpose registers (R0-R15)

| Single-precision | S0            | S1             |
|------------------|---------------|----------------|
| Double-precision | D0 (low word) | D0 (high word) |
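The overlap between the single- and double-precision views can be illustrated off-target. The following is a minimal Python sketch, not ARM code: it assumes S0 holds the low 32 bits of D0 and S1 the high 32 bits, and the helper name `split_double` is hypothetical.

```python
import struct

def split_double(value):
    """Return the (low, high) 32-bit words of a 64-bit double.

    Models how D0 can be viewed as the pair S0 (low word) and S1 (high word).
    """
    raw = struct.pack("<d", value)          # 8 bytes, little-endian
    low, high = struct.unpack("<II", raw)   # two unsigned 32-bit words
    return low, high

s0_bits, s1_bits = split_double(1.0)
print(hex(s0_bits), hex(s1_bits))
```

Writing a double into D0 therefore changes what S0 and S1 appear to contain, which is why code should not keep an independent single-precision value in S0 or S1 while D0 is in use.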



#### FPSCR: Floating-Point Status and Control Register

- Special register that controls VFP operation
- Contains fields for:
  - Condition flags (N, Z, C, V), set by floating-point comparisons
  - Cumulative exception flags (for example, division by zero)
  - Rounding mode configuration
  - Exception handling controls
  - Vector length/stride control
- Similar role to the CPSR for integer operations
- Accessible via special instructions:
  - VMRS: move from FPSCR to an ARM register
  - VMSR: move to FPSCR from an ARM register
  - APSR\_nzcv (Application Program Status Register, NZCV flags): the APSR holds the condition flags for integer operations. Writing `VMRS APSR_nzcv, FPSCR` copies only the N (Negative), Z (Zero), C (Carry), and V (Overflow) flags from the FPSCR into the APSR, so ordinary conditional branches can test the result of a floating-point comparison.
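To make the field list concrete, here is a small Python sketch that decodes a raw FPSCR value. The bit positions follow the ARMv7-A FPSCR layout; only a subset of fields is listed, and the names `FPSCR_FIELDS` and `decode_fpscr` are hypothetical helpers, not part of any ARM tool.

```python
# (shift, width) for a subset of FPSCR fields in the ARMv7-A layout.
FPSCR_FIELDS = {
    "N": (31, 1),        # negative / less than
    "Z": (30, 1),        # zero / equal
    "C": (29, 1),        # carry / greater than, equal, or unordered
    "V": (28, 1),        # overflow / unordered
    "RMode": (22, 2),    # rounding mode, bits 23:22
    "Stride": (20, 2),   # vector stride, bits 21:20
    "Len": (16, 3),      # vector length, bits 18:16
    "DZE": (9, 1),       # division-by-zero trap enable
    "DZC": (1, 1),       # division-by-zero cumulative flag
}

def decode_fpscr(value):
    """Split a 32-bit FPSCR value into the named fields above."""
    return {
        name: (value >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FPSCR_FIELDS.items()
    }

print(decode_fpscr(0x60000000))  # Z and C set, as after comparing equal values
```

A value read with `VMRS R0, FPSCR` in CPUlator can be pasted into `decode_fpscr` to see which fields are set.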



### **Basic Floating-Point Operations**



#### VMOV

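The immediate form of VMOV only works with a limited set of constants: values of the form $\pm m/2^k$, packed into an 8-bit field.

- Bit 7: sign bit
- Bits 6:4: exponent field, selecting $0 \le k \le 7$
- Bits 3:0: fraction field, selecting $16 \le m \le 31$

Representable magnitudes therefore range from $16/2^7 = 0.125$ to $31/2^0 = 31$. Other constants, such as 3.14159, must be loaded from memory.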
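Whether a constant fits this pattern can be checked by brute force. The sketch below is a Python illustration of the $\pm m/2^k$ rule only; it does not reproduce the actual bit encoding, the function name `vmov_immediate_ok` is hypothetical, and it assumes a finite input.

```python
from fractions import Fraction

def vmov_immediate_ok(value):
    """Return True if value equals +/- m / 2**k with 16 <= m <= 31, 0 <= k <= 7."""
    target = abs(Fraction(value))   # exact binary value of the float
    return any(
        target == Fraction(m, 2 ** k)
        for m in range(16, 32)
        for k in range(8)
    )

for constant in (1.0, 0.5, 10.0, 31.0, 0.0, 3.14159):
    print(constant, vmov_immediate_ok(constant))
```

Note that 0.0 is not representable this way, so a register is usually cleared by loading zero from memory or by converting an integer zero.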



#### Loading and Storing Floating-Point Values

```
.data
pi_val:     .float  3.14159
double_val: .double 1.618

.text
.global _start
_start:
    LDR  R0, =pi_val        @ R0 = address of pi_val
    VLDR S0, [R0]           @ Load single-precision constant
    LDR  R0, =double_val    @ R0 = address of double_val
    VLDR D1, [R0]           @ Load double-precision constant
done: B done
```

Supported in CPUlator.


#### Basic Arithmetic Operations

```
@ Single-precision operations
    VADD.F32 S0, S1, S2     @ S0 = S1 + S2
    VSUB.F32 S3, S4, S5     @ S3 = S4 - S5
    VMUL.F32 S6, S7, S8     @ S6 = S7 * S8
    VDIV.F32 S9, S10, S11   @ S9 = S10 / S11

@ Double-precision operations
    VADD.F64 D0, D1, D2     @ D0 = D1 + D2
    VSUB.F64 D3, D4, D5     @ D3 = D4 - D5
    VMUL.F64 D6, D7, D8     @ D6 = D7 * D8
    VDIV.F64 D9, D10, D11   @ D9 = D10 / D11
```



#### Division Operation

```
.global _start
_start:
    // Load the floating-point numbers into registers using PC-relative addressing
    VLDR.F32 S0, float1     // Load 37.24 into register S0
    VLDR.F32 S1, float2     // Load 7.51 into register S1

    // Perform the division: S2 = S0 / S1
    VDIV.F32 S2, S0, S1     // S2 = S0 / S1
    // At this point, S2 contains the result of 37.24 / 7.51

    // Exit the program (optional, depending on your environment)
    MOV R7, #1              // syscall: exit (takes effect only if followed by SVC under Linux)
done: B done

    // Define the floating-point constants in a literal pool
    .ltorg                  // Place the literal pool here
float1: .float 37.24
float2: .float 7.51
```



#### Loading First Floating-Point Value

```
VLDR.F32 S0, float1 // Load 37.24 into register S0
```

- VLDR.F32: Vector Load Register instruction for 32-bit floating-point values
- S0: Destination register (single-precision floating-point)
- float1: Label referencing the memory location that holds the value 37.24
- Uses PC-relative addressing to locate the value



#### Loading Second Floating-Point Value

```
VLDR.F32 S1, float2 // Load 7.51 into register S1
```

- Similar to the previous instruction
- Loads the 32-bit floating-point value 7.51 from memory
- Places the value into register S1
- Prepares the second operand for the division operation



#### Performing Floating-Point Division

```
VDIV.F32 S2, S0, S1 // S2 = S0 / S1
```

- VDIV.F32: Vector Floating-Point Division instruction
- Divides the value in S0 (37.24) by the value in S1 (7.51)
- Result is stored in register S2
- Calculates $37.24 \div 7.51 \approx 4.96$
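The expected quotient can be checked off-target. This is a Python sketch that models single-precision rounding with `struct`; the helper names `to_f32` and `vdiv_f32` are hypothetical, and the model assumes a nonzero divisor and the default round-to-nearest mode.

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def vdiv_f32(a, b):
    """Model VDIV.F32: operands and result are held in single precision."""
    return to_f32(to_f32(a) / to_f32(b))

quotient = vdiv_f32(37.24, 7.51)
print(quotient)             # value S2 should hold, shown with full digits
print(round(quotient, 2))   # rounded to two decimals for comparison with the slide
```

The full-precision value differs slightly from the two-decimal figure because neither 37.24 nor 7.51 is exactly representable in binary.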



#### Literal Pool Directive

.ltorg // Place the literal pool here

- .ltorg: Directive to place a literal pool at this location
- Literal pool: Storage area for constants that cannot be encoded directly in an instruction
- Placing the constants right after the code keeps them within reach of the PC-relative VLDR instructions (a label must lie within about ±1 KB of the instruction)
- Important for proper memory layout and addressing



#### Floating-Point Constant

```
float1: .float 37.24
float2: .float 7.51
```

- `float1:`: Label to reference this memory location
- `.float 37.24`: Allocates space for a 32-bit floating-point constant
- Defines the first operand (dividend) for our division operation
- Referenced by the first VLDR instruction
- `float2:`: Label for memory reference
- `.float 7.51`: Allocates space for a 32-bit floating-point constant
- Defines the second operand (divisor)
- Referenced by the second VLDR instruction




#### Comparison and Conditional Operations

```
@ Compare floating-point values
    VCMP.F32 S0, S1             @ Compare S0 with S1
    VMRS APSR_nzcv, FPSCR       @ Transfer flags to APSR
    BEQ equal_label             @ Branch if S0 == S1
    BGT greater_than_label      @ Branch if S0 > S1

@ Conditional move without a branch (ARMv8 adds VSEL; on ARMv7 a
@ conditional VMOV does the job)
    VCMP.F32 S0, S1
    VMRS APSR_nzcv, FPSCR
    VMOVGT.F32 S2, S0           @ Move S0 to S2 if S0 > S1
```
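The flag values that VCMP produces, and how BEQ and BGT interpret them after the VMRS transfer, can be modelled in a few lines. This Python sketch is an illustration only; the function names `vcmp_flags`, `cond_eq`, and `cond_gt` are hypothetical.

```python
import math

def vcmp_flags(a, b):
    """Return (N, Z, C, V) as VCMP sets them for operands a and b."""
    if math.isnan(a) or math.isnan(b):
        return (0, 0, 1, 1)   # unordered: at least one operand is NaN
    if a == b:
        return (0, 1, 1, 0)   # equal
    if a < b:
        return (1, 0, 0, 0)   # less than
    return (0, 0, 1, 0)       # greater than

def cond_eq(flags):
    """EQ condition: Z set."""
    n, z, c, v = flags
    return z == 1

def cond_gt(flags):
    """GT condition: Z clear and N equal to V."""
    n, z, c, v = flags
    return z == 0 and n == v

print(cond_gt(vcmp_flags(2.0, 1.0)))            # branch taken
print(cond_gt(vcmp_flags(float("nan"), 1.0)))   # not taken for unordered operands
```

One consequence worth noting: with a NaN operand neither BEQ nor BGT is taken, so code that must treat NaN specially should test for the unordered case (BVS) explicitly.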



### **Data Conversion**



#### Converting Between Formats

```
@ Integer to floating-point
    VMOV S0, R0                 @ Move R0 value to S0
    VCVT.F32.S32 S0, S0         @ Convert signed 32-bit int to float

@ Floating-point to integer
    VCVT.S32.F32 S1, S0         @ Convert float to signed 32-bit int (rounds toward zero)
    VMOV R1, S1                 @ Move S1 value to R1

@ Single to double precision
    VCVT.F64.F32 D0, S0         @ Convert single to double

@ Double to single precision
    VCVT.F32.F64 S2, D0         @ Convert double to single
```
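The float-to-integer direction has a few corner cases that are easy to miss. The Python sketch below models the behaviour of `VCVT.S32.F32` as round toward zero with saturation and NaN mapped to 0; the function name `vcvt_s32_f32` is hypothetical and exception flags are not modelled.

```python
import math

INT32_MIN = -2 ** 31
INT32_MAX = 2 ** 31 - 1

def vcvt_s32_f32(x):
    """Model VCVT.S32.F32: truncate toward zero and saturate to the int32 range."""
    if math.isnan(x):
        return 0              # NaN converts to zero
    if x >= INT32_MAX:
        return INT32_MAX      # too large (including +infinity): saturate
    if x <= INT32_MIN:
        return INT32_MIN      # too small (including -infinity): saturate
    return math.trunc(x)      # drop the fractional part

print(vcvt_s32_f32(3.99), vcvt_s32_f32(-3.99))
```

Because the fraction is discarded rather than rounded, adding 0.5 before converting is a common way to approximate round-to-nearest for non-negative values.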



### Working with FPSCR



#### Accessing FPSCR

```
@ Reading FPSCR
    VMRS R0, FPSCR              @ Read FPSCR into R0

@ Writing to FPSCR
    VMSR FPSCR, R0              @ Write R0 to FPSCR

@ Moving flags from FPSCR to APSR
    VMRS APSR_nzcv, FPSCR       @ Copy condition flags

@ Setting rounding mode (example: round to nearest)
    VMRS R0, FPSCR              @ Read current FPSCR so other fields are preserved
    BIC R1, R0, #0x00C00000     @ Clear rounding mode bits (23:22)
    ORR R1, R1, #0x00000000     @ Set round to nearest (00)
    VMSR FPSCR, R1              @ Update FPSCR
```
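The clear-then-set pattern used for the rounding mode is ordinary bit masking. Here is a Python sketch of the same arithmetic, assuming the rounding mode occupies FPSCR bits 23:22 with the encodings 00 (nearest), 01 (toward plus infinity), 10 (toward minus infinity), and 11 (toward zero); the constant and function names are hypothetical.

```python
RMODE_SHIFT = 22
RMODE_MASK = 0x3 << RMODE_SHIFT      # 0x00C00000, the value used with BIC

ROUND_NEAREST = 0b00
ROUND_PLUS_INF = 0b01
ROUND_MINUS_INF = 0b10
ROUND_ZERO = 0b11

def set_rounding_mode(fpscr, mode):
    """Return fpscr with the rounding-mode field replaced by mode."""
    cleared = fpscr & ~RMODE_MASK & 0xFFFFFFFF   # BIC: clear bits 23:22
    return cleared | (mode << RMODE_SHIFT)       # ORR: insert the new mode

print(hex(set_rounding_mode(0x60000000, ROUND_ZERO)))
```

All other FPSCR bits pass through unchanged, which is the reason for reading the register first instead of writing a fresh constant.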



### **Loading Floating Data Memory to Multiple Registers**



#### **VLDMIA**

Load Multiple FPU Registers, Increment After.

#### Syntax:

VLDMIA Rn{!}, {register list}

- Memory → FP registers; the first address is in Rn
- Updates Rn only if the write-back flag (!) is appended to Rn

#### Example:

// Copy starting at mem[R0]
VLDMIA R0!, {S0, S1, S2}





#### **VLDMDB**

Load Multiple FPU Registers, Decrement Before.

#### Syntax:

VLDMDB Rn!, {register list}

- Memory → FP registers; the addresses end just before the address in Rn
- The write-back flag (!) is mandatory, so Rn is always updated

#### Example:

// Copy ending before mem[R0]
VLDMDB R0!, {S0, S1, S2}
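The address arithmetic of the two load-multiple forms can be sketched in Python. This is a simplified model for single-precision registers (4 bytes each): memory is a hypothetical dictionary from byte address to value, and the function names `vldmia` and `vldmdb` are illustrative, not real APIs.

```python
def vldmia(memory, rn, count, writeback=True):
    """Load count words starting at address rn; return (values, new_rn)."""
    values = [memory[rn + 4 * i] for i in range(count)]
    new_rn = rn + 4 * count if writeback else rn
    return values, new_rn

def vldmdb(memory, rn, count):
    """Load count words ending just before rn; write-back always happens."""
    start = rn - 4 * count
    values = [memory[start + 4 * i] for i in range(count)]
    return values, start

memory = {0x100: 1.0, 0x104: 2.0, 0x108: 3.0}

print(vldmia(memory, 0x100, 3))   # S0..S2 loaded upward from 0x100
print(vldmdb(memory, 0x10C, 3))   # same three words, reached from above
```

In both forms the lowest-numbered register receives the value at the lowest address; only the starting point and the direction in which Rn moves differ.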





### **Examples**



#### Example 1: Computing the Area of a Circle


```
.data
pi:     .float 3.14159265   @ Single-precision PI
radius: .float 5.0          @ Circle radius
area:   .float 0.0          @ Result storage

.text
.global _start
_start:
    @ Load the radius and PI into VFP registers
    LDR R0, =radius
    VLDR S0, [R0]           @ S0 = radius
    LDR R0, =pi
    VLDR S1, [R0]           @ S1 = pi

    @ Compute radius squared
    VMUL.F32 S2, S0, S0     @ S2 = radius * radius

    @ Compute area = pi * radius^2
    VMUL.F32 S3, S1, S2     @ S3 = pi * radius^2

    @ Store the result
    LDR R0, =area
    VSTR S3, [R0]           @ Store result to memory
done: B done
```
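To know what value to expect in `area` when stepping through the program, the two multiplications can be replayed in Python with single-precision rounding. This is a sketch under the assumption of round-to-nearest; the helper names `to_f32` and `circle_area_f32` are hypothetical.

```python
import struct

def to_f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def circle_area_f32(radius, pi=3.14159265):
    """Replay the two VMUL.F32 steps of the example in single precision."""
    r = to_f32(radius)
    r_squared = to_f32(r * r)                 # VMUL.F32 S2, S0, S0
    return to_f32(to_f32(pi) * r_squared)     # VMUL.F32 S3, S1, S2

print(circle_area_f32(5.0))   # should be close to 78.54
```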

#### **Example 2: Temperature Conversion**


```
.data
temp_f:    .float 98.6          @ Temperature in Fahrenheit
temp_c:    .float 0.0           @ Will hold Celsius result
const_32:  .float 32.0          @ Constant for conversion
const_5_9: .float 0.5555555     @ 5/9 as a floating-point constant

.text
.global _start
_start:
    @ Load the Fahrenheit temperature
    LDR R0, =temp_f
    VLDR S0, [R0]               @ S0 = Fahrenheit temperature

    @ Load the constants
    LDR R0, =const_32
    VLDR S1, [R0]               @ S1 = 32.0
    LDR R0, =const_5_9
    VLDR S2, [R0]               @ S2 = 5/9

    @ Convert: C = (F - 32) * 5/9
    VSUB.F32 S3, S0, S1         @ S3 = F - 32
    VMUL.F32 S4, S3, S2         @ S4 = (F - 32) * 5/9

    @ Store the result
    LDR R0, =temp_c
    VSTR S4, [R0]               @ Store Celsius result
done: B done
```
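The conversion can be replayed in Python to see what `temp_c` should hold. This sketch models the VSUB.F32 and VMUL.F32 steps with single-precision rounding and uses the same truncated 5/9 constant as the listing; the helper names are hypothetical.

```python
import struct

def to_f32(x):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def fahrenheit_to_celsius_f32(temp_f, five_ninths=0.5555555):
    """Replay (F - 32) * 5/9 with each step rounded to single precision."""
    diff = to_f32(to_f32(temp_f) - to_f32(32.0))    # VSUB.F32 S3, S0, S1
    return to_f32(diff * to_f32(five_ninths))       # VMUL.F32 S4, S3, S2

print(fahrenheit_to_celsius_f32(98.6))   # close to 37.0, not exactly 37.0
```

The small deviation from 37.0 comes from two sources: 98.6 is not exactly representable in binary, and 0.5555555 is a truncated approximation of 5/9.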

#### Example 3: Working with FPSCR for Exception Handling

```
.data
val1: .float 1.0
val2: .float 0.0            @ Will cause division by zero

.text
.global _start
_start:
    @ Enable floating-point exceptions (division by zero)
    VMRS R0, FPSCR          @ Read current FPSCR
    ORR R0, R0, #(1 << 9)   @ Set DZE (Division by Zero trap enable, bit 9)
    VMSR FPSCR, R0          @ Update FPSCR (cores without trap support ignore DZE)

    @ Load values
    LDR R0, =val1
    VLDR S0, [R0]           @ S0 = 1.0
    LDR R0, =val2
    VLDR S1, [R0]           @ S1 = 0.0
```



#### Example 3: Working with FPSCR for Exception Handling (Continued)

```
    @ Try division that will cause exception
    VDIV.F32 S2, S0, S1     @ S2 = S0 / S1 (div by zero)

    @ Check for exception
    VMRS R0, FPSCR          @ Read FPSCR after operation
    TST R0, #(1 << 1)       @ Check if DZC (Division by Zero cumulative flag, bit 1) is set
    BNE division_error      @ Branch if exception occurred
    B continue              @ No exception, continue

division_error:
    @ Handle the error here
    MOV R0, #1              @ Error code
    B end

continue:
    MOV R0, #0              @ Success code

end:
done: B done
```
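What the check after VDIV is looking for can be modelled in Python. The sketch assumes an untrapped division that follows IEEE 754: a finite nonzero value divided by zero yields a signed infinity and sets the cumulative DZC flag (FPSCR bit 1). The function name `vdiv_with_flags` is hypothetical, and other exception flags are not modelled.

```python
import math

DZC = 1 << 1   # division-by-zero cumulative flag, FPSCR bit 1

def vdiv_with_flags(a, b, fpscr=0):
    """Model an untrapped divide; return (result, updated_fpscr)."""
    if b == 0.0:
        if a == 0.0 or math.isnan(a):
            return math.nan, fpscr           # 0/0 or NaN/0: not a divide-by-zero event
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        result = math.copysign(math.inf, sign)
        if math.isinf(a):
            return result, fpscr             # infinity / 0 is exact, no flag
        return result, fpscr | DZC           # finite nonzero / 0 sets DZC
    return a / b, fpscr

result, fpscr = vdiv_with_flags(1.0, 0.0)
print(result, bool(fpscr & DZC))
```

The flag is cumulative: once set it stays set until software clears it, so a handler would normally clear DZC with a read-modify-write of the FPSCR before continuing.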



## The End