| <u>Base</u>         | SIMD&FP             | <u>SVE</u>          | <u>SME</u>          | Index by        |
|---------------------|---------------------|---------------------|---------------------|-----------------|
| <u>Instructions</u> | <u>Instructions</u> | <u>Instructions</u> | <u>Instructions</u> | <b>Encoding</b> |

Pseu

## **BFMAX** (multiple vectors)

Multi-vector BFloat16 floating-point maximum

Determine the maximum of BFloat16 elements of the two or four second source vectors and the corresponding BFloat16 elements of the two or four first source vectors and destructively place the results in the corresponding elements of the two or four first source vectors.

When FPCR.AH is 0, the behavior is as follows:

- Negative zero compares less than positive zero.
- When FPCR.DN is 0, if either element is a NaN, the result is a quiet NaN.
- When FPCR.DN is 1, if either element is a NaN, the result is Default NaN.

When FPCR.AH is 1, the behavior is as follows:

- If both elements are zeros, regardless of the sign of either zero, the result is the second element.
- If either element is a NaN, regardless of the value of FPCR.DN, the result is the second element.

This instruction follows SME2.1 non-widening BFloat16 numerical behaviors corresponding to instructions that place their results in two or four SVE Z vectors.

This instruction is unpredicated.

ID\_AA64SMFR0\_EL1.B16B16 indicates whether this instruction is implemented.

It has encodings from 2 classes: Two registers and Four registers

22

```
Two registers (FEAT_SVE_B16B16)
```

3130292827262524

```
BFMAX { <Zdn1>.H-<Zdn2>.H }, { <Zdn1>.H-<Zdn2>.H }, { <Zm1>.H-<Zm2>

if !HaveSME2() | !IsFeatureImplemented(FEAT_SVE_B16B16) then UNDEFINED
integer dn = UInt(Zdn:'0');
integer m = UInt(Zm:'0');
constant integer nreg = 2;
```

212019181716151413121110 9 8 7 6 5 4 3 2 1 0

## Four registers (FEAT\_SVE\_B16B16)

| 3130292827262524 | 23 | 22 | 21201910 | 171615141: | 2171110 | 901 | 0 5 | 4 3 2 1 0 |
|------------------|----|----|----------|------------|---------|-----|-----|-----------|
| 1 1 0 0 0 0 0 1  | 0  | 0  | 1 Zm     | 0 0 1 0 1  | 1 1 0   | 010 | 0 0 | Zdn 0 0   |

```
3130292827262524 23 22 212019181716151413121110 9 8 7 6 5 4 3 2 1 0
                size<1>size<0>
       BFMAX { <Zdn1>.H-<Zdn4>.H }, { <Zdn1>.H-<Zdn4>.H }, { <Zm1>.H-<Zm4>
   if !HaveSME2() | !IsFeatureImplemented(FEAT SVE B16B16) then UNDEFINED
   integer dn = UInt(Zdn:'00');
   integer m = <u>UInt</u>(Zm:'00');
   constant integer nreg = 4;
Assembler Symbols
<7dn1>
               For the two registers variant: is the name of the first
               scalable vector register of a multi-vector sequence, encoded
               as "Zdn" times 2.
               For the four registers variant: is the name of the first
                scalable vector register of a multi-vector sequence, encoded
                as "Zdn" times 4.
<Zdn4>
               Is the name of the fourth scalable vector register of a multi-
               vector sequence, encoded as "Zdn" times 4 plus 3.
<Zdn2>
               Is the name of the second scalable vector register of a
               multi-vector sequence, encoded as "Zdn" times 2 plus 1.
<Zm1>
               For the two registers variant: is the name of the first
               scalable vector register of a multi-vector sequence, encoded
               as "Zm" times 2.
               For the four registers variant: is the name of the first
                scalable vector register of a multi-vector sequence, encoded
               as "Zm" times 4.
<Zm4>
               Is the name of the fourth scalable vector register of a multi-
```

vector sequence, encoded as "Zm" times 4 plus 3.

< 7.m2 >

Is the name of the second scalable vector register of a multi-vector sequence, encoded as "Zm" times 2 plus 1.

## Operation

```
CheckStreamingSVEEnabled();
constant integer VL = CurrentVL;
constant integer elements = VL DIV 16;
array [0..3] of bits(VL) results;

for r = 0 to nreg-1
    bits(VL) operand1 = Z[dn+r, VL];
    bits(VL) operand2 = Z[m+r, VL];
    for e = 0 to elements-1
        bits(16) element1 = Elem[operand1, e, 16];
        bits(16) element2 = Elem[operand2, e, 16];
        Elem[results[r], e, 16] = BFMax(element1, element2, FPCR[]);

for r = 0 to nreg-1
    Z[dn+r, VL] = results[r];
```

Base<br/>InstructionsSIMD&FP<br/>InstructionsSVE<br/>InstructionsSME<br/>InstructionsIndex by<br/>Encoding

 $Internal\ version\ only: is a\ v33.64,\ AdvSIMD\ v29.12,\ pseudocode\ no\_diffs\_2023\_09\_RC2,\ sve\ v2023-06\_rel\ ;\ Build\ timestamp:\ 2023-09-18T17:56$ 

Sh

Pseu

Copyright © 2010-2023 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.