
### LDNT1D (scalar plus scalar, consecutive registers)

Contiguous load non-temporal of doublewords to multiple consecutive vectors (scalar index)

Contiguous load non-temporal of doublewords to elements of two or four consecutive vector registers from the memory address generated by a 64-bit scalar base and a scalar index, which is scaled by the element size (LSL #3) and added to the base address. After each element access the index value is incremented, but the index register is not updated.

Inactive elements will not cause a read from Device memory or signal a fault, and are set to zero in the destination vector.

A non-temporal load is a hint to the system that this data is unlikely to be referenced again soon.

It has encodings from 2 classes: Two registers and Four registers.

# Two registers (FEAT\_SVE2p1)

```
 31      21 | 20  16 | 15 | 14 13 | 12  10 | 9   5 | 4   1 | 0
10100000000 |   Rm   |  0 |  1  1 |  PNg   |  Rn   |  Zt   | 1
            |        |    |  msz  |        |       |       | N
```

```
LDNT1D { <Zt1>.D-<Zt2>.D }, <PNg>/Z, [<Xn | SP>, <Xm>, LSL #3]
```

```
if !HaveSME2() && !HaveSVE2p1() then UNDEFINED;
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt('1':PNg);
constant integer nreg = 2;
integer t = UInt(Zt:'0');
constant integer esize = 64;
```

# Four registers (FEAT\_SVE2p1)

```
 31      21 | 20  16 | 15 | 14 13 | 12  10 | 9   5 | 4   2 | 1 | 0
10100000000 |   Rm   |  1 |  1  1 |  PNg   |  Rn   |  Zt   | 0 | 1
            |        |    |  msz  |        |       |       |   | N
```

`LDNT1D { <Zt1>.D-<Zt4>.D }, <PNg>/Z, [<Xn|SP>, <Xm>, LSL #3]`

```
if !HaveSME2() && !HaveSVE2p1() then UNDEFINED;
integer n = UInt(Rn);
integer m = UInt(Rm);
integer g = UInt('1':PNg);
constant integer nreg = 4;
integer t = UInt(Zt:'00');
constant integer esize = 64;
```
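The two decode blocks differ only in `nreg` and in how many zero bits are appended to `Zt`. As a rough illustration, the Python sketch below extracts the same fields from a 32-bit instruction word. The bit positions are assumptions read off the encoding diagrams (Rm at [20:16], msz at [14:13], PNg at [12:10], Rn at [9:5], Zt at [4:1] or [4:2], N at [0], with bit 15 selecting the four-register class). `decode_ldnt1d` is a hypothetical helper for this page, not part of any Arm tool, and it does not model the feature check.

```python
def decode_ldnt1d(word: int) -> dict:
    """Hypothetical field extraction for LDNT1D (scalar plus scalar,
    consecutive registers). Bit positions are assumed, see lead-in."""

    def bits(hi: int, lo: int) -> int:
        # Extract word<hi:lo> as an unsigned integer.
        return (word >> lo) & ((1 << (hi - lo + 1)) - 1)

    # Assumed fixed bits: [31:21] = 10100000000, msz = 11, N = 1.
    if bits(31, 21) != 0b10100000000 or bits(14, 13) != 0b11 or bits(0, 0) != 1:
        raise ValueError("not an LDNT1D consecutive-register encoding")

    four = bits(15, 15) == 1
    if four and bits(1, 1) != 0:
        raise ValueError("bit 1 must be 0 in the four-register class")

    nreg = 4 if four else 2
    # UInt(Zt:'00') or UInt(Zt:'0'): the first register is a multiple of nreg.
    t = (bits(4, 2) << 2) if four else (bits(4, 1) << 1)
    return {
        "nreg": nreg,
        "t": t,
        "n": bits(9, 5),
        "m": bits(20, 16),
        "g": 8 + bits(12, 10),  # UInt('1':PNg) selects PN8-PN15
        "esize": 64,
    }
```

Appending zero bits to `Zt` is what forces the first destination register to be a multiple of two or four, so a register group never wraps within the encoding.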

#### **Assembler Symbols**

<Zt1> For the two registers variant: is the name of the first scalable vector register to be transferred, encoded as "Zt" times 2.

For the four registers variant: is the name of the first scalable vector register to be transferred, encoded as "Zt" times 4.

<Zt4> Is the name of the fourth scalable vector register to be transferred, encoded as "Zt" times 4 plus 3.

<Zt2> Is the name of the second scalable vector register to be transferred, encoded as "Zt" times 2 plus 1.

<PNg> Is the name of the governing scalable predicate register PN8-PN15, with predicate-as-counter encoding, encoded in the "PNg" field.

<Xn|SP> Is the 64-bit name of the general-purpose base register or stack pointer, encoded in the "Rn" field.

<Xm> Is the 64-bit name of the general-purpose offset register, encoded in the "Rm" field.
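The register numbering rules above ("Zt" times 2, "Zt" times 4 plus 3, and PN8-PN15 for the governing predicate) can be sketched as follows. The helper names `transfer_registers` and `governing_predicate` are hypothetical, and the field widths (4 bits of Zt for two registers, 3 bits for four, 3 bits of PNg) are assumptions taken from the encoding diagrams.

```python
def transfer_registers(zt_field: int, nreg: int) -> list:
    """Return the names of the consecutive vector registers selected by
    the "Zt" field for the two- or four-register variant."""
    if nreg not in (2, 4):
        raise ValueError("nreg must be 2 or 4")
    if not 0 <= zt_field < 32 // nreg:
        raise ValueError("Zt field out of range for this variant")
    first = zt_field * nreg  # "Zt" times 2, or "Zt" times 4
    return [f"Z{first + r}" for r in range(nreg)]


def governing_predicate(png_field: int) -> str:
    """Map the 3-bit "PNg" field to its predicate-as-counter register."""
    if not 0 <= png_field < 8:
        raise ValueError("PNg field is 3 bits")
    return f"PN{8 + png_field}"
```

For example, `transfer_registers(3, 2)` yields `['Z6', 'Z7']`, and `governing_predicate(0)` yields `'PN8'`.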

#### **Operation**

```
if HaveSVE2p1() then CheckSVEEnabled(); else CheckStreamingSVEEnabled();
constant integer VL = CurrentVL;
constant integer PL = VL DIV 8;
constant integer elements = VL DIV esize;
constant integer mbytes = esize DIV 8;
bits(64) offset;
bits(64) base;
bits(PL) pred = P[g, PL];
bits(PL * nreg) mask = CounterToPredicate(pred<15:0>, PL * nreg);
array [0..3] of bits(VL) values;
boolean contiguous = TRUE;
boolean nontemporal = TRUE;
boolean tagchecked = TRUE;
AccessDescriptor accdesc = CreateAccDescSVE(MemOp_LOAD, nontemporal, contiguous, tagchecked);

if !AnyActiveElement(mask, esize) then
    if n == 31 && ConstrainUnpredictableBool(Unpredictable_CHECKSPNONEACTIVE) then
        CheckSPAlignment();
else
    if n == 31 then CheckSPAlignment();
    base = if n == 31 then SP[] else X[n, 64];
    offset = X[m, 64];

for r = 0 to nreg-1
    for e = 0 to elements-1
        if ActivePredicateElement(mask, r * elements + e, esize) then
            bits(64) addr = base + (UInt(offset) + r * elements + e) * mbytes;
            Elem[values[r], e, esize] = Mem[addr, mbytes, accdesc];
        else
            Elem[values[r], e, esize] = Zeros(esize);

for r = 0 to nreg-1
    Z[t+r, VL] = values[r];
```
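To make the address arithmetic and the zeroing of inactive elements concrete, here is a minimal Python model of the load loop. It is only a sketch under several assumptions: memory is a flat `bytes` object addressed from 0, data is little-endian, `mask` holds one boolean per doubleword element (already expanded from the predicate-as-counter value, so `CounterToPredicate` is not modelled), and enable checks, SP alignment checks, and faults are ignored. The function name `ldnt1d_model` is hypothetical.

```python
def ldnt1d_model(memory: bytes, base: int, index: int,
                 mask: list, nreg: int, elements: int) -> list:
    """Toy model of the LDNT1D load loop.

    Returns nreg lists, each holding `elements` 64-bit values, standing in
    for the destination vectors Z[t] .. Z[t+nreg-1].
    """
    mbytes = 8  # esize DIV 8, with esize = 64
    values = []
    for r in range(nreg):
        reg = []
        for e in range(elements):
            if mask[r * elements + e]:
                # The index advances by one element per access and is
                # scaled by the element size; the register is not written.
                addr = (base + (index + r * elements + e) * mbytes) % (1 << 64)
                reg.append(int.from_bytes(memory[addr:addr + mbytes], "little"))
            else:
                # Inactive elements are not read and become zero.
                reg.append(0)
        values.append(reg)
    return values
```

With a 128-bit vector length (`elements = 2`), two registers, a starting index of 1, and the second element inactive, the model reads doublewords 1, 3 and 4 from memory and leaves a zero in the inactive slot.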

#### **Operational information**

If PSTATE.DIT is 1, the timing of this instruction is insensitive to the value of the data being loaded or stored when its governing predicate register contains the same value for each execution.


Internal version only: isa v33.64, AdvSIMD v29.12, pseudocode no diffs 2023 09 RC2, sve v2023-06 rel ; Build timestamp: 2023-09-18T17:56

Copyright © 2010-2023 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.
