BaseSIMD&FPSVESMEIndex byInstructionsInstructionsInstructionsInstructionsEncoding

Sh

Pseu

## CPYFPTN, CPYFMTN, CPYFETN

Memory Copy Forward-only, reads and writes unprivileged and non-temporal. These instructions perform a memory copy. The prologue, main, and epilogue instructions are expected to be run in succession and to appear consecutively in memory: CPYFPTN, then CPYFMTN, and then CPYFETN. CPYFPTN performs some preconditioning of the arguments suitable for using the CPYFMTN instruction, and performs an implementation defined amount of the memory copy. CPYFMTN performs an implementation defined amount of the memory copy. CPYFETN performs the last part of the memory copy.

#### Note

The inclusion of implementation defined amounts of memory copy allows some optimization of the size that can be performed.

The memory copy performed by these instructions is in the forward direction only, so the instructions are suitable for a memory copy only where there is no overlap between the source and destination locations, or where the source address is greater than the destination address.

The architecture supports two algorithms for the memory copy: option A and option B. Which algorithm is used is implementation defined.

#### Note

Portable software should not assume that the choice of algorithm is constant.

After execution of CPYFPTN, option A (which results in encoding PSTATE.C = 0):

- Xs holds the original Xs + saturated Xn.
- Xd holds the original Xd + saturated Xn.
- Xn holds -1\* saturated Xn + an implementation defined number of bytes copied.
- PSTATE.{N,Z,V} are set to {0,0,0}.

After execution of CPYFPTN, option B (which results in encoding PSTATE.C = 1):

- Xs holds the original Xs + an implementation defined number of bytes copied.

- Xd holds the original Xd + an implementation defined number of bytes copied.
- Xn holds the saturated Xn an implementation defined number of bytes copied.
- PSTATE.{N,Z,V} are set to {0,0,0}.

For CPYFMTN, option A (encoded by PSTATE.C = 0), the format of the arguments is:

- Xn is treated as a signed 64-bit number and holds -1\* the number of bytes remaining to be copied in the memory copy in total.
- Xs holds the lowest address that the copy is copied from -Xn.
- Xd holds the lowest address that the copy is made to -Xn.
- At the end of the instruction, the value of Xn is written back with -1\* the number of bytes remaining to be copied in the memory copy in total.

For CPYFMTN, option B (encoded by PSTATE.C = 1), the format of the arguments is:

- Xn holds the number of bytes remaining to be copied in the memory copy in total.
- Xs holds the lowest address that the copy is copied from.
- Xd holds the lowest address that the copy is copied to.
- At the end of the instruction:
  - the value of Xn is written back with the number of bytes remaining to be copied in the memory copy in total.
  - the value of Xs is written back with the lowest address that has not been copied from.
  - the value of Xd is written back with the lowest address that has not been copied to.

For CPYFETN, option A (encoded by PSTATE.C = 0), the format of the arguments is:

- Xn is treated as a signed 64-bit number and holds -1\* the number of bytes remaining to be copied in the memory copy in total.
- Xs holds the lowest address that the copy is copied from -Xn.
- Xd holds the lowest address that the copy is made to -Xn.
- At the end of the instruction, the value of Xn is written back with 0.

For CPYFETN, option B (encoded by PSTATE.C = 1), the format of the arguments is:

- Xn holds the number of bytes remaining to be copied in the memory copy in total.
- Xs holds the lowest address that the copy is copied from.
- Xd holds the lowest address that the copy is copied to.
- At the end of the instruction:
  - the value of Xn is written back with 0.
  - the value of Xs is written back with the lowest address that has not been copied from.

• the value of Xd is written back with the lowest address that has not been copied to.

```
Integer (FEAT_MOPS)
```

```
31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

SZ 0 1 1 0 0 1 op1 0 Rs 1 1 1 1 0 1 Rn Rd

op2
```

```
Epiloque (op1 == 10)
       CPYFETN [<Xd>]!, [<Xs>]!, <Xn>!
Main (op1 == 01)
       CPYFMTN [<Xd>]!, [<Xs>]!, <Xn>!
Prologue (op1 == 00)
       CPYFPTN [<Xd>]!, [<Xs>]!, <Xn>!
   if !IsFeatureImplemented(FEAT_MOPS) | sz != '00' then UNDEFINED;
   integer d = UInt(Rd);
   integer s = UInt(Rs);
   integer n = UInt(Rn);
   bits (4) options = op2;
   boolean rnontemporal = options<3> == '1';
   boolean wnontemporal = options<2> == '1';
   MOPSStage stage;
   case op1 of
       when '00' stage = <a href="MOPSStage_Prologue">MOPSStage_Prologue</a>;
       when '01' stage = MOPSStage Main;
       when '10' stage = MOPSStage_Epilogue;
       otherwise SEE "Memory Copy and Memory Set";
   CheckMOPSEnabled();
   if s == n || s == d || n == d || d == 31 || s == 31 || n == 31 then
       Constraint c = ConstrainUnpredictable (Unpredictable_MOPSOVERLAP31);
       assert c IN {Constraint_UNDEF, Constraint_NOP};
```

For information about the constrained unpredictable behavior of this instruction, see *Architectural Constraints on UNPREDICTABLE behaviors*, and particularly *Memory Copy and Memory Set CPY\**.

EndOfInstruction();

when <a href="Constraint\_UNDEF">Constraint\_UNDEF</a> UNDEFINED;

when Constraint NOP

case c of

### **Assembler Symbols**

< Xd >

For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an encoding of the destination address, encoded in the "Rd" field.

For the prologue variant: is the 64-bit name of the generalpurpose register that holds the destination address and is updated by the instruction, encoded in the "Rd" field.

< Xs >

For the epilogue and main variant: is the 64-bit name of the general-purpose register that holds an encoding of the source address, encoded in the "Rs" field.

For the prologue variant: is the 64-bit name of the generalpurpose register that holds the source address and is updated by the instruction, encoded in the "Rs" field.

<Xn>

For the epilogue variant: is the 64-bit name of the general-purpose register that holds an encoding of the number of bytes to be transferred and is set to zero at the end of the instruction, encoded in the "Rn" field.

For the main variant: is the 64-bit name of the generalpurpose register that holds an encoding of the number of bytes to be transferred, encoded in the "Rn" field.

For the prologue variant: is the 64-bit name of the generalpurpose register that holds the number of bytes to be transferred and is updated by the instruction to encode the remaining size and destination, encoded in the "Rn" field.

# **Operation**

```
constant integer N = MaxBlockSizeCopiedBytes();
bits(64) toaddress = \underline{X}[d, 64];
bits(64) fromaddress = \underline{X}[s, 64];
bits(64) cpysize = X[n, 64];
bits (4) nzcv = PSTATE. <N, Z, C, V>;
bits(8*N) readdata;
boolean implements_option_a = CPYFOptionA();
boolean rprivileged = if options<1> == '1' then <u>AArch64.IsUnprivAccessEdate</u> boolean wprivileged = if options<0> == '1' then <u>AArch64.IsUnprivAccessEdate</u>
AccessDescriptor raccdesc = CreateAccDescMOPS (MemOp_LOAD, rprivileged,
AccessDescriptor waccdesc = CreateAccDescMOPS (MemOp STORE, wprivileged,
if stage == MOPSStage_Prologue then
     if cpysize<63> == '1' then cpysize = 0x7FFFFFFFFFFFFFFF63:0>;
     if implements_option_a then
         nzcv = '0000';
          // Copy in the forward direction offsets the arguments.
          toaddress = toaddress + cpysize;
          fromaddress = fromaddress + cpysize;
          cpysize = Zeros(64) - cpysize;
```

```
else
        nzcv = '0010';
else
    CheckMemCpyParams(stage, implements_option_a, nzcv, options, d, s,
bits (64) stagecpysize = MemCpyStageSize (stage, toaddress, fromaddress,
if implements_option_a then
    while <u>SInt</u>(stagecpysize) != 0 do
        // IMP DEF selection of the block size that is worked on. While
        // implementations might make this constant, that is not assume
        constant integer B = <u>CPYSizeChoice</u>(toaddress, fromaddress, cpys
        assert B <= -1 * <u>SInt</u>(stagecpysize);
        readdata<B*8-1:0> = Mem[fromaddress+cpysize, B, raccdesc];
        Mem[toaddress+cpysize, B, waccdesc] = readdata<B*8-1:0>;
        cpysize = cpysize + B;
        stagecpysize = stagecpysize + B;
        if stage != MOPSStage_Prologue then
            X[n, 64] = cpysize;
else
    while \underline{UInt} (stagecpysize) > 0 do
        // IMP DEF selection of the block size that is worked on. While
        // implementations might make this constant, that is not assume
        constant integer B = <u>CPYSizeChoice</u>(toaddress, fromaddress, cpys
        assert B <= UInt(stagecpysize);</pre>
        readdata < B * 8-1:0 > = Mem[fromaddress, B, raccdesc];
        Mem[toaddress, B, waccdesc] = readdata<B*8-1:0>;
        fromaddress = fromaddress + B;
        toaddress = toaddress + B;
        cpysize = cpysize - B;
        stagecpysize = stagecpysize - B;
        if stage != MOPSStage_Prologue then
             X[n, 64] = cpysize;
             X[d, 64] = toaddress;
            X[s, 64] = fromaddress;
if stage == MOPSStage Prologue then
    X[n, 64] = cpysize;
    X[d, 64] = toaddress;
    X[s, 64] = fromaddress;
    PSTATE.\langle N, Z, C, V \rangle = nzcv;
```

Internal version only: isa v33.64, AdvSIMD v29.12, pseudocode no diffs 2023 09 RC2, sve v2023-06 rel; Build timestamp: 2023-09-18T17:56

**SVE** 

<u>Instructions</u>

**SME** 

**Instructions** 

Index by

**Encoding** 

Sh

<u>Pseu</u>

SIMD&FP

**Instructions** 

Base Instructions

Copyright © 2010-2023 Arm Limited or its affiliates. All rights reserved. This document is Non-Confidential.