Skip to content

Commit

Permalink
[AArch64]SME2 Multi vector Sel Load and Store instructions
Browse files Browse the repository at this point in the history
This patch adds the assembly/disassembly for the following instruction:

   SEL: Multi-vector conditionally select elements from two vectors
        for 2 and 4 registers

Non-constiguous load with stride resgisters:

  LD1B (scalar + immediate): Contiguous load of bytes to multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous load of bytes to multiple strided vectors (scalar index).
  LD1D (scalar + immediate): Contiguous load of doublewords to multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous load of doublewords to multiple strided vectors (scalar index).
  LD1H (scalar + immediate): Contiguous load of halfwords to multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous load of halfwords to multiple strided vectors (scalar index).
  LD1W (scalar + immediate): Contiguous load of words to multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous load of words to multiple strided vectors (scalar index).

  LDNT1B (scalar + immediate): Contiguous load non-temporal of bytes to multiple strided vectors (immediate index).
         (scalar + scalar): Contiguous load non-temporal of bytes to multiple strided vectors (scalar index).
  LDNT1D (scalar + immediate): Contiguous load non-temporal of doublewords to multiple strided vectors (immediate index).
         (scalar + scalar): Contiguous load non-temporal of doublewords to multiple strided vectors (scalar index).
  LDNT1H (scalar + immediate): Contiguous load non-temporal of halfwords to multiple strided vectors (immediate index).
         (scalar + scalar): Contiguous load non-temporal of halfwords to multiple strided vectors (scalar index).
  LDNT1W (scalar + immediate): Contiguous load non-temporal of words to multiple strided vectors (immediate index).
         (scalar + scalar): Contiguous load non-temporal of words to multiple strided vectors (scalar index).

Non-constiguous store with stride resgisters:

  ST1B (scalar + immediate): Contiguous store of bytes from multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous store of bytes from multiple strided vectors (scalar index).
  ST1D (scalar + immediate): Contiguous store of doublewords from multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous store of doublewords from multiple strided vectors (scalar index).
  ST1H (scalar + immediate): Contiguous store of halfwords from multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous store of halfwords from multiple strided vectors (scalar index).
  ST1W (scalar + immediate): Contiguous store of words from multiple strided vectors (immediate index).
       (scalar + scalar): Contiguous store of words from multiple strided vectors (scalar index).

  STNT1B (scalar + immediate): Contiguous store non-temporal of bytes from multiple strided vectors (immediate index).
         (scalar + scalar): Contiguous store non-temporal of bytes from multiple strided vectors (scalar index).
  STNT1D (scalar + immediate): Contiguous store non-temporal of doublewords from multiple strided vectors (immediate index).
         (scalar + scalar): Contiguous store non-temporal of doublewords from multiple strided vectors (scalar index).
  STNT1H (scalar + immediate): Contiguous store non-temporal of halfwords from multiple strided vectors (immediate index).
         (scalar + scalar): Contiguous store non-temporal of halfwords from multiple strided vectors (scalar index).
  STNT1W (scalar + immediate): Contiguous store non-temporal of words from multiple strided vectors (immediate index).
         (scalar + scalar): Contiguous store non-temporal of words from multiple strided vectors (scalar index).

    The reference can be found here:

        https://developer.arm.com/documentation/ddi0602/2022-09

This patch also adds a new SVE vector list to represent the stride loads/stores
ZPRVectorListStrided and the sets of 2 and 4 ZA registers:
ZZ_[b|h|w|d]_strided and ZZZZ_[b|h|w|d]_strided

Differential Revision: https://reviews.llvm.org/D136172
  • Loading branch information
CarolineConcatto committed Nov 10, 2022
1 parent 69665c4 commit ecab1bc
Show file tree
Hide file tree
Showing 68 changed files with 2,973 additions and 68 deletions.
102 changes: 102 additions & 0 deletions llvm/lib/Target/AArch64/AArch64RegisterInfo.td
Expand Up @@ -1274,6 +1274,108 @@ let EncoderMethod = "EncodeRegAsMultipleOf<4>",
}
} // end let EncoderMethod/DecoderMethod

// SME2 strided multi-vector operands

// ZStridedPairs
//
// A group of two Z vectors with strided numbering consisting of:
// Zn+0.T and Zn+8.T
// where n is in the range 0 to 7 and 16 to 23 inclusive, and T is one of B, H,
// S, or D.

// Z0_Z8, Z1_Z9, Z2_Z10, Z3_Z11, Z4_Z12, Z5_Z13, Z6_Z14, Z7_Z15
def ZStridedPairsLo : RegisterTuples<[zsub0, zsub1], [
(trunc (rotl ZPR, 0), 8), (trunc (rotl ZPR, 8), 8)
]>;

// Z16_Z24, Z17_Z25, Z18_Z26, Z19_Z27, Z20_Z28, Z21_Z29, Z22_Z30, Z23_Z31
def ZStridedPairsHi : RegisterTuples<[zsub0, zsub1], [
(trunc (rotl ZPR, 16), 8), (trunc (rotl ZPR, 24), 8)
]>;

// ZStridedQuads
//
// A group of four Z vectors with strided numbering consisting of:
// Zn+0.T, Zn+4.T, Zn+8.T and Zn+12.T
// where n is in the range 0 to 3 and 16 to 19 inclusive, and T is one of B, H,
// S, or D.

// Z0_Z4_Z8_Z12, Z1_Z5_Z9_Z13, Z2_Z6_Z10_Z14, Z3_Z7_Z11_Z15
def ZStridedQuadsLo : RegisterTuples<[zsub0, zsub1, zsub2, zsub3], [
(trunc (rotl ZPR, 0), 4), (trunc (rotl ZPR, 4), 4),
(trunc (rotl ZPR, 8), 4), (trunc (rotl ZPR, 12), 4)
]>;
// Z16_Z20_Z24_Z28, Z17_Z21_Z25_Z29, Z18_Z22_Z26_Z30, Z19_Z23_Z27_Z31
def ZStridedQuadsHi : RegisterTuples<[zsub0, zsub1, zsub2, zsub3], [
(trunc (rotl ZPR, 16), 4), (trunc (rotl ZPR, 20), 4),
(trunc (rotl ZPR, 24), 4), (trunc (rotl ZPR, 28), 4)
]>;

def ZPR2Strided : RegisterClass<"AArch64", [untyped], 256,
(add ZStridedPairsLo, ZStridedPairsHi)> {
let Size = 256;
}
def ZPR4Strided : RegisterClass<"AArch64", [untyped], 512,
(add ZStridedQuadsLo, ZStridedQuadsHi)> {
let Size = 512;
}


class ZPRVectorListStrided<int ElementWidth, int NumRegs, int Stride>
: ZPRVectorList<ElementWidth, NumRegs> {
let Name = "SVEVectorListStrided" # NumRegs # "x" # ElementWidth;
let DiagnosticType = "Invalid" # Name;
let PredicateMethod = "isTypedVectorListStrided<RegKind::SVEDataVector, "
# NumRegs # "," # Stride # "," # ElementWidth # ">";
let RenderMethod = "addStridedVectorListOperands<" # NumRegs # ">";
}

let EncoderMethod = "EncodeZPR2StridedRegisterClass",
DecoderMethod = "DecodeZPR2StridedRegisterClass" in {
def ZZ_b_strided
: RegisterOperand<ZPR2Strided, "printTypedVectorList<0, 'b'>"> {
let ParserMatchClass = ZPRVectorListStrided<8, 2, 8>;
}

def ZZ_h_strided
: RegisterOperand<ZPR2Strided, "printTypedVectorList<0, 'h'>"> {
let ParserMatchClass = ZPRVectorListStrided<16, 2, 8>;
}

def ZZ_s_strided
: RegisterOperand<ZPR2Strided, "printTypedVectorList<0,'s'>"> {
let ParserMatchClass = ZPRVectorListStrided<32, 2, 8>;
}

def ZZ_d_strided
: RegisterOperand<ZPR2Strided, "printTypedVectorList<0,'d'>"> {
let ParserMatchClass = ZPRVectorListStrided<64, 2, 8>;
}
}

let EncoderMethod = "EncodeZPR4StridedRegisterClass",
DecoderMethod = "DecodeZPR4StridedRegisterClass" in {
def ZZZZ_b_strided
: RegisterOperand<ZPR4Strided, "printTypedVectorList<0,'b'>"> {
let ParserMatchClass = ZPRVectorListStrided<8, 4, 4>;
}

def ZZZZ_h_strided
: RegisterOperand<ZPR4Strided, "printTypedVectorList<0,'h'>"> {
let ParserMatchClass = ZPRVectorListStrided<16, 4, 4>;
}

def ZZZZ_s_strided
: RegisterOperand<ZPR4Strided, "printTypedVectorList<0,'s'>"> {
let ParserMatchClass = ZPRVectorListStrided<32, 4, 4>;
}

def ZZZZ_d_strided
: RegisterOperand<ZPR4Strided, "printTypedVectorList<0,'d'>"> {
let ParserMatchClass = ZPRVectorListStrided<64, 4, 4>;
}
}

class ZPRExtendAsmOperand<string ShiftExtend, int RegWidth, int Scale,
bit ScaleAlwaysSame = 0b0> : AsmOperandClass {
let Name = "ZPRExtend" # ShiftExtend # RegWidth # Scale
Expand Down
71 changes: 71 additions & 0 deletions llvm/lib/Target/AArch64/AArch64SMEInstrInfo.td
Expand Up @@ -619,6 +619,77 @@ defm SQRSHRU_VG4_Z4ZI : sme2_sat_shift_vector_vg4<"sqrshru", 0b010>;
defm SQRSHRN_VG4_Z4ZI : sme2_sat_shift_vector_vg4<"sqrshrn", 0b100>;
defm UQRSHRN_VG4_Z4ZI : sme2_sat_shift_vector_vg4<"uqrshrn", 0b101>;
defm SQRSHRUN_VG4_Z4ZI : sme2_sat_shift_vector_vg4<"sqrshrun", 0b110>;

defm SEL_VG2_2ZP2Z2Z: sme2_sel_vector_vg2<"sel">;
defm SEL_VG4_4ZP4Z4Z: sme2_sel_vector_vg4<"sel">;

def LD1B_VG2_M2ZPXX : sme2_ld_vector_vg2_multi_scalar_scalar<0b00, 0b0, ZZ_b_strided, GPR64shifted8, "ld1b">;
def LD1B_VG4_M4ZPXX : sme2_ld_vector_vg4_multi_scalar_scalar<0b00, 0b0, ZZZZ_b_strided, GPR64shifted8, "ld1b">;
defm LD1B_VG2_M2ZPXI : sme2_ld_vector_vg2_multi_scalar_immediate<0b00, 0b0, ZZ_b_strided, simm4s2, "ld1b">;
defm LD1B_VG4_M4ZPXI : sme2_ld_vector_vg4_multi_scalar_immediate<0b00, 0b0, ZZZZ_b_strided, simm4s4, "ld1b">;
def LD1H_VG2_M2ZPXX : sme2_ld_vector_vg2_multi_scalar_scalar<0b01, 0b0, ZZ_h_strided, GPR64shifted16, "ld1h">;
def LD1H_VG4_M4ZPXX : sme2_ld_vector_vg4_multi_scalar_scalar<0b01, 0b0, ZZZZ_h_strided, GPR64shifted16, "ld1h">;
defm LD1H_VG2_M2ZPXI : sme2_ld_vector_vg2_multi_scalar_immediate<0b01, 0b0, ZZ_h_strided, simm4s2, "ld1h">;
defm LD1H_VG4_M4ZPXI : sme2_ld_vector_vg4_multi_scalar_immediate<0b01, 0b0, ZZZZ_h_strided, simm4s4, "ld1h">;
def LD1W_VG2_M2ZPXX : sme2_ld_vector_vg2_multi_scalar_scalar<0b10, 0b0, ZZ_s_strided, GPR64shifted32, "ld1w">;
def LD1W_VG4_M4ZPXX : sme2_ld_vector_vg4_multi_scalar_scalar<0b10, 0b0, ZZZZ_s_strided, GPR64shifted32, "ld1w">;
defm LD1W_VG2_M2ZPXI : sme2_ld_vector_vg2_multi_scalar_immediate<0b10, 0b0, ZZ_s_strided, simm4s2, "ld1w">;
defm LD1W_VG4_M4ZPXI : sme2_ld_vector_vg4_multi_scalar_immediate<0b10, 0b0, ZZZZ_s_strided, simm4s4, "ld1w">;
def LD1D_VG2_M2ZPXX : sme2_ld_vector_vg2_multi_scalar_scalar<0b11, 0b0, ZZ_d_strided, GPR64shifted64, "ld1d">;
def LD1D_VG4_M4ZPXX : sme2_ld_vector_vg4_multi_scalar_scalar<0b11, 0b0, ZZZZ_d_strided, GPR64shifted64, "ld1d">;
defm LD1D_VG2_M2ZPXI : sme2_ld_vector_vg2_multi_scalar_immediate<0b11, 0b0, ZZ_d_strided, simm4s2, "ld1d">;
defm LD1D_VG4_M4ZPXI : sme2_ld_vector_vg4_multi_scalar_immediate<0b11, 0b0, ZZZZ_d_strided, simm4s4, "ld1d">;

def LDNT1B_VG2_M2ZPXX : sme2_ld_vector_vg2_multi_scalar_scalar<0b00, 0b1, ZZ_b_strided, GPR64shifted8, "ldnt1b">;
def LDNT1B_VG4_M4ZPXX : sme2_ld_vector_vg4_multi_scalar_scalar<0b00, 0b1, ZZZZ_b_strided, GPR64shifted8, "ldnt1b">;
defm LDNT1B_VG2_M2ZPXI : sme2_ld_vector_vg2_multi_scalar_immediate<0b00, 0b1, ZZ_b_strided, simm4s2, "ldnt1b">;
defm LDNT1B_VG4_M4ZPXI : sme2_ld_vector_vg4_multi_scalar_immediate<0b00, 0b1, ZZZZ_b_strided, simm4s4, "ldnt1b">;
def LDNT1H_VG2_M2ZPXX : sme2_ld_vector_vg2_multi_scalar_scalar<0b01, 0b1, ZZ_h_strided, GPR64shifted16, "ldnt1h">;
def LDNT1H_VG4_M4ZPXX : sme2_ld_vector_vg4_multi_scalar_scalar<0b01, 0b1, ZZZZ_h_strided, GPR64shifted16, "ldnt1h">;
defm LDNT1H_VG2_M2ZPXI : sme2_ld_vector_vg2_multi_scalar_immediate<0b01, 0b1, ZZ_h_strided, simm4s2, "ldnt1h">;
defm LDNT1H_VG4_M4ZPXI : sme2_ld_vector_vg4_multi_scalar_immediate<0b01, 0b1, ZZZZ_h_strided, simm4s4, "ldnt1h">;
def LDNT1W_VG2_M2ZPXX : sme2_ld_vector_vg2_multi_scalar_scalar<0b10, 0b1, ZZ_s_strided, GPR64shifted32, "ldnt1w">;
def LDNT1W_VG4_M4ZPXX : sme2_ld_vector_vg4_multi_scalar_scalar<0b10, 0b1, ZZZZ_s_strided, GPR64shifted32, "ldnt1w">;
defm LDNT1W_VG2_M2ZPXI : sme2_ld_vector_vg2_multi_scalar_immediate<0b10, 0b1, ZZ_s_strided, simm4s2, "ldnt1w">;
defm LDNT1W_VG4_M4ZPXI : sme2_ld_vector_vg4_multi_scalar_immediate<0b10, 0b1, ZZZZ_s_strided, simm4s4, "ldnt1w">;
def LDNT1D_VG2_M2ZPXX : sme2_ld_vector_vg2_multi_scalar_scalar<0b11, 0b1, ZZ_d_strided, GPR64shifted64, "ldnt1d">;
def LDNT1D_VG4_M4ZPXX : sme2_ld_vector_vg4_multi_scalar_scalar<0b11, 0b1, ZZZZ_d_strided, GPR64shifted64, "ldnt1d">;
defm LDNT1D_VG2_M2ZPXI : sme2_ld_vector_vg2_multi_scalar_immediate<0b11, 0b1, ZZ_d_strided, simm4s2, "ldnt1d">;
defm LDNT1D_VG4_M4ZPXI : sme2_ld_vector_vg4_multi_scalar_immediate<0b11, 0b1, ZZZZ_d_strided, simm4s4, "ldnt1d">;

def ST1B_VG2_M2ZPXX : sme2_st_vector_vg2_multi_scalar_scalar<0b00, 0b0, ZZ_b_strided, GPR64shifted8, "st1b">;
def ST1B_VG4_M4ZPXX : sme2_st_vector_vg4_multi_scalar_scalar<0b00, 0b0, ZZZZ_b_strided, GPR64shifted8, "st1b">;
defm ST1B_VG2_M2ZPXI : sme2_st_vector_vg2_multi_scalar_immediate<0b00, 0b0, ZZ_b_strided, simm4s2, "st1b">;
defm ST1B_VG4_M4ZPXI : sme2_st_vector_vg4_multi_scalar_immediate<0b00, 0b0, ZZZZ_b_strided, simm4s4, "st1b">;
def ST1H_VG2_M2ZPXX : sme2_st_vector_vg2_multi_scalar_scalar<0b01, 0b0, ZZ_h_strided, GPR64shifted16, "st1h">;
def ST1H_VG4_M4ZPXX : sme2_st_vector_vg4_multi_scalar_scalar<0b01, 0b0, ZZZZ_h_strided, GPR64shifted16, "st1h">;
defm ST1H_VG2_M2ZPXI : sme2_st_vector_vg2_multi_scalar_immediate<0b01, 0b0, ZZ_h_strided, simm4s2, "st1h">;
defm ST1H_VG4_M4ZPXI : sme2_st_vector_vg4_multi_scalar_immediate<0b01, 0b0, ZZZZ_h_strided, simm4s4, "st1h">;
def ST1W_VG2_M2ZPXX : sme2_st_vector_vg2_multi_scalar_scalar<0b10, 0b0, ZZ_s_strided, GPR64shifted32, "st1w">;
def ST1W_VG4_M4ZPXX : sme2_st_vector_vg4_multi_scalar_scalar<0b10, 0b0, ZZZZ_s_strided, GPR64shifted32, "st1w">;
defm ST1W_VG2_M2ZPXI : sme2_st_vector_vg2_multi_scalar_immediate<0b10, 0b0, ZZ_s_strided, simm4s2, "st1w">;
defm ST1W_VG4_M4ZPXI : sme2_st_vector_vg4_multi_scalar_immediate<0b10, 0b0, ZZZZ_s_strided, simm4s4, "st1w">;
def ST1D_VG2_M2ZPXX : sme2_st_vector_vg2_multi_scalar_scalar<0b11, 0b0, ZZ_d_strided, GPR64shifted64, "st1d">;
def ST1D_VG4_M4ZPXX : sme2_st_vector_vg4_multi_scalar_scalar<0b11, 0b0, ZZZZ_d_strided, GPR64shifted64, "st1d">;
defm ST1D_VG2_M2ZPXI : sme2_st_vector_vg2_multi_scalar_immediate<0b11, 0b0, ZZ_d_strided, simm4s2, "st1d">;
defm ST1D_VG4_M4ZPXI : sme2_st_vector_vg4_multi_scalar_immediate<0b11, 0b0, ZZZZ_d_strided, simm4s4, "st1d">;

def STNT1B_VG2_M2ZPXX : sme2_st_vector_vg2_multi_scalar_scalar<0b00, 0b1, ZZ_b_strided, GPR64shifted8, "stnt1b">;
def STNT1B_VG4_M4ZPXX : sme2_st_vector_vg4_multi_scalar_scalar<0b00, 0b1, ZZZZ_b_strided, GPR64shifted8, "stnt1b">;
defm STNT1B_VG2_M2ZPXI : sme2_st_vector_vg2_multi_scalar_immediate<0b00, 0b1, ZZ_b_strided, simm4s2, "stnt1b">;
defm STNT1B_VG4_M4ZPXI : sme2_st_vector_vg4_multi_scalar_immediate<0b00, 0b1, ZZZZ_b_strided, simm4s4, "stnt1b">;
def STNT1H_VG2_M2ZPXX : sme2_st_vector_vg2_multi_scalar_scalar<0b01, 0b1, ZZ_h_strided, GPR64shifted16, "stnt1h">;
def STNT1H_VG4_M4ZPXX : sme2_st_vector_vg4_multi_scalar_scalar<0b01, 0b1, ZZZZ_h_strided, GPR64shifted16, "stnt1h">;
defm STNT1H_VG2_M2ZPXI : sme2_st_vector_vg2_multi_scalar_immediate<0b01, 0b1, ZZ_h_strided, simm4s2, "stnt1h">;
defm STNT1H_VG4_M4ZPXI : sme2_st_vector_vg4_multi_scalar_immediate<0b01, 0b1, ZZZZ_h_strided, simm4s4, "stnt1h">;
def STNT1W_VG2_M2ZPXX : sme2_st_vector_vg2_multi_scalar_scalar<0b10, 0b1, ZZ_s_strided, GPR64shifted32, "stnt1w">;
def STNT1W_VG4_M4ZPXX : sme2_st_vector_vg4_multi_scalar_scalar<0b10, 0b1, ZZZZ_s_strided, GPR64shifted32, "stnt1w">;
defm STNT1W_VG2_M2ZPXI : sme2_st_vector_vg2_multi_scalar_immediate<0b10, 0b1, ZZ_s_strided, simm4s2, "stnt1w">;
defm STNT1W_VG4_M4ZPXI : sme2_st_vector_vg4_multi_scalar_immediate<0b10, 0b1, ZZZZ_s_strided, simm4s4, "stnt1w">;
def STNT1D_VG2_M2ZPXX : sme2_st_vector_vg2_multi_scalar_scalar<0b11, 0b1, ZZ_d_strided, GPR64shifted64, "stnt1d">;
def STNT1D_VG4_M4ZPXX : sme2_st_vector_vg4_multi_scalar_scalar<0b11, 0b1, ZZZZ_d_strided, GPR64shifted64, "stnt1d">;
defm STNT1D_VG2_M2ZPXI : sme2_st_vector_vg2_multi_scalar_immediate<0b11, 0b1, ZZ_d_strided, simm4s2, "stnt1d">;
defm STNT1D_VG4_M4ZPXI : sme2_st_vector_vg4_multi_scalar_immediate<0b11, 0b1, ZZZZ_d_strided, simm4s4, "stnt1d">;
}

let Predicates = [HasSME2, HasSMEI16I64] in {
Expand Down

0 comments on commit ecab1bc

Please sign in to comment.