[AArch64] Add support for Qualcomm Oryon processor #91022

wxz2020 · 2024-05-03T22:10:41Z

Oryon is an ARM V8 AArch64 CPU from Qualcomm.

From llvm/llvm-project#91022 Easy enough

pinskia · 2024-05-04T08:18:30Z

Note the corresponding GCC patch:
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650664.html

wxz2020 · 2024-05-06T22:59:15Z

llvm/lib/Target/AArch64/AArch64.td

+def MTEUnsupported : AArch64Unsupported {
+  let F = [HasMTE];
+}
+


It is referenced in AArch64SchedOryon.td on line 33. The Oryon CPU does not support MTE.

…d remove some engineering notes

llvmbot · 2024-05-07T16:48:39Z

@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Wei Zhao (wxz2020)

Changes

Oryon is an ARM V8 AArch64 CPU from Qualcomm.

Patch is 73.24 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/91022.diff

10 Files Affected:

(added) clang/test/Driver/aarch64-oryon-1.c (+19)
(modified) clang/test/Misc/target-invalid-cpu-note.c (+2-2)
(modified) llvm/include/llvm/TargetParser/AArch64TargetParser.h (+5)
(modified) llvm/lib/Target/AArch64/AArch64.td (+5)
(modified) llvm/lib/Target/AArch64/AArch64Processors.td (+30)
(added) llvm/lib/Target/AArch64/AArch64SchedOryon.td (+1664)
(modified) llvm/lib/Target/AArch64/AArch64Subtarget.cpp (+7)
(modified) llvm/lib/TargetParser/Host.cpp (+1)
(modified) llvm/unittests/TargetParser/Host.cpp (+3)
(modified) llvm/unittests/TargetParser/TargetParserTest.cpp (+14-2)

diff --git a/clang/test/Driver/aarch64-oryon-1.c b/clang/test/Driver/aarch64-oryon-1.c
new file mode 100644
index 000000000000..952ba5df74ba
--- /dev/null
+++ b/clang/test/Driver/aarch64-oryon-1.c
@@ -0,0 +1,19 @@
+// RUN: %clang -target aarch64 -mcpu=oryon-1 -### -c %s 2>&1 | FileCheck -check-prefix=Phoenix %s
+// RUN: %clang -target aarch64 -mlittle-endian -mcpu=oryon-1 -### -c %s 2>&1 | FileCheck -check-prefix=Phoenix %s
+// RUN: %clang -target aarch64_be -mlittle-endian -mcpu=oryon-1 -### -c %s 2>&1 | FileCheck -check-prefix=Phoenix %s
+// RUN: %clang -target aarch64 -mtune=oryon-1 -### -c %s 2>&1 | FileCheck -check-prefix=Phoenix-TUNE %s
+// RUN: %clang -target aarch64 -mlittle-endian -mtune=oryon-1 -### -c %s 2>&1 | FileCheck -check-prefix=Phoenix-TUNE %s
+// RUN: %clang -target aarch64_be -mlittle-endian -mtune=oryon-1 -### -c %s 2>&1 | FileCheck -check-prefix=Phoenix-TUNE %s
+// Phoenix: "-cc1"{{.*}} "-triple" "aarch64{{(--)?}}"{{.*}} "-target-cpu" "oryon-1" "-target-feature" "+v8.6a"
+// Phoenix-TUNE: "-cc1"{{.*}} "-triple" "aarch64{{(--)?}}"{{.*}} "-target-cpu" "generic"
+
+// RUN: %clang -target arm64 -mcpu=oryon-1 -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-Phoenix %s
+// RUN: %clang -target arm64 -mlittle-endian -mcpu=oryon-1 -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-Phoenix %s
+// RUN: %clang -target arm64 -mtune=oryon-1 -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-Phoenix-TUNE %s
+// RUN: %clang -target arm64 -mlittle-endian -mtune=oryon-1 -### -c %s 2>&1 | FileCheck -check-prefix=ARM64-Phoenix-TUNE %s
+// ARM64-Phoenix: "-cc1"{{.*}} "-triple" "arm64{{.*}}" "-target-cpu" "oryon-1" "-target-feature" "+v8.6a"
+// ARM64-Phoenix-TUNE: "-cc1"{{.*}} "-triple" "arm64{{.*}}" "-target-cpu" "generic"
+
+// RUN: %clang -target aarch64 -mcpu=oryon-1 -mtune=cortex-a53 -### -c %s 2>&1 | FileCheck -check-prefix=MCPU-MTUNE-Phoenix %s
+// RUN: %clang -target aarch64 -mtune=cortex-a53 -mcpu=oryon-1  -### -c %s 2>&1 | FileCheck -check-prefix=MCPU-MTUNE-Phoenix %s
+// MCPU-MTUNE-Phoenix: "-cc1"{{.*}} "-triple" "aarch64{{.*}}" "-target-cpu" "oryon-1"
diff --git a/clang/test/Misc/target-invalid-cpu-note.c b/clang/test/Misc/target-invalid-cpu-note.c
index 768b243b04e3..a71ebd6a023e 100644
--- a/clang/test/Misc/target-invalid-cpu-note.c
+++ b/clang/test/Misc/target-invalid-cpu-note.c
@@ -5,11 +5,11 @@
 
 // RUN: not %clang_cc1 -triple arm64--- -target-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix AARCH64
 // AARCH64: error: unknown target CPU 'not-a-cpu'
-// AARCH64-NEXT: note: valid target CPU values are: cortex-a34, cortex-a35, cortex-a53, cortex-a55, cortex-a510, cortex-a520, cortex-a520ae, cortex-a57, cortex-a65, cortex-a65ae, cortex-a72, cortex-a73, cortex-a75, cortex-a76, cortex-a76ae, cortex-a77, cortex-a78, cortex-a78ae, cortex-a78c, cortex-a710, cortex-a715, cortex-a720, cortex-a720ae, cortex-r82, cortex-r82ae, cortex-x1, cortex-x1c, cortex-x2, cortex-x3, cortex-x4, neoverse-e1, neoverse-n1, neoverse-n2, neoverse-n3, neoverse-512tvb, neoverse-v1, neoverse-v2, neoverse-v3, neoverse-v3ae, cyclone, apple-a7, apple-a8, apple-a9, apple-a10, apple-a11, apple-a12, apple-a13, apple-a14, apple-a15, apple-a16, apple-a17, apple-m1, apple-m2, apple-m3, apple-s4, apple-s5, exynos-m3, exynos-m4, exynos-m5, falkor, saphira, kryo, thunderx2t99, thunderx3t110, thunderx, thunderxt88, thunderxt81, thunderxt83, tsv110, a64fx, carmel, ampere1, ampere1a, ampere1b, cobalt-100, grace{{$}}
+// AARCH64-NEXT: note: valid target CPU values are: cortex-a34, cortex-a35, cortex-a53, cortex-a55, cortex-a510, cortex-a520, cortex-a520ae, cortex-a57, cortex-a65, cortex-a65ae, cortex-a72, cortex-a73, cortex-a75, cortex-a76, cortex-a76ae, cortex-a77, cortex-a78, cortex-a78ae, cortex-a78c, cortex-a710, cortex-a715, cortex-a720, cortex-a720ae, cortex-r82, cortex-r82ae, cortex-x1, cortex-x1c, cortex-x2, cortex-x3, cortex-x4, neoverse-e1, neoverse-n1, neoverse-n2, neoverse-n3, neoverse-512tvb, neoverse-v1, neoverse-v2, neoverse-v3, neoverse-v3ae, cyclone, apple-a7, apple-a8, apple-a9, apple-a10, apple-a11, apple-a12, apple-a13, apple-a14, apple-a15, apple-a16, apple-a17, apple-m1, apple-m2, apple-m3, apple-s4, apple-s5, exynos-m3, exynos-m4, exynos-m5, falkor, saphira, kryo, thunderx2t99, thunderx3t110, thunderx, thunderxt88, thunderxt81, thunderxt83, tsv110, a64fx, carmel, ampere1, ampere1a, ampere1b, oryon-1, cobalt-100, grace{{$}}
 
 // RUN: not %clang_cc1 -triple arm64--- -tune-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix TUNE_AARCH64
 // TUNE_AARCH64: error: unknown target CPU 'not-a-cpu'
-// TUNE_AARCH64-NEXT: note: valid target CPU values are: cortex-a34, cortex-a35, cortex-a53, cortex-a55, cortex-a510, cortex-a520, cortex-a520ae, cortex-a57, cortex-a65, cortex-a65ae, cortex-a72, cortex-a73, cortex-a75, cortex-a76, cortex-a76ae, cortex-a77, cortex-a78, cortex-a78ae, cortex-a78c, cortex-a710, cortex-a715, cortex-a720, cortex-a720ae, cortex-r82, cortex-r82ae, cortex-x1, cortex-x1c, cortex-x2, cortex-x3, cortex-x4, neoverse-e1, neoverse-n1, neoverse-n2, neoverse-n3, neoverse-512tvb, neoverse-v1, neoverse-v2, neoverse-v3, neoverse-v3ae, cyclone, apple-a7, apple-a8, apple-a9, apple-a10, apple-a11, apple-a12, apple-a13, apple-a14, apple-a15, apple-a16, apple-a17, apple-m1, apple-m2, apple-m3, apple-s4, apple-s5, exynos-m3, exynos-m4, exynos-m5, falkor, saphira, kryo, thunderx2t99, thunderx3t110, thunderx, thunderxt88, thunderxt81, thunderxt83, tsv110, a64fx, carmel, ampere1, ampere1a, ampere1b, cobalt-100, grace{{$}}
+// TUNE_AARCH64-NEXT: note: valid target CPU values are: cortex-a34, cortex-a35, cortex-a53, cortex-a55, cortex-a510, cortex-a520, cortex-a520ae, cortex-a57, cortex-a65, cortex-a65ae, cortex-a72, cortex-a73, cortex-a75, cortex-a76, cortex-a76ae, cortex-a77, cortex-a78, cortex-a78ae, cortex-a78c, cortex-a710, cortex-a715, cortex-a720, cortex-a720ae, cortex-r82, cortex-r82ae, cortex-x1, cortex-x1c, cortex-x2, cortex-x3, cortex-x4, neoverse-e1, neoverse-n1, neoverse-n2, neoverse-n3, neoverse-512tvb, neoverse-v1, neoverse-v2, neoverse-v3, neoverse-v3ae, cyclone, apple-a7, apple-a8, apple-a9, apple-a10, apple-a11, apple-a12, apple-a13, apple-a14, apple-a15, apple-a16, apple-a17, apple-m1, apple-m2, apple-m3, apple-s4, apple-s5, exynos-m3, exynos-m4, exynos-m5, falkor, saphira, kryo, thunderx2t99, thunderx3t110, thunderx, thunderxt88, thunderxt81, thunderxt83, tsv110, a64fx, carmel, ampere1, ampere1a, ampere1b, oryon-1, cobalt-100, grace{{$}}
 
 // RUN: not %clang_cc1 -triple i386--- -target-cpu not-a-cpu -fsyntax-only %s 2>&1 | FileCheck %s --check-prefix X86
 // X86: error: unknown target CPU 'not-a-cpu'
diff --git a/llvm/include/llvm/TargetParser/AArch64TargetParser.h b/llvm/include/llvm/TargetParser/AArch64TargetParser.h
index 04fbaf07adfb..e2682bc4b331 100644
--- a/llvm/include/llvm/TargetParser/AArch64TargetParser.h
+++ b/llvm/include/llvm/TargetParser/AArch64TargetParser.h
@@ -786,6 +786,11 @@ inline constexpr CpuInfo CpuInfos[] = {
                                AArch64::AEK_SHA2, AArch64::AEK_AES,
                                AArch64::AEK_MTE, AArch64::AEK_SB,
                                AArch64::AEK_SSBS, AArch64::AEK_CSSC})},
+    {"oryon-1", ARMV8_6A,
+     (AArch64::ExtensionBitset({AArch64::AEK_AES, AArch64::AEK_CRYPTO,
+                                AArch64::AEK_RAND, AArch64::AEK_SM4,
+                                AArch64::AEK_SHA3, AArch64::AEK_SHA2,
+                                AArch64::AEK_PROFILE}))},
 };
 
 // Name alias.
diff --git a/llvm/lib/Target/AArch64/AArch64.td b/llvm/lib/Target/AArch64/AArch64.td
index 4b2ce0d73949..5708b6173750 100644
--- a/llvm/lib/Target/AArch64/AArch64.td
+++ b/llvm/lib/Target/AArch64/AArch64.td
@@ -85,6 +85,10 @@ def SMEUnsupported : AArch64Unsupported {
                       SME2Unsupported.F);
 }
 
+def MTEUnsupported : AArch64Unsupported {
+  let F = [HasMTE];
+}
+
 let F = [HasPAuth, HasPAuthLR] in
 def PAUnsupported : AArch64Unsupported;
 
@@ -109,6 +113,7 @@ include "AArch64SchedNeoverseN1.td"
 include "AArch64SchedNeoverseN2.td"
 include "AArch64SchedNeoverseV1.td"
 include "AArch64SchedNeoverseV2.td"
+include "AArch64SchedOryon.td"
 
 include "AArch64Processors.td"
 
diff --git a/llvm/lib/Target/AArch64/AArch64Processors.td b/llvm/lib/Target/AArch64/AArch64Processors.td
index f2286ae17dba..eca9eb859448 100644
--- a/llvm/lib/Target/AArch64/AArch64Processors.td
+++ b/llvm/lib/Target/AArch64/AArch64Processors.td
@@ -616,6 +616,27 @@ def TuneAmpere1B : SubtargetFeature<"ampere1b", "ARMProcFamily", "Ampere1B",
                                     FeatureLdpAlignedOnly,
                                     FeatureStpAlignedOnly]>;
 
+def TuneOryon  : SubtargetFeature<"oryon-1", "ARMProcFamily",
+                                    "Oryon",
+                                    "Nuvia Inc Oryon processors", [
+                                    FeatureCrypto,
+                                    FeatureFPARMv8,
+                                    FeatureNEON,
+                                    FeatureFuseAES,
+                                    FeatureFuseAdrpAdd,
+                                    FeatureEnableSelectOptimize,
+                                    FeatureFuseCryptoEOR,
+                                    FeatureFuseAddress,
+                                    FeatureSM4,
+                                    FeatureSHA2,
+                                    FeatureSHA3,
+                                    FeatureAES,
+                                    FeatureFullFP16,
+                                    FeatureFP16FML,
+                                    FeaturePerfMon,
+                                    FeatureSPE,
+                                    FeaturePostRAScheduler,
+                                    HasV8_6aOps]>;
 
 def ProcessorFeatures {
   list<SubtargetFeature> A53  = [HasV8_0aOps, FeatureCRC, FeatureCrypto,
@@ -805,6 +826,11 @@ def ProcessorFeatures {
                                      FeatureSHA3, FeatureAES, FeatureCSSC,
                                      FeatureWFxT, FeatureFullFP16];
 
+  list<SubtargetFeature> Oryon = [HasV8_6aOps, FeatureNEON, FeaturePerfMon,
+                                     FeatureCrypto, FeatureRandGen,
+                                     FeaturePAuth, FeatureSM4, FeatureSHA2,
+                                     FeatureSHA3, FeatureAES];
+
   // ETE and TRBE are future architecture extensions. We temporarily enable them
   // by default for users targeting generic AArch64. The extensions do not
   // affect code generated by the compiler and can be used only by explicitly
@@ -987,3 +1013,7 @@ def : ProcessorModel<"ampere1a", Ampere1Model, ProcessorFeatures.Ampere1A,
 
 def : ProcessorModel<"ampere1b", Ampere1BModel, ProcessorFeatures.Ampere1B,
                      [TuneAmpere1B]>;
+
+// Qualcomm Oryon
+def : ProcessorModel<"oryon-1", OryonModel, ProcessorFeatures.Oryon,
+                       [TuneOryon]>;
diff --git a/llvm/lib/Target/AArch64/AArch64SchedOryon.td b/llvm/lib/Target/AArch64/AArch64SchedOryon.td
new file mode 100644
index 000000000000..e54c46ae69d2
--- /dev/null
+++ b/llvm/lib/Target/AArch64/AArch64SchedOryon.td
@@ -0,0 +1,1664 @@
+//=- AArch64SchedOryon.td - Nuvia Inc Oryon CPU 001 ---*- tablegen -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the scheduling model for Nuvia Inc Oryon
+// family of processors.
+//
+//===----------------------------------------------------------------------===//
+
+//===----------------------------------------------------------------------===//
+// Pipeline Description.
+
+def OryonModel : SchedMachineModel {
+  let IssueWidth            =  14; // 14 micro-ops dispatched at a time. IXU=6, LSU=4, VXU=4
+  let MicroOpBufferSize     = 376; // 192 (48x4) entries in micro-op re-order buffer in VXU.
+                                   // 120 ((20+20)x3) entries in micro-op re-order buffer in IXU
+                                   // 64  (16+16)x2 re-order buffer in LSU
+                                   // total 373
+  let LoadLatency           =   4; // 4 cycle Load-to-use from L1D$
+                                   // LSU=5 NEON load
+  let MispredictPenalty     =  13; // 13 cycles for mispredicted branch.
+  // Determined via a mix of micro-arch details and experimentation.
+  let LoopMicroOpBufferSize =   0; // Do not have a LoopMicroOpBuffer
+  let PostRAScheduler       =   1; // Using PostRA sched.
+  let CompleteModel         =   1;
+
+  list<Predicate> UnsupportedFeatures = !listconcat(SVEUnsupported.F,
+                                                    SMEUnsupported.F,
+                                                    MTEUnsupported.F,
+                                                    PAUnsupported.F,
+                                                    [HasPAuth, HasCSSC]);
+}
+
+let SchedModel = OryonModel in {
+
+// Issue ports.
+// IXU has 6 ports p0 ~ p5
+// LSU has 4 ports p6 ~ p9(ls0 ~ ls3), p10/p11(std0, std1) has to work with ls0~ls3
+// VXU has 4 ports p12 ~ p15
+
+// cross IXU/LSU/VXU resource group for FMOV P41 of VXU
+// I2V
+def ORYONI4FP0 : ProcResource<1>;
+def ORYONI5FP1 : ProcResource<1>;
+// V2I
+def ORYONFP0I4 : ProcResource<1>;
+def ORYONFP1I5 : ProcResource<1>;
+
+// store 1 for normal store instructions
+def ORYONST0 : ProcResource<1>;
+// store 2 for normal store instructions
+def ORYONST1 : ProcResource<1>;
+
+// Port 0: ALU/Indirect/Direct Branch.
+def ORYONP0 : ProcResource<1>;
+
+// Port 1: ALU/Direct Branch.
+def ORYONP1 : ProcResource<1>;
+
+// Port 2: ALU.
+def ORYONP2 : ProcResource<1>;
+
+// Port 3: ALU.
+def ORYONP3 : ProcResource<1>;
+
+// Port 4: ALU.
+def ORYONP4 : ProcResource<1> {
+    let Super = ORYONI4FP0;
+    let Super = ORYONFP0I4; }
+
+// Port 5: ALU.
+def ORYONP5 : ProcResource<1> {
+    let Super = ORYONI5FP1;
+    let Super = ORYONFP1I5; }
+
+// Port 6: Load/Store. LS0
+def ORYONP6 : ProcResource<1> {
+    let Super = ORYONST0; }
+
+// Port 7: Load/store. LS1
+def ORYONP7 : ProcResource<1> {
+    let Super = ORYONST0; }
+
+// Port 8: Load/Store. LS2
+def ORYONP8 : ProcResource<1> {
+    let Super = ORYONST1; }
+
+// Port 9: Load/store. LS3
+def ORYONP9 : ProcResource<1> {
+    let Super = ORYONST1; }
+
+// Port 10: Load/Store. STD0
+def ORYONP10SD0 : ProcResource<1> {
+    let Super = ORYONST0; }
+
+// Port 11: Load/store. STD1
+def ORYONP11SD1 : ProcResource<1> {
+    let Super = ORYONST1; }
+
+// Port 12: FP/Neon/SIMD/Crypto.
+def ORYONP12FP0 : ProcResource<1> {
+    let Super = ORYONI4FP0;
+    let Super = ORYONFP0I4; }
+
+// Port 13: FP/Neon/SIMD/Crypto.
+def ORYONP13FP1 : ProcResource<1> {
+    let Super = ORYONI5FP1;
+    let Super = ORYONFP1I5; }
+
+// Port 14: FP/Neon/SIMD/Crypto.
+def ORYONP14FP2 : ProcResource<1>;
+
+// Port 15: FP/Neon/SIMD/Crypto.
+def ORYONP15FP3 : ProcResource<1>;
+
+// Define groups for the functional units on each issue port.  Each group
+// created will be used by a WriteRes.
+
+// Integer add/shift/logical/misc. instructions on port I0/I1/I2/I3/I4/I5.
+def ORYONI012345 : ProcResGroup<[ORYONP0, ORYONP1, ORYONP2,
+                                  ORYONP3, ORYONP4, ORYONP5]> {
+  let BufferSize = 120;
+}
+
+// Direct Conditional Branch instructions on ports I0/I1.
+def ORYONI01 : ProcResGroup<[ORYONP0, ORYONP1]> {
+  let BufferSize = 40;
+}
+
+// Indirect/crypto Conditional Branch instructions on ports I0.
+def ORYONI0 : ProcResGroup<[ORYONP0]> {
+  let BufferSize = 20;
+}
+
+// Crypto/CRC/PAU instructions on ports I2.
+def ORYONI2 : ProcResGroup<[ORYONP2]> {
+  let BufferSize = 20;
+}
+
+// Multiply/Multiply-ADD instructions on ports I4/I5.
+def ORYONI45 : ProcResGroup<[ORYONP4, ORYONP5]> {
+  let BufferSize = 40;
+}
+
+// Divide instructions on ports I5.
+def ORYONI5 : ProcResGroup<[ORYONP5]> {
+  let BufferSize = 20;
+}
+
+// Comparison instructions on ports I0/I1/I2/I3.
+def ORYONI0123 : ProcResGroup<[ORYONP0, ORYONP1,
+                                ORYONP2, ORYONP3]> {
+  let BufferSize = 80;
+}
+
+// Load instructions on ports P6/P7/P8/P9.
+def ORYONLD : ProcResGroup<[ORYONP6, ORYONP7, ORYONP8, ORYONP9]> {
+  let BufferSize = 64;
+}
+
+// Store instructions on combo of STA/STD pipes
+def ORYONST : ProcResGroup<[ORYONST0, ORYONST1]> {
+    let BufferSize = 64;
+}
+
+// Arithmetic and CRYP-AED ASIMD/FP instructions on ports FP0/FP1/FP2/FP3.
+def ORYONFP0123 : ProcResGroup<[ORYONP12FP0, ORYONP13FP1,
+                                   ORYONP14FP2, ORYONP15FP3]> {
+  let BufferSize = 192;
+}
+
+// FP Comparison and F/I move instructions on ports FP0/FP1.
+def ORYONFP01 : ProcResGroup<[ORYONP12FP0, ORYONP13FP1]> {
+  let BufferSize = 96;
+}
+
+// FDIV instructions on ports FP3.
+def ORYONFP3 : ProcResGroup<[ORYONP15FP3]> {
+  let BufferSize = 48;
+}
+
+// CRYP-SHA instructions on ports FP1.
+def ORYONFP1 : ProcResGroup<[ORYONP14FP2]> {
+  let BufferSize = 48;
+}
+
+def ORYONFP2 : ProcResGroup<[ORYONP14FP2]> {
+  let BufferSize = 48;
+}
+
+// Reciprocal, Squre root on FP0.
+def ORYONFP0 : ProcResGroup<[ORYONP12FP0]> {
+  let BufferSize = 48;
+}
+
+// cross IXU/LSU/VXU resource group for FMOV P41 of VXU
+// I2V
+def ORYONI2V : ProcResGroup<[ORYONI4FP0, ORYONI5FP1]> {
+    let BufferSize = 40;
+}
+
+// V2I
+def ORYONV2I : ProcResGroup<[ORYONFP0I4, ORYONFP1I5]> {
+    let BufferSize = 96;
+}
+
+// Define commonly used write types for InstRW specializations.
+// All definitions follow the format: ORYONWrite_<NumCycles>Cyc_<Resources>.
+
+// Because of the complexity of Oryon CPU, we skip the following
+// generic definitions and define each instruction specifically
+
+// These WriteRes entries are not used in the Falkor sched model.
+def : WriteRes<WriteImm, []>     { let Unsupported = 1; }
+def : WriteRes<WriteI, []>       { let Unsupported = 1; }
+def : WriteRes<WriteISReg, []>   { let Unsupported = 1; }
+def : WriteRes<WriteIEReg, []>   { let Unsupported = 1; }
+def : WriteRes<WriteExtr, []>    { let Unsupported = 1; }
+def : WriteRes<WriteIS, []>      { let Unsupported = 1; }
+def : WriteRes<WriteID32, []>    { let Unsupported = 1; }
+def : WriteRes<WriteID64, []>    { let Unsupported = 1; }
+def : WriteRes<WriteIM32, []>    { let Unsupported = 1; }
+def : WriteRes<WriteIM64, []>    { let Unsupported = 1; }
+def : WriteRes<WriteBr, []>      { let Unsupported = 1; }
+def : WriteRes<WriteBrReg, []>   { let Unsupported = 1; }
+def : WriteRes<WriteLD, []>      { let Unsupported = 1; }
+def : WriteRes<WriteST, []>      { let Unsupported = 1; }
+def : WriteRes<WriteSTP, []>     { let Unsupported = 1; }
+def : WriteRes<WriteAdr, []>     { let Unsupported = 1; }
+def : WriteRes<WriteLDIdx, []>   { let Unsupported = 1; }
+def : WriteRes<WriteSTIdx, []>   { let Unsupported = 1; }
+def : WriteRes<WriteF, []>       { let Unsupported = 1; }
+def : WriteRes<WriteFCmp, []>    { let Unsupported = 1; }
+def : WriteRes<WriteFCvt, []>    { let Unsupported = 1; }
+def : WriteRes<WriteFCopy, []>   { let Unsupported = 1; }
+def : WriteRes<WriteFImm, []>    { let Unsupported = 1; }
+def : WriteRes<WriteFMul, []>    { let Unsupported = 1; }
+def : WriteRes<WriteFDiv, []>    { let Unsupported = 1; }
+def : WriteRes<WriteVd, []>      { let Unsupported = 1; }
+def : WriteRes<WriteVq, []>      { let Unsupported = 1; }
+def : WriteRes<WriteVLD, []>     { let Unsupported = 1; }
+def : WriteRes<WriteVST, []>     { let Unsupported = 1; }
+def : WriteRes<WriteSys, []>     { let Unsupported = 1; }
+def : WriteRes<WriteBarrier, []> { let Unsupported = 1; }
+def : WriteRes<WriteHint, []>    { let Unsupported = 1; }
+def : WriteRes<WriteLDHi, []>    { let Unsupported = 1; }
+def : WriteRes<WriteAtomic, []>  { let Unsupported = 1; }
+
+// These ReadAdvance entries will be defined in later implementation
+def : ReadAdvance<ReadI,       0>;
+def : ReadAdvance<ReadISReg,   0>;
+def : ReadAdvance<ReadIEReg,   0>;
+def : ReadAdvance<ReadIM,      0>;
+def : ReadAdvance<ReadIMA,     0>;
+def : ReadAdvance<ReadID,      0>;
+def : ReadAdvance<ReadExtrHi,  0>;
+def : ReadAdvance<ReadAdrBase, 0>;
+def : ReadAdvance<ReadVLD,     0>;
+def : ReadAdvance<ReadST,      0>;
+
+
+//IXU resource definition
+// 1 cycles NO pipe
+def ORYONWrite...
[truncated]

asb · 2024-05-16T04:53:33Z

llvm/lib/Target/AArch64/AArch64SchedOryon.td

+                                   // LSU=5 NEON load
+  let MispredictPenalty     =  13; // 13 cycles for mispredicted branch.
+  // Determined via a mix of micro-arch details and experimentation.
+  let LoopMicroOpBufferSize =   0; // Do not have a LoopMicroOpBuffer


Although it may not be microarchitecturally accurate, I wonder if you've benchmarked setting LoopMicroOpBufferSize to a non-zero value. Unless targets override it, partial and runtime loop unrolling aren't enabled unless LoopMicroOpBufferSize is non-zero (and this is the only way it's currently queried and used). See getUnrollingPreferences in BasicTTImpl.h and the target-specific overrides in FooTargetTransformInfo.cpp. In AArch64's case, they only override that decision f or in-order scheduling models. If you look at other models that set this value in-tree you'll see it's become somewhat divorced from microarchitectual reality - e.g. a number of the AArch64 models setting it based on instruction queue size or noting they just copied the value from the A57 model. On the X86 side, it's set to 50-72 for the modern Intel X86 and even up to 512 for Zen (noting it should be higher, but compile time impact is too high).

(To be clear - this is a non-blocking comment. Just thought I'd make the suggestion as I've been stepping through this recently for another target).

Thanks for this attention to detail and bringing up this issue. We had a discussion internally about this setting. At one extreme would be to try to specify all details relevant to instruction scheduling. For micro-op buffers, to be completely architecturally accurate, a scalar value would be insufficient. We certainly wouldn't sign up to extend LLVM in this fashion. We decided to take the other extreme here. While not explicitly varying LoopMcroOpBufferSize, our benchmarking of the effects of varying scheduling settings showed little effect. Our team hasn't done any experiments on non-AArch64 targets varying this specific setting recently. I will the submitter decide what to do. We may decide to do some experiments varying this value. If the processor is in order, it would matter more.

Thanks Joel. FWIW there's been some development in this setting since I posted this. 54e52aa reduced the LoopMicroOpBufferSize for Zen to 96 (Zen3) and 108 (Zen4) so they're no longer outliers with much larger values than anyone else.

I totally agree that in-order cores will be more sensitive to scheduling changes in general. I'll note that as runtime and partial loop unrolling are disabled unless this is set, I'd see the impact as slightly more than just scheduling. In some cases further optimisations trigger on the (partially) unrolled IR which can lead to better code. e.g. setting this reduced dynamic instruction count on some SPEC benchmarks for a RISC-V OoO scheduling model. If you do experiment with the value I'd be interested in your findings.

efriedma-quic · 2024-05-29T17:33:53Z

llvm/lib/Target/AArch64/AArch64SchedOryon.td

+
+//1, 1, 6
+def : InstRW<[ORYONWrite_1Cyc_I012345],
+            (instregex "^ADD(W|X)r(i|r|x)", "^SUB(W|X)r(i|r|x)")>;


Please verify that the modeling for ADDXrx etc. is correct. (This is the issue we discussed internally; leaving a note here so we don't forget.)

Confirmed. It is correct as we discussed before.

wxz2020 · 2024-06-05T15:51:53Z

Any more comments? Otherwise, we will move forward. Thanks,

joelkevinjones

LGTM

… wei_zhao_oryon

[AArch64] Add support for Qualcomm Oryon processor

8aebe46

efriedma-quic requested review from jthackray, davemgreen and efriedma-quic May 3, 2024 22:24

joelkevinjones self-requested a review May 4, 2024 01:28

Sonicadvance1 added a commit to Sonicadvance1/FEX that referenced this pull request May 4, 2024

CPUID: Adds Qualcomm Oryon product name

5f0427c

From llvm/llvm-project#91022 Easy enough

Sonicadvance1 mentioned this pull request May 4, 2024

CPUID: Adds Qualcomm Oryon product name FEX-Emu/FEX#3610

Merged

joelkevinjones requested a review from sjoerdmeijer May 4, 2024 04:11

wxz2020 commented May 6, 2024

View reviewed changes

This comment was marked as outdated.

Sign in to view

Code Review Adjustments -- turn on duplication def instruction ON, an…

241be3c

…d remove some engineering notes

llvmbot added clang Clang issues not falling into any other category backend:AArch64 clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels May 7, 2024

asb reviewed May 16, 2024

View reviewed changes

Code Review Adjustments -- Remove unnecessary comments

6b8b9c8

efriedma-quic reviewed May 29, 2024

View reviewed changes

joelkevinjones approved these changes Jun 5, 2024

View reviewed changes

Wei Zhao added 2 commits June 5, 2024 23:16

[AArch64] Add support for Qualcomm Oryon processor

718bbc3

Merge branch 'wei_zhao_oryon' of github.com:wxz2020/llvm-project into…

4e48024

… wei_zhao_oryon

joelkevinjones merged commit 6b9753a into llvm:main Jun 6, 2024
5 of 8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AArch64] Add support for Qualcomm Oryon processor #91022

[AArch64] Add support for Qualcomm Oryon processor #91022

wxz2020 commented May 3, 2024

pinskia commented May 4, 2024

wxz2020 May 6, 2024

This comment was marked as outdated.

llvmbot commented May 7, 2024 •

edited

Loading

asb May 16, 2024 •

edited

Loading

joelkevinjones May 17, 2024

asb May 23, 2024

efriedma-quic May 29, 2024

wxz2020 May 29, 2024

wxz2020 commented Jun 5, 2024

joelkevinjones left a comment

[AArch64] Add support for Qualcomm Oryon processor #91022

[AArch64] Add support for Qualcomm Oryon processor #91022

Conversation

wxz2020 commented May 3, 2024

pinskia commented May 4, 2024

wxz2020 May 6, 2024

Choose a reason for hiding this comment

This comment was marked as outdated.

llvmbot commented May 7, 2024 • edited Loading

asb May 16, 2024 • edited Loading

Choose a reason for hiding this comment

joelkevinjones May 17, 2024

Choose a reason for hiding this comment

asb May 23, 2024

Choose a reason for hiding this comment

efriedma-quic May 29, 2024

Choose a reason for hiding this comment

wxz2020 May 29, 2024

Choose a reason for hiding this comment

wxz2020 commented Jun 5, 2024

joelkevinjones left a comment

Choose a reason for hiding this comment

llvmbot commented May 7, 2024 •

edited

Loading

asb May 16, 2024 •

edited

Loading