Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64] Remove Automatic Enablement of FEAT_F32MM #85203

Merged

Conversation

Stylie777
Copy link
Contributor

When +sve is passed in the command line, if the Architecture being targeted is V8.6A/V9.1A or later, +f32mm is also added. This enables FEAT_32MM, however at the time of writing no CPU's support this. This leads to the FEAT_32MM instructions being compiled for CPU's that do not support them.

This commit removes the automatic enablement, however the option is still able to be used by passing +f32mm.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels Mar 14, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Mar 14, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-driver

Author: Jack Styles (Stylie777)

Changes

When +sve is passed in the command line, if the Architecture being targeted is V8.6A/V9.1A or later, +f32mm is also added. This enables FEAT_32MM, however at the time of writing no CPU's support this. This leads to the FEAT_32MM instructions being compiled for CPU's that do not support them.

This commit removes the automatic enablement, however the option is still able to be used by passing +f32mm.


Full diff: https://github.com/llvm/llvm-project/pull/85203.diff

5 Files Affected:

  • (modified) clang/test/Driver/aarch64-sve.c (+4-5)
  • (modified) clang/test/Preprocessor/aarch64-target-features.c (+1-1)
  • (modified) llvm/docs/ReleaseNotes.rst (+1)
  • (modified) llvm/lib/TargetParser/AArch64TargetParser.cpp (-5)
  • (modified) llvm/unittests/TargetParser/TargetParserTest.cpp (+1-8)
diff --git a/clang/test/Driver/aarch64-sve.c b/clang/test/Driver/aarch64-sve.c
index f34b2700deb91c..4a33c2e3c8d367 100644
--- a/clang/test/Driver/aarch64-sve.c
+++ b/clang/test/Driver/aarch64-sve.c
@@ -6,12 +6,11 @@
 // RUN: %clang --target=aarch64 -march=armv8.6a -### -c %s 2>&1 | FileCheck -check-prefix=GENERICV8A-NOSVE %s
 // GENERICV8A-NOSVE-NOT: "-target-feature" "+sve"
 
-// The 32-bit floating point matrix multiply extension is enabled by default
-// for armv8.6-a targets (or later) with SVE, and can optionally be enabled for
-// any target from armv8.2a onwards (we don't enforce not using it with earlier
-// targets).
+// The 32-bit floating point matrix multiply extension is an optional feature
+// that can be used for any target from armv8.2a and onwards. This can be
+// enabled using the `+f32mm` option.`.
 // RUN: %clang --target=aarch64 -march=armv8.6a       -### -c %s 2>&1 | FileCheck -check-prefix=NO-F32MM %s
-// RUN: %clang --target=aarch64 -march=armv8.6a+sve   -### -c %s 2>&1 | FileCheck -check-prefix=F32MM %s
+// RUN: %clang --target=aarch64 -march=armv8.6a+sve+f32mm   -### -c %s 2>&1 | FileCheck -check-prefix=F32MM %s
 // RUN: %clang --target=aarch64 -march=armv8.5a+f32mm -### -c %s 2>&1 | FileCheck -check-prefix=F32MM %s
 // NO-F32MM-NOT: "-target-feature" "+f32mm"
 // F32MM: "-target-feature" "+f32mm"
diff --git a/clang/test/Preprocessor/aarch64-target-features.c b/clang/test/Preprocessor/aarch64-target-features.c
index 6ec4dcd60cf601..ab06cc7983ae85 100644
--- a/clang/test/Preprocessor/aarch64-target-features.c
+++ b/clang/test/Preprocessor/aarch64-target-features.c
@@ -196,7 +196,7 @@
 // CHECK-8_6-NOT: __ARM_FEATURE_SHA3 1
 // CHECK-8_6-NOT: __ARM_FEATURE_SM4 1
 
-// RUN: %clang -target aarch64-none-linux-gnu -march=armv8.6-a+sve -x c -E -dM %s -o - | FileCheck --check-prefix=CHECK-SVE-8_6 %s
+// RUN: %clang -target aarch64-none-linux-gnu -march=armv8.6-a+sve+f32mm -x c -E -dM %s -o - | FileCheck --check-prefix=CHECK-SVE-8_6 %s
 // CHECK-SVE-8_6: __ARM_FEATURE_SVE 1
 // CHECK-SVE-8_6: __ARM_FEATURE_SVE_BF16 1
 // CHECK-SVE-8_6: __ARM_FEATURE_SVE_MATMUL_FP32 1
diff --git a/llvm/docs/ReleaseNotes.rst b/llvm/docs/ReleaseNotes.rst
index 7be51730663bd1..b65a108a9b6a36 100644
--- a/llvm/docs/ReleaseNotes.rst
+++ b/llvm/docs/ReleaseNotes.rst
@@ -74,6 +74,7 @@ Changes to the AMDGPU Backend
 
 Changes to the ARM Backend
 --------------------------
+* FEAT_F32MM is no longer activated by default when using `+sve` on v8.6-A or greater. The feature is still availble and can be using by adding `+f32mm` to the command line options.
 
 Changes to the AVR Backend
 --------------------------
diff --git a/llvm/lib/TargetParser/AArch64TargetParser.cpp b/llvm/lib/TargetParser/AArch64TargetParser.cpp
index e36832f563eed8..71099462d5ecff 100644
--- a/llvm/lib/TargetParser/AArch64TargetParser.cpp
+++ b/llvm/lib/TargetParser/AArch64TargetParser.cpp
@@ -186,11 +186,6 @@ void AArch64::ExtensionSet::enable(ArchExtKind E) {
   // Special cases for dependencies which vary depending on the base
   // architecture version.
   if (BaseArch) {
-    // +sve implies +f32mm if the base architecture is v8.6A+ or v9.1A+
-    // It isn't the case in general that sve implies both f64mm and f32mm
-    if (E == AEK_SVE && BaseArch->is_superset(ARMV8_6A))
-      enable(AEK_F32MM);
-
     // +fp16 implies +fp16fml for v8.4A+, but not v9.0-A+
     if (E == AEK_FP16 && BaseArch->is_superset(ARMV8_4A) &&
         !BaseArch->is_superset(ARMV9A))
diff --git a/llvm/unittests/TargetParser/TargetParserTest.cpp b/llvm/unittests/TargetParser/TargetParserTest.cpp
index 3773f59a3c5af9..a21846f5b4940b 100644
--- a/llvm/unittests/TargetParser/TargetParserTest.cpp
+++ b/llvm/unittests/TargetParser/TargetParserTest.cpp
@@ -2314,13 +2314,6 @@ AArch64ExtensionDependenciesBaseArchTestParams
          {},
          {"aes", "sha2", "sha3", "sm4"}},
 
-        // +sve implies +f32mm if the base architecture is v8.6A+ or v9.1A+, but
-        // not earlier architectures.
-        {AArch64::ARMV8_5A, {"sve"}, {"sve"}, {"f32mm"}},
-        {AArch64::ARMV9A, {"sve"}, {"sve"}, {"f32mm"}},
-        {AArch64::ARMV8_6A, {"sve"}, {"sve", "f32mm"}, {}},
-        {AArch64::ARMV9_1A, {"sve"}, {"sve", "f32mm"}, {}},
-
         // +fp16 implies +fp16fml for v8.4A+, but not v9.0-A+
         {AArch64::ARMV8_3A, {"fp16"}, {"fullfp16"}, {"fp16fml"}},
         {AArch64::ARMV9A, {"fp16"}, {"fullfp16"}, {"fp16fml"}},
@@ -2487,7 +2480,7 @@ AArch64ExtensionDependenciesBaseCPUTestParams
          {}},
         {"cortex-a520",
          {},
-         {"v9.2a",    "bf16",     "crc",     "dotprod", "f32mm",        "flagm",
+         {"v9.2a",    "bf16",     "crc",     "dotprod",        "flagm",
           "fp-armv8", "fullfp16", "fp16fml", "i8mm",    "lse",          "mte",
           "pauth",    "perfmon",  "predres", "ras",     "rcpc",         "rdm",
           "sb",       "neon",     "ssbs",    "sve",     "sve2-bitperm", "sve2"},

Copy link

github-actions bot commented Mar 14, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@Stylie777 Stylie777 force-pushed the make_f32mm_optional_for_arm_v8.6_greater branch from b5a0ad5 to 276e342 Compare March 14, 2024 10:49
@jthackray jthackray self-requested a review March 14, 2024 13:39
@@ -74,6 +74,7 @@ Changes to the AMDGPU Backend

Changes to the ARM Backend
--------------------------
* FEAT_F32MM is no longer activated by default when using `+sve` on v8.6-A or greater. The feature is still availble and can be using by adding `+f32mm` to the command line options.
Copy link
Contributor

@jthackray jthackray Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: typo, s/availble/available/

Also, should be "can be used by adding", I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

@jthackray
Copy link
Contributor

When +sve is passed in the command line, if the Architecture being targeted is V8.6A/V9.1A or later, +f32mm is also added. This enables FEAT_32MM, however at the time of writing no CPU's support this. This leads to the FEAT_32MM instructions being compiled for CPU's that do not support them.

This commit removes the automatic enablement, however the option is still able to be used by passing +f32mm.

Tiny minuscule nit: should be "CPUs" not "CPU's".

When `+sve` is passed in the command line, if the Architecture being targeted is V8.6A/V9.1A or later, `+f32mm` is also added. This enables FEAT_32MM, however at the time of writing no CPUs support this. This leads to the FEAT_32MM instructions being compiled for CPUs that do not support them.

This commit removes the automatic enablement, however the option is still able to be used by passing `+f32mm`.
@Stylie777 Stylie777 force-pushed the make_f32mm_optional_for_arm_v8.6_greater branch from 276e342 to 7a2fbb0 Compare March 14, 2024 13:58
@Stylie777
Copy link
Contributor Author

@jthackray I have updated the commit message as part of the most recent patch set.

Copy link
Contributor

@jthackray jthackray left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, but you should probably find additional reviewer.

Copy link
Collaborator

@ostannard ostannard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was the one who originally decided that f32mm should be automatically enabled like this, and in hindsight I agree that that was the wrong decision. GCC never implemented this, so this is making us more compatible, not less.

"fp-armv8", "fullfp16", "fp16fml", "i8mm", "lse", "mte",
"pauth", "perfmon", "predres", "ras", "rcpc", "rdm",
"sb", "neon", "ssbs", "sve", "sve2-bitperm", "sve2"},
{"v9.2a", "bf16", "crc", "dotprod", "flagm", "fp-armv8",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these entries not better off one per line? The patch is modifying all lines anyway, might as well make it so that any future modifications would be more modular.

Copy link
Contributor Author

@Stylie777 Stylie777 Mar 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would probably agree with you but the formatting here was assigned by clang-format. I also think this is outside of the scope of this PR and should be done separately as this is a formatting change rather than a functionality change which this PR is focused on.

@Stylie777 Stylie777 merged commit defc485 into llvm:main Mar 27, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants