Skip to content

Conversation

mshockwave
Copy link
Member

Following up the discussions in the RISC-V LLVM Syncup Meeting on Oct 9, I'm summarizing in this PR the related backgrounds as well as concerns folks have raised during the meeting regarding the future direction of passing tuning features from the Clang driver. Feel free to use this PR as a discussion thread on this topic.

More flexible RISC-V scheduling models

The original problem that motivated this whole body of work was that for a long time, if we wanted to share a single scheduling model among multiple CPUs that only differ in a small number of tuning features (e.g. whether it has fast vrgather), the only way was to create a base TableGen class that is subsequently parameterized and instantiated into a scheduling model for each combination of the said tuning features. This TableGen-only solution doesn't scale in a sense that the number of scheduling models will explode upon adding more tuning features in the future.

I tried to tackle this problem by allowing scheduling model to be configured by subtargetfearue, via a new SchedPredicate called FeatureSchedPredicate (#161888 ). Such that we can create multiple CPU / tune CPU with only a single scheduling model:

def Foo : RISCVTuneProcessorModel<"foo", FooModel>;
def FooWithFastVRGather : RISCVTuneProcessorModel<"foo-fast-vrgather", FooModel, [TuneFastVRGather]>;

The problem is, now CPU / tune CPU -- which maps to -mcpu and -mtune in the Clang driver, respectively -- become the one that doesn't scale, as we have to create a new -mcpu / -mtune for each combination of features & scheduling model.

Ways of passing tuning features at this moment

The problem can boil down into finding a proper way to pass tuning features from the Clang driver (we're only discussing driver here as you can always pass feature to the frontend via -target-feature). As a starter, it's important to know what we're doing at this moment, which can be categorized into three ways:

  • Retrieve tuning features from a CPU name. We currently encoded two unaligned-memory-related subtarget features directly into the RISCVCPUInfo data structure generated by RISC-V's TargetParser. This data structure is queried by driver which generates the corresponding -target-feature upon seeing a matching -mcpu.
  • Explicitly translate some of the driver flags. Flags that fall into category include -ffixed-<register name> and -mrelax.
  • Similar to the previous one, driver flag declarations that are included into the m_riscv_Features_Group will be automatically translated into the corresponding feature names based on their driver flag names. Currently only -msave-restore / -mnosave-restore use this.

It is also worth mentioning that RVI is considering adding Software Optimization Guidance Options -- a concept similar to a RISC-V extension (whose name starts with "O") but only provides uArch/implementation hints to the compiler rather than representing any architecture level contracts. Despite being closely related to what we're discussing here, these options/extensions will go into the march string.

Candidate solutions

We have roughly two candidate solutions on the table now:

New syntax for -mtune

Introducing an alternative syntax to -mtune in addition to the current one: -mtune=<tune cpu name>:+x,-y,+z... where +x,-y,+z is the feature string that will be converted into individual -target-feature flags. Code in this PR contains a prototype of this approach.

Pros:

  • Scale better when there is a large number of tuning features in the future (no matter how many of them it'll always be a single flag)

Cons:

  • Without any special handling, subtracting features with LLVM's current feature string design can be tricky (e.g. -foo,+foo will still give you foo but not when you write +foo,-foo)
    • Alternative approach: we can forbid feature subtraction in the new -mtune string.
  • It's more difficult to handle or reason about implied features or just interactions between different features. ARM/AArch64 ran into a similar problems before with a similar mechanism of using "+feature" in their march/mcpu string.

Toggle tuning features with the existing -mfoo mechanism

Feature foo will be controlled by -mfoo & -mno-foo.

Pros:

  • Adding & subtracting features would be less of a problem compared to the previous -mtune approach, because the mechanism of -mfoo & -mno-foo is well defined.
  • Hide the implementation at the driver level, including populating the implied features or even changing the actual subtarget feature names without breaking the user interface (i.e. driver).

Cons:

  • Basically the opposite of the advantage from the previous -mtune approach: the number of -m flags will increase really fast (we need at least two flags per subtarget features) upon adding more tuning features in the future.

There is another challenge we have to address in both options: GCC compatibility. We have to talk with the GCC folks on this topic as we almost certain want a consistent interface and behavior across these two compilers.

I hope I captured everything we discussed in the meeting. Feel free to point out things that I missed.

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:RISC-V clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' labels Oct 9, 2025
@llvmbot
Copy link
Member

llvmbot commented Oct 9, 2025

@llvm/pr-subscribers-clang-driver
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-risc-v

Author: Min-Yih Hsu (mshockwave)

Changes

Following up the discussions in the RISC-V LLVM Syncup Meeting on Oct 9, I'm summarizing in this PR the related backgrounds as well as concerns folks have raised during the meeting regarding the future direction of passing tuning features from the Clang driver. Feel free to use this PR as a discussion thread on this topic.

More flexible RISC-V scheduling models

The original problem that motivated this whole body of work was that for a long time, if we wanted to share a single scheduling model among multiple CPUs that only differ in a small number of tuning features (e.g. whether it has fast vrgather), the only way was to create a base TableGen class that is subsequently parameterized and instantiated into a scheduling model for each combination of the said tuning features. This TableGen-only solution doesn't scale in a sense that the number of scheduling models will explode upon adding more tuning features in the future.

I tried to tackle this problem by allowing scheduling model to be configured by subtargetfearue, via a new SchedPredicate called FeatureSchedPredicate (#161888 ). Such that we can create multiple CPU / tune CPU with only a single scheduling model:

def Foo : RISCVTuneProcessorModel&lt;"foo", FooModel&gt;;
def FooWithFastVRGather : RISCVTuneProcessorModel&lt;"foo-fast-vrgather", FooModel, [TuneFastVRGather]&gt;;

The problem is, now CPU / tune CPU -- which maps to -mcpu and -mtune in the Clang driver, respectively -- become the one that doesn't scale, as we have to create a new -mcpu / -mtune for each combination of features & scheduling model.

Ways of passing tuning features at this moment

The problem can boil down into finding a proper way to pass tuning features from the Clang driver (we're only discussing driver here as you can always pass feature to the frontend via -target-feature). As a starter, it's important to know what we're doing at this moment, which can be categorized into three ways:

  • Retrieve tuning features from a CPU name. We currently encoded two unaligned-memory-related subtarget features directly into the RISCVCPUInfo data structure generated by RISC-V's TargetParser. This data structure is queried by driver which generates the corresponding -target-feature upon seeing a matching -mcpu.
  • Explicitly translate some of the driver flags. Flags that fall into category include -ffixed-&lt;register name&gt; and -mrelax.
  • Similar to the previous one, driver flag declarations that are included into the m_riscv_Features_Group will be automatically translated into the corresponding feature names based on their driver flag names. Currently only -msave-restore / -mnosave-restore use this.

It is also worth mentioning that RVI is considering adding Software Optimization Guidance Options -- a concept similar to a RISC-V extension (whose name starts with "O") but only provides uArch/implementation hints to the compiler rather than representing any architecture level contracts. Despite being closely related to what we're discussing here, these options/extensions will go into the march string.

Candidate solutions

We have roughly two candidate solutions on the table now:

New syntax for -mtune

Introducing an alternative syntax to -mtune in addition to the current one: -mtune=&lt;tune cpu name&gt;:+x,-y,+z... where +x,-y,+z is the feature string that will be converted into individual -target-feature flags. Code in this PR contains a prototype of this approach.

Pros:

  • Scale better when there is a large number of tuning features in the future (no matter how many of them it'll always be a single flag)

Cons:

  • Without any special handling, subtracting features with LLVM's current feature string design can be tricky (e.g. -foo,+foo will still give you foo but not when you write +foo,-foo)
    • Alternative approach: we can forbid feature subtraction in the new -mtune string.
  • It's more difficult to handle or reason about implied features or just interactions between different features. ARM/AArch64 ran into a similar problems before with a similar mechanism of using "+feature" in their march/mcpu string.

Toggle tuning features with the existing -mfoo mechanism

Feature foo will be controlled by -mfoo & -mno-foo.

Pros:

  • Adding & subtracting features would be less of a problem compared to the previous -mtune approach, because the mechanism of -mfoo & -mno-foo is well defined.
  • Hide the implementation at the driver level, including populating the implied features or even changing the actual subtarget feature names without breaking the user interface (i.e. driver).

Cons:

  • Basically the opposite of the advantage from the previous -mtune approach: the number of -m flags will increase really fast (we need at least two flags per subtarget features) upon adding more tuning features in the future.

There is another challenge we have to address in both options: GCC compatibility. We have to talk with the GCC folks on this topic as we almost certain want a consistent interface and behavior across these two compilers.

I hope I captured everything we discussed in the meeting. Feel free to point out things that I missed.


Full diff: https://github.com/llvm/llvm-project/pull/162716.diff

4 Files Affected:

  • (modified) clang/lib/Driver/ToolChains/Arch/RISCV.cpp (+28)
  • (modified) clang/lib/Driver/ToolChains/Arch/RISCV.h (+3)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+1-4)
  • (modified) clang/test/Driver/riscv-cpus.c (+5)
diff --git a/clang/lib/Driver/ToolChains/Arch/RISCV.cpp b/clang/lib/Driver/ToolChains/Arch/RISCV.cpp
index f2e79e71f93d4..f51e3ea6bfe73 100644
--- a/clang/lib/Driver/ToolChains/Arch/RISCV.cpp
+++ b/clang/lib/Driver/ToolChains/Arch/RISCV.cpp
@@ -367,3 +367,31 @@ std::string riscv::getRISCVTargetCPU(const llvm::opt::ArgList &Args,
 
   return Triple.isRISCV64() ? "generic-rv64" : "generic-rv32";
 }
+
+void riscv::addMtuneWithFeatures(const Arg *A, const ArgList &Args,
+                                 ArgStringList &CmdArgs) {
+  // format: -mtune=<tune cpu>[:+a,-b,+c...]
+  StringRef MTune(A->getValue());
+  size_t ColonPos = MTune.find(':');
+  bool HasColon = ColonPos != StringRef::npos;
+
+  StringRef TuneCPU = MTune.take_front(ColonPos);
+  if (TuneCPU == "native")
+    CmdArgs.push_back(Args.MakeArgString(llvm::sys::getHostCPUName()));
+  else
+    CmdArgs.push_back(HasColon ? Args.MakeArgString(TuneCPU) : A->getValue());
+
+  StringRef FeatureStr = HasColon ? MTune.drop_front(ColonPos + 1) : "";
+  // TODO: Emit error
+  assert(!HasColon || !FeatureStr.empty());
+  if (FeatureStr.empty())
+    return;
+  StringRef Feature;
+  do {
+    std::tie(Feature, FeatureStr) = FeatureStr.split(',');
+    // TODO: Emit error
+    assert(Feature.starts_with('-') || Feature.starts_with('+'));
+    CmdArgs.push_back("-target-feature");
+    CmdArgs.push_back(Args.MakeArgString(Feature));
+  } while (!FeatureStr.empty());
+}
diff --git a/clang/lib/Driver/ToolChains/Arch/RISCV.h b/clang/lib/Driver/ToolChains/Arch/RISCV.h
index 388786b9c4c1f..3e43430051cca 100644
--- a/clang/lib/Driver/ToolChains/Arch/RISCV.h
+++ b/clang/lib/Driver/ToolChains/Arch/RISCV.h
@@ -28,6 +28,9 @@ std::string getRISCVArch(const llvm::opt::ArgList &Args,
                          const llvm::Triple &Triple);
 std::string getRISCVTargetCPU(const llvm::opt::ArgList &Args,
                               const llvm::Triple &Triple);
+void addMtuneWithFeatures(const llvm::opt::Arg *A,
+                          const llvm::opt::ArgList &Args,
+                          llvm::opt::ArgStringList &CmdArgs);
 } // end namespace riscv
 } // namespace tools
 } // end namespace driver
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index d326a81feb762..e39a8d4996089 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -2046,10 +2046,7 @@ void Clang::AddRISCVTargetArgs(const ArgList &Args,
 
   if (const Arg *A = Args.getLastArg(options::OPT_mtune_EQ)) {
     CmdArgs.push_back("-tune-cpu");
-    if (strcmp(A->getValue(), "native") == 0)
-      CmdArgs.push_back(Args.MakeArgString(llvm::sys::getHostCPUName()));
-    else
-      CmdArgs.push_back(A->getValue());
+    riscv::addMtuneWithFeatures(A, Args, CmdArgs);
   }
 
   // Handle -mrvv-vector-bits=<bits>
diff --git a/clang/test/Driver/riscv-cpus.c b/clang/test/Driver/riscv-cpus.c
index 5d5fdd72baedb..806c42bdad374 100644
--- a/clang/test/Driver/riscv-cpus.c
+++ b/clang/test/Driver/riscv-cpus.c
@@ -305,6 +305,11 @@
 // RUN: %clang --target=riscv64 -### -c %s 2>&1 -mtune=native | FileCheck -check-prefix=MTUNE-NATIVE %s
 // MTUNE-NATIVE-NOT: "-tune-cpu" "native"
 
+// Checking `-mtune=<tune cpu>:+x,-y,+z...`
+// RUN: %clang --target=riscv64 -### -c %s 2>&1 -mtune=sifive-e20:+xyz,-abc | FileCheck -check-prefix=MTUNE-WITH-FEATURES %s
+// MTUNE-WITH-FEATURES: "-tune-cpu" "sifive-e20"
+// MTUNE-WITH-FEATURES: "-target-feature" "+xyz" "-target-feature" "-abc"
+
 // -mcpu with default -march
 // RUN: %clang --target=riscv64 -### -c %s 2>&1 -mcpu=sifive-e20 | FileCheck -check-prefix=MCPU-SIFIVE-E20 %s
 // MCPU-SIFIVE-E20: "-nostdsysteminc" "-target-cpu" "sifive-e20"

@lenary
Copy link
Member

lenary commented Oct 9, 2025

Thank you for writing this up, and presenting something concrete.

I became convinced on the call that we cannot disallow disabling features, even though it would make our lives easier, and we probably need to be ready for dependencies between tune features, even though we don't have them today.

Without any special handling, subtracting features with LLVM's current feature string design can be tricky (e.g. -foo,+foo will still give you foo but not when you write +foo,-foo)

To dive into your mention of "Without special handling": We spent quite a lot of time/energy creating RISCVISAInfo for handling -march parsing, I think we should expect to do something similar (but less complex!) for handling -mtune parsing, rather than just writing string.split(...). This means we could disallow enabling and disabling the same tune feature in a string, and maybe track conflicts a bit better, to make things easier for the user?

The Arm syntax for mcpu/march (base)[\+(no)?(featurename)]* when written as a regex, where the presence of "no" means disable, and + is just delimiter. This might be clearer than adding meaning to + vs -, but I don't hold this opinion strongly. Arm does (almost?) no conflict tracking of their -march/-mcpu strings, which is IMO part of the reason their situation is complex.

In a bit more detail about may be the right way to do what i'm calling "conflict tracking":

  • Parse set of explicitly requested features
  • Parse set of explicitly rejected features
  • Overlaps between requested and rejected generate errors
  • Calculate implied enabled and disabled features from requested and rejected (separately).
  • Overlaps between implied_disabled and requested or implied_enabled and rejected likely generate warnings (maybe errors)
  • Calculate final set of enabled/disabled features for passing down to backend.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:RISC-V clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants