[RISCV][NO-MERGE] Discussions on passing tuning features from the Clang driver #162716
Author: Min-Yih Hsu (mshockwave)
Thank you for writing this up, and presenting something concrete. I became convinced on the call that we cannot disallow disabling features, even though it would make our lives easier, and we probably need to be ready for dependencies between tune features, even though we don't have them today.
To dive into your mention of "Without special handling": We spent quite a lot of time/energy creating RISCVISAInfo for handling […]. The Arm syntax for mcpu/march […]. In a bit more detail about […] may be the right way to do what I'm calling "conflict tracking":
Following up on the discussions in the RISC-V LLVM Syncup Meeting on Oct 9, I'm summarizing in this PR the related background as well as the concerns folks raised during the meeting regarding the future direction of passing tuning features from the Clang driver. Feel free to use this PR as a discussion thread on this topic.
More flexible RISC-V scheduling models
The original problem that motivated this whole body of work was that, for a long time, if we wanted to share a single scheduling model among multiple CPUs that differ only in a small number of tuning features (e.g. whether it has fast vrgather), the only way was to create a base TableGen class that is subsequently parameterized and instantiated into a scheduling model for each combination of those tuning features. This TableGen-only solution doesn't scale: with n independent on/off tuning features we may need up to 2^n scheduling models, so the number of models will explode as more tuning features are added in the future.
I tried to tackle this problem by allowing a scheduling model to be configured by subtarget features, via a new SchedPredicate called `FeatureSchedPredicate` (#161888), so that we can create multiple CPUs / tune CPUs with only a single scheduling model.
The problem is that the CPU / tune CPU, which map to `-mcpu` and `-mtune` in the Clang driver, respectively, now become the part that doesn't scale, as we have to create a new `-mcpu` / `-mtune` for each combination of features & scheduling model.
Ways of passing tuning features at this moment
The problem boils down to finding a proper way to pass tuning features from the Clang driver (we're only discussing the driver here, as you can always pass features to the frontend via `-target-feature`). As a starter, it's important to know what we're doing at this moment, which can be categorized into three ways:
- […] `RISCVCPUInfo` data structure generated by RISC-V's TargetParser. This data structure is queried by the driver, which generates the corresponding `-target-feature` upon seeing a matching `-mcpu`.
- […] `-ffixed-<register name>` and `-mrelax`.
- […] `m_riscv_Features_Group` will be automatically translated into the corresponding feature names based on their driver flag names. Currently only `-msave-restore` / `-mno-save-restore` use this.
It is also worth mentioning that RVI is considering adding Software Optimization Guidance Options, a concept similar to a RISC-V extension (whose name starts with "O") that only provides uArch/implementation hints to the compiler rather than representing any architecture-level contract. Despite being closely related to what we're discussing here, these options/extensions will go into the march string.
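To make two of the existing mechanisms above concrete, here is a rough Python sketch (the real logic lives in Clang's C++ driver, so this is only an illustration). The CPU table, its CPU names, and both function names are hypothetical; the `-mfoo` / `-mno-foo` naming rule is an assumption about how flag names map to feature names.

```python
# Illustrative sketch only; not the actual Clang driver code.
# The CPU names and the feature list in this table are made up.
HYPOTHETICAL_CPU_FEATURES = {
    "example-cpu-a": ["fast-vrgather"],
    "example-cpu-b": [],
}


def features_from_mcpu(cpu):
    """Mimic a per-CPU table lookup that yields -target-feature flags."""
    flags = []
    for feature in HYPOTHETICAL_CPU_FEATURES.get(cpu, []):
        flags += ["-target-feature", "+" + feature]
    return flags


def feature_from_m_flag(flag):
    """Derive a feature toggle from a driver flag name.

    Assumed convention: "-mfoo" enables "foo", "-mno-foo" disables it.
    """
    if not flag.startswith("-m"):
        raise ValueError("not an -m flag: " + flag)
    name = flag[2:]
    if name.startswith("no-"):
        return ["-target-feature", "-" + name[3:]]
    return ["-target-feature", "+" + name]
```

Under these assumptions, `feature_from_m_flag("-mno-save-restore")` yields `["-target-feature", "-save-restore"]`, while an unknown CPU name simply produces no extra flags.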
Candidate solutions
We have roughly two candidate solutions on the table now:
New syntax for `-mtune`
Introducing an alternative syntax to `-mtune` in addition to the current one: `-mtune=<tune cpu name>:+x,-y,+z...`, where `+x,-y,+z` is the feature string that will be converted into individual `-target-feature` flags. Code in this PR contains a prototype of this approach.
Pros:
Cons:
- […] (`-foo,+foo` will still give you `foo`, but not when you write `+foo,-foo`)
- […] `-mtune` string.
Toggle tuning features with the existing `-mfoo` mechanism
Feature `foo` will be controlled by `-mfoo` & `-mno-foo`.
Pros:
- […] `-mtune` approach, because the mechanism of `-mfoo` & `-mno-foo` is well defined.
Cons:
- […] `-mtune` approach: the number of `-m` flags will increase really fast (we need at least two flags per subtarget feature) upon adding more tuning features in the future.
There is another challenge we have to address in both options: GCC compatibility. We have to talk with the GCC folks on this topic, as we almost certainly want a consistent interface and behavior across these two compilers.
I hope I captured everything we discussed in the meeting. Feel free to point out things that I missed.