[RFC] Preprocessor definition to specify data alignment strict(ness) #32
Comments
FWIW there's a gcc patch posted along those lines |
Thanks for opening this discussion. One thing I'm wondering about is whether it makes sense to couple this primarily to -mstrict-align / -mno-strict-align. The RISC-V profiles proposal finally introduces a way to indicate whether misaligned loads/stores are supported or not (Zicclsm, which once supported by compilers should be queryable via Once you have that capability, might it be more intuitive to have a define dedicated solely to whether misaligned loads/stores are slow or fast? Not defined if Zicclsm isn't present, 1 if slow (potentially also if -mstrict-align was used, taking that as a hint that such accesses should be considered slow), 0 if not slow. I'm wary of derailing this by trying to over-generalise it. But it's probably also worth at least considering if we'd like some naming scheme that reflected the extension that tuning hint is related to. e.g. |
As specified, there's more information provided to the user, and they can choose not to use it. If a user cared only about -mstrict-align, they could just do an #ifdef on the flag without caring about the actual value.
My read on Zicclsm is that a cpu could be Zicclsm compliant despite very slow unaligned accesses, as long as it supported them natively. The intent here is to allow riscv common library code to make that distinction, specifically based on how the code was built. We want to specify whether unaligned access is prohibitive vs. allowed but not desirable vs. as good as regular accesses. Granted, I would personally settle for binary strict-aligned vs. not semantics, but it seems tooling (gcc at least) already supports a finer-grained distinction.
Sure, that's one possible way to go about this. This came up as someone was already starting to write glibc code which needed this build-time distinction. FWIW I wasn't aware of this extension, that it is still being discussed, or what the timelines for ratification are.
Personally I'd like to keep them separate (with people throwing additional defines if needed to) as in this case the extension driving this doesn't really matter as much vs. the actual codegen semantics. |
Assuming you meant "prohibited", I think my point is that getting the prohibited or not bit of information should be doable via probing zicclsm, so this define could be recast to focus just on the undesirable (slow) vs as good as regular access (fast) distinction. That said, I don't know whether zicclsm being added to the RISC-V vocabulary is blocked on the profiles work as a whole being approved. If it does indeed look stuck, then I have a lot of sympathy for moving ahead with something that ignores it for pragmatic reasons, even if it is a shame that we end up with overlapping ways of expressing the same information. |
Sounds good to me. It would be nice to have some clarity about Zicclsm from the powers that be. |
I think the profiles spec says it all, and @vineetgarc has summarized it accurately. Of Zicclsm, it says, "Even though mandated, misaligned loads and stores might execute extremely slowly. Standard software distributions should assume their existence only for correctness, not for performance." This means to me that |
How about inverting the polarity and defining __riscv_zicclsm_fast when the extension is present and presumed to be fast? This has a few advantages: it avoids the pejorative description of the slow variants, and it means the misaligned-optimized routines only need to check one macro, rather than checking both __riscv_zicclsm and !__riscv_zicclsm_slow. Obviously, to remain consistent with the RVA profile spec, we need to arrange that the compilers only define __riscv_zicclsm_fast when explicitly tuning for a specific uarch. The default needs to be for this macro to remain undefined. |
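To make the "one macro instead of two" argument concrete, here is a minimal sketch contrasting the two polarities. The macro names are the ones floated in this thread and are only proposals at this point; the two function names are hypothetical and exist purely for illustration.

```c
/* "Slow" polarity: a misaligned-optimized routine has to check that the
 * extension is present AND that it has not been flagged as slow. */
static int use_misaligned_routines_slow_polarity(void)
{
#if defined(__riscv_zicclsm) && !defined(__riscv_zicclsm_slow)
    return 1;
#else
    return 0;
#endif
}

/* "Fast" polarity: a single check suffices, and the default (macro left
 * undefined) conservatively selects the aligned-only routines. */
static int use_misaligned_routines_fast_polarity(void)
{
#if defined(__riscv_zicclsm_fast)
    return 1;
#else
    return 0;
#endif
}
```

With the fast polarity, a build that says nothing about tuning falls through to the safe path, which is the behaviour the profile text asks for.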
Maybe just decouple this from zicclsm and use a more explicit, specific macro name?
When zicclsm is present, __riscv_unaligned_access will be at least 1. |
+1
This seems like overkill, considering what the end user of this define (the library code writer) will typically target. How about the following:
|
Just to clarify, that isn't what I was suggesting. I was pointing out that with zicclsm we should already have a way of determining if misaligned loads are supported, which suggests a separate define can instead focus on whether they're fast or not. So something like But... opinions can vary, and the continued success of RISC-V isn't going to depend on whether you have defines with slightly overlapping definitions. So if there's consensus around some other option, I'm not going to drag out the discussion. |
I think we’re actually in agreement. |
The naming of options is really not important, to me at least. __riscv_zicclsm could just be a neutral artifact of the corresponding -march option (but with no actual alignment semantics?). The question is what -mstrict-align does to these, and/or whether to define something new. |
Perhaps if |
Do we keep different values for __riscv_zicclsm_fast for -mno-strict-align vs. cpu tune slow_unaligned_access=false ? |
I think it's just and
i.e. just booleans. |
Do note that users typically don't specify
This would also be the case if One more point: |
Seems like this is getting far into the GCC implementation details. The important point is that |
I know. It's the default. I don't think it changes what I wrote. |
I support your proposal for this version:
Personally I would like to prevent @vineetgarc could you prepare a PR? I am happy to help pushing this forward :) |
I could, but it seems the world has moved on :-) |
I thought it could still help people write multi-versioned code? |
So this proposal relies on the compiler supporting a cpu tune param which can override -mno-strict-align. gcc supports that; I'm not sure if llvm does. And if not, how do we incorporate that into the common c-api doc? Would it be left as "implementation defined"? This is especially important since the default value (in the absence of an explicit cmdline toggle) is derived from the cpu tune param. |
My thought is that we only define the meaning of the macro, not the implementation details, e.g. |
Here's the pull request for same: #40 |
riscv-non-isa#32 Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
This was merged upstream as a98e309 |
This patch is a modification of https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610115.html following the discussion on riscv-non-isa/riscv-c-api-doc#32

Distinguish between explicit -mstrict-align and cpu tune param for slow_unaligned_access=true/false.

Tested for regressions using rv32/64 multilib with newlib/linux

gcc/ChangeLog:

* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins): Generate __riscv_unaligned_avoid with value 1 or __riscv_unaligned_slow with value 1 or __riscv_unaligned_fast with value 1
* config/riscv/riscv.cc (riscv_option_override): Define riscv_user_wants_strict_align. Set riscv_user_wants_strict_align to TARGET_STRICT_ALIGN
* config/riscv/riscv.h: Declare riscv_user_wants_strict_align

gcc/testsuite/ChangeLog:

* gcc.target/riscv/attribute-1.c: Check for __riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/attribute-4.c: Check for __riscv_unaligned_avoid
* gcc.target/riscv/attribute-5.c: Check for __riscv_unaligned_slow or __riscv_unaligned_fast
* gcc.target/riscv/predef-align-1.c: New test.
* gcc.target/riscv/predef-align-2.c: New test.
* gcc.target/riscv/predef-align-3.c: New test.
* gcc.target/riscv/predef-align-4.c: New test.
* gcc.target/riscv/predef-align-5.c: New test.
* gcc.target/riscv/predef-align-6.c: New test.

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
Co-authored-by: Vineet Gupta <vineetg@rivosinc.com>
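For illustration, library code could branch on the three macros named in the ChangeLog above. This is only a sketch: it assumes at most one of the macros is defined in a given build, the enum and function names are hypothetical, and the "unknown" fallback covers compilers or targets that define none of them.

```c
/* Hypothetical classification of the build-time alignment policy. */
enum unaligned_policy {
    UNALIGNED_UNKNOWN, /* none of the macros is defined */
    UNALIGNED_AVOID,   /* __riscv_unaligned_avoid is defined */
    UNALIGNED_SLOW,    /* __riscv_unaligned_slow is defined */
    UNALIGNED_FAST     /* __riscv_unaligned_fast is defined */
};

/* Report which macro, if any, the compiler defined for this build. */
static enum unaligned_policy build_time_unaligned_policy(void)
{
#if defined(__riscv_unaligned_fast)
    return UNALIGNED_FAST;
#elif defined(__riscv_unaligned_slow)
    return UNALIGNED_SLOW;
#elif defined(__riscv_unaligned_avoid)
    return UNALIGNED_AVOID;
#else
    return UNALIGNED_UNKNOWN;
#endif
}

/* Human-readable name, e.g. for a build banner or debug log. */
static const char *unaligned_policy_name(enum unaligned_policy p)
{
    switch (p) {
    case UNALIGNED_AVOID: return "avoid";
    case UNALIGNED_SLOW:  return "slow";
    case UNALIGNED_FAST:  return "fast";
    default:              return "unknown";
    }
}
```

A routine that only has a misaligned-optimized variant and a safe variant would typically select the former for UNALIGNED_FAST alone and treat every other value, including "unknown", as a reason to stay on the safe path.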
I would like to propose a compiler generated preprocessor macro to signal data alignment heuristics used by the compiler for codegen.
This could be used by library writers to target code for efficient unaligned access (or not).
Granted, we would eventually like to use runtime probing to figure this out, but it would still help the user to know what the compiler's assumptions are (or whether the user coaxed it into particular semantics via additional toggles).
At a high level, it reflects whether the gcc toggle -mstrict-align was used for the build.
Speaking of gcc, there's another wrinkle to worry about. The gcc driver has a notion of a cpu tune param which specifies whether unaligned accesses are generally efficient on the cpu (or not). A cpu tune with slow_unaligned_access=true will disregard -mno-strict-align, and the compiler generates code as if -mstrict-align had been passed. So the proposed preprocessor macro needs to reflect this as well.
Here's the specification:
__riscv_strict_align (only defined if -mstrict-align is used or the cpu tune param specifies slow_unaligned_access=true)
Value 1: if -mstrict-align
Value 2: if the cpu tune param specifies slow_unaligned_access=true
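As a rough sketch of how library code might consume the macro as specified above: the routine below compares word-wise only when __riscv_strict_align is absent, and treats both defined values (1 and 2) the same. The function name bytes_equal is hypothetical, and the fixed-size memcpy is just a portable way to express a possibly misaligned load.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the first n bytes of a and b match, 0 otherwise. */
static int bytes_equal(const void *a, const void *b, size_t n)
{
    const unsigned char *pa = a;
    const unsigned char *pb = b;

#if !defined(__riscv_strict_align)
    /* Macro absent: misaligned accesses are assumed acceptable, so compare
     * eight bytes at a time regardless of pointer alignment. */
    while (n >= sizeof(uint64_t)) {
        uint64_t wa, wb;
        memcpy(&wa, pa, sizeof wa);
        memcpy(&wb, pb, sizeof wb);
        if (wa != wb)
            return 0;
        pa += sizeof wa;
        pb += sizeof wb;
        n -= sizeof wa;
    }
#endif

    /* Tail bytes, or the whole buffer when strict alignment is in effect. */
    while (n--) {
        if (*pa++ != *pb++)
            return 0;
    }
    return 1;
}
```

Code that only cares whether -mstrict-align semantics are in effect can use a plain #ifdef as here; code that wants to distinguish the explicit flag from the tune-derived case would compare the macro's value against 1 or 2.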