Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC] Preprocessor definition to specify data alignment strict(ness) #32

Closed
vineetgarc opened this issue Feb 7, 2023 · 25 comments
Closed

Comments

@vineetgarc
Copy link

I would like to propose a compiler generated preprocessor macro to signal data alignment heuristics used by the compiler for codegen.
This could be used by library writers to target code for efficient unaligned access (or not).
Granted we would like to eventually use runtime probing to figure this out, it would help the user to know what the compiler assumptions are anyways (or if user coaxed it into a certain semantics via additional toggles)

At a high level it reflects whether gcc toggle -mstrict-align has been used to build.

Speaking of gcc there' another wrinkle to worry about. gcc driver has a notion of cpu tune param which specifies whether unaligned accesses are generally efficient on the cpu (or not). And a cpu tune with slow_unaligned_access=true will disregard -mno-strict-align, compiler codegen assuming as if -mstrict-align was passed. So the proposed preprocessor macro needs to also reflect this.

Here's the specification:
__riscv_strict_align (only defined if -mstrict-align or cpu tune param slow_unaligned_access=true)
Value 1: if -mstrict-align
value 2: if cpu tune param specifies slow_unaligned_access = true

@vineetgarc
Copy link
Author

FWIW there's a gcc patch posted along those lines
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610115.html

@asb
Copy link

asb commented Feb 8, 2023

Thanks for opening this discussion. One thing I'm wondering about is whether it makes sense to couple this primarily to -mstrict-align / -mno-strict-align.

The RISC-V profiles proposal finally introduces a way to indicate whether misaligned loads/stores are supported or not (Zicclsm, which once supported by compilers should be queryable via __riscv_zicclsm as for any other extension).

Once you have that capability, might it be more intuitive to have a define dedicated solely to whether misaligned loads/stores are slow or fast? Not defined if Zicclsm isn't present, 1 if slow (potentially also if -mstrict-align was used, taking that as a hint that such accesses should be considered slow), 0 if not slow.

I'm wary of derailing this by trying to over-generalise it. But it's probably also worth at least considering if we'd like some naming scheme that reflected the extension that tuning hint is related to. e.g. __riscv_zicclsm_slow.

@vineetgarc
Copy link
Author

Thanks for opening this discussion. One thing I'm wondering about is whether it makes sense to couple this primarily to -mstrict-align / -mno-strict-align.

As specified, there's more information provided to user, they can choose to not use it. If a user cared only about -mstrict-align they could just do a #ifdef flag w/o caring for actual value.

The RISC-V profiles proposal finally introduces a way to indicate whether misaligned loads/stores are supported or not (Zicclsm, which once supported by compilers should be queryable via __riscv_zicclsm as for any other extension).

My read on Zicclsm is, that a cpu could be Zicclsm complaint despite very slow unaligned accesses as long as it supported them natively. The intent here is to allow riscv common library code to make that distinction, specifically how the code was built. We want to specify if unaligned access is prohibitive, vs. allowed but not desirable vs. as good as regular accesses. Granted I would personally settle for binary strict aligned vs. not semantics, but it seems tooling (gcc at least) already supports a finer grained distinction.

Once you have that capability, might it be more intuitive to have a define dedicated solely to whether misaligned loads/stores are slow or fast? Not defined if Zicclsm isn't present, 1 if slow (potentially also if -mstrict-align was used, taking that as a hint that such accesses should be considered slow), 0 if not slow.

Sure, that's one possible way to go about this. This came up as someone was already starting to write glibc code which needed this build time distinction. FWIW I wasn't aware of this extension and/or that it is still being discussed and wha the times lines of ratification are etc.

I'm wary of derailing this by trying to over-generalise it. But it's probably also worth at least considering if we'd like some naming scheme that reflected the extension that tuning hint is related to. e.g. __riscv_zicclsm_slow.

Personally I'd like to keep them separate (with people throwing additional defines if needed to) as in this case the extension driving this doesn't really matter as much vs. the actual codegen semantics.

@asb
Copy link

asb commented Feb 8, 2023

We want to specify if unaligned access is prohibitive, vs. allowed but not desirable vs. as good as regular accesses.

Assuming you meant "prohibited", I think my point is that getting the prohibited or not bit of information should be doable via probing zicclsm, so this define could be recast to focus just on the undesirable (slow) vs as good as regular access (fast) distinction.

That said, I don't know whether zicclsm being added to the RISC-V vocabulary is blocked on the profiles work as a whole being approved. If it does indeed look stuck, then I have a lot of sympathy for moving ahead with something that ignores it for pragmatic reasons, even if it is a shame that we end up with overlapping ways of expressing the same information.

@vineetgarc
Copy link
Author

Sounds good to me. It would be nice to have some clarity about Zicclsm from higher powers that may be.

@aswaterman
Copy link
Contributor

I think the profiles spec says it all, and @vineetgarc has summarized it accurately. Of Zicclsm, it says, "Even though mandated, misaligned loads and stores might execute extremely slowly. Standard software distributions should assume their existence only for correctness, not for performance."

This means to me that __riscv_zicclsm is not the right name for a macro that suggests misaligned accesses should be routinely used. __riscv_zicclsm_slow is denotatively much more accurate, but it has the downside of sounding pejorative. That's why I suggested something to the effect of __riscv_avoid_misaligned on the mailing list. At the moment, I don't have a better suggestion that incorporates both the Zicclsm name but avoids the bad connotation.

@aswaterman
Copy link
Contributor

How about inverting the polarity and defining __riscv_zicclsm_fast when the extension is present and presumed to be fast? This has a few advantages: it avoids the pejorative description of the slow variants, and it means the misaligned-optimized routines only need to check one macro, rather than checking both __riscv_zicclsm and !__riscv_zicclsm_slow.

Obviously, to remain consistent with the RVA profile spec, we need to arrange that the compilers only define __riscv_zicclsm_fast when explicitly tuning for a specific uarch. The default needs to be for this macro to remain undefined.

@kito-cheng
Copy link
Collaborator

Maybe just decouple with zicclsm? and made a more explicitly/specific macro name?


__riscv_unaligned_access

  • Undefined: Unknown
  • 0: Unsupported
  • 1: Support, but not guarantee fast or slow.
  • 2: Slow
  • 3: Fast
  • 4: Very fast

When zicclsm present __riscv_unaligned_access will at least 1.

@vineetgarc
Copy link
Author

Maybe just decouple with zicclsm? and made a more explicitly/specific macro name?

+1

__riscv_unaligned_access
Undefined: Unknown
0: Unsupported
1: Support, but not guarantee fast or slow.
2: Slow
3: Fast
4: Very fast

This seems like an overkill, considering what the end user of this define (library code writer) will typically target.
Moreover this needs to tie back to gcc codegen facilities so unless gcc has toggles for such fine granularity it won't know what value to generate (unless we invent new toggles). Anyhow the intention behind all of this was to indicate what compiler is codegen for (default or forced via explicit toggles) vs. a runtime characteristic which is better dynamically probed.

How about following

  • __riscv_unaligned_bad only define if -mstrict-align
  • __riscv_unaligned_slow only define if cpu tune param has slow_unaligned_access=true
  • __riscv_unaligned_fast for fast unaligned case (or maybe even drop this

@asb
Copy link

asb commented Feb 9, 2023

This means to me that __riscv_zicclsm is not the right name for a macro that suggests misaligned accesses should be routinely used.

Just to clarify, that isn't what I was suggesting. I was pointing out that with zicclsm we should already have a way of determining if misaligned loads are supported, which suggests a separate define can instead focus on if they're fast or not. So something like __riscv_zicclsm (already there due to the standard definitions for supported extension) and __riscv_zicclsm_fast appeals to me.

But...opinions can vary ,and the continued success of RISC-V isn't going to depend on whether you have defines with slightly overlapping definitions etc etc. So if there's consensus around some other option I'm not going to drag out the discussion.

@aswaterman
Copy link
Contributor

I think we’re actually in agreement.

@vineetgarc
Copy link
Author

The naming of options is really not important to me at least.
We just need to find a mapping from -mstrict-align and/or gcc cpu tune param slow_unaligned_access to these defines.

__riscv_zicclsm could just be a neutral artifact of corresponding -march option (but no actual alignment semantics ? )
__riscv_zicclsm_fast maps well to gcc cpu tune param slow_unaligned_access=false.

The question is what does -mstrict-align do to these and/or define yet something new.

@asb
Copy link

asb commented Feb 9, 2023

Perhaps if -mstrict-align is passed, then __riscv_zicclsm_fast is unset, on the basis that the user probably wanted to override any assumption that misaligned accesses are cheap to use.

@vineetgarc
Copy link
Author

Do we keep different values for __riscv_zicclsm_fast for -mno-strict-align vs. cpu tune slow_unaligned_access=false ?

@aswaterman
Copy link
Contributor

aswaterman commented Feb 9, 2023

I think it's just __riscv_zicclsm <=> -mno-strict-align

and

__riscv_zicclsm_fast <=> -mno-strict-align && slow_unaligned_access=false

i.e. just booleans.

@vineetgarc
Copy link
Author

vineetgarc commented Feb 9, 2023

think it's just __riscv_zicclsm <=> -mno-strict-align

Do note that users typically don't specify -mno-strict-align

__riscv_zicclsm_fast <=> -mno-strict-align && slow_unaligned_access=false

This would also be the case if -mno-strict-align is not specified with just slow_unaligned_access=false ?

One more point: slow_unaligned_access=true is treated same as -mstrict-align (which actually is the case even today, but I was hoping if the define could make this distinction).

@aswaterman
Copy link
Contributor

This would also be the case if -mno-strict-align is not specified with just slow_unaligned_access=false ?

One more point: slow_unaligned_access=true is treated same as -mstrict-align (which actually is the case even today, but I was hoping if the define could make this distinction).

Seems like this is getting far into the GCC implementation details. The important point is that __riscv_zicclsm_fast should be defined only if misaligned accesses are both legal and high performance.

@aswaterman
Copy link
Contributor

Do note that users typically don't specify -mno-strict-align

I know. It's the default. I don't think it changes what I wrote.

@kito-cheng
Copy link
Collaborator

I support your proposal for this version:

__riscv_unaligned_bad only define if -mstrict-align
__riscv_unaligned_slow only define if cpu tune param has slow_unaligned_access=true
__riscv_unaligned_fast for fast unaligned case (or maybe even drop this

Personally I would like to prevent zicclsm since it's kind of not clear to user, but it's not strongly opinion.

@vineetgarc could you prepare a PR? I am happy to help pushing this forward :)

@vineetgarc
Copy link
Author

I could but it seems world has moved on :-)
Rather than baking this at compile time, dynamic probe is starting to trickle in.

@kito-cheng
Copy link
Collaborator

I thought that still could help people to write some multi version code?

@vineetgarc
Copy link
Author

So this proposal relies on compiler supporting a cpu tune param which can override -mno-strict-align. gcc supports that I'm not sure if llvm does ? And if not, how do we incorporate that into the common c-api doc ? Would it be left as "implementation defined". This is specially important since the default value (in lack of an explicit cmdline toggle) is derived from cpu tune param

@kito-cheng
Copy link
Collaborator

My thought is we only define the mean of the macro, not define the implementation detail, like:

e.g.
__riscv_unaligned_bad: do not doing unaligned access, it might cause trap or extremely slow.
__riscv_unaligned_slow: unaligned access is much slow than aligned access.
__riscv_unaligned_fast: unaligned case is fast.

@vineetgarc
Copy link
Author

vineetgarc commented May 10, 2023

Here's the pull request for same: #40

vineetgarc pushed a commit to vineetgarc/riscv-c-api-doc that referenced this issue May 10, 2023
riscv-non-isa#32

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
vineetgarc pushed a commit to vineetgarc/riscv-c-api-doc that referenced this issue May 11, 2023
riscv-non-isa#32

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
vineetgarc pushed a commit to vineetgarc/riscv-c-api-doc that referenced this issue May 11, 2023
riscv-non-isa#32

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
vineetgarc pushed a commit to vineetgarc/riscv-c-api-doc that referenced this issue May 17, 2023
riscv-non-isa#32

Signed-off-by: Vineet Gupta <vineetg@rivosinc.com>
@vineetgarc
Copy link
Author

This was merged upstream as a98e309

wangliu-iscas pushed a commit to plctlab/patchwork-gcc that referenced this issue Aug 15, 2023
This patch is a modification of
https://gcc.gnu.org/pipermail/gcc-patches/2023-January/610115.html
following the discussion on
riscv-non-isa/riscv-c-api-doc#32

Distinguish between explicit -mstrict-align and cpu tune param
for slow_unaligned_access=true/false.

Tested for regressions using rv32/64 multilib with newlib/linux

gcc/ChangeLog:

	* config/riscv/riscv-c.cc (riscv_cpu_cpp_builtins):
	  Generate __riscv_unaligned_avoid with value 1 or
             __riscv_unaligned_slow with value 1 or
             __riscv_unaligned_fast with value 1
	* config/riscv/riscv.cc (riscv_option_override):
    Define riscv_user_wants_strict_align. Set
    riscv_user_wants_strict_align to TARGET_STRICT_ALIGN
	* config/riscv/riscv.h: Declare riscv_user_wants_strict_align

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/attribute-1.c: Check for
    __riscv_unaligned_slow or __riscv_unaligned_fast
	* gcc.target/riscv/attribute-4.c: Check for
    __riscv_unaligned_avoid
	* gcc.target/riscv/attribute-5.c: Check for
    __riscv_unaligned_slow or __riscv_unaligned_fast
	* gcc.target/riscv/predef-align-1.c: New test.
	* gcc.target/riscv/predef-align-2.c: New test.
	* gcc.target/riscv/predef-align-3.c: New test.
	* gcc.target/riscv/predef-align-4.c: New test.
	* gcc.target/riscv/predef-align-5.c: New test.
	* gcc.target/riscv/predef-align-6.c: New test.

Signed-off-by: Edwin Lu <ewlu@rivosinc.com>
Co-authored-by: Vineet Gupta <vineetg@rivosinc.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants