
[Vector ABI] Type size and alignment for vector types. #380

Merged: 1 commit merged into master on Jan 10, 2024

Conversation

@kito-cheng (Collaborator) commented May 18, 2023

The issue of vector alignment is discussed in #347. It is mentioned that aligning to 128 bits might deliver better performance on some RISC-V cores, but this behavior could lead to considerable stack wastage on zve32 and zve64 cores. For instance, to ensure that a vector value on the stack conforms to the ABI specification, we could waste up to 96 bits per vector object on the stack for zve32, and the performance difference isn't always evident across all core implementations.

Therefore, this proposal sets the alignment of vector types to element alignment, to avoid wasting a significant amount of stack space in zve32 and zve64 configurations. The ABI only specifies the minimum alignment, so it does not prevent a compiler from adopting a higher alignment for specific CPUs.

Fix #347.
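
To make the arithmetic concrete, here is a minimal sketch (plain C, assuming the smallest zve32 configuration with VLEN = 32 and 32-bit elements; the numbers are the ones quoted above, not part of any API):

```c
#include <stdio.h>

/* Worked sketch of the stack-wastage arithmetic above, assuming the
 * smallest zve32 configuration (VLEN = 32) and an int32 element type. */
int main(void) {
    const unsigned vlen_bits = 32;       /* zve32 minimum VLEN             */
    const unsigned abi_align_bits = 128; /* the rejected 128-bit alignment */
    const unsigned elem_align_bits = 32; /* element alignment for int32    */

    /* A 32-bit vector object placed in a 128-bit-aligned slot wastes the
     * rest of the slot; with element alignment it wastes nothing.        */
    unsigned waste_128 = abi_align_bits - vlen_bits;  /* 96 bits */
    unsigned waste_elem = vlen_bits % elem_align_bits; /* 0 bits */

    printf("128-bit alignment wastes up to %u bits per object\n", waste_128);
    printf("element alignment wastes %u bits\n", waste_elem);
    return 0;
}
```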


@rofirrim (Contributor) left a comment


Thanks @kito-cheng for this.

> The issue of Vector alignment is discussed in #347. It is mentioned that aligning to 128 bytes might deliver better performance on some RISC-V cores,

I think you meant 128 bits? (later on you say 96 bits which would match with bits).

Just to confirm the intent of this change: this is always the minimum alignment, and a compiler implementation/platform may choose to use a larger value. Also, a user could use __attribute__((aligned(16))) (or similar mechanisms) to enforce larger alignments.

Now this probably only impacts stack variables, but if we ever have aggregates of vector type, the alignment will be used there when computing offsets. Maybe I'm missing some other case.
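
A minimal sketch of that intent, illustrated with a generic GCC/Clang vector type rather than an RVV sizeless type (sizeof/_Alignof are not defined for sizeless types), showing that the ABI alignment is only a floor the user can raise:

```c
#include <stdint.h>

/* Generic vector stands in for an RVV vector object here; the mechanism
 * being discussed is the same: the ABI alignment is a minimum.          */
typedef int32_t v4i32 __attribute__((vector_size(16)));

v4i32 plain_vec;                             /* default ABI alignment    */
v4i32 wide_vec __attribute__((aligned(64))); /* user-enforced alignment  */

_Static_assert(_Alignof(v4i32) >= _Alignof(int32_t),
               "never below element alignment");
```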

@kito-cheng (Collaborator, Author) replied

> I think you meant 128 bits? (later on you say 96 bits which would match with bits).

Yeah, should be bits.

> Just to confirm the intent of this change: this is always the minimum alignment, and a compiler implementation/platform may choose to use a larger value. Also, a user could use __attribute__((aligned(16))) (or similar mechanisms) to enforce larger alignments.

> Now this probably only impacts stack variables, but if we ever have aggregates of vector type, the alignment will be used there when computing offsets. Maybe I'm missing some other case.

Yes, the compiler always has the freedom to use a bigger/higher alignment, and this mostly affects stack objects. Another case that will be affected is __attribute__((riscv_rvv_vector_bits(N))) [1], since with that attribute an RVV type can also live as a global variable or within a struct.

[1] https://reviews.llvm.org/D145088
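
A minimal sketch of that second case, assuming Clang with -march=rv64gcv -mrvv-vector-bits=128 (the review cited above), so the attribute argument matches the configured vector length:

```c
#include <riscv_vector.h>

/* The attribute argument must equal the value chosen by
 * -mrvv-vector-bits; 128 is assumed here for illustration.              */
typedef vint32m1_t fixed_vint32m1_t
    __attribute__((riscv_rvv_vector_bits(128)));

/* Unlike the sizeless vint32m1_t, the fixed-length type has a known size
 * and alignment, so it can be a global variable or a struct member,
 * which is where the ABI alignment determines field offsets.            */
fixed_vint32m1_t g_vec;

struct two_vecs {
    fixed_vint32m1_t lo;
    fixed_vint32m1_t hi; /* offset decided by the type's ABI alignment */
};

_Static_assert(sizeof(struct two_vecs) == 32, "two 128-bit members");
```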

@zhongjuzhe commented

| Internal Name | Type | Size | Alignment
| __rvv_vbool1_t | vbool1_t | VLENB | 1
| __rvv_vbool2_t | vbool2_t | VLENB | 1
| __rvv_vbool4_t | vbool4_t | VLENB | 1
| __rvv_vbool8_t | vbool8_t | VLENB | 1
| __rvv_vbool16_t | vbool16_t | VLENB | 1
| __rvv_vbool32_t | vbool32_t | VLENB | 1
| __rvv_vbool64_t | vbool64_t | VLENB | 1

I understand that LLVM adjusts the mask type sizes as the doc says.

However, GCC does not adjust the byte size of these mask types in the same way.

GCC configuration for BOOL type
vbool1_t VLENB
vbool2_t VLENB / 2
vbool4_t VLENB / 4
vbool8_t VLENB / 8
vbool16_t CEIL (VLENB / 16)
vbool32_t CEIL (VLENB / 32)
vbool64_t CEIL (VLENB / 64)

I am not sure whether this inconsistency between GCC and LLVM needs to be documented in the ABI doc.

Otherwise, LGTM. Thanks.
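
For concreteness, a small sketch contrasting the two sizing schemes described above, assuming VLEN = 128 (so VLENB = 16); doc_size and gcc_size are illustrative helpers, not real APIs:

```c
#include <stdio.h>

/* VLENB is VLEN in bytes; VLEN = 128 is assumed for this sketch. */
#define VLENB 16u

/* Size per the table quoted above: every vboolN_t occupies VLENB bytes. */
static unsigned doc_size(unsigned n) { (void)n; return VLENB; }

/* GCC's compact layout: CEIL(VLENB / N) bytes for vboolN_t. */
static unsigned gcc_size(unsigned n) { return (VLENB + n - 1) / n; }

int main(void) {
    for (unsigned n = 1; n <= 64; n *= 2)
        printf("vbool%u_t: doc = %u bytes, GCC = %u bytes\n",
               n, doc_size(n), gcc_size(n));
    return 0;
}
```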

@kito-cheng (Collaborator, Author) replied

> I am not sure whether this inconsistency between GCC and LLVM needs to be documented in the ABI doc.
>
> Otherwise, LGTM. Thanks.

Yeah, I know GCC does it in a more compact way; that's the reason I added the note at the end:

> NOTE: The vector mask types utilize a portion of the space, while the remaining content may be undefined, both in the register and in memory.

Thanks for review :)

@kito-cheng (Collaborator, Author) commented

Changes:

  • Adjust size for vector mask types

@kito-cheng (Collaborator, Author) commented Jun 9, 2023

The motivation for updating the vector mask types to a more compact size: this allows the compiler to store them compactly without violating the ABI.
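
A minimal sketch of what the compact layout buys, assuming the standard RVV intrinsics (vsm.v/vlm.v transfer only ceil(vl / 8) bytes of mask data); spill_mask and reload_mask are illustrative names, not real APIs:

```c
#include <riscv_vector.h>
#include <stddef.h>
#include <stdint.h>

/* Once the ABI size of vbool32_t shrinks to CEIL(VLENB / 32) bytes, a
 * spill slot of exactly that size can hold the whole mask.              */
void spill_mask(uint8_t *slot, vbool32_t mask, size_t vl) {
    __riscv_vsm_v_b32(slot, mask, vl);   /* stores ceil(vl / 8) bytes */
}

void reload_mask(const uint8_t *slot, vbool32_t *mask, size_t vl) {
    *mask = __riscv_vlm_v_b32(slot, vl); /* loads them back */
}
```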

@kito-cheng (Collaborator, Author) commented

ping @rofirrim @asb @topperc @preames

I would like to get some feedback from the LLVM community before merging :)

@rofirrim (Contributor) commented Jun 9, 2023

> ping @rofirrim @asb @topperc @preames
>
> I would like to get some feedback from the LLVM community before merging :)

No objections on my side, everything looks reasonable to me.

@kito-cheng (Collaborator, Author) commented Jun 12, 2023

Changes:

  • Added vector fp16 and vector bfloat16 types.

@kito-cheng (Collaborator, Author) commented

Changes:

  • Add a simple generator script for generating the table.
  • Add tuple types.

@workingjubilee left a comment


I was reviewing this and I had a few thoughts.

@kito-cheng (Collaborator, Author) commented

Going to merge this, thanks everyone :)

@kito-cheng merged commit 74ec57e into master on Jan 10, 2024 (4 checks passed).
@kito-cheng deleted the vector-align branch on January 10, 2024 at 02:32.