8262355: Support for AVX-512 opmask register allocation. #2768
AVX-512 added eight new 64-bit opmask registers (k0-k7). These registers allow conditional execution and efficient merging of destination operands. At present, cross-instruction mask propagation is done either through a GPR (e.g. the vmask_gen patterns in x86.ad) or through a vector register (to propagate the results of vector comparison or vector load mask operations).
This base patch extends the register allocator to support the allocation of opmask registers. This will facilitate mask propagation across instructions and thus enable emitting efficient instruction sequences on x86 targets that support AVX-512.
We intend to build a robust optimization framework on top of this patch to emit optimized instruction sequences for masked/predicated vector operations on x86 targets that support AVX-512.
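A scalar model may help make "conditional execution and efficient merging" concrete. The C++ sketch below is not HotSpot code, and masked_add_merge and masked_add_zero are hypothetical helpers. It emulates an 8-lane add over 64-bit elements in which opmask bit i controls lane i. With merge-masking ({k}), unselected lanes keep the old destination value. With zero-masking ({k}{z}), they are cleared.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Toy model: an 8-lane vector of 64-bit integers (e.g. a 512-bit register of
// longs). Bit i of the 8-bit opmask k controls lane i.
using Vec8 = std::array<std::int64_t, 8>;

// Merge-masking ({k}): lanes whose mask bit is 0 keep the old destination value.
Vec8 masked_add_merge(const Vec8& dst, const Vec8& a, const Vec8& b, std::uint8_t k) {
  Vec8 out = dst;
  for (int i = 0; i < 8; ++i) {
    if ((k >> i) & 1) {
      out[i] = a[i] + b[i];
    }
  }
  return out;
}

// Zero-masking ({k}{z}): lanes whose mask bit is 0 are set to zero.
Vec8 masked_add_zero(const Vec8& a, const Vec8& b, std::uint8_t k) {
  Vec8 out{};  // value-initialized: all lanes start at zero
  for (int i = 0; i < 8; ++i) {
    if ((k >> i) & 1) {
      out[i] = a[i] + b[i];
    }
  }
  return out;
}
```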
Please review and share your feedback.
Summary of changes:
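Reference: Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture, Section 15.1.3: https://software.intel.com/content/www/us/en/develop/download/intel-64-and-ia-32-architectures-software-developers-manual-volume-1-basic-architecture.html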
Good work, Jatin!
I'd like to focus on high-level aspects first.
There's a significant amount of crux coming from the fact that masks
Also, my understanding is AArch64/SVE allows predicate registers to be
The second question is about x86 and the different mask representations it has:
Regarding the patch itself, RegisterSaver support and related changes
On 28.02.2021 21:40, Jatin Bhateja wrote:
@XiaohongGong also posted Arm SVE predicate register allocation support to panama-vector last week, before this PR, together with other commits for vector masking support: openjdk/panama-vector#40. The predicate register allocation part has been tested internally for some time and could be separated from that PR (openjdk/panama-vector@e658f4d). If it helps, we can also propose a patch here in openjdk/jdk.
Yes, AArch64/SVE predicate registers could be larger. I see that Jatin's patch has an arch-dependent type, Matcher::predicate_reg_type(), which looks hacky but workable. Still, I would prefer a dedicated type, which looks cleaner. Would a dedicated type also work for k-registers?
Yes, as @nsjian mentioned above, we added a new mask type that maps to a predicate register. Besides, to distinguish it from the old vector IR nodes that use vector registers for masks on other platforms, we also added a new abstract IR (
The current register allocation framework allocates at 32-bit granularity. Thus, to allocate a 64-bit register, we reserve two bits in the register mask of the corresponding live range. The spilling code (MachSpillCopyNode::implementation) is also sensitive to this, since a 32-bit def in 64-bit mode spills only a 32-bit value.
The opmask register is special in that its usable portion can be 8, 16, 32 or 64 bits wide, depending on the lane type and vector size. Thus, an optimal implementation would have both the allocator and the spill code allocate and spill only the usable portion of the opmask register. This may not be possible in the current allocation framework.
To keep this added complexity out of the implementation, the existing patch performs both allocation and spilling at 64-bit granularity. This is why a LONG type is sufficient to represent an opmask register on x86.
I agree that Arm SVE may need a new mask Type, since performing spills and allocation at the widest possible mask size would be costly. Thus, Matcher::predicate_reg_type() can return the LONG type for x86 and a new mask type for Arm SVE. This avoids any modification to the target-independent IR.
Also, on x86 a mask-generating node may have a different Ideal type and register on non-AVX512 targets.
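A toy model of the two points above, under stated assumptions: SlotMask, find_aligned_pair and spill_and_reload are hypothetical names, not C2's RegMask API. The sketch assumes a 32-slot mask in which each bit stands for one 32-bit allocation unit, and that a 64-bit value needs an aligned pair of slots (low slot even). The reload helper shows why a spill that moves only one slot loses the upper half of a 64-bit opmask value.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical simplified register mask: bit i set means 32-bit slot i is
// still available to the live range. This is a toy, not C2's RegMask.
using SlotMask = std::uint32_t;

// Returns the low slot of the first aligned pair (2n, 2n+1) that is fully
// available, or -1 if there is none. A 64-bit value occupies such a pair.
int find_aligned_pair(SlotMask mask) {
  for (int lo = 0; lo < 32; lo += 2) {
    SlotMask pair = (SlotMask{1} << lo) | (SlotMask{1} << (lo + 1));
    if ((mask & pair) == pair) {
      return lo;
    }
  }
  return -1;
}

// Models a spill/reload round trip: a copy that moves one 32-bit slot keeps
// only the low half of the value, while a two-slot copy keeps all 64 bits.
std::uint64_t spill_and_reload(std::uint64_t value, int slots) {
  return slots >= 2 ? value : (value & 0xFFFFFFFFull);
}
```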
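The lane arithmetic behind "8, 16, 32 or 64 bits" can be sketched as follows. One opmask bit controls one lane, so the number of meaningful bits is the vector width divided by the lane width. The rounding step is an assumption for illustration: it takes the usable portion to be the operand width of kmovb, kmovw, kmovd or kmovq. Both function names are hypothetical.

```cpp
#include <cassert>
#include <initializer_list>

// One opmask bit controls one lane, so this many low bits carry information.
int active_mask_bits(int vector_bits, int lane_bits) {
  return vector_bits / lane_bits;
}

// Rounds a lane count up to the smallest kmov operand width (8, 16, 32, 64).
// Assumption: the "usable portion" is taken to be this rounded width.
int usable_mask_width(int lanes) {
  for (int width : {8, 16, 32, 64}) {
    if (lanes <= width) {
      return width;
    }
  }
  return -1;  // cannot happen for vectors of at most 512 bits
}
```

For example, a 512-bit vector of bytes uses all 64 mask bits, while a 128-bit vector of longs uses only 2, which still occupy a full 64-bit opmask register when allocation and spilling are done at 64-bit granularity.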
Please let me know if there is any disconnect in my understanding here.
On AVX-512 targets, any mask-generating node will have a LONG type; on non-AVX512 targets it will have a vector type.
Please elaborate on what adverse implications you see with this approach.
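The per-target choice described above (LONG on AVX-512, a vector type on pre-AVX-512 x86, and a new mask type for Arm SVE) can be summarized as a small dispatch. This is a hypothetical sketch only: IdealKind, Target and predicate_reg_kind are invented names and do not reflect the actual signature of Matcher::predicate_reg_type() in the patch.

```cpp
#include <cassert>

// Hypothetical stand-ins for the Ideal types discussed in the thread.
enum class IdealKind { Long, Vector, VectorMask };

// Hypothetical target descriptions, for this sketch only.
enum class Target { X86_AVX512, X86_PreAVX512, AArch64_SVE };

// Which Ideal type a mask-producing node gets on each target.
IdealKind predicate_reg_kind(Target t) {
  switch (t) {
    case Target::X86_AVX512:
      return IdealKind::Long;        // opmask allocated and spilled as 64 bits
    case Target::X86_PreAVX512:
      return IdealKind::Vector;      // mask kept in a vector register
    case Target::AArch64_SVE:
      return IdealKind::VectorMask;  // dedicated mask type for predicates
  }
  return IdealKind::Vector;  // unreachable; keeps compilers from warning
}
```

Note that the same mask-producing node ends up with a different Ideal kind on each target, which is the property the later review comments question.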
Yes, it may be attractive in the future, but I don't see it as something
That's fine with me.
Matcher::predicate_reg_type() is not enough to hide the
Considering there's active work going on on SVE support, I'm in favor of
What I'd like to avoid is the situation when different Ideal IR shapes
Customizing node types fits that goal well, but I don't fully understand
(1) having a node which can be of type TypeVect or TypeLong may be
(2) as of now, there's nothing which forbids pre-AVX512 and AVX512
On Mar 3, 2021, at 4:22 AM, Vladimir Ivanov <vladimir.x.ivanov at oracle.com> wrote:
I agree. Would it help to have TypeMask as a #define of TypeLong,
Unfortunately, I don't see how a macro could help here. It would still
I find TypeVMask which is proposed as part of SVE support (Ningsheng
Also, a comment on process: Ningsheng and Xiaohong started with RFC on