instructions: GFNI, VAES and VPCLMULQDQ naming inconsistency with x/sys/cpu #364

vsivsi · 2023-01-13T23:01:41Z

With the addition of the new Ice Lake AVX-512 instruction support in Avo, I've encountered a couple of wrinkles in the naming of the required CPUID features for a given function generated by Avo and the golang x/sys/cpu feature detection support.

A bit of history... Up until this point, all Avo CPUID feature names have:

Faithfully followed the Intel naming of these feature sets, and
Enabled a direct 1:1 mapping between the Avo feature name, and the corresponding golang x/sys/cpu flag.

That is, if an Avo generated function says it requires "AVX2", "AVX512F" and "AVX512DQ", those strings can be directly used to generate code to test for the truth of those requrements. (e.g. a "cpu.X86.Has%s" template can be used to generate: cpu.X86.HasAVX2 && cpu.X86.HasAVX512F && cpu.X86.HasAVX512DQ.)

So here's the icky new problem. The Intel blessed (and Avo adopted) identifiers for the GFNI and VAES instructions are literally that. And there appears to be a good reason for this, because unlike AVX512 dependent features like AVX512_VPOPCNTDQ or AVX512_BITALG, Intel is signalling here that it intends to ship SSE and/or AVX (VEX) encoded forms of these instructions that can run on hardware without AVX512 support enabled.

In fact it apparently already has, shipping SSE encoded GFNI support on the "Intel Atom processor based on Tremont microarchitecture" that lacks AVX/AVX512.

So what's the wrinkle? Golang's standard library has defined and implemented identifiers that lock these features to AVX512 both in naming and in the detection logic. See: https://cs.opensource.google/go/x/sys/+/refs/tags/v0.4.0:cpu/cpu_x86.go;l=133-134

This is arguably a bug because, for example, that "Intel Atom processor based on Tremont microarchitecture" will have perfectly functional SSE encoded GFNI instructions that the Go assembler will happily emit (*but see comment below), but that can never run because the standard library assumes AVX512F is a requirement.

Backing out of this likely will require a breaking change to the semantics of the x/sys/cpu library, namely, eliminating the erroneous X86.HasAVX512GNFI and X86.HasAVX512VAES symbols, and replacing them with X86.GNFI and X86.VAES and new logic to ensure they are enabled when present in the absence of AVX512 (the old versions could be maintained with current semantics and deprecated). This could take a long time, and I think there's a non-zero chance they'll just punt and say it's good enough, which would be annoying, and forever require special case code to map between the proper naming for these things and the golang implementation.

So this isn't really an issue in Avo, so much as an issue for Avo.

The text was updated successfully, but these errors were encountered:

vsivsi · 2023-01-13T23:21:52Z

Actually, I'm unsure how (or if) the Go assembler decides how to select among the VEX and EVEX, encodings for a 256-bit wide instruction. For SSE encoding the omitted "V" prefix (e.g. VPADDQ vs PADDQ) is the signal. Does it always use VEX unless an AVX512 option necessitating EVEX is used (e.g. mask, memory broadcast, ymm > ymm15, etc.)?

vsivsi · 2023-01-13T23:47:21Z

And everything above also applies to VPCLMULQDQ including the misnaming/handling in x/sys/cpu

mmcloughlin · 2023-01-14T19:21:02Z

Thanks for pointing this out. I could consider a change in avo to match. It would be a very minor break, but since GFNI isn't even in a tagged release I suspect it wouldn't affect anyone.

However, I tend to agree that we've got this right in avo and x/sys/cpu is wrong? I agree the Go project is probably not going to think this is worth a breaking change for, though.

In the event of implementing feature checks #168 I don't think having a special-case fixup for those specific ISAs is going to be that bad?

Note that @klauspost's cpuid agrees with avo here:

https://pkg.go.dev/github.com/klauspost/cpuid/v2#FeatureID

mmcloughlin · 2023-01-14T19:26:55Z

Ah! Sorry I'm just now grasping what you're saying. It's not simply an issue of applying a naming transform. x/sys/cpu does not have a flag that indicates the presence of GFNI, it only has one that indicates AVX512F && GFNI.

mmcloughlin · 2023-01-14T19:32:17Z

Actually, I'm unsure how (or if) the Go assembler decides how to select among the VEX and EVEX, encodings for a 256-bit wide instruction. For SSE encoding the omitted "V" prefix (e.g. VPADDQ vs PADDQ) is the signal. Does it always use VEX unless an AVX512 option necessitating EVEX is used (e.g. mask, memory broadcast, ymm > ymm15, etc.)?

If I recall correctly that's exactly what it does. It will use VEX unless it has to use EVEX.

avo's handling of this is not great.

avo/internal/load/load.go

Lines 786 to 799 in 05ed388

    
           // vexevex fixes instructions that have both VEX and EVEX encoded forms with the 
        
           // same operand types. Go uses the VEX encoded form unless EVEX-only features 
        
           // are used. This function will only keep the VEX encoded version in the case 
        
           // where both exist. 
        
           // 
        
           // Note this is somewhat of a hack. There are real reasons to use the EVEX 
        
           // encoded version even when both exist. The main reason to use the EVEX version 
        
           // rather than VEX is to use the registers Z16, Z17, ... and up. However, avo 
        
           // does not implement the logic to distinguish between the two halfs of the 
        
           // vector registers. So in its current state the only reason to need the EVEX 
        
           // version is to encode suffixes, and these are represented by other instruction 
        
           // forms. 
        
           // 
        
           // TODO(mbm): restrict use of vector registers https://github.com/mmcloughlin/avo/issues/146

vsivsi · 2023-01-14T20:58:47Z

Yes, on reflection this is how it would have to work, otherwise it would be impossible to write valid AVX/AVX2 code and ensure that it would not fault on hardware without AVX512.

klauspost · 2023-01-15T12:09:54Z

Yes, there are pure SSE2 (non+VEX)/AVX2 (VEX) versions of these. There are "duplicates":

66 0F 3A 44 /r ib PCLMULQDQ xmm1, xmm2/m128, imm8
VEX.128.66.0F3A.WIG 44 /r ib VPCLMULQDQ xmm1, xmm2, xmm3/m128, imm8
EVEX.128.66.0F3A.WIG 44 /r /ib VPCLMULQDQ xmm1, xmm2, xmm3/m128, imm8

VEX.256.66.0F3A.WIG 44 /r /ib VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8
EVEX.256.66.0F3A.WIG 44 /r /ib VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8

I assume the VEX/EVEX versions are only selected by the assembler when using the extended registers.

In #146 I was looking for a way to restrict extended registry usage.

Example of CPU with GFNI and no AVX

 mockcpu_test.go:177: Opening GenuineIntel0090661_ElkhartLake_02_CPUID.txt
    mockcpu_test.go:180: Name: Intel Atom(R) x6425RE Processor @ 1.90GHz
    mockcpu_test.go:182: Max Function:0x1b
    mockcpu_test.go:184: Max Extended Function:0x80000008
    mockcpu_test.go:185: VendorString: GenuineIntel
    mockcpu_test.go:186: VendorID: Intel
    mockcpu_test.go:187: PhysicalCores: 4
    mockcpu_test.go:188: ThreadsPerCore: 1
    mockcpu_test.go:189: LogicalCores: 4
    mockcpu_test.go:190: Family 6 Model: 150 Stepping: 1
    mockcpu_test.go:191: Features: AESNI,CLMUL,CMOV,CMPXCHG8,CX16,ERMS,FLUSH_L1D,FXSR,FXSROPT,GFNI,IA32_ARCH_CAP,IA32_CORE_CAP,IBPB,LAHF,MD_CLEAR,MMX,MOVBE,MOVDIR64B,MOVDIRI,NX,OSXSAVE,POPCNT,RDRAND,RDSEED,RDTSCP,SHA,SPEC_CTRL_SSBD,SSE,SSE2,SSE3,SSE4,SSE42,SSSE3,STIBP,SYSCALL,SYSEE,VMX,WAITPKG,X87,XGETBV1,XSAVE,XSAVEC,XSAVEOPT,XSAVES
    mockcpu_test.go:192: Microarchitecture level: 2
    mockcpu_test.go:193: Cacheline bytes: 64
    mockcpu_test.go:194: L1 Instruction Cache: 32768 bytes
    mockcpu_test.go:195: L1 Data Cache: 32768 bytes
    mockcpu_test.go:196: L2 Cache: 1572864 bytes
    mockcpu_test.go:197: L3 Cache: 4194304 bytes
    mockcpu_test.go:198: Hz: 1900000000 Hz
    mockcpu_test.go:199: Boost: 1900000000 Hz

vsivsi · 2023-01-18T21:18:07Z

I just checked and there doesn't appear to be an existing issue on the golang project for this problem. I've already prototyped it and the fix is straightforward, so I'm considering filing a new issue over there this week.

vsivsi changed the title ~~GFNI and VAES instruction support CPUID naming inconsistency with x/sys/cpu~~ GFNI, VAES & VPCLMULQDQ instruction support CPUID naming inconsistency with x/sys/cpu Jan 13, 2023

mmcloughlin changed the title ~~GFNI, VAES & VPCLMULQDQ instruction support CPUID naming inconsistency with x/sys/cpu~~ instructions: GFNI, VAES and VPCLMULQDQ naming inconsistency with x/sys/cpu Jan 14, 2023

mmcloughlin added the instructions Instruction database label Jan 14, 2023

vsivsi mentioned this issue Jul 20, 2023

internal/cpu: detect OS support for AVX-512 and CPU support for VAES and VPCLMULQDQ instructions for performance optimization of crypto ciphers golang/go#43925

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

instructions: GFNI, VAES and VPCLMULQDQ naming inconsistency with x/sys/cpu #364

instructions: GFNI, VAES and VPCLMULQDQ naming inconsistency with x/sys/cpu #364

vsivsi commented Jan 13, 2023 •

edited

vsivsi commented Jan 13, 2023 •

edited

vsivsi commented Jan 13, 2023

mmcloughlin commented Jan 14, 2023

mmcloughlin commented Jan 14, 2023

mmcloughlin commented Jan 14, 2023

vsivsi commented Jan 14, 2023

klauspost commented Jan 15, 2023 •

edited

vsivsi commented Jan 18, 2023

instructions: GFNI, VAES and VPCLMULQDQ naming inconsistency with x/sys/cpu #364

instructions: GFNI, VAES and VPCLMULQDQ naming inconsistency with x/sys/cpu #364

Comments

vsivsi commented Jan 13, 2023 • edited

vsivsi commented Jan 13, 2023 • edited

vsivsi commented Jan 13, 2023

mmcloughlin commented Jan 14, 2023

mmcloughlin commented Jan 14, 2023

mmcloughlin commented Jan 14, 2023

vsivsi commented Jan 14, 2023

klauspost commented Jan 15, 2023 • edited

vsivsi commented Jan 18, 2023

vsivsi commented Jan 13, 2023 •

edited

vsivsi commented Jan 13, 2023 •

edited

klauspost commented Jan 15, 2023 •

edited