Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

instructions: GFNI, VAES and VPCLMULQDQ naming inconsistency with x/sys/cpu #364

Open
vsivsi opened this issue Jan 13, 2023 · 8 comments
Open
Labels
instructions Instruction database

Comments

@vsivsi
Copy link
Contributor

vsivsi commented Jan 13, 2023

With the addition of the new Ice Lake AVX-512 instruction support in Avo, I've encountered a couple of wrinkles in the naming of the required CPUID features for a given function generated by Avo and the golang x/sys/cpu feature detection support.

A bit of history... Up until this point, all Avo CPUID feature names have:

  1. Faithfully followed the Intel naming of these feature sets, and
  2. Enabled a direct 1:1 mapping between the Avo feature name, and the corresponding golang x/sys/cpu flag.

That is, if an Avo generated function says it requires "AVX2", "AVX512F" and "AVX512DQ", those strings can be directly used to generate code to test for the truth of those requrements. (e.g. a "cpu.X86.Has%s" template can be used to generate: cpu.X86.HasAVX2 && cpu.X86.HasAVX512F && cpu.X86.HasAVX512DQ.)

So here's the icky new problem. The Intel blessed (and Avo adopted) identifiers for the GFNI and VAES instructions are literally that. And there appears to be a good reason for this, because unlike AVX512 dependent features like AVX512_VPOPCNTDQ or AVX512_BITALG, Intel is signalling here that it intends to ship SSE and/or AVX (VEX) encoded forms of these instructions that can run on hardware without AVX512 support enabled.

In fact it apparently already has, shipping SSE encoded GFNI support on the "Intel Atom processor based on Tremont microarchitecture" that lacks AVX/AVX512.

image

So what's the wrinkle? Golang's standard library has defined and implemented identifiers that lock these features to AVX512 both in naming and in the detection logic. See: https://cs.opensource.google/go/x/sys/+/refs/tags/v0.4.0:cpu/cpu_x86.go;l=133-134

This is arguably a bug because, for example, that "Intel Atom processor based on Tremont microarchitecture" will have perfectly functional SSE encoded GFNI instructions that the Go assembler will happily emit (*but see comment below), but that can never run because the standard library assumes AVX512F is a requirement.

Backing out of this likely will require a breaking change to the semantics of the x/sys/cpu library, namely, eliminating the erroneous X86.HasAVX512GNFI and X86.HasAVX512VAES symbols, and replacing them with X86.GNFI and X86.VAES and new logic to ensure they are enabled when present in the absence of AVX512 (the old versions could be maintained with current semantics and deprecated). This could take a long time, and I think there's a non-zero chance they'll just punt and say it's good enough, which would be annoying, and forever require special case code to map between the proper naming for these things and the golang implementation.

So this isn't really an issue in Avo, so much as an issue for Avo.

@vsivsi
Copy link
Contributor Author

vsivsi commented Jan 13, 2023

  • Actually, I'm unsure how (or if) the Go assembler decides how to select among the VEX and EVEX, encodings for a 256-bit wide instruction. For SSE encoding the omitted "V" prefix (e.g. VPADDQ vs PADDQ) is the signal. Does it always use VEX unless an AVX512 option necessitating EVEX is used (e.g. mask, memory broadcast, ymm > ymm15, etc.)?

@vsivsi
Copy link
Contributor Author

vsivsi commented Jan 13, 2023

And everything above also applies to VPCLMULQDQ including the misnaming/handling in x/sys/cpu

@vsivsi vsivsi changed the title GFNI and VAES instruction support CPUID naming inconsistency with x/sys/cpu GFNI, VAES & VPCLMULQDQ instruction support CPUID naming inconsistency with x/sys/cpu Jan 13, 2023
@mmcloughlin mmcloughlin changed the title GFNI, VAES & VPCLMULQDQ instruction support CPUID naming inconsistency with x/sys/cpu instructions: GFNI, VAES and VPCLMULQDQ naming inconsistency with x/sys/cpu Jan 14, 2023
@mmcloughlin
Copy link
Owner

Thanks for pointing this out. I could consider a change in avo to match. It would be a very minor break, but since GFNI isn't even in a tagged release I suspect it wouldn't affect anyone.

However, I tend to agree that we've got this right in avo and x/sys/cpu is wrong? I agree the Go project is probably not going to think this is worth a breaking change for, though.

In the event of implementing feature checks #168 I don't think having a special-case fixup for those specific ISAs is going to be that bad?

Note that @klauspost's cpuid agrees with avo here:

https://pkg.go.dev/github.com/klauspost/cpuid/v2#FeatureID

@mmcloughlin
Copy link
Owner

Ah! Sorry I'm just now grasping what you're saying. It's not simply an issue of applying a naming transform. x/sys/cpu does not have a flag that indicates the presence of GFNI, it only has one that indicates AVX512F && GFNI.

@mmcloughlin
Copy link
Owner

  • Actually, I'm unsure how (or if) the Go assembler decides how to select among the VEX and EVEX, encodings for a 256-bit wide instruction. For SSE encoding the omitted "V" prefix (e.g. VPADDQ vs PADDQ) is the signal. Does it always use VEX unless an AVX512 option necessitating EVEX is used (e.g. mask, memory broadcast, ymm > ymm15, etc.)?

If I recall correctly that's exactly what it does. It will use VEX unless it has to use EVEX.

avo's handling of this is not great.

avo/internal/load/load.go

Lines 786 to 799 in 05ed388

// vexevex fixes instructions that have both VEX and EVEX encoded forms with the
// same operand types. Go uses the VEX encoded form unless EVEX-only features
// are used. This function will only keep the VEX encoded version in the case
// where both exist.
//
// Note this is somewhat of a hack. There are real reasons to use the EVEX
// encoded version even when both exist. The main reason to use the EVEX version
// rather than VEX is to use the registers Z16, Z17, ... and up. However, avo
// does not implement the logic to distinguish between the two halfs of the
// vector registers. So in its current state the only reason to need the EVEX
// version is to encode suffixes, and these are represented by other instruction
// forms.
//
// TODO(mbm): restrict use of vector registers https://github.com/mmcloughlin/avo/issues/146

@mmcloughlin mmcloughlin added the instructions Instruction database label Jan 14, 2023
@vsivsi
Copy link
Contributor Author

vsivsi commented Jan 14, 2023

Yes, on reflection this is how it would have to work, otherwise it would be impossible to write valid AVX/AVX2 code and ensure that it would not fault on hardware without AVX512.

@klauspost
Copy link
Contributor

klauspost commented Jan 15, 2023

Yes, there are pure SSE2 (non+VEX)/AVX2 (VEX) versions of these. There are "duplicates":

66 0F 3A 44 /r ib PCLMULQDQ xmm1, xmm2/m128, imm8
VEX.128.66.0F3A.WIG 44 /r ib VPCLMULQDQ xmm1, xmm2, xmm3/m128, imm8
EVEX.128.66.0F3A.WIG 44 /r /ib VPCLMULQDQ xmm1, xmm2, xmm3/m128, imm8

VEX.256.66.0F3A.WIG 44 /r /ib VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8
EVEX.256.66.0F3A.WIG 44 /r /ib VPCLMULQDQ ymm1, ymm2, ymm3/m256, imm8

I assume the VEX/EVEX versions are only selected by the assembler when using the extended registers.

In #146 I was looking for a way to restrict extended registry usage.

Example of CPU with GFNI and no AVX
 mockcpu_test.go:177: Opening GenuineIntel0090661_ElkhartLake_02_CPUID.txt
    mockcpu_test.go:180: Name: Intel Atom(R) x6425RE Processor @ 1.90GHz
    mockcpu_test.go:182: Max Function:0x1b
    mockcpu_test.go:184: Max Extended Function:0x80000008
    mockcpu_test.go:185: VendorString: GenuineIntel
    mockcpu_test.go:186: VendorID: Intel
    mockcpu_test.go:187: PhysicalCores: 4
    mockcpu_test.go:188: ThreadsPerCore: 1
    mockcpu_test.go:189: LogicalCores: 4
    mockcpu_test.go:190: Family 6 Model: 150 Stepping: 1
    mockcpu_test.go:191: Features: AESNI,CLMUL,CMOV,CMPXCHG8,CX16,ERMS,FLUSH_L1D,FXSR,FXSROPT,GFNI,IA32_ARCH_CAP,IA32_CORE_CAP,IBPB,LAHF,MD_CLEAR,MMX,MOVBE,MOVDIR64B,MOVDIRI,NX,OSXSAVE,POPCNT,RDRAND,RDSEED,RDTSCP,SHA,SPEC_CTRL_SSBD,SSE,SSE2,SSE3,SSE4,SSE42,SSSE3,STIBP,SYSCALL,SYSEE,VMX,WAITPKG,X87,XGETBV1,XSAVE,XSAVEC,XSAVEOPT,XSAVES
    mockcpu_test.go:192: Microarchitecture level: 2
    mockcpu_test.go:193: Cacheline bytes: 64
    mockcpu_test.go:194: L1 Instruction Cache: 32768 bytes
    mockcpu_test.go:195: L1 Data Cache: 32768 bytes
    mockcpu_test.go:196: L2 Cache: 1572864 bytes
    mockcpu_test.go:197: L3 Cache: 4194304 bytes
    mockcpu_test.go:198: Hz: 1900000000 Hz
    mockcpu_test.go:199: Boost: 1900000000 Hz

@vsivsi
Copy link
Contributor Author

vsivsi commented Jan 18, 2023

I just checked and there doesn't appear to be an existing issue on the golang project for this problem. I've already prototyped it and the fix is straightforward, so I'm considering filing a new issue over there this week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
instructions Instruction database
Projects
None yet
Development

No branches or pull requests

3 participants