Add x86-64 architecture levels (v2-v4) as architectures #2315

Closed

Conversation

Vogtinator
Contributor

The x86_64 SysV psABI defines four levels of x86_64 with certain CPU features required for each level. Those definitions are meant to be generically useful and recognized as such by glibc and gcc as well.

For backward-compatibility and to avoid surprises, default to building x86_64 even on v2+ capable machines.

Tested by running rpm --eval %_target_cpu and using rpmbuild with various BuildArch values on an x86_64_v3 host.

This should be less controversial than adding CPU-model-specific architectures such as znver1 (#1035).

There might be one slight issue: the psABI document defines x86-64-v3 as
x86-64-v2 + AVX + AVX2 + F16C + FMA + LZCNT + MOVBE + BMI + BMI2 + OSXSAVE, but glibc's hwcaps code only looks at
x86-64-v2 + AVX + AVX2 + F16C + FMA + LZCNT + MOVBE (no BMI, BMI2 or OSXSAVE). Not sure whether that's a bug or intentional for some reason. gcc uses the psABI definition, so code built with -march=x86-64-v3 is not compatible with %_libdir/glibc-hwcaps/x86-64-v3/...
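
For illustration, a minimal standalone sketch (not part of this PR) of reading the disputed bits directly via CPUID, using the <cpuid.h> helpers that GCC and clang ship; bit positions follow the Intel SDM:

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 1: OSXSAVE is ECX bit 27. */
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        printf("OSXSAVE: %u\n", (ecx >> 27) & 1);

    /* Leaf 7, subleaf 0: BMI is EBX bit 3, BMI2 is EBX bit 8. */
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        printf("BMI:     %u\n", (ebx >> 3) & 1);
        printf("BMI2:    %u\n", (ebx >> 8) & 1);
    }
    return 0;
}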

CC @Conan-Kudo @dirkmueller @DimStar77 @mlschroe

@dirkmueller
Contributor

dirkmueller commented Dec 8, 2022

Can you please also include in macros.in a %x86_64 macro that covers this new family of architectures?

@Vogtinator
Contributor Author

Can you please also include in macros.in a %x86_64 macro that covers this new family of architectures?

Done, also fixed missing platform macros.

@pmatilai
Member

pmatilai commented Dec 8, 2022

FWIW, I've repeatedly said "no" to attempts to add more assembly into rpm, the only direction for assembly code is out. Use the auxiliary vector for determining features, and for things that are not there, I don't think rpm needs to care.

(I'm not entirely convinced rpm needs to care at all on this level for that matter)
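
For reference, reading the auxiliary vector needs no assembly at all; a minimal sketch using glibc's getauxval(3):

#include <stdio.h>
#include <sys/auxv.h>

int main(void)
{
    /* Hardware capability bits as reported by the kernel. */
    unsigned long hwcap  = getauxval(AT_HWCAP);
    unsigned long hwcap2 = getauxval(AT_HWCAP2);

    printf("AT_HWCAP:  %#lx\n", hwcap);
    printf("AT_HWCAP2: %#lx\n", hwcap2);
    return 0;
}

(As discussed below, on x86_64 these two words carry too little information to derive the psABI levels.)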

@Conan-Kudo
Member

I know @dmach was working on libarch to deal with this problem (of centralizing and simplifying arch detection code). Maybe he's interested in reviving this so we can deal with this and #1035. Because I still want to get this problem sorted out. 😩

@Vogtinator
Contributor Author

FWIW, I've repeatedly said "no" to attempts to add more assembly into rpm, the only direction for assembly code is out. Use the auxiliary vector for determining features, and for things that are not there, I don't think rpm needs to care.

FWICT, the auxiliary vector for HWCAPS is no longer really used and applications (including glibc and gcc runtime code) have to resort to methods like this instead. GCC's __builtin_cpu_supports unfortunately does not support all features needed to detect these levels properly.

@dmach

dmach commented Dec 8, 2022

I know @dmach was working on libarch to deal with this problem (of centralizing and simplifying arch detection code). Maybe he's interested in reviving this so we can deal with this and #1035. Because I still want to get this problem sorted out. 😩

@ffesti managed to put together https://github.com/rpm-software-management/libparch
Not sure what the latest status is, though.

@pmatilai
Member

pmatilai commented Dec 9, 2022

FWICT, the auxiliary vector for HWCAPS is no longer really used and applications (including glibc and gcc runtime code) have to resort to methods like this instead.

It's one thing for gcc and glibc to get their hands dirty with the details, and maybe getting the latest info always requires doing it yourself, but we're not that leading edge here.

Got any pointers indicating hwcaps being deprecated? I haven't looked at details but I remember fairly recent activity around HWCAPS, such as https://sourceware.org/pipermail/libc-alpha/2020-May/113757.html.

@Vogtinator
Contributor Author

FWICT, the auxiliary vector for HWCAPS is no longer really used and applications (including glibc and gcc runtime code) have to resort to methods like this instead.

It's one thing for gcc and glibc to get their hands dirty with the details, and maybe getting the latest info always requires doing it yourself, but we're not that leading edge here.

Got any pointers indicating hwcaps being deprecated? I haven't looked at details but I remember fairly recent activity around HWCAPS, such as https://sourceware.org/pipermail/libc-alpha/2020-May/113757.html.

I meant AT_HWCAP for x86_64, which just stopped being extended with new features at some point. AT_HWCAP has only pre-x86_64 features and AT_HWCAP2 has just HWCAP2_RING3MWAIT and HWCAP2_FSGSBASE. On arm64 and riscv it's much more useful.

@pmatilai
Member

pmatilai commented Dec 9, 2022

Ack.

In the meanwhile I managed to dig up some additional background on this, eg
https://sourceware.org/pipermail/libc-alpha/2020-July/116135.html, https://lwn.net/Articles/844831/ and https://antlarr.io/2021/03/hackweek-20-glibc-hwcaps-in-opensuse/ (which even routes back here 😆 ).

Indeed these progressive levels (no matter how artificial they may be) are the first glimpse of sanity in the arch space that I've seen in a while. This is something that can reasonably be packaged, and handled by package managers operating on various levels as well.

So. Ack for the concept at least, I'll not let this get stuck the way the znver stuff did, because there's hope here. Thanks for the initiative, and the education. I hadn't followed the hwcap side at all really.

I'd still try to see if this really isn't exported by glibc somehow, e.g. couldn't we take advantage of the glibc-hwcaps feature somehow (and if that leaves other libcs stuck with one level of x86_64, I'm not going to cry a river).

@dirkmueller
Contributor

If the main concern is the addition of assembly, there are options:

  • https://github.com/anrieff/libcpuid, which is packaged in many distributions already for handling CPU feature detection via cpuid
  • *cough* parsing /proc/cpuinfo for level determination (a rough sketch follows below)
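
A rough sketch of the second option, matching whole words on the "flags" line of /proc/cpuinfo (the feature names there are lowercase, e.g. "avx2", "bmi2"):

#include <stdio.h>
#include <string.h>

/* Returns 1 if the first "flags" line of /proc/cpuinfo lists the feature. */
static int cpuinfo_has_flag(const char *flag)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    char line[4096], needle[64];
    int found = 0;

    if (!f)
        return 0;
    while (fgets(line, sizeof(line), f)) {
        if (strncmp(line, "flags", 5) != 0)
            continue;
        /* Turn the trailing newline into a space so the last flag matches. */
        size_t n = strlen(line);
        if (n && line[n - 1] == '\n')
            line[n - 1] = ' ';
        /* Pad with spaces so "bmi" does not match inside "bmi2". */
        snprintf(needle, sizeof(needle), " %s ", flag);
        found = strstr(line, needle) != NULL;
        break;
    }
    fclose(f);
    return found;
}

Level detection would then be cpuinfo_has_flag("avx2") && cpuinfo_has_flag("bmi2") && ... for each feature in the psABI list.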

@pmatilai
Member

pmatilai commented Dec 9, 2022

Well. Of course it's exported by glibc through the dlopen() search path: we just build and place DSOs into the paths where glibc looks, and then dlopen() that rpm_platform.so (or such) from rpmrc.c. At the minimum, the DSO merely needs to return an arch string, but there are almost certainly some other interesting possibilities in that direction that would allow getting rid of other crap in rpmrc.c. Plus, perhaps things like dynamic rpmlib() provides, and who knows.

Edit: as for libcpuid: that's an x86_64-specific thing, but the "asm crap" issue is not specific to that platform. The main issue is rpm not having to know this level of detail, and glibc-hwcaps seems like a nice platform-agnostic way to outsource that knowledge: rpm then only needs to know a platform exists and map a name for it.
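
A sketch of that idea (rpm_platform.so and its rpm_platform() entry point are the hypothetical names from the comment above): identical copies of the DSO would be installed under %_libdir/glibc-hwcaps/x86-64-v2/, .../x86-64-v3/ and .../x86-64-v4/ plus a baseline copy, each compiled to return its own level, and ld.so picks the most capable one at dlopen() time:

#include <dlfcn.h>
#include <stdio.h>

int main(void)
{
    /* No slash in the name, so ld.so walks its search path, preferring the
     * glibc-hwcaps subdirectory for the highest level this CPU supports. */
    void *h = dlopen("rpm_platform.so", RTLD_NOW);
    if (!h)
        return 1;

    const char *(*platform)(void) =
        (const char *(*)(void))dlsym(h, "rpm_platform");
    if (platform)
        printf("detected arch: %s\n", platform());

    dlclose(h);
    return 0;
}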

@pmatilai
Member

pmatilai commented Dec 9, 2022

After some further processing:

Seeing now a concrete way out of this arch detection madness, I can accept the current version as-is, just considering it a temporary measure (for some definition of temporary). I don't want to turn this PR into a "let's invent a whole new layer, quick" kind of thing.
Let's see if there's further feedback over the weekend or so, and then I'll just hit merge.

@Vogtinator
Contributor Author

About the glibc mismatch:

The missing BMI and BMI2 are meanwhile included in git master via https://sourceware.org/pipermail/libc-alpha/2022-October/142395.html. OSXSAVE was still missing; I sent a patch for that: https://sourceware.org/pipermail/libc-alpha/2022-December/143936.html

Once that is merged, glibc, GCC and RPM have matching definitions. LLVM for some reason does not mention OSXSAVE in its documentation, but it's there in the code, so that matches too.

@Vogtinator
Contributor Author

For backward-compatibility and to avoid surprises, default to building x86_64 even on v2+ capable machines.
Tested by running rpm --eval %_target_cpu and using rpmbuild with various BuildArch values on an x86_64_v3 host.

Looks like my fix for the platform macros generation broke that part a bit. While BuildArch: x86_64_v3 spits out an x86_64_v3 RPM, it still uses the x86_64 optflags. The only way to end up with x86_64_vX optflags is by using --target x86_64_vX. What's the best way to end up with better semantics?

@pmatilai
Member

The missing BMI and BMI2 are meanwhile included in git master via https://sourceware.org/pipermail/libc-alpha/2022-October/142395.html. OSXSAVE was still missing; I sent a patch for that: https://sourceware.org/pipermail/libc-alpha/2022-December/143936.html

Once that is merged, glibc, GCC and RPM have matching definitions.

This is actually a fine example of why such voodoo needs to be determined in exactly one place that everybody else then uses. It's also a fine example of why glibc is a much better place for it than rpm: those guys clearly know what they're talking about; to me "OSXSAVE" sounds like some Mac thing 😆

@Vogtinator
Contributor Author

For backward-compatibility and to avoid surprises, default to building x86_64 even on v2+ capable machines.
Tested by running rpm --eval %_target_cpu and using rpmbuild with various BuildArch values on an x86_64_v3 host.

Looks like my fix for the platform macros generation broke that part a bit. While BuildArch: x86_64_v3 spits out an x86_64_v3 RPM, it still uses the x86_64 optflags. The only way to end up with x86_64_vX optflags is by using --target x86_64_vX. What's the best way to end up with better semantics?

The reason for that is that BuildArch only sets %_target_cpu, but not anything else like %_target or %optflags:

rpm/build/parseSpec.c

Lines 1012 to 1025 in e2c504c

for (x = 0; x < spec->BACount; x++) {
    /* Skip if arch is not compatible. */
    if (!rpmMachineScore(RPM_MACHTABLE_BUILDARCH, spec->BANames[x]))
        continue;
    rpmPushMacro(NULL, "_target_cpu", NULL, spec->BANames[x], RMIL_RPMRC);
    spec->BASpecs[index] = parseSpec(spec->specFile, spec->flags, spec->buildRoot, 1);
    if (spec->BASpecs[index] == NULL) {
        spec->BACount = index;
        goto errxit;
    }
    rpmPopMacro(NULL, "_target_cpu");
    index++;
}

IMO this can be treated as a separate bug (or is this intentional?). Opinions?

@mlschroe
Contributor

See issue #2319

@mlschroe
Contributor

(Bottom line being: only use BuildArch for BuildArch: noarch. All other uses are not tested and also not documented.)

@Vogtinator
Contributor Author

The missing BMI and BMI2 are meanwhile included in git master via https://sourceware.org/pipermail/libc-alpha/2022-October/142395.html. OSXSAVE was still missing; I sent a patch for that: https://sourceware.org/pipermail/libc-alpha/2022-December/143936.html
Once that is merged, glibc, GCC and RPM have matching definitions.

This is actually a fine example of why such voodoo needs to be determined in exactly one place that everybody else then uses.

Agreed. Mostly because the feature detection is platform- and language-specific and annoying to perform, though, not because it's easy to get wrong. The list of features is rather short and clearly defined in a single place, so I'm not as concerned about inconsistencies.

It's also a fine example of why glibc is a much better place for it than rpm: those guys clearly know what they're talking about,

Funnily enough, in this case glibc was the only place that got it wrong in a way that actually matters (missing BMI/BMI2)...

ATM glibc is in sync even without my patch for OSXSAVE. It turns out that in some other files the absence of OSXSAVE actually marks features such as AVX as unusable, so checking for those is enough.

Vogtinator changed the title from "RFC: Add x86-64 architecture levels (v2-v4) as architectures" to "Add x86-64 architecture levels (v2-v4) as architectures" on Dec 15, 2022
@Vogtinator
Contributor Author

See issue #2319

Perfect match - so indeed a mostly independent topic.

@mlschroe
Contributor

What's the state of this pull request? Can this be merged now?
I would need to add support for the new architectures to libsolv as well.

@pmatilai
Member

pmatilai commented Jan 9, 2023

Merged manually (cd46c17) to tweak the commit message a bit; the lines were overflowing in the 'git log' view because of the indentation it adds, and looked ugly.

Thanks for suggesting a way out of this particular arch madness, and the patch!

@Vogtinator
Contributor Author

GCC's __builtin_cpu_supports does unfortunately not support all features needed to detect these levels properly.

Looks like since GCC 12.2 it actually can do __builtin_cpu_supports("x86-64-v3"): gcc-mirror/gcc@8ea2925#diff-64afd6a149a7486d332f3d75ac9479f58d0fb674be6920a6e7db0e54cd205e70R38

It's just not documented... Is that an option for RPM? I guess not. It's not implemented by clang.
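
For what it's worth, usage would be as simple as this (guarded, since the "x86-64-v3" string is only accepted by GCC since 12.2 per the commit above, and at this time not by clang):

#include <stdio.h>

int main(void)
{
#if defined(__GNUC__) && !defined(__clang__) && \
    (__GNUC__ > 12 || (__GNUC__ == 12 && __GNUC_MINOR__ >= 2))
    /* libgcc resolves the composite level check at runtime. */
    if (__builtin_cpu_supports("x86-64-v3"))
        puts("x86-64-v3 capable");
    else
        puts("below x86-64-v3");
#else
    puts("compiler cannot check psABI levels directly");
#endif
    return 0;
}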

@marxin
Contributor

marxin commented Mar 10, 2023

It's just not documented... Is that an option for RPM? I guess not. It's not implemented by clang.

It's documented here: https://gcc.gnu.org/onlinedocs/gcc/x86-Built-in-Functions.html#index-_005f_005fbuiltin_005fcpu_005fsupports-1 (though we wrongly documented that under __builtin_cpu_is for GCC 12.x).

Well, that's a pity that clang does not support that. It would make the implementation much simpler!

@marxin
Contributor

marxin commented Mar 10, 2023

FWICT, the auxiliary vector for HWCAPS is no longer really used and applications (including glibc and gcc runtime code) have to resort to methods like this instead. GCC's __builtin_cpu_supports unfortunately does not support all features needed to detect these levels properly.

Which features are you missing? Note that using __builtin_cpu_supports would be much cleaner code, and it's also supported by LLVM.

@marxin
Contributor

marxin commented Mar 10, 2023

It's not implemented by clang.

llvm/llvm-project#59961

@Vogtinator
Contributor Author

FWICT, the auxiliary vector for HWCAPS is no longer really used and applications (including glibc and gcc runtime code) have to resort to methods like this instead. GCC's __builtin_cpu_supports unfortunately does not support all features needed to detect these levels properly.

Which features are you missing? Note that using __builtin_cpu_supports would be much cleaner code, and it's also supported by LLVM.

Of the ones mentioned here:

AVX + AVX2 + F16C + FMA + LZCNT + MOVBE + BMI + BMI2 + OSXSAVE

F16C, LZCNT, MOVBE and OSXSAVE aren't exposed through __builtin_cpu_supports, except by using x86-64-v3.

@marxin
Contributor

marxin commented Mar 14, 2023

Hmm, I see, and you are right. Well, I hope you can use __builtin_cpu_supports("x86-64-v3") in the future once LLVM implements it. It's very unfortunate that we've added a useful built-in, but projects are still forced to parse cpuinfo themselves :/

@MaskRay

MaskRay commented Aug 25, 2023

Hmm, I see, and you are right. Well, I hope you can use __builtin_cpu_supports("x86-64-v3") in the future once LLVM implements it. It's very unfortunate that we've added a useful built-in, but projects are still forced to parse cpuinfo themselves :/

I created https://reviews.llvm.org/D158811 to implement __builtin_cpu_supports("x86-64") (and v2 v3 v4). It will hopefully be part of Clang 18.

Edit: landed. Will be part of Clang 18.
