Target feature runtime #2725

gnzlbg · 2019-07-15T12:28:51Z

This RFC proposes exporting the target-feature detection macros (e.g. is_x86_feature_detected!) from libcore, enabling all Rust libraries, including #![no_std] libraries, to use them. It works out the details of the implementation, and proposes making parts of it public to allow #![no_std] binaries and libstd to provide their own target-feature-detection run-times via these APIs.

This RFC can be stabilized in stages. We can stabilize the exporting of the target-feature detection macros from libcore without stabilizing anything else, since the standard library and #![no_std] libraries can both use unstable Rust features. Once we start working on the stabilization of #![no_std] binaries, we can revisit the stabilization of these APIs.

bjorn3 · 2019-07-18T10:23:59Z

text/0000-target-feature-runtime.md

+SSE3. In Rust, we call `x86_64` the target architecture "family", and extensions
+like SSE2 or SSE3 "target-features".
+
+Many Rust applications compiled for `x86_64-unknonw-linux-gnu` do want to use


Suggested change

Many Rust applications compiled for `x86_64-unknonw-linux-gnu` do want to use

Many Rust applications compiled for `x86_64-unknown-linux-gnu` do want to use

bjorn3 · 2019-07-18T10:29:12Z

text/0000-target-feature-runtime.md

+* Should the `libstd` run-time be overridable? For example, by only providing it
+  if no other crate in the dependency graph provides a runtime ? This would be a
+  forward-compatible extension, but no use case considered requires it.
+


Does core::detect::is_target_feature_detected cache the result of core::detect::TargetFeatureRuntime::is_feature_detected, or is the latter responsible for caching itself?

gnzlbg · 2019-07-18T14:50:31Z

The latter is responsible for caching. This allows the runtime to control everything. Caching often requires making decisions about how to cache, e.g., using a global relaxed atomic bitset, a mutex, thread locals, etc. There is no one-size-fits-all solution for no_std binaries.
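To make the "runtime owns the cache" point concrete, here is a minimal sketch of a runtime that caches in a global relaxed atomic bitset. The trait below is only a stand-in shaped like the core::detect::TargetFeatureRuntime mentioned above; its signature, the Feature enum, and slow_query are assumptions made for illustration, not the RFC's actual API.

```rust
use std::sync::atomic::{AtomicU32, AtomicUsize, Ordering};

/// Hypothetical stand-in for the RFC's feature enum.
#[derive(Clone, Copy)]
pub enum Feature {
    Sse2 = 0,
    Avx = 1,
}

/// Hypothetical stand-in for `core::detect::TargetFeatureRuntime`.
pub trait TargetFeatureRuntime {
    fn is_feature_detected(feature: Feature) -> bool;
}

/// Counts how often the expensive query runs (for illustration only).
pub static SLOW_QUERIES: AtomicUsize = AtomicUsize::new(0);

/// Placeholder for an expensive query such as a system call.
fn slow_query(feature: Feature) -> bool {
    SLOW_QUERIES.fetch_add(1, Ordering::Relaxed);
    matches!(feature, Feature::Sse2)
}

/// Two bits per feature: bit `2 * i` means "known", bit `2 * i + 1` means "present".
static CACHE: AtomicU32 = AtomicU32::new(0);

/// A runtime that decides for itself how results are cached.
pub struct CachingRuntime;

impl TargetFeatureRuntime for CachingRuntime {
    fn is_feature_detected(feature: Feature) -> bool {
        let shift = 2 * (feature as u32);
        let bits = CACHE.load(Ordering::Relaxed) >> shift;
        if (bits & 1) == 1 {
            // Cache hit: the answer was recorded by an earlier call.
            return (bits & 2) == 2;
        }
        let present = slow_query(feature);
        // Both bits are published by one atomic OR, so a reader never
        // observes "known" without the matching "present" bit.
        let entry = (1 | ((present as u32) << 1)) << shift;
        CACHE.fetch_or(entry, Ordering::Relaxed);
        present
    }
}
```

Two threads may race and both run the slow query, but they store the same answer, so relaxed ordering is enough here. A runtime for a different environment could swap this for a mutex or thread locals without the caller noticing.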

therealprof · 2019-07-23T09:49:04Z

Sounds good to me.

One thing I couldn't quite find in the RFC is how this actually bridges between runtime and compile-time detection, i.e. how the macro in libcore will help to move crates over to no_std without introducing runtime overhead for details known at compile time as mentioned in the introduction. Also I'm confused as to how this will work in practise since using the macro in a condition might still cause code to be emitted, causing the generation of instructions which are denied by later stages of the compilation or by the linker due to incompatibility with the target.

And regarding the feature detection, is everyone supposed to roll their own or is there going to be a good default implementation like for the panic handling and liballoc?

gnzlbg · 2019-07-23T11:57:08Z

> i.e. how the macro in libcore will help to move crates over to no_std without introducing runtime overhead for details known at compile time

The macros already do that, e.g., the libstd docs guarantee that the macros resolve at compile-time if the feature is known to be enabled at compile-time.

> using the macro in a condition might still cause code to be emitted,

This is already the case even if you use compile-time feature detection via if cfg!(target_feature = "avx") { ... }, where the macro there expands either to true or false. It is up to the users to choose an optimization level / backend that removes dead code.

If you want to make sure that no run-time detection ever happens, you could implement your own run-time that just panics, e.g., by calling unreachable!(), or, if you know that no run-time detection macro will be called in your program, you could go as far as using core::hint::unreachable_unchecked if you are feeling brave.
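A rough sketch of such a panicking run-time follows. The trait, the Feature enum, and the avx_detected helper are hypothetical stand-ins for whatever the RFC ends up exposing; only cfg! and unreachable! are real, stable Rust.

```rust
/// Hypothetical stand-in for the RFC's feature enum.
#[derive(Clone, Copy, Debug)]
pub enum Feature {
    Avx,
}

/// Hypothetical stand-in for `core::detect::TargetFeatureRuntime`.
pub trait TargetFeatureRuntime {
    fn is_feature_detected(feature: Feature) -> bool;
}

/// A run-time for programs in which every feature is decided at compile time.
pub struct NoRuntimeDetection;

impl TargetFeatureRuntime for NoRuntimeDetection {
    fn is_feature_detected(feature: Feature) -> bool {
        // Reaching this point means a run-time query slipped through.
        unreachable!("run-time detection of {:?} was requested", feature)
    }
}

/// Sketch of what a detection macro does: answer at compile time when the
/// feature is statically enabled, and only otherwise ask the run-time.
pub fn avx_detected<R: TargetFeatureRuntime>() -> bool {
    if cfg!(target_feature = "avx") {
        true
    } else {
        R::is_feature_detected(Feature::Avx)
    }
}
```

With this run-time, avx_detected::<NoRuntimeDetection>() returns true when the crate is built with AVX enabled and panics otherwise, which turns an accidental run-time query into a loud failure instead of a silent slow path.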

> are denied by later stages of the compilation or by the linker due to incompatibility with the target.

I'm not sure if this will answer your question, but the macros are only available for targets that support them. If a target does not support them, there is no macro for users to use, so what you describe cannot currently happen.

That is, if you want to use is_x86_feature_detected!, you need to guard the use with a #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] or the code won't compile on any other target because that macro does not exist.
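For illustration, one way to write that guard once and hide it behind a portable function is sketched below. The function name avx_available is made up for this example; is_x86_feature_detected! and the cfg predicates are the existing stable ones.

```rust
/// On x86 targets, ask the detection macro provided by std.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
pub fn avx_available() -> bool {
    is_x86_feature_detected!("avx")
}

/// On every other target the macro does not exist, so this version
/// is compiled instead and simply reports that AVX is unavailable.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
pub fn avx_available() -> bool {
    false
}
```

Callers can then branch on avx_available() without repeating the cfg attribute at every use site.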

> And regarding the feature detection, is everyone supposed to roll their own or is there going to be a good default implementation like for the panic handling and liballoc?

Currently, the std::detect implementation is available on crates.io as the std_detect crate, and is quite configurable. If that does not solve your use case, you can roll your own, and others might find it useful.

therealprof · 2019-07-23T12:16:19Z

> This is already the case even if you use compile-time feature detection via if cfg!(target_feature = "avx") { ... }, where the macro there expands either to true or false. It is up to the users to choose an optimization level / backend that removes dead code.

That seems like a bad idea. People not being able to use dev builds because optimized code paths using this feature will not be linkable due to code not being removed.

We do see a lot of such problems in no_std land today, e.g. CAS instructions finding their way into code compiled for ARM Thumb v6-M, not even to mention that unused code emitted in dev builds which is not collected is always troublesome for embedded due to limited flash size...

> That is, if you want to use is_x86_feature_detected!, you need to guard the use with a #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] or the code won't compile on any other target because that macro does not exist.

It would be way more convenient if that was implicit (or at least there's a separate version with implicit feature gates). Requiring to manually pair those seems like a perfect way to cause seemingly random build errors and wrong code paths being used. Done properly this would also eliminate my concern above...

gnzlbg · 2019-07-23T13:40:00Z

> That seems like a bad idea. People not being able to use dev builds because optimized code paths using this feature will not be linkable due to code not being removed. [...]

The runtime target-feature detection macros currently only allow detecting features for which this cannot happen.

> It would be way more convenient if that was implicit (or at least there's a separate version with implicit feature gates). Requiring to manually pair those seems like a perfect way to cause seemingly random build errors and wrong code paths being used. Done properly this would also eliminate my concern above...

This RFC does not require doing that.

gnzlbg · 2019-07-25T07:38:08Z

@therealprof this RFC only proposes allowing users to configure the run-time component of the target-feature detection system. That system has already been RFC'ed, and is implemented on both stable and nightly for half a dozen targets (including embedded ones) and dozens of target features. This RFC does not change anything about how any of that currently works. That is out-of-scope.

If you have questions about that system, the docs are usually a good place to start, but you can also go through the RFCs (e.g. the std::arch RFC or the target-feature RFC).

therealprof · 2019-07-25T07:51:18Z

It's okay, I was reading a bit more into the preamble than this RFC is about. It's still a useful proposal, just not as useful for me as I'd hoped from my initial reading. 😉

gnzlbg · 2019-07-25T08:23:53Z

> It's still a useful proposal, just not as useful for me as I'd hoped from my initial reading.

It might be out-of-scope/off-topic for this RFC, but would it be possible for you to open an issue in this repo about which problem you need to solve? We can try to figure out how to do that there.

gnzlbg · 2019-09-09T10:27:37Z

text/0000-target-feature-runtime.md

+  requires executing privileged CPU instructions that are illegal to execute
+  from user-space code. User-space applications query the available
+  target-feature set from the operating system. Often, they might also want to
+  cache the result to avoid repeating system calls.


@lu-zero also mentioned another use-case here: they want to control feature detection at run-time via, e.g., environment variables.

I can imagine that one use case of this would be, for example, to disable AVX-512 usage via such an environment variable, but maybe they can comment here on what they have in mind.

lu-zero · 2019-09-09

Exactly, the idea is to disable features the CPU supports (or seems to support) at runtime. It is useful for debugging or to work around hardware bugs.
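As a sketch of how such an override could look inside a custom run-time: the hardware-reported feature set is masked with a list read from an environment variable. The variable name DETECT_DISABLE, the feature names, and the bit layout are all invented for this example and do not correspond to any existing std_detect interface.

```rust
/// Hypothetical feature bits, for illustration only.
pub const SSE2: u32 = 1 << 0;
pub const AVX: u32 = 1 << 1;
pub const AVX512F: u32 = 1 << 2;

/// Turns a comma-separated list such as "avx,avx512f" into a bit mask
/// of features to disable. Unknown names are ignored.
pub fn disabled_mask(list: &str) -> u32 {
    list.split(',')
        .map(|name| match name.trim() {
            "sse2" => SSE2,
            "avx" => AVX,
            "avx512f" => AVX512F,
            _ => 0,
        })
        .fold(0, |mask, bit| mask | bit)
}

/// Removes the features named in the (hypothetical) `DETECT_DISABLE`
/// environment variable from the set reported by the hardware.
pub fn effective_features(hardware: u32) -> u32 {
    let disabled = std::env::var("DETECT_DISABLE")
        .map(|value| disabled_mask(&value))
        .unwrap_or(0);
    hardware & !disabled
}
```

Running a program with DETECT_DISABLE=avx512f would then make the run-time report AVX-512F as absent even on hardware that supports it, which is the debugging and hardware-bug workaround described above.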

whitequark · 2019-11-14T14:59:55Z

Overall I think this RFC would be very helpful for embedded targets. As written, it does not impose any cost on targets that do not opt in.

> Should the API use a TargetFeature enum or be stringly-typed like the macros and use string literals?

If the target feature runtime API were stringly typed and pattern-matched on string literals, then any implementation of it would include these literals in .rodata. To me that seems very undesirable, given that they are used as opaque tokens. Further, there would be no compile-time check that a string names a valid target feature, so a typo could result in a very hard-to-find bug.

> We could provide a "cache" in libcore, and an API for users or only for the standard library, to initialize this cache externally, e.g., during the standard library initialization routine.

I think libcore could provide a cache as a building block rather than something fixed with mutable state. This cache could look like a struct generic over target feature runtimes, like impl<T: core::detect::TargetFeatureRuntime> core::detect::TargetFeatureRuntime for TargetFeatureCache<T>. It would use atomics that are available on most targets, and be absent if the target lacks any. There are several benefits of having this cache within libcore:

* It could exhaustively match over the core::detect::TargetFeature enum to robustly track its evolution.
* It would be able to allocate the smallest amount of cache storage (an array of atomics) that is needed to represent the flags on any specific target.
* It would be able to use appropriate per-target memory orderings.

I think almost every case where a cache is used should work just fine with atomics. By making this cache a building block, it is easy to convert it to a thread-local one: add a wrapper that allocates the cache as a thread local, then delegate to it. Similarly, it can be reset by reinitializing. In the cases where features should not be cached at all (for example, in a system that frequently migrates threads between cores with different feature sets--sometimes with irritating consequences), it can be omitted.

The one case not covered by my proposal is targets without atomics. I think targets without atomics that are currently supported lack any runtime features anyway (since those tend to be the absolute smallest cores like Cortex-M0), so there's no real downside. If it turns out that a cache is useful for those targets, then we could later expose it on such targets under a different name and without atomics, with the caller responsible for synchronization.
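A minimal sketch of the building-block cache described above, written as a wrapper that is generic over the inner run-time. Everything here is an assumption for illustration: the trait takes &self so that the wrapper can hold state, the enum has only two variants, and one AtomicU8 per feature is used for clarity rather than the densest possible storage.

```rust
use std::sync::atomic::{AtomicU8, AtomicUsize, Ordering};

/// Hypothetical stand-in for `core::detect::TargetFeature`.
#[derive(Clone, Copy)]
pub enum TargetFeature {
    Sse2,
    Avx,
}

const FEATURE_COUNT: usize = 2;

impl TargetFeature {
    /// Exhaustive match: a new variant without a slot fails to compile.
    fn slot(self) -> usize {
        match self {
            TargetFeature::Sse2 => 0,
            TargetFeature::Avx => 1,
        }
    }
}

/// Hypothetical stand-in for `core::detect::TargetFeatureRuntime`.
pub trait TargetFeatureRuntime {
    fn is_feature_detected(&self, feature: TargetFeature) -> bool;
}

const UNKNOWN: u8 = 0;
const ABSENT: u8 = 1;
const PRESENT: u8 = 2;

/// Caches the answers of any inner run-time in an array of atomics.
pub struct TargetFeatureCache<T> {
    inner: T,
    slots: [AtomicU8; FEATURE_COUNT],
}

impl<T> TargetFeatureCache<T> {
    pub fn new(inner: T) -> Self {
        Self {
            inner,
            slots: [AtomicU8::new(UNKNOWN), AtomicU8::new(UNKNOWN)],
        }
    }
}

impl<T: TargetFeatureRuntime> TargetFeatureRuntime for TargetFeatureCache<T> {
    fn is_feature_detected(&self, feature: TargetFeature) -> bool {
        let slot = &self.slots[feature.slot()];
        match slot.load(Ordering::Relaxed) {
            PRESENT => true,
            ABSENT => false,
            _ => {
                // First query for this feature: delegate, then remember.
                let present = self.inner.is_feature_detected(feature);
                slot.store(if present { PRESENT } else { ABSENT }, Ordering::Relaxed);
                present
            }
        }
    }
}

/// Example inner run-time that counts how often it is queried.
pub struct Counting(pub AtomicUsize);

impl TargetFeatureRuntime for Counting {
    fn is_feature_detected(&self, feature: TargetFeature) -> bool {
        self.0.fetch_add(1, Ordering::Relaxed);
        matches!(feature, TargetFeature::Avx)
    }
}
```

Because the cache is an ordinary value, resetting it is a matter of constructing a new one, and a thread-local variant only needs a wrapper that stores a TargetFeatureCache in a thread local and delegates to it.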

> How does it fit with the Roadmap? Does it fit with the Roadmap at all? Would it fit with any future Roadmap?

This RFC would have fit into the "embedded productivity" part of the 2019 roadmap. I'm not sure about the 2020 roadmap, but so far it sounds like "finish what we started in 2019" would be a large aspect of it, so it probably fits there, too.

jethrogb · 2020-07-14T07:04:20Z

I'm against adding yet another "global functionality" attribute like target_feature_detection_runtime. This kind of thing should be implemented at the language level, for example with #2492.

joshtriplett · 2020-09-30T17:18:04Z

@rust-lang/lang discussed this today. Consensus:

* We feel like most of this isn't lang; the only lang bit is the new pseudo-global runtime mechanism. The rest is libs.
* We do want a generalized mechanism for trait-based pseudo-globals across the whole crate graph, but we don't want to make the perfect the enemy of the good here.
* We don't oppose adding this pseudo-global runtime mechanism.
* We'd like to delegate the remainder of this to @rust-lang/libs.
* We'd like to be included on the tracking issue and the discussion for future stabilization. If by that time we have a new trait-based pseudo-global mechanism, we'd want to see this migrated to that before stabilization.

lu-zero · 2022-12-07T18:09:48Z

What should we do to progress on this issue?

gnzlbg added 8 commits July 12, 2019 17:34

Initial commit: target-feature-runtime RFC

292b022

Add example, use non-exhaustive enum, constraints->use cases

59463e1

Improve summary

a354f98

Add introduction; do not allow overriding libstd runtime

ed24e71

Reword intro

7cdaff8

Note that not everything must be stabilized simultaneously

4aced3b

Reword summary

8396166

Reword text

7105ae6

gnzlbg mentioned this pull request Jul 16, 2019

target-feature detection run-time rust-embedded/wg#366

Open

bjorn3 reviewed Jul 18, 2019

newpavlov mentioned this pull request Aug 29, 2019

rand_chacha and reducing dependencies rust-random/rand#872

Closed

gnzlbg mentioned this pull request Sep 6, 2019

Add an env override to std_detect rust-lang/stdarch#804

Closed

3 tasks

gnzlbg commented Sep 9, 2019

Centril added the T-lang and T-libs-api labels Sep 9, 2019

newpavlov mentioned this pull request Jan 5, 2020

Enable sha1 and sha2 AArch64 extensions from asm-hashes RustCrypto/hashes#97

Merged

newpavlov mentioned this pull request Feb 16, 2020

Add Cargo feature for CPU randomness rust-random/getrandom#133

Merged

KodrAus added the Libs-Tracked label Jul 29, 2020

newpavlov mentioned this pull request Aug 6, 2020

cpuid not supported in sgx environment RustCrypto/hashes#183

Closed

newpavlov mentioned this pull request Nov 18, 2020

Unify the aes, aesni, aes-ctr, and aes-soft crates RustCrypto/block-ciphers#200

Merged
