Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Target feature runtime #2725

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

gnzlbg
Copy link
Contributor

@gnzlbg gnzlbg commented Jul 15, 2019

Rendered


This RFC proposes exporting the target-feature detection macros (e.g. is_x86_feature_detected!) from libcore, enabling all Rust libraries, including #![no_std] libraries, to use them. It works out the details of the implementation, and propose to make parts of it public to allow #![no_std] binaries and libstd to provide their own target-feature-detection run-times via these APIs.

This RFC can be stabilized in stages. We can stabilize the exporting of the target-feature detection macros from libcore without stabilizing anything else, since the standard library and #![no_std] libraries can both use unstable Rust features. Once we start working on the stabilization of #![no_std] binaries, we can revisit the stabilization of these APIs.

SSE3. In Rust, we call `x86_64` the target architecture "family", and extensions
like SSE2 or SSE3 "target-features".

Many Rust applications compiled for `x86_64-unknonw-linux-gnu` do want to use
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Many Rust applications compiled for `x86_64-unknonw-linux-gnu` do want to use
Many Rust applications compiled for `x86_64-unknown-linux-gnu` do want to use

* Should the `libstd` run-time be overridable? For example, by only providing it
if no other crate in the dependency graph provides a runtime ? This would be a
forward-compatible extension, but no use case considered requires it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Does core::detect::is_target_feature_detected cache the result of core::detect::TargetFeatureRuntime::is_feature_detected or is the later responsible for caching itself?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jul 18, 2019 via email

@therealprof
Copy link

Sounds good to me.

One thing I couldn't quite find in the RFC is how this actually bridges between runtime and compile-time detection, i.e. how the macro in libcore will help to move crates over to no_std without introducing runtime overhead for details known at compile time as mentioned in the introduction. Also I'm confused as to how this will work in practise since using the macro in a condition might still cause code to be emitted, causing the generation of instructions which are denied by later stages of the compilation or by the linker due to incompatibility with the target.

And regarding the feature detection, is everyone supposed to roll their own or is there going to be a good default implementation like for the panic handling and liballoc?

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jul 23, 2019

i.e. how the macro in libcore will help to move crates over to no_std without introducing runtime overhead for details known at compile time

The macros already do that, e.g., the libstd docs guarantees that the macros resolve at compile-time if the feature is known to be enabled at compile-time.


using the macro in a condition might still cause code to be emitted,

This is already the case even if you use compile-time feature detection via if cfg!(target_feature = "avx") { ... }, where the macro there expands either to true or false. It is up to the users to choose an optimization level / backend that removes dead code.

If you want to make sure that no run-time detection actually ever happens, you could implement your own run-time that just calls panics, e.g., by calling unreachable!(), or if you know that no run-time detection macro will be called in your program you could go as far as using core::hint::unreachable_unchecked if you are feeling brave.


are denied by later stages of the compilation or by the linker due to incompatibility with the target.

I'm not sure if this will answer your question, but the macros are only available for targets that support them. If a target does not support them, there is no macro for users to use, so what you describe cannot currently happen.

That is, if you want to use is_x86_feature_detected!, you need to guard the use with a #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] or the code won't compile on any other target because that macro does not exist.

And regarding the feature detection, is everyone supposed to roll their own or is there going to be a good default implementation like for the panic handling and liballoc?

Currently, the std::detect implementation is available in crates.io as the std_detect crate, and is quiet configurable. If that does not solve your use case, you can roll your own, and others might find it useful.

@therealprof
Copy link

This is already the case even if you use compile-time feature detection via if cfg!(target_feature = "avx") { ... }, where the macro there expands either to true or false. It is up to the users to choose an optimization level / backend that removes dead code.

That seems like a bad idea. People not being able to use dev builds because optimized code paths using this feature will not be linkable due to code not being removed.

We do see a lot of such problems in no_std land today, e.g. CAS instructions finding their way into code compiled for ARM Thumb v6-M, not even to mention that unused code emitted in dev builds which is not collected is always troublesome for embedded due to limited flash size...

That is, if you want to use is_x86_feature_detected!, you need to guard the use with a #[cfg(any(target_arch = "x86", target_arch = "x86_64"))] or the code won't compile on any other target because that macro does not exist.

It would be way more convenient if that was implicit (or at least there's a separate version with implicit feature gates). Requiring to manually pair those seems like a perfect way to cause seemingly random build errors and wrong code paths being used. Done properly this would also eliminate my concern above...

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jul 23, 2019

That seems like a bad idea. People not being able to use dev builds because optimized code paths using this feature will not be linkable due to code not being removed. [...]

The runtime target-feature detection macros currently only allow detecting features for which this cannot happen.

It would be way more convenient if that was implicit (or at least there's a separate version with implicit feature gates). Requiring to manually pair those seems like a perfect way to cause seemingly random build errors and wrong code paths being used. Done properly this would also eliminate my concern above...

This RFC does not require doing that.

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jul 25, 2019

@therealprof this RFC only proposes allowing users to configure the run-time component of the target-feature detection system. That system has already been RFC'ed, and is implemented on both stable and nightly for half a dozen targets (including embedded ones) and dozens of target features. This RFC does not change anything about how any of that currently works. That is out-of-scope.

If you have questions about that system, the docs are usually a good place to start, but you can also go through the RFCs (e.g. the std::arch RFC or the target-feature RFC).

@therealprof
Copy link

therealprof commented Jul 25, 2019

It's okay, I was reading a bit more into the preamble than this RFC is about. It's still a useful proposal, just not as useful for me as I'd had hoped from my initial reading. 😉

@gnzlbg
Copy link
Contributor Author

gnzlbg commented Jul 25, 2019

It's still a useful proposal, just not as useful for me as I'd had hoped from my initial reading.

It might be out-of-scope/offtopic for this RFC, but would it be possible for you to open an issue in this repo about which problem you need to solve ? We can try to figure out how to do that there.

requires executing privileged CPU instructions that are illegal to execute
from user-space code. User-space applications query the available
target-feature set from the operating system. Often, they might also want to
cache the result to avoid repeating system calls.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lu-zero also mentioned another use-case here: they want to control feature detection at run-time via, e.g., environment variables.

I can imagine that one use case of this would be, for example, to disable AVX-512 usage via such an environment variable, but maybe they can comment here on what they have in mind.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly, the idea is to disable features the cpu supports (or seems to support) at runtime. It is useful for debugging or to work around hardware bugs.

@Centril Centril added T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC. labels Sep 9, 2019
@whitequark
Copy link
Member

Overall I think this RFC would be very helpful for embedded targets. As written it does not impose any cost on the targets that they do not opt in.

Should the API use a TargetFeature enum or be stringly-typed like the macros and use string literals?

If the target feature runtime API would be stringly typed and pattern match on string literals, then any implementation of it would include these literals in .rodata. To me that seems very undesirable, given that they are used as opaque tokens. Further, there is no compile time checking of the strings representing a valid target feature, such that a typo can result in a very hard to find bug.

We could provide a "cache" in libcore, and an API for users or only for the standard library, to initialize this cache externally, e.g., during the standard library initialization routine.

I think libcore could provide a cache as a building block rather than something fixed with mutable state. This cache could look like a struct generic over target feature runtimes, like impl<T: core::detect::TargetFeatureRuntime> core::detect::TargetFeatureRuntime for TargetFeatureCache<T>. It would use atomics that are available on most targets, and be absent if the target lacks any. There are several benefits of having this cache within libcore:

  • It could exhaustively match over the core::detect::TargetFeature enum to robustly track its evolution.
  • It would be able to allocate the smallest amount of cache storage (an array of atomics) that is needed to represent the flags on any specific target.
  • It would be able to use appropriate per-target memory orderings.

I think almost every case where a cache is used should work just fine with atomics. By making this cache a building block, it is easy to convert it to a thread-local one: add a wrapper that allocates the cache as a thread local, then delegate to it. Similarly, it can be reset by reinitializing. In the cases where features should not be cached at all (for example, in a system that frequently migrates threads between cores with different feature sets--sometimes with irritating consequences), it can be omitted.

The one case not covered by my proposal is targets without atomics. I think targets without atomics that are currently supported lack any runtime features anyway (since those tend to be the absolute smallest cores like Cortex-M0), so there's no real downside. If it turns out that a cache is useful for those targets, then we could later expose it on such targets under a different name and without atomics, with the caller responsible for synchronization.

How does it fit with the Roadmap? Does it fit with the Roadmap at all? Would it fit with any future Roadmap?

This RFC would have fit into the "embedded productivity" part of the 2019 roadmap. I'm not sure about the 2020 roadmap, but so far it sounds like "finish what we started in 2019" would be a large aspect of it, so it probably fits there, too.

@jethrogb
Copy link
Contributor

I'm against adding yet another "global functionality" attribute like target_feature_detection_runtime. This kind of thing should be implemented at the language level, for example with #2492.

@KodrAus KodrAus added the Libs-Tracked Libs issues that are tracked on the team's project board. label Jul 29, 2020
@joshtriplett
Copy link
Member

@rust-lang/lang discussed this today. Consensus:

  • We feel like most of this isn't lang; the only lang bit is the new pseudo-global runtime mechanism. The rest is libs.
  • We do want a generalized mechanism for trait-based pseudo-globals across the whole crate graph, but we don't want to make the perfect the enemy of the good here.
  • We don't oppose adding this pseudo-global runtime mechanism.
  • We'd like to delegate the remainder of this to @rust-lang/libs.
  • We'd like to be included on the tracking issue and the discussion for future stabilization. If by that time we have a new trait-based pseudo-global mechanism, we'd want to see this migrated to that before stabilization.

@lu-zero
Copy link

lu-zero commented Dec 7, 2022

What should we do to progress on this issue?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Libs-Tracked Libs issues that are tracked on the team's project board. T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

9 participants