Fix #[inline(always)] on closures with target feature 1.1 #111836

calebzulawski · 2023-05-22T05:03:00Z

Fixes #108655. I think this is the most obvious solution that isn't overly complicated. The comment includes more justification, but I think this is likely better than demoting the #[inline(always)] to #[inline], since existing code is unaffected.

rustbot · 2023-05-22T05:03:07Z

r? @davidtwco

(rustbot has picked a reviewer for you, use r? to override)

calebzulawski · 2023-05-22T05:03:18Z

r? @workingjubilee

wesleywiser · 2023-07-06T14:54:38Z

@calebzulawski we can re-roll reviewer if you'd like 🙂

workingjubilee · 2023-07-06T18:45:31Z

I've gotten rid of my other hundreds of notifications, I can work on this finally.

workingjubilee

I know I basically suggested it but now that I have a good look at the code and am a few months of thinking about target features wiser, I'm concerned about the behavior of this approach for this particular, admittedly somewhat contrived example:

#![feature(target_feature_11)]

use core::arch::x86_64::*;

#[target_feature(enable = "avx")]
pub unsafe fn escape(a: f64, b: f64, c: f64, d: f64) -> impl Fn() -> __m256d {
    #[inline(always)]
    move || _mm256_set_pd(a, b, c, d)
}

#[target_feature(enable = "avx")]
pub unsafe fn way_out() -> fn(__m256d) -> i32 {
    #[inline(always)]
    move |a| _mm256_movemask_pd(a)
}

pub fn unsafe_haven(a: f64, b: f64, c: f64, d: f64) -> i32 {
    // Problem: Even though this code declared
    // that it met escape()'s and way_out()'s unsafe preconditions,
    // THIS function doesn't have the target features!
    let escapee = unsafe { escape(a, b, c, d) };
    let escaping_avx_type = escapee();
    let opening = unsafe { way_out() };
    opening(escaping_avx_type)
}

workingjubilee · 2023-07-07T08:27:50Z

tests/ui/rfcs/rfc-2396-target_feature-11/issue-108655-inline-always-closure.rs

+        #[inline(always)]
+        move || {}


Thinking about this forced me to check if you can annotate closures with target_feature(enable). (You cannot, fortunately.)

workingjubilee · 2023-07-07T08:50:56Z

compiler/rustc_codegen_ssa/src/codegen_attrs.rs

+    // would result in this closure being compiled without the inherited target features, but this
+    // is probably a poor usage of `#[inline(always)]` and easily avoided by not using the attribute.
+    if tcx.features().target_feature_11
+        && tcx.is_closure(did.to_def_id())


...apparently is_closure will return true if this is a generator, also. I frankly have no idea how that should work, but dropping the features should remain safe in that case, at least...

workingjubilee · 2023-07-07T09:39:02Z

compiler/rustc_codegen_ssa/src/codegen_attrs.rs

+    // its parent function, which effectively inherits the features anyway. Boxing this closure
+    // would result in this closure being compiled without the inherited target features, but this
+    // is probably a poor usage of `#[inline(always)]` and easily avoided by not using the attribute.


Boxing seems like a waste, yes, but now that I am thinking about it, this seems like it could result in confusing behavior in the "escaping closure" case, when that would result, instead of the IIFE? Does that even make sense?

Inlining with closures is unfortunately always confusing. Box<dyn FnOnce()>, for example, implements FnOnce itself:

rust/library/alloc/src/boxed.rs

Lines 2003 to 2009 in c4083fa

impl<Args: Tuple, F: FnOnce<Args> + ?Sized, A: Allocator> FnOnce<Args> for Box<F, A> {

type Output = <F as FnOnce<Args>>::Output;

extern "rust-call" fn call_once(self, args: Args) -> Self::Output {

<F as FnOnce<Args>>::call_once(*self, args)

}

}

This call_once doesn't have any inline attribute at all! Therefore, the boxed closure's call_once inlines into this call_once, and then it's up in the air after that.

calebzulawski · 2023-07-16T22:41:03Z

unsafe_haven in this example is unsound, since you can call it without ever checking for avx, but I'll assume you have e.g. a is_x86_feature_detected!("avx") in there making it sound for the sake of argument.

I don't think there is actually anything wrong here. The #[inline(always)] apply to the closures (which are of course functions) and don't propagate to any of the contents, even if the function is trivial. So even with both closures inlined, the avx intrinsics are still not #[inline(always)], and without target features enabled in unsafe_haven, you simply run into #53069. You've checked that it's safe to call the intrinsics, they just can't inline into the larger function since it's missing the target features.

workingjubilee · 2023-07-16T23:18:23Z

Yes, I'm handwaving feature detection for this example. Technically it's not unsound until someone actually calls it. :^)

The "ultimate" question seems to be if this is truly preferable over demoting the inline hint from inline(always) to inline? In this case we're erasing target features, which sounds like it is more likely to compromise soundness, miscompile, or have unexpected effects for the programmer, than simply downgrading the inlining, which allows the codegen backend to examine the situation and choose to reject inlining on a case-by-case basis, but preserves the feature annotations that the backend would need to use in order to make decisions about whether the inlining would be correct.

Currently, the target_feature semantics are not hints, they're directives, and that's part of why enable is unsafe. However, the inline semantics are hints, and can be disregarded. However-however, the closure case is something we are, honestly, ad hoc making up as we go along.

Relevant issues and commits:

calebzulawski · 2023-07-16T23:33:27Z

I'm not sure if inline(always) is really a hint or not. As far as I can tell, LLVM's alwaysinline errors if the function fails to inline (I guess this only applies to direct function calls).

Regardless, I think this behavior is probably best for now because I'm sure there is existing code with inline(always) closures inside target_feature functions and I'd rather avoid breaking that existing code. On top of that, I think preferring the inline over the target features is at least justifiable, if not the best option (otherwise, it's impossible to use inline(always), and I'm sure someone will come across that at some point and be frustrated)

workingjubilee · 2023-07-17T00:04:54Z

I'm not sure if inline(always) is really a hint or not. As far as I can tell, LLVM's alwaysinline errors if the function fails to inline (I guess this only applies to direct function calls).

It is because we say it is, as I understand it. LLVM is allowed to choose to not error on that case and simply silently ignore it, and as I understand it has in the past, and as you observed, it only applies to direct calls.

I guess the matter of indirection is most of what's really becoming pertinent, now that I think about it:

If we do this, then "featureful inlining" stops before the closure, but if we don't, it continues into the closure, but the closure itself may not get inlined. So if there is some reason that the closure's exterior gets "outlined" anyways, like the Box<dyn FnOnce(A) -> R> case, we can't rely on the features being present on the caller in order to justify erasing the closure. This seems particularly punishing in the case of ABI mismatches where if the closure preserved the features then inline-into-closure would be acceptable. As we see from llvm/llvm-project@7c3cf4c the actual ABI determination requires accounting for both caller and callee, and thus so does the inlining strategy, thus I fear we can't reason along the lines of "well, the caller of this closure will have all the features anyways so inlining will save us".

I might be wrong, obviously. However, one thing I am confident about is that we should not have to guess: This needs, at minimum, codegen tests in order to validate the LLVMIR is what we expect for both the direct call and indirect call cases, and we're going to need enough nesting that we can see all the consequences. This will help clarify what LLVM actually does, illuminate which approaches might actually lead to performance regressions, and catch whether LLVM decides to change its mind.

workingjubilee · 2023-07-17T00:06:40Z

I recommend making a tests/codegen/target-feature directory.

There is no way this is going to be the last of these.

calebzulawski · 2023-07-17T00:29:17Z

To take a step back for a moment, extending #[target_feature] to the closure is implicit, but adding #[inline(always)] is explicit. When the user places the inline attribute on the closure inside a target feature function, it really should be emitted because the user asked for it. The user didn't specifically request the target features be added to the closure, so I think the inlining should win out.

That said, like the added comment indicates, using inline(always) results in inheriting the target features as well, in all but the pathological case. I think it's not worth worrying ourselves about the closure escaping when it's almost certainly a mistake/bug/unnecessary using #[inline] at all on a boxed closure. We could even get really fancy and emit a lint.

I'm basically saying it's not worth overthinking it. I'm confident this change won't do anything unsound, it might not have completely optimal codegen in unusual edge cases, but I think it's easy to work around. At worst, this behavior could be adjusted in a follow up PR, since it's just codegen and not language semantics :)

workingjubilee · 2023-07-17T00:30:56Z

I agree (re: "At worst, this behavior could be adjusted in a follow up PR, since it's just codegen"), I just still want to see codegen tests so that if LLVM changes their inlining rules again for target features we can catch it. :^)

workingjubilee · 2023-07-17T00:37:05Z

@rustbot author

…features

calebzulawski · 2023-07-17T02:07:27Z

All tests good 🙂

workingjubilee · 2023-07-23T00:15:52Z

Let's give this a whirl. @bors r+ rollup=never

bors · 2023-07-23T00:15:54Z

📌 Commit cdb9de7 has been approved by workingjubilee

It is now in the queue for this repository.

bors · 2023-07-23T00:16:06Z

⌛ Testing commit cdb9de7 with merge 1c44af9...

bors · 2023-07-23T02:03:16Z

☀️ Test successful - checks-actions
Approved by: workingjubilee
Pushing 1c44af9 to master...

rust-timer · 2023-07-23T03:50:53Z

Finished benchmarking commit (1c44af9): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

This benchmark run did not return any relevant results for this metric.

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-3.4%	[-3.4%, -3.4%]	1
All ❌✅ (primary)	-	-	0

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 651.172s -> 651.296s (0.02%)

Fix #[inline(always)] on closures with target feature 1.1

ddb16e2

rustbot assigned davidtwco May 22, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 22, 2023

rustbot assigned workingjubilee and unassigned davidtwco May 22, 2023

workingjubilee reviewed Jul 7, 2023

View reviewed changes

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jul 17, 2023

Add codegen test ensuring always-inline closures don't bypass target …

cdb9de7

…features

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jul 23, 2023

bors added the merged-by-bors This PR was explicitly merged by bors. label Jul 23, 2023

bors merged commit 1c44af9 into rust-lang:master Jul 23, 2023
12 checks passed

rustbot added this to the 1.73.0 milestone Jul 23, 2023

calebzulawski mentioned this pull request Sep 24, 2023

Stabilize target_feature_11 #116114

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix #[inline(always)] on closures with target feature 1.1 #111836

Fix #[inline(always)] on closures with target feature 1.1 #111836

calebzulawski commented May 22, 2023 •

edited

Loading

rustbot commented May 22, 2023

calebzulawski commented May 22, 2023

wesleywiser commented Jul 6, 2023

workingjubilee commented Jul 6, 2023

workingjubilee left a comment

workingjubilee Jul 7, 2023

workingjubilee Jul 7, 2023

workingjubilee Jul 7, 2023

calebzulawski Jul 16, 2023

calebzulawski commented Jul 16, 2023

workingjubilee commented Jul 16, 2023

calebzulawski commented Jul 16, 2023

workingjubilee commented Jul 17, 2023

workingjubilee commented Jul 17, 2023

calebzulawski commented Jul 17, 2023

workingjubilee commented Jul 17, 2023 •

edited

Loading

workingjubilee commented Jul 17, 2023

calebzulawski commented Jul 17, 2023

workingjubilee commented Jul 23, 2023

bors commented Jul 23, 2023

bors commented Jul 23, 2023

bors commented Jul 23, 2023

rust-timer commented Jul 23, 2023

	impl<Args: Tuple, F: FnOnce<Args> + ?Sized, A: Allocator> FnOnce<Args> for Box<F, A> {
	type Output = <F as FnOnce<Args>>::Output;

	extern "rust-call" fn call_once(self, args: Args) -> Self::Output {
	<F as FnOnce<Args>>::call_once(*self, args)
	}
	}

Fix #[inline(always)] on closures with target feature 1.1 #111836

Fix #[inline(always)] on closures with target feature 1.1 #111836

Conversation

calebzulawski commented May 22, 2023 • edited Loading

rustbot commented May 22, 2023

calebzulawski commented May 22, 2023

wesleywiser commented Jul 6, 2023

workingjubilee commented Jul 6, 2023

workingjubilee left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

calebzulawski commented Jul 16, 2023

workingjubilee commented Jul 16, 2023

calebzulawski commented Jul 16, 2023

workingjubilee commented Jul 17, 2023

workingjubilee commented Jul 17, 2023

calebzulawski commented Jul 17, 2023

workingjubilee commented Jul 17, 2023 • edited Loading

workingjubilee commented Jul 17, 2023

calebzulawski commented Jul 17, 2023

workingjubilee commented Jul 23, 2023

bors commented Jul 23, 2023

bors commented Jul 23, 2023

bors commented Jul 23, 2023

rust-timer commented Jul 23, 2023

Overall result: no relevant changes - no action needed

calebzulawski commented May 22, 2023 •

edited

Loading

workingjubilee commented Jul 17, 2023 •

edited

Loading