Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#[inline] on generic functions #102539

Open
nikic opened this issue Oct 1, 2022 · 5 comments
Open

#[inline] on generic functions #102539

nikic opened this issue Oct 1, 2022 · 5 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such

Comments

@nikic
Copy link
Contributor

nikic commented Oct 1, 2022

Common Rust wisdom says that #[inline] does not need to be placed on small generic functions. This is because generic functions will get monomorphized in each crate anyway, so the attribute is not necessary for cross-crate inlining.

However, we also know that in practice placing #[inline] on generic functions does help optimization, even for tiny functions where the additional inlinehint this gives to LLVM really shouldn't be relevant. What gives? I believe there are two complications:

The main problem is that #[inline] forces an instantiation of the function in each CGU, while generic functions are normally only instantiated once per crate. This means that a definition of generic functions is available to crate-local LTO, but not to the pre-link optimization pipeline. Especially for trivial generic functions, this may significantly hamper pre-link optimization, and post-link optimization may not be able to recover from this.

The second complication occurs when optimizing for size. In this case, we currently enable -Z share-generics by default, which means that generic functions only get monomorphized once and are exported for downstream crates. This means that the function definition is not available even to crate-local LTO. It only becomes available during full cross-crate LTO.

The second point is something we can fix: We probably should not be enabling -Z share-generics by default in any optimized builds, including size-optimized builds.

The first one is trickier, as instantiating monomorphizations in each CGU by default is likely not desirable. Possibly we should just stop considering whether a function is generic or not when it comes to #[inline] placement.

@nikic
Copy link
Contributor Author

nikic commented Oct 1, 2022

I did a bit of crude data analysis, based on calls to non-inlined small functions in ThinLTO inputs. Here's the result: https://gist.github.com/nikic/c1d441ca6468b8c2ee114ff4a11f5667

Some of these are not supposed to be inlined (various panic functions) and there are a lot of drop_in_place functions that really should be inlined but aren't due to noinline annotations (fixed by #102099).

But there are also some obvious candidates where per CGU instantiation is likely beneficial, and we'd probably want to add an #[inline] annotation. Cell::replace() looks like one of the worst offenders. From the ecosystem, SmallVec::size() stands out to me.

# Third 30 entries excluding drop_in_place and various panic helpers.
n=20649 s=3  <core::cell::Cell<isize>>::replace
n=18672 s=1  <alloc::vec::Vec<u8> as core::ops::drop::Drop>::drop
n=8691  s=1  <core::ptr::unique::Unique<u8> as core::convert::Into<core::ptr::non_null::NonNull<u8>>>::into
n=6194  s=7  <[u8] as core::cmp::PartialEq>::eq
n=5741  s=3  <core::cell::Cell<usize>>::replace
n=5350  s=1  <rustc_query_system::dep_graph::graph::DepNodeIndex as core::convert::Into<rustc_data_structures::profiling::QueryInvocationId>>::into
n=4693  s=2  alloc::alloc::box_free::<rustc_ast::ast::Ty, alloc::alloc::Global>
n=1950  s=1  <[rustc_middle::ty::subst::GenericArg; 8] as smallvec::Array>::size
n=1930  s=2  <rustc_ast::ptr::P<rustc_ast::ast::Expr> as core::ops::deref::Deref>::deref
n=1760  s=1  <[rustc_middle::ty::Ty; 8] as smallvec::Array>::size
n=1740  s=2  <rustc_errors::diagnostic_builder::DiagnosticBuilder<rustc_errors::ErrorGuaranteed>>::emit
n=1699  s=1  rustc_data_structures::sync::assert_sync::<rustc_middle::ty::context::tls::ImplicitCtxt>
n=1560  s=1  <rustc_hir_analysis::check::inherited::Inherited as core::ops::deref::Deref>::deref
n=1509  s=1  <core::hash::BuildHasherDefault<rustc_hash::FxHasher> as core::default::Default>::default
n=1498  s=2  alloc::alloc::box_free::<rustc_ast::ast::Pat, alloc::alloc::Global>
n=1481  s=6  <&str as core::convert::Into<alloc::borrow::Cow<str>>>::into
n=1119  s=2  <rustc_ast::ptr::P<rustc_ast::ast::Ty> as core::ops::deref::Deref>::deref
n=1112  s=1  <[rustc_middle::ty::sty::Binder<rustc_middle::ty::sty::ExistentialPredicate>; 8] as smallvec::Array>::size
n=1047  s=6  smallvec::infallible::<()>
n=923   s=2  <rustc_target::abi::TyAndLayout<rustc_middle::ty::Ty> as core::ops::deref::Deref>::deref
n=874   s=6  <std::path::Path>::new::<std::ffi::os_str::OsString>
n=850   s=1  <alloc::vec::Vec<rustc_span::span_encoding::Span> as core::ops::drop::Drop>::drop
n=848   s=6  _$LT$alloc..string..String$u20$as$u20$core..clone..Clone$GT$::clone::h3a6e79de4ee6835e
n=825   s=2  alloc::alloc::box_free::<syn::expr::Expr, alloc::alloc::Global>
n=779   s=2  <tracing_core::metadata::Metadata>::fields
n=761   s=3  <hashbrown::set::HashSet<rustc_target::asm::InlineAsmReg, core::hash::BuildHasherDefault<rustc_hash::FxHasher>>>::insert
n=706   s=1  <rustc_span::def_id::DefId as core::borrow::Borrow<rustc_span::def_id::DefId>>::borrow
n=696   s=3  <rustc_data_structures::atomic_ref::AtomicRef<fn(rustc_span::def_id::LocalDefId)> as core::ops::deref::Deref>::deref
n=694   s=5  alloc::alloc::box_free::<dyn core::error::Error + core::marker::Send + core::marker::Sync, alloc::alloc::Global>

@Kobzol
Copy link
Contributor

Kobzol commented Oct 1, 2022

Do you think that performing thin/fat LTO for librustc_driver (e.g. #101403) could help with this in general?

@bjorn3
Copy link
Member

bjorn3 commented Oct 1, 2022

Generic functions are not included in the same cgu as the user by default. Partitioning produces two cgu's for each module before merging. One containing functions that are actually part of the module and one containing all generic functions used by the module. #[inline] moves functions from the "generics" cgu to the "normal" cgu AFAIK and as such is not useless at all.

@jachris
Copy link
Contributor

jachris commented Oct 26, 2022

I closed #103499 as a duplicate, but reading this again, I'm not so sure. If I read it correctly, this issue is not so much about the behavior of -Zshare_generics=on, but rather about the general placement of generic inline functions, in particular for -Zshare_generics=off.

We probably should not be enabling -Z share-generics by default in any optimized builds, including size-optimized builds.

What are your thoughts on excluding inline generics from sharing?

Personally, I would have expected that instantiations of generic inline functions behave the same way as inline functions, and was surprised to learn that this is not the case. But then again, I just recently took a look at how generics are exactly codegened, so take that with a grain of salt.

@Dylan-DPC Dylan-DPC added the C-enhancement Category: An issue proposing an enhancement or a PR with one. label Feb 4, 2023
thomcc pushed a commit to tcdi/postgrestd that referenced this issue Feb 10, 2023
Mark Cell::replace() as #[inline]

Giving this a try based on rust-lang/rust#102539 (comment).
@veber-alex
Copy link
Contributor

Can we have some compiler heuristic that applies #[inline] automatically?
Even if it is very conservative it will help in some cases to alleviate the need for developers to do it manually.

@workingjubilee workingjubilee added the C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such label Oct 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. C-optimization Category: An issue highlighting optimization opportunities or PRs implementing such
Projects
None yet
Development

No branches or pull requests

7 participants