Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suboptimal inlining decisions #49541

Open
glandium opened this issue Mar 31, 2018 · 11 comments
Open

Suboptimal inlining decisions #49541

glandium opened this issue Mar 31, 2018 · 11 comments
Labels
A-codegen Area: Code generation C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-llvm Working group: LLVM backend code generation

Comments

@glandium
Copy link
Contributor

glandium commented Mar 31, 2018

I suspect this can happen in more cases, but here is how I observed this:

pub fn foo() -> Box<[u8]> {
    vec![0].into_boxed_slice()
}

This compiles to:

  sub rsp, 56
  lea rdx, [rsp + 8]
  mov edi, 1
  mov esi, 1
  call __rust_alloc@PLT
  test rax, rax
  je .LBB2_1
  mov byte ptr [rax], 0
  mov edx, 1
  add rsp, 56
  ret
.LBB2_1:
  (snip oom handling)

Which is pretty much to the point.

Now duplicate the function, so that you now have two functions calling into_boxed_slice(), and the compiler decides not to inline it at all anymore. Which:

  • adds the full blown Vec::into_boxed_slice implementation (63 lines of assembly)
  • adds ptr::drop_in_place
  • and changes the function above to:
  sub rsp, 56
  lea rdx, [rsp + 8]
  mov edi, 1
  mov esi, 1
  call __rust_alloc@PLT
  test rax, rax
  je .LBB4_1
  mov byte ptr [rax], 0
  mov qword ptr [rsp + 8], rax
  mov qword ptr [rsp + 16], 1
  mov qword ptr [rsp + 24], 1
  lea rdi, [rsp + 8]
  call <alloc::vec::Vec<T>>::into_boxed_slice
  add rsp, 56
  ret
.LBB4_1:
  (snip oom handling)

The threshold to stop inlining seems pretty low for this particular case, and even if it might make sense for some uses across the codebase to not be inlined, when the result of inlining is clearly beneficial, it would be good if we could still inline the calls where it's a win.

@glandium
Copy link
Contributor Author

Random thought: maybe this calls for an annotation to force-inline at the caller level.

@nox
Copy link
Contributor

nox commented Mar 31, 2018

Cc @rust-lang/wg-codegen

@oli-obk oli-obk added the WG-llvm Working group: LLVM backend code generation label Mar 31, 2018
@oli-obk
Copy link
Contributor

oli-obk commented Mar 31, 2018

I'd rather not consider more annotations. We already need to tell ppl that, yes there is #[inline(always)], no it will not make your code faster, it's a scalpel for edge cases.

We could probably more aggressively inline in the presence of constant information.

@nox
Copy link
Contributor

nox commented Mar 31, 2018

Now duplicate the function, so that you now have two functions calling into_boxed_slice(), and the compiler decides not to inline it at all anymore.

I wonder how that interacts with #49479.

@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Mar 31, 2018

MergeFunctions could help in the case where the entire callers are identical (edit:) and it runs before inlining. Which is not a very interesting case IMO for something like into_boxed_slice which probably is called from many different places.

I believe the root cause of this issue is that LLVM's inlining cost heuristic special cases functions with internal linkage and just one call site (because in those cases, inlining doesn't increase code size since you can eliminate the function entirely). If there are multiple call sites, IIRC it doesn't do anything to account for the fact that inlining all call sites would still allow eliminating the function. Probably because it can't actually know whether all call sites will inline it. Seems difficult to solve in general.

@oli-obk
Copy link
Contributor

oli-obk commented Mar 31, 2018

If inlining decreases the code size of the caller, we should always inline regardless of heuristics. Idk how to detect such cases before actually doing the inlining

@hanna-kruppe
Copy link
Contributor

That too is part of the inlining heuristic. Of course, it's only an heuristic, and it could always be better, but tuning it further is notoriously fickle.

@nox
Copy link
Contributor

nox commented Mar 31, 2018

All the functions involved in conversions between Vec<T> and Box<[T]> and between String and Box<str> should at least be annotated with #[inline], IMO. And for this purpose, shouldn't <Vec<T>>::shrink_to_fit be #[inline] and have its own self.capacity() != self.len check?

@glandium
Copy link
Contributor Author

@oli-obk

I'd rather not consider more annotations. We already need to tell ppl that, yes there is #[inline(always)], no it will not make your code faster, it's a scalpel for edge cases.

The problem is that there is no scalpel for edge cases when the called function is not annotated at all, which the case here.

@rkruppe

If there are multiple call sites, IIRC it doesn't do anything to account for the fact that inlining all call sites would still allow eliminating the function. Probably because it can't actually know whether all call sites will inline it.

One way to look at it is that part of the problem is this all or nothing property of inlining. Either the function is always inlined or not (AIUI). I'd argue it should be decided case by case. There may be both cases where it makes sense for the function not to be inlined and cases where it doesn't, in the same codebase.

nox added a commit to nox/rust that referenced this issue Apr 1, 2018
This helps with the specific problem described in rust-lang#49541, obviously without
making any large change to how inlining works in the general case.

Everything involved in the conversions is made `#[inline]`, except for the
`<Vec<T>>::into_boxed_slice` entry point which is made `#[inline(always)]`
after checking that duplicating the function mentioned in the issue prevented
its inlining if I only annotate it with `#[inline]`.

For the record, that function was:

```rust
pub fn foo() -> Box<[u8]> {
    vec![0].into_boxed_slice()
}
```

To help the inliner's job, we also hoist a `self.capacity() != self.len` check
in `<Vec<T>>::shrink_to_fit` and mark it as `#[inline]` too.
@hanna-kruppe
Copy link
Contributor

hanna-kruppe commented Apr 1, 2018

@glandium You misunderstand, inlining in LLVM is a per-call-site decision (though it is true that most of the heuristic only looks at the function to inline, not at the call site).

kennytm added a commit to kennytm/rust that referenced this issue Apr 16, 2018
Inline most of the code paths for conversions with boxed slices

This helps with the specific problem described in rust-lang#49541, obviously without making any large change to how inlining works in the general case.

Everything involved in the conversions is made `#[inline]`, except for the `<Vec<T>>::into_boxed_slice` entry point which is made `#[inline(always)]` after checking that duplicating the function mentioned in the issue prevented its inlining if I only annotate it with
`#[inline]`.

For the record, that function was:

```rust
pub fn foo() -> Box<[u8]> {
    vec![0].into_boxed_slice()
}
```

To help the inliner's job, we also hoist a `self.capacity() != self.len` check in `<Vec<T>>::shrink_to_fit` and mark it as `#[inline]` too.
@XAMPPRocky XAMPPRocky added C-enhancement Category: An issue proposing an enhancement or a PR with one. A-codegen Area: Code generation T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jun 29, 2018
@steveklabnik
Copy link
Member

Triage; no change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation C-enhancement Category: An issue proposing an enhancement or a PR with one. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. WG-llvm Working group: LLVM backend code generation
Projects
None yet
Development

No branches or pull requests

6 participants