Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Opt should know x/2 <= x - x/2 is true for udiv #91527

Open
scottmcm opened this issue May 8, 2024 · 8 comments
Open

Opt should know x/2 <= x - x/2 is true for udiv #91527

scottmcm opened this issue May 8, 2024 · 8 comments

Comments

@scottmcm
Copy link

scottmcm commented May 8, 2024

Today, this doesn't optimize away https://llvm.godbolt.org/z/jPjo6M858

define noundef zeroext i1 @demo(i32 noundef %x) unnamed_addr #0 {
start:
  %_2 = udiv i32 %x, 2
  %_3 = sub i32 %x, %_2
  %_0 = icmp ule i32 %_2, %_3
  ret i1 %_0
}

but that could just be return true, as proven in https://alive2.llvm.org/ce/z/dxHhft.

Indeed, it's always true for any denominator greater than zero. Proof: https://alive2.llvm.org/ce/z/CmkN_m

This is important for "half but it needs to be rounded up" cases. x/2 <= x is already understood to be true, but when you need the bigger half instead, the optimizer is more confused. (I lost the original non-minimized example, sorry, but it was something like splitting an n-element slice into n/2 and n-n/2 slices and ending up with the bounds checking not optimizing out.)


Maybe start by getting sub nuw for this? Like

   %_2 = udiv i32 %x, 2
-  %_3 = sub i32 %x, %_2
+  %_3 = sub nuw i32 %x, %_2

Proof for that narrower part: https://alive2.llvm.org/ce/z/xoCDe5

@dtcxzyw
Copy link
Member

dtcxzyw commented May 9, 2024

Emm, it seems like this pattern is rare. Do you encounter this in any real-world rust applications?

Maybe start by getting sub nuw for this? Like

Yeah, we can handle this pattern in ValueTracking like #85555.

@scottmcm
Copy link
Author

scottmcm commented May 9, 2024

Do you encounter this in any real-world rust applications?

Ah, here we go, I remade what I'd been trying to do: https://rust.godbolt.org/z/TczW1Ka1d

I was trying to get same-length mutable slices to the front and the back of a rust slice, so I could walk them together and mutate them both. (And yes I was intentionally allowing the skipping of the middle element for an odd-length slice.) Doing that in safe rust code means going through split_at_mut, and thus I ended up at

pub fn front_and_back_halves(x: &mut [i32]) -> (&mut [i32], &mut [i32]) {
    let n = x.len();
    let (a, b) = x.split_at_mut(n - n/2);
    (&mut a[..n/2], b)
}

expecting that I wouldn't need unsafe code because LLVM would handle it, but it turned out that the a[..n/2] bounds check doesn't optimize out:

define void @front_and_back_halves(ptr dead_on_unwind noalias nocapture noundef writable writeonly sret([32 x i8]) align 8 dereferenceable(32) %_0, ptr noalias noundef nonnull align 4 %x.0, i64 noundef %x.1) unnamed_addr #0 {
start:
  %_41 = lshr i64 %x.1, 1
  %_3 = sub i64 %x.1, %_41
  %_7.i = icmp ugt i64 %_41, %_3
  br i1 %_7.i, label %bb3.i, label %"_ZN106_$LT$core..ops..range..Range$LT$usize$GT$$u20$as$u20$core..slice..index..SliceIndex$LT$$u5b$T$u5d$$GT$$GT$9index_mut17h7c9c67143f1b9d5cE.exit"

bb3.i:                                            ; preds = %start
  tail call void @_ZN4core5slice5index24slice_end_index_len_fail17h07f57a356f69bc93E(i64 noundef %_41, i64 noundef %_3, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @alloc_43779ed79010680daba275a097b13dc0) #2, !noalias !3
  unreachable

"_ZN106_$LT$core..ops..range..Range$LT$usize$GT$$u20$as$u20$core..slice..index..SliceIndex$LT$$u5b$T$u5d$$GT$$GT$9index_mut17h7c9c67143f1b9d5cE.exit": ; preds = %start
  %0 = getelementptr inbounds i32, ptr %x.0, i64 %_3
  store ptr %x.0, ptr %_0, align 8
  %1 = getelementptr inbounds i8, ptr %_0, i64 8
  store i64 %_41, ptr %1, align 8
  %2 = getelementptr inbounds i8, ptr %_0, i64 16
  store ptr %0, ptr %2, align 8
  %3 = getelementptr inbounds i8, ptr %_0, i64 24
  store i64 %_41, ptr %3, align 8
  ret void
}

And it's not enough to just make the n - n/2 be sub nuw: https://rust.godbolt.org/z/PT3hnbe8o


So I agree that humans probably don't write this comparison much, but it easily happens in bounds checks.

(Similarly, split_at_mut(n/2) would also give n/2- and n - n/2-length slices.)


Out of curiosity, I also tried https://rust.godbolt.org/z/4o366q4s1

    let (a, b) = x.split_at_mut(n/2);
    (a, &mut b[n%2..])

But that seems much worse because despite still having a bounds check it also doesn't realize that the lengths of the two returned slices are the same. (As evidenced by writing out two different SSA values as the lengths of the returned slices. In the earlier example note that both lengths are store i64 %_41 because LLVM at least knows they're the same length, even if the bounds check didn't optimize out.)

@Explorer09
Copy link

How about the x / 2 <= (x + 1) / 2 case? When it come to "half but rounded up", I would write the expression as this: ((x + 1) / 2)

@scottmcm
Copy link
Author

scottmcm commented May 13, 2024

@Explorer09 x.len() in rust can be usize::MAX (albeit only for slices of ZSTs), which would make doing it as (x + 1)/2 wrap.

That inspired me to try .split_at_mut(n/2 + n%2); too, but nope (https://rust.godbolt.org/z/Y96EzdcGh) that didn't optimize out either.

EDIT: Oh, but it looks like trunk can simplify that one https://llvm.godbolt.org/z/s5hPGbbnP -- amusingly, it simplifies it to n - n/2 🙃

@Explorer09
Copy link

@scottmcm Good point. Thanks for in the info about the SIZE_MAX where (x + 1) / 2 won't work (because of overflow).
I stand corrected.

@scottmcm
Copy link
Author

scottmcm commented May 17, 2024

Actually, I was thinking about your message more and thought that I should try doing that (x + 1) / 2 approach in a larger type to see if it would work that way. After all, if I zext to a larger type where I can add nuw, it ought to be more obvious that x/2 <= (x+1)/2 -- definitely more obvious than n/2 <= n - n/2.

Sadly, no luck. While it would be legal (https://alive2.llvm.org/ce/z/cU7chM) to optimize the check away, it doesn't even though it figures out both add nuw and trunc nuw: https://llvm.godbolt.org/z/zMKW5PP7x

@Explorer09
Copy link

@scottmcm
I think another pattern that might me worth mentioning is this: (x / 2) + (x % 2) (assuming x is an unsigned integer), or the bitwise equivalent (x >> 1) + (x & 1)

@scottmcm
Copy link
Author

@Explorer09 yup, see #91527 (comment) for that one :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants