Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[LoopUnroll] Introduce PragmaUnrollFullMaxIterations as a hard cap on how many iterations we try to unroll #78648

Merged
merged 9 commits into from
Feb 6, 2024

Conversation

modiking
Copy link
Contributor

@modiking modiking commented Jan 19, 2024

Fixes PR77842 where UBSAN causes pragma full unroll to try and unroll INT_MAX times. This sets a cap to make sure we don't attempt this and crash the compiler.

Testing:
ninja check-all with new test

@modiking modiking requested review from nikic and fhahn January 19, 2024 00:17
@modiking modiking self-assigned this Jan 19, 2024
@modiking modiking marked this pull request as ready for review January 19, 2024 00:19
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 19, 2024

@llvm/pr-subscribers-llvm-transforms

Author: None (modiking)

Changes

Fixes PR77842 where UBSAN causes pragma full unroll to try and unroll INT_MAX times. This sets a cap to make sure we don't attempt this and crash the compiler.

Testing:
ninja check-all with new test


Full diff: https://github.com/llvm/llvm-project/pull/78648.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Utils/LoopUnroll.cpp (+12)
  • (added) llvm/test/Transforms/LoopUnroll/pr77842.ll (+37)
diff --git a/llvm/lib/Transforms/Utils/LoopUnroll.cpp b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
index ee6f7b35750af0f..8529fa1db18e187 100644
--- a/llvm/lib/Transforms/Utils/LoopUnroll.cpp
+++ b/llvm/lib/Transforms/Utils/LoopUnroll.cpp
@@ -109,6 +109,10 @@ UnrollVerifyLoopInfo("unroll-verify-loopinfo", cl::Hidden,
 #endif
                     );
 
+static cl::opt<unsigned>
+    UnrollMaxIterations("unroll-max-iterations", cl::init(1'000'000),
+                        cl::Hidden,
+                        cl::desc("Maximum allowed iterations to unroll."));
 
 /// Check if unrolling created a situation where we need to insert phi nodes to
 /// preserve LCSSA form.
@@ -453,6 +457,14 @@ LoopUnrollResult llvm::UnrollLoop(Loop *L, UnrollLoopOptions ULO, LoopInfo *LI,
     }
   }
 
+  // Certain cases with UBSAN can cause trip count to be calculated as INT_MAX,
+  // Block unrolling at a reasonable limit so that the compiler doesn't hang
+  // trying to unroll the loop. See PR77842
+  if (ULO.Count > UnrollMaxIterations) {
+    LLVM_DEBUG(dbgs() << "Won't unroll; trip count is too large\n");
+    return LoopUnrollResult::Unmodified;
+  }
+
   using namespace ore;
   // Report the unrolling decision.
   if (CompletelyUnroll) {
diff --git a/llvm/test/Transforms/LoopUnroll/pr77842.ll b/llvm/test/Transforms/LoopUnroll/pr77842.ll
new file mode 100644
index 000000000000000..834033bbe3618eb
--- /dev/null
+++ b/llvm/test/Transforms/LoopUnroll/pr77842.ll
@@ -0,0 +1,37 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py
+; RUN: opt -passes=loop-unroll -disable-output -debug-only=loop-unroll %s 2>&1 | FileCheck %s
+
+; Validate that loop unroll full doesn't try to fully unroll values whose trip counts are too large.
+
+; CHECK: Exiting block %cont23: TripCount=2147483648, TripMultiple=0, BreakoutTrip=0
+; CHECK-NEXT: Won't unroll; trip count is too large
+
+target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+target triple = "x86_64-redhat-linux-gnu"
+
+define void @foo(i64 %end) {
+entry:
+  br label %loopheader
+
+loopheader:
+  %iv = phi i64 [ 0, %entry ], [ %iv_new, %backedge ]
+  %exit = icmp eq i64 %iv, %end
+  br i1 %exit, label %for.cond.cleanup.loopexit, label %cont23
+
+for.cond.cleanup.loopexit:
+  ret void
+
+cont23:
+  %exitcond241 = icmp eq i64 %iv, 2147483647
+  br i1 %exitcond241, label %handler.add_overflow, label %backedge
+
+handler.add_overflow:
+  unreachable
+
+backedge: ; preds = %cont23
+  %iv_new = add i64 %iv, 1
+  br label %loopheader, !llvm.loop !0
+}
+
+!0 = distinct !{!0, !1}
+!1 = !{!"llvm.loop.unroll.full"}

llvm/lib/Transforms/Utils/LoopUnroll.cpp Outdated Show resolved Hide resolved
llvm/test/Transforms/LoopUnroll/pr77842.ll Outdated Show resolved Hide resolved
llvm/test/Transforms/LoopUnroll/pr77842.ll Outdated Show resolved Hide resolved
llvm/lib/Transforms/Utils/LoopUnroll.cpp Outdated Show resolved Hide resolved
if (UP.Count > UnrollMaxIterations) {
LLVM_DEBUG(dbgs() << "Won't unroll; trip count is too large\n");
return LoopUnrollResult::Unmodified;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works, but I don't think it fits well into the way the unroll count is currently computed. Rather than checking this after the fact, we should integrate this into the unroll count calculation. In particular, inside https://github.com/llvm/llvm-project/blob/88b1087035fd397996837e35d579d808d6b3f28c/llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp#L784-L785 we should apply a pragma-specific iteration limit. This will also ensure that even if we decline to perform a full unroll to an unreasonable count, we can still chose to perform a partial or runtime unroll to a lower iteration count.

@modiking
Copy link
Contributor Author

@nikic Appreciate the thorough review on the best way to get this done. Right now, with locking the full unroll under Pragma the repro unrolls 2048 times which is fine but fairly slow to complete (155s on my SkyLake).

Thoughts on a way to cap this in some way or is this alright?

@modiking
Copy link
Contributor Author

Friendly ping @nikic

@nikic
Copy link
Contributor

nikic commented Jan 30, 2024

@nikic Appreciate the thorough review on the best way to get this done. Right now, with locking the full unroll under Pragma the repro unrolls 2048 times which is fine but fairly slow to complete (155s on my SkyLake).

Thoughts on a way to cap this in some way or is this alright?

I think the general behavior is fine -- we still get the aggressive unrolling expected from #pragma unroll, while staying within the pragma unroll threshold.

For the test case, it should be possible to reduce the amount of unrolling by setting -pragma-unroll-threshold to a lower value.

Though I'm surprised that unrolling to 2048 times takes 155s -- is this with a debug build of LLVM by chance?

@modiking
Copy link
Contributor Author

Though I'm surprised that unrolling to 2048 times takes 155s -- is this with a debug build of LLVM by chance?

I checked again and I think you're right on that, with a release build it's much faster:

~/llvm-project/llvm/test/Transforms/LoopUnroll# time ~/llvm-project/build-rel/bin/llvm-lit pr77842.ll 
-- Testing: 1 tests, 1 workers --
PASS: LLVM :: Transforms/LoopUnroll/pr77842.ll (1 of 1)

Testing Time: 6.25s

Total Discovered Tests: 1
  Passed: 1 (100.00%)

real    0m6.486s
user    0m6.430s
sys     0m0.083s

@nikic
Copy link
Contributor

nikic commented Feb 1, 2024

I still think you should use -pragma-unroll-threshold in the test to limit the size. It's not really useful to have a 12-thousand line test vs one that tests the same thing on smaller limits.

A possible alternative would be to use the loop-unroll-full pass instead of the loop-unroll pass. I think that one should just stop unrolling entirely after your change.

@modiking
Copy link
Contributor Author

modiking commented Feb 1, 2024

I still think you should use -pragma-unroll-threshold in the test to limit the size. It's not really useful to have a 12-thousand line test vs one that tests the same thing on smaller limits.

A possible alternative would be to use the loop-unroll-full pass instead of the loop-unroll pass. I think that one should just stop unrolling entirely after your change.

Makes sense. Good suggestion with loop-unroll-full, confirmed it fails without this PR and works correctly with

@modiking
Copy link
Contributor Author

modiking commented Feb 2, 2024

@nikic Hmm interesting enough it's still trying to partially unroll with -loop-unroll-full. Looking into it more, this condition isn't bailing out:

  // Do not attempt partial/runtime unrolling in FullLoopUnrolling
  if (OnlyFullUnroll && !(UP.Count >= MaxTripCount)) {
    LLVM_DEBUG(
        dbgs() << "Not attempting partial/runtime unroll in FullLoopUnroll.\n");
    return LoopUnrollResult::Unmodified;
  }

Because TripCount == 2147483648 so MaxTripCount == 0 while partial unroll sets UP.Count == 2048. Scanning through I think we should also catch this case and add a check for UP.Count < TripCount. WDYT?

@nikic
Copy link
Contributor

nikic commented Feb 2, 2024

@modiking Yeah, I think that would make sense. Assuming it doesn't break something else...

@modiking
Copy link
Contributor Author

modiking commented Feb 5, 2024

@nikic Change passes unit tests, also checked that both are needed as it fails a unit test if we only switch to UP.Count < TripCount

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

llvm/lib/Transforms/Scalar/LoopUnrollPass.cpp Outdated Show resolved Hide resolved
@nikic
Copy link
Contributor

nikic commented Feb 5, 2024

Change passes unit tests, also checked that both are needed as it fails a unit test if we only switch to UP.Count < TripCount

Yeah, currently we need both as they are mutually exclusive -- we only set MaxTripCount if TripCount is not available. Does make we wonder if it would make things cleaner if we just always computed MaxTripCount. But that's unrelated to this patch...

@modiking
Copy link
Contributor Author

modiking commented Feb 5, 2024

Thanks for the review!

Change passes unit tests, also checked that both are needed as it fails a unit test if we only switch to UP.Count < TripCount

Yeah, currently we need both as they are mutually exclusive -- we only set MaxTripCount if TripCount is not available. Does make we wonder if it would make things cleaner if we just always computed MaxTripCount. But that's unrelated to this patch...

That would be cleaner. But yeah a different patch

@modiking modiking changed the title [LoopUnroll] Introduce UnrollMaxIterations as a hard cap on how many iterations we try to unroll [LoopUnroll] Introduce PragmaUnrollFullMaxIterations as a hard cap on how many iterations we try to unroll Feb 5, 2024
@modiking modiking merged commit 99ddd77 into llvm:main Feb 6, 2024
3 of 4 checks passed
@dwblaikie
Copy link
Collaborator

we're seeing a failure with this patch internally (don't have a reproducer yet, just letting you know in case it's actionable in the mean time while we continue investigating)

An existing loop with a static bound (256) with a #pragma unroll worked without the patch, but the resulting program times out with this patch applied. Putting an explicit bound on the unroll (which should be identical to the static bound in the loop) like #pragma unroll 256 doesn't time out anymore.

The contents of the loop does include a couple of gotos, a continue and a return (all under various conditions), if that's relevant/helpful.

@modiking
Copy link
Contributor Author

modiking commented Feb 14, 2024

we're seeing a failure with this patch internally (don't have a reproducer yet, just letting you know in case it's actionable in the mean time while we continue investigating)

An existing loop with a static bound (256) with a #pragma unroll worked without the patch, but the resulting program times out with this patch applied. Putting an explicit bound on the unroll (which should be identical to the static bound in the loop) like #pragma unroll 256 doesn't time out anymore.

The contents of the loop does include a couple of gotos, a continue and a return (all under various conditions), if that's relevant/helpful.

Interesting, no certainty on the answer here but there's 2 changes here:

  1. Detect super high full unroll and bail
  2. Add an additional condition on loop-unroll-full phase that blocks it from partially unrolling in certain cases (UP.Count < TripCount)

Could be worthwhile to see which one of these is causing the issue. Also possible this is an existing issue and was just exposed by this change

@dwblaikie
Copy link
Collaborator

It might be that this test times out if the loop isn't fully unrolled. (removing the unroll pragma causes the program to timeout - FWIW, this is is something BPF related - there's some comment about the pragma being required for some verification - I don't really understand BPF enough to explain it)

If I remove this part of your change:

if (OnlyFullUnroll && (UP.Count < TripCount || UP.Count < MaxTripCount)) {

And switch it back to the old code, the program finishes in roughly the expected time (~3-4 minutes, rather than hitting our 10min test timeout).

@modiking
Copy link
Contributor Author

Gotcha, that seems to be the issue then.

To me this seems like it's a source side fix to ensure full unrolling via specifying the trip count or using #pragma clang loop unroll(full). cc @nikic

@dwblaikie
Copy link
Collaborator

So not sure I'm following. The condition that was added in this patch that seems to be relevant, if I'm understanding correctly, is to add UP.Count < TripCount as a condition to bailout/abort all the unrolling, if the trip count exceeds the count on the unroll pragma, I guess?

But there is no count on the unroll pragma in this case - so why would that condition fire? (I'm poking around in a debugger now to better understand it)

(& I tried to find some documentation, but couldn't - to figure out what #pragma clang loop unroll means, and how it differs from #pragma clang loop unroll(full) but didn't find any docs - do you have a link to some?)

@dwblaikie
Copy link
Collaborator

Oh, I found https://releases.llvm.org/4.0.0/tools/clang/docs/AttributeReference.html#pragma-unroll-pragma-nounroll

Which says "Specifying #pragma unroll without a parameter directs the loop unroller to attempt to fully unroll the loop if the trip count is known at compile time and attempt to partially unroll the loop if the trip count is not known at compile time" The loop condition is simple/known here (though the conditional goto/returns - maybe those complicate things?)

Oh, the docs also say "#pragma unroll and #pragma unroll value have identical semantics to #pragma clang loop unroll(full) and #pragma clang loop unroll_count(value) respectively. " - so that doesn't sound like #pragma clang loop unroll(full) would help the situation?

@modiking
Copy link
Contributor Author

Oh, the docs also say "#pragma unroll and #pragma unroll value have identical semantics to #pragma clang loop unroll(full) and #pragma clang loop unroll_count(value) respectively. " - so that doesn't sound like #pragma clang loop unroll(full) would help the situation?

Oh interesting, I was't quite certain myself on what the exact semantics are. Thanks for digging it up.

The condition that was added in this patch that seems to be relevant, if I'm understanding correctly, is to add UP.Count < TripCount as a condition to bailout/abort all the unrolling, if the trip count exceeds the count on the unroll pragma, I guess?

The way the code is structured is that the unroll trip count is either stored in TripCount or in MaxTripCount depending on how it was calculated. Before, only checking against MaxTripCount would cause #pragma clang loop unroll(full) to also perform partial unrolling under -loop-unroll-full.

Actually, I wonder if that's what causing the change where previously it would try and do a partial unroll but that no longer happens. What's the output with -debug-only=loop-unroll?

@dwblaikie
Copy link
Collaborator

I haven't done proper A/B debugging before and after this patch yet - but you might be right that the code is currently was previously falling under partial unrolling - but I don't know why it would count as partial unrolling?

The code here

if (PInfo.PragmaFullUnroll && TripCount != 0) {
// Certain cases with UBSAN can cause trip count to be calculated as
// INT_MAX, Block full unrolling at a reasonable limit so that the compiler
// doesn't hang trying to unroll the loop. See PR77842
if (TripCount > PragmaUnrollFullMaxIterations) {
LLVM_DEBUG(dbgs() << "Won't unroll; trip count is too large\n");
return std::nullopt;
}
return TripCount;
}
if (PInfo.PragmaEnableUnroll && !TripCount && MaxTripCount &&
MaxTripCount <= UP.MaxUpperBound)
return MaxTripCount;

If I'm reading this right, it doesn't look like it treats PragmaFullUnroll and PragmaEnableUnroll equally - or perhaps clang is mis-lowering pragma without a bound & it should be lowering it to pragma full unroll?

The LLVM docs: https://llvm.org/docs/TransformMetadata.html#loop-unrolling don't seem to be super clear about what the different IR metadata means?

@nikic
Copy link
Contributor

nikic commented Feb 15, 2024

Oh, the docs also say "#pragma unroll and #pragma unroll value have identical semantics to #pragma clang loop unroll(full) and #pragma clang loop unroll_count(value) respectively. " - so that doesn't sound like #pragma clang loop unroll(full) would help the situation?

You're looking at some very old docs there. The current ones are https://clang.llvm.org/docs/AttributeReference.html#pragma-unroll-pragma-nounroll and they have the s/full/enable typo fixed.

The unrolling heuristics are very complex and it's hard to make any definitive statements without a reproducer. Based on what you say, I suspect that this patch has moved a partial unroll from happening in the full unroll pass to the runtime unroll pass and that exposed a second order compile-time issue in your case. Or it might be something else entirely...

@dwblaikie
Copy link
Collaborator

Oh, the docs also say "#pragma unroll and #pragma unroll value have identical semantics to #pragma clang loop unroll(full) and #pragma clang loop unroll_count(value) respectively. " - so that doesn't sound like #pragma clang loop unroll(full) would help the situation?

You're looking at some very old docs there. The current ones are https://clang.llvm.org/docs/AttributeReference.html#pragma-unroll-pragma-nounroll and they have the s/full/enable typo fixed.
Oh, right, thanks for spotting that.

Still - going to that documentation, then going down into the linked LanguageExtensions documentation mentions this: "If unroll(full) is specified the unroller will attempt to fully unroll the loop if the trip count is known at compile time identically to unroll(enable). However, with unroll(full) the loop will not be unrolled if the loop count is not known at compile time."

So that sounds like "unroll(full)" should unroll in fewer cases than "unroll(enable)"? But in this case, with this patch applied, it seems like the loop gets unrolled only with "unroll(full)" and not with "unroll(enable)"

The unrolling heuristics are very complex and it's hard to make any definitive statements without a reproducer. Based on what you say, I suspect that this patch has moved a partial unroll from happening in the full unroll pass to the runtime unroll pass and that exposed a second order compile-time issue in your case. Or it might be something else entirely...

Yep, we're working on reproducers & trying to talk through what we have until then in case it's helpful in the mean time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pragma clang loop unroll(full) and UBSAN causes hangs attempting to unroll INT_MAX+1 times
5 participants