[BOLT] Fix local out-of-range stub issue that leads to infinite loop in LongJmp pass #73918
More detail for the infinite loop: BB has a local stub .LStub1111 that is out of range, and the execution path entered the problematic code piece.
After the problematic code,
likely because we did not remove the relation between BB
BOLT then got stuck in a loop of back-and-forth conversions between
bolt/lib/Passes/LongJmp.cpp
Outdated
TgtSym = BC.MIB->getTargetSymbol(*TgtBB->begin());
TgtBB = BB.getSuccessor(TgtSym, BI);
BB.replaceSuccessor(TgtBB, TgtBB->getSuccessor(TgtSym, BI), BI.Count,
TgtBB->getSuccessor(TgtSym, BI) might return NULL, since the veneer/stub "successor" might be in another function (e.g. a function call was replaced by a call to the veneer and a jump from it). replaceSuccessor would fail in this case. So the replaceSuccessor logic and the subsequent execution count calculations must be done under the "TgtBB->getSuccessor(TgtSym, BI)" condition. The last getSuccessor call may be made outside this condition, since NULL is an expected return value in that case.
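To illustrate the guard requested in this comment, here is a minimal, self-contained sketch. It does not use BOLT's real classes: `ToyBlock`, its `Succs` map, and `bypassLocalStub` are hypothetical stand-ins invented for this example, and the string labels stand in for symbols. The point is only that the edge rewrite and the execution count adjustment happen when the stub's successor lookup succeeds, while a null result is passed through unchanged.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a basic block; NOT BOLT's BinaryBasicBlock.
struct ToyBlock {
  std::string Label;
  uint64_t ExecCount = 0;
  // Successor edges keyed by target label: (successor block, branch count).
  std::unordered_map<std::string, std::pair<ToyBlock *, uint64_t>> Succs;

  // Returns nullptr when Target is not a successor of this block, e.g.
  // because the target lives in another function.
  ToyBlock *getSuccessor(const std::string &Target) const {
    auto It = Succs.find(Target);
    return It == Succs.end() ? nullptr : It->second.first;
  }
};

// Redirects BB's edge from Stub to the stub's own target, but only when that
// target is a block reachable as a successor of the stub. Returns the new
// target block, or nullptr if nothing was rewritten.
ToyBlock *bypassLocalStub(ToyBlock &BB, ToyBlock &Stub,
                          const std::string &StubTarget) {
  auto EdgeIt = BB.Succs.find(Stub.Label);
  if (EdgeIt == BB.Succs.end())
    return nullptr;
  const uint64_t Count = EdgeIt->second.second;

  ToyBlock *StubSucc = Stub.getSuccessor(StubTarget);
  if (StubSucc) {
    // Only touch the CFG and the counts when the successor exists.
    BB.Succs.erase(EdgeIt);
    BB.Succs[StubTarget] = {StubSucc, Count};
    assert(Stub.ExecCount >= Count &&
           "Stub count must be at least the branch count");
    Stub.ExecCount -= Count;
  }
  return StubSucc; // nullptr is an expected result for external targets
}
```

A caller can then treat a null return as "the stub jumps out of this function" and fall through to the regular stub lookup instead of failing.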
bolt/lib/Passes/LongJmp.cpp
Outdated
// local stub. We replace with the target of the local stub instead
// of creating a stub to jump to another stub.
// e.g.
// change the out-of-range stub
Plain text would probably be more useful here. Just add something saying that we also need to return BB's successors to their previous state, i.e. set the successor to the target of the stub in case it was a local stub.
Thanks for the patch @linsinan1995! I left a couple of comments above. I think there might be a problem with this patch and it needs a small adjustment.
06f4b4d to cd5c4f4
Hi @yota9, thank you for pointing out the issue. An update has been made.
LGTM. Thanks for the fix! Just fix a small nit please.
Let's wait one more day in case the META team has more comments, then feel free to commit :)
"At least equal or greater than the branch count.");
TgtBB->setExecutionCount(TgtBB->getExecutionCount() - BI.Count);
}
TgtBB = TgtBBSucc;
nit: please add a new line after }
Can you please retitle to an imperative statement containing what you're changing? Your branch name fix-longjmp-out-of-range-local-stub is a good title.
bolt/lib/Passes/LongJmp.cpp
Outdated
TgtSym = BC.MIB->getTargetSymbol(*TgtBB->begin());
TgtBB = BB.getSuccessor(TgtSym, BI);
auto *TgtBBSucc = TgtBB->getSuccessor(TgtSym, BI);
Also replace auto with BinaryBasicBlock please, we prefer not to use auto.
cd5c4f4 to f03a10d
Hi @aaupov. It is a bit tricky to build a test case that has a local out-of-range stub. Any suggestions?
@aaupov We don't usually have tests for LongJmp. This case is quite hard to reproduce. I think we should continue without one for now.
Thanks a lot for this fix, @linsinan1995. The only thing I would do differently is to add an assert that TgtSym is not null after line 208 "TgtSym = ....getTargetSymbol(*TgtBB->begin())", so we are not taken by surprise if we change the contents of a stub and all of a sudden the first instruction is not the branch anymore, in which case we will silently return nullptr to TgtSym and break LongJmp. There are a lot of assumptions in that line, which, admittedly, is part of the original code and not your code.
f03a10d to b5b8c8e
Added. Investigating bugs related to LongJmp is quite troublesome. Adding an assertion to prevent the continuation of the subsequent logic when TgtSym is null is indeed very helpful for debugging. Thanks for the suggestion.
bolt/lib/Passes/LongJmp.cpp
Outdated
TgtSym = BC.MIB->getTargetSymbol(*TgtBB->begin());
TgtBB = BB.getSuccessor(TgtSym, BI);
assert(TgtSym && "First instruction is expected to be a branch."); |
The first instruction doesn't have to be a branch. For a stub it would probably be an ADRP. A more accurate message would be "Expected first instruction to contain a target symbol" or something like that.
Thanks. Changed.
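The assertion discussed in this thread can be sketched outside of BOLT as follows. This is a toy model under stated assumptions: `ToyInst`, `getTargetSymbol`, and `stubTarget` are hypothetical names written for this example (the real lookup is `BC.MIB->getTargetSymbol`), and an empty string stands in for "no symbol operand". It shows the intent of the check: the stub's first instruction is only required to carry a target symbol, it does not have to be a branch.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical instruction model; NOT BOLT's MCInst.
struct ToyInst {
  std::string Mnemonic;
  std::string Target; // empty when the instruction has no symbol operand
};

// Toy analogue of a target-symbol query: returns nullptr when the
// instruction does not reference a symbol.
const std::string *getTargetSymbol(const ToyInst &Inst) {
  return Inst.Target.empty() ? nullptr : &Inst.Target;
}

// Reads the target of a stub from its first instruction and fails loudly
// (in builds with assertions enabled) if the stub layout ever changes so
// that the first instruction no longer carries the symbol.
const std::string &stubTarget(const std::vector<ToyInst> &Stub) {
  assert(!Stub.empty() && "Stub must contain at least one instruction");
  const std::string *TgtSym = getTargetSymbol(Stub.front());
  assert(TgtSym && "Expected first instruction to contain a target symbol");
  return *TgtSym;
}
```

For a stub modeled as an ADRP followed by instructions without a symbol operand, `stubTarget` returns the symbol attached to the ADRP, which matches the reviewer's point about the wording of the assertion message.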
b5b8c8e to f5b3bd9
If a local stub is out-of-range, at LongJmp we will try to find another local stub first. However, the original implementation does not work as expected, and it leads to an infinite loop between replaceTargetWithStub and fixBranches. After this patch, we first convert the target of BB back to the target of the local stub, and then look for other valid local stubs, and so on.
f5b3bd9 to e3f587f
If a local stub is out-of-range, at LongJmp we will try to find another local stub first. However, there seems to be a problem with the code logic which leads to an infinite loop at LongJmp pass in my workload.
https://github.com/llvm/llvm-project/blob/main/bolt/lib/Passes/LongJmp.cpp#L203-L209
TgtSym is now the target of the local stub (statement 1), and thus it is not a successor of BB (statement 2), and thus TgtBB will be set to nullptr. After this patch, we first convert the target of BB back to the target of the local stub, and then look for other valid local stubs... I tested it on my workload, and it works fine after this change.
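The order of operations described above (first map the branch back to the stub's real target, then search for another local stub that is in range) can be sketched with a small standalone model. Everything here is an assumption made for illustration: `ToyStub`, `inRange`, `chooseTarget`, the addresses, and the distance limit are hypothetical and do not reflect LongJmp's actual data structures or AArch64 branch ranges.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical model of a local stub: its label, address, and real target.
struct ToyStub {
  std::string Label;
  int64_t Address;
  std::string RealTarget;
};

// True when a branch at From can reach To within MaxDist bytes.
bool inRange(int64_t From, int64_t To, int64_t MaxDist) {
  const int64_t Dist = To > From ? To - From : From - To;
  return Dist <= MaxDist;
}

// Picks the label a branch at BranchAddr should use. Returns an empty string
// when no existing local stub is usable and a new one would be needed.
std::string chooseTarget(int64_t BranchAddr, const std::string &CurrentTarget,
                         const std::vector<ToyStub> &LocalStubs,
                         int64_t MaxDist) {
  // Step 1: if the branch currently points at a local stub, go back to the
  // stub's real target so that later lookups are keyed on the right symbol.
  std::string RealTarget = CurrentTarget;
  for (const ToyStub &S : LocalStubs) {
    if (S.Label == CurrentTarget) {
      RealTarget = S.RealTarget;
      break;
    }
  }

  // Step 2: look for a local stub for that real target that is in range.
  for (const ToyStub &S : LocalStubs)
    if (S.RealTarget == RealTarget && inRange(BranchAddr, S.Address, MaxDist))
      return S.Label;

  // Step 3: nothing usable; the caller would create a new stub.
  return std::string();
}
```

In this model, skipping step 1 would key the search on the stub's label rather than the real target, so no other stub could match; that is only loosely analogous to the TgtBB-becomes-nullptr situation described in the PR.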