Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GlobalsModRef, ValueTracking: Look through threadlocal.address intrinsic #88418

Merged
merged 1 commit into from
Apr 19, 2024

Conversation

MatzeB
Copy link
Contributor

@MatzeB MatzeB commented Apr 11, 2024

This improves handling of threadlocal.address intrinsic in analyses:

The thread-id cannot change within a function with the exception of suspend points of pre-split coroutines. This changes llvm::getUnderlyingObject to look through threadlocal.address in these cases.

GlobalsAAResult::AnalyzeUsesOfPointer checks whether an address can be traced to simple loads/stores or escapes to other places. Starting the analysis from a thread-local GlobalValue the threadlocal.address intrinsic is safe to skip here.

This improves issue #87437

@MatzeB MatzeB marked this pull request as ready for review April 11, 2024 17:48
@MatzeB MatzeB requested a review from nikic as a code owner April 11, 2024 17:48
@llvmbot
Copy link
Collaborator

llvmbot commented Apr 11, 2024

@llvm/pr-subscribers-backend-powerpc
@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-llvm-analysis

Author: Matthias Braun (MatzeB)

Changes

Analyses should typically look through threadlocal.address intrinsic to get at the underlying GlobalValue. This fixes GlobalsAAResult::AnalyzeUsesOfPointer and llvm::getUnderlyingObject.


Full diff: https://github.com/llvm/llvm-project/pull/88418.diff

3 Files Affected:

  • (modified) llvm/lib/Analysis/GlobalsModRef.cpp (+8)
  • (modified) llvm/lib/Analysis/ValueTracking.cpp (+6)
  • (modified) llvm/test/Analysis/GlobalsModRef/nonescaping-noalias.ll (+19)
diff --git a/llvm/lib/Analysis/GlobalsModRef.cpp b/llvm/lib/Analysis/GlobalsModRef.cpp
index 527f19b194eeb9..1ceb1b26294183 100644
--- a/llvm/lib/Analysis/GlobalsModRef.cpp
+++ b/llvm/lib/Analysis/GlobalsModRef.cpp
@@ -344,6 +344,14 @@ bool GlobalsAAResult::AnalyzeUsesOfPointer(Value *V,
       if (AnalyzeUsesOfPointer(I, Readers, Writers, OkayStoreDest))
         return true;
     } else if (auto *Call = dyn_cast<CallBase>(I)) {
+      if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(I)) {
+        if (II->getIntrinsicID() == Intrinsic::threadlocal_address &&
+            V == II->getArgOperand(0)) {
+          if (AnalyzeUsesOfPointer(II, Readers, Writers))
+            return true;
+          continue;
+        }
+      }
       // Make sure that this is just the function being called, not that it is
       // passing into the function.
       if (Call->isDataOperand(&U)) {
diff --git a/llvm/lib/Analysis/ValueTracking.cpp b/llvm/lib/Analysis/ValueTracking.cpp
index 3a10de72a27562..d25ebd80d2411d 100644
--- a/llvm/lib/Analysis/ValueTracking.cpp
+++ b/llvm/lib/Analysis/ValueTracking.cpp
@@ -6269,6 +6269,12 @@ const Value *llvm::getUnderlyingObject(const Value *V, unsigned MaxLookup) {
           continue;
         }
       } else if (auto *Call = dyn_cast<CallBase>(V)) {
+        if (const IntrinsicInst *II = dyn_cast<IntrinsicInst>(V)) {
+          if (II->getIntrinsicID() == Intrinsic::threadlocal_address) {
+            V = II->getArgOperand(0);
+            continue;
+          }
+        }
         // CaptureTracking can know about special capturing properties of some
         // intrinsics like launder.invariant.group, that can't be expressed with
         // the attributes, but have properties like returning aliasing pointer.
diff --git a/llvm/test/Analysis/GlobalsModRef/nonescaping-noalias.ll b/llvm/test/Analysis/GlobalsModRef/nonescaping-noalias.ll
index d109b3b8748ba7..f28e73c48b3e8e 100644
--- a/llvm/test/Analysis/GlobalsModRef/nonescaping-noalias.ll
+++ b/llvm/test/Analysis/GlobalsModRef/nonescaping-noalias.ll
@@ -22,6 +22,25 @@ entry:
   ret i32 %v
 }
 
+@g1_tls = internal thread_local global i32 0
+
+define i32 @test1_tls(ptr %param) {
+; Ensure that we can fold a store to a load of a global across a store to
+; a parameter when the global is non-escaping.
+;
+; CHECK-LABEL: @test1_tls(
+; CHECK: %p = call ptr @llvm.threadlocal.address.p0(ptr @g1_tls)
+; CHECK: store i32 42, ptr %p
+; CHECK-NOT: load i32
+; CHECK: ret i32 42
+entry:
+  %p = call ptr @llvm.threadlocal.address(ptr @g1_tls)
+  store i32 42, ptr %p
+  store i32 7, ptr %param
+  %v = load i32, ptr %p
+  ret i32 %v
+}
+
 declare ptr @f()
 
 define i32 @test2() {

@jyknight
Copy link
Member

I'm a bit worried of this, though I'm not sure if my worry is valid or not.

I worry that adding such code before we fix the underlying representation (to make the thread_local GlovalValue no longer incorrect pretend like it represents an address at the IR level) will end up making it more difficult to do so.

@MatzeB
Copy link
Contributor Author

MatzeB commented Apr 11, 2024

It's possible I missed most of the discussion around this intrinsic. To me this mostly felt like coroutines were breaking because the underlying thread could change across yield points?

At least for GlobalsAAResult::AnalyzeUsesOfPointer it shouldn't really matter at what thread we are looking at... I guess for llvm::getUnderlyingObject it could depend on what calling code is using it for whether changing threads are a problem. For the BasicAA usage it should be fine AFAIK, but admittedly can't necessary say it for other users...

Either way for my use cases here the intrinsic is showing up as a code quality regression compared to older LLVM versions...

@MatzeB
Copy link
Contributor Author

MatzeB commented Apr 11, 2024

At least if we sprinkle in more checks for llvm.threadlocal.address you have a good way to search for places that need updating if you want to change the representation later...

@MatzeB MatzeB force-pushed the threadlocal_ana branch 2 times, most recently from 75c583d to 7b06d6b Compare April 17, 2024 18:59
@MatzeB
Copy link
Contributor Author

MatzeB commented Apr 17, 2024

Added unittests for getUnderlyingObject in #89120 so I can add to them here.

@MatzeB
Copy link
Contributor Author

MatzeB commented Apr 17, 2024

Any way to move this forward?

There is no problem for correctness here:

  • The code hasn't changed substantially since the llvm.threadlocal.address intrinsic was introduced and worked fine before. So skipping the intrinsic in getUnderlyingObject should have no meaningful difference to previous state.
  • We need some representative entity for alias analysis that expresses easily that TLS vars are separate from other global entities and each other. The underlying GlobalVariable works well for that (while the @llvm.threadlocal.address intrinsic does not as it may be duplicated in multiple functions or even within the same function).

@MatzeB
Copy link
Contributor Author

MatzeB commented Apr 17, 2024

Found https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015 and seeing now why this patch is a problem before coroutine splitting... Though still need to catch up on whether we actually developed any further mechanisms in the meantime to maybe express/mark coroutines before/after splitting.

@MatzeB
Copy link
Contributor Author

MatzeB commented Apr 17, 2024

Changed the new code to no longer skip threadlocal.address when isPresplitCoroutine() == true which should address the coroutine concerns (and as far as I can tell we currently have no other situations/attributes indicating that a function could switch thread-ids).

Adding some more reviewers given the discussion from https://discourse.llvm.org/t/address-thread-identification-problems-with-coroutine/62015

@MatzeB
Copy link
Contributor Author

MatzeB commented Apr 17, 2024

Some backend tests changed apparently because the improved getUnderlyingObject results in better alias analysis (and consequently different scheduling decisions in PowerPC).

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Making ValueTracking look though the intirnisc should be fine, I think? Certain IPO passes might need to be a little careful with how they deal with an identified thread-local, but that was true even before threadlocal.address was a thing.

GlobalsModRef seems fine; it never returns MustAlias, so approximating the thread-locals of all threads as a single "global" should be fine.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is okay, but please replace those unit tests with lit tests, e.g. in llvm/test/Analysis/BasicAA. Test the actually desired effect on AA you want from this.

; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT: std r0, 64(r1)
; SMALL-LOCAL-DYNAMIC-SMALLCM64-NEXT: li r6, 2
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't find obvious issue for the PowerPC code gen diffs.

FYI @amy-kwan @orcguru

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

@MatzeB
Copy link
Contributor Author

MatzeB commented Apr 18, 2024

Test the actually desired effect on AA you want from this.

Dropping unittests, there were already IR tests added in llvm/test/Analysis/GlobalsModRef/nonescaping-noalias.ll

@MatzeB MatzeB merged commit 6bbccd2 into llvm:main Apr 19, 2024
4 checks passed
@MatzeB
Copy link
Contributor Author

MatzeB commented Apr 19, 2024

Seems the newly added test broke main (thanks Michael Liao for cleaning it up). I wonder why that didn't show in buildkite, doesn't seem to be caused by a recent change...

aniplcc pushed a commit to aniplcc/llvm-project that referenced this pull request Apr 21, 2024
…sic (llvm#88418)

This improves handling of `threadlocal.address` intrinsic in analyses:

The thread-id cannot change within a function with the exception of
suspend points of pre-split coroutines. This changes
`llvm::getUnderlyingObject` to look through `threadlocal.address` in
these cases.

`GlobalsAAResult::AnalyzeUsesOfPointer` checks whether an address can be
traced to simple loads/stores or escapes to other places. Starting the
analysis from a thread-local `GlobalValue` the `threadlocal.address`
intrinsic is safe to skip here.

This improves issue llvm#87437
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants