[amdgpu] Implement dynamic LDS accesses from non-kernel functions
The premise here is to allow non-kernel functions to locate external LDS variables without using LDS or extra magic SGPRs to do so.

1/ First it crawls the callgraph to work out which external LDS variables are reachable from a given kernel
2/ Then it creates a new `extern char[0]` variable for each kernel, which will alias all the other extern LDS variables because that's the documented behaviour of these variables
3/ The address of that variable is written to a lookup table. The global variable is tagged with metadata to track what address it was allocated at by codegen
4/ The assembler builds the lookup table using the metadata
5/ Any non-kernel functions use the same magic intrinsic used by table lookups of non-dynamic LDS variables to find the address to use
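
Step 1 above can be illustrated with a minimal, self-contained sketch. It is not the code from this patch: `ToyFunction`, `ToyModule`, and `reachableDynLDS` are hypothetical names, and the module is modelled as a plain map rather than LLVM IR. It only shows the idea of walking the call graph from a kernel and collecting the external LDS variables that any reachable function uses.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a module (hypothetical, not an LLVM type): each function
// lists its callees and the external (dynamic) LDS variables it uses directly.
struct ToyFunction {
  std::vector<std::string> Callees;
  std::vector<std::string> DynLDSUses;
};

using ToyModule = std::map<std::string, ToyFunction>;

// Collect every dynamic LDS variable reachable from Kernel by walking the
// call graph with an explicit worklist. The Seen set makes recursion and
// mutual recursion terminate.
std::set<std::string> reachableDynLDS(const ToyModule &M,
                                      const std::string &Kernel) {
  std::set<std::string> Seen;
  std::set<std::string> Vars;
  std::vector<std::string> Work{Kernel};
  while (!Work.empty()) {
    std::string F = Work.back();
    Work.pop_back();
    if (!Seen.insert(F).second)
      continue; // already visited
    auto It = M.find(F);
    if (It == M.end())
      continue; // unknown callee, nothing to record in this toy model
    Vars.insert(It->second.DynLDSUses.begin(), It->second.DynLDSUses.end());
    for (const std::string &Callee : It->second.Callees)
      Work.push_back(Callee);
  }
  return Vars;
}
```

In this toy model, a kernel whose reachable set is non-empty is one that would need the per-kernel `extern char[0]` variable described in step 2.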

This overlaps heavily with the code paths taken for other lowering. In particular, the same intrinsic is used to pass the dynamic scope information through the same SGPR as for table lookups of static LDS.
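
The table lookup in steps 3 to 5 can be sketched as follows. This is an illustration under stated assumptions, not the patch's implementation: `ToyKernelInfo`, `alignUp`, `buildDynLDSTable`, and `lookupDynLDS` are hypothetical names. The sketch assumes that each kernel's dynamic LDS block starts at the end of its static LDS allocation, rounded up to the dynamic variable's alignment, and that a non-kernel function receives an index identifying the calling kernel.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-kernel record: bytes of statically allocated LDS and the
// alignment required by that kernel's dynamic LDS variable.
struct ToyKernelInfo {
  uint32_t StaticLDSSize;
  uint32_t DynLDSAlign; // assumed to be a power of two
};

// Round Size up to the next multiple of Align (Align is a power of two).
uint32_t alignUp(uint32_t Size, uint32_t Align) {
  return (Size + Align - 1) & ~(Align - 1);
}

// Build one table entry per kernel holding the assumed start address of its
// dynamic LDS block.
std::vector<uint32_t>
buildDynLDSTable(const std::vector<ToyKernelInfo> &Kernels) {
  std::vector<uint32_t> Table;
  Table.reserve(Kernels.size());
  for (const ToyKernelInfo &K : Kernels)
    Table.push_back(alignUp(K.StaticLDSSize, K.DynLDSAlign));
  return Table;
}

// What a non-kernel function conceptually does: index the table with the
// identifier of the kernel it was called from. at() throws on a bad index.
uint32_t lookupDynLDS(const std::vector<uint32_t> &Table, uint32_t KernelId) {
  return Table.at(KernelId);
}
```

The point of the design is that the non-kernel function needs only the kernel identifier, which is already passed for static LDS table lookups, so no additional register is required.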

Reviewed By: arsenm

Differential Revision: https://reviews.llvm.org/D144233
JonChesterfield committed Apr 4, 2023
1 parent 9b79d0b commit 0507448
Showing 12 changed files with 813 additions and 88 deletions.
2 changes: 1 addition & 1 deletion llvm/lib/Target/AMDGPU/AMDGPUAlwaysInlinePass.cpp
@@ -127,7 +127,7 @@ static bool alwaysInlineImpl(Module &M, bool GlobalOpt) {
     unsigned AS = GV.getAddressSpace();
     if ((AS == AMDGPUAS::REGION_ADDRESS) ||
         (AS == AMDGPUAS::LOCAL_ADDRESS &&
-         (!AMDGPUTargetMachine::EnableLowerModuleLDS || !GV.hasInitializer())))
+         (!AMDGPUTargetMachine::EnableLowerModuleLDS)))
       recursivelyVisitUsers(GV, FuncsToAlwaysInline);
   }

2 changes: 1 addition & 1 deletion llvm/lib/Target/AMDGPU/AMDGPULegalizerInfo.cpp
@@ -2542,7 +2542,7 @@ bool AMDGPULegalizerInfo::legalizeGlobalValue(
       // allocated ones. They all share the same offset.
       if (B.getDataLayout().getTypeAllocSize(Ty).isZero()) {
         // Adjust alignment for that dynamic shared memory array.
-        MFI->setDynLDSAlign(B.getDataLayout(), *cast<GlobalVariable>(GV));
+        MFI->setDynLDSAlign(MF.getFunction(), *cast<GlobalVariable>(GV));
         LLT S32 = LLT::scalar(32);
         auto Sz =
             B.buildIntrinsic(Intrinsic::amdgcn_groupstaticsize, {S32}, false);