Skip to content

Commit

Permalink
[LinkerWrapper] Handle AMDGPU Target-IDs correctly when linking (#78359)
Browse files Browse the repository at this point in the history
Summary:
The linker wrapper's job is to sort various embedded inputs into a list
of files that participate in a single link job. So far, this has been
completely 1-to-1, that is, each input file participates in exactly one
link job. However, support for AMD's target-id requires that one input
file may participate in multiple link jobs. For example, if given a
`gfx90a` static library and a `gfx90a:xnack+` object file input, we
should link the gfx90a` target into the `gfx90a:xnack+` job. These are
considered separate CPUs that can be mutually linked more or less.

This patch adds the necessary logic to make this happen. It primarily
reworks the logic to copy relevant input files into a separate list. So,
it moves construction of the final list of link jobs into the extraction
phase. We also need to copy the files in the case that it is needed more
than once, as the entire workflow expects ownership of said file.
  • Loading branch information
jhuber6 committed Jan 18, 2024
1 parent 110e171 commit 12c90bd
Show file tree
Hide file tree
Showing 6 changed files with 126 additions and 38 deletions.
2 changes: 1 addition & 1 deletion clang/lib/Driver/ToolChains/Clang.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -8700,7 +8700,7 @@ void OffloadPackager::ConstructJob(Compilation &C, const JobAction &JA,
SmallVector<std::string> Parts{
"file=" + File.str(),
"triple=" + TC->getTripleString(),
"arch=" + getProcessorFromTargetID(TC->getTriple(), Arch).str(),
"arch=" + Arch.str(),
"kind=" + Kind.str(),
};

Expand Down
2 changes: 1 addition & 1 deletion clang/test/Driver/amdgpu-openmp-toolchain.c
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,7 @@

// RUN: %clang -### -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx90a:sramecc-:xnack+ \
// RUN: -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID
// CHECK-TARGET-ID: clang-offload-packager{{.*}}arch=gfx90a,kind=openmp,feature=-sramecc,feature=+xnack
// CHECK-TARGET-ID: clang-offload-packager{{.*}}arch=gfx90a:sramecc-:xnack+,kind=openmp,feature=-sramecc,feature=+xnack

// RUN: not %clang -### -target x86_64-pc-linux-gnu -fopenmp --offload-arch=gfx90a,gfx90a:xnack+ \
// RUN: -nogpulib %s 2>&1 | FileCheck %s --check-prefix=CHECK-TARGET-ID-ERROR
Expand Down
21 changes: 21 additions & 0 deletions clang/test/Driver/linker-wrapper.c
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,11 @@
// REQUIRES: nvptx-registered-target
// REQUIRES: amdgpu-registered-target

// UNSUPPORTED: system-linux

// An externally visible variable so static libraries extract.
__attribute__((visibility("protected"), used)) int x;

// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.elf.o
// RUN: %clang -cc1 %s -triple nvptx64-nvidia-cuda -emit-llvm-bc -o %t.nvptx.bc
// RUN: %clang -cc1 %s -triple amdgcn-amd-amdhsa -emit-llvm-bc -o %t.amdgpu.bc
Expand Down Expand Up @@ -150,3 +155,19 @@
// RUN: --linker-path=/usr/bin/lld-link -- %t.o -libpath:./ -out:a.exe 2>&1 | FileCheck %s --check-prefix=COFF

// COFF: "/usr/bin/lld-link" {{.*}}.o -libpath:./ -out:a.exe {{.*}}openmp.image.wrapper{{.*}}

// RUN: clang-offload-packager -o %t-lib.out \
// RUN: --image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx90a
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t.o -fembed-offload-object=%t-lib.out
// RUN: llvm-ar rcs %t.a %t.o
// RUN: clang-offload-packager -o %t-on.out \
// RUN: --image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx90a:xnack+
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t-on.o -fembed-offload-object=%t-on.out
// RUN: clang-offload-packager -o %t-off.out \
// RUN: --image=file=%t.elf.o,kind=openmp,triple=amdgcn-amd-amdhsa,arch=gfx90a:xnack-
// RUN: %clang -cc1 %s -triple x86_64-unknown-linux-gnu -emit-obj -o %t-off.o -fembed-offload-object=%t-off.out
// RUN: clang-linker-wrapper --host-triple=x86_64-unknown-linux-gnu --dry-run \
// RUN: --linker-path=/usr/bin/ld -- %t-on.o %t-off.o %t.a -o a.out 2>&1 | FileCheck %s --check-prefix=AMD-TARGET-ID

// AMD-TARGET-ID: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack+ -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o
// AMD-TARGET-ID: clang{{.*}} -o {{.*}}.img --target=amdgcn-amd-amdhsa -mcpu=gfx90a:xnack- -O2 -Wl,--no-undefined {{.*}}.o {{.*}}.o
82 changes: 46 additions & 36 deletions clang/tools/clang-linker-wrapper/ClangLinkerWrapper.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -538,7 +538,8 @@ std::unique_ptr<lto::LTO> createLTO(
const ArgList &Args, const std::vector<std::string> &Features,
ModuleHook Hook = [](size_t, const Module &) { return true; }) {
const llvm::Triple Triple(Args.getLastArgValue(OPT_triple_EQ));
StringRef Arch = Args.getLastArgValue(OPT_arch_EQ);
// We need to remove AMD's target-id from the processor if present.
StringRef Arch = Args.getLastArgValue(OPT_arch_EQ).split(":").first;
lto::Config Conf;
lto::ThinBackend Backend;
// TODO: Handle index-only thin-LTO
Expand Down Expand Up @@ -1066,24 +1067,14 @@ DerivedArgList getLinkerArgs(ArrayRef<OffloadFile> Input,

/// Transforms all the extracted offloading input files into an image that can
/// be registered by the runtime.
Expected<SmallVector<StringRef>>
linkAndWrapDeviceFiles(SmallVectorImpl<OffloadFile> &LinkerInputFiles,
const InputArgList &Args, char **Argv, int Argc) {
Expected<SmallVector<StringRef>> linkAndWrapDeviceFiles(
SmallVectorImpl<SmallVector<OffloadFile>> &LinkerInputFiles,
const InputArgList &Args, char **Argv, int Argc) {
llvm::TimeTraceScope TimeScope("Handle all device input");

DenseMap<OffloadFile::TargetID, SmallVector<OffloadFile>> InputMap;
for (auto &File : LinkerInputFiles)
InputMap[File].emplace_back(std::move(File));
LinkerInputFiles.clear();

SmallVector<SmallVector<OffloadFile>> InputsForTarget;
for (auto &[ID, Input] : InputMap)
InputsForTarget.emplace_back(std::move(Input));
InputMap.clear();

std::mutex ImageMtx;
DenseMap<OffloadKind, SmallVector<OffloadingImage>> Images;
auto Err = parallelForEachError(InputsForTarget, [&](auto &Input) -> Error {
auto Err = parallelForEachError(LinkerInputFiles, [&](auto &Input) -> Error {
llvm::TimeTraceScope TimeScope("Link device input");

// Each thread needs its own copy of the base arguments to maintain
Expand Down Expand Up @@ -1359,8 +1350,9 @@ Expected<bool> getSymbols(StringRef Image, OffloadKind Kind, bool IsArchive,
/// Search the input files and libraries for embedded device offloading code
/// and add it to the list of files to be linked. Files coming from static
/// libraries are only added to the input if they are used by an existing
/// input file.
Expected<SmallVector<OffloadFile>> getDeviceInput(const ArgList &Args) {
/// input file. Returns a list of input files intended for a single linking job.
Expected<SmallVector<SmallVector<OffloadFile>>>
getDeviceInput(const ArgList &Args) {
llvm::TimeTraceScope TimeScope("ExtractDeviceCode");

StringRef Root = Args.getLastArgValue(OPT_sysroot_EQ);
Expand All @@ -1372,7 +1364,7 @@ Expected<SmallVector<OffloadFile>> getDeviceInput(const ArgList &Args) {
StringSaver Saver(Alloc);

// Try to extract device code from the linker input files.
SmallVector<OffloadFile> InputFiles;
DenseMap<OffloadFile::TargetID, SmallVector<OffloadFile>> InputFiles;
DenseMap<OffloadFile::TargetID, DenseMap<StringRef, Symbol>> Syms;
bool WholeArchive = Args.hasArg(OPT_wholearchive_flag) ? true : false;
for (const opt::Arg *Arg : Args.filtered(
Expand Down Expand Up @@ -1416,25 +1408,39 @@ Expected<SmallVector<OffloadFile>> getDeviceInput(const ArgList &Args) {
while (Extracted) {
Extracted = false;
for (OffloadFile &Binary : Binaries) {
// If the binary was previously extracted it will be set to null.
if (!Binary.getBinary())
continue;

// If we don't have an object file for this architecture do not
// extract.
if (IsArchive && !WholeArchive && !Syms.count(Binary))
continue;

Expected<bool> ExtractOrErr =
getSymbols(Binary.getBinary()->getImage(),
Binary.getBinary()->getOffloadKind(), IsArchive, Saver,
Syms[Binary]);
if (!ExtractOrErr)
return ExtractOrErr.takeError();

Extracted = !WholeArchive && *ExtractOrErr;

if (!IsArchive || WholeArchive || Extracted)
InputFiles.emplace_back(std::move(Binary));
SmallVector<OffloadFile::TargetID> CompatibleTargets = {Binary};
for (const auto &[ID, Input] : InputFiles)
if (object::areTargetsCompatible(Binary, ID))
CompatibleTargets.emplace_back(ID);

for (const auto &[Index, ID] : llvm::enumerate(CompatibleTargets)) {
// Only extract an if we have an an object matching this target.
if (IsArchive && !WholeArchive && !InputFiles.count(ID))
continue;

Expected<bool> ExtractOrErr = getSymbols(
Binary.getBinary()->getImage(),
Binary.getBinary()->getOffloadKind(), IsArchive, Saver, Syms[ID]);
if (!ExtractOrErr)
return ExtractOrErr.takeError();

Extracted = !WholeArchive && *ExtractOrErr;

// Skip including the file if it is an archive that does not resolve
// any symbols.
if (IsArchive && !WholeArchive && !Extracted)
continue;

// If another target needs this binary it must be copied instead.
if (Index == CompatibleTargets.size() - 1)
InputFiles[ID].emplace_back(std::move(Binary));
else
InputFiles[ID].emplace_back(std::move(Binary.copy()));
}

// If we extracted any files we need to check all the symbols again.
if (Extracted)
Expand All @@ -1447,10 +1453,14 @@ Expected<SmallVector<OffloadFile>> getDeviceInput(const ArgList &Args) {
auto FileOrErr = getInputBitcodeLibrary(Library);
if (!FileOrErr)
return FileOrErr.takeError();
InputFiles.push_back(std::move(*FileOrErr));
InputFiles[*FileOrErr].push_back(std::move(*FileOrErr));
}

return std::move(InputFiles);
SmallVector<SmallVector<OffloadFile>> InputsForTarget;
for (auto &[ID, Input] : InputFiles)
InputsForTarget.emplace_back(std::move(Input));

return std::move(InputsForTarget);
}

} // namespace
Expand Down
25 changes: 25 additions & 0 deletions llvm/include/llvm/Object/OffloadBinary.h
Original file line number Diff line number Diff line change
Expand Up @@ -162,6 +162,19 @@ class OffloadFile : public OwningBinary<OffloadBinary> {
std::unique_ptr<MemoryBuffer> Buffer)
: OwningBinary<OffloadBinary>(std::move(Binary), std::move(Buffer)) {}

/// Make a deep copy of this offloading file.
OffloadFile copy() const {
std::unique_ptr<MemoryBuffer> Buffer = MemoryBuffer::getMemBufferCopy(
getBinary()->getMemoryBufferRef().getBuffer());

// This parsing should never fail because it has already been parsed.
auto NewBinaryOrErr = OffloadBinary::create(*Buffer);
assert(NewBinaryOrErr && "Failed to parse a copy of the binary?");
if (!NewBinaryOrErr)
llvm::consumeError(NewBinaryOrErr.takeError());
return OffloadFile(std::move(*NewBinaryOrErr), std::move(Buffer));
}

/// We use the Triple and Architecture pair to group linker inputs together.
/// This conversion function lets us use these inputs in a hash-map.
operator TargetID() const {
Expand All @@ -186,6 +199,18 @@ OffloadKind getOffloadKind(StringRef Name);
/// Convert an offload kind to its string representation.
StringRef getOffloadKindName(OffloadKind Name);

/// If the target is AMD we check the target IDs for mutual compatibility. A
/// target id is a string conforming to the folowing BNF syntax:
///
/// target-id ::= '<arch> ( : <feature> ( '+' | '-' ) )*'
///
/// The features 'xnack' and 'sramecc' are currently supported. These can be in
/// the state of on, off, and any when unspecified. A target marked as any can
/// bind with either on or off. This is used to link mutually compatible
/// architectures together. Returns false in the case of an exact match.
bool areTargetsCompatible(const OffloadFile::TargetID &LHS,
const OffloadFile::TargetID &RHS);

} // namespace object

} // namespace llvm
Expand Down
32 changes: 32 additions & 0 deletions llvm/lib/Object/OffloadBinary.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -343,3 +343,35 @@ StringRef object::getImageKindName(ImageKind Kind) {
return "";
}
}

bool object::areTargetsCompatible(const OffloadFile::TargetID &LHS,
const OffloadFile::TargetID &RHS) {
// Exact matches are not considered compatible because they are the same
// target. We are interested in different targets that are compatible.
if (LHS == RHS)
return false;

// The triples must match at all times.
if (LHS.first != RHS.first)
return false;

// Only The AMDGPU target requires additional checks.
llvm::Triple T(LHS.first);
if (!T.isAMDGPU())
return false;

// The base processor must always match.
if (LHS.second.split(":").first != RHS.second.split(":").first)
return false;

// Check combintions of on / off features that must match.
if (LHS.second.contains("xnack+") && RHS.second.contains("xnack-"))
return false;
if (LHS.second.contains("xnack-") && RHS.second.contains("xnack+"))
return false;
if (LHS.second.contains("sramecc-") && RHS.second.contains("sramecc+"))
return false;
if (LHS.second.contains("sramecc+") && RHS.second.contains("sramecc-"))
return false;
return true;
}

0 comments on commit 12c90bd

Please sign in to comment.