Skip to content

Commit

Permalink
[CUDA][HIP] Fix overloading resolution
Browse files Browse the repository at this point in the history
This patch implements correct hostness based overloading resolution
in isBetterOverloadCandidate.

Based on hostness, if one candidate is emittable whereas the other
candidate is not emittable, the emittable candidate is better.

If both candidates are emittable, or neither is emittable based on hostness, then
other rules should be used to determine which is better. This is because
hostness based overloading resolution is mostly for determining
viability of a function. If two functions are both viable, other factors
should take precedence in preference.

If other rules cannot determine which is better, CUDA preference will be
used again to determine which is better.

However, correct hostness based overloading resolution
requires overloading resolution diagnostics to be deferred,
which is not on by default. The rationale is that deferring
overloading resolution diagnostics may hide overloading reslolutions
issues in header files.

An option -fgpu-exclude-wrong-side-overloads is added, which is off by
default.

When -fgpu-exclude-wrong-side-overloads is off, keep the original behavior,
that is, exclude wrong side overloads only if there are same side overloads.
This may result in incorrect overloading resolution when there are no
same side candates, but is sufficient for most CUDA/HIP applications.

When -fgpu-exclude-wrong-side-overloads is on, enable deferring
overloading resolution diagnostics and enable correct hostness
based overloading resolution, i.e., always exclude wrong side overloads.

Differential Revision: https://reviews.llvm.org/D80450
  • Loading branch information
yxsamliu committed Dec 2, 2020
1 parent baa005c commit acb6f80
Show file tree
Hide file tree
Showing 9 changed files with 457 additions and 54 deletions.
1 change: 1 addition & 0 deletions clang/include/clang/Basic/LangOptions.def
Expand Up @@ -243,6 +243,7 @@ LANGOPT(GPURelocatableDeviceCode, 1, 0, "generate relocatable device code")
LANGOPT(GPUAllowDeviceInit, 1, 0, "allowing device side global init functions for HIP")
LANGOPT(GPUMaxThreadsPerBlock, 32, 256, "default max threads per block for kernel launch bounds for HIP")
LANGOPT(GPUDeferDiag, 1, 0, "defer host/device related diagnostic messages for CUDA/HIP")
LANGOPT(GPUExcludeWrongSideOverloads, 1, 0, "always exclude wrong side overloads in overloading resolution for CUDA/HIP")

LANGOPT(SYCL , 1, 0, "SYCL")
LANGOPT(SYCLIsDevice , 1, 0, "Generate code for SYCL device")
Expand Down
3 changes: 3 additions & 0 deletions clang/include/clang/Driver/Options.td
Expand Up @@ -721,6 +721,9 @@ defm gpu_allow_device_init : OptInFFlag<"gpu-allow-device-init",
defm gpu_defer_diag : OptInFFlag<"gpu-defer-diag",
"Defer", "Don't defer", " host/device related diagnostic messages"
" for CUDA/HIP">;
defm gpu_exclude_wrong_side_overloads : OptInFFlag<"gpu-exclude-wrong-side-overloads",
"Always exclude wrong side overloads", "Exclude wrong side overloads only if there are same side overloads",
" in overloading resolution for CUDA/HIP", [HelpHidden]>;
def gpu_max_threads_per_block_EQ : Joined<["--"], "gpu-max-threads-per-block=">,
Flags<[CC1Option]>,
HelpText<"Default max threads per block for kernel launch bounds for HIP">;
Expand Down
3 changes: 3 additions & 0 deletions clang/include/clang/Sema/Overload.h
Expand Up @@ -1051,6 +1051,9 @@ class Sema;

void destroyCandidates();

/// Whether diagnostics should be deferred.
bool shouldDeferDiags(Sema &S, ArrayRef<Expr *> Args, SourceLocation OpLoc);

public:
OverloadCandidateSet(SourceLocation Loc, CandidateSetKind CSK,
OperatorRewriteInfo RewriteInfo = {})
Expand Down
6 changes: 6 additions & 0 deletions clang/lib/Driver/ToolChains/Clang.cpp
Expand Up @@ -5610,6 +5610,12 @@ void Clang::ConstructJob(Compilation &C, const JobAction &JA,
if (Args.hasFlag(options::OPT_fgpu_defer_diag,
options::OPT_fno_gpu_defer_diag, false))
CmdArgs.push_back("-fgpu-defer-diag");
if (Args.hasFlag(options::OPT_fgpu_exclude_wrong_side_overloads,
options::OPT_fno_gpu_exclude_wrong_side_overloads,
false)) {
CmdArgs.push_back("-fgpu-exclude-wrong-side-overloads");
CmdArgs.push_back("-fgpu-defer-diag");
}
}

if (Arg *A = Args.getLastArg(options::OPT_fcf_protection_EQ)) {
Expand Down
3 changes: 3 additions & 0 deletions clang/lib/Frontend/CompilerInvocation.cpp
Expand Up @@ -2693,6 +2693,9 @@ static void ParseLangArgs(LangOptions &Opts, ArgList &Args, InputKind IK,
if (Args.hasArg(OPT_fno_cuda_host_device_constexpr))
Opts.CUDAHostDeviceConstexpr = 0;

if (Args.hasArg(OPT_fgpu_exclude_wrong_side_overloads))
Opts.GPUExcludeWrongSideOverloads = 1;

if (Args.hasArg(OPT_fgpu_defer_diag))
Opts.GPUDeferDiag = 1;

Expand Down
132 changes: 111 additions & 21 deletions clang/lib/Sema/SemaOverload.cpp
Expand Up @@ -9616,6 +9616,75 @@ bool clang::isBetterOverloadCandidate(
else if (!Cand1.Viable)
return false;

// [CUDA] A function with 'never' preference is marked not viable, therefore
// is never shown up here. The worst preference shown up here is 'wrong side',
// e.g. an H function called by a HD function in device compilation. This is
// valid AST as long as the HD function is not emitted, e.g. it is an inline
// function which is called only by an H function. A deferred diagnostic will
// be triggered if it is emitted. However a wrong-sided function is still
// a viable candidate here.
//
// If Cand1 can be emitted and Cand2 cannot be emitted in the current
// context, Cand1 is better than Cand2. If Cand1 can not be emitted and Cand2
// can be emitted, Cand1 is not better than Cand2. This rule should have
// precedence over other rules.
//
// If both Cand1 and Cand2 can be emitted, or neither can be emitted, then
// other rules should be used to determine which is better. This is because
// host/device based overloading resolution is mostly for determining
// viability of a function. If two functions are both viable, other factors
// should take precedence in preference, e.g. the standard-defined preferences
// like argument conversion ranks or enable_if partial-ordering. The
// preference for pass-object-size parameters is probably most similar to a
// type-based-overloading decision and so should take priority.
//
// If other rules cannot determine which is better, CUDA preference will be
// used again to determine which is better.
//
// TODO: Currently IdentifyCUDAPreference does not return correct values
// for functions called in global variable initializers due to missing
// correct context about device/host. Therefore we can only enforce this
// rule when there is a caller. We should enforce this rule for functions
// in global variable initializers once proper context is added.
//
// TODO: We can only enable the hostness based overloading resolution when
// -fgpu-exclude-wrong-side-overloads is on since this requires deferring
// overloading resolution diagnostics.
if (S.getLangOpts().CUDA && Cand1.Function && Cand2.Function &&
S.getLangOpts().GPUExcludeWrongSideOverloads) {
if (FunctionDecl *Caller = dyn_cast<FunctionDecl>(S.CurContext)) {
bool IsCallerImplicitHD = Sema::isCUDAImplicitHostDeviceFunction(Caller);
bool IsCand1ImplicitHD =
Sema::isCUDAImplicitHostDeviceFunction(Cand1.Function);
bool IsCand2ImplicitHD =
Sema::isCUDAImplicitHostDeviceFunction(Cand2.Function);
auto P1 = S.IdentifyCUDAPreference(Caller, Cand1.Function);
auto P2 = S.IdentifyCUDAPreference(Caller, Cand2.Function);
assert(P1 != Sema::CFP_Never && P2 != Sema::CFP_Never);
// The implicit HD function may be a function in a system header which
// is forced by pragma. In device compilation, if we prefer HD candidates
// over wrong-sided candidates, overloading resolution may change, which
// may result in non-deferrable diagnostics. As a workaround, we let
// implicit HD candidates take equal preference as wrong-sided candidates.
// This will preserve the overloading resolution.
// TODO: We still need special handling of implicit HD functions since
// they may incur other diagnostics to be deferred. We should make all
// host/device related diagnostics deferrable and remove special handling
// of implicit HD functions.
auto EmitThreshold =
(S.getLangOpts().CUDAIsDevice && IsCallerImplicitHD &&
(IsCand1ImplicitHD || IsCand2ImplicitHD))
? Sema::CFP_Never
: Sema::CFP_WrongSide;
auto Cand1Emittable = P1 > EmitThreshold;
auto Cand2Emittable = P2 > EmitThreshold;
if (Cand1Emittable && !Cand2Emittable)
return true;
if (!Cand1Emittable && Cand2Emittable)
return false;
}
}

// C++ [over.match.best]p1:
//
// -- if F is a static member function, ICS1(F) is defined such
Expand Down Expand Up @@ -9850,21 +9919,28 @@ bool clang::isBetterOverloadCandidate(
return Cmp == Comparison::Better;
}

if (S.getLangOpts().CUDA && Cand1.Function && Cand2.Function) {
FunctionDecl *Caller = dyn_cast<FunctionDecl>(S.CurContext);
return S.IdentifyCUDAPreference(Caller, Cand1.Function) >
S.IdentifyCUDAPreference(Caller, Cand2.Function);
}

bool HasPS1 = Cand1.Function != nullptr &&
functionHasPassObjectSizeParams(Cand1.Function);
bool HasPS2 = Cand2.Function != nullptr &&
functionHasPassObjectSizeParams(Cand2.Function);
if (HasPS1 != HasPS2 && HasPS1)
return true;

Comparison MV = isBetterMultiversionCandidate(Cand1, Cand2);
return MV == Comparison::Better;
auto MV = isBetterMultiversionCandidate(Cand1, Cand2);
if (MV == Comparison::Better)
return true;
if (MV == Comparison::Worse)
return false;

// If other rules cannot determine which is better, CUDA preference is used
// to determine which is better.
if (S.getLangOpts().CUDA && Cand1.Function && Cand2.Function) {
FunctionDecl *Caller = dyn_cast<FunctionDecl>(S.CurContext);
return S.IdentifyCUDAPreference(Caller, Cand1.Function) >
S.IdentifyCUDAPreference(Caller, Cand2.Function);
}

return false;
}

/// Determine whether two declarations are "equivalent" for the purposes of
Expand Down Expand Up @@ -9957,7 +10033,11 @@ OverloadCandidateSet::BestViableFunction(Sema &S, SourceLocation Loc,
// only on their host/device attributes. Specifically, if one
// candidate call is WrongSide and the other is SameSide, we ignore
// the WrongSide candidate.
if (S.getLangOpts().CUDA) {
// We only need to remove wrong-sided candidates here if
// -fgpu-exclude-wrong-side-overloads is off. When
// -fgpu-exclude-wrong-side-overloads is on, all candidates are compared
// uniformly in isBetterOverloadCandidate.
if (S.getLangOpts().CUDA && !S.getLangOpts().GPUExcludeWrongSideOverloads) {
const FunctionDecl *Caller = dyn_cast<FunctionDecl>(S.CurContext);
bool ContainsSameSideCandidate =
llvm::any_of(Candidates, [&](OverloadCandidate *Cand) {
Expand Down Expand Up @@ -11620,26 +11700,34 @@ SmallVector<OverloadCandidate *, 32> OverloadCandidateSet::CompleteCandidates(
return Cands;
}

/// When overload resolution fails, prints diagnostic messages containing the
/// candidates in the candidate set.
void OverloadCandidateSet::NoteCandidates(PartialDiagnosticAt PD,
Sema &S, OverloadCandidateDisplayKind OCD, ArrayRef<Expr *> Args,
StringRef Opc, SourceLocation OpLoc,
llvm::function_ref<bool(OverloadCandidate &)> Filter) {

bool OverloadCandidateSet::shouldDeferDiags(Sema &S, ArrayRef<Expr *> Args,
SourceLocation OpLoc) {
bool DeferHint = false;
if (S.getLangOpts().CUDA && S.getLangOpts().GPUDeferDiag) {
// Defer diagnostic for CUDA/HIP if there are wrong-sided candidates.
// Defer diagnostic for CUDA/HIP if there are wrong-sided candidates or
// host device candidates.
auto WrongSidedCands =
CompleteCandidates(S, OCD_AllCandidates, Args, OpLoc, [](auto &Cand) {
return Cand.Viable == false &&
Cand.FailureKind == ovl_fail_bad_target;
return (Cand.Viable == false &&
Cand.FailureKind == ovl_fail_bad_target) ||
(Cand.Function->template hasAttr<CUDAHostAttr>() &&
Cand.Function->template hasAttr<CUDADeviceAttr>());
});
DeferHint = WrongSidedCands.size();
}
return DeferHint;
}

/// When overload resolution fails, prints diagnostic messages containing the
/// candidates in the candidate set.
void OverloadCandidateSet::NoteCandidates(
PartialDiagnosticAt PD, Sema &S, OverloadCandidateDisplayKind OCD,
ArrayRef<Expr *> Args, StringRef Opc, SourceLocation OpLoc,
llvm::function_ref<bool(OverloadCandidate &)> Filter) {

auto Cands = CompleteCandidates(S, OCD, Args, OpLoc, Filter);

S.Diag(PD.first, PD.second, DeferHint);
S.Diag(PD.first, PD.second, shouldDeferDiags(S, Args, OpLoc));

NoteCandidates(S, Args, Cands, Opc, OpLoc);

Expand Down Expand Up @@ -11691,7 +11779,9 @@ void OverloadCandidateSet::NoteCandidates(Sema &S, ArrayRef<Expr *> Args,
}

if (I != E)
S.Diag(OpLoc, diag::note_ovl_too_many_candidates) << int(E - I);
S.Diag(OpLoc, diag::note_ovl_too_many_candidates,
shouldDeferDiags(S, Args, OpLoc))
<< int(E - I);
}

static SourceLocation
Expand Down
5 changes: 5 additions & 0 deletions clang/test/Driver/hip-options.hip
Expand Up @@ -35,3 +35,8 @@
// RUN: %clang -### -nogpuinc -nogpulib -munsafe-fp-atomics \
// RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=UNSAFE-FP-ATOMICS %s
// UNSAFE-FP-ATOMICS: clang{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-munsafe-fp-atomics"

// RUN: %clang -### -target x86_64-unknown-linux-gnu -nogpuinc -nogpulib -fgpu-exclude-wrong-side-overloads \
// RUN: --cuda-gpu-arch=gfx906 %s 2>&1 | FileCheck -check-prefix=FIX-OVERLOAD %s
// FIX-OVERLOAD: clang{{.*}} "-triple" "amdgcn-amd-amdhsa" {{.*}} "-fgpu-exclude-wrong-side-overloads" "-fgpu-defer-diag"
// FIX-OVERLOAD: clang{{.*}} "-triple" "x86_64-unknown-linux-gnu" {{.*}} "-fgpu-exclude-wrong-side-overloads" "-fgpu-defer-diag"
6 changes: 3 additions & 3 deletions clang/test/SemaCUDA/deferred-oeverload.cu
Expand Up @@ -54,7 +54,7 @@ struct B { int x; };
// This fails to substitue for A but no diagnostic
// should be emitted.
template<typename T, typename T::foo* = nullptr>
__host__ __device__ void sfinae(T t) { // com-note {{candidate template ignored: substitution failure [with T = B]}}
__host__ __device__ void sfinae(T t) { // host-note {{candidate template ignored: substitution failure [with T = B]}}
t.x = 1;
}

Expand All @@ -64,13 +64,13 @@ __host__ __device__ void sfinae(T t) { // com-note {{candidate template ignored:
// file scope.

template<typename T, typename T::isA* = nullptr>
__host__ __device__ void sfinae(T t) { // com-note {{candidate template ignored: substitution failure [with T = B]}}
__host__ __device__ void sfinae(T t) { // host-note {{candidate template ignored: substitution failure [with T = B]}}
t.x = 1;
}

void test_sfinae() {
sfinae(A());
sfinae(B()); // com-error{{no matching function for call to 'sfinae'}}
sfinae(B()); // host-error{{no matching function for call to 'sfinae'}}
}

// Make sure throw is diagnosed in OpenMP parallel region in host function.
Expand Down

0 comments on commit acb6f80

Please sign in to comment.