LangRef: Add "dynamic" option to "denormal-fp-math"
This is stricter than the default "ieee", and should probably be the
default. This patch leaves the default alone. I can change this in a
future patch.

There are non-reversible transforms I would like to perform which are
legal under IEEE denormal handling, but illegal under flush-to-zero
behavior: namely, conversions between llvm.is.fpclass and an fcmp
against zero.

Under "ieee" handling, it is legal to translate between
llvm.is.fpclass(x, fcZero) and fcmp x, 0.

Under "preserve-sign" handling, it is legal to translate between
llvm.is.fpclass(x, fcSubnormal|fcZero) and fcmp x, 0.
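
The two equivalences above can be checked with a small host-side model. This is only a sketch: InputMode, isFpClassZero, isFpClassSubnormalOrZero and fcmpOeqZero are hypothetical names invented for illustration, not LLVM APIs, and the host is assumed to run in its default IEEE mode (no FTZ/DAZ).

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-in for the input half of "denormal-fp-math".
enum class InputMode { IEEE, PreserveSign };

// Models llvm.is.fpclass(x, fcZero): a pure class test, so a subnormal is
// never reported as zero, whatever the denormal mode is.
bool isFpClassZero(float x) { return std::fpclassify(x) == FP_ZERO; }

// Models llvm.is.fpclass(x, fcSubnormal|fcZero).
bool isFpClassSubnormalOrZero(float x) {
  const int c = std::fpclassify(x);
  return c == FP_ZERO || c == FP_SUBNORMAL;
}

// Models `fcmp oeq x, 0.0`: with a flushing input mode the compare treats a
// subnormal operand as a (sign-preserved) zero before comparing.
bool fcmpOeqZero(float x, InputMode mode) {
  if (mode == InputMode::PreserveSign && std::fpclassify(x) == FP_SUBNORMAL)
    x = std::copysign(0.0f, x);
  return x == 0.0f;
}
```

For a subnormal x such as std::numeric_limits<float>::denorm_min(), the fcZero test agrees with the compare only under InputMode::IEEE, while the fcSubnormal|fcZero test agrees with it only under InputMode::PreserveSign. That mismatch is why neither rewrite is safe when the mode is not known.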

I would like to compile and distribute some math library functions in
a mode where they are callable from code with and without denormals
enabled, which requires not changing the compares with denormals or
zeroes.

If an IEEE function transforms an llvm.is.fpclass call into an fcmp
with 0, it is no longer possible to call the function from code that
flushes denormals, or to write an optimization that moves the function
into a denormal-flushing mode. In the original function, if x was a
denormal, the class test would evaluate to false. If the function
compiled with IEEE denormal handling was converted to, or called from,
a preserve-sign function, the fcmp now evaluates to true.

This could also be of use for strictfp handling, where code may be
changing the denormal mode.

Alternative name could be "unknown".

Replaces the old AMDGPU custom inlining logic with more conservative
logic which tries to permit inlining for callees with dynamic handling
and avoids inlining other mismatched modes.
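
As a rough illustration of the inlining rule described above, here is a minimal sketch. Kind, Mode, componentCompatible and canInline are hypothetical names; the rule shown (a callee component is acceptable if it is dynamic or equals the caller's) is an assumption drawn from the description, not a copy of the in-tree logic.

```cpp
#include <cassert>

// Hypothetical model of one half of a "denormal-fp-math" attribute value.
enum class Kind { IEEE, PreserveSign, PositiveZero, Dynamic };

// The attribute string is written "output,input".
struct Mode {
  Kind output;
  Kind input;
};

// A callee component is assumed compatible when it matches the caller's, or
// when it is dynamic and therefore makes no assumption about the mode.
bool componentCompatible(Kind caller, Kind callee) {
  return callee == Kind::Dynamic || callee == caller;
}

// Assumed rule: inlining is allowed only if both components are compatible.
bool canInline(Mode caller, Mode callee) {
  return componentCompatible(caller.output, callee.output) &&
         componentCompatible(caller.input, callee.input);
}
```

Under this model a dynamic callee can be inlined into any caller, but an "ieee" callee cannot be inlined into a preserve-sign or dynamic caller, since the callee may already have been optimized on the assumption that denormals are not flushed.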
arsenm committed Apr 29, 2023
1 parent 0610e2f commit bc37be1
Showing 26 changed files with 1,490 additions and 104 deletions.
106 changes: 92 additions & 14 deletions clang/lib/CodeGen/CGCall.cpp
@@ -1829,10 +1829,32 @@ static bool HasStrictReturn(const CodeGenModule &Module, QualType RetTy,
          Module.getLangOpts().Sanitize.has(SanitizerKind::Return);
 }

-void CodeGenModule::getDefaultFunctionAttributes(StringRef Name,
-                                                 bool HasOptnone,
-                                                 bool AttrOnCallSite,
-                                                 llvm::AttrBuilder &FuncAttrs) {
+/// Add denormal-fp-math and denormal-fp-math-f32 as appropriate for the
+/// requested denormal behavior, accounting for the overriding behavior of the
+/// -f32 case.
+static void addDenormalModeAttrs(llvm::DenormalMode FPDenormalMode,
+                                 llvm::DenormalMode FP32DenormalMode,
+                                 llvm::AttrBuilder &FuncAttrs) {
+  if (FPDenormalMode != llvm::DenormalMode::getDefault())
+    FuncAttrs.addAttribute("denormal-fp-math", FPDenormalMode.str());
+
+  if (FP32DenormalMode != FPDenormalMode && FP32DenormalMode.isValid())
+    FuncAttrs.addAttribute("denormal-fp-math-f32", FP32DenormalMode.str());
+}
+
+/// Add default attributes to a function, which have merge semantics under
+/// -mlink-builtin-bitcode and should not simply overwrite any existing
+/// attributes in the linked library.
+static void
+addMergableDefaultFunctionAttributes(const CodeGenOptions &CodeGenOpts,
+                                     llvm::AttrBuilder &FuncAttrs) {
+  addDenormalModeAttrs(CodeGenOpts.FPDenormalMode, CodeGenOpts.FP32DenormalMode,
+                       FuncAttrs);
+}
+
+void CodeGenModule::getTrivialDefaultFunctionAttributes(
+    StringRef Name, bool HasOptnone, bool AttrOnCallSite,
+    llvm::AttrBuilder &FuncAttrs) {
   // OptimizeNoneAttr takes precedence over -Os or -Oz. No warning needed.
   if (!HasOptnone) {
     if (CodeGenOpts.OptimizeSize)
@@ -1874,15 +1896,6 @@ void CodeGenModule::getDefaultFunctionAttributes(StringRef Name,
     if (CodeGenOpts.NullPointerIsValid)
       FuncAttrs.addAttribute(llvm::Attribute::NullPointerIsValid);

-    if (CodeGenOpts.FPDenormalMode != llvm::DenormalMode::getIEEE())
-      FuncAttrs.addAttribute("denormal-fp-math",
-                             CodeGenOpts.FPDenormalMode.str());
-    if (CodeGenOpts.FP32DenormalMode != CodeGenOpts.FPDenormalMode) {
-      FuncAttrs.addAttribute(
-          "denormal-fp-math-f32",
-          CodeGenOpts.FP32DenormalMode.str());
-    }
-
     if (LangOpts.getDefaultExceptionMode() == LangOptions::FPE_Ignore)
       FuncAttrs.addAttribute("no-trapping-math", "true");

@@ -1984,6 +1997,19 @@ void CodeGenModule::getDefaultFunctionAttributes(StringRef Name,
   }
 }

+void CodeGenModule::getDefaultFunctionAttributes(StringRef Name,
+                                                 bool HasOptnone,
+                                                 bool AttrOnCallSite,
+                                                 llvm::AttrBuilder &FuncAttrs) {
+  getTrivialDefaultFunctionAttributes(Name, HasOptnone, AttrOnCallSite,
+                                      FuncAttrs);
+  if (!AttrOnCallSite) {
+    // If we're just getting the default, get the default values for mergeable
+    // attributes.
+    addMergableDefaultFunctionAttributes(CodeGenOpts, FuncAttrs);
+  }
+}
+
 void CodeGenModule::addDefaultFunctionDefinitionAttributes(llvm::Function &F) {
   llvm::AttrBuilder FuncAttrs(F.getContext());
   getDefaultFunctionAttributes(F.getName(), F.hasOptNone(),
@@ -1992,8 +2018,60 @@ void CodeGenModule::addDefaultFunctionDefinitionAttributes(llvm::Function &F) {
   F.addFnAttrs(FuncAttrs);
 }

+/// Apply default attributes to \p F, accounting for merge semantics of
+/// attributes that should not overwrite existing attributes.
+void CodeGenModule::mergeDefaultFunctionDefinitionAttributes(
+    llvm::Function &F, bool WillInternalize) {
+  llvm::AttrBuilder FuncAttrs(F.getContext());
+  getTrivialDefaultFunctionAttributes(F.getName(), F.hasOptNone(),
+                                      /*AttrOnCallSite=*/false, FuncAttrs);
+  GetCPUAndFeaturesAttributes(GlobalDecl(), FuncAttrs);
+
+  if (!WillInternalize && F.isInterposable()) {
+    // Do not promote "dynamic" denormal-fp-math to this translation unit's
+    // setting for weak functions that won't be internalized. The user has no
+    // real control for how builtin bitcode is linked, so we shouldn't assume
+    // later copies will use a consistent mode.
+    F.addFnAttrs(FuncAttrs);
+    return;
+  }
+
+  llvm::AttributeMask AttrsToRemove;
+
+  llvm::DenormalMode DenormModeToMerge = F.getDenormalModeRaw();
+  llvm::DenormalMode DenormModeToMergeF32 = F.getDenormalModeF32Raw();
+  llvm::DenormalMode Merged =
+      CodeGenOpts.FPDenormalMode.mergeCalleeMode(DenormModeToMerge);
+  llvm::DenormalMode MergedF32 = CodeGenOpts.FP32DenormalMode;
+
+  if (DenormModeToMergeF32.isValid()) {
+    MergedF32 =
+        CodeGenOpts.FP32DenormalMode.mergeCalleeMode(DenormModeToMergeF32);
+  }
+
+  if (Merged == llvm::DenormalMode::getDefault()) {
+    AttrsToRemove.addAttribute("denormal-fp-math");
+  } else if (Merged != DenormModeToMerge) {
+    // Overwrite existing attribute
+    FuncAttrs.addAttribute("denormal-fp-math",
+                           CodeGenOpts.FPDenormalMode.str());
+  }
+
+  if (MergedF32 == llvm::DenormalMode::getDefault()) {
+    AttrsToRemove.addAttribute("denormal-fp-math-f32");
+  } else if (MergedF32 != DenormModeToMergeF32) {
+    // Overwrite existing attribute
+    FuncAttrs.addAttribute("denormal-fp-math-f32",
+                           CodeGenOpts.FP32DenormalMode.str());
+  }
+
+  F.removeFnAttrs(AttrsToRemove);
+  addDenormalModeAttrs(Merged, MergedF32, FuncAttrs);
+  F.addFnAttrs(FuncAttrs);
+}
+
 void CodeGenModule::addDefaultFunctionDefinitionAttributes(
-                                                   llvm::AttrBuilder &attrs) {
+    llvm::AttrBuilder &attrs) {
   getDefaultFunctionAttributes(/*function name*/ "", /*optnone*/ false,
                                /*for call*/ false, attrs);
   GetCPUAndFeaturesAttributes(GlobalDecl(), attrs);
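
The merge step in mergeDefaultFunctionDefinitionAttributes can be modelled outside of Clang roughly as follows. This is a sketch under stated assumptions: Kind, Mode, kDefault, str and denormalAttrs are hypothetical stand-ins, and the merge rule (each "dynamic" component of the linked function takes the translation unit's setting, other components are kept) is my reading of how mergeCalleeMode is used in the diff, not a copy of its implementation.

```cpp
#include <cassert>
#include <map>
#include <string>

// Simplified stand-ins for llvm::DenormalMode; all names are illustrative.
enum class Kind { Invalid, IEEE, PreserveSign, PositiveZero, Dynamic };

struct Mode {
  Kind output;
  Kind input;
  bool operator==(const Mode &o) const {
    return output == o.output && input == o.input;
  }
  bool operator!=(const Mode &o) const { return !(*this == o); }
  bool isValid() const {
    return output != Kind::Invalid && input != Kind::Invalid;
  }
};

// "ieee,ieee" is the default, for which no attribute is emitted.
const Mode kDefault{Kind::IEEE, Kind::IEEE};

std::string kindName(Kind k) {
  switch (k) {
  case Kind::IEEE:         return "ieee";
  case Kind::PreserveSign: return "preserve-sign";
  case Kind::PositiveZero: return "positive-zero";
  case Kind::Dynamic:      return "dynamic";
  default:                 return "";
  }
}

// Attribute values are spelled "output,input".
std::string str(Mode m) { return kindName(m.output) + "," + kindName(m.input); }

// Assumed merge rule: dynamic components of the linked function are replaced
// by the translation unit's mode; fixed components are left alone.
Mode mergeCalleeMode(Mode tu, Mode callee) {
  Mode merged = callee;
  if (callee.output == Kind::Dynamic) merged.output = tu.output;
  if (callee.input == Kind::Dynamic) merged.input = tu.input;
  return merged;
}

// Mirrors addDenormalModeAttrs from the diff: skip the default mode, and emit
// the -f32 override only when it is valid and differs from the general mode.
std::map<std::string, std::string> denormalAttrs(Mode fp, Mode f32) {
  std::map<std::string, std::string> attrs;
  if (fp != kDefault) attrs["denormal-fp-math"] = str(fp);
  if (f32 != fp && f32.isValid()) attrs["denormal-fp-math-f32"] = str(f32);
  return attrs;
}
```

For example, a library function built with "dynamic,dynamic" and linked into a translation unit compiled with preserve-sign ends up with "preserve-sign,preserve-sign", while a library function that was explicitly built as "ieee,ieee" keeps its mode.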
3 changes: 2 additions & 1 deletion clang/lib/CodeGen/CodeGenAction.cpp
@@ -270,7 +270,8 @@ namespace clang {
             // in LLVM IR.
             if (F.isIntrinsic())
               continue;
-            Gen->CGM().addDefaultFunctionDefinitionAttributes(F);
+            Gen->CGM().mergeDefaultFunctionDefinitionAttributes(F,
+                                                                LM.Internalize);
           }

         CurLinkModule = LM.Module.get();
8 changes: 8 additions & 0 deletions clang/lib/CodeGen/CodeGenModule.h
@@ -1272,6 +1272,8 @@ class CodeGenModule : public CodeGenTypeCache {
   /// function which relies on particular fast-math attributes for correctness.
   /// It's up to you to ensure that this is safe.
   void addDefaultFunctionDefinitionAttributes(llvm::Function &F);
+  void mergeDefaultFunctionDefinitionAttributes(llvm::Function &F,
+                                                bool WillInternalize);

   /// Like the overload taking a `Function &`, but intended specifically
   /// for frontends that want to build on Clang's target-configuration logic.
@@ -1734,6 +1736,12 @@ class CodeGenModule : public CodeGenTypeCache {
   /// function.
   void SimplifyPersonality();

+  /// Helper function for getDefaultFunctionAttributes. Builds a set of function
+  /// attributes which can be simply added to a function.
+  void getTrivialDefaultFunctionAttributes(StringRef Name, bool HasOptnone,
+                                           bool AttrOnCallSite,
+                                           llvm::AttrBuilder &FuncAttrs);
+
   /// Helper function for ConstructAttributeList and
   /// addDefaultFunctionDefinitionAttributes. Builds a set of function
   /// attributes to add to a function with the given properties.
14 changes: 14 additions & 0 deletions clang/test/CodeGen/denormalfpmode-f32.c
@@ -2,21 +2,30 @@
 // RUN: %clang_cc1 -S -fdenormal-fp-math=ieee %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-NONE,CHECK-F32-NONE
 // RUN: %clang_cc1 -S -fdenormal-fp-math=preserve-sign %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-PS,CHECK-F32-NONE
 // RUN: %clang_cc1 -S -fdenormal-fp-math=positive-zero %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-PZ,CHECK-F32-NONE
+// RUN: %clang_cc1 -S -fdenormal-fp-math=dynamic %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-DYNAMIC,CHECK-F32-NONE

 // RUN: %clang_cc1 -S -fdenormal-fp-math-f32=ieee %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-NONE,CHECK-F32-NONE
 // RUN: %clang_cc1 -S -fdenormal-fp-math=ieee -fdenormal-fp-math-f32=ieee %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-NONE,CHECK-F32-NONE
 // RUN: %clang_cc1 -S -fdenormal-fp-math=preserve-sign -fdenormal-fp-math-f32=ieee %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-PS,CHECK-F32-IEEE
 // RUN: %clang_cc1 -S -fdenormal-fp-math=positive-zero -fdenormal-fp-math-f32=ieee %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-PZ,CHECK-F32-IEEE
+// RUN: %clang_cc1 -S -fdenormal-fp-math=positive-zero -fdenormal-fp-math-f32=dynamic %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-PZ,CHECK-F32-DYNAMIC
+

 // RUN: %clang_cc1 -S -fdenormal-fp-math-f32=preserve-sign %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-NONE,CHECK-F32-PS
 // RUN: %clang_cc1 -S -fdenormal-fp-math=ieee -fdenormal-fp-math-f32=preserve-sign %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-NONE,CHECK-F32-PS
 // RUN: %clang_cc1 -S -fdenormal-fp-math=preserve-sign -fdenormal-fp-math-f32=preserve-sign %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-PS,CHECK-F32-NONE
 // RUN: %clang_cc1 -S -fdenormal-fp-math=positive-zero -fdenormal-fp-math-f32=preserve-sign %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-PZ,CHECK-F32-PS
+// RUN: %clang_cc1 -S -fdenormal-fp-math=ieee -fdenormal-fp-math-f32=dynamic %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-NONE,CHECK-F32-DYNAMIC
+

 // RUN: %clang_cc1 -S -fdenormal-fp-math-f32=positive-zero %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-NONE,CHECK-F32-PZ
+// RUN: %clang_cc1 -S -fdenormal-fp-math-f32=dynamic %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-NONE,CHECK-F32-DYNAMIC
 // RUN: %clang_cc1 -S -fdenormal-fp-math=ieee -fdenormal-fp-math-f32=positive-zero %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-NONE,CHECK-F32-PZ
+// RUN: %clang_cc1 -S -fdenormal-fp-math=dynamic -fdenormal-fp-math-f32=positive-zero %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-DYNAMIC,CHECK-F32-PZ
 // RUN: %clang_cc1 -S -fdenormal-fp-math=preserve-sign -fdenormal-fp-math-f32=positive-zero %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-PS,CHECK-F32-PZ
 // RUN: %clang_cc1 -S -fdenormal-fp-math=positive-zero -fdenormal-fp-math-f32=positive-zero %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-PZ,CHECK-F32-NONE
+// RUN: %clang_cc1 -S -fdenormal-fp-math=dynamic -fdenormal-fp-math-f32=dynamic %s -emit-llvm -o - | FileCheck %s --check-prefixes=CHECK-ATTR,CHECK-DYNAMIC,CHECK-F32-NONE
+

 // CHECK-LABEL: main

@@ -25,11 +34,16 @@
 // CHECK-IEEE: "denormal-fp-math"="ieee,ieee"
 // CHECK-PS: "denormal-fp-math"="preserve-sign,preserve-sign"
 // CHECK-PZ: "denormal-fp-math"="positive-zero,positive-zero"
+// CHECK-DYNAMIC: "denormal-fp-math"="dynamic,dynamic"

 // CHECK-F32-NONE-NOT:"denormal-fp-math-f32"
 // CHECK-F32-IEEE: "denormal-fp-math-f32"="ieee,ieee"
 // CHECK-F32-PS: "denormal-fp-math-f32"="preserve-sign,preserve-sign"
 // CHECK-F32-PZ: "denormal-fp-math-f32"="positive-zero,positive-zero"
+
+
+// CHECK-F32-DYNAMIC: "denormal-fp-math-f32"="dynamic,dynamic"
+
 int main(void) {
   return 0;
 }
2 changes: 2 additions & 0 deletions clang/test/CodeGen/denormalfpmode.c
@@ -1,13 +1,15 @@
 // RUN: %clang_cc1 -S -fdenormal-fp-math=ieee %s -emit-llvm -o - | FileCheck %s --check-prefix=CHECK-IEEE
 // RUN: %clang_cc1 -S -fdenormal-fp-math=preserve-sign %s -emit-llvm -o - | FileCheck %s --check-prefix=CHECK-PS
 // RUN: %clang_cc1 -S -fdenormal-fp-math=positive-zero %s -emit-llvm -o - | FileCheck %s --check-prefix=CHECK-PZ
+// RUN: %clang_cc1 -S -fdenormal-fp-math=dynamic %s -emit-llvm -o - | FileCheck %s --check-prefix=CHECK-DYNAMIC

 // CHECK-LABEL: main

 // The ieee,ieee is the default, so omit the attribute
 // CHECK-IEEE-NOT:"denormal-fp-math"
 // CHECK-PS: attributes #0 = {{.*}}"denormal-fp-math"="preserve-sign,preserve-sign"{{.*}}
 // CHECK-PZ: attributes #0 = {{.*}}"denormal-fp-math"="positive-zero,positive-zero"{{.*}}
+// CHECK-DYNAMIC: attributes #0 = {{.*}}"denormal-fp-math"="dynamic,dynamic"{{.*}}

 int main(void) {
   return 0;
18 changes: 18 additions & 0 deletions clang/test/CodeGenCUDA/Inputs/ocml-sample.cl
@@ -0,0 +1,18 @@
+#pragma OPENCL EXTENSION cl_khr_fp16 : enable
+
+half do_f16_stuff(half a, half b, half c) {
+  return __builtin_fmaf16(a, b, c) + 4.0h;
+}
+
+float do_f32_stuff(float a, float b, float c) {
+  return __builtin_fmaf(a, b, c) + 4.0f;
+}
+
+double do_f64_stuff(double a, double b, double c) {
+  return __builtin_fma(a, b, c) + 4.0;
+}
+
+__attribute__((weak))
+float weak_do_f32_stuff(float a, float b, float c) {
+  return c * (a / b);
+}

3 comments on commit bc37be1

@nunoplopes

Alive2 complains about this test:

define float @canonicalize_neg_denorm_preserve_sign_output_dynamic_input() "denormal-fp-math"="preserve-sign,dynamic" {
; CHECK-LABEL: @canonicalize_neg_denorm_preserve_sign_output_dynamic_input(
; CHECK-NEXT:    ret float -0.000000e+00
;
  %ret = call float @llvm.canonicalize.f32(float bitcast (i32 -2139095041 to float))
  ret float %ret
}

The input denormal is dynamic. So it can produce -0/+0/0x807fffff, depending on the dynamic denormal mode.
So I don't think this can be optimized to -0.
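
The three candidate values named above can be reproduced with a small bit-level model of input flushing. This is a sketch only: Kind, isDenormalBits, flush and possibleInputs are hypothetical helpers, and the model covers just how the denormal input may be seen under each concrete input mode, not the full semantics of llvm.canonicalize.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Concrete input modes that a "dynamic" input mode may resolve to at run time.
enum class Kind { IEEE, PreserveSign, PositiveZero };

constexpr std::uint32_t kSignBit = 0x80000000u;

// A binary32 value is denormal when its exponent field is zero and its
// mantissa field is non-zero.
bool isDenormalBits(std::uint32_t bits) {
  return (bits & 0x7F800000u) == 0 && (bits & 0x007FFFFFu) != 0;
}

// How a denormal input is seen under one concrete mode: unchanged for ieee,
// a zero with the same sign for preserve-sign, +0 for positive-zero.
std::uint32_t flush(std::uint32_t bits, Kind mode) {
  if (!isDenormalBits(bits) || mode == Kind::IEEE)
    return bits;
  return mode == Kind::PreserveSign ? (bits & kSignBit) : 0u;
}

// With a dynamic input mode, any of the concrete modes may be in effect.
std::set<std::uint32_t> possibleInputs(std::uint32_t bits) {
  return {flush(bits, Kind::IEEE), flush(bits, Kind::PreserveSign),
          flush(bits, Kind::PositiveZero)};
}
```

The constant in the test, i32 -2139095041, is the bit pattern 0x807fffff. possibleInputs yields three distinct values for it, which is why a single folded constant cannot be justified by the input mode alone.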

@arsenm commented on bc37be1 Jul 22, 2023

> The input denormal is dynamic. So it can produce -0/+0/0x807fffff, depending on the dynamic denormal mode. So I don't think this can be optimized to -0.

Fixed 952fe94

@nunoplopes

> > The input denormal is dynamic. So it can produce -0/+0/0x807fffff, depending on the dynamic denormal mode. So I don't think this can be optimized to -0.
>
> Fixed 952fe94

Thank you!
