-
Notifications
You must be signed in to change notification settings - Fork 10.8k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
LangRef: Add "dynamic" option to "denormal-fp-math"
This is stricter than the default "ieee", and should probably be the default. This patch leaves the default alone. I can change this in a future patch. There are non-reversible transforms I would like to perform which are legal under IEEE denormal handling, but illegal with flushing zero behavior. Namely, conversions between llvm.is.fpclass and fcmp with zeroes. Under "ieee" handling, it is legal to translate between llvm.is.fpclass(x, fcZero) and fcmp x, 0. Under "preserve-sign" handling, it is legal to translate between llvm.is.fpclass(x, fcSubnormal|fcZero) and fcmp x, 0. I would like to compile and distribute some math library functions in a mode where it's callable from code with and without denormals enabled, which requires not changing the compares with denormals or zeroes. If an IEEE function transforms an llvm.is.fpclass call into an fcmp 0, it is no longer possible to call the function from code with denormals enabled, or write an optimization to move the function into a denormal flushing mode. For the original function, if x was a denormal, the class would evaluate to false. If the function compiled with denormal handling was converted to or called from a preserve-sign function, the fcmp now evaluates to true. This could also be of use for strictfp handling, where code may be changing the denormal mode. Alternative name could be "unknown". Replaces the old AMDGPU custom inlining logic with more conservative logic which tries to permit inlining for callees with dynamic handling and avoids inlining other mismatched modes.
- Loading branch information
Showing
26 changed files
with
1,490 additions
and
104 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,18 @@ | ||
#pragma OPENCL EXTENSION cl_khr_fp16 : enable | ||
|
||
half do_f16_stuff(half a, half b, half c) { | ||
return __builtin_fmaf16(a, b, c) + 4.0h; | ||
} | ||
|
||
float do_f32_stuff(float a, float b, float c) { | ||
return __builtin_fmaf(a, b, c) + 4.0f; | ||
} | ||
|
||
double do_f64_stuff(double a, double b, double c) { | ||
return __builtin_fma(a, b, c) + 4.0; | ||
} | ||
|
||
__attribute__((weak)) | ||
float weak_do_f32_stuff(float a, float b, float c) { | ||
return c * (a / b); | ||
} |
Oops, something went wrong.
bc37be1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alive2 complains about this test:
The input denormal is dynamic. So it can produce -0/+0/0x807fffff, depending on the dynamic denormal mode.
So I don't think this can be optimized to -0.
bc37be1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed 952fe94
bc37be1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!