Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix a bug in implementation of Smith's algorithm used in complex div. #78330

Merged
merged 8 commits into from
Jan 22, 2024

Conversation

zahiraam
Copy link
Contributor

@zahiraam zahiraam commented Jan 16, 2024

This patch fixes a bug in Smith's algorithm (thanks to @andykaylor who detected it) and makes sure that last option in command line rules.

@zahiraam zahiraam changed the title Fixed a bug in Smith's algorithm and made sure last option Fixed a bug in Smith's algorithm Jan 16, 2024
@zahiraam zahiraam changed the title Fixed a bug in Smith's algorithm Fix a bug in Smith's algorithm used in complex div/mul. Jan 16, 2024
Copy link
Contributor

@rjmccall rjmccall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's bad that we don't have a IR test that breaks when you change the IR output like this. Please add one.

// '-ffast-math' is used in the command line but followed by an
// '-fno-cx-limited-range'.
(CGF.getLangOpts().FastMath &&
Op.FPFeatures.getComplexRange() == LangOptions::CX_Full)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FastMath is tautologically true here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

std::string ComplexRangeStr = RenderComplexRangeOption("limited");
if (!ComplexRangeStr.empty())
CmdArgs.push_back(Args.MakeArgString(ComplexRangeStr));
ComplexRangeStr = RenderComplexRangeOption("limited");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to move the rendering of this option to the bottom of this function (around line 3124) to avoid inserting multiple, conflicting options if more than one -f[no-]cx-* option is used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@zahiraam zahiraam marked this pull request as ready for review January 17, 2024 20:25
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:codegen labels Jan 17, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Jan 17, 2024

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Zahira Ammarguellat (zahiraam)

Changes

This patch fixes a bug in Smith's algorithm (thanks to @andykaylor who detected it) and makes sure that last option in command line rules.


Full diff: https://github.com/llvm/llvm-project/pull/78330.diff

4 Files Affected:

  • (modified) clang/lib/CodeGen/CGExprComplex.cpp (+6-2)
  • (modified) clang/lib/Driver/ToolChains/Clang.cpp (+10-9)
  • (modified) clang/test/CodeGen/cx-complex-range.c (+16)
  • (modified) clang/test/Driver/range.c (+7)
diff --git a/clang/lib/CodeGen/CGExprComplex.cpp b/clang/lib/CodeGen/CGExprComplex.cpp
index e532794b71bdb4a..6fbd8f19eeb50a4 100644
--- a/clang/lib/CodeGen/CGExprComplex.cpp
+++ b/clang/lib/CodeGen/CGExprComplex.cpp
@@ -936,7 +936,7 @@ ComplexPairTy ComplexExprEmitter::EmitRangeReductionDiv(llvm::Value *LHSr,
   llvm::Value *RC = Builder.CreateFMul(CdD, RHSr);  // rc
   llvm::Value *DpRC = Builder.CreateFAdd(RHSi, RC); // tmp=d+rc
 
-  llvm::Value *T7 = Builder.CreateFMul(LHSr, RC);    // ar
+  llvm::Value *T7 = Builder.CreateFMul(LHSr, CdD);   // ar
   llvm::Value *T8 = Builder.CreateFAdd(T7, LHSi);    // ar+b
   llvm::Value *DSTFr = Builder.CreateFDiv(T8, DpRC); // (ar+b)/tmp
 
@@ -978,7 +978,11 @@ ComplexPairTy ComplexExprEmitter::EmitBinDiv(const BinOpInfo &Op) {
       return EmitRangeReductionDiv(LHSr, LHSi, RHSr, RHSi);
     else if (Op.FPFeatures.getComplexRange() == LangOptions::CX_Limited)
       return EmitAlgebraicDiv(LHSr, LHSi, RHSr, RHSi);
-    else if (!CGF.getLangOpts().FastMath) {
+    else if (!CGF.getLangOpts().FastMath ||
+             // '-ffast-math' is used in the command line but followed by an
+             // '-fno-cx-limited-range'.
+             (CGF.getLangOpts().FastMath &&
+              Op.FPFeatures.getComplexRange() == LangOptions::CX_Full)) {
       LHSi = OrigLHSi;
       // If we have a complex operand on the RHS and FastMath is not allowed, we
       // delegate to a libcall to handle all of the complexities and minimize
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 997ec2d491d02c8..7e25347677e23e8 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -2765,6 +2765,7 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
   StringRef Float16ExcessPrecision = "";
   StringRef BFloat16ExcessPrecision = "";
   LangOptions::ComplexRangeKind Range = LangOptions::ComplexRangeKind::CX_Full;
+  std::string ComplexRangeStr = "";
 
   if (const Arg *A = Args.getLastArg(options::OPT_flimited_precision_EQ)) {
     CmdArgs.push_back("-mlimit-float-precision");
@@ -2780,24 +2781,24 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
     case options::OPT_fcx_limited_range: {
       EmitComplexRangeDiag(D, Range, LangOptions::ComplexRangeKind::CX_Limited);
       Range = LangOptions::ComplexRangeKind::CX_Limited;
-      std::string ComplexRangeStr = RenderComplexRangeOption("limited");
-      if (!ComplexRangeStr.empty())
-        CmdArgs.push_back(Args.MakeArgString(ComplexRangeStr));
+      ComplexRangeStr = RenderComplexRangeOption("limited");
       break;
     }
     case options::OPT_fno_cx_limited_range:
+      EmitComplexRangeDiag(D, Range, LangOptions::ComplexRangeKind::CX_Full);
       Range = LangOptions::ComplexRangeKind::CX_Full;
+      ComplexRangeStr = RenderComplexRangeOption("full");
       break;
     case options::OPT_fcx_fortran_rules: {
       EmitComplexRangeDiag(D, Range, LangOptions::ComplexRangeKind::CX_Fortran);
       Range = LangOptions::ComplexRangeKind::CX_Fortran;
-      std::string ComplexRangeStr = RenderComplexRangeOption("fortran");
-      if (!ComplexRangeStr.empty())
-        CmdArgs.push_back(Args.MakeArgString(ComplexRangeStr));
+      ComplexRangeStr = RenderComplexRangeOption("fortran");
       break;
     }
     case options::OPT_fno_cx_fortran_rules:
+      EmitComplexRangeDiag(D, Range, LangOptions::ComplexRangeKind::CX_Full);
       Range = LangOptions::ComplexRangeKind::CX_Full;
+      ComplexRangeStr = RenderComplexRangeOption("full");
       break;
     case options::OPT_ffp_model_EQ: {
       // If -ffp-model= is seen, reset to fno-fast-math
@@ -3068,9 +3069,7 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
       SeenUnsafeMathModeOption = true;
       // ffast-math enables fortran rules for complex multiplication and
       // division.
-      std::string ComplexRangeStr = RenderComplexRangeOption("limited");
-      if (!ComplexRangeStr.empty())
-        CmdArgs.push_back(Args.MakeArgString(ComplexRangeStr));
+      ComplexRangeStr = RenderComplexRangeOption("limited");
       break;
     }
     case options::OPT_fno_fast_math:
@@ -3227,6 +3226,8 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
                    options::OPT_fstrict_float_cast_overflow, false))
     CmdArgs.push_back("-fno-strict-float-cast-overflow");
 
+  if (!ComplexRangeStr.empty())
+    CmdArgs.push_back(Args.MakeArgString(ComplexRangeStr));
   if (Args.hasArg(options::OPT_fcx_limited_range))
     CmdArgs.push_back("-fcx-limited-range");
   if (Args.hasArg(options::OPT_fcx_fortran_rules))
diff --git a/clang/test/CodeGen/cx-complex-range.c b/clang/test/CodeGen/cx-complex-range.c
index 8368fa611335cca..2d8507c710f2021 100644
--- a/clang/test/CodeGen/cx-complex-range.c
+++ b/clang/test/CodeGen/cx-complex-range.c
@@ -15,9 +15,25 @@
 // RUN: -ffast-math -complex-range=limited -emit-llvm -o - %s \
 // RUN: | FileCheck %s --check-prefix=LMTD-FAST
 
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu \
+// RUN: -ffast-math -complex-range=full -emit-llvm -o - %s \
+// RUN: | FileCheck %s --check-prefix=FULL
+
 // RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown \
 // RUN: -fno-cx-fortran-rules -o - | FileCheck %s --check-prefix=FULL
 
+// RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown \
+// RUN: -fcx-limited-range -fno-cx-limited-range -o - \
+// RUN: | FileCheck %s --check-prefix=FULL
+
+// RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown \
+// RUN: -fno-cx-limited-range -fcx-limited-range -o - \
+// RUN: | FileCheck %s --check-prefix=FULL
+
+// RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown \
+// RUN: -fno-cx-fortran-rules -fcx-fortran-rules -o - \
+// RUN: | FileCheck %s --check-prefix=FULL
+
 _Complex float div(_Complex float a, _Complex float b) {
   // LABEL: define {{.*}} @div(
   // FULL:  call {{.*}} @__divsc3
diff --git a/clang/test/Driver/range.c b/clang/test/Driver/range.c
index 8d456a997d6967e..045d9c7d3d802ae 100644
--- a/clang/test/Driver/range.c
+++ b/clang/test/Driver/range.c
@@ -6,6 +6,9 @@
 // RUN: %clang -### -target x86_64 -fno-cx-limited-range -c %s 2>&1 \
 // RUN:   | FileCheck %s
 
+// RUN: %clang -### -target x86_64 -fcx-limited-range -fno-cx-limited-range \
+// RUN: -c %s 2>&1 | FileCheck --check-prefix=FULL %s
+
 // RUN: %clang -### -target x86_64 -fcx-fortran-rules -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=FRTRN %s
 
@@ -29,7 +32,11 @@
 // RUN: %clang -### -target x86_64 -fcx-limited-range -ffast-math -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=LMTD %s
 
+// RUN: %clang -### -target x86_64 -ffast-math -fno-cx-limited-range -c %s 2>&1 \
+// RUN:   | FileCheck --check-prefix=FULL %s
+
 // LMTD: -complex-range=limited
+// FULL: -complex-range=full
 // LMTD-NOT: -complex-range=fortran
 // CHECK-NOT: -complex-range=limited
 // FRTRN: -complex-range=fortran

@llvmbot llvmbot added the clang:frontend Language frontend issues, e.g. anything involving "Sema" label Jan 18, 2024
@zahiraam
Copy link
Contributor Author

It's bad that we don't have a IR test that breaks when you change the IR output like this. Please add one.

Done.

@zahiraam zahiraam changed the title Fix a bug in Smith's algorithm used in complex div/mul. Fix a bug in Smith's algorithm used in complex divl. Jan 18, 2024
@zahiraam zahiraam changed the title Fix a bug in Smith's algorithm used in complex divl. Fix a bug in Smith's algorithm used in complex div. Jan 18, 2024
Copy link
Contributor

@rjmccall rjmccall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as far as my requests; please wait for approval from the other reviewers.

This is unrelated, but I wonder if we should proactively outline this sequence; it's quite long.

@zahiraam
Copy link
Contributor Author

This is unrelated, but I wonder if we should proactively outline this sequence; it's quite long.

Not sure what you mean.

@rjmccall
Copy link
Contributor

This is unrelated, but I wonder if we should proactively outline this sequence; it's quite long.

Not sure what you mean.

I mean that we could lazily emit a helper function like __clang_smiths_division_double and call it instead of emitting like twenty instructions inline.

To be clear, I'm not suggesting you need to do this, and certainly not in this patch.

@andykaylor
Copy link
Contributor

I mean that we could lazily emit a helper function like __clang_smiths_division_double and call it instead of emitting like twenty instructions inline.

That's a reasonable suggestion. We may also want to add the ability to emit a helper function to handle the full range instead of calling the compiler-rt function when -ffreestanding has been used.

Are you aware of the PR that @jcranmer-intel posted to add intrinsics for complex multiplication and division (#68742)? That would allow us to move this problem out of the front end, which seems good.

@rjmccall
Copy link
Contributor

If LLVM wants to provide intrinsics that cover patterns that the frontend wants to emit, that'd be great. The frontend will still have to make decisions about e.g. whether to request Smith's algorithm, of course.

@zahiraam
Copy link
Contributor Author

@rjmccall thanks for the review and clarification. @andykaylor do you have any other comments about this?

@zahiraam
Copy link
Contributor Author

@AaronBallman any comments about this? Thanks.

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the current changes are reasonable to land for Clang 18 so we get the fix out to users. I don't have strong opinions on pushing this down into LLVM or outlining the function like John suggested; either approach is fine by me. I think outlining the function might make sense but I don't think it's a requirement either (perhaps add a FIXME comment suggesting it?).

ComplexRangeStr += "fortran";
break;
case LangOptions::ComplexRangeKind::CX_None:
ComplexRangeStr = "";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it valid to pass -complex-range= without anything after the equals sign? Based on how this is called, I think this case should be an assertion instead, WDYT?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! you are right. We should actually never land here because I am calling RenderComplexRangeOption with this condition:
if (Range != LangOptions::ComplexRangeKind::CX_None)
ComplexRangeStr = RenderComplexRangeOption(Range);
I guess an llvm-unreachable" should be added here?

Copy link
Collaborator

@AaronBallman AaronBallman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Copy link
Contributor

@andykaylor andykaylor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@zahiraam zahiraam changed the title Fix a bug in Smith's algorithm used in complex div. Fix a bug in implementation of Smith's algorithm used in complex div. Jan 22, 2024
@zahiraam zahiraam merged commit 364a5b5 into llvm:main Jan 22, 2024
4 checks passed
shiltian added a commit that referenced this pull request Jan 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants