Fix a bug in implementation of Smith's algorithm used in complex div. #78330

zahiraam · 2024-01-16T19:26:07Z

This patch fixes a bug in Smith's algorithm (thanks to @andykaylor who detected it) and makes sure that last option in command line rules.

in command line rules.

rjmccall

It's bad that we don't have a IR test that breaks when you change the IR output like this. Please add one.

rjmccall · 2024-01-17T18:42:34Z

clang/lib/CodeGen/CGExprComplex.cpp

+             // '-ffast-math' is used in the command line but followed by an
+             // '-fno-cx-limited-range'.
+             (CGF.getLangOpts().FastMath &&
+              Op.FPFeatures.getComplexRange() == LangOptions::CX_Full)) {


FastMath is tautologically true here.

andykaylor · 2024-01-17T18:53:01Z

clang/lib/Driver/ToolChains/Clang.cpp

-      std::string ComplexRangeStr = RenderComplexRangeOption("limited");
-      if (!ComplexRangeStr.empty())
-        CmdArgs.push_back(Args.MakeArgString(ComplexRangeStr));
+      ComplexRangeStr = RenderComplexRangeOption("limited");


I think it would be better to move the rendering of this option to the bottom of this function (around line 3124) to avoid inserting multiple, conflicting options if more than one -f[no-]cx-* option is used.

llvmbot · 2024-01-17T20:26:07Z

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Zahira Ammarguellat (zahiraam)

Changes

This patch fixes a bug in Smith's algorithm (thanks to @andykaylor who detected it) and makes sure that last option in command line rules.

Full diff: https://github.com/llvm/llvm-project/pull/78330.diff

4 Files Affected:

(modified) clang/lib/CodeGen/CGExprComplex.cpp (+6-2)
(modified) clang/lib/Driver/ToolChains/Clang.cpp (+10-9)
(modified) clang/test/CodeGen/cx-complex-range.c (+16)
(modified) clang/test/Driver/range.c (+7)

diff --git a/clang/lib/CodeGen/CGExprComplex.cpp b/clang/lib/CodeGen/CGExprComplex.cpp
index e532794b71bdb4a..6fbd8f19eeb50a4 100644
--- a/clang/lib/CodeGen/CGExprComplex.cpp
+++ b/clang/lib/CodeGen/CGExprComplex.cpp
@@ -936,7 +936,7 @@ ComplexPairTy ComplexExprEmitter::EmitRangeReductionDiv(llvm::Value *LHSr,
   llvm::Value *RC = Builder.CreateFMul(CdD, RHSr);  // rc
   llvm::Value *DpRC = Builder.CreateFAdd(RHSi, RC); // tmp=d+rc
 
-  llvm::Value *T7 = Builder.CreateFMul(LHSr, RC);    // ar
+  llvm::Value *T7 = Builder.CreateFMul(LHSr, CdD);   // ar
   llvm::Value *T8 = Builder.CreateFAdd(T7, LHSi);    // ar+b
   llvm::Value *DSTFr = Builder.CreateFDiv(T8, DpRC); // (ar+b)/tmp
 
@@ -978,7 +978,11 @@ ComplexPairTy ComplexExprEmitter::EmitBinDiv(const BinOpInfo &Op) {
       return EmitRangeReductionDiv(LHSr, LHSi, RHSr, RHSi);
     else if (Op.FPFeatures.getComplexRange() == LangOptions::CX_Limited)
       return EmitAlgebraicDiv(LHSr, LHSi, RHSr, RHSi);
-    else if (!CGF.getLangOpts().FastMath) {
+    else if (!CGF.getLangOpts().FastMath ||
+             // '-ffast-math' is used in the command line but followed by an
+             // '-fno-cx-limited-range'.
+             (CGF.getLangOpts().FastMath &&
+              Op.FPFeatures.getComplexRange() == LangOptions::CX_Full)) {
       LHSi = OrigLHSi;
       // If we have a complex operand on the RHS and FastMath is not allowed, we
       // delegate to a libcall to handle all of the complexities and minimize
diff --git a/clang/lib/Driver/ToolChains/Clang.cpp b/clang/lib/Driver/ToolChains/Clang.cpp
index 997ec2d491d02c8..7e25347677e23e8 100644
--- a/clang/lib/Driver/ToolChains/Clang.cpp
+++ b/clang/lib/Driver/ToolChains/Clang.cpp
@@ -2765,6 +2765,7 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
   StringRef Float16ExcessPrecision = "";
   StringRef BFloat16ExcessPrecision = "";
   LangOptions::ComplexRangeKind Range = LangOptions::ComplexRangeKind::CX_Full;
+  std::string ComplexRangeStr = "";
 
   if (const Arg *A = Args.getLastArg(options::OPT_flimited_precision_EQ)) {
     CmdArgs.push_back("-mlimit-float-precision");
@@ -2780,24 +2781,24 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
     case options::OPT_fcx_limited_range: {
       EmitComplexRangeDiag(D, Range, LangOptions::ComplexRangeKind::CX_Limited);
       Range = LangOptions::ComplexRangeKind::CX_Limited;
-      std::string ComplexRangeStr = RenderComplexRangeOption("limited");
-      if (!ComplexRangeStr.empty())
-        CmdArgs.push_back(Args.MakeArgString(ComplexRangeStr));
+      ComplexRangeStr = RenderComplexRangeOption("limited");
       break;
     }
     case options::OPT_fno_cx_limited_range:
+      EmitComplexRangeDiag(D, Range, LangOptions::ComplexRangeKind::CX_Full);
       Range = LangOptions::ComplexRangeKind::CX_Full;
+      ComplexRangeStr = RenderComplexRangeOption("full");
       break;
     case options::OPT_fcx_fortran_rules: {
       EmitComplexRangeDiag(D, Range, LangOptions::ComplexRangeKind::CX_Fortran);
       Range = LangOptions::ComplexRangeKind::CX_Fortran;
-      std::string ComplexRangeStr = RenderComplexRangeOption("fortran");
-      if (!ComplexRangeStr.empty())
-        CmdArgs.push_back(Args.MakeArgString(ComplexRangeStr));
+      ComplexRangeStr = RenderComplexRangeOption("fortran");
       break;
     }
     case options::OPT_fno_cx_fortran_rules:
+      EmitComplexRangeDiag(D, Range, LangOptions::ComplexRangeKind::CX_Full);
       Range = LangOptions::ComplexRangeKind::CX_Full;
+      ComplexRangeStr = RenderComplexRangeOption("full");
       break;
     case options::OPT_ffp_model_EQ: {
       // If -ffp-model= is seen, reset to fno-fast-math
@@ -3068,9 +3069,7 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
       SeenUnsafeMathModeOption = true;
       // ffast-math enables fortran rules for complex multiplication and
       // division.
-      std::string ComplexRangeStr = RenderComplexRangeOption("limited");
-      if (!ComplexRangeStr.empty())
-        CmdArgs.push_back(Args.MakeArgString(ComplexRangeStr));
+      ComplexRangeStr = RenderComplexRangeOption("limited");
       break;
     }
     case options::OPT_fno_fast_math:
@@ -3227,6 +3226,8 @@ static void RenderFloatingPointOptions(const ToolChain &TC, const Driver &D,
                    options::OPT_fstrict_float_cast_overflow, false))
     CmdArgs.push_back("-fno-strict-float-cast-overflow");
 
+  if (!ComplexRangeStr.empty())
+    CmdArgs.push_back(Args.MakeArgString(ComplexRangeStr));
   if (Args.hasArg(options::OPT_fcx_limited_range))
     CmdArgs.push_back("-fcx-limited-range");
   if (Args.hasArg(options::OPT_fcx_fortran_rules))
diff --git a/clang/test/CodeGen/cx-complex-range.c b/clang/test/CodeGen/cx-complex-range.c
index 8368fa611335cca..2d8507c710f2021 100644
--- a/clang/test/CodeGen/cx-complex-range.c
+++ b/clang/test/CodeGen/cx-complex-range.c
@@ -15,9 +15,25 @@
 // RUN: -ffast-math -complex-range=limited -emit-llvm -o - %s \
 // RUN: | FileCheck %s --check-prefix=LMTD-FAST
 
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu \
+// RUN: -ffast-math -complex-range=full -emit-llvm -o - %s \
+// RUN: | FileCheck %s --check-prefix=FULL
+
 // RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown \
 // RUN: -fno-cx-fortran-rules -o - | FileCheck %s --check-prefix=FULL
 
+// RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown \
+// RUN: -fcx-limited-range -fno-cx-limited-range -o - \
+// RUN: | FileCheck %s --check-prefix=FULL
+
+// RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown \
+// RUN: -fno-cx-limited-range -fcx-limited-range -o - \
+// RUN: | FileCheck %s --check-prefix=FULL
+
+// RUN: %clang_cc1 %s -O0 -emit-llvm -triple x86_64-unknown-unknown \
+// RUN: -fno-cx-fortran-rules -fcx-fortran-rules -o - \
+// RUN: | FileCheck %s --check-prefix=FULL
+
 _Complex float div(_Complex float a, _Complex float b) {
   // LABEL: define {{.*}} @div(
   // FULL:  call {{.*}} @__divsc3
diff --git a/clang/test/Driver/range.c b/clang/test/Driver/range.c
index 8d456a997d6967e..045d9c7d3d802ae 100644
--- a/clang/test/Driver/range.c
+++ b/clang/test/Driver/range.c
@@ -6,6 +6,9 @@
 // RUN: %clang -### -target x86_64 -fno-cx-limited-range -c %s 2>&1 \
 // RUN:   | FileCheck %s
 
+// RUN: %clang -### -target x86_64 -fcx-limited-range -fno-cx-limited-range \
+// RUN: -c %s 2>&1 | FileCheck --check-prefix=FULL %s
+
 // RUN: %clang -### -target x86_64 -fcx-fortran-rules -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=FRTRN %s
 
@@ -29,7 +32,11 @@
 // RUN: %clang -### -target x86_64 -fcx-limited-range -ffast-math -c %s 2>&1 \
 // RUN:   | FileCheck --check-prefix=LMTD %s
 
+// RUN: %clang -### -target x86_64 -ffast-math -fno-cx-limited-range -c %s 2>&1 \
+// RUN:   | FileCheck --check-prefix=FULL %s
+
 // LMTD: -complex-range=limited
+// FULL: -complex-range=full
 // LMTD-NOT: -complex-range=fortran
 // CHECK-NOT: -complex-range=limited
 // FRTRN: -complex-range=fortran

zahiraam · 2024-01-18T15:45:25Z

It's bad that we don't have a IR test that breaks when you change the IR output like this. Please add one.

Done.

rjmccall

LGTM as far as my requests; please wait for approval from the other reviewers.

This is unrelated, but I wonder if we should proactively outline this sequence; it's quite long.

zahiraam · 2024-01-18T18:35:35Z

This is unrelated, but I wonder if we should proactively outline this sequence; it's quite long.

Not sure what you mean.

rjmccall · 2024-01-18T19:00:19Z

This is unrelated, but I wonder if we should proactively outline this sequence; it's quite long.

Not sure what you mean.

I mean that we could lazily emit a helper function like __clang_smiths_division_double and call it instead of emitting like twenty instructions inline.

To be clear, I'm not suggesting you need to do this, and certainly not in this patch.

andykaylor · 2024-01-18T19:34:17Z

I mean that we could lazily emit a helper function like __clang_smiths_division_double and call it instead of emitting like twenty instructions inline.

That's a reasonable suggestion. We may also want to add the ability to emit a helper function to handle the full range instead of calling the compiler-rt function when -ffreestanding has been used.

Are you aware of the PR that @jcranmer-intel posted to add intrinsics for complex multiplication and division (#68742)? That would allow us to move this problem out of the front end, which seems good.

rjmccall · 2024-01-18T20:11:48Z

If LLVM wants to provide intrinsics that cover patterns that the frontend wants to emit, that'd be great. The frontend will still have to make decisions about e.g. whether to request Smith's algorithm, of course.

zahiraam · 2024-01-19T13:36:09Z

@rjmccall thanks for the review and clarification. @andykaylor do you have any other comments about this?

zahiraam · 2024-01-19T13:36:37Z

@AaronBallman any comments about this? Thanks.

AaronBallman

I think the current changes are reasonable to land for Clang 18 so we get the fix out to users. I don't have strong opinions on pushing this down into LLVM or outlining the function like John suggested; either approach is fine by me. I think outlining the function might make sense but I don't think it's a requirement either (perhaps add a FIXME comment suggesting it?).

AaronBallman · 2024-01-19T13:49:53Z

clang/lib/Driver/ToolChains/Clang.cpp

+    ComplexRangeStr += "fortran";
+    break;
+  case LangOptions::ComplexRangeKind::CX_None:
+    ComplexRangeStr = "";


Is it valid to pass -complex-range= without anything after the equals sign? Based on how this is called, I think this case should be an assertion instead, WDYT?

Oh! you are right. We should actually never land here because I am calling RenderComplexRangeOption with this condition:
if (Range != LangOptions::ComplexRangeKind::CX_None)
ComplexRangeStr = RenderComplexRangeOption(Range);
I guess an llvm-unreachable" should be added here?

AaronBallman

LGTM!

andykaylor

lgtm

Fixed a bug in Smith's algorithm and made sure last option

8f89175

in command line rules.

zahiraam changed the title ~~Fixed a bug in Smith's algorithm and made sure last option~~ Fixed a bug in Smith's algorithm Jan 16, 2024

zahiraam changed the title ~~Fixed a bug in Smith's algorithm~~ Fix a bug in Smith's algorithm used in complex div/mul. Jan 16, 2024

zahiraam added 4 commits January 16, 2024 14:37

Merge remote-tracking branch 'origin/main' into CxRange

d45877f

Merge branch 'main' into CxRange

f063010

Testing.

989659a

Deleted test.c added by mistake.

aefa661

zahiraam requested review from andykaylor, rjmccall and AaronBallman January 17, 2024 13:08

rjmccall reviewed Jan 17, 2024

View reviewed changes

andykaylor reviewed Jan 17, 2024

View reviewed changes

zahiraam marked this pull request as ready for review January 17, 2024 20:25

llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:codegen IR generation bugs: mangling, exceptions, etc. labels Jan 17, 2024

Addressed review comments.

f6918d4

llvmbot added the clang:frontend Language frontend issues, e.g. anything involving "Sema" label Jan 18, 2024

zahiraam requested review from rjmccall and andykaylor January 18, 2024 15:45

zahiraam changed the title ~~Fix a bug in Smith's algorithm used in complex div/mul.~~ Fix a bug in Smith's algorithm used in complex divl. Jan 18, 2024

zahiraam changed the title ~~Fix a bug in Smith's algorithm used in complex divl.~~ Fix a bug in Smith's algorithm used in complex div. Jan 18, 2024

rjmccall approved these changes Jan 18, 2024

View reviewed changes

AaronBallman reviewed Jan 19, 2024

View reviewed changes

Added assert in RenderComplexRangeOption.

828570d

zahiraam requested a review from AaronBallman January 19, 2024 20:27

AaronBallman approved these changes Jan 22, 2024

View reviewed changes

Added a FIXME suggested in the review.

9d910a9

andykaylor approved these changes Jan 22, 2024

View reviewed changes

zahiraam changed the title ~~Fix a bug in Smith's algorithm used in complex div.~~ Fix a bug in implementation of Smith's algorithm used in complex div. Jan 22, 2024

zahiraam merged commit 364a5b5 into llvm:main Jan 22, 2024

shiltian added a commit that referenced this pull request Jan 22, 2024

[NFC][Clang] Fix compile warning caused by #78330

c532033

fmayer mentioned this pull request Feb 7, 2025

[clang] Assert the enum FPOpts and LangOpts fit into the storage #126166

Merged

Fix a bug in implementation of Smith's algorithm used in complex div. #78330

Fix a bug in implementation of Smith's algorithm used in complex div. #78330

Uh oh!

Conversation

zahiraam commented Jan 16, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rjmccall left a comment

Choose a reason for hiding this comment

Uh oh!

rjmccall Jan 17, 2024

Choose a reason for hiding this comment

Uh oh!

zahiraam Jan 18, 2024

Choose a reason for hiding this comment

Uh oh!

andykaylor Jan 17, 2024

Choose a reason for hiding this comment

Uh oh!

zahiraam Jan 18, 2024

Choose a reason for hiding this comment

Uh oh!

llvmbot commented Jan 17, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zahiraam commented Jan 18, 2024

Uh oh!

rjmccall left a comment

Choose a reason for hiding this comment

Uh oh!

zahiraam commented Jan 18, 2024

Uh oh!

rjmccall commented Jan 18, 2024

Uh oh!

andykaylor commented Jan 18, 2024

Uh oh!

rjmccall commented Jan 18, 2024

Uh oh!

zahiraam commented Jan 19, 2024

Uh oh!

zahiraam commented Jan 19, 2024

Uh oh!

AaronBallman left a comment

Choose a reason for hiding this comment

Uh oh!

AaronBallman Jan 19, 2024

Choose a reason for hiding this comment

Uh oh!

zahiraam Jan 19, 2024

Choose a reason for hiding this comment

Uh oh!

AaronBallman left a comment

Choose a reason for hiding this comment

Uh oh!

andykaylor left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

zahiraam commented Jan 16, 2024 •

edited

Loading

llvmbot commented Jan 17, 2024 •

edited

Loading