Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[AArch64][CodeGen] Fix crash when fptrunc returns fp16 with +nofp attr #81724

Merged
merged 8 commits into from
Feb 22, 2024

Conversation

nasherm
Copy link
Contributor

@nasherm nasherm commented Feb 14, 2024

When performing lowering of the fptrunc opcode returning fp16 with the +nofp flag enabled we could trigger a compiler crash. This is because we had no custom lowering implemented. This patch implements a custom lowering for the case in which we need to promote an fp16 return type for fptrunc when the +nofp attr is enabled.

@llvmbot
Copy link
Collaborator

llvmbot commented Feb 14, 2024

@llvm/pr-subscribers-backend-aarch64

Author: Nashe Mncube (nasherm)

Changes

When performing lowering of the fptrunc opcode returning fp16 with the +nofp flag enabled we could trigger a compiler crash. This is because we had no custom lowering implemented. This patch implements a custom lowering for the case in which we need to promote an fp16 return type for fptrunc when the +nofp attr is enabled.


Full diff: https://github.com/llvm/llvm-project/pull/81724.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AArch64/AArch64ISelLowering.cpp (+9)
  • (added) llvm/test/CodeGen/AArch64/float16-promotion-with-nofp.ll (+21)
diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index a3b7e3128ac1a4..c58600ae386730 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -25008,6 +25008,15 @@ void AArch64TargetLowering::ReplaceNodeResults(
       Results.push_back(
           LowerToPredicatedOp(SDValue(N, 0), DAG, AArch64ISD::MULHU_PRED));
     return;
+  case ISD::FP_ROUND:{
+    if (N->getValueType(0) == MVT::f16 && !Subtarget->hasFullFP16()) {
+        // Promote fp16 result to legal type
+        SDLoc DL(N);
+        EVT NVT = getTypeToTransformTo(*DAG.getContext(), N->getValueType(0));
+        Results.push_back(DAG.getNode(ISD::FP16_TO_FP, DL, NVT, N->getOperand(0)));
+    }
+    return;
+  }
   case ISD::FP_TO_UINT:
   case ISD::FP_TO_SINT:
   case ISD::STRICT_FP_TO_SINT:
diff --git a/llvm/test/CodeGen/AArch64/float16-promotion-with-nofp.ll b/llvm/test/CodeGen/AArch64/float16-promotion-with-nofp.ll
new file mode 100644
index 00000000000000..03426579131a1d
--- /dev/null
+++ b/llvm/test/CodeGen/AArch64/float16-promotion-with-nofp.ll
@@ -0,0 +1,21 @@
+; RUN: llc -mcpu=cortex-r82 -O1 -o - %s | FileCheck %s
+
+; Source used:
+; __fp16 f2h(float a) { return a; }
+; Compiled with: clang --target=aarch64-arm-none-eabi -march=armv8-r+nofp
+
+define hidden noundef nofpclass(nan inf) half @f2h(float noundef nofpclass(nan inf) %a) local_unnamed_addr #0 {
+;CHECK:      f2h:                                    // @f2h
+;CHECK-NEXT: // %bb.0:                               // %entry
+;CHECK-NEXT:     str x30, [sp, #-16]!                // 8-byte Folded Spill
+;CHECK-NEXT:     bl  __gnu_h2f_ieee
+;CHECK-NEXT:     ldr x30, [sp], #16                  // 8-byte Folded Reload
+;CHECK-NEXT:     ret
+entry:
+  %0 = fptrunc float %a to half
+  ret half %0
+}
+
+
+attributes #0 = { mustprogress nofree norecurse nosync nounwind willreturn memory(none) "denormal-fp-math"="preserve-sign,preserve-sign" "no-infs-fp-math"="true" "no-nans-fp-math"="true" "no-signed-zeros-fp-math"="true" "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic" "target-features"="+crc,+lse,+pauth,+ras,+rcpc,+sb,+ssbs,+v8r,-complxnum,-dotprod,-fmv,-fp-armv8,-fp16fml,-fullfp16,-jsconv,-neon,-rdm" }
+

Copy link

github-actions bot commented Feb 14, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@nasherm
Copy link
Contributor Author

nasherm commented Feb 14, 2024

Investigating why new test is failing

@nasherm nasherm marked this pull request as draft February 14, 2024 11:40
@@ -25008,6 +25008,16 @@ void AArch64TargetLowering::ReplaceNodeResults(
Results.push_back(
LowerToPredicatedOp(SDValue(N, 0), DAG, AArch64ISD::MULHU_PRED));
return;
case ISD::FP_ROUND: {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it is doing something that should be handled by the generic legalizer.

Would it be better to not mark the FP_RUND as custom instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm looking into not making this custom and offloading the work to the generic legalizer but it appears to cause another crash elsewhere. Currently debugging this

Copy link
Contributor Author

@nasherm nasherm Feb 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using a generic legalizer we run into the following:

                                                                                                                     
Initial selection DAG: %bb.0 'f2h:entry'                                                                             
SelectionDAG has 11 nodes:                                                                                           
  t0: ch,glue = EntryToken                                                                                           
            t2: i32,ch = CopyFromReg t0, Register:i32 %0                                                             
          t3: f32 = bitcast t2                                                                                       
        t5: f16 = fp_round t3, TargetConstant:i64<0>                                                                 
      t6: i16 = bitcast t5                                                                                           
    t7: i32 = any_extend t6                                                                                          
  t9: ch,glue = CopyToReg t0, Register:i32 $w0, t7                                                                   
  t10: ch = AArch64ISD::RET_GLUE t9, Register:i32 $w0, t9:1                                                          
                                                                                                                     
                                                                                                                     
                                                                                                                     
Optimized lowered selection DAG: %bb.0 'f2h:entry'                                                                   
SelectionDAG has 11 nodes:                                                                                           
  t0: ch,glue = EntryToken                                                                                           
            t2: i32,ch = CopyFromReg t0, Register:i32 %0                                                             
          t3: f32 = bitcast t2                                                                                       
        t5: f16 = fp_round t3, TargetConstant:i64<0>                                                                 
      t6: i16 = bitcast t5                                                                                           
    t7: i32 = any_extend t6                                                                                          
  t9: ch,glue = CopyToReg t0, Register:i32 $w0, t7                                                                   
  t10: ch = AArch64ISD::RET_GLUE t9, Register:i32 $w0, t9:1                                                          
                                                                                                                     
                                                                                                                     
SoftenFloatResult #0: t15: f32 = INSERT_SUBREG undef:i32, t5, TargetConstant:i32<7>                                  
                                                                                                                     
LLVM ERROR: Do not know how to soften the result of this operator!

After investigating a bit it appears that INSERT_SUBREG is an MIR instruction that is used to implement any_extend. This is confusing to me as I wouldn't expect the DAG to try and legalize an MIR instruction. Am I right in thinking this is a bug?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is.. a very old bit of the code that probably would be better if it worked differently. But I'm not sure if it's easy to change.

I believe it comes from AArch64TargetLowering::LowerBITCAST? Could that function behave differently, or could the ISD::BITCAST be not marked Custom with no-fp?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My newest patch tries to just default to the generic legalizer when attempting to bitcast with the operand being an fp16. Do let me know your thoughts on this approach

llvm/test/CodeGen/AArch64/float16-promotion-with-nofp.ll Outdated Show resolved Hide resolved
llvm/test/CodeGen/AArch64/float16-promotion-with-nofp.ll Outdated Show resolved Hide resolved
llvm/test/CodeGen/AArch64/float16-promotion-with-nofp.ll Outdated Show resolved Hide resolved
@nasherm nasherm marked this pull request as ready for review February 15, 2024 13:47
@davemgreen
Copy link
Collaborator

This looks like it's on the right track. I think it should be based on hasFPARMv8, not hasFullFP16.

If I change this to only be custom when we have fp16:

-  setOperationAction(ISD::FP_ROUND, MVT::f16, Custom);  
+  if (!Subtarget->hasFPARMv8())                         
+    setOperationAction(ISD::FP_ROUND, MVT::f16, Custom);

It runs into problems with BITCAST, so If I do the same thing there for the i16/f16/bf16 types then the test compiles, but produces multiple calls that might not be expected. I didn't look too deeply into why that was happening, and there is a chance it might cause problems in other places, but we don't have a lot of tests for "no-fp" codegen so far.

@nasherm nasherm force-pushed the nashe/fp16-promotion-nofp branch 2 times, most recently from eba9ee8 to 02032f3 Compare February 21, 2024 16:44
Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. This looks OK to me with a little bit of cleanup so long as others dont disagree.
Make sure you update the strictfp_f16_abi_promote.ll test.

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp Outdated Show resolved Hide resolved
@hvdijk
Copy link
Contributor

hvdijk commented Feb 21, 2024

This effectively includes my PR #80576, which is ready to merge except for an unresolved question about where and how to document the change. If I can get some guidance on what documentation is wanted there, I can amend that, push it, so that when we then rebase this one, it shows fewer changes?

@davemgreen
Copy link
Collaborator

This effectively includes my PR #80576, which is ready to merge except for an unresolved question about where and how to document the change. If I can get some guidance on what documentation is wanted there, I can amend that, push it, so that when we then rebase this one, it shows fewer changes?

Oh I thought I had seen a patch like that, but didn't manage to find it in a quick search - only the Arm parts and the generic part, both of which I think have been committed. Thanks for the info, we can try and get that one in.

When performing lowering of the fptrunc opcode returning fp16 with
the +nofp flag enabled we could trigger a compiler crash.
This is because we had no custom lowering implemented. This patch
implements a custom lowering for the case in which we need to
promote an fp16 return type for fptrunc when the +nofp attr is enabled.

Change-Id: Ibea20a676d40fde3f25e1ade365620071f46ff2b
Change-Id: Icf71578773b4c44fe8d79edd984661ec79fe1b09
Change-Id: Ic64506bb76bcc8f059b1de9ea6041fe8c0093b9c
Change-Id: Ib18e0ceeaab18d31d3fc43daab838ea95d62c2c1
Change-Id: I92af1dc9413486d2c20d90aebf0e377b077ad428
Change-Id: Ic1b7f8774ae529680597b4e032c4a63416ac927a
Change-Id: Icdb7c3742dc30da0652701ffc6c7468b8504e416
@nasherm
Copy link
Contributor Author

nasherm commented Feb 22, 2024

Thanks. This looks OK to me with a little bit of cleanup so long as others dont disagree. Make sure you update the strictfp_f16_abi_promote.ll test.

Rebasing onto @hvdijk merged PR updates updates strictfp_f16_abi_promote.ll and reduces the size of this PR. Hopefully addressed all outstanding comments

@davemgreen
Copy link
Collaborator

Thanks. This looks good to me now. One last thing, can you uncomment the @f16_return test from strictfp_f16_abi_promote.ll, so that we test the strict-fp case too?

With that this LGTM if @ostannard doesn't disagree.

Copy link
Collaborator

@ostannard ostannard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM too, assuming the strict-fp test case @davemgreen mentioned passes.

Copy link
Collaborator

@davemgreen davemgreen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks

Change-Id: I11779e4a7e4494b45c1a166ebc46ef5887b38770
@nasherm nasherm merged commit 744c005 into llvm:main Feb 22, 2024
3 of 4 checks passed
@nasherm nasherm deleted the nashe/fp16-promotion-nofp branch February 22, 2024 19:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants