[MLIR] Add sincos intrinsic to LLVM dialect #160561

ashermancinelli · 2025-09-24T17:01:05Z

Adds llvm.intr.sincos operation using LLVM_TwoResultIntrOp in the mold of the frexp intrinsic. I don't see many intrinsics with verifiers, so please let me know if the verifier and/or the invalid MLIR tests don't look right. Thanks in advance!

llvmbot · 2025-09-24T17:01:29Z

@llvm/pr-subscribers-mlir-llvm

@llvm/pr-subscribers-mlir

Author: Asher Mancinelli (ashermancinelli)

Changes

Adds llvm.intr.sincos operation using LLVM_TwoResultIntrOp in the mold of the frexp intrinsic. I don't see many intrinsics with verifiers, so please let me know if the verifier and/or the invalid MLIR tests don't look right.

Full diff: https://github.com/llvm/llvm-project/pull/160561.diff

4 Files Affected:

(modified) mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td (+9)
(modified) mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp (+18)
(modified) mlir/test/Dialect/LLVMIR/invalid.mlir (+21)
(modified) mlir/test/Target/LLVMIR/llvmir-intrinsics.mlir (+7)

diff --git a/mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td b/mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td
index e12b8ac84ba23..398388bd720be 100644
--- a/mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td
+++ b/mlir/include/mlir/Dialect/LLVMIR/LLVMIntrinsicOps.td
@@ -184,6 +184,15 @@ def LLVM_UMinOp : LLVM_BinarySameArgsIntrOpI<"umin">;
 def LLVM_SinOp : LLVM_UnaryIntrOpF<"sin">;
 def LLVM_CosOp : LLVM_UnaryIntrOpF<"cos">;
 def LLVM_TanOp : LLVM_UnaryIntrOpF<"tan">;
+def LLVM_SincosOp : LLVM_TwoResultIntrOp<"sincos", [], [0],
+  [Pure], /*requiresFastmath=*/1> {
+  let arguments =
+      (ins LLVM_ScalarOrVectorOf<LLVM_AnyFloat>:$val,
+      DefaultValuedAttr<LLVM_FastmathFlagsAttr, "{}">:$fastmathFlags);
+  let assemblyFormat = "`(` operands `)` attr-dict `:` "
+      "functional-type(operands, results)";
+  let hasVerifier = 1;
+}
 
 def LLVM_ASinOp : LLVM_UnaryIntrOpF<"asin">;
 def LLVM_ACosOp : LLVM_UnaryIntrOpF<"acos">;
diff --git a/mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp b/mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
index a3d5d25b96ec2..0d5e9e87070df 100644
--- a/mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
+++ b/mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
@@ -4085,6 +4085,24 @@ printIndirectBrOpSucessors(OpAsmPrinter &p, IndirectBrOp op, Type flagType,
   p << "]";
 }
 
+//===----------------------------------------------------------------------===//
+// SincosOp (intrinsic)
+//===----------------------------------------------------------------------===//
+
+LogicalResult LLVM::SincosOp::verify() {
+  auto operandType = getOperand().getType();
+  auto resultType = getResult().getType();
+  auto resultStructType =
+      mlir::dyn_cast<mlir::LLVM::LLVMStructType>(resultType);
+  if (!resultStructType || resultStructType.getBody().size() != 2 ||
+      resultStructType.getBody()[0] != operandType ||
+      resultStructType.getBody()[1] != operandType) {
+    return emitOpError("expected result type to be an homogeneous struct with "
+                       "two elements matching the operand type");
+  }
+  return success();
+}
+
 //===----------------------------------------------------------------------===//
 // AssumeOp (intrinsic)
 //===----------------------------------------------------------------------===//
diff --git a/mlir/test/Dialect/LLVMIR/invalid.mlir b/mlir/test/Dialect/LLVMIR/invalid.mlir
index 1adecf264e8f6..627abd0665d8c 100644
--- a/mlir/test/Dialect/LLVMIR/invalid.mlir
+++ b/mlir/test/Dialect/LLVMIR/invalid.mlir
@@ -2014,3 +2014,24 @@ llvm.mlir.alias external @alias_resolver : !llvm.ptr {
 }
 // expected-error@+1 {{'llvm.mlir.ifunc' op must have a function resolver}}
 llvm.mlir.ifunc external @foo : !llvm.func<void (ptr, i32)>, !llvm.ptr @alias_resolver {dso_local}
+
+// -----
+
+llvm.func @invalid_sincos_nonhomogeneous_return_type(%f: f32) -> () {
+  // expected-error@+1 {{op expected result type to be an homogeneous struct with two elements matching the operand type}}
+  llvm.intr.sincos(%f) : (f32) -> !llvm.struct<(f32, f64)>
+}
+
+// -----
+
+llvm.func @invalid_sincos_non_struct_return_type(%f: f32) -> () {
+  // expected-error@+1 {{op expected result type to be an homogeneous struct with two elements matching the operand type}}
+  llvm.intr.sincos(%f) : (f32) -> f32
+}
+
+// -----
+
+llvm.func @invalid_sincos_gt_2_element_struct_return_type(%f: f32) -> () {
+  // expected-error@+1 {{op expected result type to be an homogeneous struct with two elements matching the operand type}}
+  llvm.intr.sincos(%f) : (f32) -> !llvm.struct<(f32, f32, f32)>
+}
diff --git a/mlir/test/Target/LLVMIR/llvmir-intrinsics.mlir b/mlir/test/Target/LLVMIR/llvmir-intrinsics.mlir
index cf3e129879d09..d63584e5e03ab 100644
--- a/mlir/test/Target/LLVMIR/llvmir-intrinsics.mlir
+++ b/mlir/test/Target/LLVMIR/llvmir-intrinsics.mlir
@@ -146,6 +146,11 @@ llvm.func @trig_test(%arg0: f32, %arg1: vector<8xf32>) {
   llvm.intr.tan(%arg0) : (f32) -> f32
   // CHECK: call <8 x float> @llvm.tan.v8f32
   llvm.intr.tan(%arg1) : (vector<8xf32>) -> vector<8xf32>
+
+  // CHECK: call { float, float } @llvm.sincos.f32
+  llvm.intr.sincos(%arg0) : (f32) -> !llvm.struct<(f32, f32)>
+  // CHECK: call { <8 x float>, <8 x float> } @llvm.sincos.v8f32
+  llvm.intr.sincos(%arg1) : (vector<8xf32>) -> !llvm.struct<(vector<8xf32>, vector<8xf32>)>
   llvm.return
 }
 
@@ -1302,6 +1307,8 @@ llvm.func @experimental_constrained_fpext(%s: f32, %v: vector<4xf32>) {
 // CHECK-DAG: declare <8 x float> @llvm.ceil.v8f32(<8 x float>) #0
 // CHECK-DAG: declare float @llvm.cos.f32(float)
 // CHECK-DAG: declare <8 x float> @llvm.cos.v8f32(<8 x float>) #0
+// CHECK-DAG: declare { float, float } @llvm.sincos.f32(float)
+// CHECK-DAG: declare { <8 x float>, <8 x float> } @llvm.sincos.v8f32(<8 x float>) #0
 // CHECK-DAG: declare float @llvm.copysign.f32(float, float)
 // CHECK-DAG: declare float @llvm.rint.f32(float)
 // CHECK-DAG: declare double @llvm.rint.f64(double)
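
For readers without an MLIR build at hand, the rule enforced by the new verifier can be sketched as standalone C++. This is only an illustration: `ToyResultType` and `isValidSincosResult` are hypothetical names that appear nowhere in MLIR, and types are compared by their spelling rather than as real `mlir::Type` values.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a result type (not an MLIR class): either a plain
// type such as "f32", or an LLVM-style struct whose body lists element types.
struct ToyResultType {
  bool isStruct;
  std::vector<std::string> body;
};

// Mirrors the condition in SincosOp::verify(): the result must be a struct
// with exactly two elements, each equal to the operand type.
bool isValidSincosResult(const std::string &operandType,
                         const ToyResultType &result) {
  return result.isStruct && result.body.size() == 2 &&
         result.body[0] == operandType && result.body[1] == operandType;
}
```

Under these assumptions, `{true, {"f32", "f32"}}` is accepted for an `f32` operand, while the three shapes exercised in `invalid.mlir` (mixed element types, a non-struct result, and a three-element struct) are rejected.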

clementval

LGTM. Thanks @ashermancinelli for adding this missing intrinsic.

vzakhari

Thank you!

vzakhari · 2025-09-24T17:40:32Z

The verifier looks good to me.

krzysz00

lgtm here too


ashermancinelli · 2025-09-25T14:29:41Z

Thanks for the reviews!

llvm-ci · 2025-09-25T14:58:24Z

LLVM Buildbot has detected a new failure on builder mlir-nvidia-gcc7 running on mlir-nvidia while building mlir at step 7 "test-build-check-mlir-build-only-check-mlir".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/116/builds/18872

Here is the relevant piece of the build log for the reference

Step 7 (test-build-check-mlir-build-only-check-mlir) failure: test (failure)
******************** TEST 'MLIR :: Integration/GPU/CUDA/async.mlir' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
# RUN: at line 1
/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-kernel-outlining  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -pass-pipeline='builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary="format=fatbin"  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-runner    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_cuda_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_async_runtime.so    --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_runner_utils.so    --entry-point-result=void -O0  | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-kernel-outlining
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt '-pass-pipeline=builtin.module(gpu.module(strip-debuginfo,convert-gpu-to-nvvm),nvvm-attach-target)'
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -gpu-async-region -gpu-to-llvm -reconcile-unrealized-casts -gpu-module-to-binary=format=fatbin
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -async-to-async-runtime -async-runtime-ref-counting
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-opt -convert-async-to-llvm -convert-func-to-llvm -convert-arith-to-llvm -convert-cf-to-llvm -reconcile-unrealized-casts
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/mlir-runner --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_cuda_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_async_runtime.so --shared-libs=/vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/lib/libmlir_runner_utils.so --entry-point-result=void -O0
# .---command stderr------------
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuStreamWaitEvent(stream, event, 0)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventSynchronize(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# | 'cuEventDestroy(event)' failed with 'CUDA_ERROR_CONTEXT_IS_DESTROYED'
# `-----------------------------
# executed command: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.obj/bin/FileCheck /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# .---command stderr------------
# | /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir:68:12: error: CHECK: expected string not found in input
# |  // CHECK: [84, 84]
# |            ^
# | <stdin>:1:1: note: scanning from here
# | Unranked Memref base@ = 0x5955387bd090 rank = 1 offset = 0 sizes = [2] strides = [1] data = 
# | ^
# | <stdin>:2:1: note: possible intended match here
# | [42, 42]
# | ^
# | 
# | Input file: <stdin>
# | Check file: /vol/worker/mlir-nvidia/mlir-nvidia-gcc7/llvm.src/mlir/test/Integration/GPU/CUDA/async.mlir
# | 
# | -dump-input=help explains the following input dump.
# | 
# | Input was:
# | <<<<<<
# |             1: Unranked Memref base@ = 0x5955387bd090 rank = 1 offset = 0 sizes = [2] strides = [1] data =  
# | check:68'0     X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
# |             2: [42, 42] 
# | check:68'0     ~~~~~~~~~
# | check:68'1     ?         possible intended match
...

ashermancinelli · 2025-09-25T15:01:36Z

My change is the only one in the blame-list, but the test appears to be unrelated. I'll be watching the subsequent builds.

ashermancinelli · 2025-09-25T15:40:47Z

The subsequent build passed tests.

Now that `sincos` is a supported intrinsic in the LLVM dialect (llvm#160561) we are able to add the corresponding operation in the math dialect. We have several benchmarks that use sine and cosine in hot-loops, and saving some calculations by performing sine and cosine together can benefit performance. We would like to have a way to represent sincos in the math dialect.

Two parts I'm unsure about:

* What do we think of the assembly format? `math.sincos %floatlike : f32 -> f32, f32` With a custom assembly format we could omit the `->` and everything after, but I couldn't get the ODS to do that. Open to suggestions.
* I implement `getShapeForUnroll()` here, but where is the best place to test the unroller interfaces? I'll keep poking around after sending this out for review.

Now that `sincos` is a supported intrinsic in the LLVM dialect (#160561) we are able to add the corresponding operation in the math dialect and add conversion patterns for LLVM and NVVM. We have several benchmarks that use sine and cosine in hot-loops, and saving some calculations by performing them together can benefit performance. We would like to have a way to represent sincos in the math dialect.

We see performance improvements from using sincos to reuse calculations in hot loops that compute sin() and cos() on the same operand. Add a pass to identify sin() and cos() calls in the same block with the same operand and fast-math flags, and fuse them into a sincos op.

Follow-up to:

* llvm#160561
* llvm#160772
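
The matching rule described above (same block, same operand, same fast-math flags) can be sketched on a toy IR as follows. This is a minimal illustration and not the pass from the follow-up PR: `Kind`, `ToyOp`, and `fuseSinCos` are hypothetical names, operands are compared by name, and result uses and dominance are ignored.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy IR (not MLIR): an op has a kind, a single operand name,
// and a bitmask standing in for its fast-math flags.
enum class Kind { Sin, Cos, Sincos, Other };

struct ToyOp {
  Kind kind;
  std::string operand;
  unsigned fastmath;
};

// Scans one block. Each sin (or cos) is paired with the first later cos (or
// sin) that has the same operand and flags; the pair becomes a single sincos
// placed where the earlier op was. Unpaired ops are copied through unchanged.
std::vector<ToyOp> fuseSinCos(const std::vector<ToyOp> &block) {
  std::vector<ToyOp> out;
  std::vector<bool> consumed(block.size(), false);
  for (std::size_t i = 0; i < block.size(); ++i) {
    if (consumed[i])
      continue;
    const ToyOp &op = block[i];
    bool fused = false;
    if (op.kind == Kind::Sin || op.kind == Kind::Cos) {
      const Kind partner = op.kind == Kind::Sin ? Kind::Cos : Kind::Sin;
      for (std::size_t j = i + 1; j < block.size(); ++j) {
        if (!consumed[j] && block[j].kind == partner &&
            block[j].operand == op.operand &&
            block[j].fastmath == op.fastmath) {
          consumed[j] = true;
          out.push_back({Kind::Sincos, op.operand, op.fastmath});
          fused = true;
          break;
        }
      }
    }
    if (!fused)
      out.push_back(op);
  }
  return out;
}
```

In this sketch a sin and a cos of `%x` with identical flags collapse into one sincos, while a cos of `%x` carrying different flags is left alone, which matches the condition stated in the description.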


ashermancinelli requested review from clementval, ftynse, joker-eph and vzakhari September 24, 2025 17:01

ashermancinelli self-assigned this Sep 24, 2025

ashermancinelli added mlir:llvm mlir labels Sep 24, 2025

clementval reviewed Sep 24, 2025

vzakhari approved these changes Sep 24, 2025

krzysz00 approved these changes Sep 24, 2025

joker-eph reviewed Sep 24, 2025

ashermancinelli added 2 commits September 24, 2025 12:05

[MLIR] Add sincos intrinsic to LLVM dialect

8853d5f

Adds llvm.intr.sincos operation using LLVM_TwoResultIntrOp in the mold of the frexp intrinsic.

format

b692b77

ashermancinelli force-pushed the ajm/llvm-sincos branch from 1ee7201 to b692b77 Compare September 24, 2025 19:05

ashermancinelli merged commit 9aa5d5a into llvm:main Sep 25, 2025
9 checks passed

ashermancinelli mentioned this pull request Sep 25, 2025

[MLIR] Add sincos op to math dialect #160772

Merged

3 tasks

ashermancinelli mentioned this pull request Sep 30, 2025

[MLIR] Add sincos fusion pass #161413

Merged

mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025

[MLIR] Add sincos intrinsic to LLVM dialect (llvm#160561)

a21b952

Adds llvm.intr.sincos operation using LLVM_TwoResultIntrOp in the mold of the frexp intrinsic.

aemerson mentioned this pull request Oct 16, 2025

[clang-reorder-fields] Check for flexible array member (#160262) #163697

Closed
