Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 10 additions & 1 deletion flang/lib/Lower/ConvertCall.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -639,9 +639,18 @@ Fortran::lower::genCallOpAndResult(
caller.getCallDescription().chevrons()[2], stmtCtx)));

mlir::Value stream; // stream is optional.
if (caller.getCallDescription().chevrons().size() > 3)
if (caller.getCallDescription().chevrons().size() > 3) {
stream = fir::getBase(converter.genExprAddr(
caller.getCallDescription().chevrons()[3], stmtCtx));
if (!fir::unwrapRefType(stream.getType()).isInteger(64)) {
auto i64Ty = mlir::IntegerType::get(builder.getContext(), 64);
mlir::Value newStream = builder.createTemporary(loc, i64Ty);
mlir::Value load = fir::LoadOp::create(builder, loc, stream);
mlir::Value conv = fir::ConvertOp::create(builder, loc, i64Ty, load);
fir::StoreOp::create(builder, loc, conv, newStream);
stream = newStream;
}
}

cuf::KernelLaunchOp::create(builder, loc, funcType.getResults(),
funcSymbolAttr, grid_x, grid_y, grid_z, block_x,
Expand Down
15 changes: 15 additions & 0 deletions flang/test/Lower/CUDA/cuda-stream.cuf
Original file line number Diff line number Diff line change
@@ -0,0 +1,15 @@
! RUN: bbc -emit-hlfir -fcuda %s -o - | FileCheck %s

attributes(global) subroutine sharedmem()
real, shared :: s(*)
integer :: t
t = threadIdx%x
s(t) = t
end subroutine

program test
call sharedmem<<<1, 1, 1024, 0>>>()
end

! CHECK-LABEL: func.func @_QQmain()
! CHECK: cuf.kernel_launch @_QPsharedmem<<<%c1{{.*}}, %c1{{.*}}, %c1{{.*}}, %c1{{.*}}, %c1{{.*}}, %c1{{.*}}, %c1024{{.*}}, %{{.*}} : !fir.ref<i64>>>>()
Copy link
Contributor

@vzakhari vzakhari Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why the stream operand is made a reference while all the other operands are just values. Maybe it will be more consitent to make it an optional AnyInteger operand, then you do not have to do anything special in lowering, and instead hide all the details of the kernel launch inside the cuf.kernel_launch conversion.

Just a thought. I am not suggesting changing it in this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stream has to be a i64 reference because it can be written to as well with different cuda API.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean it can be written by cuf.kernel_launch operation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In CUDA Fortran stream is represented as a i64 integer. There is a special kind for it cuda_stream_kind. It can be written to by other API such as cudaStreamCreate not directly by cuf.kernel_launch. So for consistency I think it is better to keep the restriction here. It is quite rare to pass the stream as a constant like in this test. Usually the user has a local variable and use the proper API to create the stream and pass it to the launch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. To me the consistency of the dialect operation is more important than following to the letter the source APIs, so I would prefer just loading the value of stream before cuf.kernel_launch. But this is just my preference :)

Thanks for the change!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see your point and I think I like it also to have the op taking an integer value instead of a reference. I would need to change the kernel launch API as well since it is currently taking a reference.
I'm gonna make this change in a follow up patch.