@llvmbot
Member

llvmbot commented Aug 29, 2025

@llvm/pr-subscribers-flang-fir-hlfir

@llvm/pr-subscribers-offload

Author: Kareem Ergawy (ergawy)

Changes

Adds end-to-end tests for `do concurrent` offloading to the device.


Full diff: https://github.com/llvm/llvm-project/pull/155993.diff

2 Files Affected:

  • (added) offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90 (+53)
  • (added) offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90 (+53)
diff --git a/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90 b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90
new file mode 100644
index 0000000000000..c6f576acb90b6
--- /dev/null
+++ b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy-2d.f90
@@ -0,0 +1,53 @@
+! REQUIRES: flang, amdgpu
+
+! RUN: %libomptarget-compile-fortran-generic -fdo-concurrent-to-openmp=device
+! RUN: env LIBOMPTARGET_INFO=16 %libomptarget-run-generic 2>&1 | %fcheck-generic
+module saxpymod
+   use iso_fortran_env
+   public :: saxpy
+contains
+
+subroutine saxpy(a, x, y, n, m)
+   use iso_fortran_env
+   implicit none
+   integer,intent(in) :: n, m
+   real(kind=real32),intent(in) :: a
+   real(kind=real32), dimension(:,:),intent(in) :: x
+   real(kind=real32), dimension(:,:),intent(inout) :: y
+   integer :: i, j
+
+   do concurrent(i=1:n, j=1:m)
+       y(i,j) = a * x(i,j) + y(i,j)
+   end do
+
+   write(*,*) "plausibility check:"
+   write(*,'("y(1,1) ",f8.6)') y(1,1)
+   write(*,'("y(n,m) ",f8.6)') y(n,m)
+end subroutine saxpy
+
+end module saxpymod
+
+program main
+   use iso_fortran_env
+   use saxpymod, ONLY:saxpy
+   implicit none
+
+   integer,parameter :: n = 1000, m=10000
+   real(kind=real32), allocatable, dimension(:,:) :: x, y
+   real(kind=real32) :: a
+   integer :: i
+
+   allocate(x(1:n,1:m), y(1:n,1:m))
+   a = 2.0_real32
+   x(:,:) = 1.0_real32
+   y(:,:) = 2.0_real32
+
+   call saxpy(a, x, y, n, m)
+
+   deallocate(x,y)
+end program main
+
+! CHECK:  "PluginInterface" device {{[0-9]+}} info: Launching kernel {{.*}}
+! CHECK:  plausibility check:
+! CHECK:  y(1,1) 4.0
+! CHECK:  y(n,m) 4.0
diff --git a/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90 b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90
new file mode 100644
index 0000000000000..e094a1d7459ef
--- /dev/null
+++ b/offload/test/offloading/fortran/do-concurrent-to-omp-saxpy.f90
@@ -0,0 +1,53 @@
+! REQUIRES: flang, amdgpu
+
+! RUN: %libomptarget-compile-fortran-generic -fdo-concurrent-to-openmp=device
+! RUN: env LIBOMPTARGET_INFO=16 %libomptarget-run-generic 2>&1 | %fcheck-generic
+module saxpymod
+   use iso_fortran_env
+   public :: saxpy
+contains
+
+subroutine saxpy(a, x, y, n)
+   use iso_fortran_env
+   implicit none
+   integer,intent(in) :: n
+   real(kind=real32),intent(in) :: a
+   real(kind=real32), dimension(:),intent(in) :: x
+   real(kind=real32), dimension(:),intent(inout) :: y
+   integer :: i
+
+   do concurrent(i=1:n)
+       y(i) = a * x(i) + y(i)
+   end do
+
+   write(*,*) "plausibility check:"
+   write(*,'("y(1) ",f8.6)') y(1)
+   write(*,'("y(n) ",f8.6)') y(n)
+end subroutine saxpy
+
+end module saxpymod
+
+program main
+   use iso_fortran_env
+   use saxpymod, ONLY:saxpy
+   implicit none
+
+   integer,parameter :: n = 10000000
+   real(kind=real32), allocatable, dimension(:) :: x, y
+   real(kind=real32) :: a
+   integer :: i
+
+   allocate(x(1:n), y(1:n))
+   a = 2.0_real32
+   x(:) = 1.0_real32
+   y(:) = 2.0_real32
+
+   call saxpy(a, x, y, n)
+
+   deallocate(x,y)
+end program main
+
+! CHECK:  "PluginInterface" device {{[0-9]+}} info: Launching kernel {{.*}}
+! CHECK:  plausibility check:
+! CHECK:  y(1) 4.0
+! CHECK:  y(n) 4.0
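
For readers comparing against hand-written OpenMP, the 1-D loop in the test above corresponds roughly to the sketch below. This is an illustration under stated assumptions, not the code the compiler generates: the program name `saxpy_omp_sketch`, the smaller `n`, the explicit `map` clauses, and the choice of a combined `target teams distribute parallel do` directive are all hypothetical, and the actual conversion is done inside the compiler rather than on source directives.

```fortran
! Hypothetical hand-written counterpart of the 1-D SAXPY test; not part of the PR.
program saxpy_omp_sketch
   use iso_fortran_env, only: real32
   implicit none

   integer, parameter :: n = 1000
   real(kind=real32), allocatable :: x(:), y(:)
   real(kind=real32) :: a
   integer :: i

   allocate(x(n), y(n))
   a = 2.0_real32
   x = 1.0_real32
   y = 2.0_real32

   ! Assumed mapping: x is only read on the device, y is read and written.
   !$omp target teams distribute parallel do map(to: x) map(tofrom: y)
   do i = 1, n
      y(i) = a * x(i) + y(i)
   end do
   !$omp end target teams distribute parallel do

   ! Every element should be 2.0 * 1.0 + 2.0 = 4.0.
   if (any(y /= 4.0_real32)) error stop "unexpected SAXPY result"

   write(*,'("y(1) ",f8.6)') y(1)
   write(*,'("y(n) ",f8.6)') y(n)

   deallocate(x, y)
end program saxpy_omp_sketch
```

Built without OpenMP enabled, the directive lines are ordinary comments and the loop runs serially on the host, so the check on `y` also serves as a reference for the values the `CHECK` lines expect.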

@ergawy ergawy force-pushed the users/ergawy/upstream_dc_device_4_lit_tests branch from 02636ca to 3dd383b Compare September 1, 2025 06:27
@ergawy ergawy force-pushed the users/ergawy/upstream_dc_device_5_offload_tests branch from b201f91 to f1bbd24 Compare September 1, 2025 06:28
@ergawy ergawy requested a review from agozillon September 1, 2025 06:42
@ergawy ergawy force-pushed the users/ergawy/upstream_dc_device_4_lit_tests branch from 3dd383b to f2e47d9 Compare September 1, 2025 11:40
@ergawy ergawy force-pushed the users/ergawy/upstream_dc_device_5_offload_tests branch from f1bbd24 to c50d3e6 Compare September 1, 2025 11:40
@ergawy ergawy force-pushed the users/ergawy/upstream_dc_device_4_lit_tests branch from f2e47d9 to 77181e6 Compare September 2, 2025 05:25
@ergawy ergawy force-pushed the users/ergawy/upstream_dc_device_5_offload_tests branch from c50d3e6 to 2fd2022 Compare September 2, 2025 05:26
@ergawy
Member Author

ergawy commented Sep 2, 2025

Ping! Please have a look when you have time.

@ergawy ergawy force-pushed the users/ergawy/upstream_dc_device_4_lit_tests branch from 77181e6 to bd8fab0 Compare September 4, 2025 09:39
@ergawy ergawy force-pushed the users/ergawy/upstream_dc_device_5_offload_tests branch from 2fd2022 to d967c72 Compare September 4, 2025 09:39
@ergawy ergawy force-pushed the users/ergawy/upstream_dc_device_4_lit_tests branch from bd8fab0 to f19a301 Compare September 8, 2025 12:00
@ergawy ergawy force-pushed the users/ergawy/upstream_dc_device_5_offload_tests branch from d967c72 to d592609 Compare September 8, 2025 12:00
ergawy added a commit that referenced this pull request Sep 8, 2025
…ide values (#155754)

Following up on #154483, this PR introduces further refactoring to
extract some utils shared by OpenMP lowering and the `do concurrent`
conversion pass. In particular, this PR extracts 2 utils that handle
mapping or cloning values that are used inside target regions but
defined outside them.

Later `do concurrent` PR(s) will also use these utils.

PR stack:
- #155754 ◀️
- #155987
- #155992
- #155993
- #156589
- #156610
- #156837
@ergawy ergawy force-pushed the users/ergawy/upstream_dc_device_4_lit_tests branch from f19a301 to db09d54 Compare September 8, 2025 12:35
@ergawy ergawy force-pushed the users/ergawy/upstream_dc_device_5_offload_tests branch from d592609 to 315f521 Compare September 8, 2025 12:36
@ergawy ergawy force-pushed the users/ergawy/upstream_dc_device_4_lit_tests branch from db09d54 to 6d564c6 Compare September 9, 2025 10:10
@ergawy ergawy force-pushed the users/ergawy/upstream_dc_device_5_offload_tests branch from 315f521 to e681a9f Compare September 9, 2025 10:13
@ergawy ergawy force-pushed the users/ergawy/upstream_dc_device_5_offload_tests branch from e36db59 to 2177ccc Compare September 13, 2025 11:38
Base automatically changed from users/ergawy/upstream_dc_device_4_lit_tests to main September 16, 2025 07:41
ergawy added a commit that referenced this pull request Sep 16, 2025
…5992)

Adds more lit tests for `do concurrent` device mapping.

PR stack:
- #155754
- #155987
- #155992 ◀️
- #155993
- #157638
- #156610
- #156837
@ergawy ergawy force-pushed the users/ergawy/upstream_dc_device_5_offload_tests branch from 2177ccc to fd66849 Compare September 16, 2025 09:50
Contributor

@bhandarkar-pranav bhandarkar-pranav left a comment

Thanks for the PR, LGTM.

@ergawy ergawy merged commit c286a42 into main Sep 17, 2025
9 checks passed
@ergawy ergawy deleted the users/ergawy/upstream_dc_device_5_offload_tests branch September 17, 2025 05:04
ergawy added a commit that referenced this pull request Sep 23, 2025
Extends support for mapping `do concurrent` to the device by adding
support for `local` specifiers. The changes in this PR map the local
variable to the `omp.target` op and use the mapped value as the
`private` clause operand in the nested `omp.parallel` op.

- #155754
- #155987
- #155992
- #155993
- #157638 ◀️
- #156610
- #156837
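
A minimal sketch of the kind of source loop this change concerns, assuming a compiler that accepts Fortran 2018 locality specifiers. The program name `dc_local_sketch` and all variable names are hypothetical and are not taken from the PR's tests.

```fortran
! Hypothetical example of a `local` specifier; not part of the PR.
program dc_local_sketch
   implicit none

   integer, parameter :: n = 8
   real :: x(n), y(n), tmp
   integer :: i

   x = 1.0
   y = 0.0

   ! Each iteration gets its own copy of tmp, so iterations do not
   ! interfere with each other through the temporary.
   do concurrent (i = 1:n) local(tmp)
      tmp = 2.0 * x(i)
      y(i) = tmp + 1.0
   end do

   ! Every element should be 2.0 * 1.0 + 1.0 = 3.0.
   if (any(y /= 3.0)) error stop "unexpected result"
   print *, "ok"
end program dc_local_sketch
```

Here `tmp` plays the role of the local variable that, per the description above, is mapped to the `omp.target` op and then privatized in the nested `omp.parallel` op.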
ergawy added a commit that referenced this pull request Sep 23, 2025
Extends `do concurrent` to OpenMP device mapping by adding support for
mapping `reduce` specifiers to OpenMP `reduction` clauses. The changes
attach 2 `reduction` clauses to the mapped OpenMP construct: one on the
`teams` part of the construct and one on the `wsloop` part.

- #155754
- #155987
- #155992
- #155993
- #157638
- #156610 ◀️
- #156837
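
A minimal sketch of a loop with a `reduce` specifier, assuming a compiler that accepts the Fortran 2023 `reduce` locality specifier. The program name `dc_reduce_sketch` and all variable names are hypothetical and are not taken from the PR's tests.

```fortran
! Hypothetical example of a `reduce` specifier; not part of the PR.
program dc_reduce_sketch
   implicit none

   integer, parameter :: n = 100
   integer :: x(n), s, i

   x = 1
   s = 0

   ! Iterations accumulate into private partial sums that are
   ! combined into s with "+" when the loop completes.
   do concurrent (i = 1:n) reduce(+:s)
      s = s + x(i)
   end do

   ! Summing n ones must give n.
   if (s /= n) error stop "unexpected reduction result"
   print *, s
end program dc_reduce_sketch
```

In the mapping described above, a reduction like the one on `s` would need to be combined at two levels, which is one way to read why a `reduction` clause is attached both to the `teams` part and to the worksharing-loop part of the construct.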
ergawy added a commit that referenced this pull request Sep 23, 2025
… GPU (#156837)

Fixes a bug related to insertion points when inlining multi-block
combiner reduction regions. The insertion point (IP) at the end of the
inlined region was not used, which resulted in emitting basic blocks
(BBs) with multiple terminators.

PR stack:
- #155754
- #155987
- #155992
- #155993
- #157638
- #156610
- #156837 ◀️
Labels
flang:fir-hlfir, flang, offload