Port 14 benchmarks to Kokkos, bringing total to 158 #69
Merged
kento merged 10 commits on Apr 19, 2026
- Replace #pragma omp target offloading with Kokkos::parallel_for
- Replace raw device arrays with Kokkos::View<T*>
- Use Kokkos::initialize/finalize wrapping main logic
- clip_plus/clip_minus annotated with KOKKOS_INLINE_FUNCTION for device use
- 2D kernel flattened to 1D parallel_for (tid = tx*length_x + ty)
- cpu_relextrema_1D/2D kept as CPU reference implementations
- Makefile follows norm2-kokkos template exactly
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: kento <1034379+kento@users.noreply.github.com>
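The flattening formula in the commit above (tid = tx*length_x + ty) can be checked on the host without Kokkos. The following is a minimal sketch, not the code from the extrema port: the names `Index2D`, `flatten`, and `unflatten` are hypothetical, and only the formula itself comes from the commit message.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper type: a (tx, ty) pair as used in the commit's formula.
struct Index2D {
  std::size_t tx;
  std::size_t ty;
};

// Map a 2D index to the flat index used by a 1D parallel_for:
// tid = tx * length_x + ty, which requires ty < length_x.
inline std::size_t flatten(Index2D i, std::size_t length_x) {
  assert(i.ty < length_x);
  return i.tx * length_x + i.ty;
}

// Inverse mapping: recover (tx, ty) from the flat index inside the kernel body.
inline Index2D unflatten(std::size_t tid, std::size_t length_x) {
  assert(length_x > 0);
  return Index2D{tid / length_x, tid % length_x};
}
```

Inside a flat kernel, the body would call something like `unflatten(tid, length_x)` first and then proceed as the 2D kernel did. The round trip is exact as long as `ty` stays below `length_x`.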
- Replace OMP target offloading with Kokkos::parallel_for
- Replace raw device arrays with Kokkos::View<float*>
- Use Kokkos::MDRangePolicy<Kokkos::Rank<2>> for the 2D stencil kernel
- Use Kokkos::initialize/finalize around device computation
- Use Kokkos::deep_copy and mirror views for host<->device transfers
- Makefile follows norm2-kokkos template with KOKKOS_INC/KOKKOS_LIB paths
- Run target: ./main 1024 1024 100
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: kento <1034379+kento@users.noreply.github.com>
- Use {HALF_LENGTH, HALF_LENGTH} to {nR-HALF_LENGTH, nC-HALF_LENGTH}
as the MDRangePolicy bounds, avoiding launching idle boundary threads
- Add comment explaining the alternation pattern and why d_next is used
for validation comparison
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: kento <1034379+kento@users.noreply.github.com>
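To illustrate the bounds change in the commit above: MDRangePolicy ranges are half-open, so starting at {HALF_LENGTH, HALF_LENGTH} and ending at {nR-HALF_LENGTH, nC-HALF_LENGTH} launches only interior points. The sketch below is a host-only model for counting work items. `Range2D`, `interior_range`, and `work_items` are hypothetical names, and the value of HALF_LENGTH is not stated in this PR, so it is passed as a parameter.

```cpp
#include <cstddef>

// Half-open 2D iteration range [lo0, hi0) x [lo1, hi1),
// mirroring how an MDRangePolicy's lower/upper bounds are interpreted.
struct Range2D {
  std::size_t lo0, lo1, hi0, hi1;
};

// Interior-only bounds; assumes nR >= 2*half and nC >= 2*half.
constexpr Range2D interior_range(std::size_t nR, std::size_t nC, std::size_t half) {
  return Range2D{half, half, nR - half, nC - half};
}

// Number of iterations (threads) a policy with these bounds would launch.
constexpr std::size_t work_items(Range2D r) {
  return (r.hi0 - r.lo0) * (r.hi1 - r.lo1);
}
```

For the 1024 x 1024 run target and an assumed half length of 4, the interior range covers 1016 x 1016 = 1,032,256 points instead of 1,048,576, and the kernel no longer needs a boundary check that idles the outer threads.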
- Replace OMP target offloading with Kokkos::parallel_for
- Replace raw device arrays with Kokkos::View and host mirrors
- Add Kokkos::initialize/finalize
- Convert mixRemainder to KOKKOS_INLINE_FUNCTION
- Keep mix/final/rot macros (work unchanged in device lambdas)
- Use RangePolicy<IndexType<unsigned long>> matching original N type
- Makefile follows norm2-kokkos template
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: kento <1034379+kento@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: kento <1034379+kento@users.noreply.github.com>
- Replace omp target offloading with Kokkos::parallel_for (MDRangePolicy) for the 2D red and black Gauss-Seidel kernels
- Replace omp target reduction with Kokkos::parallel_reduce for norm
- Replace raw device arrays with Kokkos::View; use host mirrors for fill_coeffs and output
- Add Kokkos::initialize / Kokkos::finalize scope
- Makefile follows the norm2-kokkos template exactly
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: kento <1034379+kento@users.noreply.github.com>
- feynman-kac-kokkos: util.h with KOKKOS_INLINE_FUNCTION annotations, Kokkos::parallel_reduce over MDRangePolicy<Rank<2>>, combined ErrCount struct reducer with reduction_identity specialization, per-thread seed via seed + tid
- henry-kokkos: KOKKOS_INLINE_FUNCTION on LCG_random_double and compute, Kokkos::View<StructureAtom*> for device atoms, Kokkos::parallel_for with flat RangePolicy, per-thread seed = id, host accumulation of boltzmannFactors after each cycle
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: kento <1034379+kento@users.noreply.github.com>
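The combined ErrCount reducer mentioned for feynman-kac rests on one idea: a struct can be reduced in a single pass if it has an identity value and an associative `+=`. The sketch below models that on the host with `std::accumulate`. The field names `err` and `count` are assumptions, since the real struct layout is not shown in this PR text. In Kokkos the identity would come from the `reduction_identity` specialization; here value-initialization stands in for it.

```cpp
#include <numeric>
#include <vector>

// Assumed layout: an accumulated error and a sample count reduced together.
struct ErrCount {
  double err;
  long count;

  // Element-wise sum; this is the join operation a sum reduction relies on.
  ErrCount& operator+=(const ErrCount& other) {
    err += other.err;
    count += other.count;
    return *this;
  }
};

inline ErrCount operator+(ErrCount a, const ErrCount& b) {
  a += b;
  return a;
}

// Host-side stand-in for the parallel reduction: start from the identity
// (all zeros) and fold in each per-thread contribution.
inline ErrCount reduce_all(const std::vector<ErrCount>& parts) {
  return std::accumulate(parts.begin(), parts.end(), ErrCount{});
}
```

Reducing both fields in one struct avoids launching two separate reductions over the same iteration space.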
- vol2col-kokkos: 4D parallel_for using flat index with absolute index arithmetic (no pointer mutation); col2vol uses simple 1D parallel_for
- pointwise-kokkos: LSTM elementwise kernel with per-array integer offsets replacing pointer arithmetic; LCG_random/sigmoidf marked KOKKOS_INLINE_FUNCTION
- vmc-kokkos: all device functions marked KOKKOS_INLINE_FUNCTION; propagate/initran/initialize/zero_stats converted to parallel_for; SumWithinBlocks uses flat parallel_for cycling over blocks
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: kento <1034379+kento@users.noreply.github.com>
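The "flat index with absolute index arithmetic" approach described for vol2col can be sketched as follows. Instead of advancing a pointer as the kernel walks the data, each iteration decomposes its flat index into four coordinates and computes absolute offsets from them. This is a host-only illustration with hypothetical names (`flatten4`, `unflatten4`) and a generic row-major layout. The actual dimension order in the port is not given in this PR.

```cpp
#include <array>
#include <cstddef>

using Index4 = std::array<std::size_t, 4>;

// Row-major absolute offset of a 4D coordinate; no pointer is mutated.
inline std::size_t flatten4(const Index4& i, const Index4& dims) {
  return ((i[0] * dims[1] + i[1]) * dims[2] + i[2]) * dims[3] + i[3];
}

// Decompose a flat parallel_for index into its 4D coordinate,
// peeling off the fastest-varying dimension first.
inline Index4 unflatten4(std::size_t idx, const Index4& dims) {
  Index4 out{};
  for (int d = 3; d >= 0; --d) {
    out[d] = idx % dims[d];
    idx /= dims[d];
  }
  return out;
}
```

Because every offset is derived from the loop index alone, iterations are independent of one another, which is what a `parallel_for` body requires.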
…ts and readability
- vol2col: pass d_data_col directly to col2vol_kernel (implicit const conversion)
- vmc: expand SumWithinBlocks stride comment to explain the cycling invariant
- pointwise: extract complex offset arithmetic into named variables for clarity
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: kento <1034379+kento@users.noreply.github.com>
…lace, feynman-kac, henry, vol2col, pointwise, vmc, gpp, matern, doh, thomas, log2)
Agent-Logs-Url: https://github.com/kento/HeCBench/sessions/147c1df8-7e1e-4f38-9683-f01a503623e6
Co-authored-by: kento <1034379+kento@users.noreply.github.com>
Copilot created this pull request from a session on behalf of kento on April 19, 2026, 10:21
HeCBench had 144 Kokkos ports out of ~496 unique benchmarks. This PR adds 14 new ports, bringing the total to 158. All 14 compile cleanly against the system Kokkos 3.7 using the OpenMP backend.
New Kokkos ports
- extrema
- iso2dfd
- jenkins-hash
- laplace
- feynman-kac
- henry
- vol2col
- pointwise
- vmc
- gpp
- matern
- doh
- thomas
- log2
Conversion patterns applied
- #pragma omp declare target functions → KOKKOS_INLINE_FUNCTION
- #pragma omp target teams distribute parallel for → Kokkos::parallel_for
- reduction(+:x) clauses → Kokkos::parallel_reduce
- Kokkos::View<T*> with create_mirror_view / deep_copy
- /usr/include/, /usr/lib/x86_64-linux-gnu
Scope notes
Of the remaining ~338 unported benchmarks, ~73 require vendor-specific libraries (cuBLAS, cuFFT, oneMKL, etc.) that have no direct Kokkos equivalent and cannot be straightforwardly ported.