Kokkos: fix arithmetic bugs in existing ports + add 7 new ports #63
Merged
aobench-kokkos: fix transposed x/y pixel coordinates
The 1D->2D index decomposition used idx/h and idx%h, which assigned the row to x and the column to y (opposite of CUDA). Fix: y = idx/w (row), x = idx%w (column).

aop-kokkos: fix missing sums.w reduction in prepare_svd_kernel
The CUDA version reduces all four moment sums (x, y, z, w) for the QR/SVD assembly. The Kokkos port omitted the atomic_add for sums.w (sum of S^4 for in-the-money paths), leaving final_sums.w always zero and corrupting the SVD and subsequent regression.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: kento <1034379+kento@users.noreply.github.com>
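The index fix in the aobench commit can be illustrated with a small standalone sketch. The helper names and the `Pixel` struct below are hypothetical and do not come from the benchmark; the code only contrasts the row-major decomposition (divide and mod by the width) with the transposed form the commit message describes.

```cpp
#include <cassert>

// Hypothetical types and names, for illustration only.
struct Pixel {
  int x;  // column
  int y;  // row
};

// Row-major decomposition of a flat index for an image w pixels wide:
// the row is idx / w and the column is idx % w.
inline Pixel decompose(int idx, int w) {
  return Pixel{idx % w, idx / w};
}

// The transposed form described in the commit message: dividing by the
// height h puts the quotient in x and the remainder in y.
inline Pixel decompose_transposed(int idx, int h) {
  return Pixel{idx / h, idx % h};
}
```

For a 4x3 image (w = 4, h = 3), flat index 7 maps to column 3, row 1 with the row-major form, while the transposed form yields x = 2, y = 1, so the pixel is shaded with the wrong coordinates.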
adam-kokkos: eps constant was 1e-10f instead of 1e-8f from the CUDA reference. The smaller epsilon makes the Adam optimizer denominator smaller, producing numerically incorrect parameter updates.

romberg-kokkos: getFirstSetBitPos used logf(x)/logf(2.f) to compute log2. Due to float32 rounding, logf(8192)/logf(2.f) = 12.999... which truncates to 12 instead of 13, and logf(32768)/logf(2.f) = 14.999... which truncates to 14 instead of 15. This misroutes 5 of the 65535 function evaluations into wrong Richardson extrapolation buckets. Fixed with the direct log2f intrinsic, matching the CUDA reference.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: kento <1034379+kento@users.noreply.github.com>
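A minimal sketch of the two arithmetic points in this commit, written as plain host C++ rather than the benchmark's actual kernels. All function names are hypothetical. The denominator form `sqrt(v_corrected + eps)` is taken from the PR description; whether the quotient `logf(n)/logf(2.f)` lands just below an integer depends on the math library, so the sketch only shows the direct base-2 logarithm next to an integer-only cross-check.

```cpp
#include <cassert>
#include <cmath>

// floor(log2(n)) for a positive power of two, using the float overload of
// std::log2 (the host-side counterpart of log2f).
inline int log2_direct(float n) {
  return static_cast<int>(std::log2(n));
}

// Integer-only cross-check that avoids floating point entirely:
// counts how many times n can be shifted right before reaching zero.
inline int log2_integer(unsigned n) {
  int p = 0;
  while (n >>= 1) ++p;
  return p;
}

// Adam denominator in the form quoted in the PR description.
inline float adam_denominator(float v_corrected, float eps) {
  return std::sqrt(v_corrected + eps);
}
```

With `v_corrected` near zero, `adam_denominator(0.0f, 1e-10f)` is roughly 1e-5 while `adam_denominator(0.0f, 1e-8f)` is roughly 1e-4, so the wrong constant can inflate an update by about a factor of ten in that regime.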
Agent-Logs-Url: https://github.com/kento/HeCBench/sessions/078c5e19-017b-4615-9e7a-4e2cd0914222 Co-authored-by: kento <1034379+kento@users.noreply.github.com>
Copilot created this pull request from a session on behalf of kento on April 12, 2026 04:51
Reviews all 60 existing Kokkos benchmark ports for arithmetic correctness and begins porting the 432 benchmarks not yet on Kokkos.
Bug Fixes in Existing Ports
- adam-kokkos: eps was 1e-10f instead of 1e-8f, causing the Adam denominator sqrt(v_corrected + eps) to be too small and producing oversized parameter updates
- romberg-kokkos: getFirstSetBitPos used (int)(logf(n)/logf(2.f)), which truncates incorrectly at certain powers of two (e.g. logf(8192)/logf(2.f) = 12.999... → 12 instead of 13); replaced with log2f(n) to match the CUDA reference

New Kokkos Ports (7)
- cbsfil-kokkos
- cobahh-kokkos
- depixel-kokkos
- ecdh-kokkos
- expdist-kokkos
- memcpy-kokkos
- pso-kokkos

All new ports follow the established pattern: Kokkos::View + create_mirror_view/deep_copy for data movement, parallel_for/parallel_reduce for kernels, and Kokkos::atomic_* where needed. 425 benchmarks remain to be ported.