Skip to content

Fix TopologyMapper crash on HIP#1832

Merged
BradWhitlock merged 13 commits intodevelopfrom
bugfix/whitlock/fix_recent_warnings
Mar 30, 2026
Merged

Fix TopologyMapper crash on HIP#1832
BradWhitlock merged 13 commits intodevelopfrom
bugfix/whitlock/fix_recent_warnings

Conversation

@BradWhitlock
Copy link
Copy Markdown
Member

@BradWhitlock BradWhitlock commented Mar 27, 2026

TopologyMapper in Axom develop was reported to be crashing on GPU with HIP when built into a host code. This PR fixes that. Axom's own build and tests did not exhibit this problem so it could be related to compiler, flags, etc.

  • Changed 2D polygon clipping so it uses fewer intermediate polygon objects since on GPU they have static storage and are largish. This alleviated some register pressure, improving reliability of the affected kernel.
  • Changed some error checking code in the affected kernel that was using SLIC_ASSERT() in a few places. For HIP, we now just return early from the lambda. The SLIC_ASSERT() macros were indicating some kind of out of bounds but it seems more likely that using device assert() in this case was introducing errors.
  • Fixed some newer warning messages in the code
  • Made slight changes to StaticArray since it seemed implicated in the crash and added new tests.

NOTE: I have held off making the 0.14.0 release to create this fix and can get to that once this is merged.

@BradWhitlock BradWhitlock requested review from Arlie-Capps, bmhan12, kennyweiss, publixsubfan and rhornung67 and removed request for kennyweiss March 28, 2026 00:09

echo "~~~~~~ RUNNING TESTS ~~~~~~~~"
make CTEST_OUTPUT_ON_FAILURE=1 test ARGS='-T Test -VV -j$NUM_BUILD_PROCS'
make CTEST_OUTPUT_ON_FAILURE=1 test ARGS='-T Test --output-on-failure -j$NUM_BUILD_PROCS'
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep the previous -VV flag here?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be there. Was there an issue with too much output?

// Uncomment to emit debugging messages
//#define AXOM_DEBUG_TOPOLOGY_MAPPER

#if defined(AXOM_DEVICE_CODE) && defined(AXOM_USE_HIP)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about the CUDA case?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment, this is a HIP-specific workaround. I have not run into problems with SLIC_ASSERT() with CUDA.

Copy link
Copy Markdown
Member

@rhornung67 rhornung67 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of minor questions.

@BradWhitlock BradWhitlock merged commit cb9e11e into develop Mar 30, 2026
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants