There are two isolated test failures on Incline: one segmentation fault and one timeout. They do not occur on Deception or Newell; the status on other AMD platforms is TBD. They were potentially introduced with hiop@1.0.0.
Creating a separate issue for these failures to isolate them from #3 and #43, and to let #84 continue without these tests blocking it.
Exact commands to reproduce, if applicable
The tests are currently skipped in CI. To reproduce, either run the tests manually or delete the incline-skip tag from those tests in the CMake.
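For anyone running the tests manually, a minimal sketch of one way to select just the two affected tests with ctest is shown here. The helper names, the `build` directory, and the 600-second timeout are assumptions for illustration and are not part of ExaGO. The sketch also assumes the tests are still registered with ctest on Incline, which may require removing the incline-skip tag first.

```python
import re
import subprocess

# Test names copied from this issue.
FAILING_TESTS = [
    "FUNCTIONALITY_TEST_OPFLOW_RAJAHIOP_SPARSE_GPU_TOML_TESTSUITE",
    "FUNCTIONALITY_TEST_SOPFLOW_SCENARIO_RAJA_GPU_TOML",
]


def build_ctest_command(tests, timeout_s=600):
    """Return a ctest argv list that selects exactly the given test names.

    timeout_s is a hypothetical default; adjust it to the platform.
    """
    # Anchor the alternation so the regex does not match longer test names.
    pattern = "^(" + "|".join(re.escape(t) for t in tests) + ")$"
    return [
        "ctest",
        "-R", pattern,            # run only tests whose names match the regex
        "--output-on-failure",    # print test output only when a test fails
        "--timeout", str(timeout_s),  # default per-test timeout in seconds
    ]


def run_failing_tests(build_dir="build"):
    """Run the selected tests from build_dir (a hypothetical path)."""
    cmd = build_ctest_command(FAILING_TESTS)
    # ctest exits non-zero when any selected test fails.
    return subprocess.run(cmd, cwd=build_dir).returncode
```

Calling `run_failing_tests("build")` from the source tree would then run only these two tests and return ctest's exit code. Passing a single name to `build_ctest_command` isolates the segmentation fault from the timeout.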
20/57 Test #20: FUNCTIONALITY_TEST_OPFLOW_RAJAHIOP_SPARSE_GPU_TOML_TESTSUITE .................***Failed 2.76 sec
[ExaGO] Creating OPFlow Functionality Test
Test Description: datafiles/case9/case9mod.m base case
[Warning] Hiop does not understand option 'dualsInitialization' and will ignore its value 'zero'.
[Warning] Detected 1 fixed variables out of a total of 24.
===============
Hiop SOLVER
===============
Using 1 MPI ranks.
---------------
Problem Summary
---------------
Total number of variables: 24
lower/upper/lower_and_upper bounds: 16 / 16 / 16
Total number of equality constraints: 18
Total number of inequality constraints: 18
lower/upper/lower_and_upper bounds: 18 / 18 / 18
iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
0 1.0318125e+04 1.800e+00 4.460e+03 -1.00 0.000e+00 0.000e+00 -(-)
[0]PETSC ERROR: ------------------------------------------------------------------------
[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[0]PETSC ERROR: or see https://petsc.org/release/faq/#valgrind and https://petsc.org/release/faq/
[0]PETSC ERROR: configure using --with-debugging=yes, recompile, link, and run
[0]PETSC ERROR: to get more information on the crash.
[0]PETSC ERROR: Run with -malloc_debug to check if memory corruption is causing the crash.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 59.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Is this issue only on Ascent OR does this happen on other platforms too?
This behavior is only happening on Incline (not Deception, Newell, or Ascent). @nkoukpaizan was also seeing similar failures on Frontier in #89, so this is likely AMD-related.
Relevant logs and/or screenshots, if applicable
FUNCTIONALITY_TEST_OPFLOW_RAJAHIOP_SPARSE_GPU_TOML_TESTSUITE
FUNCTIONALITY_TEST_SOPFLOW_SCENARIO_RAJA_GPU_TOML