Regression test fail on Spock/Crusher/Frontier/HIP #659

Closed
pgrete opened this issue Mar 24, 2022 · 13 comments · Fixed by #800 or #805
Labels
bug Something isn't working

Comments

@pgrete (Collaborator) commented Mar 24, 2022

$ ctest -L regression -LE "mpi-no|perf-reg" --output-on-failure
Test project /gpfs/alpine/ast146/world-shared/parth/build-hip-mpi
      Start 42: regression_mpi_test:particle_leapfrog
 1/10 Test #42: regression_mpi_test:particle_leapfrog ...........   Passed    1.30 sec
      Start 44: regression_mpi_test:particle_leapfrog_outflow
 2/10 Test #44: regression_mpi_test:particle_leapfrog_outflow ...   Passed    0.97 sec
      Start 46: regression_mpi_test:restart
 3/10 Test #46: regression_mpi_test:restart .....................   Passed   32.23 sec
      Start 48: regression_mpi_test:calculate_pi
 4/10 Test #48: regression_mpi_test:calculate_pi ................   Passed    0.94 sec
      Start 50: regression_mpi_test:advection_convergence
 5/10 Test #50: regression_mpi_test:advection_convergence .......   Passed   49.83 sec
      Start 52: regression_mpi_test:output_hdf5
 6/10 Test #52: regression_mpi_test:output_hdf5 .................***Failed   22.07 sec


test_dir=['/ccs/home/pgrete/src/parthenon/tst/regression/test_suites/output_hdf5']
output_dir='/gpfs/alpine/ast146/world-shared/parth/build-hip-mpi/tst/regression/outputs/output_hdf5_mpi'
driver=['/gpfs/alpine/ast146/world-shared/parth/build-hip-mpi/example/advection/advection-example']
driver_input=['/ccs/home/pgrete/src/parthenon/tst/regression/test_suites/output_hdf5/parthinput.advection']
kokkos_args=['--kokkos-num-devices=4']
num_steps=4
mpirun=['srun']
mpirun_ranks_flag='-n'
mpirun_ranks_num=4
mpirun_opts=[]
coverage=False
analyze=False
sparse_disabled=False
*****************************************************************
Beginning Python regression testing script
*****************************************************************

Initializing Test Case
Using:
driver at:       /gpfs/alpine/ast146/world-shared/parth/build-hip-mpi/example/advection/advection-example
driver input at: /ccs/home/pgrete/src/parthenon/tst/regression/test_suites/output_hdf5/parthinput.advection
test folder:     /ccs/home/pgrete/src/parthenon/tst/regression/test_suites/output_hdf5
output sent to:  /gpfs/alpine/ast146/world-shared/parth/build-hip-mpi/tst/regression/outputs/output_hdf5_mpi

Make output folder in test if does not exist
*****************************************************************
Preparing Test Case Step 1
*****************************************************************

*****************************************************************
Running Driver
*****************************************************************

Command to execute driver
srun -n 4 /gpfs/alpine/ast146/world-shared/parth/build-hip-mpi/example/advection/advection-example -i /ccs/home/pgrete/src/parthenon/tst/regression/test_suites/output_hdf5/parthinput.advection --kokkos-num-devices=4
*****************************************************************
Preparing Test Case Step 2
*****************************************************************

*****************************************************************
Running Driver
*****************************************************************

Command to execute driver
srun -n 4 /gpfs/alpine/ast146/world-shared/parth/build-hip-mpi/example/advection/advection-example -i /ccs/home/pgrete/src/parthenon/tst/regression/test_suites/output_hdf5/parthinput.advection parthenon/job/problem_id=advection_3d parthenon/mesh/numlevel=2 parthenon/mesh/nx1=32 parthenon/meshblock/nx1=8 parthenon/mesh/nx2=32 parthenon/meshblock/nx2=8 parthenon/mesh/nx3=32 parthenon/meshblock/nx3=8 parthenon/time/integrator=rk1 Advection/cfl=0.3 --kokkos-num-devices=4
*****************************************************************
Preparing Test Case Step 3
*****************************************************************

*****************************************************************
Preparing Test Case Step 4
*****************************************************************

Running with coverage
False
*****************************************************************
Analysing Driver Output
*****************************************************************

/gpfs/alpine/ast146/world-shared/parth/build-hip-mpi/tst/regression/outputs/output_hdf5_mpi

-------------------------------------------
    Filename=advection_2d.out0.final.phdf
-------------------------------------------
                Time=1.0000
              NCycle=569
             NumDims=2
            MaxLevel=2
           NumBlocks=130
       MeshBlockSize=[16 16  1]
       CellsPerBlock=256
          TotalCells=33280
      TotalCellsReal=33280
               NumPE=4
         BlocksPerPE=[32 32 33 33]
              NGhost=2
       IncludesGhost=0
         Coordinates=UniformCartesian
--------------------------------------------
           Variables=['Blocks', 'Info', 'Input', 'Levels', 'Locations', 'LogicalLocations', 'Params', 'SparseInfo', 'VolumeLocations', 'advected', 'one_minus_advected', 'one_minus_advected_sq', 'one_minus_sqrt_one_minus_advected_sq_12', 'one_minus_sqrt_one_minus_advected_sq_37']
--------------------------------------------


-------------------------------------------
    Filename=/autofs/nccs-svm1_home1/pgrete/src/parthenon/tst/regression/gold_standard/advection_2d.out0.final.phdf
-------------------------------------------
                Time=1.0000
              NCycle=569
             NumDims=2
            MaxLevel=2
           NumBlocks=130
       MeshBlockSize=[16 16  1]
       CellsPerBlock=256
          TotalCells=33280
      TotalCellsReal=33280
               NumPE=0
         BlocksPerPE=130
              NGhost=2
       IncludesGhost=0
         Coordinates=UniformCartesian
--------------------------------------------
           Variables=['Blocks', 'Info', 'Input', 'Levels', 'Locations', 'LogicalLocations', 'Params', 'SparseInfo', 'VolumeLocations', 'advected', 'one_minus_advected', 'one_minus_advected_sq', 'one_minus_sqrt_one_minus_advected_sq_12', 'one_minus_sqrt_one_minus_advected_sq_37']
--------------------------------------------

Checking metadata
                  Info: no diffs
                Params: no diffs
              Blocks/loc.lx123: no diffs
              Blocks/xmin: no diffs
                Levels: no diffs
     VolumeLocations/x: no diffs
     VolumeLocations/y: no diffs
     VolumeLocations/z: no diffs
      LogicalLocations: no diffs
            SparseInfo: no diffs
           Locations/x: no diffs
           Locations/y: no diffs
           Locations/z: no diffs
Metadata matches
____Comparing on a per variable basis with tolerance 1e-12
Tolerance = 1e-12
  advected            : no diffs
  one_minus_advected  : no diffs
  one_minus_sqrt_one_minus_advected_sq_12: no diffs
  one_minus_advected_sq: no diffs
  one_minus_sqrt_one_minus_advected_sq_37: no diffs

-------------------------------------------
    Filename=advection_3d.out0.final.phdf
-------------------------------------------
                Time=1.0000
              NCycle=214
             NumDims=3
            MaxLevel=1
           NumBlocks=204
       MeshBlockSize=[8 8 8]
       CellsPerBlock=512
          TotalCells=104448
      TotalCellsReal=104448
               NumPE=4
         BlocksPerPE=[51 51 51 51]
              NGhost=2
       IncludesGhost=0
         Coordinates=UniformCartesian
--------------------------------------------
           Variables=['Blocks', 'Info', 'Input', 'Levels', 'Locations', 'LogicalLocations', 'Params', 'SparseInfo', 'VolumeLocations', 'advected', 'one_minus_advected', 'one_minus_advected_sq', 'one_minus_sqrt_one_minus_advected_sq_12', 'one_minus_sqrt_one_minus_advected_sq_37']
--------------------------------------------


-------------------------------------------
    Filename=/autofs/nccs-svm1_home1/pgrete/src/parthenon/tst/regression/gold_standard/advection_3d.out0.final.phdf
-------------------------------------------
                Time=1.0000
              NCycle=214
             NumDims=3
            MaxLevel=1
           NumBlocks=204
       MeshBlockSize=[8 8 8]
       CellsPerBlock=512
          TotalCells=104448
      TotalCellsReal=104448
               NumPE=0
         BlocksPerPE=204
              NGhost=2
       IncludesGhost=0
         Coordinates=UniformCartesian
--------------------------------------------
           Variables=['Blocks', 'Info', 'Input', 'Levels', 'Locations', 'LogicalLocations', 'Params', 'SparseInfo', 'VolumeLocations', 'advected', 'one_minus_advected', 'one_minus_advected_sq', 'one_minus_sqrt_one_minus_advected_sq_12', 'one_minus_sqrt_one_minus_advected_sq_37']
--------------------------------------------

Checking metadata
                  Info: no diffs
                Params: no diffs
              Blocks/loc.lx123: no diffs
              Blocks/xmin: no diffs
                Levels: no diffs
     VolumeLocations/x: no diffs
     VolumeLocations/y: no diffs
     VolumeLocations/z: no diffs
      LogicalLocations: no diffs
            SparseInfo: no diffs
           Locations/x: no diffs
           Locations/y: no diffs
           Locations/z: no diffs
Metadata matches
____Comparing on a per variable basis with tolerance 1e-12
Tolerance = 1e-12
Diff in advected            
    bkji: ( 190,   7,   0,   3)
    zyx: (0.117188,0.007812,0.179688)
    err_mag: 0.008055
    f0: 3.5469e-02
    f1: 2.7414e-02
  advected            : differs
Diff in one_minus_advected  
    bkji: ( 190,   7,   0,   3)
    zyx: (0.117188,0.007812,0.179688)
    err_mag: 0.008055
    f0: 9.6453e-01
    f1: 9.7259e-01
  one_minus_advected  : differs
Diff in one_minus_sqrt_one_minus_advected_sq_12
    bkji: ( 190,   7,   0,   3)
    zyx: (0.117188,0.007812,0.179688)
    err_mag: 0.008055
    f0: 3.5469e-02
    f1: 2.7414e-02
  one_minus_sqrt_one_minus_advected_sq_12: differs
Diff in one_minus_advected_sq
    bkji: ( 190,   7,   0,   3)
    zyx: (0.117188,0.007812,0.179688)
    err_mag: 0.015603
    f0: 9.3032e-01
    f1: 9.4592e-01
  one_minus_advected_sq: differs
Diff in one_minus_sqrt_one_minus_advected_sq_37
    bkji: ( 190,   7,   0,   3)
    zyx: (0.117188,0.007812,0.179688)
    err_mag: 0.008055
    f0: 9.6453e-01
    f1: 9.7259e-01
  one_minus_sqrt_one_minus_advected_sq_37: differs
Wrong total in hst output of 3D problem
Traceback (most recent call last):
  File "/ccs/home/pgrete/src/parthenon/tst/regression/run_test.py", line 204, in <module>
    main(**vars(args))
  File "/ccs/home/pgrete/src/parthenon/tst/regression/run_test.py", line 93, in main
    raise TestError("Test " + test_manager.test + " failed")
__main__.TestError: Test output_hdf5 failed

      Start 54: regression_mpi_test:advection_outflow
 7/10 Test #54: regression_mpi_test:advection_outflow ...........   Passed    1.80 sec
      Start 56: regression_mpi_test:bvals
 8/10 Test #56: regression_mpi_test:bvals .......................   Passed    2.79 sec
      Start 58: regression_mpi_test:poisson
 9/10 Test #58: regression_mpi_test:poisson .....................   Passed    4.72 sec
      Start 60: regression_mpi_test:sparse_advection
10/10 Test #60: regression_mpi_test:sparse_advection ............   Passed   18.31 sec

90% tests passed, 1 tests failed out of 10

Label Time Summary:
cuda          = 134.95 sec*proc (10 tests)
mpi-yes       = 134.95 sec*proc (10 tests)
poisson       =   4.72 sec*proc (1 test)
regression    = 134.95 sec*proc (10 tests)

Total Test time (real) = 134.96 sec

The following tests FAILED:
	 52 - regression_mpi_test:output_hdf5 (Failed)
Errors while running CTest

@pgrete added the bug label on Mar 24, 2022
@pgrete (Collaborator, Author) commented Mar 24, 2022

All tests pass for non-MPI runs.

@pgrete pgrete changed the title Regression test fail on Spock/HIP Regression test fail on Spock/Crusher/Frontier/HIP Oct 27, 2022
@pgrete pgrete pinned this issue Oct 27, 2022
@pgrete (Collaborator, Author) commented Oct 27, 2022

@lroberts36 I think I've identified the root cause of the issue: it appears to be related to sparse variables.

When I run the regression test manually

./example/advection/advection-example -i ../tst/regression/test_suites/advection_convergence/parthinput.advection parthenon/mesh/nx1=80 parthenon/meshblock/nx1=40 parthenon/mesh/nx2=60 parthenon/meshblock/nx2=30 parthenon/mesh/nx3=72 parthenon/meshblock/nx3=36 parthenon/mesh/x1min=-1.5 parthenon/mesh/x1max=1.5 parthenon/mesh/x2min=-0.75 parthenon/mesh/x2max=0.75 parthenon/mesh/x3min=-1.0 parthenon/mesh/x3max=1.0 Advection/vx=3.0 Advection/vy=1.5 Advection/vz=2.0 Advection/profile=smooth_gaussian Advection/amp=1.0

The test will fail at random times (sometimes at cycle 4, sometimes at cycle 153, sometimes during init) with the following error: `:0: Device-side assertion 'Advected not properly initialized.' failed.`
(Soft) disabling sparse makes the code run without problems.

I still don't know what the root cause is, i.e., whether it's a bug in the sparse machinery or a bug/problem with HIP/AMD GPUs.
Note that the issue:

  • happens without MPI
  • happens with fewer optimizations, e.g., -O2 or even -O0
  • happens with launch and API blocking enabled
  • does not happen anymore when there are fewer than 8 blocks in the mesh (I varied the block/nx# above for various combinations; with a split into just 4 blocks, independent of whether in one or two dims, it works)
  • does not happen with pack_size=1 (rather than -1); see the example command after this list
  • does not happen in debug mode
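
For completeness, the pack_size=1 workaround can be applied via a command-line override, assuming pack_size is read from the parthenon/mesh input block like the other overrides above (a sketch for this setup, not a verified command):

./example/advection/advection-example -i ../tst/regression/test_suites/advection_convergence/parthinput.advection parthenon/mesh/nx1=80 parthenon/meshblock/nx1=40 parthenon/mesh/nx2=60 parthenon/meshblock/nx2=30 parthenon/mesh/nx3=72 parthenon/meshblock/nx3=36 parthenon/mesh/x1min=-1.5 parthenon/mesh/x1max=1.5 parthenon/mesh/x2min=-0.75 parthenon/mesh/x2max=0.75 parthenon/mesh/x3min=-1.0 parthenon/mesh/x3max=1.0 Advection/vx=3.0 Advection/vy=1.5 Advection/vz=2.0 Advection/profile=smooth_gaussian Advection/amp=1.0 parthenon/mesh/pack_size=1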

@Yurlungur (Collaborator) commented:

I wonder if this is related to a failure to identify whether or not a pack should be rebuilt. @lroberts36 and I discussed this and a solution at some point. I think there's a version in riot with the fix.

@lroberts36 (Collaborator) commented Nov 1, 2022

Tried to reproduce on a Darwin amd-epyc-gpu node with an MI25 GPU. Compiled and ran using

Currently Loaded Modules:
  1) rocm/4.5.2   2) cmake/3.22.2
  
$ cmake .. -DKokkos_ENABLE_HIP=ON -DCMAKE_CXX_COMPILER=hipcc -DPARTHENON_DISABLE_HDF5=ON -DPARTHENON_DISABLE_MPI=ON -DKokkos_ARCH_VEGA900=ON

$ for X in $(seq 100); do echo RUN $X; ./example/advection/advection-example -i ../tst/regression/test_suites/advection_convergence/parthinput.advection parthenon/mesh/nx1=80 parthenon/meshblock/nx1=40 parthenon/mesh/nx2=60 parthenon/meshblock/nx2=30 parthenon/mesh/nx3=72 parthenon/meshblock/nx3=36 parthenon/mesh/x1min=-1.5 parthenon/mesh/x1max=1.5 parthenon/mesh/x2min=-0.75 parthenon/mesh/x2max=0.75 parthenon/mesh/x3min=-1.0 parthenon/mesh/x3max=1.0 Advection/vx=3.0 Advection/vy=1.5 Advection/vz=2.0 Advection/profile=smooth_gaussian Advection/amp=1.0 |grep ERROR; done

but was unable to reproduce the issue.

@pgrete (Collaborator, Author) commented Nov 1, 2022

I also tried to reproduce on Spock with MI100 and was not able to.

Now the question is whether this is an issue of performance (MI250[X] vs. slower devices), libraries, or hardware...

@pgrete (Collaborator, Author) commented Nov 9, 2022

Alright, "good" news. I got one more data point. On a different machine and a different environment the test similarly fails on MI250X (so we can rule out the OLCF environment).
As before, disabling sparse makes the code work.

@pgrete (Collaborator, Author) commented Nov 9, 2022

Also #699 does not fix the issue.

@pgrete (Collaborator, Author) commented Dec 12, 2022

Alright, here's what went wrong, after a long joint debugging session with @Yurlungur:

The offending kernel was related to checking/setting the allocation status.
A minimal reproducer is:

#include <cmath>
#include <cstddef>
#include <iostream>

#include <Kokkos_Core.hpp>

auto main(int argc, char *argv[]) -> int {
  Kokkos::initialize(argc, argv);
  {
    size_t num_b = 26 * 4;
    Kokkos::View<bool *> nonzero_flags("nonzero_flags", num_b);

    size_t num_idx = 16;

    Kokkos::View<double **> work("work", num_b, num_idx);

    Kokkos::parallel_for(
        "set flags", Kokkos::TeamPolicy<>(num_b, Kokkos::AUTO),
        KOKKOS_LAMBDA(const Kokkos::TeamPolicy<>::member_type &member) {
          const int b = member.league_rank();

          // Every thread in the team resets the flag for its block...
          nonzero_flags(b) = false;

          // ...while the inner loop (whose work may already be done before
          // some team members even reach the reset above) sets it to true.
          Kokkos::parallel_for(Kokkos::TeamThreadRange<>(member, num_idx),
                               [&](const int idx) {
                                 work(b, idx) = 1.0;
                                 if (std::abs(work(b, idx)) >= 0.0) {
                                   nonzero_flags(b) = true;
                                 }
                               });
        });

    auto nonzero_flags_h =
        Kokkos::create_mirror_view_and_copy(Kokkos::HostSpace(), nonzero_flags);

    // Every flag should be true; any false entry means the race was hit.
    for (size_t b = 0; b < num_b; b++) {
      if (!nonzero_flags_h(b)) {
        std::cerr << "HEEEEEEEEEEEEEELP!!!\n";
      }
    }
  }
  Kokkos::finalize();
}
There are actually two issues:

  1. The more obvious one: having multiple threads write nonzero_flags(b) = true; in parallel is technically undefined behavior. In practice, however, this is not the issue here, because for the given kernel we were only interested in whether any thread, at any point, set that variable to true, which seems to work on all architectures (though, again, that may not be guaranteed).
  2. The more subtle one: there may not be enough work in the inner loop for all threads in the team (particularly on devices with high concurrency like the MI250X). A first fraction of the threads in a team may therefore reach the inner loop and finish it; only then do the remaining threads of the team start the outer loop, reset nonzero_flags(b) = false;, and never enter the inner loop (as its work is already done), so the value remains false when the entire team is done. A sketch of a race-free version of the kernel follows.
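
For illustration, here is a minimal sketch of a race-free version of the reproducer kernel (it reuses the views and sizes from the reproducer above and is not the exact patch that landed in #800): per-thread results are combined with a team-level parallel_reduce using a logical-or reducer, and only a single thread per team writes the result back.

    Kokkos::parallel_for(
        "set flags (race-free sketch)", Kokkos::TeamPolicy<>(num_b, Kokkos::AUTO),
        KOKKOS_LAMBDA(const Kokkos::TeamPolicy<>::member_type &member) {
          const int b = member.league_rank();

          // Combine per-thread results with a logical-or reduction instead of
          // letting every thread write directly to global memory.
          bool any_nonzero = false;
          Kokkos::parallel_reduce(
              Kokkos::TeamThreadRange(member, num_idx),
              [&](const int idx, bool &lany) {
                work(b, idx) = 1.0;
                if (std::abs(work(b, idx)) >= 0.0) lany = true;
              },
              Kokkos::LOr<bool>(any_nonzero));

          // The reduced value is available to the whole team; exactly one
          // thread per team writes it back.
          Kokkos::single(Kokkos::PerTeam(member),
                         [&]() { nonzero_flags(b) = any_nonzero; });
        });

This sidesteps issue 1 (one well-defined write per team) and issue 2 (there is no team-wide flag reset that late-arriving threads could perform after the inner loop has already finished).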

pgrete added a commit that referenced this issue Dec 12, 2022
pgrete added a commit that referenced this issue Dec 15, 2022
…tus) (#800)

* Fixes #659

* Replace lambda with if

* Typo: BOr to LOr

* Apply suggestions from code review

Co-authored-by: Luke Roberts <lfroberts@lanl.gov>
Co-authored-by: Jonah Miller <jonah.maxwell.miller@gmail.com>

* format

* Fix dealloc

Co-authored-by: Luke Roberts <lfroberts@lanl.gov>
Co-authored-by: Jonah Miller <jonah.maxwell.miller@gmail.com>
@pgrete pgrete reopened this Dec 16, 2022
@pgrete (Collaborator, Author) commented Dec 16, 2022

Serial runs seem to work now. The test suite passed 10 times in a row:

Test project /gpfs/alpine/ast146/world-shared/crusher-debug/parthenon/build-frontier-fix-next-nompi
      Start 45: regression_test:particle_leapfrog
 1/10 Test #45: regression_test:particle_leapfrog ...........   Passed    1.25 sec
      Start 46: regression_test:particle_leapfrog_outflow
 2/10 Test #46: regression_test:particle_leapfrog_outflow ...   Passed    0.75 sec
      Start 47: regression_test:restart
 3/10 Test #47: regression_test:restart .....................   Passed   43.83 sec
      Start 48: regression_test:calculate_pi
 4/10 Test #48: regression_test:calculate_pi ................   Passed    0.77 sec
      Start 49: regression_test:advection_convergence
 5/10 Test #49: regression_test:advection_convergence .......   Passed   53.88 sec
      Start 50: regression_test:output_hdf5
 6/10 Test #50: regression_test:output_hdf5 .................   Passed   26.22 sec
      Start 51: regression_test:advection_outflow
 7/10 Test #51: regression_test:advection_outflow ...........   Passed    1.80 sec
      Start 52: regression_test:bvals
 8/10 Test #52: regression_test:bvals .......................   Passed    1.92 sec
      Start 53: regression_test:poisson
 9/10 Test #53: regression_test:poisson .....................   Passed    2.58 sec
      Start 54: regression_test:sparse_advection
10/10 Test #54: regression_test:sparse_advection ............   Passed   28.42 sec

100% tests passed, 0 tests failed out of 10

Label Time Summary:
poisson       =   2.58 sec*proc (1 test)
regression    = 161.41 sec*proc (10 tests)
serial        = 161.41 sec*proc (10 tests)

Total Test time (real) = 161.42 sec

With MPI, less so:

$ ctest -L regression -L mpi -LE perf-reg
Test project /gpfs/alpine/ast146/world-shared/crusher-debug/parthenon/build-frontier-fix-next-mpi-rocm54
      Start 47: regression_mpi_test:particle_leapfrog
 1/10 Test #47: regression_mpi_test:particle_leapfrog ...........***Failed    2.25 sec
      Start 49: regression_mpi_test:particle_leapfrog_outflow
 2/10 Test #49: regression_mpi_test:particle_leapfrog_outflow ...***Failed    2.13 sec
      Start 51: regression_mpi_test:restart
 3/10 Test #51: regression_mpi_test:restart .....................   Passed   25.15 sec
      Start 53: regression_mpi_test:calculate_pi
 4/10 Test #53: regression_mpi_test:calculate_pi ................   Passed    2.14 sec
      Start 55: regression_mpi_test:advection_convergence
 5/10 Test #55: regression_mpi_test:advection_convergence .......***Failed   55.81 sec
      Start 57: regression_mpi_test:output_hdf5
 6/10 Test #57: regression_mpi_test:output_hdf5 .................***Failed   15.51 sec
      Start 59: regression_mpi_test:advection_outflow
 7/10 Test #59: regression_mpi_test:advection_outflow ...........   Passed    2.73 sec
      Start 61: regression_mpi_test:bvals
 8/10 Test #61: regression_mpi_test:bvals .......................   Passed    6.57 sec
      Start 63: regression_mpi_test:poisson
 9/10 Test #63: regression_mpi_test:poisson .....................   Passed    5.09 sec
      Start 65: regression_mpi_test:sparse_advection
10/10 Test #65: regression_mpi_test:sparse_advection ............***Failed   11.33 sec

50% tests passed, 5 tests failed out of 10

Label Time Summary:
cuda          = 128.72 sec*proc (10 tests)
mpi           = 128.72 sec*proc (10 tests)
poisson       =   5.09 sec*proc (1 test)
regression    = 128.72 sec*proc (10 tests)

Total Test time (real) = 128.73 sec

The following tests FAILED:
	 47 - regression_mpi_test:particle_leapfrog (Failed)
	 49 - regression_mpi_test:particle_leapfrog_outflow (Failed)
	 55 - regression_mpi_test:advection_convergence (Failed)
	 57 - regression_mpi_test:output_hdf5 (Failed)
	 65 - regression_mpi_test:sparse_advection (Failed)
Errors while running CTest

@pgrete (Collaborator, Author) commented Dec 16, 2022

https://github.com/parthenon-hpc-lab/parthenon/blob/develop/src/interface/update.cpp#L162

  Kokkos::parallel_for(
      "SparseDealloc",
      Kokkos::TeamPolicy<>(parthenon::DevExecSpace(), pack.GetNBlocks(), Kokkos::AUTO),
      KOKKOS_LAMBDA(parthenon::team_mbr_t team_member) {
        const int b = team_member.league_rank();

        const int lo = pack.GetLowerBound(b);
        const int hi = pack.GetUpperBound(b);

        for (int v = lo; v <= hi; ++v) {
          const auto &var = pack(b, v);
          const Real threshold = var.deallocation_threshold;
          bool all_zero = true;
          Kokkos::parallel_reduce(
              Kokkos::TeamThreadRange<>(team_member, NkNjNi),
              [&](const int idx, bool &lall_zero) {
                const int k = kb.s + idx / NjNi;
                const int j = jb.s + (idx % NjNi) / Ni;
                const int i = ib.s + idx % Ni;
                if (std::abs(var(k, j, i)) > threshold) {
                  lall_zero = false;
                  return;
                }
              },
              Kokkos::LAnd<bool, DevMemSpace>(all_zero));
          Kokkos::single(Kokkos::PerTeam(team_member),
                         [&]() { is_zero(b, v) = all_zero; });
        }
      });

@lroberts36 what does this check mean for vector or tensor sparse variables (as the fourth index is extracted from a pack, but the threshold is for the var)?

@pgrete (Collaborator, Author) commented Dec 16, 2022

Good news! It looks like this was actually a (since fixed) Kokkos bug.
After some back and forth, updating the Kokkos submodule to 3.7.01 now also makes the tests pass on Crusher with MPI enabled.
I'll open a PR next week (including a machine file for Crusher and Frontier).
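
For reference, bumping the submodule locally amounts to something like the following standard git steps (assuming the Kokkos submodule lives at external/Kokkos, which may differ in your checkout):

cd external/Kokkos
git fetch --tags
git checkout 3.7.01
cd ../..
git add external/Kokkos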

Independently, the question above on the SparseDealloc kernel remains open.

@lroberts36 (Collaborator) commented:

@pgrete: I believe this should work as expected (i.e., if any component of a vector- or tensor-valued field is above the threshold, it should not trigger deallocation). This loop checks every (scalar/vector/tensor) element of the control fields for being above zero and stores the result in is_zero. Then, in the next set of loops, we iterate over all components of a given variable and check that is_zero_h is true for all of them before incrementing the deallocation counter. A small self-contained sketch of that host-side check follows.
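
To make that check explicit, here is a tiny self-contained sketch of the host-side logic (the function name and the flat is_zero_h layout are hypothetical, not Parthenon's actual API):

#include <vector>

// Hypothetical helper: a variable is only a deallocation candidate if every
// one of its components (pack indices lo..hi) has been flagged as all-zero.
inline bool AllComponentsZero(const std::vector<bool> &is_zero_h, int lo, int hi) {
  for (int v = lo; v <= hi; ++v) {
    if (!is_zero_h[v]) return false;  // any above-threshold component blocks deallocation
  }
  return true;
}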

@pgrete (Collaborator, Author) commented Jan 17, 2023

Fixed in #805

@pgrete pgrete closed this as completed Jan 17, 2023
@pgrete pgrete linked a pull request Jan 17, 2023 that will close this issue
7 tasks
@pgrete pgrete unpinned this issue Jan 17, 2023