parallel runs fail for nozzle driver #280

anderson2981 · 2021-03-09T16:30:16Z

The particular driver used to generate the error lives in the startup_ramp_euler subdirectory and requires the y1_production branch to run correctly. The issue appears to be linked to incorrectly setting (or processing) boundary normals on ranks that have no points on a particular boundary.

The driver fails to run in parallel with the following error:

`ValueError: DOFArray objects in binary operator must have same length, got 1 and 0
Traceback (most recent call last):
  File "/usr/WS1/xpacc/Users/manders/software/Install/Lassen/Conda/envs/mirgeDriver.Y1nozzle/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/WS1/xpacc/Users/manders/software/Install/Lassen/Conda/envs/mirgeDriver.Y1nozzle/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/WS1/xpacc/Users/manders/software/Install/Lassen/Conda/envs/mirgeDriver.Y1nozzle/lib/python3.8/site-packages/mpi4py/__main__.py", line 7, in <module>
    main()
  File "/usr/WS1/xpacc/Users/manders/software/Install/Lassen/Conda/envs/mirgeDriver.Y1nozzle/lib/python3.8/site-packages/mpi4py/run.py", line 196, in main
    run_command_line(args)
  File "/usr/WS1/xpacc/Users/manders/software/Install/Lassen/Conda/envs/mirgeDriver.Y1nozzle/lib/python3.8/site-packages/mpi4py/run.py", line 47, in run_command_line
    run_path(sys.argv[0], run_name='__main__')
  File "/usr/WS1/xpacc/Users/manders/software/Install/Lassen/Conda/envs/mirgeDriver.Y1nozzle/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/usr/WS1/xpacc/Users/manders/software/Install/Lassen/Conda/envs/mirgeDriver.Y1nozzle/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/WS1/xpacc/Users/manders/software/Install/Lassen/Conda/envs/mirgeDriver.Y1nozzle/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "./nozzle.py", line 565, in <module>
    main(use_profiling=use_profiling,use_logmgr=use_logging)
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/mirgecom/mirgecom/mpi.py", line 154, in wrapped_func
    func(*args, **kwargs)
  File "./nozzle.py", line 525, in main
    advance_state(rhs=my_rhs, timestepper=timestepper,
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/mirgecom/mirgecom/steppers.py", line 85, in advance_state
    state = timestepper(state=state, t=t, dt=dt, rhs=rhs)
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/mirgecom/mirgecom/integrators/lsrk.py", line 67, in euler_step
    return lsrk_step(EulerCoefs, state, t, dt, rhs)
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/mirgecom/mirgecom/integrators/lsrk.py", line 53, in lsrk_step
    k = coefs.A[i]*k + dt*rhs(t + coefs.C[i]*dt, state)
  File "./nozzle.py", line 495, in my_rhs
    return ( inviscid_operator(discr, q=state, t=t,boundaries=boundaries, eos=eos)
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/mirgecom/mirgecom/euler.py", line 324, in inviscid_operator
    domain_boundary_flux = sum(
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/mirgecom/mirgecom/euler.py", line 325, in <genexpr>
    _facial_flux(
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/mirgecom/mirgecom/euler.py", line 271, in _facial_flux
    flux_avg @ normal
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/meshmode/meshmode/dof_array.py", line 269, in __mul__
    def __mul__(self, arg): return self._bop(op.mul, self, arg)  # noqa: E704
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/meshmode/meshmode/dof_array.py", line 252, in _bop
    raise ValueError("DOFArray objects in binary operator must have "
ValueError: DOFArray objects in binary operator must have same length, got 1 and 0`


MTCam · 2021-03-09T16:45:32Z

The issue appears to be related to whether a given partition has points on the boundary. I'm trying to come up with a small example that runs out of ~~master~~ main.

MTCam · 2021-03-09T19:08:38Z

On a partition that does not own part of a given boundary, the following triggers the error:

  File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge/mirgecom/mirgecom/euler.py", line 274, in _facial_flux
    flux_avg @ normal
  File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge/meshmode/meshmode/dof_array.py", line 269, in __mul__
building face restriction: done
    def __mul__(self, arg): return self._bop(op.mul, self, arg)  # noqa: E704
  File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge/meshmode/meshmode/dof_array.py", line 252, in _bop
    raise ValueError("DOFArray objects in binary operator must have "
ValueError: DOFArray objects in binary operator must have same length, got 1 and 0
flux_avg=array([[DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),))],
       [DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),))],
       [DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),))],
       [DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),))],
       [DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),))]],
      dtype=object)
normal=array([DOFArray(()), DOFArray(()), DOFArray(())], dtype=object)
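The printout shows the mismatch directly: each entry of flux_avg holds one element group with zero elements (length 1), while each entry of normal holds no groups at all (length 0). A minimal sketch of that failure mode follows. `ToyDOFArray` and `try_multiply` are hypothetical stand-ins written for illustration, not meshmode's actual implementation; only the error message is taken from the traceback.

```python
import operator

import numpy as np


class ToyDOFArray:
    """Hypothetical stand-in for a DOFArray: a tuple of per-group arrays."""

    def __init__(self, groups):
        self.groups = tuple(groups)

    def __len__(self):
        # Length is the number of element groups, not the number of elements.
        return len(self.groups)

    def _bop(self, op, other):
        # Mimics the check seen in the traceback: group counts must match.
        if len(self) != len(other):
            raise ValueError(
                "DOFArray objects in binary operator must have "
                f"same length, got {len(self)} and {len(other)}")
        return ToyDOFArray(op(a, b) for a, b in zip(self.groups, other.groups))

    def __mul__(self, other):
        return self._bop(operator.mul, other)


def try_multiply(a, b):
    """Return the group count of a * b, or the error message on failure."""
    try:
        return len(a * b)
    except ValueError as exc:
        return str(exc)


empty_group = np.empty((0, 3))            # zero elements, three nodes each
flux_entry = ToyDOFArray([empty_group])   # one group with zero elements
bad_normal = ToyDOFArray([])              # zero groups, as in the printout
good_normal = ToyDOFArray([empty_group])  # same structure as flux_entry

print(try_multiply(flux_entry, good_normal))
print(try_multiply(flux_entry, bad_normal))
```

Under these assumptions, the product only works when the empty normal still carries one (empty) group per element group, which is the structure flux_avg has on the same rank.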

MTCam · 2021-03-09T19:26:25Z

This snippet in @anderson2981's nozzle driver fixes the issue:

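    # Keep only the boundaries that have nodes on this rank.
    local_boundaries = {}
    for btag in boundaries:
        bnd_discr = discr.discr_from_dd(btag)
        bnd_nodes = thaw(actx, bnd_discr.nodes())
        # bnd_nodes[0][0] is the x-coordinate array of the first element group;
        # its leading dimension is the number of boundary elements on this rank.
        if bnd_nodes[0][0].shape[0] > 0:
            local_boundaries[btag] = boundaries[btag]
    boundaries = local_boundaries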

It just removes the boundaries that have no points on the local rank. After chatting with @majosm, it seems the right fix may be to have `normal = thaw(actx, discr.normal(dd))` return the proper structure when the boundary is empty on a rank. See the post above, where flux_avg and normal are printed for a failing case.
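To make the effect of the workaround concrete, here is a small standalone sketch of the same filtering idea using plain dictionaries. The function `keep_local_boundaries`, the boundary tags, and the node counts are all hypothetical and chosen for illustration; in the driver the count comes from the thawed boundary nodes.

```python
def keep_local_boundaries(boundaries, local_node_counts):
    """Return only the boundaries that have at least one node on this rank.

    Both arguments are hypothetical dicts keyed by boundary tag.
    """
    return {
        btag: bnd
        for btag, bnd in boundaries.items()
        if local_node_counts.get(btag, 0) > 0
    }


# Made-up data: this rank owns no part of the "outflow" boundary.
boundaries = {"inflow": "inflow_bc", "outflow": "outflow_bc", "wall": "wall_bc"}
rank_node_counts = {"inflow": 12, "outflow": 0, "wall": 40}

local = keep_local_boundaries(boundaries, rank_node_counts)
print(sorted(local))
```

Each rank builds its own filtered dictionary, so a boundary dropped on one rank can still be handled on the ranks that do own part of it.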

MTCam · 2021-03-15T21:45:46Z

Fixed by inducer/grudge#56.

anderson2981 assigned MTCam Mar 9, 2021

MTCam mentioned this issue Mar 10, 2021

Boundary bug snippet #281

Closed

w-hagen added a commit to w-hagen/mirgecom that referenced this issue Mar 15, 2021

Fix some style issues and add temporary fix for Issue illinois-ceesd#280

c1cd479

MTCam mentioned this issue Mar 15, 2021

Empty state mismatch inducer/grudge#54

Closed

MTCam closed this as completed Mar 15, 2021
