parallel runs fail for nozzle driver #280

Closed
anderson2981 opened this issue Mar 9, 2021 · 4 comments

anderson2981 commented Mar 9, 2021

The nozzle driver fails when run in parallel.

The particular driver used to generate the error lives in the startup_ramp_euler subdirectory and requires the y1_production branch to run correctly. The issue appears to be linked to boundary normals being set (or processed) incorrectly on ranks that have no points on a particular boundary. The error is:

`ValueError: DOFArray objects in binary operator must have same length, got 1 and 0
Traceback (most recent call last):
  File "/usr/WS1/xpacc/Users/manders/software/Install/Lassen/Conda/envs/mirgeDriver.Y1nozzle/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/WS1/xpacc/Users/manders/software/Install/Lassen/Conda/envs/mirgeDriver.Y1nozzle/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/usr/WS1/xpacc/Users/manders/software/Install/Lassen/Conda/envs/mirgeDriver.Y1nozzle/lib/python3.8/site-packages/mpi4py/__main__.py", line 7, in <module>
    main()
  File "/usr/WS1/xpacc/Users/manders/software/Install/Lassen/Conda/envs/mirgeDriver.Y1nozzle/lib/python3.8/site-packages/mpi4py/run.py", line 196, in main
    run_command_line(args)
  File "/usr/WS1/xpacc/Users/manders/software/Install/Lassen/Conda/envs/mirgeDriver.Y1nozzle/lib/python3.8/site-packages/mpi4py/run.py", line 47, in run_command_line
    run_path(sys.argv[0], run_name='__main__')
  File "/usr/WS1/xpacc/Users/manders/software/Install/Lassen/Conda/envs/mirgeDriver.Y1nozzle/lib/python3.8/runpy.py", line 265, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/usr/WS1/xpacc/Users/manders/software/Install/Lassen/Conda/envs/mirgeDriver.Y1nozzle/lib/python3.8/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/WS1/xpacc/Users/manders/software/Install/Lassen/Conda/envs/mirgeDriver.Y1nozzle/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "./nozzle.py", line 565, in <module>
    main(use_profiling=use_profiling,use_logmgr=use_logging)
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/mirgecom/mirgecom/mpi.py", line 154, in wrapped_func
    func(*args, **kwargs)
  File "./nozzle.py", line 525, in main
    advance_state(rhs=my_rhs, timestepper=timestepper,
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/mirgecom/mirgecom/steppers.py", line 85, in advance_state
    state = timestepper(state=state, t=t, dt=dt, rhs=rhs)
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/mirgecom/mirgecom/integrators/lsrk.py", line 67, in euler_step
    return lsrk_step(EulerCoefs, state, t, dt, rhs)
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/mirgecom/mirgecom/integrators/lsrk.py", line 53, in lsrk_step
    k = coefs.A[i]*k + dt*rhs(t + coefs.C[i]*dt, state)
  File "./nozzle.py", line 495, in my_rhs
    return ( inviscid_operator(discr, q=state, t=t,boundaries=boundaries, eos=eos)
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/mirgecom/mirgecom/euler.py", line 324, in inviscid_operator
    domain_boundary_flux = sum(
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/mirgecom/mirgecom/euler.py", line 325, in <genexpr>
    _facial_flux(
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/mirgecom/mirgecom/euler.py", line 271, in _facial_flux
    flux_avg @ normal
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/meshmode/meshmode/dof_array.py", line 269, in __mul__
    def __mul__(self, arg): return self._bop(op.mul, self, arg)  # noqa: E704
  File "/p/gpfs1/manders/work/CEESD/Drivers/CEESD-Y1_nozzle/emirge/meshmode/meshmode/dof_array.py", line 252, in _bop
    raise ValueError("DOFArray objects in binary operator must have "
ValueError: DOFArray objects in binary operator must have same length, got 1 and 0`

MTCam commented Mar 9, 2021

The issue appears to be related to whether a given partition has points on the boundary. I'm trying to come up with a small reproducer that runs against the main branch.
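
To illustrate how a partition can end up with no points on a boundary, here is a minimal sketch in plain Python. It is not mirgecom or meshmode code: the 1D strip of elements, the contiguous block partition, the boundary names "inflow" and "outflow", and the function name are all assumptions made for illustration.

```python
def boundary_faces_per_rank(num_elements, num_ranks):
    """Count the boundary faces each rank owns on a 1D strip of elements.

    Hypothetical setup: elements 0..num_elements-1 sit in a row, the
    "inflow" boundary is the left face of element 0, and the "outflow"
    boundary is the right face of the last element.
    """
    per_rank = []
    for rank in range(num_ranks):
        # Contiguous block partition: this rank owns elements [start, stop).
        start = rank * num_elements // num_ranks
        stop = (rank + 1) * num_elements // num_ranks
        owned = range(start, stop)
        per_rank.append({
            "inflow": 1 if 0 in owned else 0,
            "outflow": 1 if (num_elements - 1) in owned else 0,
        })
    return per_rank


for rank, face_counts in enumerate(boundary_faces_per_rank(8, 4)):
    print(rank, face_counts)
```

With 8 elements on 4 ranks, only the first and last ranks own a boundary face; the two interior ranks own none, which is the situation in which the driver fails.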


MTCam commented Mar 9, 2021

On a partition that does not own part of a given boundary, the following triggers the error:

File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge/mirgecom/mirgecom/euler.py", line 274, in _facial_flux
    flux_avg @ normal
  File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge/meshmode/meshmode/dof_array.py", line 269, in __mul__
building face restriction: done
    def __mul__(self, arg): return self._bop(op.mul, self, arg)  # noqa: E704
  File "/p/gpfs1/mtcampbe/CEESD/AutomatedTesting/MIRGE-Timing/timing/emirge/meshmode/meshmode/dof_array.py", line 252, in _bop
    raise ValueError("DOFArray objects in binary operator must have "
ValueError: DOFArray objects in binary operator must have same length, got 1 and 0
flux_avg=array([[DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),))],
       [DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),))],
       [DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),))],
       [DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),))],
       [DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),)),
        DOFArray((cl.Array([], shape=(0, 3), dtype=float64),))]],
      dtype=object)
normal=array([DOFArray(()), DOFArray(()), DOFArray(())], dtype=object)
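
The printout shows that each flux_avg entry holds one group with zero elements, while each normal entry holds no groups at all. The toy class below mimics that mismatch. It is a stand-in written for this illustration, not meshmode's DOFArray; it assumes that the "length" in the error message is the number of groups, which is consistent with the "got 1 and 0" above.

```python
import operator


class ToyDOFArray:
    """Hypothetical stand-in for a DOFArray: a tuple of per-group data.

    Each group is a list of per-element rows. An empty list models a group
    with zero elements, like the shape-(0, 3) arrays in the printout.
    """

    def __init__(self, groups):
        self.groups = tuple(groups)

    def __len__(self):
        # The container's length is its number of groups,
        # not its number of elements or nodes.
        return len(self.groups)

    def _bop(self, op, other):
        # Refuse to combine containers with different group counts.
        if len(self) != len(other):
            raise ValueError(
                "DOFArray objects in binary operator must have "
                f"same length, got {len(self)} and {len(other)}")
        new_groups = []
        for grp_s, grp_o in zip(self.groups, other.groups):
            new_groups.append([
                [op(a, b) for a, b in zip(row_s, row_o)]
                for row_s, row_o in zip(grp_s, grp_o)])
        return ToyDOFArray(new_groups)

    def __mul__(self, other):
        return self._bop(operator.mul, other)


flux_component = ToyDOFArray([[]])   # one group, zero elements
normal_component = ToyDOFArray([])   # no groups at all

try:
    flux_component * normal_component
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Both operands are "empty" in the sense of holding no data, but the group counts differ, so the product is rejected before any arithmetic happens.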


MTCam commented Mar 9, 2021

Adding this snippet to @anderson2981's nozzle driver works around the issue:

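    # Keep only the boundaries that have nodes on this rank.
    local_boundaries = {}
    for btag in boundaries:
        bnd_discr = discr.discr_from_dd(btag)
        bnd_nodes = thaw(actx, bnd_discr.nodes())
        # bnd_nodes[0] holds the x-coordinates; its first group has one row
        # per boundary element, so zero rows means nothing is local.
        if bnd_nodes[0][0].shape[0] > 0:
            local_boundaries[btag] = boundaries[btag]
    boundaries = local_boundaries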

It simply drops the boundaries that have no points on the local rank. After chatting with @majosm, it seems the right fix may be to have `normal = thaw(actx, discr.normal(dd))` return the proper structure, i.e. one (empty) group per boundary group rather than no groups at all. See the post above, where flux_avg and normal are printed for a failing case.
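
As a rough sketch of what "the proper structure" could mean, the plain-Python example below pads a group-less normal so that it has one empty group per flux group, after which the group-wise product goes through and yields an empty result. This is only an illustration under stated assumptions: the tuple-of-lists representation and the names make_dofs, multiply, and normal_with_matching_groups are hypothetical, and this is not the change made in grudge.

```python
def make_dofs(groups):
    """Toy DOFArray stand-in: a tuple of groups, each a list of element rows."""
    return tuple(list(grp) for grp in groups)


def multiply(a, b):
    """Group-wise product that requires equal group counts."""
    if len(a) != len(b):
        raise ValueError(f"group count mismatch, got {len(a)} and {len(b)}")
    return tuple(
        [[x * y for x, y in zip(row_a, row_b)]
         for row_a, row_b in zip(grp_a, grp_b)]
        for grp_a, grp_b in zip(a, b))


def normal_with_matching_groups(normal, template):
    """Give a group-less normal one empty group per (empty) template group."""
    if len(normal) == 0 and all(len(grp) == 0 for grp in template):
        return tuple([] for _ in template)
    return normal


flux_component = make_dofs([[]])   # one group, zero elements
normal_component = make_dofs([])   # zero groups (the failing case)

fixed_normal = normal_with_matching_groups(normal_component, flux_component)
product = multiply(flux_component, fixed_normal)
print(len(fixed_normal), product)
```

With matching group counts, the empty boundary contributes an empty flux instead of raising, so the driver would not need to filter its boundaries per rank.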


MTCam commented Mar 15, 2021

Fixed by inducer/grudge#56.

MTCam closed this as completed Mar 15, 2021