This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Creating weights from multiple threads/processes fails (ESMC_GridCreateNoPeriDim) #141
Comments
@rokuingh can you please take a look?
I just tried to reproduce this issue but was unable to install one of the dependencies, pooch, because of a number of sub-dependency conflicts. The rc code 545 generally indicates unmapped points. However, the log file has a message about a VM not being available, which is a much more serious issue and is most likely caused by something outside of ESMPy.
@rokuingh - if you have access to a conda (or mamba) installation, this command will get you an exact replica of the environment this test was made in:
@jhamman We are happy to provide support for ESMPy, but we don't currently have the resources (or expertise) to support the additional layers within xESMF. Could you please provide an ESMPy-only reproducer that exhibits the issue? We can definitely work from that.
@rsdunlapiv and @rokuingh - here's a reproducer that does not use xesmf or xarray.

import ESMF
import numpy as np
import dask

@dask.delayed
def make_grid(shape):
    g = ESMF.Grid(
        np.array(shape),
        staggerloc=ESMF.StaggerLoc.CENTER,
        coord_sys=ESMF.CoordSys.SPH_DEG,
        num_peri_dims=None,  # without this, ESMF seems to segfault (clue?)
    )
    return g

tasks = [make_grid((59, 87)), make_grid((60, 88))]

# this works
dask.compute(tasks, scheduler='single-threaded')

# this fails
dask.compute(tasks, scheduler='threads')
@jhamman one question is whether we would even expect the second one to work. I am not sure of the semantics of Dask 'single-threaded' versus 'threads'. In general, ESMF is going to be multi-process, but not multi-threaded. So Dask would need to give ESMF/ESMPy multiple MPI processes in order to run in parallel. Can this be done through Dask?
@rsdunlapiv - I'm not sure if this should work or not. If it's not supposed to work, it may be nice to put a thread lock in place or, alternatively, raise a more informative error. Also, I think it would be good to restate my intended parallel behavior here. I want to generate regridding weights for two datasets in parallel. I do not want ESMF to do anything in parallel (or with MPI). I will say that my first example (most important) does work if called with the ...
/srv/conda/envs/notebook/lib/python3.9/site-packages/cloudpickle/cloudpickle_fast.py in dump()
600 def dump(self, obj):
601 try:
--> 602 return Pickler.dump(self, obj)
603 except RuntimeError as e:
604 if "recursion" in e.args[0]:
ValueError: ctypes objects containing pointers cannot be pickled
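The `ValueError` in the traceback above is raised by `ctypes` itself whenever an object holding a C pointer is pickled, which is what happens when such an object has to travel between processes. A minimal sketch of one way around it, assuming the worker can return plain data instead of the pointer-backed object: `Handle` and `build_and_summarize` are hypothetical stand-ins for illustration, not ESMPy or xESMF APIs.

```python
import ctypes
import pickle


class Handle:
    """Hypothetical stand-in for an object that wraps a C pointer."""

    def __init__(self, shape):
        self.shape = tuple(shape)
        # A ctypes pointer, similar in kind to what a C-backed wrapper may hold.
        self._ptr = ctypes.pointer(ctypes.c_int(0))


def build_and_summarize(shape):
    """Create the handle inside the worker and return only plain data."""
    handle = Handle(shape)
    return {"shape": handle.shape, "size": handle.shape[0] * handle.shape[1]}


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (ValueError, TypeError, pickle.PicklingError):
        return False


print(is_picklable(Handle((59, 87))))               # pointer-backed object
print(is_picklable(build_and_summarize((59, 87))))  # plain dict of Python values
```

The design choice here is to keep the pointer-backed object local to the task that created it and to ship only ordinary Python values (or arrays) back to the caller. Whether that is enough for a given regridding workflow depends on what the caller needs from the result.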
In case others are having this same problem and find this thread as I did: I was getting this same error and log message ( from dask.distributed import Client
client = Client()
What happened:
When using xesmf inside a parallel framework, an opaque error is raised. I've observed this behavior using dask's threaded and distributed schedulers.
What you expected to happen:
I expected to be able to use xesmf within multiple processes. Or, if this is not possible, a descriptive error and/or documentation on the subject.
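One option raised in the comments is a thread lock around the calls into the library. The sketch below shows the general pattern with a module-level `threading.Lock` and a decorator. It uses `concurrent.futures.ThreadPoolExecutor` in place of Dask's threaded scheduler so that it runs with the standard library alone, and `make_grid_placeholder` is a hypothetical stand-in for the real grid-creation call. A lock only guarantees that calls do not overlap; this thread does not establish whether ESMF calls succeed at all when issued from a thread other than the one that initialized the library.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from functools import wraps

# One lock shared by every wrapped function in this module.
_LIBRARY_LOCK = threading.Lock()


def serialized(func):
    """Allow only one thread at a time to run ``func``."""

    @wraps(func)
    def wrapper(*args, **kwargs):
        with _LIBRARY_LOCK:
            return func(*args, **kwargs)

    return wrapper


# Counters used only to observe how many calls overlap.
_active = 0
max_active = 0


@serialized
def make_grid_placeholder(shape):
    """Hypothetical stand-in for a call into a non-thread-safe library."""
    global _active, max_active
    _active += 1
    max_active = max(max_active, _active)
    result = shape[0] * shape[1]
    _active -= 1
    return result


with ThreadPoolExecutor(max_workers=4) as pool:
    sizes = list(pool.map(make_grid_placeholder, [(59, 87), (60, 88)] * 10))

print(max_active)  # number of calls that were ever inside the function at once
```

Because every call holds the same lock, the wrapped function never runs concurrently, at the cost of giving up any parallel speedup for that step.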
Minimal Complete Verifiable Example:
This simple example is just a slightly modified version of the basic example from the xesmf docs.
The traceback is here:
The ESMF_LogFile includes the following lines:
Anything else we need to know?:
xref: JiaweiZhuang/xESMF#88
Environment:
Output of xr.show_versions() + xesmf + esmf
INSTALLED VERSIONS
commit: None
python: 3.9.7 | packaged by conda-forge | (default, Sep 29 2021, 19:20:46)
[GCC 9.4.0]
python-bits: 64
OS: Linux
OS-release: 5.4.0-1062-azure
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.12.1
libnetcdf: 4.8.1
xarray: 0.20.1
pandas: 1.3.4
numpy: 1.21.4
scipy: 1.7.3
netCDF4: 1.5.8
pydap: installed
h5netcdf: 0.11.0
h5py: 3.6.0
Nio: None
zarr: 2.10.3
cftime: 1.5.1.1
nc_time_axis: 1.4.0
PseudoNetCDF: None
rasterio: 1.2.10
cfgrib: None
iris: None
bottleneck: 1.3.2
dask: 2021.10.0
distributed: 2021.10.0
matplotlib: 3.5.0
cartopy: 0.20.1
seaborn: 0.11.2
numbagg: None
fsspec: 2021.11.1
cupy: None
pint: None
sparse: 0.13.0
setuptools: 59.4.0
pip: 21.3.1
conda: None
pytest: 6.2.5
IPython: 7.30.1
sphinx: None
xesmf: 0.6.2
ESMF: 8.2.0
cc @rokuingh, @norlandrhagen, @theurich