Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New library for handling complex numbers in netCDF #8288

Closed
ZedThree opened this issue Oct 9, 2023 · 7 comments
Closed

New library for handling complex numbers in netCDF #8288

ZedThree opened this issue Oct 9, 2023 · 7 comments

Comments

@ZedThree
Copy link
Contributor

ZedThree commented Oct 9, 2023

Is your feature request related to a problem?

Complex numbers are not natively handled by netCDF, and currently only the h5netcdf engine in xarray can handle them. However, it produces files unreadable by current versions of netCDF. There are also many different conventions, and h5netcdf can only read a few of them.

Describe the solution you'd like

I've written a drop-in extension to netCDF that seamlessly handles complex numbers, and can read many different conventions.

The Python API is built on top of netCDF4-python, so only a very simple change to xarray is needed to use it:

modified   xarray/backends/api.py
@@ -115,7 +115,10 @@ def _get_default_engine_gz() -> Literal["scipy"]:
 def _get_default_engine_netcdf() -> Literal["netcdf4", "scipy"]:
     engine: Literal["netcdf4", "scipy"]
     try:
-        import netCDF4  # noqa: F401
+        try:
+            import nc_complex  # noqa: F401
+        except ImportError:
+            import netCDF4  # noqa: F401
 
         engine = "netcdf4"
     except ImportError:  # pragma: no cover
modified   xarray/backends/netCDF4_.py
@@ -327,7 +327,10 @@ class NetCDF4DataStore(WritableCFDataStore):
     def __init__(
         self, manager, group=None, mode=None, lock=NETCDF4_PYTHON_LOCK, autoclose=False
     ):
-        import netCDF4
+        try:
+            import nc_complex as netCDF4
+        except ImportError:
+            import netCDF4
 
         if isinstance(manager, netCDF4.Dataset):
             if group is None:
@@ -364,7 +367,10 @@ class NetCDF4DataStore(WritableCFDataStore):
         lock_maker=None,
         autoclose=False,
     ):
-        import netCDF4
+        try:
+            import nc_complex as netCDF4
+        except ImportError:
+            import netCDF4
 
         if isinstance(filename, os.PathLike):
             filename = os.fspath(filename)

and then it works out of the box:

import numpy as np
import xarray as xr

complex_array = np.array([0 + 0j, 1 + 0j, 0 + 1j, 1 + 1j, 0.25 + 0.75j], dtype="c16")

ds = xr.Dataset({"complex_data": ("x", complex_array)})
ds.to_netcdf("test.nc")

with xr.open_dataset("test.nc") as df:
    print(df.complex_data)
    assert ds == df

# <xarray.DataArray 'complex_data' (x: 5)>
# [5 values with dtype=complex128]
# Dimensions without coordinates: x

If the library is built against the latest main of netCDF-C, then it also supports files written by h5netcdf. And while the package isn't on PyPI yet, built wheels with this feature are available from CI jobs.

nc_complex is not quite ready yet, but I wanted to gauge interest for adding it as an optional replacement for netCDF4 in xarray.

Describe alternatives you've considered

No response

Additional context

No response

@dcherian
Copy link
Contributor

Hi @ZedThree this looks like a great improvement.

We've built out infrastructure that would allow you to pass engine="nc_complex" to open_*dataset. See https://docs.xarray.dev/en/latest/internals/how-to-add-new-backend.html and https://tutorial.xarray.dev/advanced/backends/backends.html

Can you give that a try and let us know how it goes?

@ZedThree
Copy link
Contributor Author

Thanks @dcherian, I've had a look and implemented my own backend. It does work fine, but unfortunately it takes about 170 lines as I need to duplicate a lot of the existing netcdf4 backend in order to change two lines. I'm worried I'd need to keep this up-to-date with xarray.

I think the only thing from netCDF4 used directly is netCDF4.Dataset. Maybe if there was a way to switch out this class in the backend, that would be better for everyone? For example, adding a DatasetClass property that I could then override:

class NetCDF4DataStore(WritableCFDataStore):
    ...

    @property
    def DatasetClass(self) -> type:
        import netCDF4
        return netCDF4.Dataset


class NcComplexDataStore(NetCDF4DataStore):
    @property
    def DatasetClass(self) -> type:
        from ._wrapper import Dataset

        return Dataset

@kmuehlbauer
Copy link
Contributor

kmuehlbauer commented Oct 19, 2023

@ZedThree First of all I want to thank you for your perseverance going through the instances.

These issues

and the CF hackaton meeting notes availabe from cf-convention/discuss#243 are interesting reads.

It seems that finally complex data is about to be standardized within cf-conventions.

From my perspective and from all what I've read so far in the above discussions and documents handling complex numbers should be a first class citizen upstream and seems to be best placed directly into netcdf4-python instead trying to retrofit that into xarray via backends.

I do not advocate against any backend usage until this is sorted out. A package which provides a backend to be used as suggested by @dcherian above and which would be available via PyPI (and conda-forge) would be really helpful in that sense.

@dcherian
Copy link
Contributor

@ZedThree that seems OK to me. However, this is a bit risky since we will not commit to not breaking your downstream code :) In practice, I suspect that will rarely happen.

Can you send in a PR and lets see what everyone else thinks?

Since netCDF4_.py is specifically a netCDF4 engine, can we just be simpler:

class NetCDF4DataStore(WritableCFDataStore):
	def __init__(self, ...):
        # do this to allow subclassing by external backend engines.
        DatasetClass = netCDF4.Dataset

@ZedThree
Copy link
Contributor Author

@kmuehlbauer I think it would be great if complex numbers were handled natively by netCDF, but as you've seen, it keeps tumbling upstream :) Even if we do get HDF5-native complex numbers, we will still be left with codes that can't/won't use them, which is why I would like to also solve this issue downstream too. netcdf4-python might be best middle ground though, so we'll see how the maintainers feel about it. More pressure from more communities is always welcome of course :)

@dcherian Sorry, I think the indentation in your snippet is messed up a little. Did you mean to just make DatasetClass a simple class-level attribute? If so, I think that can't work due to the delayed import of netCDF4 in the body of the functions. Apologies if I've misunderstood!

I'll try to whip up a PR so we can discuss details there

@kmuehlbauer
Copy link
Contributor

@ZedThree Yes, sometimes it takes a lot of effort, I'm totally with you ;-)

Looking forward to your PR.

@kmuehlbauer
Copy link
Contributor

This was implemented in netcdf4-python: Unidata/netcdf4-python#1295

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants