
HDF error when trying to write Dataset read with rasterio to NetCDF #2535

Closed
loicdtx opened this issue Nov 1, 2018 · 17 comments · Fixed by #7671
Labels: plan to close (May be closeable, needs more eyeballs), topic-backends

Comments

@loicdtx commented Nov 1, 2018

I'm getting an HDF error when trying to write a Dataset read from a GeoTIFF (rasterio backend) to NetCDF. See the reproducible example below:

import urllib.request
import tempfile
import os

import xarray as xr

path = tempfile.gettempdir()
url = 'https://earthexplorer.usgs.gov/browse/gisready/landsat_8/LC08_L1TP_026047_20180110_20180119_01_T1.zip'
filename = os.path.join(path, url.split('/')[-1])
nc_name = os.path.join(path, 'landsat_rgb.nc')

# Download the file if it doesn't exist (11 MB)
if not os.path.isfile(filename):
    urllib.request.urlretrieve(url, filename)

# Read rgb file using rasterio backend
rgb_name = '/'.join(['/vsizip', filename,
                     os.path.basename(filename).split('.')[-2] + '.tif'])
ds = xr.open_rasterio(rgb_name)
ds = ds.to_dataset('band').rename({1:'blue', 2:'green', 3:'red'})
print(ds)

# <xarray.Dataset>
# Dimensions:  (x: 7611, y: 7761)
# Coordinates:
#   * y        (y) float64 2.193e+06 2.193e+06 2.193e+06 ... 1.961e+06 1.960e+06
#   * x        (x) float64 3.732e+05 3.732e+05 3.733e+05 ... 6.015e+05 6.015e+05
# Data variables:
#     blue     (y, x) uint8 ...
#     red      (y, x) uint8 ...
#     green    (y, x) uint8 ...
# Attributes:
#     transform:   (30.0, 0.0, 373185.0, 0.0, -30.0, 2193315.0)
#     crs:         +init=epsg:32614
#     res:         (30.0, 30.0)
#     is_tiled:    1
#     nodatavals:  (nan, nan, nan)


# Write to netcdf
ds.to_netcdf(nc_name)

Output of xr.show_versions()

python -c "import xarray as xr; xr.show_versions()"

INSTALLED VERSIONS

commit: None
python: 3.6.5.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-36-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

xarray: 0.10.9
pandas: 0.23.4
numpy: 1.15.3
scipy: None
netCDF4: 1.4.2
h5netcdf: None
h5py: None
Nio: None
zarr: 2.2.0
cftime: 1.0.2.1
PseudonetCDF: None
rasterio: 1.0.9
iris: None
bottleneck: None
cyordereddict: None
dask: 0.20.0
distributed: None
matplotlib: None
cartopy: None
seaborn: None
setuptools: 40.5.0
pip: 18.1
conda: None
pytest: None
IPython: 7.1.1
sphinx: None

@jhamman (Member) commented Nov 5, 2018

@loicdtx - thanks for the report. I just tried this using xarray master and didn't get an error, so you could try that and see if it works. Since our last release, we've had a fairly significant refactor of the IO backends in xarray.

@loicdtx (Author) commented Nov 5, 2018

Hi @jhamman, I just tried what you suggested. Apparently it's a scipy vs. netCDF4 thing: it works with scipy but not with netCDF4, on both master HEAD and the latest stable release.
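
For reference, a minimal sketch of how to pin down the backend explicitly (reusing ds and nc_name from the example above; the comments reflect the behaviour reported here):

ds.to_netcdf(nc_name, engine='scipy')    # succeeds (scipy backend, netCDF3 format)
ds.to_netcdf(nc_name, engine='netcdf4')  # fails with RuntimeError: NetCDF: HDF error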

@jhamman (Member) commented Nov 5, 2018

Interesting. I'm not seeing any difference when using scipy, netCDF4, or h5netcdf.

(by the way, thank you for the reproducible example)

@loicdtx (Author) commented Nov 5, 2018

Yes, that's interesting... Are you using a different OS? Let me know if there's anything else I can help with.

@shoyer (Member) commented Nov 6, 2018

Exactly what error message do you see?

@loicdtx (Author) commented Nov 6, 2018

@shoyer, the full traceback is below; I installed everything with pip:

mktmpenv -p python3
pip install xarray numpy netcdf4
pip install rasterio

Traceback (most recent call last):
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/backends/api.py", line 724, in to_netcdf
    unlimited_dims=unlimited_dims, compute=compute)
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/core/dataset.py", line 1179, in dump_to_store
    unlimited_dims=unlimited_dims)
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/backends/common.py", line 374, in store
    unlimited_dims=unlimited_dims)
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 406, in set_variables
    super(NetCDF4DataStore, self).set_variables(*args, **kwargs)
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/backends/common.py", line 413, in set_variables
    self.writer.add(source, target)
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/backends/common.py", line 272, in add
    target[...] = source
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 48, in __setitem__
    data[key] = value
  File "netCDF4/_netCDF4.pyx", line 4648, in netCDF4._netCDF4.Variable.__setitem__
  File "netCDF4/_netCDF4.pyx", line 4913, in netCDF4._netCDF4.Variable._put
  File "netCDF4/_netCDF4.pyx", line 1754, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "rasterio_reprex.py", line 41, in <module>
    ds.to_netcdf(nc_name)
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/core/dataset.py", line 1254, in to_netcdf
    compute=compute)
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/backends/api.py", line 729, in to_netcdf
    store.close()
  File "/home/loic/.virtualenvs/tmp-3caa6b25124e2f31/lib/python3.6/site-packages/xarray/backends/netCDF4_.py", line 474, in close
    ds.close()
  File "netCDF4/_netCDF4.pyx", line 2276, in netCDF4._netCDF4.Dataset.close
  File "netCDF4/_netCDF4.pyx", line 2260, in netCDF4._netCDF4.Dataset._close
  File "netCDF4/_netCDF4.pyx", line 1754, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error

@jhamman (Member) commented Nov 6, 2018

Is this the same error you get with xarray/master?

@fmaussion (Member):

I can reproduce this on my machine - also linux. This is going to be hard to track down though.

@loicdtx (Author) commented Nov 6, 2018

@jhamman, yes, same error message when installing from master.

@ghost commented Dec 3, 2018

I have a similar problem when importing rasterio in the same script (without even using it for anything). This fails with an HDF error:

import xarray as xa
import numpy as np
#import netCDF4
import rasterio

ds = xa.Dataset()
ds['z'] = (('y', 'x'), np.zeros((100, 100), np.float32))
print(ds)
ds.to_netcdf('test.nc')
ds.close()

with xa.open_dataset('test.nc') as ds:
    print(ds)

If I import netCDF4 before rasterio, it works fine (uncomment line 3). This is probably an issue with rasterio somehow.
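
For anyone landing here, a minimal sketch of that import-order workaround (same toy dataset as above):

# Importing netCDF4 first means its bundled HDF5 is the copy loaded into the
# process; importing rasterio first loads rasterio's copy and triggers the error.
import netCDF4  # noqa: F401 -- imported first on purpose
import rasterio  # noqa: F401
import numpy as np
import xarray as xa

ds = xa.Dataset()
ds['z'] = (('y', 'x'), np.zeros((100, 100), np.float32))
ds.to_netcdf('test.nc')  # no HDF error with this import order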

I installed everything with pip:

$ pip install Cython
$ pip install netCDF4 xarray rasterio numpy

From pip freeze:

affine==2.2.1
attrs==18.2.0
cftime==1.0.3
Click==7.0
click-plugins==1.0.4
cligj==0.5.0
Cython==0.29.1
netCDF4==1.4.2
numpy==1.15.4
pandas==0.23.4
pyparsing==2.3.0
python-dateutil==2.7.5
pytz==2018.7
rasterio==1.0.11
six==1.11.0
snuggs==1.4.2
xarray==0.11.0

@ghost commented Dec 10, 2018

It seems that this is not a problem with xarray but only with rasterio and netCDF4. This also fails:

import rasterio
import netCDF4

with netCDF4.Dataset('test.nc', mode='w') as ds:
    ds.createDimension('x')
    ds.createVariable('foo', float, dimensions=('x'))
    print(ds)

Commenting out import rasterio removes the HDF error. I’ll report this to rasterio.

@shoyer (Member) commented Dec 10, 2018

This looks like a binary incompatibility issue with wheels for rasterio and netCDF4 on PyPI.

Good alternatives (for now) would be building from source or using conda-forge.
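
Concretely, that would look something like the following (a sketch; exact flags and channels are assumptions):

$ pip install --no-binary netCDF4 netCDF4                # build netCDF4 from source against local libs
$ conda install -c conda-forge xarray rasterio netcdf4   # or use conda-forge builds throughout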

@ChristianF88:

Hi there,

I just wanted to let you know that I get the same error with the following script. It does not use xarray or rasterio, so those are most likely not the problem.

# imports inferred from the usage below (they were not shown in the original snippet)
import datetime as dt
import os
import re

import numpy as np
from netCDF4 import Dataset
from PIL import Image
from tqdm import tqdm


class Gif2NetCDF():
    def __init__(self, netcdf_name, netcdf_destfolder, gif_folder,
                 gif_filter_pattern=None, detailed_conversion=True):
        """

        Parameters
        ----------
        netcdf_name : str
            specifying the name of the netcdf file to write.

        netcdf_destfolder : str
            specifying the folder where the netcdf file is supposed to be created

        gif_folder : str
            specifying the folder that contains the gif files that are supposed to be written to a netcdf

        gif_filter_pattern : str
            specifying a re pattern to filter all files in the directory so you end up with the gif files you want.
            The specified string will be passed to re.match, which will be checked against each file in the provided gif_folder


        Examples
        --------

        ## defining variables
        import netCDF4
        netcdf_name = "2016-07.nc"
        netcdf_destfolder = "./example/radar"
        giffolder = "./example/radar/2016-07/"

        ## create Instance
        cdf = Gif2NetCDF(netcdf_name, netcdf_destfolder, giffolder, gif_filter_pattern)
        ## write all (filtered) gifs in folder to netcdf file
        cdf.writeCDF()
        """

        ## creating global vars

        self.netcdf_name = netcdf_name
        self.netcdf_destfolder = netcdf_destfolder
        self.giffolder = gif_folder
        self.refilterpattern = gif_filter_pattern
        self.detailed_conversion = detailed_conversion

        self.netcdfFP = os.path.join(netcdf_destfolder, netcdf_name)

        # preparing the coordinate vectors
        self._lat_range = np.arange(479500, -160500, step=-1000)  # north
        self._long_range = np.arange(255500, 965500, step=1000)  # east

        # preparing time origin
        self.time_origin = dt.datetime.strptime("1970-01-01 00:00:00", "%Y-%m-%d %H:%M:%S")
        
        self.raincodes = np.array([....])  # this array is quite large, so I left it out... which unfortunately means the code does not run; if anybody needs it, please let me know

        return

    def list_gifs(self):
        self._gifs = os.listdir(self.giffolder)
        return self._gifs

    def filter_gifs(self):
        self._gifs = [file for file in self._gifs if re.match(self.refilterpattern, file)]
        return self._gifs

    def addDimensions(self):

        # adds dimensions to empty netcdf file
        self.latD = self.netcdfFile.createDimension('lat', self._lat_range.shape[0])  # north-south
        self.lonD = self.netcdfFile.createDimension('lon', self._long_range.shape[0])  # east-west
        self.timeD = self.netcdfFile.createDimension('time', None)

        return

    def addVariables(self):

        ## creating variables
        self.latV = self.netcdfFile.createVariable("chy", np.float32, ("lat",), complevel=9, zlib=True)  # north-south
        self.lonV = self.netcdfFile.createVariable("chx", np.float32, ("lon",), complevel=9, zlib=True)  # east-west
        self.timeV = self.netcdfFile.createVariable("time", np.float64, ("time",), complevel=9, zlib=True)

        self.rainV = self.netcdfFile.createVariable("rain", np.float32, ("time", "lat", "lon"), complevel=9, zlib=True,
                                                    fill_value=-100)

        ## adding units
        self.latV.units = "meters"
        self.lonV.units = "meters"

        self.timeV.units = "seconds"
        self.timeV.calender = "standard"

        self.rainV.units = "millimeter/hour"

        ## adding longname
        self.latV.long_name = "swiss northing CH1903"
        self.lonV.long_name = "swiss easting CH1903"
        self.timeV.long_name = "seconds since 1970-01-01 00:00:00"

        self.rainV.long_name = "precipitation intensity forecast"

        return

    def addDescription(self):

        self.netcdfFile.description = """..."""

        self.netcdfFile.history = """Created: {}""".format(dt.datetime.now().strftime("%Y-%m-%d %H:%M"))
        self.netcdfFile.source = '...'
        return

    def _write_static_dims(self):
        self.latV[:] = self._lat_range
        self.lonV[:] = self._long_range
        return

    def _write_time(self, file, datetime=None):

        if datetime is None:
            datestr = re.findall(r"\.([0-9]+)\.gif", file)[0]
            date = dt.datetime.strptime(datestr, "%Y%m%d%H%M")
        else:
            date = datetime

        seconds = (date - self.time_origin).total_seconds()
        current_size = self.timeV.size
        self.timeV[current_size] = seconds
        
        return

    def gif2array(self, file):
    
        xpix = 0
        ypix = 76
        n_pixel_x = 710 + xpix
        n_pixel_y = 640 + ypix

        gif = np.array(Image.open(file))[ypix:n_pixel_y, xpix:n_pixel_x].astype("float64")
        for idx, raincode in enumerate(self.raincodes):
            gif[gif == idx] = raincode[3]
        return gif

    def _write_rain(self, file):
        array = self.gif2array(os.path.join(self.giffolder, file))
        idx = self.rainV.shape[0] - 1
        self.rainV[idx, :, :] = array
        return

    def writeCDF(self):
        self.netcdfFile = Dataset(self.netcdfFP, 'w', format='NETCDF4_CLASSIC')
        try:
            giflist = self.list_gifs()
            if self.refilterpattern is not None:
                fgiflist = self.filter_gifs()

            self.addDimensions()
            self.addVariables()
            self.addDescription()
            self._write_static_dims()

            for file in tqdm(self._gifs):
                self._write_time(file)
                self._write_rain(file)
                
        except Exception:
            self.netcdfFile.close()
            raise

        self.netcdfFile.close()

        return

Error

Traceback (most recent call last):
.
.
.
  File "C:\Users\foerstch\AppData\Local\Programs\Python\Python37\lib\site-packages\archiving\radar.py", line 358, in _write_rain
    idx = self.rainV.shape[0] - 1
  File "netCDF4\_netCDF4.pyx", line 4031, in netCDF4._netCDF4.Variable.shape.__get__
  File "netCDF4\_netCDF4.pyx", line 3369, in netCDF4._netCDF4.Dimension.__len__
  File "netCDF4\_netCDF4.pyx", line 1857, in netCDF4._netCDF4._ensure_nc_success
RuntimeError: NetCDF: HDF error

Does anybody have some advice on how to fix this?

Thanks a bunch!
Christian

@dcherian (Contributor) commented Jun 7, 2019

An HDF error usually means a corrupt file. Does ncdump -h work on your file?

I've found that sometimes ncdump -h will succeed but there'll still be some corrupt data, which will result in an error when you try to load variable values.
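
A quick sketch of that deeper check, forcing a read of every variable (the file name is a placeholder):

import netCDF4

with netCDF4.Dataset('test.nc') as ds:
    for name, var in ds.variables.items():
        _ = var[...]  # actually reading the values surfaces corruption that ncdump -h misses
        print(name, 'ok')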

@ChristianF88:

Well, the error occurs while writing the file. So if the reason is that the file is corrupt, then netCDF4 is corrupting it... right? The problem is rather that it does not even finish writing the file.

FYI: the error does not occur on the same file every time.

Have a good weekend!! ;)

@alpha-beta-soup:

I get a similar error when using xr.open_rasterio; a workaround seems to be to change the order in which my datasets are opened. Example:

import xarray as xr
ds = xr.open_dataset('/data/someFile.nc') # netcdf
m = xr.open_rasterio('/data/otherFile.tif') # geotif
# Everything is happy

import xarray as xr
m = xr.open_rasterio('/data/otherFile.tif') # geotif
ds = xr.open_dataset('/data/someFile.nc') # netcdf
# Results in the following error
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/file_manager.py", line 186, in _acquire_with_cache_info
    file = self._cache[self._key]
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/lru_cache.py", line 42, in __getitem__
    value = self._cache[key]
KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('/data/someFile.nc',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/api.py", line 420, in open_dataset
    filename_or_obj, group=group, lock=lock, **backend_kwargs)
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/netCDF4_.py", line 335, in open
    autoclose=autoclose)
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/netCDF4_.py", line 293, in __init__
    self.format = self.ds.data_model
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/netCDF4_.py", line 344, in ds
    return self._acquire()
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/netCDF4_.py", line 338, in _acquire
    with self._manager.acquire_context(needs_lock) as root:
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/file_manager.py", line 174, in acquire_context
    file, cached = self._acquire_with_cache_info(needs_lock)
  File "/usr/local/lib/python3.6/dist-packages/xarray/backends/file_manager.py", line 192, in _acquire_with_cache_info
    file = self._opener(*self._args, **kwargs)
  File "netCDF4/_netCDF4.pyx", line 2291, in netCDF4._netCDF4.Dataset.__init__
  File "netCDF4/_netCDF4.pyx", line 1855, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -101] NetCDF: HDF error: b'/data/someFile.nc'
>>> xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.6.6 (default, Sep 12 2018, 18:26:19) 
[GCC 8.0.1 20180414 (experimental) [trunk revision 259383]]
python-bits: 64
OS: Linux
OS-release: 4.15.0-58-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: C.UTF-8
LANG: C.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.2
libnetcdf: 4.6.3

xarray: 0.12.3
pandas: 0.25.1
numpy: 1.13.3
scipy: 1.3.1
netCDF4: 1.5.1.2
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: 1.0.3.4
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.26
cfgrib: None
iris: None
bottleneck: None
dask: 2.3.0
distributed: 2.3.2
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: None
numbagg: None
setuptools: 39.0.1
pip: 9.0.1
conda: None
pytest: None
IPython: None
sphinx: None

And I have the HDF5_USE_FILE_LOCKING environment variable set to FALSE.
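
Note that the usual advice is to set HDF5_USE_FILE_LOCKING before anything that loads libhdf5 is imported, e.g. at the very top of the script (a sketch, file name as above):

import os
os.environ['HDF5_USE_FILE_LOCKING'] = 'FALSE'  # set before libhdf5 gets loaded

import xarray as xr  # importing xarray/netCDF4 after this ensures the setting is seen
ds = xr.open_dataset('/data/someFile.nc')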

FabianHofmann added a commit to PyPSA/atlite that referenced this issue Jan 21, 2021: "cutout: Work around binary wheel incompatibility of netCDF4 and rasterio" (refers to pydata/xarray#2535, rasterio/rasterio-wheels#12).
@kmuehlbauer (Contributor):

Just a heads up: I can't reproduce this with the latest packages from conda-forge, neither with xr.open_rasterio nor with xr.open_dataset(engine="rasterio"). Maybe the issue got resolved upstream.
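
For anyone re-testing on current versions: xr.open_rasterio is deprecated, and the engine="rasterio" backend is provided by the rioxarray plugin. A sketch (file names are placeholders):

import xarray as xr

tif = xr.open_dataset('otherFile.tif', engine='rasterio')  # requires rioxarray to be installed
nc = xr.open_dataset('someFile.nc')                        # netCDF4 backend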

@headtr1ck added the "plan to close" label Mar 26, 2023