suggest variable has no dimension scale associated with axis 0 warning, not error #43

Closed
denis-bz opened this issue May 17, 2018 · 8 comments

Comments

@denis-bz

Stephan,
(noob here, just want numpy-able data)

I have a dataset with 2 keys [u'NS', u'AlgorithmRuntimeInfo'] which

  • h5py reads fine
  • h5netcdf reads fine, but print f["AlgorithmRuntimeInfo"]
    ValueError: variable u'/AlgorithmRuntimeInfo' has no dimension scale associated with axis 0
  • xr.open_dataset( infile, engine="h5netcdf" )
    dies with the same ValueError.

Make this a warning, not an error ?

(The dataset is 50M, not mine; shall I copy it to some netcdf zoo for you ?
Its original name is 2A.GPM.Ku.V720170308.20180502-S014128-E021127.V05A.RT-H5
if that tells you anything.)

Thanks, cheers
-- denis

@denis-bz denis-bz changed the title suggest variable has no dimension scale associated with axis 0 suggest variable has no dimension scale associated with axis 0 warning, not error May 17, 2018
@shoyer
Collaborator

shoyer commented May 17, 2018

Unfortunately I think this means that this file is an HDF5 file but not a netCDF4 file, so we can't really expect netCDF tools to work on it. netCDF4 is a specialized file format built on HDF5.

This could probably be made to work with a default dimension, but in general dimension scales are a pretty core part of the netCDF4 data model, so maybe it makes sense to use h5py here instead.

@denis-bz
Author

Stephan,
Thanks. Is there any program in nc* that would tell me that ?
nccopy -k nc4 my.h5 my.nc creates a .nc that works in h5netcdf ?!
ncinfo looks ok to me --

<type 'netCDF4._netCDF4.Dataset'>
root group (NETCDF4 data model, file format HDF5):
    dimensions(sizes): 
    variables(dimensions): 
    groups: NS

Ideally I'd like a description of common differences between HDF5 and netCDF4;
is there a test suite for that
(preferably .cdl files -- diffing binary files is, um, Fortran-age.)
I realize that's a tall order, and not your job; pass it on ?

@shoyer
Collaborator

shoyer commented May 24, 2018

If netCDF4 supports it, then ideally h5netcdf should, too. Can you share an example file so I can take a look?

I don't know a general way to check for netCDF4 API compatibility, unfortunately. Recent versions of netCDF4 do add some special attributes for checking if an HDF5 file was created via a netCDF library (e.g., the _NCProperties attribute), but other HDF5 files might also be valid netCDF files.
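As a rough illustration of the attribute check mentioned above, here is a minimal sketch. The helper names `has_ncproperties` and `check_file` are made up for this example, and it assumes the `_NCProperties` attribute sits on the root group and is visible through h5py's `attrs`. A missing attribute does not prove the file is unusable as netCDF4; it only says nothing recorded that a netCDF library wrote it.

```python
def has_ncproperties(root_attrs):
    """Return True if the root-group attributes include `_NCProperties`.

    `root_attrs` can be any mapping of attribute names, for example
    `h5py.File(...).attrs` or a plain dict.
    """
    return "_NCProperties" in root_attrs


def check_file(path):
    """Hypothetical convenience wrapper: open `path` read-only and check it."""
    # Imported lazily so has_ncproperties() works even without h5py installed.
    import h5py

    with h5py.File(path, "r") as f:
        return has_ncproperties(f.attrs)


# Plain dicts stand in for root-group attributes here.
print(has_ncproperties({"_NCProperties": "some provenance string"}))
print(has_ncproperties({"title": "plain HDF5 file"}))
```

With a real file one would call `check_file("my.h5")`; the file name is only a placeholder.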

@denis-bz
Author

I could put up a binary file, but don't want to waste your time
on non-reproducible stuff (it works after `nccopy`).

All I really wanted to do is read an .h5 file that somebody sent me into xarray;
there's a tiny test case for that under https://gist.github.com/denis-bz
test.cdl -> test.nc -> read3.py, read with h5py h5netcdf xarray

looks like more for xarray people than for you -- if so could you pass it on ?
Thanks

@shoyer
Collaborator

shoyer commented May 25, 2018 via email

@kmuehlbauer
Collaborator

Sorry for reviving this, but I have a similar issue reading HDF5 files.

Currently I'm using the netCDF4 based engine within xarray to load hdf5 files (without dimension-scales). The dimensions are picked up by the netcdf-c library as phony_dim_N, where N is some number of first occurrence. As h5netcdf seems to be faster (with smaller memory footprint) for my use case I tried to get it running, but the error mentioned in the OP prevents me reading the data.

I tried to fix this by just replacing
https://github.com/shoyer/h5netcdf/blob/f71e4a692bc11689a3ad7850e8ce4ac1cb536659/h5netcdf/core.py#L98-L109

with:

        for axis, dim in enumerate(self._h5ds.dims):
            if len(dim) == 0:
                name = 'phony_dim_{}'.format(axis)
            else:
                name = _name_from_dimension(dim)
            dims.append(name)

This works, but it creates a fresh phony_dim_0 to phony_dim_N for every subgroup, which means phony_dim_0 in subgroup X and phony_dim_0 in subgroup Y aren't necessarily the same. To completely mimic the netCDF4 behaviour, the dimensions have to be named in some distinct order while iterating over the whole file contents.
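To make the idea of file-wide naming concrete, here is a small sketch that does not touch h5netcdf or HDF5 at all. It walks a nested dict standing in for the group tree and hands out names from one shared counter. The function `assign_phony_dims`, the dict layout, and the rule that equal-sized axes within a group share a name are all assumptions for illustration, not a description of what netcdf-c or h5netcdf actually do.

```python
from itertools import count


def assign_phony_dims(group, counter=None, path="/"):
    """Map each dataset path to a tuple of phony dimension names.

    `group` is a plain dict standing in for an HDF5 group: a tuple value is
    a dataset shape, a dict value is a subgroup.
    """
    if counter is None:
        counter = count()  # one counter shared by the whole file
    by_size = {}  # axis length -> phony name, local to this group
    result = {}
    # Datasets of this group first, so numbering follows a fixed order.
    for name, item in group.items():
        if isinstance(item, tuple):
            dims = []
            for size in item:
                if size not in by_size:
                    by_size[size] = "phony_dim_{}".format(next(counter))
                dims.append(by_size[size])
            result[path + name] = tuple(dims)
    # Then recurse into subgroups, passing the same counter along.
    for name, item in group.items():
        if isinstance(item, dict):
            result.update(assign_phony_dims(item, counter, path + name + "/"))
    return result


layout = {
    "X": {"a": (10, 20), "b": (10,)},
    "Y": {"c": (10, 5)},
}
names = assign_phony_dims(layout)
for var, dims in names.items():
    print(var, dims)
```

Because the counter is shared, subgroup Y continues at phony_dim_2 instead of restarting at phony_dim_0. One known limitation of this sketch: a dataset with two axes of the same length would get the same name for both.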

To find my way through the code to correctly implement this I would need some help and guidance.

@shoyer Where should I start looking?

@shoyer
Collaborator

shoyer commented Dec 4, 2019

Yes, this seems like a reasonable feature to add.

@kmuehlbauer
Collaborator

@denis-bz It's been a while, but this should be resolved with #64. If you encounter any problems, please reopen or create a new issue.
