New API #6

Closed
shoyer opened this issue Apr 20, 2015 · 5 comments

Comments

@shoyer
Collaborator

shoyer commented Apr 20, 2015

I've started to write a new API for h5netcdf as an alternative to h5netcdf.legacyapi, which will continue to mirror netCDF4-python. The new API will follow PEP-8 standards and adhere much more closely to the design of h5py. The idea is to (1) explore alternative options for a low-level netCDF API in Python and (2) make it more natural for h5py users. In general, h5py seems to have a better design for exploring hierarchical datasets.

Some ideas (some of these are implemented on master, some are not):

import numpy as np
import h5netcdf

# to reduce confusion, we don't define h5netcdf.Dataset
f = h5netcdf.File('data.nc', mode='w')

# you can set dimensions either with a dictionary
f.dimensions = {'a': 2, 'b': 3}
# or with dictionary like assignment (this mirrors .attrs)
f.dimensions['c'] = 4

# you don't need to explicitly create a dimension first if you supply data with create_variable
f.create_variable('foo', ('x',), data=np.arange(5))
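
To make the proposed behavior concrete, here is a minimal in-memory sketch of how a dictionary-like `dimensions` container and implicit dimension creation could fit together. It is not the h5netcdf implementation: `Dimensions` and `ToyFile` are hypothetical names, nothing is written to disk, and only one-dimensional data is handled.

```python
from collections.abc import MutableMapping


class Dimensions(MutableMapping):
    """Hypothetical dict-like container mapping dimension names to sizes."""

    def __init__(self):
        self._sizes = {}

    def __getitem__(self, name):
        return self._sizes[name]

    def __setitem__(self, name, size):
        # Refuse to silently resize an existing dimension.
        if name in self._sizes and self._sizes[name] != size:
            raise ValueError(
                f"dimension {name!r} already has size {self._sizes[name]}"
            )
        self._sizes[name] = size

    def __delitem__(self, name):
        del self._sizes[name]

    def __iter__(self):
        return iter(self._sizes)

    def __len__(self):
        return len(self._sizes)


class ToyFile:
    """In-memory stand-in for a file object; not the real h5netcdf.File."""

    def __init__(self):
        self._dimensions = Dimensions()
        self.variables = {}

    @property
    def dimensions(self):
        return self._dimensions

    @dimensions.setter
    def dimensions(self, mapping):
        # Assigning a dict adds entries instead of replacing the container.
        self._dimensions.update(mapping)

    def create_variable(self, name, dimensions, data):
        # This sketch only handles one-dimensional data (a flat sequence).
        (dim,) = dimensions
        self._dimensions[dim] = len(data)  # created on demand, or checked
        self.variables[name] = (tuple(dimensions), list(data))


f = ToyFile()
f.dimensions = {'a': 2, 'b': 3}
f.dimensions['c'] = 4
f.create_variable('foo', ('x',), data=range(5))
```

In this sketch, supplying data whose length disagrees with an existing dimension raises a `ValueError` rather than resizing it. That is one possible design choice, not something the proposal settles.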

Some questions:

  1. Should we use dims or dimensions as an identifier? The former is shorter, and there is some precedent with attrs in h5py (and dims in xray), but the latter is more descriptive.
  2. Should we eliminate create_dimensions as a method entirely? The alternative is to only support setting dimensions via the dictionary-like f.dimensions (see examples above). This would preclude issue #5 ("suggestion - make it possible to use dimension objects instead of names").

CC @mangecoeur

@mangecoeur

  1. IMHO, use dimensions. It's much clearer ;)
  2. Keep create_dimensions. Best not to second-guess someone's use case: maybe someone wants to create a dimension without immediately adding a variable, or wants to create a dimension that's reused by several variables.

And an idea: make it possible to dump the structure of a file as a descriptive dict. That makes exploring an existing file easier. Something like:

{
    'distance': {
        'dimension': 'x',
        'units': 'm',
        'data': <array float[100]>
    },
    'time': {
        'dimension': 't',
        'units': 'm2',
        'data': <array datetime[100]>
    },
    'readings': {
        'dimension': ('x', 't'),
        'units': 'W',
        'data': <array float[100, 100]>
    }
}
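
A rough sketch of what such a dump could look like, using a small in-memory stand-in rather than a real file. `Variable`, `shape_of`, and `describe` are hypothetical names for illustration only, and the data is summarized by shape alone:

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    """Hypothetical in-memory variable: dimension names, values, attributes."""
    dimensions: tuple
    data: list
    attrs: dict = field(default_factory=dict)


def shape_of(data):
    """Return the shape of a regularly nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        data = data[0] if data else None
    return tuple(shape)


def describe(variables):
    """Summarize a {name: Variable} mapping as a plain nested dict."""
    summary = {}
    for name, var in variables.items():
        dims = var.dimensions
        entry = {'dimension': dims[0] if len(dims) == 1 else dims}
        entry.update(var.attrs)  # e.g. 'units'
        sizes = ', '.join(str(n) for n in shape_of(var.data))
        entry['data'] = f'<array [{sizes}]>'  # placeholder instead of the values
        summary[name] = entry
    return summary


variables = {
    'distance': Variable(('x',), [0.0, 1.5, 3.0], {'units': 'm'}),
    'readings': Variable(
        ('x', 't'), [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], {'units': 'W'}
    ),
}
print(describe(variables))
```

Replacing the values with a placeholder string keeps the dump cheap to produce and easy to read, at the cost of not being directly usable as input for creating a file.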

Building on that, it might be nice to make it possible to create a file based on a spec described in a dict. This allows you to use a declarative style rather than an imperative one: rather than issuing a series of commands to create your files, you describe what data you want and then build a file from the spec.

x_data = ...
times = ...
readings = ...

spec = {
    'distance': {
        'dimension': 'x',
        'units': 'm',
        'data': x_data
    },
    'time': {
        'dimension': 't',
        'units': 'm2',
        'data': times
    },
    'readings': {
        'dimension': ('x', 't'),
        'units': 'W',
        'data': readings
    }
}

hfile = h5netcdf.create_file(spec)
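
Note that `h5netcdf.create_file` is only a proposed name here. As a sketch of the work such a function would have to do, the following pure-Python example derives dimension sizes from the data in a spec and checks that they are consistent. `build_from_spec` and `shape_of` are hypothetical helpers, nested lists stand in for arrays, and nothing is written to disk:

```python
def shape_of(data):
    """Return the shape of a regularly nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        data = data[0] if data else None
    return tuple(shape)


def build_from_spec(spec):
    """Return (dimensions, variables) derived from a declarative spec dict."""
    dimensions = {}
    variables = {}
    for name, entry in spec.items():
        entry = dict(entry)  # copy so the caller's spec is not modified
        dims = entry.pop('dimension')
        if isinstance(dims, str):
            dims = (dims,)  # allow a single name as shorthand
        data = entry.pop('data')
        shape = shape_of(data)
        if len(shape) != len(dims):
            raise ValueError(f"{name!r}: {len(dims)} dimensions but data shape {shape}")
        for dim, size in zip(dims, shape):
            # The first variable using a dimension fixes its size.
            if dimensions.setdefault(dim, size) != size:
                raise ValueError(f"{name!r}: size mismatch for dimension {dim!r}")
        # Whatever is left in the entry (e.g. 'units') is treated as attributes.
        variables[name] = {'dimensions': tuple(dims), 'attrs': entry, 'data': data}
    return dimensions, variables


example_spec = {
    'distance': {'dimension': 'x', 'units': 'm', 'data': [0.0, 1.5, 3.0]},
    'time': {'dimension': 't', 'units': 's', 'data': [0, 60]},
    'readings': {
        'dimension': ('x', 't'),
        'units': 'W',
        'data': [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    },
}
dimensions, variables = build_from_spec(example_spec)
```

A real implementation would then create the dimensions and variables in a file in that order. Validating the whole spec first means a bad spec fails before anything is written.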

You could even enable a full round-trip, something like:

spec = h5netcdf.describe(file)

# Optionally manipulate the spec here, e.g. change the data

new_file = h5netcdf.create_file(spec)

Though less sure about this, might need to think about lazy-loading data and things like that...

@shoyer
Collaborator Author

shoyer commented Apr 20, 2015

@mangecoeur To clarify, I was considering dropping create_dimension in favor of using dimensions with the dictionary-like interface for assignment.

@mangecoeur

@shoyer ah right, yeah that makes sense

@shoyer
Collaborator Author

shoyer commented Apr 21, 2015

I like the idea of declarative dataset specifications, but I'm not sure it's a good fit for this package. It could exist easily as an independent project that uses h5netcdf (and/or netCDF4/scipy.io.netcdf) as a library. In fact, I would encourage you to go out and build that project yourself :).

@kmuehlbauer
Collaborator

@shoyer Can this be closed? We might consider compiling everything into some documentation (so as not to overcrowd the README).

@shoyer shoyer closed this as completed Feb 4, 2021