HDF5 interface #101

Open
certik opened this issue Jan 8, 2020 · 9 comments
Labels
topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ...

Comments

@certik
Member

certik commented Jan 8, 2020

fortran-utils has a very minimal HDF5 wrapper interface that is just a little bit higher level and easier to use: https://github.com/certik/fortran-utils/blob/b43bd24cd421509a5bc6d3b9c3eeae8ce856ed88/src/h5_utils.f90.

But I would honestly actually vote not to include this in stdlib, or maybe not initially. It feels to me that this would be better off in a separate package. What do you think?

@jvdp1
Member

jvdp1 commented Jan 8, 2020

There is also @scivision's: https://github.com/scivision/h5fortran
Anyway, since it depends on an external library, I would not include it in stdlib (at least for now).

@certik
Member Author

certik commented Jan 8, 2020

I think we will all be in agreement to rather contribute this into @scivision's h5fortran. So I am going to close this issue as out of scope for stdlib. At least in the foreseeable future.

@certik certik closed this as completed Jan 8, 2020
@scivision
Member

I have tried to make the h5fortran (HDF5) and nc4fortran (NetCDF4) user-facing APIs as identical as possible, so that a user program can easily swap between HDF5 and NetCDF file IO by a configure-time flag.

I used an object-oriented interface for h5fortran and nc4fortran because there are multiple internal variables to manipulate when doing non-trivial operations. The basic user-facing operations look like:

type(hdf_file) :: h

call h%initialize('foo.h5', 'rw')
call h%write('x', x)
call h%read('y', y)
call h%finalize()

write(), read(), and other methods are rank-agnostic (scalar..7D) and kind-agnostic {real32,real64,int32,int64,character} within the limits of HDF5 and NetCDF. Yes, they can use opaque data for really arbitrary stuff, but that wasn't my need for HPC simulation and data assimilation.
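One common way to get a single rank- and kind-agnostic method name in Fortran is a type-bound generic that resolves to one specific procedure per rank/kind combination. The following is only a minimal sketch of that technique, not h5fortran's actual implementation: `toy_file`, its components, and the three specific procedures are hypothetical names, and nothing is written to disk.

```fortran
module generic_write_sketch
  use, intrinsic :: iso_fortran_env, only: int32, real32, real64
  implicit none
  private
  public :: toy_file

  ! Hypothetical stand-in for a file object (an assumption for illustration).
  ! It only records the name, kind, and rank of the last value passed to write.
  type :: toy_file
    character(len=:), allocatable :: last_name
    character(len=:), allocatable :: last_kind
    integer :: last_rank = -1
  contains
    procedure, private :: write_r32_0d
    procedure, private :: write_r64_1d
    procedure, private :: write_i32_2d
    ! One user-facing name; the compiler picks the specific procedure
    ! from the kind and rank of the actual argument.
    generic :: write => write_r32_0d, write_r64_1d, write_i32_2d
  end type toy_file

contains

  subroutine write_r32_0d(self, name, value)
    class(toy_file), intent(inout) :: self
    character(len=*), intent(in) :: name
    real(real32), intent(in) :: value
    self%last_name = name
    self%last_kind = 'real32'
    self%last_rank = rank(value)
  end subroutine write_r32_0d

  subroutine write_r64_1d(self, name, value)
    class(toy_file), intent(inout) :: self
    character(len=*), intent(in) :: name
    real(real64), intent(in) :: value(:)
    self%last_name = name
    self%last_kind = 'real64'
    self%last_rank = rank(value)
  end subroutine write_r64_1d

  subroutine write_i32_2d(self, name, value)
    class(toy_file), intent(inout) :: self
    character(len=*), intent(in) :: name
    integer(int32), intent(in) :: value(:,:)
    self%last_name = name
    self%last_kind = 'int32'
    self%last_rank = rank(value)
  end subroutine write_i32_2d

end module generic_write_sketch
```

With this in place, a caller writes `call h%write('x', x)` whether `x` is a real32 scalar, a real64 vector, or an int32 matrix. Covering every rank from scalar to 7D for each kind means one specific procedure per combination, which is presumably why such wrappers tend to be generated or heavily templated.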

@scivision
Member

scivision commented Jan 8, 2020

With regard to binary file I/O: in my opinion, raw binary I/O should be discouraged for most cases in any programming language.

There are a lot of other scientific formats like CDF, FITS and so on, but for out-of-core and cloud storage/processing and broadest data science library support, it is best in my opinion to focus efforts on HDF5. I only made a NetCDF4 interface because it's a subset of HDF5 and used by the large simulation packages I interface my models with.

@scivision
Member

scivision commented Jan 8, 2020

I think HDF5 and the like could be handled with a stdlib shim that presents a user API like {loadtxt,savetxt}. So instead of the h5fortran/nc4fortran initialize, write, finalize sequence, you would just have in stdlib:

call savefile('foo.h5', x)
call loadfile('foo.h5', y)
  • like other external libraries (libpng, etc.), make it optional.
  • straightforward to implement in the near term.
  • shims for FITS or other file formats that contributors feel are important can be added in the same way.
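A shim of this kind has to decide which backend to hand the file to. One simple possibility is to dispatch on the file extension. The sketch below shows only that dispatch step; `format_from_name` and the extension-to-format mapping are assumptions made for illustration and are not part of stdlib or h5fortran.

```fortran
module savefile_shim_sketch
  implicit none
  private
  public :: format_from_name

contains

  ! Guess the storage backend from the file extension.
  ! The mapping below is a hypothetical choice, not an agreed convention.
  pure function format_from_name(filename) result(fmt)
    character(len=*), intent(in) :: filename
    character(len=:), allocatable :: fmt
    integer :: idot

    ! Position of the last dot; 0 means the name has no dot at all.
    idot = index(filename, '.', back=.true.)
    if (idot == 0) then
      fmt = 'unknown'
      return
    end if

    select case (filename(idot+1:))
    case ('h5', 'hdf5')
      fmt = 'hdf5'
    case ('nc')
      fmt = 'netcdf'
    case ('txt', 'dat')
      fmt = 'text'
    case default
      fmt = 'unknown'
    end select
  end function format_from_name

end module savefile_shim_sketch
```

A `savefile` built on this could `select case (format_from_name(filename))` and forward to the matching backend only when the build enabled it, reporting an error otherwise. That keeps HDF5 an optional dependency, in line with the first bullet above.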

@certik certik reopened this Jan 8, 2020
@certik
Member Author

certik commented Jan 8, 2020

An interface like savefile('foo.h5', x) would make sense for stdlib, so I reopened this issue. Thanks for the idea, @scivision.

@scivision
Member

scivision commented Jan 15, 2020

I made a new release v2.5.0 of h5fortran, which now works as simply as:

use h5fortran

call h5write('foo.h5', '/x', x)

call h5read('bar.h5', '/y', y)

That's polymorphic over rank (scalar..7D) and kind (int32, int64, real32, real64).

@nncarlson
Member

nncarlson commented Jan 15, 2020

When it comes to providing modern Fortran interfaces to libraries like HDF5, NetCDF, MPI, etc., if there is already a package out there that reasonably meets the "design principles" of stdlib, as I presume h5fortran does, I see no reason for stdlib to throw a layer over the top of it and assimilate the package into stdlib. Stdlib doesn't need to be the Borg of Fortran libraries. Let people use the package directly. I think there ought to be a compelling reason and value for stdlib to provide the interface, as there perhaps is for LAPACK.

@certik
Member Author

certik commented Jan 15, 2020

@nncarlson It's not black and white where to draw the line between what goes into stdlib and what does not, but I think we are all in agreement here, as indicated above, that h5fortran should stay as a separate package. (h5fortran is on my todo list to get working with fpm.)

@jvdp1 jvdp1 added the topic: utilities containers, strings, files, OS/environment integration, unit testing, assertions, logging, ... label Jan 18, 2020