m2orris/h5dumpImport

#h5dumpImport.m
###Version 1.0
#####Copyright (c) 2012 Michael Morris
This software is released under the MIT Open Source License.


##Objective

The h5dumpImport package provides a platform-independent way to import datasets with compound datatypes from HDF5 (.h5) files, while hiding much of the HDF5 implementation from the user.

Currently, the h5dumpImport package does not directly import the HDF5 (.h5) file format. The h5dumpImport package imports an ASCII dump of a dataset generated by the h5dump command line tool.

Source code and pre-built binary distributions of the HDF5 software, which includes the h5dump command line tool, can be found at The HDF Group's website.

Alternative methods for importing HDF5 (.h5) files into Mathematica include:

  • Mathematica 8.0's Import function provides limited support for importing HDF5 (.h5) files. Import currently does not support datasets with compound datatypes. For example:

      In[1]:=  Import["testData.h5", {"Data", 1}]
      Out[1]=  {"Unsupported Datatype Class", "Compound"}
    
      In[2]:=  Import["testData.h5", {"Datasets", 1}]
               Import::h5type: The datatype of the dataset "/AllDatatypes" is not currently supported. >>
      Out[2]=  $Failed
    
  • Scot Martin's HDF5 API wrapper package is a solution for installations of Mathematica running on Windows.

##Installation

  1. Download and install the HDF5 software, which includes the h5dump command line tool, from The HDF Group's website.

  2. Download the h5dumpImport package to a directory where Mathematica can find it.

  3. In a Mathematica Notebook, execute the expression:

     Needs["h5dumpImport`"]
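Step 2 depends on Mathematica being able to locate the package file. The following is a minimal sketch for checking this. It assumes the package file is named h5dumpImport.m, as in the title above, and the variable name userApps is only illustrative. The per-user Applications directory is one common place for packages, not a requirement of h5dumpImport.

```mathematica
(* Directories that Needs and Get search for packages *)
$Path

(* Per-user Applications directory, which is on $Path by default;
   placing h5dumpImport.m here is one option, not a requirement *)
userApps = FileNameJoin[{$UserBaseDirectory, "Applications"}]

(* Full path of the package file if it can be found, otherwise $Failed *)
FindFile["h5dumpImport`"]
```

If FindFile returns $Failed, the package is not in any directory listed in $Path, and Needs will fail as well.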
    

##Usage

  1. Use Mathematica's Import function to query the available datasets in the HDF5 (.h5) file:

     In[3]:=  Import["testData.h5", {"Datasets"}]
     Out[3]=  {"/AllDatatypes", "/Fruit"}
    
  2. Create an ASCII dump file for the HDF5 (.h5) file's dataset using the h5dump convenience function in the h5dumpImport package:

     In[4]:=  dumpFile = h5dump["/usr/bin/h5dump", "testData.h5", "/AllDatatypes"];
    

An alternative is to run the h5dump command line tool directly and then use the path of the resulting text file as dumpFile in the next step:

    $ h5dump -d /AllDatatypes testData.h5 > /tmp/h5dump_testData.h5_AllDatatypes.txt

  3. Create an h5dumpImport object, read all the data, and close the object:

     In[5]:=  dumpImport = h5dumpImportNew[h5dumpImport[], dumpFile];
              dumpImport.h5dumpImportData[All]
              dumpImport.h5dumpImportClose[];
     Out[5]=  {{1, 11, 111, 1111, 11111, 111111, 1111111, 1.1, 11.11, "one"},
               {2, 22, 222, 2222, 22222, 222222, 2222222, 2.2, 22.22, "two"},
               {3, 33, 333, 3333, 33333, 333333, 3333333, 3.3, 33.33, "three"}}
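The result shown in Out[5] is an ordinary nested list with one sublist per record, so the usual list operations apply to it. A brief sketch follows, with the rows from Out[5] copied in literally so that it runs without the package. The names data and columns are illustrative and not part of h5dumpImport.

```mathematica
(* Rows copied from Out[5] above *)
data = {{1, 11, 111, 1111, 11111, 111111, 1111111, 1.1, 11.11, "one"},
        {2, 22, 222, 2222, 22222, 222222, 2222222, 2.2, 22.22, "two"},
        {3, 33, 333, 3333, 33333, 333333, 3333333, 3.3, 33.33, "three"}};

(* Last member of every record *)
data[[All, -1]]    (* {"one", "two", "three"} *)

(* Regroup the data as one list per compound member *)
columns = Transpose[data];
columns[[8]]       (* {1.1, 2.2, 3.3} *)
```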
    

##Documentation

  1. h5dumpImport package usage information. After loading the h5dumpImport package in a Mathematica Notebook, execute the following expression:

     ?h5dumpImport`*
    

Click on the function names to see the usage information.

  2. "h5dumpImport Examples.nb" contains a set of examples.

  3. The h5dumpImport unit tests are in h5dumpImport.mt.
    Note: Wolfram Workbench 2.0 is required to run the unit tests.

##Known Limitations / Areas for Improvement / Seeking Collaborators

  1. The dependency on the h5dump command line tool to extract an ASCII dump of a dataset from HDF5 (.h5) files is a time- and disk-space-consuming step that could be eliminated by reading the binary data directly from the HDF5 (.h5) file.

*Seeking a collaborator with experience parsing binary data within Mathematica and who understands the HDF5 (.h5) file format.*

  2. The parsing of the ASCII dump file data needs to be optimized. For a given amount of data, the time to read (and parse) the data is an order of magnitude greater than the time to fast forward (read without parsing) through it.

*Seeking a collaborator with seasoned Mathematica experience in reading and parsing strings.*

  3. Handling the enumerated datatype (H5T_ENUM). I was not able to generate an HDF5 (.h5) file with enumerated data.

*Seeking a collaborator who can generate a valid HDF5 (.h5) file with a compound dataset that has an enumerated datatype.*

  4. Support for datasets without compound datatypes.
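For the parsing limitation above, the gap between fast forwarding and parsing can be measured with Mathematica's built-in stream functions. The sketch below is not the package's own reader. It uses a synthetic text file of integers rather than real h5dump output, and the file name and the variables file, strm, tSkip, tParse and values are all hypothetical. It only shows one way to time the two operations on the same input.

```mathematica
(* Synthetic input: 20000 lines, each holding one integer.
   This is made-up data, not h5dump output. *)
file = FileNameJoin[{$TemporaryDirectory, "h5dumpImport_timing_demo.txt"}];
Export[file, StringJoin[Riffle[ToString /@ Range[20000], "\n"]], "Text"];

(* Fast forward: skip every line without building expressions *)
strm = OpenRead[file];
tSkip = First[AbsoluteTiming[Skip[strm, String, 20000]]];
Close[strm];

(* Read and parse: convert every line into a number *)
strm = OpenRead[file];
tParse = First[AbsoluteTiming[values = ReadList[strm, Number];]];
Close[strm];

(* Elapsed seconds for skipping versus parsing *)
{tSkip, tParse}

DeleteFile[file];
```

Timings depend on the machine and the Mathematica version, so none are quoted here. Repeating the measurement on an actual dump file would show whether a change to the parser narrows the gap.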
