.. currentmodule:: h5py

Virtual Datasets (VDS)

Starting with version 2.9, h5py includes high-level support for HDF5 'virtual datasets'. The VDS feature is available in HDF5 1.10 and later; h5py must be built against such a version of HDF5 to create or read virtual datasets.

What are virtual datasets?

Virtual datasets allow a number of real datasets to be mapped together into a single, sliceable dataset via an interface layer. The mapping can be made ahead of time, before the parent files are written, and is transparent to the parent dataset characteristics (SWMR, chunking, compression, etc.). The datasets can be meshed in arbitrary combinations, and the data type can even be converted.

Once a virtual dataset has been created, it can be read just like any other HDF5 dataset.

Warning

Virtual dataset files cannot be opened with versions of the HDF5 library older than 1.10.

The HDF Group has documented the VDS features in detail on the website: Virtual Datasets (VDS) Documentation.

Creating virtual datasets in h5py

To make a virtual dataset using h5py, you need to:

  1. Create a :class:`VirtualLayout` object representing the dimensions and data type of the virtual dataset.
  2. Create a number of :class:`VirtualSource` objects, representing the datasets the array will be built from. These objects can be created either from an h5py :class:`Dataset`, or from a filename, dataset name and shape. This can be done even before the source file exists.
  3. Map slices from the sources into the layout.
  4. Convert the :class:`VirtualLayout` object into a virtual dataset in an HDF5 file.
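As a rough sketch of step 2, the block below builds one :class:`VirtualSource` from an open :class:`Dataset` and another from a filename, dataset name and shape for a file that has not been written. The file names (``source.h5``, ``not_written_yet.h5``, ``combined.h5``) and the temporary directory are assumptions made for illustration. The missing source is expected to read back as the fill value.

```python
import os
import tempfile

import h5py
import numpy as np

workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "source.h5")
missing_path = os.path.join(workdir, "not_written_yet.h5")  # never created here
vds_path = os.path.join(workdir, "combined.h5")

# Source built from an existing h5py Dataset: the file name, dataset name,
# shape and dtype are taken from the dataset itself.
with h5py.File(src_path, "w") as f:
    dset = f.create_dataset("data", data=np.arange(100, dtype="i4"))
    from_dataset = h5py.VirtualSource(dset)

# Source built from names and a shape only; the file does not need to exist.
from_names = h5py.VirtualSource(missing_path, "data", shape=(100,))

# Map each source onto one row of the layout.
layout = h5py.VirtualLayout(shape=(2, 100), dtype="i4")
layout[0] = from_dataset
layout[1] = from_names

with h5py.File(vds_path, "w", libver="latest") as f:
    f.create_virtual_dataset("data", layout, fillvalue=-5)

# Read the virtual dataset back like any other dataset.
with h5py.File(vds_path, "r") as f:
    result = f["data"][:]
```

Row 0 of ``result`` comes from ``source.h5``, while row 1, whose source file is absent, is filled with the ``fillvalue`` passed to ``create_virtual_dataset``.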

The following snippet creates a virtual dataset to stack together four 1D datasets from separate files into a 2D dataset:

import h5py

# Describe the virtual dataset: 4 rows of 100 32-bit integers
layout = h5py.VirtualLayout(shape=(4, 100), dtype='i4')

# Map the 'data' dataset of each source file (1.h5 ... 4.h5) onto one row
for n in range(1, 5):
    filename = "{}.h5".format(n)
    vsource = h5py.VirtualSource(filename, 'data', shape=(100,))
    layout[n - 1] = vsource

# Add virtual dataset to output file
with h5py.File("VDS.h5", 'w', libver='latest') as f:
    f.create_virtual_dataset('data', layout, fillvalue=-5)

This is an extract from the vds_simple.py example in the examples folder.
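The extract assumes that the source files ``1.h5`` to ``4.h5`` already hold a ``data`` dataset. A self-contained variant might look like the sketch below, which first writes four source files into a temporary directory and then reads the stacked result back. The directory and the sample values are illustrative assumptions, not part of the bundled example.

```python
import os
import tempfile

import h5py
import numpy as np

workdir = tempfile.mkdtemp()

# Write four source files, each holding a 1D int32 dataset named 'data'.
for n in range(1, 5):
    with h5py.File(os.path.join(workdir, "{}.h5".format(n)), "w") as f:
        f["data"] = np.arange(100, dtype="i4") + 100 * n

# Stack the four 1D sources into a (4, 100) virtual layout.
layout = h5py.VirtualLayout(shape=(4, 100), dtype="i4")
for n in range(1, 5):
    source_path = os.path.join(workdir, "{}.h5".format(n))
    layout[n - 1] = h5py.VirtualSource(source_path, "data", shape=(100,))

vds_path = os.path.join(workdir, "VDS.h5")
with h5py.File(vds_path, "w", libver="latest") as f:
    f.create_virtual_dataset("data", layout, fillvalue=-5)

# The virtual dataset is read like any other HDF5 dataset.
with h5py.File(vds_path, "r") as f:
    is_virtual = f["data"].is_virtual
    stacked = f["data"][:]
```

Here ``stacked`` is an ordinary NumPy array in which row ``n - 1`` holds the contents of file ``n``.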

Note

Slices up to h5py.h5s.UNLIMITED can be used to create an unlimited selection along a single axis. Resizing the source data along this axis will cause the virtual dataset to grow. E.g.:

layout[n - 1, :UNLIMITED] = vsource[:UNLIMITED]

By contrast, a normal slice with no defined end point ([:]) is fixed to the shape at the time you define the mapping, so it does not grow with the source.

.. versionadded:: 3.0

Examples

In addition to the above example snippet, a few more complete examples can be found in the examples folder.

Reference

Object for building a virtual dataset.

Instantiate this class to define a virtual dataset, assign :class:`VirtualSource` objects to slices of it, and then pass it to :meth:`Group.create_virtual_dataset` to add the virtual dataset to a file.

This class does not allow access to the data; the virtual dataset must be created in a file before it can be used.

:param tuple shape: The full shape of the virtual dataset.
:param dtype: Numpy dtype or string.
:param tuple maxshape: The virtual dataset is resizable up to this shape. Use None for axes you want to be unlimited.