Writing a dataset converter
The HDF5 file format is developed by The HDF Group. It is an open, binary file format with good support in a number of programming languages, including Python. In principle, you could use any language to create your dataset converter, but we recommend Python; we have not tried writing converters in any other language. Working with HDF5 files is really easy in Python using Pandas (which, in turn, uses the excellent PyTables package).
If you haven't done so already, then it might be worth reading the Pandas tutorial on working with HDF5 files in Pandas.
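As a quick illustration of that round trip (the file name, timezone and values below are invented for the example), pandas can write a DataFrame to HDF5 and read it back in a few lines:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical example data: one minute of 1-second power readings.
index = pd.date_range("2013-01-01", periods=60, freq="s", tz="US/Eastern")
df = pd.DataFrame({"power": np.random.rand(60).astype(np.float32)},
                  index=index)

# Write to, and read back from, an HDF5 file (pandas uses PyTables here).
path = os.path.join(tempfile.mkdtemp(), "example.h5")
df.to_hdf(path, key="data", mode="w")
roundtrip = pd.read_hdf(path, "data")
```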
If you would like to contribute a new dataset converter (or just better understand the existing converters) then the best way to get up to speed is probably to read this document first. This document explains the layout of data in a NILMTK HDF5 file. Then take a look at the REDD converter for NILMTK.
NILMTK dataset converters output an HDF5 file which contains both the time series data from each meter and all relevant metadata.
Time series data
Data from each physical meter is stored in its own table (i.e. there is a one-to-one relationship between tables in the HDF5 file and physical meters).
Table location (keys)
Tables in HDF5 are identified by a hierarchical key. Each key is a string. Levels in the hierarchy are separated by a / character in the key. NILMTK uses keys of the form /building<i>/elec/meter<j>, where i and j are integers starting from 1: i is the building instance and j is the meter instance. For example, the table storing data from meter instance 1 in building instance 1 would have the key /building1/elec/meter1. (We use elec in the hierarchy to allow us, in the future, to add support for other sensor types like water and gas.)
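To make the key scheme concrete, here is a sketch (file name, building/meter counts and values are invented) that writes one table per physical meter under NILMTK-style keys:

```python
import os
import tempfile

import numpy as np
import pandas as pd

index = pd.date_range("2013-01-01", periods=10, freq="s", tz="US/Eastern")
path = os.path.join(tempfile.mkdtemp(), "mydataset.h5")

# One table per physical meter, keyed /building<i>/elec/meter<j>.
with pd.HDFStore(path, mode="w") as store:
    for building_i in (1, 2):
        for meter_j in (1, 2):
            df = pd.DataFrame(
                np.random.rand(10, 1).astype(np.float32),
                index=index, columns=["power"])
            store.put("/building{}/elec/meter{}".format(building_i, meter_j),
                      df)

# Reopen and list the keys that were created.
with pd.HDFStore(path, mode="r") as store:
    keys = sorted(store.keys())
```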
Contents of each table
The index column is a datetime represented on disk as a nano-second precision (!) UNIX timestamp stored as an unsigned 64-bit int. In Python, we use a timezone-aware numpy.datetime64. The DataFrame must be sorted in ascending order on the index (timestamp) column.
Every column apart from the index column holds a measurement taken by the meter. These measurements could be power demand, energy, cumulative energy, voltage or current. Each measurement is represented as a 32-bit floating point number.
We always use SI units or SI derived units and NILMTK assumes that no unit prefix (e.g. 'kilo-' or 'mega-') has been applied. In other words, NILMTK assumes a unit multiplier of 1. For example, we always use watts (not kW) for active power. If the source dataset uses, say, kW then please multiply these values by 1000 in your converter.
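Putting those rules together, a converter typically has to build a sorted, timezone-aware index, cast measurements to 32-bit floats, and rescale to unprefixed SI units. A minimal sketch (the timestamps, timezone and kW values are invented):

```python
import numpy as np
import pandas as pd

# Hypothetical source data: power in kW, with out-of-order timestamps.
timestamps = pd.to_datetime([
    "2013-01-01 00:00:02",
    "2013-01-01 00:00:00",
    "2013-01-01 00:00:01",
]).tz_localize("US/Eastern")
power_kw = np.array([1.2, 1.0, 1.1])

# Rescale kW -> W (unit multiplier of 1) and store as 32-bit floats.
df = pd.DataFrame({"power": (power_kw * 1000).astype(np.float32)},
                  index=timestamps)

# The index must be sorted in ascending order.
df = df.sort_index()
```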
Column labels are hierarchical with two levels (hierarchical labels are very well supported by Pandas). The top level describes the physical_quantity being measured. The second level describes the type which, at present, is used for energy and power columns to describe whether the alternating current measurement is, for example, active or reactive. We use a controlled vocabulary for both physical_quantity and type. For the full details of this controlled vocabulary, please see the documentation for type under the measurements property for the MeterDevice object in NILM Metadata.
Compression
We use zlib to compress our HDF5 files. bzip2 results in slightly smaller files (261 MB for bzip2 versus 273 MB for zlib for REDD) but doesn't appear to be compatible with some other HDF5 tools.
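Compression can be requested when writing each table, e.g. (file name and data invented):

```python
import os
import tempfile

import numpy as np
import pandas as pd

index = pd.date_range("2013-01-01", periods=100, freq="s", tz="US/Eastern")
df = pd.DataFrame({"power": np.random.rand(100).astype(np.float32)},
                  index=index)

# Write with zlib compression at the maximum compression level.
path = os.path.join(tempfile.mkdtemp(), "compressed.h5")
df.to_hdf(path, key="/building1/elec/meter1",
          mode="w", complib="zlib", complevel=9)

roundtrip = pd.read_hdf(path, "/building1/elec/meter1")
```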
Metadata
In order for NILMTK to be able to load the dataset, we need to add metadata to the HDF5 file. NILMTK uses the NILM Metadata schema.
If the dataset is already described in YAML using the NILM Metadata schema then just call the conversion helper provided by NILM Metadata to copy that YAML metadata into the HDF5 file.
If the dataset is not already described using the NILM Metadata schema then it will be necessary to do so. If it is a small dataset then you could manually write the YAML files and then convert these to HDF5 (this is how our REDD converter works). If it is a large dataset then it would be better to programmatically convert the dataset's own metadata to NILM Metadata and store the metadata directly in the HDF5 file. For an introduction to NILM Metadata first read the README and then the tutorial and finally refer to the dataset_metadata doc page for the full description of the metadata schema.
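The exact metadata layout NILMTK expects is defined by the NILM Metadata schema (see the docs referenced above). The snippet below only illustrates the underlying mechanism pandas exposes (via PyTables) for attaching arbitrary metadata to a table in the HDF5 file; the file name, key and metadata values are invented:

```python
import os
import tempfile

import numpy as np
import pandas as pd

index = pd.date_range("2013-01-01", periods=10, freq="s", tz="US/Eastern")
df = pd.DataFrame({"power": np.random.rand(10).astype(np.float32)},
                  index=index)

path = os.path.join(tempfile.mkdtemp(), "withmeta.h5")
key = "/building1/elec/meter1"

with pd.HDFStore(path, mode="w") as store:
    store.put(key, df, format="table")
    # Attach arbitrary (invented) metadata to the table's HDF5 attributes.
    store.get_storer(key).attrs.metadata = {"device_model": "ExampleMeter"}

# Reopen and read the metadata back.
with pd.HDFStore(path, mode="r") as store:
    metadata = store.get_storer(key).attrs.metadata
```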