# Write/read SIDpy Dataset via pyNSID

*Date: September 2020*

A quick introduction to writing sidpy datasets to NSID-formatted HDF5 files and reading them back.

Start with standard imports:

In [1]:
# Ensure python 3 compatibility:
from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

import sys
import warnings

import h5py
import matplotlib.pylab as plt
import numpy as np

# we will also need the sidpy package
try:
    import sidpy as sid
except ModuleNotFoundError:
    !pip3 install git+https://github.com/pycroscopy/sidpy.git
    import sidpy as sid

sys.path.append('../')
import pyNSID as nsid

warnings.filterwarnings("ignore", module="numpy.core.fromnumeric")
warnings.filterwarnings("ignore", module="pyNSID.io.nsi_reader")

## Creating sidpy.Dataset object(s)

Let's create a simple sidpy Dataset from a numpy array:

In [2]:
dataset = sid.Dataset.from_array(np.random.random([4, 5, 10]), name='new')
dataset

| | Array | Chunk |
|---|---|---|
| Bytes | 1.60 kB | 1.60 kB |
| Shape | (4, 5, 10) | (4, 5, 10) |
| Count | 1 Tasks | 1 Chunks |
| Type | float64 | numpy.ndarray |


Let's also define the dataset attributes...

In [3]:
dataset.data_type = 'SPECTRAL_IMAGE'
dataset.units = 'nA'
dataset.quantity = 'Current'

... and set the individual dimensions. In spectroscopic datasets, the first two dimensions are typically spatial (with units such as nm), and the third is spectral, for example energy (in $meV$) or wavenumber (in $nm^{-1}$). Here the third dimension is a bias voltage in mV.

In [4]:
dataset.set_dimension(0, sid.Dimension(np.arange(dataset.shape[0]), 'x',
                                        units='nm', quantity='Length',
                                        dimension_type='spatial'))
dataset.set_dimension(1, sid.Dimension(np.linspace(-2, 2, num=dataset.shape[1], endpoint=True), 'y', 
                                        units='nm', quantity='Length',
                                        dimension_type='spatial'))
dataset.set_dimension(2, sid.Dimension(np.sin(np.linspace(0, 2 * np.pi, num=dataset.shape[2])), 'bias',
                                        units='mV', quantity='Voltage',
                                        dimension_type='spectral'))
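
To see which coordinate values the three calls above assign, the same NumPy expressions can be evaluated on their own. This is a small sketch that uses only NumPy; the shape `(4, 5, 10)` is copied from the dataset created earlier, and the variable names are illustrative rather than part of sidpy.

```python
import numpy as np

shape = (4, 5, 10)  # assumed: shape of the dataset created above

# Same expressions as in the set_dimension calls
x_axis = np.arange(shape[0])                                 # integer pixel positions
y_axis = np.linspace(-2, 2, num=shape[1], endpoint=True)     # evenly spaced, both ends included
bias_axis = np.sin(np.linspace(0, 2 * np.pi, num=shape[2]))  # one period of a sine wave

for name, values in [("x", x_axis), ("y", y_axis), ("bias", bias_axis)]:
    print(name, values.shape, np.round(values, 3))

# The sine-shaped bias axis rises and then falls, so it is not strictly increasing
bias_is_increasing = bool(np.all(np.diff(bias_axis) > 0))
print("bias strictly increasing:", bias_is_increasing)
```

Each array has the same length as the dataset axis it describes, which is why `dataset.shape[i]` is used to size it in the cell above.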

In [5]:
print(dataset.dim_0)
print(dataset.dim_1)
print(dataset.dim_2)

x:  Length (nm) of size (4,)
y:  Length (nm) of size (5,)
bias:  Voltage (mV) of size (10,)


## Writing sidpy.Dataset object(s) to HDF5 files

Open an existing NSID-formatted HDF5 file in read/write mode:

In [6]:
hf = h5py.File("test.hf5", 'r+')
print(*hf["Measurement_000"].keys())

Channel_000 Channel_001 Channel_002


Let's create a new channel where we are going to save our sidpy dataset:

In [7]:
hf.create_group('Measurement_000/Channel_003')

<HDF5 group "/Measurement_000/Channel_003" (0 members)>
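
Re-running the cell above on a file that already contains `Channel_003` would fail, because `create_group` refuses to overwrite an existing group. The sketch below illustrates this with plain h5py on an in-memory file (the file name `scratch.h5` is hypothetical and nothing is written to disk), and shows `require_group` as one possible idempotent alternative.

```python
import h5py

# In-memory HDF5 file: driver="core" with backing_store=False never touches the disk
with h5py.File("scratch.h5", "w", driver="core", backing_store=False) as hf:
    # Intermediate groups (Measurement_000) are created automatically
    hf.create_group("Measurement_000/Channel_003")

    # A second create_group call on the same path raises ValueError
    try:
        hf.create_group("Measurement_000/Channel_003")
        second_create_failed = False
    except ValueError:
        second_create_failed = True

    # require_group returns the existing group, or creates it if it is missing
    channel = hf.require_group("Measurement_000/Channel_003")
    channel_name = channel.name

print(second_create_failed, channel_name)
```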

Now let's write our sidpy dataset into the newly created channel:

In [8]:
nsid.hdf_io.write_nsid_dataset(dataset, hf['Measurement_000/Channel_003'], main_data_name="new_spectrum");

<HDF5 group "/Measurement_000/Channel_003" (0 members)> new_spectrum


Close h5 file:

In [9]:
hf.close()
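
As an alternative to calling `close()` explicitly, an h5py file can be opened in a `with` block, which closes it even if an exception is raised inside the block. A minimal sketch on an in-memory file (the file name and group are hypothetical):

```python
import h5py

with h5py.File("scratch_close.h5", "w", driver="core", backing_store=False) as hf:
    hf.create_group("Measurement_000/Channel_000")
    # Copy what is needed out of the file while it is still open
    channels = list(hf["Measurement_000"].keys())

# Outside the block the file handle is no longer valid
file_is_open = bool(hf)
print(channels, file_is_open)
```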

## Reading sidpy.Dataset object(s)

Reopen the file:

In [10]:
hf = h5py.File("test.hf5", 'r+')
print(*hf["Measurement_000"].keys())

Channel_000 Channel_001 Channel_002 Channel_003


Find our dataset:

In [11]:
dataset_hdf5 = nsid.io.hdf_utils.find_dataset(hf, 'new_spectrum')[0]
dataset_hdf5

<HDF5 dataset "new_spectrum": shape (4, 5, 10), type "<f8">
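
For intuition, a search by name over an HDF5 tree can be written with h5py's `visititems`. The helper `find_by_name` below is a hypothetical illustration and is not pyNSID's implementation of `find_dataset`; it runs on an in-memory file with a made-up layout that mimics the one used here.

```python
import h5py
import numpy as np


def find_by_name(h5_root, target_name):
    """Return every h5py.Dataset under h5_root whose own name equals target_name."""
    matches = []

    def visitor(path, obj):
        # path is relative to h5_root, e.g. "Measurement_000/Channel_003/new_spectrum"
        if isinstance(obj, h5py.Dataset) and path.split("/")[-1] == target_name:
            matches.append(obj)
        # Returning None tells visititems to keep walking the tree

    h5_root.visititems(visitor)
    return matches


with h5py.File("scratch_find.h5", "w", driver="core", backing_store=False) as hf:
    hf.create_dataset("Measurement_000/Channel_003/new_spectrum",
                      data=np.zeros((4, 5, 10)))
    hf.create_dataset("Measurement_000/Channel_003/x", data=np.arange(4))

    found = find_by_name(hf, "new_spectrum")
    found_paths = [dset.name for dset in found]
    found_shape = found[0].shape

print(found_paths, found_shape)
```

Like the call above, the helper returns a list, so the first match is selected with `[0]`.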

Read the dataset stored in HDF5 format as a sidpy object (Dataset) using NSIDReader:

In [12]:
dr = nsid.NSIDReader(dataset_hdf5)
dataset_sid = dr.read()[0]
assert isinstance(dataset_sid, sid.Dataset)

View the attributes, which are stored as a Python dictionary:

In [13]:
for k, v in dataset_sid.attrs.items():
    print("{}: {}".format(k, v))

DIMENSION_LABELS: [b'x' b'y' b'bias']
DIMENSION_LIST: [array([<HDF5 object reference>], dtype=object)
 array([<HDF5 object reference>], dtype=object)
 array([<HDF5 object reference>], dtype=object)]
data_type: SPECTRAL_IMAGE
main_data_name: new
modality: generic
nsid_version: 0.0.1
quantity: Current
source: generic
units: nA
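
The `DIMENSION_LABELS` and `DIMENSION_LIST` entries have the same names as the attributes used by the HDF5 dimension-scales convention. The sketch below shows how plain h5py produces these attributes when a scale is attached to a dataset axis. It is an illustration on an in-memory file, not code taken from pyNSID, and it assumes h5py 2.10 or later for `make_scale`.

```python
import h5py
import numpy as np

with h5py.File("scratch_dims.h5", "w", driver="core", backing_store=False) as hf:
    main = hf.create_dataset("new_spectrum", data=np.zeros((4, 5, 10)))
    x = hf.create_dataset("x", data=np.arange(4))

    x.make_scale("x")             # mark the "x" dataset as a dimension scale
    main.dims[0].attach_scale(x)  # link it to axis 0 of the main dataset
    main.dims[0].label = "x"      # set the label of axis 0

    attr_names = sorted(main.attrs.keys())
    axis0_label = main.dims[0].label
    # The attached scale can be read back through the dims interface
    axis0_values = main.dims[0][0][()].tolist()

print(attr_names)
print(axis0_label, axis0_values)
```

The object references listed under `DIMENSION_LIST` are what allow a reader to follow each axis of the main dataset to the dataset holding its coordinate values.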
