# Write/read SIDpy Dataset via pyNSID

*Author: Maxim Ziatdinov*

*Date: September 2020*

A quick introduction to writing sidpy datasets to NSID-formatted HDF5 files and reading them back.

Start with standard imports:

In [None]:
# Make Python 2 behave like Python 3 (these imports have no effect on Python 3):
from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

import sys
import warnings

import h5py
import matplotlib.pyplot as plt
import numpy as np

# We also need the sidpy package (installed on the fly if it is missing)
try:
    import sidpy as sid
except ModuleNotFoundError:
    !pip3 install sidpy
    import sidpy as sid

sys.path.append('../')
import pyNSID as nsid

warnings.filterwarnings("ignore", module="numpy.core.fromnumeric")
warnings.filterwarnings("ignore", module="pyNSID.io.nsi_reader")

# Download test h5 file
!wget -qq -O 'test.hf5' 'https://github.com/ziatdinovmax/pyNSID/blob/master/notebooks/00_basic_usage/test.hf5?raw=true'

## Creating sidpy.Dataset object(s)

Let's create a simple sidpy Dataset from a numpy array:

In [None]:
dataset = sid.Dataset.from_array(np.random.random([4, 5, 10]), name='new')
dataset

Let's also define the dataset attributes...

In [None]:
dataset.data_type = 'SPECTRAL_IMAGE'
dataset.units = 'nA'
dataset.quantity = 'Current'

... and set the individual dimensions. For spectroscopic datasets, the first two dimensions are typically spatial (in units such as nm), and the third is spectral, for example bias voltage ($\mathrm{mV}$, as below), energy ($\mathrm{meV}$), or inverse length ($\mathrm{nm}^{-1}$).

In [None]:
dataset.set_dimension(0, sid.Dimension(np.arange(dataset.shape[0]), 'x',
                                        units='nm', quantity='Length',
                                        dimension_type='spatial'))
dataset.set_dimension(1, sid.Dimension(np.linspace(-2, 2, num=dataset.shape[1], endpoint=True), 'y', 
                                        units='nm', quantity='Length',
                                        dimension_type='spatial'))
dataset.set_dimension(2, sid.Dimension(np.sin(np.linspace(0, 2 * np.pi, num=dataset.shape[2])), 'bias',
                                        units='mV', quantity='Voltage',
                                        dimension_type='spectral'))
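Each `sid.Dimension` above is built from `dataset.shape[i]` values, so every coordinate vector has exactly one entry per point along its axis. The plain-NumPy sketch below spells out that consistency check; `check_coords` is a hypothetical helper (not part of sidpy), and the monotonic bias sweep is an illustrative assumption rather than the tutorial's values.

```python
import numpy as np

# Array with the same shape as the tutorial dataset: (x, y, bias)
data = np.random.random([4, 5, 10])

# One coordinate vector per axis, in axis order
coords = {
    "x": np.arange(data.shape[0]),                              # nm
    "y": np.linspace(-2, 2, num=data.shape[1], endpoint=True),  # nm
    "bias": np.linspace(-1, 1, num=data.shape[2]),              # mV (illustrative sweep)
}


def check_coords(array, coord_vectors):
    """Hypothetical helper: True if there is one coordinate vector per axis
    and each vector's length equals the size of that axis."""
    if len(coord_vectors) != array.ndim:
        return False
    return all(len(vec) == size
               for vec, size in zip(coord_vectors.values(), array.shape))


print(check_coords(data, coords))  # True
```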

In [None]:
print(dataset.dim_0)
print(dataset.dim_1)
print(dataset.dim_2)

## Writing sidpy.Dataset object(s) to HDF5 files

Open the NSID-formatted HDF5 file in read/write (`r+`) mode and list the contents of `Measurement_000`:

In [None]:
hf = h5py.File("test.hf5", 'r+')
print(*hf["Measurement_000"].keys())
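`print(*hf[...].keys())` shows only the direct children of one group. To see the whole hierarchy, h5py's `visititems` walks every group and dataset below a given object. The sketch below builds a small in-memory file (h5py's `core` driver with `backing_store=False`, so nothing touches disk) whose group names mimic the tutorial's; the names and the `list_tree` helper are hypothetical.

```python
import h5py
import numpy as np


def list_tree(h5_obj):
    """Hypothetical helper: return one line per object below h5_obj,
    marking groups with a trailing '/' and datasets with their shape."""
    lines = []

    def visitor(name, obj):
        if isinstance(obj, h5py.Dataset):
            lines.append(f"{name}  {obj.shape}")
        else:
            lines.append(f"{name}/")
        # Implicitly returning None lets visititems continue the traversal

    h5_obj.visititems(visitor)
    return lines


# In-memory stand-in for test.hf5; intermediate groups are created automatically
with h5py.File("scratch_tree.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("Measurement_000/Channel_000/image", data=np.zeros((4, 4)))
    f.create_group("Measurement_000/Channel_001")
    tree = list_tree(f)

print("\n".join(tree))
```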

Create a new channel group to hold our sidpy dataset:

In [None]:
hf.create_group('Measurement_000/Channel_003')
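Note that h5py's `create_group` raises a `ValueError` if the group already exists, so this cell fails if it is run a second time against the same file. h5py's `require_group` returns the existing group instead. A minimal sketch on an in-memory scratch file (not `test.hf5`):

```python
import h5py

with h5py.File("scratch_groups.h5", "w", driver="core", backing_store=False) as f:
    f.create_group("Measurement_000/Channel_003")

    # A second create_group with the same path raises ValueError
    try:
        f.create_group("Measurement_000/Channel_003")
        rerun_failed = False
    except ValueError:
        rerun_failed = True

    # require_group returns the existing group rather than raising
    channel_name = f.require_group("Measurement_000/Channel_003").name

print(rerun_failed, channel_name)  # True /Measurement_000/Channel_003
```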

Now let's write our sidpy dataset into the newly created channel:

In [None]:
nsid.hdf_io.write_nsid_dataset(dataset, hf['Measurement_000/Channel_003'], main_data_name="new_spectrum");
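For intuition about what storing a labeled N-dimensional array in HDF5 involves, here is a plain-h5py sketch that writes a 3-D array with two attributes and attaches a labeled 1-D coordinate array to each axis using HDF5 dimension scales (`make_scale`, `dims[i].attach_scale`). This is not claimed to be the exact layout `write_nsid_dataset` produces; the group, dataset, and attribute names simply mirror the tutorial, and the bias sweep is illustrative.

```python
import h5py
import numpy as np

data = np.random.random([4, 5, 10])
# Coordinate arrays for each axis, in axis order (bias sweep is illustrative)
axes = {
    "x": np.arange(data.shape[0], dtype=float),
    "y": np.linspace(-2, 2, num=data.shape[1]),
    "bias": np.linspace(-1, 1, num=data.shape[2]),
}

with h5py.File("scratch_write.h5", "w", driver="core", backing_store=False) as f:
    grp = f.create_group("Measurement_000/Channel_003")
    dset = grp.create_dataset("new_spectrum", data=data)

    # Free-form metadata stored as HDF5 attributes
    dset.attrs["units"] = "nA"
    dset.attrs["quantity"] = "Current"

    for i, (label, values) in enumerate(axes.items()):
        scale = grp.create_dataset(label, data=values)
        scale.make_scale(label)            # mark the 1-D array as a dimension scale
        dset.dims[i].attach_scale(scale)   # attach it to axis i of the main array
        dset.dims[i].label = label         # human-readable axis label

    axis_labels = [dset.dims[i].label for i in range(len(dset.shape))]
    stored_units = dset.attrs["units"]

print(axis_labels, stored_units)
```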

Close the HDF5 file so that the changes are flushed to disk:

In [None]:
hf.close()
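As an alternative to an explicit `hf.close()`, an `h5py.File` can be used as a context manager, which closes the file when the `with` block exits, even if an error occurs inside it. A minimal sketch using a throwaway file in a temporary directory (not `test.hf5`):

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scratch.h5")  # throwaway file

    # The with-statement closes the file on exit, even if an exception is raised
    with h5py.File(path, "w") as f:
        f.create_dataset("values", data=np.arange(5))
    closed_after_write = not bool(f)  # a closed h5py File evaluates to False

    # Reopen read-only to confirm the data persisted
    with h5py.File(path, "r") as f:
        restored = f["values"][()]

print(closed_after_write, restored)  # True [0 1 2 3 4]
```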

## Reading sidpy.Dataset object(s)

Reopen the file and check that the new channel is there:

In [None]:
hf = h5py.File("test.hf5", 'r+')
print(*hf["Measurement_000"].keys())

Find our dataset by its name, `new_spectrum`:

In [None]:
dataset_hdf5 = nsid.io.hdf_utils.find_dataset(hf, 'new_spectrum')[0]
dataset_hdf5
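Indexing the result with `[0]` takes the first match, which suggests `find_dataset` returns a list of matching datasets. The sketch below is a plain-h5py analogue of such a name-based search, not pyNSID's implementation: the hypothetical `find_datasets_by_name` walks the file with `visititems` and keeps every dataset whose last path component equals the requested name.

```python
import h5py
import numpy as np


def find_datasets_by_name(h5_group, name):
    """Hypothetical helper: return every dataset below h5_group whose
    last path component equals `name`."""
    matches = []

    def visitor(path, obj):
        if isinstance(obj, h5py.Dataset) and path.split("/")[-1] == name:
            matches.append(obj)

    h5_group.visititems(visitor)
    return matches


# In-memory scratch file whose layout mimics the tutorial's
with h5py.File("scratch_find.h5", "w", driver="core", backing_store=False) as f:
    f.create_dataset("Measurement_000/Channel_000/image", data=np.zeros((4, 4)))
    f.create_dataset("Measurement_000/Channel_003/new_spectrum", data=np.zeros((4, 5, 10)))
    found_paths = [d.name for d in find_datasets_by_name(f, "new_spectrum")]

print(found_paths)  # ['/Measurement_000/Channel_003/new_spectrum']
```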

Read the HDF5 dataset back into a `sidpy.Dataset` object using `NSIDReader`:

In [None]:
dr = nsid.NSIDReader(dataset_hdf5)
dataset_sid = dr.read()[0]
assert isinstance(dataset_sid, sid.Dataset)

View the attributes, which are stored as a Python dictionary:

In [None]:
for k, v in dataset_sid.attrs.items():
    print("{}: {}".format(k, v))
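If some attribute values are themselves dictionaries, the flat loop above prints each one on a single line. A small recursive formatter indents nested entries instead; `format_attrs` and the example dictionary below are hypothetical and only illustrate the idea.

```python
def format_attrs(attrs, indent=0):
    """Hypothetical helper: return printable lines for a possibly nested
    dictionary, indenting the contents of each sub-dictionary by 4 spaces."""
    lines = []
    pad = " " * indent
    for key, value in attrs.items():
        if isinstance(value, dict):
            lines.append(f"{pad}{key}:")
            lines.extend(format_attrs(value, indent + 4))
        else:
            lines.append(f"{pad}{key}: {value}")
    return lines


# Made-up attributes, for illustration only
example_attrs = {"instrument": {"name": "STM", "temperature_K": 4.2}, "comment": "demo"}
print("\n".join(format_attrs(example_attrs)))
```

Applied to the tutorial's data, the equivalent call would be `print("\n".join(format_attrs(dataset_sid.attrs)))`.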