
<font size = "5"> **AI in STEM - workshop 2020; Day03** </font>

<hr style="height:1px;border-top:4px solid #FF8200" />

# Writing and Reading sidpy Datasets with pyNSID

*Author: Maxim Ziatdinov/Gerd Duscher*

*Date: September 2020*

A quick introduction to writing sidpy datasets to NSID-formatted HDF5 files and reading them back.

In [None]:
!pip install pyNSID
!pip install pyTEMlib



Start with standard imports:

In [None]:
# Ensure python 3 compatibility:
from __future__ import (absolute_import, division, print_function,
                        unicode_literals)

import sys
import warnings

import h5py
import matplotlib.pylab as plt
import numpy as np

import sidpy as sid
import pyNSID as nsid
from pyTEMlib.nsi_reader import NSIDReader

warnings.filterwarnings("ignore", module="numpy.core.fromnumeric")
warnings.filterwarnings("ignore", module="pyNSID.io.nsi_reader")


## Creating sidpy.Dataset object(s)

Let's create a simple sidpy Dataset from a numpy array:

In [None]:
dataset = sid.Dataset.from_array(np.random.random([4, 5, 10]), name='new')
dataset

|       | Array      | Chunk         |
|-------|------------|---------------|
| Bytes | 1.60 kB    | 1.60 kB       |
| Shape | (4, 5, 10) | (4, 5, 10)    |
| Count | 1 Tasks    | 1 Chunks      |
| Type  | float64    | numpy.ndarray |
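The 1.60 kB in the table follows directly from the array's shape and dtype: 4 × 5 × 10 = 200 float64 values at 8 bytes each. A quick plain-NumPy check, independent of sidpy:

```python
import numpy as np

# Same shape and dtype as the array passed to sid.Dataset.from_array above
arr = np.random.random([4, 5, 10])

n_values = arr.size             # 4 * 5 * 10 = 200 values
bytes_per_value = arr.itemsize  # float64 -> 8 bytes per value
total_bytes = arr.nbytes        # 200 * 8 = 1600 bytes, displayed by dask as 1.60 kB

print(arr.dtype, n_values, bytes_per_value, total_bytes)
```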


Let's also define the dataset attributes...

In [None]:
dataset.data_type = 'SPECTRAL_IMAGE'
dataset.units = 'nA'
dataset.quantity = 'Current'

dataset.metadata = {'this': 'is just a random dataset'}

... and set the individual dimensions. In spectroscopic datasets, the first two dimensions are typically spatial (with units such as nm), while the third is spectral, e.g. energy ($\mathrm{meV}$) or wavenumber ($\mathrm{nm}^{-1}$). Here the spectral dimension is a bias voltage in mV.

In [None]:
dataset.set_dimension(0, sid.Dimension(np.arange(dataset.shape[0]), 'x',
                                        units='nm', quantity='Length',
                                        dimension_type='spatial'))
dataset.set_dimension(1, sid.Dimension(np.linspace(-2, 2, num=dataset.shape[1], endpoint=True), 'y', 
                                        units='nm', quantity='Length',
                                        dimension_type='spatial'))
dataset.set_dimension(2, sid.Dimension(np.sin(np.linspace(0, 2 * np.pi, num=dataset.shape[2])), 'bias',
                                        units='mV', quantity='Voltage',
                                        dimension_type='spectral'))
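Each dimension vector must have exactly as many entries as the data axis it describes (4, 5 and 10 here). The stand-alone NumPy sketch below rebuilds the same three vectors without sidpy and checks this; it also shows that the `bias` values produced by `np.sin` rise, fall and rise again rather than increasing monotonically.

```python
import numpy as np

data = np.random.random([4, 5, 10])

# The same dimension vectors as in the set_dimension cell, kept in a plain dict
axis_values = {
    'x': np.arange(data.shape[0]),                                 # 0, 1, 2, 3 (nm)
    'y': np.linspace(-2, 2, num=data.shape[1], endpoint=True),     # -2, -1, 0, 1, 2 (nm)
    'bias': np.sin(np.linspace(0, 2 * np.pi, num=data.shape[2])),  # (mV)
}

# Each vector must match the length of the data axis it describes
for axis_index, (name, values) in enumerate(axis_values.items()):
    assert len(values) == data.shape[axis_index], name
    print(name, len(values))

# The sine-shaped bias sweep is not monotonically increasing
bias_is_monotonic = bool(np.all(np.diff(axis_values['bias']) > 0))
print(bias_is_monotonic)
```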

In [None]:
print(dataset.dim_0)
print(dataset.dim_1)
print(dataset.dim_2)
print(dataset.bias.dimension_type)

x:  Length (nm) of size (4,)
y:  Length (nm) of size (5,)
bias:  Voltage (mV) of size (10,)
DimensionTypes.SPECTRAL


## Writing sidpy.Dataset object(s) to HDF5 files

Open an HDF5 file in append mode (`'a'`), which creates the file if it does not exist yet:

In [None]:
hf = h5py.File("test.hf5", 'a')


Let's create a new channel group to hold our sidpy dataset, first removing any channel of the same name left over from a previous run:

In [None]:
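# Remove a channel left over from a previous run (deleting a missing group raises KeyError)
if 'Measurement_000/Channel_000' in hf:
    del hf['Measurement_000/Channel_000']
hf.create_group('Measurement_000/Channel_000')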

<HDF5 group "/Measurement_000/Channel_000" (0 members)>
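An unconditional `del` raises `KeyError` on a fresh file, and `create_group` raises `ValueError` when the group already exists, so the usual pattern is to delete only if present and then recreate. The same idea as a reusable helper is sketched below; `fresh_group` is a hypothetical name (not part of h5py or pyNSID), and the example uses an in-memory file (`driver='core'` with `backing_store=False`) so nothing is written to disk.

```python
import h5py
import numpy as np

def fresh_group(h5_file, path):
    """Hypothetical helper: delete the group at `path` if present, then recreate it empty."""
    if path in h5_file:
        del h5_file[path]
    return h5_file.create_group(path)

# In-memory HDF5 file: driver='core' with backing_store=False writes nothing to disk
with h5py.File('scratch.h5', 'w', driver='core', backing_store=False) as f:
    channel = fresh_group(f, 'Measurement_000/Channel_000')
    channel.create_dataset('old_data', data=np.zeros(3))

    # A second call replaces the channel instead of raising, leaving it empty
    channel = fresh_group(f, 'Measurement_000/Channel_000')
    channel_names = list(f['Measurement_000'].keys())
    n_members = len(channel)

print(channel_names, n_members)
```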

Now let's write our sidpy dataset into the newly created channel:

In [None]:
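# Compatibility work-around (assumption: required by the sidpy/pyNSID versions of this
# 2020 workshop): expose the dimensions as `axes` and start from an empty `attrs` dict
dataset.axes = dataset._axes
dataset.attrs = {}
# main_data_name sets the name of the HDF5 dataset that holds the data
nsid.hdf_io.write_nsid_dataset(dataset, hf['Measurement_000/Channel_000'], main_data_name="new_spectrum");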

<HDF5 group "/Measurement_000/Channel_000" (0 members)> new_spectrum


Close the HDF5 file:

In [None]:
hf.close()

## Reading sidpy.Dataset object(s)

Reopen the file and list the contents of `Measurement_000`:

In [None]:
hf = h5py.File("test.hf5", 'r+')
print(*hf["Measurement_000"].keys())

Channel_000
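This notebook opens `test.hf5` with `'a'` for writing and with `'r+'` when reading it back; `'r'` is the read-only alternative. A short sketch of how the three modes differ, using a temporary directory (the file name `modes_demo.h5` is illustrative):

```python
import os
import tempfile

import h5py
import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'modes_demo.h5')

    # 'r+' needs an existing file: opening a missing one raises an OSError (subclass)
    try:
        h5py.File(path, 'r+')
        r_plus_on_missing = 'opened'
    except OSError:
        r_plus_on_missing = 'failed'

    # 'a' creates the file when it is missing and allows writing
    with h5py.File(path, 'a') as f:
        f.create_dataset('values', data=np.arange(3))

    # 'r+' now opens the existing file and still allows writing
    with h5py.File(path, 'r+') as f:
        f['values'][0] = 10

    # 'r' is read-only: writing raises an OSError
    with h5py.File(path, 'r') as f:
        stored = f['values'][()].tolist()
        try:
            f['values'][1] = 20
            write_in_r_mode = 'allowed'
        except OSError:
            write_in_r_mode = 'rejected'

print(r_plus_on_missing, stored, write_in_r_mode)
```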


Find our dataset:

In [None]:
dataset_hdf5 = nsid.io.hdf_utils.find_dataset(hf, 'new_spectrum')[0]
dataset_hdf5

<HDF5 dataset "new_spectrum": shape (4, 5, 10), type "<f8">
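`find_dataset` searches the file hierarchy for datasets with a given name and returns the matches as a list (hence the `[0]` above). The same kind of lookup can be written with plain h5py via `Group.visititems`; the sketch below only illustrates the idea and is not pyNSID's implementation, and `find_datasets_by_name` is a hypothetical helper.

```python
import h5py
import numpy as np

def find_datasets_by_name(h5_group, name):
    """Hypothetical helper: every dataset below `h5_group` whose last path component is `name`."""
    matches = []

    def visitor(path, obj):
        # visititems calls this for every group and dataset, with `path` relative to h5_group
        if isinstance(obj, h5py.Dataset) and path.split('/')[-1] == name:
            matches.append(obj)

    h5_group.visititems(visitor)
    return matches

# In-memory file laid out like test.hf5 (intermediate groups are created automatically)
with h5py.File('find_demo.h5', 'w', driver='core', backing_store=False) as f:
    f.create_dataset('Measurement_000/Channel_000/new_spectrum', data=np.zeros((4, 5, 10)))
    f.create_dataset('Measurement_000/Channel_000/x', data=np.arange(4))

    found = find_datasets_by_name(f, 'new_spectrum')
    found_paths = [dset.name for dset in found]
    found_shape = found[0].shape

print(found_paths, found_shape)
```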

Read the dataset stored in HDF5 format as a sidpy object (Dataset) using NSIDReader:

In [None]:
# Wrap the HDF5 dataset in a reader and convert it to a sidpy Dataset
dr = nsid.NSIDReader(dataset_hdf5)
dataset_sid = dr.read()[0]
assert isinstance(dataset_sid, sid.Dataset)


View the attributes, which are stored as a Python dictionary:

In [None]:
for k, v in dataset_sid.attrs.items():
    print("{}: {}".format(k, v))

DIMENSION_LABELS: ['x' 'y' 'bias']
DIMENSION_LIST: [array([<HDF5 object reference>], dtype=object)
 array([<HDF5 object reference>], dtype=object)
 array([<HDF5 object reference>], dtype=object)]
data_type: SPECTRAL_IMAGE
main_data_name: new
modality: generic
nsid_version: 0.0.1
quantity: Current
source: generic
units: nA
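`DIMENSION_LABELS` and `DIMENSION_LIST` are not sidpy metadata: they are the standard attributes HDF5 writes when 1-D datasets are attached to a dataset as dimension scales, which is how the dimension vectors are linked to `new_spectrum`. The plain-h5py sketch below reproduces that mechanism with an in-memory file and reads an axis back through `dataset.dims`; it only illustrates the HDF5 side, is not pyNSID's own writer, and assumes h5py 2.9 or newer (for `Dataset.make_scale`).

```python
import h5py
import numpy as np

axis_values = {
    'x': np.arange(4),
    'y': np.linspace(-2, 2, num=5),
    'bias': np.sin(np.linspace(0, 2 * np.pi, num=10)),
}

with h5py.File('scales_demo.h5', 'w', driver='core', backing_store=False) as f:
    channel = f.create_group('Measurement_000/Channel_000')
    main = channel.create_dataset('new_spectrum', data=np.random.random([4, 5, 10]))

    for index, (name, values) in enumerate(axis_values.items()):
        scale = channel.create_dataset(name, data=values)
        scale.make_scale(name)                # mark the 1-D dataset as a dimension scale
        main.dims[index].attach_scale(scale)  # adds DIMENSION_LIST to `main`
        main.dims[index].label = name         # adds DIMENSION_LABELS to `main`

    attr_names = sorted(main.attrs.keys())
    labels = [main.dims[i].label for i in range(main.ndim)]
    # Follow the reference from dimension 1 back to its scale dataset and read it
    y_read_back = main.dims[1][0][()]

print(attr_names)
print(labels)
print(y_read_back)
```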


In [None]:
hf.close()

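Alternatively, pass the whole channel group to the `NSIDReader` imported from pyTEMlib, which returns a list of sidpy Datasets (here a single one):

In [None]: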
hdf5_file = h5py.File("test.hf5", 'r+')
print(*hdf5_file["Measurement_000"].keys())


nsid_reader = NSIDReader(hdf5_file['Measurement_000/Channel_000'])
sidpy_dataset = nsid_reader.read()
sidpy_dataset

Channel_000


[sidpy.Dataset of type SPECTRAL_IMAGE with:
  dask.array<generic, shape=(4, 5, 10), dtype=float64, chunksize=(4, 5, 10), chunktype=numpy.ndarray>
  data contains: Current (nA)
  and Dimensions: 
   x:  Length (nm) of size (4,)
   y:  Length (nm) of size (5,)
   bias:  Voltage (mV) of size (10,)
  with metadata: ['DIMENSION_LABELS', 'DIMENSION_LIST', 'data_type', 'main_data_name', 'modality', 'nsid_version', 'quantity', 'source', 'units']]

In [None]:
hdf5_file.close()

Printing the file handle confirms that it is closed:

In [None]:
print(hdf5_file)

<Closed HDF5 file>
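Closing files by hand, as above, leaves the file open if a cell fails before `close()` runs. A `with` block closes the file automatically when the block exits, even when an exception is raised; a minimal sketch using an in-memory file:

```python
import h5py
import numpy as np

# The file is closed automatically when the with block exits, even on an exception
with h5py.File('context_demo.h5', 'w', driver='core', backing_store=False) as f:
    f.create_dataset('values', data=np.arange(5))
    open_inside = bool(f)               # True while the file is open
    total = int(f['values'][()].sum())  # 0 + 1 + 2 + 3 + 4

open_after = bool(f)                    # False once the block has exited
print(open_inside, total, repr(f))
```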
