<font size = "5"> **Chapter 1: [Introduction](CH1_00-Introduction.ipynb)** </font>


<hr style="height:1px;border-top:4px solid #FF8200" />

# Open DM3 Images, Spectra, Spectrum-Images and Image-Stacks with pyNSID

[Download](https://raw.githubusercontent.com/gduscher/MSE672-Introduction-to-TEM//main/Introduction/CH1_04-Open_File.ipynb)
 
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](
    https://colab.research.google.com/github/gduscher/MSE672-Introduction-to-TEM/blob/main/Introduction/CH1_04-Open_File.ipynb)


part of 

<font size = "5"> **[MSE672:  Introduction to Transmission Electron Microscopy](../_MSE672_Intro_TEM.ipynb)**</font>

by Gerd Duscher, Spring 2021

Microscopy Facilities<br>
Joint Institute of Advanced Materials<br>
Materials Science & Engineering<br>
The University of Tennessee, Knoxville

Background and methods for the analysis and quantification of data acquired with transmission electron microscopes.

---
Reading a dm file and translating the data into a **[pyNSID](https://pycroscopy.github.io/pyNSID/)** style hdf5 file to be compatible with the **[pycroscopy](https://pycroscopy.github.io/pycroscopy/)** package.

Because many other packages and programs for TEM data manipulation are based on the ``hdf5`` file format, it is relatively easy to convert back and forth between them.



## Import packages for figures and
### Check Installed Packages

In [1]:
import sys
from pkg_resources import get_distribution, DistributionNotFound, parse_version

def test_package(package_name):
    """Test if package exists and return its version string, or '-1' if it is missing"""
    try:
        version = get_distribution(package_name).version
    except (DistributionNotFound, ImportError):
        version = '-1'
    return version

# Colab setup ------------------
if 'google.colab' in sys.modules:
    !pip install pyTEMlib -q
# pyTEMlib setup ------------------
else:
    installed_version = test_package('pyTEMlib')
    # Compare parsed versions; as plain strings '0.2021.01.09' would sort before '0.2021.1.9'
    if installed_version == '-1' or parse_version(installed_version) < parse_version('0.2021.1.9'):
        print('installing pyTEMlib')
        !{sys.executable} -m pip install  --upgrade pyTEMlib -q
# ------------------------------
print('done')

done
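
Version strings compare character by character, so a plain `<` between two strings can give surprising results: `'0.2021.01.09' < '0.2021.1.9'` is true even though both denote the same release. Below is a minimal sketch of a numeric comparison, assuming version strings that consist only of dot-separated integers (the helper `version_tuple` is hypothetical and not part of pyTEMlib):

```python
def version_tuple(version):
    """Convert a dotted version string such as '0.2021.01.09' to a tuple of ints."""
    return tuple(int(part) for part in version.split('.'))

installed = '0.2021.01.09'   # format reported by pyTEMlib.__version__ in this notebook
required = '0.2021.1.9'      # minimum version used in the check above

print(installed < required)                                # string comparison: True
print(version_tuple(installed) < version_tuple(required))  # numeric comparison: False
```

Version strings with suffixes such as `rc1` would need a real version parser instead of this simple split.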


### Load the plotting and figure packages

In [2]:
import sys
if 'google.colab' in sys.modules:
    %pylab --no-import-all inline
else:
    %pylab --no-import-all notebook

# import TEMlib from pyTEM
import pyTEMlib
import pyTEMlib.file_tools  as ft     # File input/ output library

import sidpy
import pyNSID
import h5py

# For archiving reasons it is a good idea to print the version numbers out at this point
print('pyTEM version: ',pyTEMlib.__version__)
__notebook__='CH1_04-Reading_File'
__notebook_version__='2021_01_12'

Populating the interactive namespace from numpy and matplotlib
pyTEM version:  0.2021.01.09


## Open a file 

This function opens an hdf5 file in the pyNSID style, which enables you to keep track of your data analysis.

Please see the **[Installation](CH1-Prerequisites.ipynb#TEM-Library)** notebook for installation.

We want to consolidate files that belong together into one dataset. For example, a spectrum-image dataset consists of:
* a survey image,
* EELS spectra,
* a Z-contrast image acquired simultaneously with the spectra.


So load the top-level dataset first; in the example above, that is the survey image.

Please note that the plotting routine of ``matplotlib`` was introduced in the **[Matplotlib and Numpy for Micrographs](CH1-Data_Representation.ipynb)** notebook.

**Use the file p1-3hr.dm3 from TEM_data directory for a practice run**

In [3]:
# ------ Input ------- #
use_file_widget = True
load_example = False
# -------------------- #

if load_example:
    drive_directory = '.'
else:
    drive_directory = ft.get_last_path()

bokeh_plot = False
if 'google.colab' in sys.modules:
    from google.colab import drive
    drive.mount("/content/drive")
    if load_example:
        drive_directory = '/content/example-data'
    else:
        drive_directory = 'drive/MyDrive/'
    bokeh_plot = True

# Open file widget and select file which will be opened in code cell below
if use_file_widget:
    file_widget = ft.FileWidget(drive_directory)
else: 
    print('File dialog will be activated in next cell')


In [6]:
# Close a file left open from a previous run of this cell (if any)
try: 
    main_dataset.h5_dataset.file.close()
except Exception:
    pass

if use_file_widget:
    main_dataset = ft.open_file(str(file_widget.file_name))
else:
    main_dataset = ft.open_file()
current_channel = main_dataset.h5_dataset.parent.parent

main_dataset.plot()


## Data Structure

The data themselves reside in a ``sidpy.Dataset``, which we named ``main_dataset`` above.

Printing the dataset summarizes its data type, array shape and dimensions.

In [7]:
print(main_dataset)
main_dataset

sidpy.Dataset of type IMAGE with:
 dask.array<generic, shape=(1024, 1024), dtype=float32, chunksize=(1024, 1024), chunktype=numpy.ndarray>
 data contains: intensity (counts)
 and Dimensions: 
y:  distance (nm) of size (1024,)
x:  distance (nm) of size (1024,)


| | Array | Chunk |
|---|---|---|
| Bytes | 4.19 MB | 4.19 MB |
| Shape | (1024, 1024) | (1024, 1024) |
| Count | 1 Tasks | 1 Chunks |
| Type | float32 | numpy.ndarray |


In [6]:
print(f'size of current dataset is {main_dataset.shape}')

size of current dataset is (2048,)


The ``main_dataset`` has additional information stored as attributes, which can be accessed by name.

In [13]:
print('title: ', main_dataset.title)
print('data type: ', main_dataset.data_type)
main_dataset.metadata
for key in current_channel:
    try:
        if key in current_channel[key]:
            print(current_channel[key][key]['original_metadata'].attrs.keys())
    except:
        pass

title:  /Measurement_000/Channel_000/SuperScan (HAADF) 45/SuperScan (HAADF) 45
data type:  DataType.IMAGE
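
The title printed above is the full path of the dataset inside the hdf5 file. Going up two levels with `.parent.parent`, as was done when `current_channel` was defined, leads to the channel group. The small sketch below illustrates this on the path string alone using `posixpath`; it only mimics the hierarchy and does not touch the file:

```python
import posixpath

title = '/Measurement_000/Channel_000/SuperScan (HAADF) 45/SuperScan (HAADF) 45'

dataset_group = posixpath.dirname(title)              # group holding data and metadata
channel_group = posixpath.dirname(dataset_group)      # equivalent of .parent.parent
measurement_group = posixpath.dirname(channel_group)  # one level further up

print(channel_group)      # /Measurement_000/Channel_000
print(measurement_group)  # /Measurement_000
```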


## File Structure
The current_channel (like a directory in a file system) contains several groups.

Below I show how to access one of those groups.

In [16]:
current_dataset = main_dataset
print(current_channel.keys())
def add_data(dataset, h5_group=None):
    """Write data to hdf5 file

    Parameters
    ----------
    dataset: sidpy.Dataset
        data to write to file
    h5_group: None, sidpy.Dataset, h5py.Group, h5py.Dataset, h5py.File
        identifier of the group to which the data are added (if None, the dataset must have a valid h5_dataset)

    Returns
    -------
    h5_dataset: h5py.Dataset
        reference to the dataset that has been written (also stored in the h5_dataset attribute of the sidpy.Dataset)
    """

    if h5_group is None:
        if isinstance(dataset.h5_dataset, h5py.Dataset):
            h5_group = dataset.h5_dataset.parent.parent.parent
    if isinstance(h5_group, h5py.Dataset):
        h5_group = h5_group.parent.parent.parent
    elif isinstance(h5_group, sidpy.Dataset):
        h5_group = h5_group.h5_dataset.parent.parent.parent
    elif isinstance(h5_group, h5py.File):
        h5_group = h5_group['Measurement_000']
        
    if not isinstance(h5_group, h5py.Group):
        raise TypeError('Need a valid identifier for a hdf5 group to store data in')

    log_group = sidpy.hdf.prov_utils.create_indexed_group(h5_group, 'Channel_')
    h5_dataset = pyNSID.hdf_io.write_nsid_dataset(dataset, log_group)
    
    if hasattr(dataset, 'meta_data'):
        if 'analysis' in dataset.meta_data:
            log_group['analysis'] = dataset.meta_data['analysis']
            
    dataset.h5_dataset = h5_dataset
    return h5_dataset

print(current_channel)
current_dataset.metadata= {'a': 'nix', 'b': 'gara'}
#new_data = pyNSID.hdf_io.write_results(current_channel.parent, dataset=current_dataset)
new_data = add_data(current_dataset, h5_group=None)

print(current_dataset.h5_dataset)
print(new_data)

<KeysViewHDF5 ['SuperScan (HAADF) 45']>
<HDF5 group "/Measurement_000/Channel_000" (1 members)>
<HDF5 dataset "SuperScan (HAADF) 45": shape (1024, 1024), type "<f4">
<HDF5 dataset "SuperScan (HAADF) 45": shape (1024, 1024), type "<f4">




In [19]:
# print(dict(new_data['a1_ 410s']['metadata'].attrs))

An important attribute in ``current_dataset`` is the ``original_metadata`` group, where all the original metadata of your file reside in the ``attributes``. This is usually a long list for ``dm3`` files.

In [20]:
current_dataset.h5_dataset.parent['original_metadata'].keys()

<KeysViewHDF5 []>

In [21]:
for key,value in current_dataset.h5_dataset.parent['original_metadata'].attrs.items():
    print(key, value)
print(current_dataset.h5_dataset)    

category persistent
collection_dimension_count 0
created 2020-03-27T14:40:59.392578
data_dtype float32
data_modified 2020-03-27T14:40:59.408166
data_shape [1024 1024]
datum_dimension_count 2
dim-offset-0 -64.0
dim-offset-1 -64.0
dim-scale-0 0.125
dim-scale-1 0.125
dim-units-0 nm
dim-units-1 nm
intensity_calibration-offset 0.0
intensity_calibration-scale 1.0
intensity_calibration-units 
is_sequence False
metadata-hardware_source-ac_line_sync 0
metadata-hardware_source-autostem-ImageScanned:C1 ConstW 0.374
metadata-hardware_source-autostem-ImageScanned:C10 6.69832e-09
metadata-hardware_source-autostem-ImageScanned:C12.a -7.88906e-09
metadata-hardware_source-autostem-ImageScanned:C12.b 2.49269e-08
metadata-hardware_source-autostem-ImageScanned:C21.a -9.11232e-08
metadata-hardware_source-autostem-ImageScanned:C21.b -8.08957e-08
metadata-hardware_source-autostem-ImageScanned:C23.a -2.48682e-08
metadata-hardware_source-autostem-ImageScanned:C23.b -7.40319e-08
metadata-hardware_source-autoste
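
The `dim-scale-*`, `dim-offset-*` and `dim-units-*` entries listed above hold the pixel calibration of the image. Below is a minimal sketch of how a calibrated axis could be rebuilt from them, assuming the usual linear convention `position = offset + scale * index`. The dictionary simply copies a few of the printed values, and `calibrated_axis` is a hypothetical helper, not a pyTEMlib function:

```python
# A few entries copied from the attribute listing above
attrs = {
    'data_shape': [1024, 1024],
    'dim-offset-0': -64.0,
    'dim-scale-0': 0.125,
    'dim-units-0': 'nm',
}

def calibrated_axis(attrs, axis):
    """Return (positions, units) for one axis, assuming offset + scale * index."""
    size = attrs['data_shape'][axis]
    offset = attrs[f'dim-offset-{axis}']
    scale = attrs[f'dim-scale-{axis}']
    positions = [offset + scale * index for index in range(size)]
    return positions, attrs[f'dim-units-{axis}']

positions, units = calibrated_axis(attrs, 0)
print(positions[0], positions[-1], units)   # -64.0 63.875 nm
```

With these values the 1024 pixels of 0.125 nm span a field of view of 128 nm centered on zero.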

In [22]:
print(current_channel.keys())

<KeysViewHDF5 ['SuperScan (HAADF) 45']>


## Adding Data

To add another dataset that belongs to this measurement, we will use **h5_add_channel** from **file_tools** in the pyTEMlib package.

Here is how we add a channel there.

We can also add a new measurement group (add_measurement in pyTEMlib) for similar datasets.

This is equivalent to making a new directory in a file structure on your computer.
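
Groups such as `Channel_000`, `Channel_001`, ... are numbered consecutively, much like numbered folders. The sketch below mimics that naming scheme in plain Python with a dictionary standing in for an hdf5 group. It only illustrates the idea and is not the actual `create_indexed_group` implementation of sidpy; `next_indexed_name` is a hypothetical helper:

```python
def next_indexed_name(existing_names, prefix):
    """Return the next free name of the form '<prefix>NNN'."""
    used = [
        int(name[len(prefix):])
        for name in existing_names
        if name.startswith(prefix) and name[len(prefix):].isdigit()
    ]
    next_index = max(used) + 1 if used else 0
    return f'{prefix}{next_index:03d}'

measurement = {'Channel_000': {}, 'Channel_001': {}}   # stand-in for an hdf5 group
new_name = next_indexed_name(measurement.keys(), 'Channel_')
measurement[new_name] = {}
print(sorted(measurement))   # ['Channel_000', 'Channel_001', 'Channel_002']
```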

In [38]:
import pyNSID

def add_dataset(dataset, h5_group=None):
    """Write data to hdf5 file

    Parameters
    ----------
    dataset: sidpy.Dataset
        data to write to file
    h5_group: None, sidpy.Dataset, h5py.Group, h5py.Dataset, h5py.File
        identifier of the group to which the data are added (if None, the dataset must have a valid h5_dataset)

    Returns
    -------
    h5_dataset: h5py.Dataset
        reference to the dataset that has been written (also stored in the h5_dataset attribute of the sidpy.Dataset)
    """

    if h5_group is None:
        if isinstance(dataset.h5_dataset, h5py.Dataset):
            h5_group = dataset.h5_dataset.parent.parent.parent
    if isinstance(h5_group, h5py.Dataset):
        h5_group = h5_group.parent.parent.parent
    elif isinstance(h5_group, sidpy.Dataset):
        h5_group = h5_group.h5_dataset.parent.parent.parent
    elif isinstance(h5_group, h5py.File):
        h5_group = h5_group['Measurement_000']

    if not isinstance(h5_group, h5py.Group):
        raise TypeError('Need a valid identifier for a hdf5 group to store data in')

    log_group = sidpy.hdf.prov_utils.create_indexed_group(h5_group, 'Channel_')
    h5_dataset = pyNSID.hdf_io.write_nsid_dataset(dataset, log_group)

    if hasattr(dataset, 'meta_data'):
        if 'analysis' in dataset.meta_data:
            log_group['analysis'] = dataset.meta_data['analysis']

    dataset.h5_dataset = h5_dataset
    return h5_dataset


We use the function above to add the content of a (random) data file to the current file.

This is important if you want to add, for example, a Z-contrast or survey image to a spectrum image.

Therefore, these functions enable you to collect the data from different files that belong together.


In [42]:
#new_channel = h5_add_channel(current_channel)
add_dataset(current_dataset, current_channel.parent)

ft.h5_tree(current_channel)  #wraps usid.hdf_utils.print_tree(h5_file)

/Measurement_000/Channel_000
├ Channel_000
  -----------
├ Channel_001
  -----------
├ Channel_002
  -----------
  ├ SuperScan (HAADF) 45
    --------------------
    ├ SuperScan (HAADF) 45
    ├ __dict__
      --------
    ├ _axes
      -----
    ├ _metadata
      ---------
    ├ _original_metadata
      ------------------
    ├ metadata
      --------
    ├ original_metadata
      -----------------
    ├ x
    ├ y
├ Channel_003
  -----------
  ├ SuperScan (HAADF) 45
    --------------------
    ├ SuperScan (HAADF) 45
    ├ __dict__
      --------
    ├ _axes
      -----
    ├ _metadata
      ---------
    ├ _original_metadata
      ------------------
    ├ metadata
      --------
    ├ original_metadata
      -----------------
    ├ x
    ├ y
├ SuperScan (HAADF) 45
  --------------------
  ├ SuperScan (HAADF) 45
  ├ __dict__
    --------
  ├ _axes
    -----
  ├ _original_metadata
    ------------------
  ├ original_metadata
    -----------------
  ├ x
  ├ y




## Adding additional information

Similarly, we can add a whole new measurement group or a structure group.

This function will be contained in the KinsCat package of pyTEMlib.

If you loaded the example image with graphite and ZnO, both are viewed along the [1,1,1] zone axis.


In [None]:
import pyTEMlib.KinsCat as ks         # Kinematic sCattering Library
                             # with atomic form factors from Kirkland's book

# the hdf5 file that holds the currently open dataset
h5_file = current_dataset.h5_dataset.file

def h5_add_crystal_structure(h5_file, crystal_tags):
    """Store a crystal structure in a new indexed 'Structure' group of the file"""
    # same helper as used for the 'Channel_' groups above
    structure_group = sidpy.hdf.prov_utils.create_indexed_group(h5_file, 'Structure')

    structure_group['unit_cell'] = crystal_tags['unit_cell']
    structure_group['relative_positions'] = crystal_tags['base']
    structure_group['title'] = str(crystal_tags['crystal_name'])
    structure_group['_'+crystal_tags['crystal_name']] = str(crystal_tags['crystal_name'])
    structure_group['elements'] = np.array(crystal_tags['elements'], dtype='S')
    # use the zone axis given in the tags, otherwise default to [1,1,1]
    if 'zone_axis' in crystal_tags:
        structure_group['zone_axis'] = np.array(crystal_tags['zone_axis'], dtype=float)
    else:
        structure_group['zone_axis'] = np.array([1., 1., 1.], dtype=float)

    h5_file.flush()
    return structure_group


crystal_tags = ks.structure_by_name('Graphite')
h5_add_crystal_structure(h5_file, crystal_tags)

crystal_tags = ks.structure_by_name('ZnO')
h5_add_crystal_structure(h5_file, crystal_tags)

sidpy.hdf_utils.print_tree(h5_file)


## Keeping Track of Analysis and Results
Notebooks are notorious for getting confusing, especially if one uses different notebooks for different tasks but stores the results in the same file.

If you like a result of your calculation, log it.

The function will write your calculation to the pyUSID style file and attach a time stamp.
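
What such a log entry might contain can be sketched with a plain dictionary. This is an assumption about the general idea (description, optional result, time stamp) and not the actual `log_results` code of pyTEMlib; `make_log_entry` is a hypothetical helper:

```python
from datetime import datetime

def make_log_entry(info, result=None):
    """Bundle a description, an optional result and an ISO-formatted time stamp."""
    entry = dict(info)                       # copy, so the input is not modified
    entry['time_stamp'] = datetime.now().isoformat(timespec='seconds')
    if result is not None:
        entry['data'] = result
    return entry

entry = make_log_entry({'analysis': 'Nothing', 'name': 'Nothing'})
print(sorted(entry))   # ['analysis', 'name', 'time_stamp']
```

Keeping the time stamp next to the result makes it possible to tell later which of several logged attempts was the most recent.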

The two functions below are part of  file_tools of pyTEMlib.

In [None]:
info_dictionary = {'analysis': 'Nothing', 'name': 'Nothing'}

log_group = ft.log_results(current_dataset, info_dictionary)

ft.h5_tree(current_dataset.h5_dataset.file)  # print the tree of the whole file


## An example for a log
We log the Fourier transform of the image we loaded.

First we perform the calculation

In [None]:
## Access the data of the loaded image as a numpy array
data = np.array(current_dataset)

## The data log goes in the dictionary out_tags
out_tags = {}
## data tag contains the newly calculated result
out_tags['data'] = np.fft.fftshift(np.fft.fft2(data))

## meta data (can be anything, but good practice is to be compatible with pyUSID data set)
out_tags['analysis']= 'Fourier_Transform'

out_tags['spatial_origin_x'] = data.shape[0]/2
out_tags['spatial_origin_y'] = data.shape[1]/2

# pixel size from the named dimensions of the sidpy.Dataset (here 'x' and 'y')
scale_x = float(current_dataset.x[1] - current_dataset.x[0])
scale_y = float(current_dataset.y[1] - current_dataset.y[0])
        
out_tags['spatial_scale_x'] = 1.0/scale_x/data.shape[0]
out_tags['spatial_scale_y'] = 1.0/scale_y/data.shape[1]
out_tags['spatial_size_x'] = data.shape[0]
out_tags['spatial_size_y'] = data.shape[1]
out_tags['spatial_units'] = '1/nm'


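# half-width of the field of view in reciprocal space: origin pixel times reciprocal pixel size
FOV_x = out_tags['spatial_origin_x'] * out_tags['spatial_scale_x']
FOV_y = out_tags['spatial_origin_y'] * out_tags['spatial_scale_y']
out_tags['image_extent'] = [-FOV_x, FOV_x, FOV_y, -FOV_y]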
fig = plt.figure()
plt.imshow(np.log2(1+np.abs(out_tags['data'])),origin='upper', extent = out_tags['image_extent'])
plt.xlabel('reciprocal distance ['+ out_tags['spatial_units']+']');
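
The reciprocal-space calibration follows directly from the real-space one: for `N` pixels of size `d`, one pixel of the Fourier transform corresponds to `1/(N d)`, and the shifted frequency axis runs from `-1/(2d)` to just below `+1/(2d)`. Here is a short check with `np.fft.fftfreq`, assuming 1024 pixels of 0.125 nm as in the image metadata shown earlier:

```python
import numpy as np

n_pixels = 1024     # image size along one axis
pixel_size = 0.125  # nm per pixel

# Frequencies in 1/nm, reordered so that zero sits in the center (as np.fft.fftshift does for the data)
frequencies = np.fft.fftshift(np.fft.fftfreq(n_pixels, d=pixel_size))

reciprocal_pixel = frequencies[1] - frequencies[0]
print(reciprocal_pixel)                  # 0.0078125 = 1 / (1024 * 0.125 nm)
print(frequencies[0], frequencies[-1])   # -4.0 3.9921875
print(frequencies[n_pixels // 2])        # 0.0
```

The plot extent of the Fourier transform therefore has to be given in units of 1/nm, not in the nm of the original image.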


Now that we like this we log it.

Please note that just saving the Fourier transform would not be enough, because we also need the scale and the other metadata that describe it.

In [None]:
import importlib
importlib.reload(ft)


out_tags['name'] = 'fft'
out_tags['units'] = '1/nm'
out_tags['data_type'] = 'image'

log_group = ft.log_results(current_dataset, out_tags)
log_dataset = log_group['nDim_Data']
ft.h5_tree(h5_file)
fig = plt.figure()
plt.title(log_group['analysis'][()])
plt.imshow(np.log2(1+np.abs(log_dataset)),origin='upper', extent = log_group['image_extent'][()])
plt.xlabel('reciprocal distance ['+ log_group['units'][()]+']');


Please close the file

In [None]:
print(h5_file.filename)
h5_file.close()


## Open h5_file
Open the h5_file that we just created

In [None]:
h5_file = ft.h5_open_file()

current_channel = h5_file['Measurement_000/Channel_000']
current_dataset = current_channel['nDim_Data']

ft.h5_plot(current_dataset)

In [None]:
plt.figure()
plt.imshow(np.array(current_dataset));

### Short check if we got the data right
We print the tree and plot the data.

In [None]:
# See if a tree has been created within the hdf5 file:
ft.h5_tree(h5_file)
image_tags = dict(h5_file['Measurement_000/Channel_000'].attrs)
for key in image_tags:
    if 'original' not in key:
        #print(key,': ',image_tags[key])
        pass
current_channel = h5_file['Measurement_000/Channel_000']



### Add more data to this set

Often more than one dataset belongs together.
For instance, a spectrum image comes with a survey image and a Z-contrast image recorded together with the survey image.

Here we just load another image, for example *p1-3-hr3b.dm3*.

In [None]:
current_channel = ft.h5_add_data(current_channel)
    
measurement_group = current_channel.parent
    
for key in list(measurement_group.keys()):
    if 'title' in measurement_group[key].keys(): 
        print(key,': ',measurement_group[key]['title'][()])
    else:
        print(key,': ')   

Let's see what you selected


In [None]:
current_dataset = current_channel['nDim_Data']

ft.h5_plot(current_dataset)

## If we are done, we close the pyUSID style file.

This is necessary to make the file ready to be opened by another notebook or program.

In [None]:
h5_file.close()

## Navigation

<font size = "4"> 
    
**Back: [Matplotlib and Numpy for Micrographs](CH1_03-Data_Representation.ipynb)**<br>
**Next: [Diffraction](CH2_00-Diffraction.ipynb)**<br>
**Up Chapter 1: [Introduction](CH1_00-Introduction.ipynb)**<br>
**List of Content: [Front](../_MSE672_Intro_TEM.ipynb)**
</font>