<font size = "5"> **Chapter 1: [Introduction](CH1_00-Introduction.ipynb)** </font>


<hr style="height:1px;border-top:4px solid #FF8200" />

# Open DM3 Images, Spectra, Spectrum-Images and  Image-Stacks with pyNSID 

[Download](https://raw.githubusercontent.com/gduscher/MSE672-Introduction-to-TEM//main/Introduction/CH1_04-Open_File.ipynb)
 



part of 

<font size = "5"> **[MSE672:  Introduction to Transmission Electron Microscopy](../_MSE672_Intro_TEM.ipynb)**</font>

by Gerd Duscher, Spring 2023

Microscopy Facilities<br>
Institute of Advanced Materials & Manufacturing<br>
Materials Science & Engineering<br>
The University of Tennessee, Knoxville

Background and methods for the analysis and quantification of data acquired with transmission electron microscopes.

---
Reading a dm file and translating the data into a **[pyNSID](https://pycroscopy.github.io/pyNSID/)**-style HDF5 file to be compatible with the **[pycroscopy](https://pycroscopy.github.io/pycroscopy/)** package.

Because many other packages and programs for TEM data manipulation are based on the ``hdf5`` file format, it is relatively easy to convert back and forth between them.



## Import Packages
### Check Installed Packages

In [2]:
import sys
from pkg_resources import get_distribution, DistributionNotFound

def test_package(package_name):
    """Test if package exists and returns version or -1"""
    try:
        version = get_distribution(package_name).version
    except (DistributionNotFound, ImportError):
        version = '-1'
    return version

if test_package('pyTEMlib') < '0.2023.1.0':
    print('installing pyTEMlib')
    !{sys.executable} -m pip install  --upgrade pyTEMlib -q
# ------------------------------
print('done')

done
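
Note that the check above compares version numbers as strings, which orders them character by character. A minimal sketch of a numeric comparison follows; the helper ``version_tuple`` is hypothetical (not part of pyTEMlib) and assumes that every dot-separated part of the version string is an integer:

```python
def version_tuple(version):
    """Convert a dotted version string such as '0.2023.1.0' to a tuple of ints.

    Hypothetical helper; assumes every dot-separated part is an integer.
    """
    return tuple(int(part) for part in version.split('.'))


# As strings, '10' sorts before '2' because the character '1' sorts before '2'
print('0.2023.10.0' < '0.2023.2.0')
# As tuples of integers, 10 is correctly treated as larger than 2
print(version_tuple('0.2023.10.0') < version_tuple('0.2023.2.0'))
```

The fallback value ``'-1'`` returned for a missing package also converts cleanly and compares as older than any real version.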


### Load the plotting and figure packages

In [1]:
%matplotlib notebook
import matplotlib.pylab as plt
import numpy as np
%gui qt

import pyTEMlib
import pyTEMlib.file_tools  as ft     # File input/ output library

import sidpy
import pyNSID
import h5py

# For archiving reasons it is a good idea to print the version numbers out at this point
print('pyTEM version: ',pyTEMlib.__version__)
__notebook__='CH1_04-Reading_File'
__notebook_version__='2021_12_14'

Symmetry functions of spglib enabled
pyTEM version:  0.2023.1.0


## Open a file 

This function opens an HDF5 file in the pyNSID style, which enables you to keep track of your data analysis.

Please see the **[Installation](CH1_02-Prerequisites.ipynb#TEM-Library)** notebook for installation.

We want to consolidate files that belong together into one dataset. For example, a spectrum image dataset consists of:
* Survey image
* EELS spectra
* Z-contrast image acquired simultaneously with the spectra


So load the top dataset first; in the above example, that is the survey image.

Please note that the plotting routine of ``matplotlib`` was introduced in the **[Matplotlib and Numpy for Micrographs](CH1_03-Data_Representation.ipynb)** notebook.

**Use the file p1-3-hr3.dm3 from the example_data directory for a practice run**

In [3]:
# ------ Input ------- #
load_example = False
# -------------------- #

# Open file widget and select file which will be opened in code cell below
if not load_example:
    drive_directory = ft.get_last_path()
    file_widget = ft.FileWidget(drive_directory)

Select(description='Select file:', layout=Layout(width='70%'), options=('.',), rows=10, value='.')

In [4]:
# Close a file left open by a previous run of this cell, if there is one
try:
    main_dataset.h5_dataset.file.close()
except Exception:
    pass

if load_example:
    file_name = '../example_data/p1-3-hr3.dm3'
else:
    file_name = file_widget.file_name

datasets = ft.open_file(file_name)
main_dataset = datasets['Channel_000']

view = main_dataset.plot()

DM
DocumentObjectList
0
AnnotationGroupList
0
Font
ObjectTags
ImageDisplayInfo
DimensionLabels
MainSliceId
ObjectTags
DocumentTags
Image Behavior
UnscaledTransform
ZoomAndMoveTransform
ImageList
0
ImageData
Calibrations
Brightness
Dimension
0
1
Dimensions
ImageTags
UniqueID
1
ImageData
Calibrations
Brightness
Dimension
0
1
Dimensions
ImageTags
Acquisition
Device
CCD
Configuration
Transpose
Frame
Area
Transform
Transform List
0
Transpose
CCD
Intensity
Range
Transform
Transform List
0
1
Reference Images
Dark
Sequence
Parameters
Base Detector
Detector
Environment
High Level
Shutter
Transform
Objects
0
1
2
3
DataBar
Custom elements
Microscope Info
Items
0
1
2
UniqueID
ImageSourceList
0
Id
MinVersionList
0
Page Behavior
PageTransform
SentinelList
Thumbnails
0



## Data Structure

The data themselves reside in a ``sidpy.Dataset``, which we name ``main_dataset``.

Printing the dataset shows its data type, shape, and dimensions.

In [5]:
print(main_dataset)
main_dataset

sidpy.Dataset of type IMAGE with:
 dask.array<array, shape=(2048, 2048), dtype=float64, chunksize=(2048, 2048), chunktype=numpy.ndarray>
 data contains: intensity (counts)
 and Dimensions: 
x:  distance (counts) of size (2048,)
y:  distance (counts) of size (2048,)


| | Array | Chunk |
|---|---|---|
| Bytes | 32.00 MiB | 32.00 MiB |
| Shape | (2048, 2048) | (2048, 2048) |
| Dask graph | 1 chunks in 1 graph layer | 1 chunks in 1 graph layer |
| Data type | float64 numpy.ndarray | float64 numpy.ndarray |


In [6]:
print(f'size of current dataset is {main_dataset.shape}')

size of current dataset is (2048, 2048)
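
The 32.00 MiB reported above follows directly from the shape and the data type, since each ``float64`` value occupies 8 bytes. A quick check of the arithmetic in plain Python, using only the values printed above:

```python
shape = (2048, 2048)      # shape reported for the dataset
bytes_per_value = 8       # float64 uses 8 bytes per value

n_values = shape[0] * shape[1]
size_in_mib = n_values * bytes_per_value / 2**20   # 1 MiB = 2**20 bytes
print(f'{n_values} values occupy {size_in_mib:.2f} MiB')
```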


The ``main_dataset`` has additional information stored as attributes, which can be accessed by name.

Two of these attributes are dictionaries:
- **metadata**
- **original_metadata**

They contain additional information about the data.

In [7]:
print('title: ', main_dataset.title)
print('data type: ', main_dataset.data_type)

for key in datasets:
    print(key)
    print(datasets[key].original_metadata.keys())
    
main_dataset.metadata  

title:  p1_3_hr3
data type:  DataType.IMAGE
Channel_000
dict_keys(['ApplicationBounds', 'HasWindowPosition', 'InImageMode', 'NextDocumentObjectID', 'WindowPosition', 'DM', 'DocumentObjectList', 'DocumentTags', 'Image Behavior', 'ImageList', 'ImageSourceList', 'MinVersionList', 'Page Behavior', 'SentinelList', 'Thumbnails'])


{}

## Data Structure
The datasets variable is a dictionary (like a directory in a file system) which contains datasets.

Below I show how to access one of those datasets with a pull-down menu.

In [8]:
chooser = ft.ChooseDataset(datasets)

Dropdown(description='select dataset:', options=('Channel_000: p1_3_hr3',), value='Channel_000: p1_3_hr3')

In [9]:
current_dataset = chooser.dataset
view = current_dataset.plot()


An important attribute of ``current_dataset`` is the ``original_metadata`` dictionary, which holds all the original metadata of your file. This is usually a long list for ``dm3`` files.

In [10]:
current_dataset.original_metadata.keys()

dict_keys(['ApplicationBounds', 'HasWindowPosition', 'InImageMode', 'NextDocumentObjectID', 'WindowPosition', 'DM', 'DocumentObjectList', 'DocumentTags', 'Image Behavior', 'ImageList', 'ImageSourceList', 'MinVersionList', 'Page Behavior', 'SentinelList', 'Thumbnails'])

The original_metadata attribute holds all the information stored in the original file.
> No information will get lost

In [11]:
for key,value in current_dataset.original_metadata.items():
    print(key, value)
print(current_dataset.h5_dataset)    

ApplicationBounds [   0    0 1465 2236]
HasWindowPosition 1
InImageMode 1
NextDocumentObjectID 10
WindowPosition [  30  801 1457 2228]
DM {'chosen_image': 1, 'dm_version': 3, 'file_size': 17382688, 'full_file_name': '../example_data/p1-3-hr3.dm3'}
DocumentObjectList {'0': {'AnnotationType': 20, 'BackgroundColor': array([-1, -1, -1]), 'BackgroundMode': 2, 'FillMode': 1, 'ForegroundColor': array([    -1,      0, -32640]), 'HasBackground': 0, 'ImageDisplayType': 1, 'ImageSource': 0, 'IsMoveable': 1, 'IsResizable': 1, 'IsSelectable': 1, 'IsTranslatable': 1, 'IsVisible': 1, 'Rectangle': array([   0.,    0., 1427., 1427.]), 'UniqueID': 8, 'AnnotationGroupList': {'0': {'AnnotationType': 31, 'BackgroundColor': array([0, 0, 0]), 'BackgroundMode': 1, 'FillMode': 1, 'ForegroundColor': array([-1, -1, -1]), 'HasBackground': 1, 'IsMoveable': 1, 'IsResizable': 1, 'IsSelectable': 1, 'IsTranslatable': 1, 'IsVisible': 1, 'Rectangle': array([1768.,  128., 1920., 1088.]), 'TextOffsetH': 1.0, 'TextOffsetV'
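
Because ``original_metadata`` is a nested dictionary, printing it directly is hard to read. A minimal sketch of walking such a dictionary and listing one indented key per line follows. The helper ``print_tree`` is hypothetical, and the small ``sample`` dictionary only imitates a few of the entries shown above:

```python
def print_tree(dictionary, indent=0):
    """List the keys of a nested dictionary, indented by two spaces per level.

    The lines are printed once at the top level and also returned.
    """
    lines = []
    for key, value in dictionary.items():
        lines.append(' ' * indent + str(key))
        if isinstance(value, dict):
            lines.extend(print_tree(value, indent + 2))
    if indent == 0:
        print('\n'.join(lines))
    return lines


# Stand-in for a few entries of original_metadata
sample = {'InImageMode': 1,
          'DM': {'dm_version': 3, 'chosen_image': 1},
          'DocumentObjectList': {'0': {'AnnotationType': 20}}}
lines = print_tree(sample)
```

With a loaded file, calling ``print_tree(current_dataset.original_metadata)`` should give a compact overview of the tag structure.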

Any Python object provides documentation through the built-in ``help`` function.

In [12]:
help(current_dataset)

Help on Dataset in module sidpy.sid.dataset object:

class Dataset(dask.array.core.Array)
 |  Dataset(*args, **kwargs)
 |  
 |  ..autoclass::Dataset
 |  
 |  To instantiate from an existing array-like object,
 |  use :func:`Dataset.from_array` - requires numpy array, list or tuple
 |  
 |  This dask array is extended to have the following attributes:
 |  -data_type: DataTypes ('image', 'image_stack',  spectral_image', ...
 |  -units: str
 |  -quantity: str what kind of data ('intensity', 'height', ..)
 |  -title: title of the data set
 |  -modality: character of data such as 'STM, 'AFM', 'TEM', 'SEM', 'DFT', 'simulation', ..)
 |  -source: origin of data such as acquisition instrument ('Nion US100', 'VASP', ..)
 |  -_axes: dictionary of Dimensions one for each data dimension
 |                  (the axes are dimension datasets with name, label, units,
 |                  and 'dimension_type' attributes).
 |  
 |  -metadata: dictionary of additional metadata
 |  -original_metadata: dicti

All attributes of a Python object can be viewed with the *dir* command.
> As above: too much information for normal use, but it is there if needed.

In [13]:
dir(current_dataset)

['A',
 'T',
 '_Array__chunks',
 '_Array__name',
 '_Dataset__rearrange_axes',
 '_Dataset__reduce_dimensions',
 '_Dataset__validate_dim',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_function__',
 '__array_priority__',
 '__array_ufunc__',
 '__await__',
 '__bool__',
 '__class__',
 '__complex__',
 '__dask_graph__',
 '__dask_keys__',
 '__dask_layers__',
 '__dask_optimize__',
 '__dask_postcompute__',
 '__dask_postpersist__',
 '__dask_scheduler__',
 '__dask_tokenize__',
 '__deepcopy__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__div__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getstate__',
 '__gt__',
 '__hash__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__iter__',
 '__le__',
 '__len__',
 '__long__',
 '__lshift__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__module__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__nonzero__',
 '__or__',
 '__pos
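
Most entries in the listing above are internal names that start with an underscore. A small sketch that keeps only the public names follows; it works for any Python object and is demonstrated here on a hypothetical stand-in class ``Example`` so that it runs without a loaded file:

```python
class Example:
    """Stand-in object; with a loaded file one would pass current_dataset."""
    title = 'p1_3_hr3'

    def plot(self):
        pass

    def _private_helper(self):
        pass


def public_names(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith('_')]


print(public_names(Example()))
```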

## Adding Data

To add another dataset that belongs to this measurement we will use the **h5_add_channel** from  **file_tools** in the  pyTEMlib package.

Here we add a channel by assigning a copy of the current dataset to a new key of the ``datasets`` dictionary.

We can also add a new measurement group (add_measurement in pyTEMlib) for similar datasets.

This is equivalent to making a new directory in a file structure on your computer.

In [14]:
datasets['Copied_of_Channel_000'] = current_dataset.copy()

We use the functions above to add the content of a (random) data file to the current file.

This is important if, for example, you want to add a Z-contrast or survey image to a spectrum image.

These functions therefore enable you to collect the data from different files that belong together.


In [15]:
datasets.keys()

dict_keys(['Channel_000', 'Copied_of_Channel_000'])
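
In Python, assigning the same object to a second key only creates a second name for it, whereas ``copy()`` creates an independent object. A sketch of the difference follows, with plain numpy arrays standing in for the sidpy datasets (an assumption made to keep the example self-contained):

```python
import numpy as np

datasets_demo = {'Channel_000': np.zeros((2, 2))}

# Same object under a second key: both keys see every modification
datasets_demo['Alias'] = datasets_demo['Channel_000']
# Independent copy: later modifications of the original do not propagate
datasets_demo['Copy'] = datasets_demo['Channel_000'].copy()

datasets_demo['Channel_000'][0, 0] = 1.0
print(datasets_demo['Alias'][0, 0], datasets_demo['Copy'][0, 0])
```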

## Adding additional information

Similarly, we can add a whole new measurement group or a structure group.

This function will be contained in the KinsCat package of pyTEMlib.

If you loaded the example image, it shows graphite and ZnO, both viewed along the [1,1,1] zone axis.


In [16]:
# Kinematic scattering library, with atomic form factors from Kirkland's book
import pyTEMlib.kinematic_scattering as ks
import ase

graphite = ks.structure_by_name('Graphite')
print(graphite)

Using kinematic_scattering library version {_version_ }  by G.Duscher
Atoms(symbols='C4', pbc=False, cell=[[2.46772414, 0.0, 0.0], [-1.2338620699999996, 2.1371117947721068, 0.0], [0.0, 0.0, 6.711]])


In [17]:
current_dataset.structures['Crystal_000'] = graphite

zinc_oxide = ks.structure_by_name('ZnO')
current_dataset.structures['ZnO'] = zinc_oxide


## Keeping Track of Analysis and Results
Notebooks are notorious for becoming confusing, especially if one uses different notebooks for different tasks but stores the results in the same file.

If you like a result of your calculation, log it.

Use the datasets dictionary to add an analysed and/or modified dataset. Make sure the metadata contain all the necessary information, so that you will know later what you did.

The convention in this class will be to call the dataset **Log_000**.


In [18]:
new_dataset = current_dataset.T
new_dataset.metadata = {'analysis': 'Nothing', 'name': 'Nothing'}
datasets['Log_000'] = new_dataset
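
To avoid overwriting an earlier log, the next free key can be derived from the keys already present. A minimal sketch follows, with a hypothetical helper ``next_key`` (not a pyTEMlib function) that assumes the ``Log_000``, ``Log_001``, ... naming used in this notebook:

```python
def next_key(dictionary, prefix='Log'):
    """Return the first key of the form '<prefix>_NNN' not yet in dictionary."""
    index = 0
    while f'{prefix}_{index:03d}' in dictionary:
        index += 1
    return f'{prefix}_{index:03d}'


# Stand-in for the datasets dictionary; the values do not matter here
keys_demo = {'Channel_000': None, 'Log_000': None}
print(next_key(keys_demo))              # next free log key
print(next_key(keys_demo, 'Channel'))   # next free channel key
```

One could then write ``datasets[next_key(datasets)] = new_dataset`` instead of typing the key by hand.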

## An example for a log
We log the Fourier transform of the image we loaded.

First we perform the calculation.

In [19]:
fft_image = current_dataset.fft().abs()
fft_image = np.log(60+fft_image)

view = fft_image.plot()


Now that we like the result, we log it.

Please note that saving only the Fourier transform array would not be enough, because we also need the scale and other information carried by the dataset.

In [20]:
fft_image.title = 'FFT Gamma corrected'
fft_image.metadata = {'analysis': 'fft'}
datasets['Log_001'] = fft_image

view = fft_image.plot()
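
The scale of a Fourier transform follows from the pixel size of the image: for N pixels of size d, the spatial frequencies are spaced by 1/(N d). A sketch with numpy only follows. The pixel size and the random test image are assumed values, and the offset of 60 inside the logarithm is the one used above:

```python
import numpy as np

n_pixels = 8            # small stand-in for the 2048 pixels of the example image
pixel_size = 0.5        # assumed pixel size in nm (not read from the file)

rng = np.random.default_rng(seed=0)
image = rng.random((n_pixels, n_pixels))

# Magnitude of the centred Fourier transform, log-scaled for display
fft_mag = np.abs(np.fft.fftshift(np.fft.fft2(image)))
fft_display = np.log(60 + fft_mag)

# Reciprocal-space axis in 1/nm, centred in the same way as the data
frequencies = np.fft.fftshift(np.fft.fftfreq(n_pixels, d=pixel_size))
step = frequencies[1] - frequencies[0]
print(f'frequency step: {step} 1/nm, expected {1 / (n_pixels * pixel_size)} 1/nm')
```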


We added quite a few datasets to our dictionary.

Let's have a look:


In [21]:
chooser = ft.ChooseDataset(datasets)

Dropdown(description='select dataset:', options=('Channel_000: p1_3_hr3', 'Copied_of_Channel_000: p1_3_hr3', '…

In [22]:
view = chooser.dataset.plot()


## Save Datasets to  hf5_file
Write all datasets to one h5_file, which we then close immediately.

In [23]:
h5_group = ft.save_dataset(datasets, filename='./nix.hf5')


Cannot overwrite file. Using:  nix-1.hf5


  warn('validate_h5_dimension may be removed in a future version',
  warn('validate_h5_dimension may be removed in a future version',
  warn('validate_h5_dimension may be removed in a future version',
  warn('validate_h5_dimension may be removed in a future version',
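
The message above shows that an existing file is not overwritten and that a numbered name is used instead. A sketch of one way such logic can work, using only the standard library, follows. The helper ``unique_file_name`` is hypothetical and is not pyTEMlib's implementation:

```python
import os
import tempfile


def unique_file_name(path):
    """Return path if it is unused, otherwise append -1, -2, ... before the extension."""
    if not os.path.exists(path):
        return path
    base, extension = os.path.splitext(path)
    counter = 1
    while os.path.exists(f'{base}-{counter}{extension}'):
        counter += 1
    return f'{base}-{counter}{extension}'


with tempfile.TemporaryDirectory() as directory:
    target = os.path.join(directory, 'nix.hf5')
    first = unique_file_name(target)       # nothing exists yet
    open(first, 'w').close()               # create an empty placeholder file
    second = unique_file_name(target)      # name is taken, so a suffix is added
    print(os.path.basename(first), os.path.basename(second))
```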


Close the file

In [24]:
h5_group.file.close()

## Open h5_file
Open the h5_file that we just created

In [27]:
datasets2= ft.open_file(filename='./nix-1.hf5')

chooser = ft.ChooseDataset(datasets2)

Crystal_000
info
structure
ZnO
info
structure


Dropdown(description='select dataset:', options=('Channel_000: p1_3_hr3', 'Copied_of_Channel_000: p1_3_hr3', '…

### Short check if we got the data right
The tree was printed above when the file was opened; here we plot the data.

In [28]:
view = chooser.dataset.plot()



## Navigation
- <font size = "3">  **Back: [Matplotlib and Numpy for Micrographs](CH1_03-Data_Representation.ipynb)** </font>
- <font size = "3">  **Next: [Overview](CH1_06-Overview.ipynb)** </font>
- <font size = "3">  **Chapter 1: [Introduction](CH1_00-Introduction.ipynb)** </font>
- <font size = "3">  **List of Content: [Front](../_MSE672_Intro_TEM.ipynb)** </font>
