# Big Data Analytics for 4D Scanning Transmission Electron Microscopy Data

Supporting material for paper published in:<br>
**Scientific Reports** -  https://www.nature.com/articles/srep26348

Notebook written by:<br>
**Emily Costa, Suhas Somnath, and Chris R. Smith**<br>
The Center for Nanophase Materials Sciences and The Institute for Functional Imaging of Materials<br>
Oak Ridge National Laboratory<br>
7/30/2019

This notebook demonstrates how to use Dask arrays when manipulating USID datasets. It first works through the exercises with regular NumPy arrays and then repeats them to show that Dask arrays are just as easy to use.

Here, we will be working with four-dimensional datasets acquired using a scanning transmission electron microscope (STEM). These datasets have four dimensions: two (x, y) dimensions from the position of the electron beam, and at each spatial pixel a two-dimensional (u, v) image, called a **ronchigram**, recorded by the detector. Though the ronchigrams are typically averaged to two values (bright field, dark field), retaining the raw ronchigrams enables deeper investigation of the data to reveal the existence of different phases in the material and other patterns that would typically not be visible in the averaged data.

![notebook_rules.png](notebook_rules.png)

Image courtesy of Jean Bilheux from the [neutron imaging](https://github.com/neutronimaging/python_notebooks) GitHub repository.

## Configure the notebook first

In [None]:
# Ensure python 3 compatibility
from __future__ import division, print_function, absolute_import

import os

# Import necessary libraries:
import h5py
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches
from IPython.display import display, HTML
import ipywidgets as widgets
from sklearn.cluster import KMeans

import sys
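sys.path.append('..')
# Optional: local development checkouts of pyUSID and pycroscopy.
# Adjust or remove these paths if the packages are already installed.
sys.path.append('/Users/syz/PycharmProjects/pyUSID/')
sys.path.append('/Users/syz/PycharmProjects/pycroscopy/')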
import pyUSID as usid
import pycroscopy as px

# Make Notebook take up most of page width
display(HTML(data="""
<style>
    div#notebook-container    { width: 95%; }
    div#menubar-container     { width: 65%; }
    div#maintoolbar-container { width: 99%; }
</style>
"""))

# set up notebook to show plots within the notebook
%matplotlib notebook
usid.plot_utils.use_nice_plot_params()

## Load pycroscopy compatible 4D STEM dataset

**Download the data** using Globus from the link below and name the file `rochigram_example.h5`:

https://doi.ccs.ornl.gov/ui/doi/62

For simplicity, we will use a dataset that has already been translated from its original data format into a **Universal Spectroscopy and Imaging Data (USID)** hierarchical data format (HDF5 or H5) file. For more information regarding USID, HDF5, etc., please see the documentation on our GitHub projects.

In [None]:
# Select a file to work on:
h5_path = './rochigram_example.h5'
print('Working on:\n' + h5_path)
# Open the file
h5_file = h5py.File(h5_path, mode='r')

Look at the contents of this file:

In [None]:
usid.hdf_utils.print_tree(h5_file, main_dsets_only=True)

## Get reference to Raw measurement

In [None]:
# Select the dataset containing the raw data to start working with:
h5_main = usid.hdf_utils.find_dataset(h5_file, 'Raw_Data')[-1]

# Upgrade this object from a regular HDF5 dataset to a USIDataset:
h5_main = usid.USIDataset(h5_main)
print(h5_main)

### Operate on data in its original N-dimensional form:

In [None]:
stem_4d = h5_main.get_n_dim_form()
print(stem_4d.shape)

The last two axes represent the rows and columns of the ``ronchigram`` (think image or photograph), i.e. the diffraction pattern observed at the detector every time the Scanning Transmission Electron Microscope (STEM) shoots a beam of electrons at a given location on the sample.

The first two axes represent the row and column locations on the sample where the diffraction patterns were collected.
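The axis layout described above can be sketched on a small synthetic array. The array `demo_4d` and its sizes are made up for illustration and stand in for `stem_4d`; nothing here is taken from the solution files.

```python
import numpy as np

# Synthetic stand-in for the real data (hypothetical sizes):
# 4 x 5 scan positions, each holding a 6 x 7 detector image
rng = np.random.default_rng(0)
demo_4d = rng.random((4, 5, 6, 7))

# Fixing the first two (position) axes yields one ronchigram
ronchigram = demo_4d[2, 3]           # shape (6, 7)

# Fixing the last two (detector) axes yields a spatial map of the sample
spatial_map = demo_4d[:, :, 1, 4]    # shape (4, 5)

print(ronchigram.shape, spatial_map.shape)
```

Either 2D slice can be passed directly to `axis.imshow()`.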

### Problem: Visualize the Ronchigram at a single location on the sample:

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/numpy_array00.py

### Problem: Visualize the Ronchigram at a few locations on the sample within the same figure:

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/numpy_array01.py

### Problem: Visualize the spatial map of the sample for a given spot / pixel in the ronchigram:

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/numpy_array02.py

### Problem: What does the ronchigram look like if averaged over all locations of the sample?

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/numpy_array03.py

### Problem: Visualize the spatial map of the sample when the ronchigrams are averaged to a single value:

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/numpy_array04.py
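The two averaging problems above differ only in which pair of axes is reduced. A minimal sketch on synthetic data, assuming the (row, column, u, v) axis order described earlier; `demo_4d` is a hypothetical stand-in for `stem_4d` and this is not necessarily how the solution files do it:

```python
import numpy as np

rng = np.random.default_rng(1)
demo_4d = rng.random((4, 5, 6, 7))    # hypothetical (row, col, u, v) sizes

# Averaging over the position axes leaves one mean ronchigram
mean_ronchigram = demo_4d.mean(axis=(0, 1))    # shape (6, 7)

# Averaging over the detector axes leaves one value per scan position
mean_map = demo_4d.mean(axis=(2, 3))           # shape (4, 5)

print(mean_ronchigram.shape, mean_map.shape)
```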

## Sub-divisions of detectors:
The detector of the STEM is typically broken into multiple rings and discs as shown below:
![TEM](./TEM_schematic.png "Simplified schematic of a TEM")

The portion of the ronchigram collected over each sub-detector is typically averaged to a single value for simplicity. 

Your goal in the next section will be to look at what the spatial maps look like for the signal collected at each detector.

## Masks
We essentially need to create and apply masks to the ronchigrams. For example, for the BF detector, we want to average over regions outside the central spot. Therefore, the mask should be 0 in the center and 1 outside the perimeter. A convenient way to create the mask is to make a radially symmetric space and plot an error function in it.
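As a rough illustration of the idea, the sketch below builds a soft-edged mask that is 0 in the center and 1 outside. The detector size, `radius`, and `edge_width` are hypothetical values chosen for the example, and `math.erf` wrapped in `np.vectorize` is used here only to avoid an extra dependency; `scipy.special.erf` performs the same element-wise operation.

```python
import math
import numpy as np

n_rows, n_cols = 64, 64    # hypothetical detector size in pixels
radius = 12.0              # hypothetical radius of the central spot
edge_width = 2.0           # hypothetical softness of the mask edge

# Centered pixel coordinates; indexing='ij' keeps (row, column) ordering
u, v = np.meshgrid(np.arange(-n_rows // 2, n_rows // 2),
                   np.arange(-n_cols // 2, n_cols // 2), indexing='ij')
r = np.sqrt(u ** 2 + v ** 2)    # radially symmetric space

# Element-wise error function (stand-in for scipy.special.erf)
erf = np.vectorize(math.erf)

# Rises smoothly from 0 (inside the radius) to 1 (outside the radius)
mask = 0.5 * (1.0 + erf((r - radius) / edge_width))
```

The mask equals 0.5 exactly at `r == radius`, and `edge_width` controls how quickly it moves between 0 and 1.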

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/numpy_array05.py

### Read some necessary parameters:

In [None]:
h5_pos_inds = h5_main.h5_pos_inds
num_rows, num_cols = h5_main.pos_dim_sizes
h5_spec_inds = h5_main.h5_spec_inds
num_sensor_rows, num_sensor_cols = h5_main.spec_dim_sizes

In [None]:
from scipy.special import erf
# build matrices that define regularly spaced grids over the ronchigram space
(u_mat2,v_mat2) = np.meshgrid(np.arange(-num_sensor_rows//2,num_sensor_rows//2,1),
                              np.arange(-num_sensor_cols//2,num_sensor_cols//2,1));

fig, axes = plt.subplots(ncols=2, figsize=(10, 5))
axes[0].imshow(u_mat2)
axes[0].set_title('Gradient of the Columns')
axes[1].imshow(v_mat2)
axes[1].set_title('Gradient of the Rows');

### Problem: Create a distance-from-center matrix / radial distance map using the two matrices above and visualize it:
Hint - consider using ``usid.plot_utils.plot_map(axis, image_matrix, show_cbar=True)`` instead of ``axis.imshow(image_matrix)`` to get a color bar

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/numpy_array06.py
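One detail worth keeping in mind when building the radial map is the shape returned by `np.meshgrid`. The sketch below uses a deliberately non-square, hypothetical detector to make the difference between the default `'xy'` indexing and `'ij'` indexing visible; it is an illustration and not the content of the solution file.

```python
import numpy as np

n_rows, n_cols = 6, 8    # hypothetical, deliberately non-square detector

u_ax = np.arange(-n_rows // 2, n_rows // 2)    # -3 ... 2
v_ax = np.arange(-n_cols // 2, n_cols // 2)    # -4 ... 3

# Default 'xy' indexing: outputs have shape (len(v_ax), len(u_ax))
u_xy, v_xy = np.meshgrid(u_ax, v_ax)

# 'ij' indexing: outputs have shape (len(u_ax), len(v_ax)), i.e. (row, column)
u_ij, v_ij = np.meshgrid(u_ax, v_ax, indexing='ij')

# Distance of every detector pixel from the center
r_ij = np.hypot(u_ij, v_ij)

print(u_xy.shape, r_ij.shape)
```

For a square detector both conventions give the same shape, so the choice only matters when the number of sensor rows and columns differ.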

### Problem: Create the mask for the ``BF-STEM`` Detector and visualize it:

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/numpy_array07.py

### Problem: Apply the ``BF-STEM`` mask to the original dataset:
Hint: You will need to multiply the mask with the 4D dataset. See what that looks like.
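The multiplication in the hint relies on NumPy broadcasting: a 2D mask with the detector's shape is matched against the last two axes of the 4D array and reused at every scan position. A minimal sketch with synthetic data, where `demo_4d` and the binary `mask` are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(2)
demo_4d = rng.random((4, 5, 6, 7))    # hypothetical (row, col, u, v) sizes

# Hypothetical binary mask with the same shape as one ronchigram
mask = np.zeros((6, 7))
mask[2:4, 3:5] = 1.0

# Shapes (4, 5, 6, 7) and (6, 7) broadcast along the trailing axes
masked_4d = demo_4d * mask

print(masked_4d.shape)
```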

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/numpy_array08.py

Recall that the ronchigram signal collected by each detector is averaged to a single value for each location of the electron beam. Your masked dataset is still a 4D dataset that needs to be averaged to a 2D dataset.

### Problem: Reduce this dataset to a 2D image and visualize
Hint: consider using the ``numpy.mean()`` function

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/numpy_array09.py
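A sketch of the reduction step on synthetic data follows. It also shows one possible variant, which is an assumption of this example rather than something the notebook prescribes: dividing the masked sum by the mask's own sum instead of using `numpy.mean()`. The two results differ only by a constant factor, so the images look the same up to scale.

```python
import numpy as np

rng = np.random.default_rng(3)
demo_4d = rng.random((4, 5, 6, 7))    # hypothetical (row, col, u, v) sizes

mask = np.zeros((6, 7))               # hypothetical binary detector mask
mask[2:4, 3:5] = 1.0

masked_4d = demo_4d * mask

# Plain mean over the two detector axes: one value per scan position
image_mean = masked_4d.mean(axis=(2, 3))

# Variant: normalize by the mask area instead of the full detector area
image_weighted = masked_4d.sum(axis=(2, 3)) / mask.sum()

print(image_mean.shape)
```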

### Problem: Create the mask for the ``DF-STEM`` Detector and visualize it:

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/numpy_array10.py

The rest of the notebook shows how to handle the same N-dimensional data with Dask instead of NumPy.

Import the necessary Dask module:

In [None]:
import dask.array as da

Convert the 4-dimensional dataset into a Dask array.

In [None]:
stem_4d_dask = da.from_array(stem_4d, chunks='auto') 
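The sketch below illustrates what the conversion does, using a synthetic array and an explicit, hypothetical chunk shape instead of `chunks='auto'`. Operations on the Dask array only build a task graph; the result is evaluated when `.compute()` is called.

```python
import numpy as np
import dask.array as da

rng = np.random.default_rng(4)
demo_4d = rng.random((8, 8, 16, 16))    # hypothetical (row, col, u, v) sizes

# Each chunk holds a 4 x 4 block of scan positions with complete ronchigrams
demo_dask = da.from_array(demo_4d, chunks=(4, 4, 16, 16))

# Lazy: this only records the computation
lazy_mean = demo_dask.mean(axis=(0, 1))

# Evaluate the graph and get a regular NumPy array back
mean_ronchigram = lazy_mean.compute()

print(demo_dask.chunks)
```

Slicing works the same way as with NumPy, for example `demo_dask[2, 3].compute()` returns a single ronchigram.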

Visualize the Ronchigram at a single location on the sample

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/dask_array00.py

Visualize the Ronchigram at a few locations on the sample within the same figure

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/dask_array01.py

Visualize the spatial map of the sample for a given spot / pixel in the ronchigram

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/dask_array02.py

What does the ronchigram look like if averaged over all locations of the sample?

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/dask_array03.py

Visualize the spatial map of the sample when the ronchigrams are averaged to a single value:

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/dask_array04.py

***Masks***

At the time of writing, Dask arrays do not support item assignment. It is therefore recommended to create NumPy arrays with the desired values first and then convert them to Dask arrays.
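Following that recommendation, a minimal sketch of the workflow is shown below: the mask is filled in with NumPy, converted with `da.from_array`, and only then combined with the Dask dataset. All array names, sizes, and chunk shapes are hypothetical.

```python
import numpy as np
import dask.array as da

rng = np.random.default_rng(5)
demo_np = rng.random((4, 5, 6, 7))    # hypothetical (row, col, u, v) sizes
demo_dask = da.from_array(demo_np, chunks=(2, 5, 6, 7))

# Build the mask in NumPy, where item assignment is available
mask_np = np.ones((6, 7))
mask_np[2:4, 3:5] = 0.0

# Convert the finished mask to a Dask array
mask_dask = da.from_array(mask_np, chunks=mask_np.shape)

# Broadcasting and reductions work as they do in NumPy, but lazily
image = (demo_dask * mask_dask).mean(axis=(2, 3)).compute()

print(image.shape)
```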

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/dask_array05.py

Set up some parameters and import the masked-array module from Dask:

In [None]:
dask_pos_inds = da.from_array(h5_pos_inds, chunks='auto')
dask_num_rows = da.from_array(num_rows, chunks='auto')
dask_num_cols = da.from_array(num_cols, chunks='auto')
dask_spec_inds = da.from_array(h5_spec_inds, chunks='auto')
dask_sensor_rows = da.from_array(num_sensor_rows, chunks='auto')
dask_sensor_cols = da.from_array(num_sensor_cols, chunks='auto')

import dask.array.ma as ma

Visualize gradient of columns and rows

In [None]:
from scipy.special import erf
# build matrices that define regularly spaced grids over the ronchigram space
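# da.arange is given the plain integer sensor sizes read earlier;
# scalar bounds gain nothing from being wrapped in Dask arrays
u_mat2, v_mat2 = da.meshgrid(da.arange(-num_sensor_rows//2, num_sensor_rows//2, 1),
                             da.arange(-num_sensor_cols//2, num_sensor_cols//2, 1))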

fig, axes = plt.subplots(ncols=2, figsize=(10, 5))
axes[0].imshow(u_mat2)
axes[0].set_title('Gradient of the Columns')
axes[1].imshow(v_mat2)
axes[1].set_title('Gradient of the Rows');

Create a distance-from-center matrix / radial distance map using the two matrices above and visualize it

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/dask_array06.py

Create the mask for the BF-STEM Detector and visualize it

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/dask_array07.py

Apply the BF-STEM mask to the original dataset

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/dask_array08.py

Reduce this dataset to a 2D image and visualize

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/dask_array09.py

Create the mask for the DF-STEM Detector and visualize it

In [None]:
# Your answer

In [None]:
# Solution
%load solutions/dask_array10.py

In [None]:
h5_file.close()